Search (362 results, page 1 of 19)

  • theme_ss:"Retrievalalgorithmen"
  1. Behnert, C.; Plassmeier, K.; Borst, T.; Lewandowski, D.: Evaluierung von Rankingverfahren für bibliothekarische Informationssysteme (2019) 0.09
    0.092548564 = product of:
      0.12339809 = sum of:
        0.0052010044 = weight(_text_:a in 5023) [ClassicSimilarity], result of:
          0.0052010044 = score(doc=5023,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.089176424 = fieldWeight in 5023, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5023)
        0.11216935 = weight(_text_:70 in 5023) [ClassicSimilarity], result of:
          0.11216935 = score(doc=5023,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.41413653 = fieldWeight in 5023, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5023)
        0.006027733 = product of:
          0.012055466 = sum of:
            0.012055466 = weight(_text_:information in 5023) [ClassicSimilarity], result of:
              0.012055466 = score(doc=5023,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.13576832 = fieldWeight in 5023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5023)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Source
    Information - Wissenschaft und Praxis. 70(2019) H.1, S.14-23
    Type
    a
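    The score explanation for result 1 (doc 5023) can be reproduced by hand. The sketch below is illustrative only: it assumes the default ClassicSimilarity formulas of Lucene (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), takes queryNorm, fieldNorm and the coord factors directly from the explanation above, and uses hypothetical helper names (idf, term_score). Lucene computes with 32-bit floats, so the last digits may differ slightly.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation tree
    tf = math.sqrt(freq)                          # tf(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm          # queryWeight
    field_weight = tf * term_idf * field_norm     # fieldWeight
    return query_weight * field_weight

# Values copied from the explanation for doc 5023
MAX_DOCS = 44218
QUERY_NORM = 0.05058132
FIELD_NORM = 0.0546875

score_a = term_score(2.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_70 = term_score(2.0, 567, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "information" clause sits in a nested query with coord(1/2)
score_information = 0.5 * term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Outer coord(3/4): three of the four top-level clauses matched
total = 0.75 * (score_a + score_70 + score_information)

# Should agree with the 0.092548564 shown above, up to float rounding
print(f"{total:.6f}")
```

    The same arithmetic applies to every other result on this page; only freq, docFreq, fieldNorm and the coord factors change.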
  2. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.07
    0.07454625 = product of:
      0.1490925 = sum of:
        0.0118880095 = weight(_text_:a in 402) [ClassicSimilarity], result of:
          0.0118880095 = score(doc=402,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20383182 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
        0.13720448 = sum of:
          0.02755535 = weight(_text_:information in 402) [ClassicSimilarity], result of:
            0.02755535 = score(doc=402,freq=2.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.3103276 = fieldWeight in 402, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.125 = fieldNorm(doc=402)
          0.10964913 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
            0.10964913 = score(doc=402,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.61904186 = fieldWeight in 402, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.125 = fieldNorm(doc=402)
      0.5 = coord(2/4)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
    Type
    a
  3. González-Ibáñez, R.; Esparza-Villamán, A.; Vargas-Godoy, J.C.; Shah, C.: A comparison of unimodal and multimodal models for implicit detection of relevance in interactive IR (2019) 0.07
    0.07202916 = product of:
      0.09603888 = sum of:
        0.0098289745 = weight(_text_:a in 5417) [ClassicSimilarity], result of:
          0.0098289745 = score(doc=5417,freq=14.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1685276 = fieldWeight in 5417, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5417)
        0.08012097 = weight(_text_:70 in 5417) [ClassicSimilarity], result of:
          0.08012097 = score(doc=5417,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.29581183 = fieldWeight in 5417, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5417)
        0.0060889297 = product of:
          0.012177859 = sum of:
            0.012177859 = weight(_text_:information in 5417) [ClassicSimilarity], result of:
              0.012177859 = score(doc=5417,freq=4.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.13714671 = fieldWeight in 5417, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5417)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Implicit detection of relevance has been approached by many researchers during the last decade. From the use of individual measures to the use of multiple features from different sources (multimodality), studies have shown that it is feasible to automatically detect whether a document is relevant. Despite promising results, it is not yet clear to what extent multimodality constitutes an effective approach compared to unimodality. In this article, we hypothesize that it is possible to build unimodal models capable of outperforming multimodal models in the detection of perceived relevance. To test this hypothesis, we conducted three experiments to compare unimodal and multimodal classification models built using a combination of 24 features. Our classification experiments showed that a univariate unimodal model based on the left-click feature supports our hypothesis. On the other hand, our prediction experiment suggests that multimodality slightly improves early classification compared to the best unimodal models. Based on our results, we argue that the feasibility of state-of-the-art multimodal approaches for practical applications may be strongly constrained by technological, cultural, ethical, and legal aspects, in which case unimodality may offer a better alternative today for supporting relevance detection in interactive information retrieval systems.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.11, S.1223-1235
    Type
    a
  4. Jacucci, G.; Barral, O.; Daee, P.; Wenzel, M.; Serim, B.; Ruotsalo, T.; Pluchino, P.; Freeman, J.; Gamberini, L.; Kaski, S.; Blankertz, B.: Integrating neurophysiologic relevance feedback in intent modeling for information retrieval (2019) 0.07
    0.07194084 = product of:
      0.095921114 = sum of:
        0.005253808 = weight(_text_:a in 5356) [ClassicSimilarity], result of:
          0.005253808 = score(doc=5356,freq=4.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.090081796 = fieldWeight in 5356, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5356)
        0.08012097 = weight(_text_:70 in 5356) [ClassicSimilarity], result of:
          0.08012097 = score(doc=5356,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.29581183 = fieldWeight in 5356, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5356)
        0.010546336 = product of:
          0.021092672 = sum of:
            0.021092672 = weight(_text_:information in 5356) [ClassicSimilarity], result of:
              0.021092672 = score(doc=5356,freq=12.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.23754507 = fieldWeight in 5356, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5356)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The use of implicit relevance feedback from neurophysiology could deliver effortless information retrieval. However, both computing neurophysiologic responses and retrieving documents are characterized by uncertainty because of noisy signals and incomplete or inconsistent representations of the data. We present the first-of-its-kind, fully integrated information retrieval system that makes use of online implicit relevance feedback generated from brain activity, as measured through electroencephalography (EEG), and from eye movements. The findings of the evaluation experiment (N = 16) show that we are able to compute online neurophysiology-based relevance feedback with performance significantly better than chance in complex data domains and realistic search tasks. We contribute by demonstrating how to integrate this inherently noisy implicit relevance feedback, combined with scarce explicit feedback, into interactive intent modeling. Although experimental measures of task performance did not allow us to demonstrate how the classification outcomes translated into search task performance, the experiment proved that our approach is able to generate relevance feedback from brain signals and eye movements in a realistic scenario, thus providing promising implications for future work in neuroadaptive information retrieval (IR).
    Footnote
    Contribution to a 'Special issue on neuro-information science'.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.9, S.917-930
    Type
    a
  5. Jiang, J.-D.; Jiang, J.-Y.; Cheng, P.-J.: Cocluster hypothesis and ranking consistency for relevance ranking in web search (2019) 0.07
    0.07069161 = product of:
      0.09425548 = sum of:
        0.0098289745 = weight(_text_:a in 5247) [ClassicSimilarity], result of:
          0.0098289745 = score(doc=5247,freq=14.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1685276 = fieldWeight in 5247, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5247)
        0.08012097 = weight(_text_:70 in 5247) [ClassicSimilarity], result of:
          0.08012097 = score(doc=5247,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.29581183 = fieldWeight in 5247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5247)
        0.0043055234 = product of:
          0.008611047 = sum of:
            0.008611047 = weight(_text_:information in 5247) [ClassicSimilarity], result of:
              0.008611047 = score(doc=5247,freq=2.0), product of:
                0.088794395 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.05058132 = queryNorm
                0.09697737 = fieldWeight in 5247, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5247)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Conventional approaches to relevance ranking typically optimize ranking models for each query separately. The traditional cluster hypothesis also does not consider the dependency between related queries. The goal of this paper is to leverage similar search intents to enforce ranking consistency so that search performance can be improved accordingly. Unlike the previous supervised approach, which learns relevance from click-through data, we propose a novel cocluster hypothesis to bridge the gap between relevance ranking and ranking consistency. A nearest-neighbors test is also designed to measure the extent to which the cocluster hypothesis holds. Based on the hypothesis, we further propose a two-stage unsupervised approach, in which two ranking heuristics and a cost function are developed to optimize the combination of consistency and uniqueness (or inconsistency). Extensive experiments have been conducted on a real, large-scale search engine log. The experimental results not only verify the applicability of the proposed cocluster hypothesis but also show that our approach is effective in boosting the retrieval performance of the commercial search engine and achieves performance comparable to the supervised approach.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.6, S.535-546
    Type
    a
  6. Ojala, M.: Commands that RANKle (1997) 0.07
    0.07004078 = product of:
      0.14008155 = sum of:
        0.0118880095 = weight(_text_:a in 428) [ClassicSimilarity], result of:
          0.0118880095 = score(doc=428,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20383182 = fieldWeight in 428, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=428)
        0.12819354 = weight(_text_:70 in 428) [ClassicSimilarity], result of:
          0.12819354 = score(doc=428,freq=2.0), product of:
            0.27085114 = queryWeight, product of:
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.05058132 = queryNorm
            0.4732989 = fieldWeight in 428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.354766 = idf(docFreq=567, maxDocs=44218)
              0.0625 = fieldNorm(doc=428)
      0.5 = coord(2/4)
    
    Abstract
    Examines the RANK command on DIALOG using a statistical analysis of articles in DATABASE as an example. The RANK command was used to find authors, company names, and length of articles. Use of the command revealed a number of complexities and some problematic indexing on the part of the database producers. The LEXIS-NEXIS RANK command was also used, but this fulfils a different function from the command of the same name in DIALOG.
    Source
    Online. 21(1997) no.4, S.70-73
    Type
    a
  7. Back, J.: An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.07
    0.06522796 = product of:
      0.13045593 = sum of:
        0.010402009 = weight(_text_:a in 3445) [ClassicSimilarity], result of:
          0.010402009 = score(doc=3445,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.17835285 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
        0.12005392 = sum of:
          0.024110932 = weight(_text_:information in 3445) [ClassicSimilarity], result of:
            0.024110932 = score(doc=3445,freq=2.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.27153665 = fieldWeight in 3445, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.109375 = fieldNorm(doc=3445)
          0.09594299 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
            0.09594299 = score(doc=3445,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.5416616 = fieldWeight in 3445, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.109375 = fieldNorm(doc=3445)
      0.5 = coord(2/4)
    
    Date
    25. 8.2005 17:42:22
    Source
    Library and information research news. 24(2000) no.77, S.30-34
    Type
    a
  8. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.05
    0.04783556 = product of:
      0.09567112 = sum of:
        0.0132912 = weight(_text_:a in 1422) [ClassicSimilarity], result of:
          0.0132912 = score(doc=1422,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.22789092 = fieldWeight in 1422, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.082379915 = sum of:
          0.02755535 = weight(_text_:information in 1422) [ClassicSimilarity], result of:
            0.02755535 = score(doc=1422,freq=8.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.3103276 = fieldWeight in 1422, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=1422)
          0.054824565 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
            0.054824565 = score(doc=1422,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.30952093 = fieldWeight in 1422, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1422)
      0.5 = coord(2/4)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations along with the use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.285-301
    Type
    a
  9. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.04
    0.04054542 = product of:
      0.08109084 = sum of:
        0.009008404 = weight(_text_:a in 1319) [ClassicSimilarity], result of:
          0.009008404 = score(doc=1319,freq=6.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1544581 = fieldWeight in 1319, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
        0.07208243 = sum of:
          0.024110932 = weight(_text_:information in 1319) [ClassicSimilarity], result of:
            0.024110932 = score(doc=1319,freq=8.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.27153665 = fieldWeight in 1319, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1319)
          0.047971494 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
            0.047971494 = score(doc=1319,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.2708308 = fieldWeight in 1319, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1319)
      0.5 = coord(2/4)
    
    Abstract
    Keyword-based querying has been an immediate and efficient way to specify and retrieve the information that the user is looking for. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating 2 existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
    Type
    a
  10. Faloutsos, C.: Signature files (1992) 0.04
    0.040245127 = product of:
      0.080490254 = sum of:
        0.0118880095 = weight(_text_:a in 3499) [ClassicSimilarity], result of:
          0.0118880095 = score(doc=3499,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20383182 = fieldWeight in 3499, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
        0.06860224 = sum of:
          0.013777675 = weight(_text_:information in 3499) [ClassicSimilarity], result of:
            0.013777675 = score(doc=3499,freq=2.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.1551638 = fieldWeight in 3499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=3499)
          0.054824565 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
            0.054824565 = score(doc=3499,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.30952093 = fieldWeight in 3499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=3499)
      0.5 = coord(2/4)
    
    Abstract
    Presents a survey and discussion of signature-based text retrieval methods. Describes the main idea behind the signature approach and its advantages over other text retrieval methods. Provides a classification of the signature methods that have appeared in the literature and describes the main representatives of each class, together with their relative advantages and drawbacks. Gives a list of applications as well as commercial or university prototypes that use the signature approach.
    Date
    7. 5.1999 15:22:48
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
    Type
    a
  11. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.04
    0.040245127 = product of:
      0.080490254 = sum of:
        0.0118880095 = weight(_text_:a in 1431) [ClassicSimilarity], result of:
          0.0118880095 = score(doc=1431,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20383182 = fieldWeight in 1431, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
        0.06860224 = sum of:
          0.013777675 = weight(_text_:information in 1431) [ClassicSimilarity], result of:
            0.013777675 = score(doc=1431,freq=2.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.1551638 = fieldWeight in 1431, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0625 = fieldNorm(doc=1431)
          0.054824565 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
            0.054824565 = score(doc=1431,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.30952093 = fieldWeight in 1431, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1431)
      0.5 = coord(2/4)
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.9, S.1939-1943
    Type
    a
  12. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.04
    0.038199015 = product of:
      0.07639803 = sum of:
        0.0099684 = weight(_text_:a in 2419) [ClassicSimilarity], result of:
          0.0099684 = score(doc=2419,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1709182 = fieldWeight in 2419, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.06642963 = sum of:
          0.025311206 = weight(_text_:information in 2419) [ClassicSimilarity], result of:
            0.025311206 = score(doc=2419,freq=12.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.2850541 = fieldWeight in 2419, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046875 = fieldNorm(doc=2419)
          0.041118424 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
            0.041118424 = score(doc=2419,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.23214069 = fieldWeight in 2419, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2419)
      0.5 = coord(2/4)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects, it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper, evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil, followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
    Type
    a
  13. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.04
    0.037573118 = product of:
      0.075146236 = sum of:
        0.012321272 = weight(_text_:a in 1428) [ClassicSimilarity], result of:
          0.012321272 = score(doc=1428,freq=22.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.21126054 = fieldWeight in 1428, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.062824965 = sum of:
          0.02855961 = weight(_text_:information in 1428) [ClassicSimilarity], result of:
            0.02855961 = score(doc=1428,freq=22.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.32163754 = fieldWeight in 1428, product of:
                4.690416 = tf(freq=22.0), with freq of:
                  22.0 = termFreq=22.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
          0.034265354 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
            0.034265354 = score(doc=1428,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.19345059 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
      0.5 = coord(2/4)
    
    Abstract
    Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches were presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.321-334
    Type
    a
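The explain tree for this record can be re-derived by hand. The sketch below assumes the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), term score = queryWeight × fieldWeight) and takes queryNorm, fieldNorm, the frequencies and the coord factor directly from the listing for doc 1428. The function names are illustrative and not part of any library API.

```python
import math

def idf(doc_freq, max_docs):
    # Classic TF-IDF inverse document frequency as assumed above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm; fieldWeight = tf * idf * fieldNorm
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 1428.
MAX_DOCS = 44218
QUERY_NORM = 0.05058132
FIELD_NORM = 0.0390625

score_a = term_score(22, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_information = term_score(22, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = term_score(2, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The outer sum is scaled by coord(2/4), as in the listing.
total = (score_a + (score_information + score_22)) * (2 / 4)
print(total)  # compare with 0.037573118 in the listing
```

Small differences in the last digits are to be expected, since the listing was presumably produced with single-precision arithmetic.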
  14. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.04
    0.0363671 = product of:
      0.0727342 = sum of:
        0.006304569 = weight(_text_:a in 1451) [ClassicSimilarity], result of:
          0.006304569 = score(doc=1451,freq=4.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.10809815 = fieldWeight in 1451, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.06642963 = sum of:
          0.025311206 = weight(_text_:information in 1451) [ClassicSimilarity], result of:
            0.025311206 = score(doc=1451,freq=12.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.2850541 = fieldWeight in 1451, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046875 = fieldNorm(doc=1451)
          0.041118424 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
            0.041118424 = score(doc=1451,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.23214069 = fieldWeight in 1451, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=1451)
      0.5 = coord(2/4)
    
    Abstract
    Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
    Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.281-284
    Type
    a
  15. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.04
    0.035110753 = product of:
      0.070221506 = sum of:
        0.0052010044 = weight(_text_:a in 3276) [ClassicSimilarity], result of:
          0.0052010044 = score(doc=3276,freq=2.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.089176424 = fieldWeight in 3276, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3276)
        0.0650205 = sum of:
          0.017049003 = weight(_text_:information in 3276) [ClassicSimilarity], result of:
            0.017049003 = score(doc=3276,freq=4.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.1920054 = fieldWeight in 3276, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3276)
          0.047971494 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
            0.047971494 = score(doc=3276,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.2708308 = fieldWeight in 3276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3276)
      0.5 = coord(2/4)
    
    Abstract
    Classical information retrieval has produced various methods for ranking and for searching in a homogeneous, unstructured document collection. The success of the Google search engine has shown that searching an inhomogeneous but interconnected document collection such as the Internet can be very effective when the connections between documents (links) are taken into account. Among the concepts implemented by Google is a method for ranking search results (PageRank), which this article explains briefly. The article also covers the concepts behind a system called CiteSeer, which automatically indexes bibliographic references (Autonomous Citation Indexing, ACI). From a set of unlinked scholarly documents, CiteSeer creates an interconnected document collection and thereby makes it possible to apply ranking methods based on those used by Google.
    Date
    20. 3.2005 16:23:22
    Source
    Information - Wissenschaft und Praxis. 56(2005) H.2, S.87-92
    Type
    a
  16. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.03
    0.034492277 = product of:
      0.06898455 = sum of:
        0.0099684 = weight(_text_:a in 825) [ClassicSimilarity], result of:
          0.0099684 = score(doc=825,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1709182 = fieldWeight in 825, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=825)
        0.05901615 = sum of:
          0.017897725 = weight(_text_:information in 825) [ClassicSimilarity], result of:
            0.017897725 = score(doc=825,freq=6.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.20156369 = fieldWeight in 825, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046875 = fieldNorm(doc=825)
          0.041118424 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
            0.041118424 = score(doc=825,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.23214069 = fieldWeight in 825, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=825)
      0.5 = coord(2/4)
    
    Abstract
    Relevance Feedback consists in automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical frame on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probabilities propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested using a preliminary experimentation with different standard document collections.
    Date
    22. 3.2003 19:30:19
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.302-313
    Type
    a
  17. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.03
    0.034471694 = product of:
      0.06894339 = sum of:
        0.008307 = weight(_text_:a in 2591) [ClassicSimilarity], result of:
          0.008307 = score(doc=2591,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.14243183 = fieldWeight in 2591, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2591)
        0.060636386 = sum of:
          0.012177859 = weight(_text_:information in 2591) [ClassicSimilarity], result of:
            0.012177859 = score(doc=2591,freq=4.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.13714671 = fieldWeight in 2591, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2591)
          0.048458528 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
            0.048458528 = score(doc=2591,freq=4.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.27358043 = fieldWeight in 2591, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2591)
      0.5 = coord(2/4)
    
    Abstract
    Purpose In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgment through human assessors is infeasible. Due to the large number of documents that require judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues. Design/methodology/approach This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human efforts. These methods overcome problems with large numbers of documents for judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: number of occurrences of each document per topic from all the system runs; and document rankings to generate the alternate methods. Findings The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgment without human assessors while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
    Source
    Aslib journal of information management. 67(2015) no.6, S.700-714
    Type
    a
  18. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.03
    0.03396608 = product of:
      0.06793216 = sum of:
        0.008916007 = weight(_text_:a in 2239) [ClassicSimilarity], result of:
          0.008916007 = score(doc=2239,freq=8.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.15287387 = fieldWeight in 2239, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.05901615 = sum of:
          0.017897725 = weight(_text_:information in 2239) [ClassicSimilarity], result of:
            0.017897725 = score(doc=2239,freq=6.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.20156369 = fieldWeight in 2239, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046875 = fieldNorm(doc=2239)
          0.041118424 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
            0.041118424 = score(doc=2239,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.23214069 = fieldWeight in 2239, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2239)
      0.5 = coord(2/4)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task - discovery of ranking functions for Web search - and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
    Source
    Journal of the American Society for Information Science and technology. 55(2004) no.7, S.628-636
    Type
    a
  19. Witschel, H.F.: Global term weights in distributed environments (2008) 0.03
    0.03376331 = product of:
      0.06752662 = sum of:
        0.01179477 = weight(_text_:a in 2096) [ClassicSimilarity], result of:
          0.01179477 = score(doc=2096,freq=14.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.20223314 = fieldWeight in 2096, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.055731855 = sum of:
          0.014613431 = weight(_text_:information in 2096) [ClassicSimilarity], result of:
            0.014613431 = score(doc=2096,freq=4.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.16457605 = fieldWeight in 2096, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046875 = fieldNorm(doc=2096)
          0.041118424 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
            0.041118424 = score(doc=2096,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.23214069 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2096)
      0.5 = coord(2/4)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
    Date
    1. 8.2008 9:44:22
    Source
    Information processing and management. 44(2008) no.3, S.1049-1061
    Type
    a
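The idea in this abstract, that only the most frequent terms need known global weights while all other terms are treated equally, can be illustrated with a small sketch. This is not the paper's implementation: the plain log(N/df) weight, the function names, the toy frequencies and the default weight are all assumptions made for illustration.

```python
import math

def build_idf_table(doc_freqs, num_docs, top_k):
    # Keep an IDF estimate only for the top_k most frequent terms,
    # i.e. a hypothetical "extended stop word list".
    frequent = sorted(doc_freqs, key=doc_freqs.get, reverse=True)[:top_k]
    return {term: math.log(num_docs / doc_freqs[term]) for term in frequent}

def term_weight(term, idf_table, default_weight):
    # Terms outside the list all receive the same default weight.
    return idf_table.get(term, default_weight)

# Toy document frequencies (invented for this example).
doc_freqs = {"the": 990, "of": 950, "retrieval": 40, "daffodil": 2}
table = build_idf_table(doc_freqs, num_docs=1000, top_k=2)

DEFAULT_WEIGHT = 3.0  # assumed constant for every unlisted term
weights = {t: term_weight(t, table, DEFAULT_WEIGHT) for t in doc_freqs}
print(weights)
```

In a distributed setting, the table could be estimated from a sample or a reference corpus, since only the frequent terms have to be covered.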
  20. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.03
    0.032850128 = product of:
      0.065700255 = sum of:
        0.0099684 = weight(_text_:a in 6973) [ClassicSimilarity], result of:
          0.0099684 = score(doc=6973,freq=10.0), product of:
            0.05832264 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.05058132 = queryNorm
            0.1709182 = fieldWeight in 6973, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=6973)
        0.055731855 = sum of:
          0.014613431 = weight(_text_:information in 6973) [ClassicSimilarity], result of:
            0.014613431 = score(doc=6973,freq=4.0), product of:
              0.088794395 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.05058132 = queryNorm
              0.16457605 = fieldWeight in 6973, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046875 = fieldNorm(doc=6973)
          0.041118424 = weight(_text_:22 in 6973) [ClassicSimilarity], result of:
            0.041118424 = score(doc=6973,freq=2.0), product of:
              0.17712717 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05058132 = queryNorm
              0.23214069 = fieldWeight in 6973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=6973)
      0.5 = coord(2/4)
    
    Abstract
    Proposes that signature files be used as a viable alternative to other indexing strategies such as inverted files for searching through large volumes of text. Demonstrates through simulation that search times can be further reduced by enhancing the basic signature file concept using deterministic partitioning algorithms which eliminate the need for an exhaustive search of the entire signature file. Reports research to evaluate the performance of some deterministic partitioning algorithms in a non-simulated environment using 276 MB of raw newspaper text (taken from the Wall Street Journal) and real user queries. Presents a selection of results to illustrate trends and highlight important aspects of the performance of these methods under realistic rather than simulated operating conditions. As a result of the research reported here, certain aspects of this approach to signature files are shown to be wanting and to require improvement. Suggests lines of future research on the partitioning of signature files.
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
    Type
    a

Types

  • a 337
  • m 12
  • el 8
  • s 5
  • r 3
  • p 2
  • x 2