Search (254 results, page 1 of 13)

  • language_ss:"e"
  • theme_ss:"Retrievalalgorithmen"
  • type_ss:"a"
  1. Qi, Q.; Hessen, D.J.; Heijden, P.G.M. van der: Improving information retrieval through correspondence analysis instead of latent semantic analysis (2023) 0.00
    
    Abstract
    The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
    Date
    15. 9.2023 12:28:29
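    A minimal numpy sketch, not the authors' implementation, contrasting the two decompositions the abstract compares: LSA as a truncated SVD of the raw document-term matrix, and CA as an SVD of standardized residuals, which removes the marginal (row and column mass) effects. The toy matrix is invented:

import numpy as np

# Toy document-term matrix (rows = documents, columns = terms); values are term counts.
X = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 2., 0., 1.],
              [0., 0., 1., 2.]])

def lsa_doc_coords(X, k=2):
    # LSA: truncated SVD of the raw (or weighted) document-term matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def ca_doc_coords(X, k=2):
    # CA: SVD of standardized residuals, which removes the marginal effects.
    P = X / X.sum()
    r = P.sum(axis=1, keepdims=True)        # row masses
    c = P.sum(axis=0, keepdims=True)        # column masses
    S = (P - r @ c) / np.sqrt(r @ c)        # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] * s[:k]) / np.sqrt(r)  # row principal coordinates

print("LSA coordinates:\n", lsa_doc_coords(X))
print("CA coordinates:\n", ca_doc_coords(X))
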
  2. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.00
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  3. Smeaton, A.F.; Rijsbergen, C.J. van: The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.00
    
    Date
    30. 3.2001 13:32:22
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  4. Zhang, W.; Korf, R.E.: Performance of linear-space search algorithms (1995) 0.00
    
    Abstract
    Search algorithms in artificial intelligence systems that use space linear in the search depth are employed in practice to solve difficult problems optimally, such as planning and scheduling. Studies the average-case performance of linear-space search algorithms, including depth-first branch-and-bound, iterative-deepening, and recursive best-first search
    Date
    2. 8.1996 10:29:15
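    Depth-first branch-and-bound, one of the linear-space algorithms the abstract names, can be sketched in a few lines; the random implicit tree below is invented purely for illustration and is not taken from the paper:

import random

def dfbnb(depth=10, branch=2, seed=0):
    """Depth-first branch-and-bound over an implicit random tree: memory is linear
    in the search depth, and any branch whose path cost already exceeds the best
    complete solution found so far is pruned."""
    rng = random.Random(seed)
    cost = {}

    def edge(node, child):
        # Edge costs are drawn lazily but memoised, so the tree stays fixed within a run.
        key = (node, child)
        if key not in cost:
            cost[key] = rng.random()
        return cost[key]

    best = float("inf")

    def search(node, g):
        nonlocal best
        if len(node) == depth:
            best = min(best, g)
            return
        for child in range(branch):
            c = g + edge(node, child)
            if c < best:          # prune: cannot improve on the incumbent
                search(node + (child,), c)

    search((), 0.0)
    return best

print("cheapest leaf cost:", dfbnb())
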
  5. Cole, C.: Intelligent information retrieval: diagnosing information need : Part II: uncertainty expansion in a prototype of a diagnostic IR tool (1998) 0.00
    
    Date
    11. 8.2001 14:48:29
  6. Uratani, N.; Takeda, M.: A fast string-searching algorithm for multiple patterns (1993) 0.00
    
    Abstract
    The string-searching problem is to find all occurrences of pattern(s) in a text string. The Aho-Corasick string searching algorithm simultaneously finds all occurrences of multiple patterns in one pass through the text. The Boyer-Moore algorithm is the fastest algorithm for a single pattern. By combining the ideas of these two algorithms, presents an efficient string searching algorithm for multiple patterns. The algorithm runs in sublinear time, on the average, as the BM algorithm achieves, and its preprocessing time is linearly proportional to the sum of the lengths of the patterns, like the AC algorithm.
    Source
    Information processing and management. 29(1993) no.6, S.775-791
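    The combined AC/BM algorithm of the paper is not reproduced here; as a reference point, this is a compact sketch of plain Aho-Corasick matching (the first of the two building blocks named in the abstract), which reports all occurrences of several patterns in one pass over the text:

from collections import deque

def build_aho_corasick(patterns):
    goto, fail, out = [{}], [0], [set()]          # trie edges, failure links, outputs
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())               # BFS to set failure links
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]                # inherit matches ending at the fallback state
    return goto, fail, out

def search(text, patterns):
    goto, fail, out = build_aho_corasick(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))  # (start offset, pattern)
    return hits

print(search("ushers", ["he", "she", "his", "hers"]))
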
  7. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.00
    
    Abstract
    In this paper methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed is varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
    Date
    20. 1.2007 18:30:22
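    A toy illustration of the general idea only, splitting passage scoring across worker processes; the overlap-based scoring function and the example documents are invented and this is not the Okapi-based implementation evaluated in the paper:

from concurrent.futures import ProcessPoolExecutor
from itertools import repeat

def best_passage_score(doc_words, query_terms, size=20):
    """Score every fixed-length word window by distinct query-term overlap, keep the best."""
    q = set(query_terms)
    n_windows = max(1, len(doc_words) - size + 1)
    return max(len(q & set(doc_words[i:i + size])) for i in range(n_windows))

def rank_by_passages(docs, query_terms):
    # docs: {doc_id: list of tokens}; each document's passages are scored in a worker process.
    with ProcessPoolExecutor() as pool:
        scores = pool.map(best_passage_score, docs.values(), repeat(query_terms))
        return sorted(zip(docs.keys(), scores), key=lambda x: -x[1])

if __name__ == "__main__":
    docs = {"d1": "parallel passage retrieval with okapi at trec".split(),
            "d2": "an unrelated document about something else entirely".split()}
    print(rank_by_passages(docs, ["passage", "retrieval", "okapi"]))
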
  8. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.00
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
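    A rough sketch of the underlying grouping step, counting the most frequent keyword contexts across result snippets; the window size and snippets are invented, and the real fKWIC interface adds interactive filtering on top:

from collections import Counter

def keyword_contexts(snippets, keyword, width=1):
    """Collect the words immediately around each keyword occurrence and count
    how often each context appears across all result snippets."""
    contexts, where = Counter(), {}
    for doc_id, text in snippets.items():
        words = text.lower().split()
        for i, w in enumerate(words):
            if w == keyword:
                ctx = " ".join(words[max(0, i - width): i + width + 1])
                contexts[ctx] += 1
                where.setdefault(ctx, []).append(doc_id)
    return contexts.most_common(), where

snippets = {
    1: "improving web search results with clustering",
    2: "filtering web search results by context",
    3: "search engine evaluation and user studies",
}
freqs, docs_per_context = keyword_contexts(snippets, "search")
for ctx, n in freqs:
    print(f"{n}x  {ctx}   -> results {docs_per_context[ctx]}")
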
  9. Vechtomova, O.; Karamuftuoglu, M.: Lexical cohesion and term proximity in document ranking (2008) 0.00
    
    Abstract
    We demonstrate effective new methods of document ranking based on lexical cohesive relationships between query terms. The proposed methods rely solely on the lexical relationships between original query terms, and do not involve query expansion or relevance feedback. Two types of lexical cohesive relationship information between query terms are used in document ranking: short-distance collocation relationship between query terms, and long-distance relationship, determined by the collocation of query terms with other words. The methods are evaluated on TREC corpora, and show improvements over baseline systems.
    Date
    1. 8.2008 12:29:05
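    The short-distance collocation component can be illustrated with a minimal proximity score, the inverse of the smallest distance between two query terms; this is only a toy stand-in for the authors' lexical-cohesion model, with invented example documents:

def min_pair_distance(tokens, term_a, term_b):
    """Smallest distance (in words) between any occurrence of term_a and any of term_b."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)

def proximity_score(tokens, term_a, term_b):
    # Closer co-occurrence -> higher score; absence of either term -> 0.
    d = min_pair_distance(tokens, term_a, term_b)
    return 0.0 if d is None else 1.0 / d

docs = {
    "d1": "term proximity matters because query term pairs occur close together".split(),
    "d2": "the query appears early and the second term much later in the term list".split(),
}
for doc_id, toks in docs.items():
    print(doc_id, proximity_score(toks, "query", "term"))
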
  10. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.00
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
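    The abstract does not spell out the P100' formula, so the sketch below only shows the general ingredient, a percentile rank of a paper's citation count within its reference set; the numbers are invented and this is not the exact P100/P100' definition:

def percentile_ranks(citations):
    """Map each paper's citation count to a percentile rank in [0, 100] within its
    reference set: the share of other papers it outperforms. A generic rank
    normalization, not the article's P100/P100' definition."""
    counts = list(citations.values())
    n = len(counts)
    return {paper: 100.0 * sum(c < cites for c in counts) / (n - 1)
            for paper, cites in citations.items()}

pubs = {"paper A": 0, "paper B": 3, "paper C": 3, "paper D": 12, "paper E": 40}
print(percentile_ranks(pubs))
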
  11. Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.00
    
    Abstract
    In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.
  12. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.00
    
    Abstract
    Purpose: In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgment through human assessors is infeasible. Due to the large amount of documents that requires judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues.
    Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human efforts. These methods overcome problems with large amounts of documents for judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: number of occurrences of each document per topic from all the system runs; and document rankings to generate the alternate methods.
    Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments.
    Originality/value: Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgment without human assessors while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
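    A rough sketch of the occurrence-counting idea, pooling the top ranks of several system runs and treating documents returned by many systems as pseudo-relevant; the runs and thresholds are invented and this is not the exact method evaluated in the paper:

from collections import Counter

def pseudo_qrels(runs, pool_depth=100, min_votes=2):
    """Build pseudo relevance judgments from system runs without human assessors:
    a document counts as pseudo-relevant for a topic if at least `min_votes`
    systems return it within the top `pool_depth` ranks."""
    qrels = {}
    for topic, system_rankings in runs.items():
        votes = Counter()
        for ranking in system_rankings.values():
            votes.update(ranking[:pool_depth])
        qrels[topic] = {doc for doc, v in votes.items() if v >= min_votes}
    return qrels

runs = {  # topic -> {system: ranked list of document ids}
    "t1": {"sysA": ["d1", "d2", "d3"], "sysB": ["d2", "d1", "d9"], "sysC": ["d7", "d2", "d1"]},
}
print(pseudo_qrels(runs, pool_depth=3, min_votes=2))   # {'t1': {'d1', 'd2'}}
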
  13. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.00
    
    Abstract
    An inverted index stores, for each term that appears in a collection of documents, a list of document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random access storage, and traditional disc-based methods require large amounts of temporary file space. Describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way merge sort. The new technique has been used to invert a 2-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disc space, and less than 20 megabytes of main memory.
    Date
    27.11.1995 21:29:58
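    The in-situ multi-way merge itself is beyond a short sketch, but the core representation such an index relies on, postings stored as d-gaps under a simple integer code, can be shown compactly; variable-byte coding is used below for brevity and is not taken from the paper:

def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, high bit marks the final byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if not n:
                break
        chunk[0] |= 0x80                      # flag the least significant (final) byte
        out.extend(reversed(chunk))
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:
            numbers.append(n)
            n = 0
    return numbers

def compress_postings(doc_ids):
    # Assumes doc_ids are sorted; store differences between successive ids (d-gaps).
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        ids.append(total)
    return ids

postings = [3, 7, 8, 150, 152, 10000]
blob = compress_postings(postings)
print(len(blob), "bytes:", decompress_postings(blob) == postings)
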
  14. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.00
    
    Abstract
    Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents
    Date
    29. 9.2001 14:00:39
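    A minimal sketch of ranking with overlapping fixed-length passages versus whole documents; the length-normalised overlap score and the example documents are invented, but they show how passage ranking sidesteps the document-length problem mentioned in the abstract:

def overlapping_passages(tokens, length=8, stride=4):
    """Fixed-length, half-overlapping word windows, independent of document structure."""
    if len(tokens) <= length:
        return [tokens]
    starts = list(range(0, len(tokens) - length, stride)) + [len(tokens) - length]
    return [tokens[i:i + length] for i in starts]

def density(unit, query_terms):
    # Share of the unit's tokens that are query terms (a crude length-normalised score).
    return len(set(unit) & set(query_terms)) / len(unit)

def rank(documents, query_terms, by_passage):
    score = {d: (max(density(p, query_terms) for p in overlapping_passages(t))
                 if by_passage else density(t, query_terms))
             for d, t in documents.items()}
    return sorted(score.items(), key=lambda x: -x[1])

docs = {
    "long":  ("court transcript " * 30 + "effective ranking with arbitrary passages").split(),
    "short": "ranking systems for web documents".split(),
}
print("document ranking:", rank(docs, ["ranking", "arbitrary", "passages"], by_passage=False))
print("passage ranking: ", rank(docs, ["ranking", "arbitrary", "passages"], by_passage=True))
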
  15. Kekäläinen, J.: Binary and graded relevance in IR evaluations : comparison of the effects on ranking of IR systems (2005) 0.00
    
    Abstract
    In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. Relevance of a sample TREC results is reassessed using a relevance scale with four levels: non-relevant, marginally relevant, fairly relevant, highly relevant. Twenty-one topics and 90 systems from TREC 7 and 20 topics and 121 systems from TREC 8 form the data. Binary precision, and cumulated gain, discounted cumulated gain and normalised discounted cumulated gain are the measures compared. Different weighting schemes for relevance levels are tested with cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced by different measures are similar. Weighting schemes from binary to emphasising highly relevant documents form a continuum, where the measures correlate strongly in the binary end, and less in the heavily weighted end. The results show the different character of the measures.
    Date
    26.12.2007 20:29:18
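    The cumulated-gain measures compared in the study are easy to sketch; the toy judgments below (0-3 scale) are invented, but they show how binary and graded nDCG can order two systems differently:

import math

judgments = {"d1": 3, "d2": 2, "d3": 1, "d4": 1, "d5": 0, "d6": 0}   # graded, 0-3 scale

def dcg(gains):
    # Jarvelin & Kekalainen: gain at rank 1 taken as is, gain at rank i >= 2 divided by log2(i).
    return sum(g if i == 1 else g / math.log2(i) for i, g in enumerate(gains, 1))

def ndcg(run, qrels, binary=False):
    grade = (lambda d: int(qrels.get(d, 0) > 0)) if binary else (lambda d: qrels.get(d, 0))
    gains = [grade(d) for d in run]
    ideal = sorted((grade(d) for d in qrels), reverse=True)[:len(run)]
    return dcg(gains) / dcg(ideal)

run_a = ["d1", "d2", "d5", "d6"]   # finds the two most highly relevant documents first
run_b = ["d3", "d4", "d1", "d6"]   # finds three relevant documents, but the weak ones first

for name, run in [("A", run_a), ("B", run_b)]:
    print(name, "graded", round(ndcg(run, judgments), 3),
          "binary", round(ndcg(run, judgments, binary=True), 3))
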
  16. Witschel, H.F.: Global term weights in distributed environments (2008) 0.00
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
    Date
    1. 8.2008 9:44:22
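    A small sketch of the estimation problem the abstract describes: IDF computed from a 5% sample of a synthetic collection versus the full collection; the vocabulary and term frequencies are invented:

import math, random

def idf(docs):
    """IDF per term, log(N / df), where `docs` is a list of token sets."""
    n = len(docs)
    df = {}
    for d in docs:
        for t in d:
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / f) for t, f in df.items()}

random.seed(1)
vocab = ["retrieval", "index", "weight", "the", "of", "term", "corpus", "rare"]
collection = [set(random.choices(vocab, weights=[5, 4, 3, 20, 18, 6, 2, 1], k=6))
              for _ in range(1000)]
sample = random.sample(collection, 50)          # small sample of the collection

full, approx = idf(collection), idf(sample)
for t in vocab:
    # Terms missing from the sample get an infinite estimate, illustrating the need
    # for the "domain-specific stop words" adjustment mentioned in the abstract.
    print(f"{t:10s} full={full.get(t, float('inf')):5.2f} "
          f"sample={approx.get(t, float('inf')):5.2f}")
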
  17. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.00
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection over information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
    Series
    Lecture notes in computer science; vol.3232
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  18. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.00
    
    Abstract
    Relevance Feedback consists in automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical frame on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probabilities propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested using a preliminary experimentation with different standard document collections.
    Date
    22. 3.2003 19:30:19
    Footnote
    Contribution to a thematic issue: Mathematical, logical, and formal methods in information retrieval
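    The Bayesian-network propagation itself is too large for a short sketch; as a generic illustration of the feedback step only, the classic Robertson/Sparck Jones relevance weighting is shown instead (explicitly not the article's method), reweighting query terms from the user's judgments:

import math

def rsj_weights(judged_relevant, judged_all, query_terms):
    """Robertson/Sparck Jones relevance weights from user feedback: terms occurring
    more often in the judged-relevant documents than in the judged set overall
    receive higher weights."""
    R, N = len(judged_relevant), len(judged_all)
    weights = {}
    for t in query_terms:
        r = sum(t in d for d in judged_relevant)   # relevant documents containing t
        n = sum(t in d for d in judged_all)        # judged documents containing t
        weights[t] = math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                              ((n - r + 0.5) * (R - r + 0.5)))
    return weights

judged_all = [{"bayesian", "network", "retrieval"}, {"retrieval", "feedback"},
              {"cooking", "recipes"}, {"network", "cooking"}]
judged_relevant = judged_all[:2]                   # documents the user marked relevant
print(rsj_weights(judged_relevant, judged_all, ["retrieval", "network", "cooking"]))
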
  19. Calegari, S.; Sanchez, E.: Object-fuzzy concept network : an enrichment of ontologies in semantic information retrieval (2008) 0.00
    
    Abstract
    This article shows how a fuzzy ontology-based approach can improve semantic document retrieval. After formally defining a fuzzy ontology and a fuzzy knowledge base, a special type of new fuzzy relationship called (semantic) correlation, which links the concepts or entities in a fuzzy ontology, is discussed. These correlations, first assigned by experts, are updated after querying or when a document has been inserted into a database. Moreover, in order to define a dynamic knowledge of a domain adapting itself to the context, it is shown how to handle a tradeoff between the correct definition of an object, taken in the ontology structure, and the actual meaning assigned by individuals. The notion of a fuzzy concept network is extended, incorporating database objects so that entities and documents can similarly be represented in the network. An information retrieval (IR) algorithm using an object-fuzzy concept network (O-FCN) is introduced and described. This algorithm allows us to derive a unique path among the entities involved in the query to obtain maximal semantic associations in the knowledge domain. Finally, the study has been validated by querying a database using fuzzy recall, fuzzy precision, and coefficient variant measures in the crisp and fuzzy cases.
    Date
    9.11.2008 13:07:29
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
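    One ingredient of such a retrieval algorithm, following the strongest chain of fuzzy correlations from a query concept to a document, can be sketched as a max-min (widest-path) search; the network, its weights, and the identifiers below are invented for illustration:

import heapq

def strongest_association(graph, source, target):
    """Max-min path strength in a fuzzy relation graph: a path is as strong as its
    weakest link, and we want the path whose weakest link is strongest
    (a widest-path search, solved here with a Dijkstra-style priority queue)."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg, node = heapq.heappop(heap)
        strength = -neg
        if node == target:
            return strength
        for nxt, w in graph.get(node, {}).items():
            s = min(strength, w)
            if s > best.get(nxt, 0.0):
                best[nxt] = s
                heapq.heappush(heap, (-s, nxt))
    return 0.0

# Fuzzy correlations between concepts and documents (all values are invented).
network = {
    "wine":     {"vineyard": 0.8, "grape": 0.9},
    "grape":    {"doc_12": 0.6, "vineyard": 0.7},
    "vineyard": {"doc_7": 0.9},
}
print(strongest_association(network, "wine", "doc_7"))   # 0.8 via wine -> vineyard -> doc_7
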
  20. Thompson, P.: Looking back: on relevance, probabilistic indexing and information retrieval (2008) 0.00
    
    Abstract
    Forty-eight years ago Maron and Kuhns published their paper, "On Relevance, Probabilistic Indexing and Information Retrieval" (1960). This was the first paper to present a probabilistic approach to information retrieval, and perhaps the first paper on ranked retrieval. Although it is one of the most widely cited papers in the field of information retrieval, many researchers today may not be familiar with its influence. This paper describes the Maron and Kuhns article and the influence that it has had on the field of information retrieval.
    Date
    31. 7.2008 19:58:29