Document (#34464)

Author
Dumais, S.T.
Title
Latent semantic analysis
Source
Annual review of information science and technology. 38(2004), pp. 189-230
Year
2003
Abstract
Latent Semantic Analysis (LSA) was first introduced in Dumais, Furnas, Landauer, and Deerwester (1988) and Deerwester, Dumais, Furnas, Landauer, and Harshman (1990) as a technique for improving information retrieval. The key insight in LSA was to reduce the dimensionality of the information retrieval problem. Most approaches to retrieving information depend on a lexical match between words in the user's query and those in documents. Indeed, this lexical matching is the way that the popular Web and enterprise search engines work. Such systems are, however, far from ideal. We are all aware of the tremendous amount of irrelevant information that is retrieved when searching. We also fail to find much of the existing relevant material. LSA was designed to address these retrieval problems, using dimension reduction techniques. Fundamental characteristics of human word usage underlie these retrieval failures. People use a wide variety of words to describe the same object or concept (synonymy). Furnas, Landauer, Gomez, and Dumais (1987) showed that people generate the same keyword to describe well-known objects only 20 percent of the time. Poor agreement was also observed in studies of inter-indexer consistency (e.g., Chan, 1989; Tarr & Borko, 1974), in the generation of search terms (e.g., Fidel, 1985; Bates, 1986), and in the generation of hypertext links (Furner, Ellis, & Willett, 1999). Because searchers and authors often use different words, relevant materials are missed. Someone looking for documents on "human-computer interaction" will not find articles that use only the phrase "man-machine studies" or "human factors." People also use the same word to refer to different things (polysemy). Words like "saturn," "jaguar," or "chip" have several different meanings. A short query like "saturn" will thus return many irrelevant documents. The query "Saturn car" will return fewer irrelevant items, but it will miss some documents that use only the terms "Saturn automobile."
In searching, there is a constant tension between being overly specific and missing relevant information, and being more general and returning irrelevant information.
A number of approaches have been developed in information retrieval to address the problems caused by the variability in word usage. Stemming is a popular technique used to normalize some kinds of surface-level variability by converting words to their morphological root. For example, the words "retrieve," "retrieval," "retrieved," and "retrieving" would all be converted to their root form, "retrieve." The root form is used for both document and query processing. Stemming sometimes helps retrieval, although not much (Harman, 1991; Hull, 1996). And it does not address cases where related words are not morphologically related (e.g., physician and doctor). Controlled vocabularies have also been used to limit variability by requiring that query and index terms belong to a pre-defined set of terms. Documents are indexed by a specified or authorized list of subject headings or index terms, called the controlled vocabulary. Library of Congress Subject Headings, Medical Subject Headings, Association for Computing Machinery (ACM) keywords, and Yellow Pages headings are examples of controlled vocabularies. If searchers can find the right controlled vocabulary terms, they do not have to think of all the morphologically related or synonymous terms that authors might have used. However, assigning controlled vocabulary terms in a consistent and thorough manner is a time-consuming and usually manual process. A good deal of research has been published about the effectiveness of controlled vocabulary indexing compared to full text indexing (e.g., Bates, 1998; Lancaster, 1986; Svenonius, 1986). The combination of both full text and controlled vocabularies is often better than either alone, although the size of the advantage is variable (Lancaster, 1986; Markey, Atherton, & Newton, 1982; Srinivasan, 1996). Richer thesauri have also been used to provide synonyms, generalizations, and specializations of users' search terms (see Srinivasan, 1992, for a review).
Controlled vocabularies and thesaurus entries can be generated either manually or by the automatic analysis of large collections of texts.
With the advent of large-scale collections of full text, statistical approaches are being used more and more to analyze the relationships among terms and documents. LSA takes this approach. LSA induces knowledge about the meanings of documents and words by analyzing large collections of texts. The approach simultaneously models the relationships among documents based on their constituent words, and the relationships between words based on their occurrence in documents. By using fewer dimensions for representation than there are unique words, LSA induces similarities among terms that are useful in solving the information retrieval problems described earlier. LSA is a fully automatic statistical approach to extracting relations among words by means of their contexts of use in documents, passages, or sentences. It makes no use of natural language processing techniques for analyzing morphological, syntactic, or semantic relations. Nor does it use humanly constructed resources like dictionaries, thesauri, lexical reference systems (e.g., WordNet), semantic networks, or other knowledge representations. Its only input is large amounts of texts. LSA is an unsupervised learning technique. It starts with a large collection of texts, builds a term-document matrix, and tries to uncover some similarity structures that are useful for information retrieval and related text-analysis problems. Several recent ARIST chapters have focused on text mining and discovery (Benoit, 2002; Solomon, 2002; Trybula, 2000). These chapters provide complementary coverage of the field of text analysis.
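The idea in the abstract (build a term-document matrix, then represent terms in fewer dimensions than there are unique words) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, an invented four-document corpus, raw term counts with no term weighting, and k = 2 dimensions obtained from a truncated singular value decomposition. All variable names are hypothetical.

```python
import numpy as np

# Invented toy corpus: "doctor" and "physician" never share a document,
# but both co-occur with "hospital".
docs = [
    "doctor hospital",
    "physician hospital",
    "car engine",
    "automobile engine",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document matrix of raw counts (rows = terms, columns = documents).
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1.0

# Singular value decomposition; keep only the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed for this toy example; real collections use far more
term_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per term


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


# In the raw matrix the two synonyms share no document, so the cosine is 0.
raw = cosine(A[index["doctor"]], A[index["physician"]])
# In the reduced space their vectors coincide (cosine approximately 1.0).
lsa = cosine(term_vecs[index["doctor"]], term_vecs[index["physician"]])
# Terms from the unrelated topic stay orthogonal (cosine approximately 0.0).
other = cosine(term_vecs[index["doctor"]], term_vecs[index["car"]])
```

In this toy corpus the reduced representation places "doctor" and "physician" together because they share the context word "hospital", which is the kind of induced similarity the abstract describes. The clean result depends on the artificial block structure of the example; real collections give graded similarities.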
Theme
Literature review
Object
Latent Semantic Indexing

Similar documents (author)

  1. Gordon, M.D.; Dumais, S.: Using latent semantic indexing for literature based discovery (1998) 4.67
    4.670967 = sum of:
      4.670967 = weight(author_txt:dumais in 6306) [ClassicSimilarity], result of:
        4.670967 = fieldWeight in 6306, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.341934 = idf(docFreq=9, maxDocs=41962)
          0.5 = fieldNorm(doc=6306)
    
  2. Dumais, S.; Chen, H.: Hierarchical classification of Web content (2000) 4.67
    4.670967 = sum of:
      4.670967 = weight(author_txt:dumais in 1493) [ClassicSimilarity], result of:
        4.670967 = fieldWeight in 1493, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.341934 = idf(docFreq=9, maxDocs=41962)
          0.5 = fieldNorm(doc=1493)
    
  3. Dumais, S.T.; Belkin, N.J.: The TREC interactive tracks : putting the user into search (2005) 4.67
    4.670967 = sum of:
      4.670967 = weight(author_txt:dumais in 82) [ClassicSimilarity], result of:
        4.670967 = fieldWeight in 82, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.341934 = idf(docFreq=9, maxDocs=41962)
          0.5 = fieldNorm(doc=82)
    
  4. Teevan, J.; Dumais, S.: Web retrieval, ranking and personalization (2011) 4.67
    4.670967 = sum of:
      4.670967 = weight(author_txt:dumais in 2552) [ClassicSimilarity], result of:
        4.670967 = fieldWeight in 2552, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.341934 = idf(docFreq=9, maxDocs=41962)
          0.5 = fieldNorm(doc=2552)
    
  5. Berry, M.W.; Dumais, S.T.; O'Brien, G.W.: Using linear algebra for intelligent information retrieval (1995) 3.50
    3.5032253 = sum of:
      3.5032253 = weight(author_txt:dumais in 4207) [ClassicSimilarity], result of:
        3.5032253 = fieldWeight in 4207, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.341934 = idf(docFreq=9, maxDocs=41962)
          0.375 = fieldNorm(doc=4207)
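
The author-match scores above are each a product of three factors: tf, idf, and fieldNorm. A minimal sketch of that arithmetic follows. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the figures shown in this listing; fieldNorm is copied from the listing rather than computed. The function names are hypothetical, and the search engine works in 32-bit floats, so the last digits may differ slightly.

```python
import math


def tf(freq):
    # Term-frequency factor: square root of the raw frequency (assumed).
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Inverse document frequency, in the form that matches the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain output.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Entries 1-4: freq=1, docFreq=9, maxDocs=41962, fieldNorm=0.5
score_short_field = field_weight(1.0, 9, 41962, 0.5)
# Entry 5: same term statistics, but fieldNorm=0.375
score_long_field = field_weight(1.0, 9, 41962, 0.375)
```

The only factor that differs between the first four entries and the fifth is fieldNorm, which is why the fifth record scores lower for the same author match.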
    

Similar documents (content)

  1. Krovetz, R.; Croft, W.B.: Lexical ambiguity and information retrieval (1992) 0.33
    0.3335255 = sum of:
      0.3335255 = product of:
        0.8338137 = sum of:
          0.031300455 = weight(abstract_txt:analysis in 4028) [ClassicSimilarity], result of:
            0.031300455 = score(doc=4028,freq=1.0), product of:
              0.08999217 = queryWeight, product of:
                1.0283569 = boost
                3.7100062 = idf(docFreq=2791, maxDocs=41962)
                0.023587734 = queryNorm
              0.34781307 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7100062 = idf(docFreq=2791, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.14620757 = weight(abstract_txt:lexical in 4028) [ClassicSimilarity], result of:
            0.14620757 = score(doc=4028,freq=2.0), product of:
              0.16834153 = queryWeight, product of:
                1.0894637 = boost
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.023587734 = queryNorm
              0.8685175 = fieldWeight in 4028, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.021901108 = weight(abstract_txt:that in 4028) [ClassicSimilarity], result of:
            0.021901108 = score(doc=4028,freq=2.0), product of:
              0.068479806 = queryWeight, product of:
                1.2035358 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.023587734 = queryNorm
              0.3198185 = fieldWeight in 4028, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.022655148 = weight(abstract_txt:information in 4028) [ClassicSimilarity], result of:
            0.022655148 = score(doc=4028,freq=2.0), product of:
              0.07004273 = queryWeight, product of:
                1.2171925 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.023587734 = queryNorm
              0.32344753 = fieldWeight in 4028, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.041657988 = weight(abstract_txt:have in 4028) [ClassicSimilarity], result of:
            0.041657988 = score(doc=4028,freq=2.0), product of:
              0.09667982 = queryWeight, product of:
                1.2611694 = boost
                3.2499464 = idf(docFreq=4422, maxDocs=41962)
                0.023587734 = queryNorm
              0.43088606 = fieldWeight in 4028, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2499464 = idf(docFreq=4422, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.06526769 = weight(abstract_txt:query in 4028) [ClassicSimilarity], result of:
            0.06526769 = score(doc=4028,freq=1.0), product of:
              0.14688241 = queryWeight, product of:
                1.3137914 = boost
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.023587734 = queryNorm
              0.44435334 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.04906843 = weight(abstract_txt:text in 4028) [ClassicSimilarity], result of:
            0.04906843 = score(doc=4028,freq=1.0), product of:
              0.12905242 = queryWeight, product of:
                1.3490101 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.023587734 = queryNorm
              0.38022092 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.079253025 = weight(abstract_txt:retrieval in 4028) [ClassicSimilarity], result of:
            0.079253025 = score(doc=4028,freq=3.0), product of:
              0.14100416 = queryWeight, product of:
                1.7270055 = boost
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.023587734 = queryNorm
              0.5620616 = fieldWeight in 4028, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.14848953 = weight(abstract_txt:documents in 4028) [ClassicSimilarity], result of:
            0.14848953 = score(doc=4028,freq=3.0), product of:
              0.22195815 = queryWeight, product of:
                2.2839787 = boost
                4.1199584 = idf(docFreq=1852, maxDocs=41962)
                0.023587734 = queryNorm
              0.6689979 = fieldWeight in 4028, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1199584 = idf(docFreq=1852, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
          0.2280128 = weight(abstract_txt:words in 4028) [ClassicSimilarity], result of:
            0.2280128 = score(doc=4028,freq=1.0), product of:
              0.45277336 = queryWeight, product of:
                3.5734487 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.023587734 = queryNorm
              0.5035915 = fieldWeight in 4028, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.09375 = fieldNorm(doc=4028)
        0.4 = coord(10/25)
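
The content-match score for the entry above combines several terms. A minimal sketch, using only the factors printed in the listing: each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm. The contributions are summed and multiplied by coord, the fraction of query terms matched. The function and variable names are hypothetical, and small differences in the last digits are expected because the engine uses 32-bit floats.

```python
import math

QUERY_NORM = 0.023587734   # queryNorm from the listing
FIELD_NORM = 0.09375       # fieldNorm(doc=4028) from the listing

# (term, boost, idf, freq) copied from the explain output for doc 4028.
TERMS = [
    ("analysis",    1.0283569, 3.7100062, 1.0),
    ("lexical",     1.0894637, 6.5507693, 2.0),
    ("that",        1.2035358, 2.4122221, 2.0),
    ("information", 1.2171925, 2.439594,  2.0),
    ("have",        1.2611694, 3.2499464, 2.0),
    ("query",       1.3137914, 4.739769,  1.0),
    ("text",        1.3490101, 4.05569,   1.0),
    ("retrieval",   1.7270055, 3.4614017, 3.0),
    ("documents",   2.2839787, 4.1199584, 3.0),
    ("words",       3.5734487, 5.3716426, 1.0),
]


def term_score(boost, idf, freq):
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


per_term = {t: term_score(b, i, f) for t, b, i, f in TERMS}
coord = len(TERMS) / 25          # 10 of the 25 query terms matched
total = sum(per_term.values()) * coord
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "lexical" contribute far more than frequent ones such as "that" or "information", even when the frequent terms occur as often in the abstract.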
    
  2. Vechtomova, O.; Karamuftuoglum, M.; Robertson, S.E.: On document relevance and lexical cohesion between query terms (2006) 0.33
    0.33032456 = sum of:
      0.33032456 = product of:
        0.91756815 = sum of:
          0.05299963 = weight(abstract_txt:semantic in 2988) [ClassicSimilarity], result of:
            0.05299963 = score(doc=2988,freq=2.0), product of:
              0.106371924 = queryWeight, product of:
                4.509629 = idf(docFreq=1254, maxDocs=41962)
                0.023587734 = queryNorm
              0.4982483 = fieldWeight in 2988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.509629 = idf(docFreq=1254, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.24367929 = weight(abstract_txt:lexical in 2988) [ClassicSimilarity], result of:
            0.24367929 = score(doc=2988,freq=8.0), product of:
              0.16834153 = queryWeight, product of:
                1.0894637 = boost
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.023587734 = queryNorm
              1.4475292 = fieldWeight in 2988, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.5507693 = idf(docFreq=162, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.018250924 = weight(abstract_txt:that in 2988) [ClassicSimilarity], result of:
            0.018250924 = score(doc=2988,freq=2.0), product of:
              0.068479806 = queryWeight, product of:
                1.2035358 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.023587734 = queryNorm
              0.2665154 = fieldWeight in 2988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.013349674 = weight(abstract_txt:information in 2988) [ClassicSimilarity], result of:
            0.013349674 = score(doc=2988,freq=1.0), product of:
              0.07004273 = queryWeight, product of:
                1.2171925 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.023587734 = queryNorm
              0.19059329 = fieldWeight in 2988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.09420579 = weight(abstract_txt:query in 2988) [ClassicSimilarity], result of:
            0.09420579 = score(doc=2988,freq=3.0), product of:
              0.14688241 = queryWeight, product of:
                1.3137914 = boost
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.023587734 = queryNorm
              0.64136875 = fieldWeight in 2988, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.08178072 = weight(abstract_txt:text in 2988) [ClassicSimilarity], result of:
            0.08178072 = score(doc=2988,freq=4.0), product of:
              0.12905242 = queryWeight, product of:
                1.3490101 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.023587734 = queryNorm
              0.63370156 = fieldWeight in 2988, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.03813063 = weight(abstract_txt:retrieval in 2988) [ClassicSimilarity], result of:
            0.03813063 = score(doc=2988,freq=1.0), product of:
              0.14100416 = queryWeight, product of:
                1.7270055 = boost
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.023587734 = queryNorm
              0.270422 = fieldWeight in 2988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.10645581 = weight(abstract_txt:terms in 2988) [ClassicSimilarity], result of:
            0.10645581 = score(doc=2988,freq=2.0), product of:
              0.23724784 = queryWeight, product of:
                2.4765892 = boost
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.023587734 = queryNorm
              0.4487114 = fieldWeight in 2988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
          0.26871568 = weight(abstract_txt:words in 2988) [ClassicSimilarity], result of:
            0.26871568 = score(doc=2988,freq=2.0), product of:
              0.45277336 = queryWeight, product of:
                3.5734487 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.023587734 = queryNorm
              0.5934883 = fieldWeight in 2988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.078125 = fieldNorm(doc=2988)
        0.36 = coord(9/25)
    
  3. Aldana, J.F.; Gómez, A.C.; Moreno, N.; Nebro, A.J.; Roldán, M.M.: Metadata functionality for semantic Web integration (2003) 0.33
    0.3254977 = sum of:
      0.3254977 = product of:
        0.62595713 = sum of:
          0.050279867 = weight(abstract_txt:semantic in 3732) [ClassicSimilarity], result of:
            0.050279867 = score(doc=3732,freq=5.0), product of:
              0.106371924 = queryWeight, product of:
                4.509629 = idf(docFreq=1254, maxDocs=41962)
                0.023587734 = queryNorm
              0.47267985 = fieldWeight in 3732, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.509629 = idf(docFreq=1254, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.032919962 = weight(abstract_txt:among in 3732) [ClassicSimilarity], result of:
            0.032919962 = score(doc=3732,freq=2.0), product of:
              0.10885553 = queryWeight, product of:
                1.0116068 = boost
                4.561971 = idf(docFreq=1190, maxDocs=41962)
                0.023587734 = queryNorm
              0.30241883 = fieldWeight in 3732, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.561971 = idf(docFreq=1190, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.014324305 = weight(abstract_txt:used in 3732) [ClassicSimilarity], result of:
            0.014324305 = score(doc=3732,freq=1.0), product of:
              0.090150274 = queryWeight, product of:
                1.1274977 = boost
                3.3897307 = idf(docFreq=3845, maxDocs=41962)
                0.023587734 = queryNorm
              0.15889363 = fieldWeight in 3732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3897307 = idf(docFreq=3845, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.03784278 = weight(abstract_txt:vocabulary in 3732) [ClassicSimilarity], result of:
            0.03784278 = score(doc=3732,freq=1.0), product of:
              0.15050225 = queryWeight, product of:
                1.1894823 = boost
                5.364124 = idf(docFreq=533, maxDocs=41962)
                0.023587734 = queryNorm
              0.2514433 = fieldWeight in 3732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.364124 = idf(docFreq=533, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.017314347 = weight(abstract_txt:that in 3732) [ClassicSimilarity], result of:
            0.017314347 = score(doc=3732,freq=5.0), product of:
              0.068479806 = queryWeight, product of:
                1.2035358 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.023587734 = queryNorm
              0.25283873 = fieldWeight in 3732, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.017910467 = weight(abstract_txt:information in 3732) [ClassicSimilarity], result of:
            0.017910467 = score(doc=3732,freq=5.0), product of:
              0.07004273 = queryWeight, product of:
                1.2171925 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.023587734 = queryNorm
              0.25570774 = fieldWeight in 3732, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.020828994 = weight(abstract_txt:have in 3732) [ClassicSimilarity], result of:
            0.020828994 = score(doc=3732,freq=2.0), product of:
              0.09667982 = queryWeight, product of:
                1.2611694 = boost
                3.2499464 = idf(docFreq=4422, maxDocs=41962)
                0.023587734 = queryNorm
              0.21544303 = fieldWeight in 3732, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2499464 = idf(docFreq=4422, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.032633845 = weight(abstract_txt:query in 3732) [ClassicSimilarity], result of:
            0.032633845 = score(doc=3732,freq=1.0), product of:
              0.14688241 = queryWeight, product of:
                1.3137914 = boost
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.023587734 = queryNorm
              0.22217667 = fieldWeight in 3732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.1015795 = weight(abstract_txt:irrelevant in 3732) [ClassicSimilarity], result of:
            0.1015795 = score(doc=3732,freq=1.0), product of:
              0.2906866 = queryWeight, product of:
                1.6530995 = boost
                7.454865 = idf(docFreq=65, maxDocs=41962)
                0.023587734 = queryNorm
              0.3494468 = fieldWeight in 3732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.454865 = idf(docFreq=65, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.03235491 = weight(abstract_txt:retrieval in 3732) [ClassicSimilarity], result of:
            0.03235491 = score(doc=3732,freq=2.0), product of:
              0.14100416 = queryWeight, product of:
                1.7270055 = boost
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.023587734 = queryNorm
              0.22946069 = fieldWeight in 3732, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.04286523 = weight(abstract_txt:documents in 3732) [ClassicSimilarity], result of:
            0.04286523 = score(doc=3732,freq=1.0), product of:
              0.22195815 = queryWeight, product of:
                2.2839787 = boost
                4.1199584 = idf(docFreq=1852, maxDocs=41962)
                0.023587734 = queryNorm
              0.19312304 = fieldWeight in 3732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1199584 = idf(docFreq=1852, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.063873485 = weight(abstract_txt:terms in 3732) [ClassicSimilarity], result of:
            0.063873485 = score(doc=3732,freq=2.0), product of:
              0.23724784 = queryWeight, product of:
                2.4765892 = boost
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.023587734 = queryNorm
              0.26922685 = fieldWeight in 3732, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
          0.1612294 = weight(abstract_txt:words in 3732) [ClassicSimilarity], result of:
            0.1612294 = score(doc=3732,freq=2.0), product of:
              0.45277336 = queryWeight, product of:
                3.5734487 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.023587734 = queryNorm
              0.35609296 = fieldWeight in 3732, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.046875 = fieldNorm(doc=3732)
        0.52 = coord(13/25)
    
  4. Jacquemin, C.: Spotting and discovering terms through natural language processing (2001) 0.32
    0.3240781 = sum of:
      0.3240781 = product of:
        0.73654115 = sum of:
          0.029981118 = weight(abstract_txt:semantic in 2120) [ClassicSimilarity], result of:
            0.029981118 = score(doc=2120,freq=1.0), product of:
              0.106371924 = queryWeight, product of:
                4.509629 = idf(docFreq=1254, maxDocs=41962)
                0.023587734 = queryNorm
              0.2818518 = fieldWeight in 2120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.509629 = idf(docFreq=1254, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.027010167 = weight(abstract_txt:used in 2120) [ClassicSimilarity], result of:
            0.027010167 = score(doc=2120,freq=2.0), product of:
              0.090150274 = queryWeight, product of:
                1.1274977 = boost
                3.3897307 = idf(docFreq=3845, maxDocs=41962)
                0.023587734 = queryNorm
              0.2996127 = fieldWeight in 2120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3897307 = idf(docFreq=3845, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.014600739 = weight(abstract_txt:that in 2120) [ClassicSimilarity], result of:
            0.014600739 = score(doc=2120,freq=2.0), product of:
              0.068479806 = queryWeight, product of:
                1.2035358 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.023587734 = queryNorm
              0.21321233 = fieldWeight in 2120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.018497849 = weight(abstract_txt:information in 2120) [ClassicSimilarity], result of:
            0.018497849 = score(doc=2120,freq=3.0), product of:
              0.07004273 = queryWeight, product of:
                1.2171925 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.023587734 = queryNorm
              0.2640938 = fieldWeight in 2120, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.060512356 = weight(abstract_txt:texts in 2120) [ClassicSimilarity], result of:
            0.060512356 = score(doc=2120,freq=1.0), product of:
              0.1698861 = queryWeight, product of:
                1.2637624 = boost
                5.699099 = idf(docFreq=381, maxDocs=41962)
                0.023587734 = queryNorm
              0.3561937 = fieldWeight in 2120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.699099 = idf(docFreq=381, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.043511793 = weight(abstract_txt:query in 2120) [ClassicSimilarity], result of:
            0.043511793 = score(doc=2120,freq=1.0), product of:
              0.14688241 = queryWeight, product of:
                1.3137914 = boost
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.023587734 = queryNorm
              0.29623556 = fieldWeight in 2120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.739769 = idf(docFreq=996, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.046262156 = weight(abstract_txt:text in 2120) [ClassicSimilarity], result of:
            0.046262156 = score(doc=2120,freq=2.0), product of:
              0.12905242 = queryWeight, product of:
                1.3490101 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.023587734 = queryNorm
              0.35847571 = fieldWeight in 2120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.05283535 = weight(abstract_txt:retrieval in 2120) [ClassicSimilarity], result of:
            0.05283535 = score(doc=2120,freq=3.0), product of:
              0.14100416 = queryWeight, product of:
                1.7270055 = boost
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.023587734 = queryNorm
              0.37470773 = fieldWeight in 2120, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.10791608 = weight(abstract_txt:controlled in 2120) [ClassicSimilarity], result of:
            0.10791608 = score(doc=2120,freq=1.0), product of:
              0.31477186 = queryWeight, product of:
                2.432761 = boost
                5.4854245 = idf(docFreq=472, maxDocs=41962)
                0.023587734 = queryNorm
              0.34283903 = fieldWeight in 2120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4854245 = idf(docFreq=472, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.120441005 = weight(abstract_txt:terms in 2120) [ClassicSimilarity], result of:
            0.120441005 = score(doc=2120,freq=4.0), product of:
              0.23724784 = queryWeight, product of:
                2.4765892 = boost
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.023587734 = queryNorm
              0.507659 = fieldWeight in 2120, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
          0.21497254 = weight(abstract_txt:words in 2120) [ClassicSimilarity], result of:
            0.21497254 = score(doc=2120,freq=2.0), product of:
              0.45277336 = queryWeight, product of:
                3.5734487 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.023587734 = queryNorm
              0.4747906 = fieldWeight in 2120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.0625 = fieldNorm(doc=2120)
        0.44 = coord(11/25)
    
  5. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.31
    0.31079897 = sum of:
      0.31079897 = product of:
        0.77699745 = sum of:
          0.014600739 = weight(abstract_txt:that in 2605) [ClassicSimilarity], result of:
            0.014600739 = score(doc=2605,freq=2.0), product of:
              0.068479806 = queryWeight, product of:
                1.2035358 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.023587734 = queryNorm
              0.21321233 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.010679739 = weight(abstract_txt:information in 2605) [ClassicSimilarity], result of:
            0.010679739 = score(doc=2605,freq=1.0), product of:
              0.07004273 = queryWeight, product of:
                1.2171925 = boost
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.023587734 = queryNorm
              0.15247463 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.439594 = idf(docFreq=9945, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.051927965 = weight(abstract_txt:large in 2605) [ClassicSimilarity], result of:
            0.051927965 = score(doc=2605,freq=2.0), product of:
              0.13116643 = queryWeight, product of:
                1.2415175 = boost
                4.4790263 = idf(docFreq=1293, maxDocs=41962)
                0.023587734 = queryNorm
              0.39589372 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4790263 = idf(docFreq=1293, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.034013607 = weight(abstract_txt:have in 2605) [ClassicSimilarity], result of:
            0.034013607 = score(doc=2605,freq=3.0), product of:
              0.09667982 = queryWeight, product of:
                1.2611694 = boost
                3.2499464 = idf(docFreq=4422, maxDocs=41962)
                0.023587734 = queryNorm
              0.351817 = fieldWeight in 2605, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.2499464 = idf(docFreq=4422, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.08557739 = weight(abstract_txt:texts in 2605) [ClassicSimilarity], result of:
            0.08557739 = score(doc=2605,freq=2.0), product of:
              0.1698861 = queryWeight, product of:
                1.2637624 = boost
                5.699099 = idf(docFreq=381, maxDocs=41962)
                0.023587734 = queryNorm
              0.50373393 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.699099 = idf(docFreq=381, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.032712284 = weight(abstract_txt:text in 2605) [ClassicSimilarity], result of:
            0.032712284 = score(doc=2605,freq=1.0), product of:
              0.12905242 = queryWeight, product of:
                1.3490101 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.023587734 = queryNorm
              0.2534806 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.030504502 = weight(abstract_txt:retrieval in 2605) [ClassicSimilarity], result of:
            0.030504502 = score(doc=2605,freq=1.0), product of:
              0.14100416 = queryWeight, product of:
                1.7270055 = boost
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.023587734 = queryNorm
              0.2163376 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4614017 = idf(docFreq=3579, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.12779944 = weight(abstract_txt:documents in 2605) [ClassicSimilarity], result of:
            0.12779944 = score(doc=2605,freq=5.0), product of:
              0.22195815 = queryWeight, product of:
                2.2839787 = boost
                4.1199584 = idf(docFreq=1852, maxDocs=41962)
                0.023587734 = queryNorm
              0.5757817 = fieldWeight in 2605, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1199584 = idf(docFreq=1852, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.08516465 = weight(abstract_txt:terms in 2605) [ClassicSimilarity], result of:
            0.08516465 = score(doc=2605,freq=2.0), product of:
              0.23724784 = queryWeight, product of:
                2.4765892 = boost
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.023587734 = queryNorm
              0.35896912 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.061272 = idf(docFreq=1964, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
          0.3040171 = weight(abstract_txt:words in 2605) [ClassicSimilarity], result of:
            0.3040171 = score(doc=2605,freq=4.0), product of:
              0.45277336 = queryWeight, product of:
                3.5734487 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.023587734 = queryNorm
              0.6714553 = fieldWeight in 2605, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.0625 = fieldNorm(doc=2605)
        0.4 = coord(10/25)
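
The arithmetic behind each entry in these score trees can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF definitions that match the values listed above (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm). The function and variable names are hypothetical; the numeric inputs are copied from the listing for doc 2605.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 41962
QUERY_NORM = 0.023587734
FIELD_NORM = 0.0625


def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed form that reproduces the listed values)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost):
    """Score of one term = queryWeight * fieldWeight, as in the tree."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = tf(freq) * i * FIELD_NORM
    return query_weight * field_weight


# "words" in doc 2605: freq=4.0, docFreq=529, boost=3.5734487
words_score = term_score(4.0, 529, 3.5734487)

# The ten per-term weights listed for doc 2605, summed and then scaled by
# coord(10/25), i.e. the fraction of query terms that matched.
term_weights = [
    0.014600739, 0.010679739, 0.051927965, 0.034013607, 0.08557739,
    0.032712284, 0.030504502, 0.12779944, 0.08516465, 0.3040171,
]
doc_score = sum(term_weights) * (10 / 25)

print(f"words term score: {words_score:.7f}")
print(f"doc 2605 score:   {doc_score:.7f}")
```

The results should agree with the listed 0.3040171 and 0.31079897 up to floating-point rounding, since the listing is printed in single precision while this sketch uses double precision.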