Search (3 results, page 1 of 1)

  • author_ss:"Dumais, S.T."
  1. Dumais, S.T.; Belkin, N.J.: The TREC interactive tracks: putting the user into search (2005) 0.02
    
    Source
    TREC: experiment and evaluation in information retrieval. Ed.: E.M. Voorhees and D.K. Harman
  2. Berry, M.W.; Dumais, S.T.; O'Brien, G.W.: Using linear algebra for intelligent information retrieval (1995) 0.01
    
    Theme
    Semantic environment in indexing and retrieval
  3. Dumais, S.T.: Latent semantic analysis (2003) 0.01
    
    Abstract
    Latent Semantic Analysis (LSA) was first introduced in Dumais, Furnas, Landauer, and Deerwester (1988) and Deerwester, Dumais, Furnas, Landauer, and Harshman (1990) as a technique for improving information retrieval. The key insight in LSA was to reduce the dimensionality of the information retrieval problem. Most approaches to retrieving information depend on a lexical match between words in the user's query and those in documents. Indeed, this lexical matching is the way that the popular Web and enterprise search engines work. Such systems are, however, far from ideal. We are all aware of the tremendous amount of irrelevant information that is retrieved when searching. We also fail to find much of the existing relevant material. LSA was designed to address these retrieval problems, using dimension reduction techniques. Fundamental characteristics of human word usage underlie these retrieval failures. People use a wide variety of words to describe the same object or concept (synonymy). Furnas, Landauer, Gomez, and Dumais (1987) showed that people generate the same keyword to describe well-known objects only 20 percent of the time. Poor agreement was also observed in studies of inter-indexer consistency (e.g., Chan, 1989; Tarr & Borko, 1974), in the generation of search terms (e.g., Fidel, 1985; Bates, 1986), and in the generation of hypertext links (Furner, Ellis, & Willett, 1999). Because searchers and authors often use different words, relevant materials are missed. Someone looking for documents on "human-computer interaction" will not find articles that use only the phrase "man-machine studies" or "human factors." People also use the same word to refer to different things (polysemy). Words like "saturn," "jaguar," or "chip" have several different meanings. A short query like "saturn" will thus return many irrelevant documents. The query "Saturn car" will return fewer irrelevant items, but it will miss some documents that use only the terms "Saturn automobile." In searching, there is a constant tension between being overly specific and missing relevant information, and being more general and returning irrelevant information.
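The dimension-reduction idea described above can be sketched with a truncated singular value decomposition. This is a minimal illustration using NumPy; the toy term-document matrix, its term labels, and the query are invented for the example and are not from the article:

```python
import numpy as np

# Hypothetical toy term-document counts (rows = terms, columns = documents).
# Term order: "human", "computer", "interaction", "man", "machine", "studies"
X = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

# SVD factors X into U * diag(s) * Vt; truncating to k dimensions yields the
# reduced "latent semantic" space in which queries and documents are compared.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of X

# A query vector is folded into the same k-dimensional space.
q = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # query: "human computer"
q_hat = q @ U[:, :k] @ np.linalg.inv(np.diag(s[:k]))

# Rank documents by cosine similarity to the query in the latent space;
# documents sharing no literal query term can still score as similar.
docs = Vt[:k, :].T  # document coordinates in the reduced space
sims = docs @ q_hat / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat))
print(np.argsort(-sims))  # document indices, most similar first
```

Because similarity is computed in the reduced space rather than on raw term overlap, synonymy and polysemy are partially smoothed over, which is the retrieval benefit the abstract describes.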
    A number of approaches have been developed in information retrieval to address the problems caused by the variability in word usage. Stemming is a popular technique used to normalize some kinds of surface-level variability by converting words to their morphological root. For example, the words "retrieve," "retrieval," "retrieved," and "retrieving" would all be converted to their root form, "retrieve." The root form is used for both document and query processing. Stemming sometimes helps retrieval, although not much (Harman, 1991; Hull, 1996). And it does not address cases where related words are not morphologically related (e.g., physician and doctor). Controlled vocabularies have also been used to limit variability by requiring that query and index terms belong to a pre-defined set of terms. Documents are indexed by a specified or authorized list of subject headings or index terms, called the controlled vocabulary. Library of Congress Subject Headings, Medical Subject Headings, Association for Computing Machinery (ACM) keywords, and Yellow Pages headings are examples of controlled vocabularies. If searchers can find the right controlled vocabulary terms, they do not have to think of all the morphologically related or synonymous terms that authors might have used. However, assigning controlled vocabulary terms in a consistent and thorough manner is a time-consuming and usually manual process. A good deal of research has been published about the effectiveness of controlled vocabulary indexing compared to full text indexing (e.g., Bates, 1998; Lancaster, 1986; Svenonius, 1986). The combination of both full text and controlled vocabularies is often better than either alone, although the size of the advantage is variable (Lancaster, 1986; Markey, Atherton, & Newton, 1982; Srinivasan, 1996). Richer thesauri have also been used to provide synonyms, generalizations, and specializations of users' search terms (see Srinivasan, 1992, for a review). Controlled vocabularies and thesaurus entries can be generated either manually or by the automatic analysis of large collections of texts.
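The stemming step described above can be sketched as a toy suffix-stripper. This is an illustrative simplification, not the Porter algorithm; the suffix list and the minimum-stem-length rule are invented for the example:

```python
# Ordered suffix list; longer suffixes are tried before shorter ones so that
# "retrieved" matches "ed" rather than bare "e". The list is invented here.
SUFFIXES = ["ing", "ed", "al", "e", "s"]

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least four characters.

    Real stemmers (e.g., Porter) apply ordered rule sets with measure
    conditions; this toy version only shows the normalization idea.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

for w in ["retrieve", "retrieval", "retrieved", "retrieving"]:
    print(w, "->", crude_stem(w))  # all four map to the same stem, "retriev"
```

Note that even this tiny example conflates surface variants only; as the abstract points out, no amount of suffix stripping links morphologically unrelated synonyms such as "physician" and "doctor".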