Search (1 results, page 1 of 1)

  • × author_ss:"Dumais, S.T."
  • × theme_ss:"Literaturübersicht"
  1. Dumais, S.T.: Latent semantic analysis (2003) 0.00
    0.0010100347 = product of:
      0.010100347 = sum of:
        0.010100347 = weight(_text_:web in 2462) [ClassicSimilarity], result of:
          0.010100347 = score(doc=2462,freq=2.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.108171105 = fieldWeight in 2462, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0234375 = fieldNorm(doc=2462)
      0.1 = coord(1/10)
    
    Abstract
    Latent Semantic Analysis (LSA) was first introduced in Dumais, Furnas, Landauer, and Deerwester (1988) and Deerwester, Dumais, Furnas, Landauer, and Harshman (1990) as a technique for improving information retrieval. The key insight in LSA was to reduce the dimensionality of the information retrieval problem. Most approaches to retrieving information depend an a lexical match between words in the user's query and those in documents. Indeed, this lexical matching is the way that the popular Web and enterprise search engines work. Such systems are, however, far from ideal. We are all aware of the tremendous amount of irrelevant information that is retrieved when searching. We also fail to find much of the existing relevant material. LSA was designed to address these retrieval problems, using dimension reduction techniques. Fundamental characteristics of human word usage underlie these retrieval failures. People use a wide variety of words to describe the same object or concept (synonymy). Furnas, Landauer, Gomez, and Dumais (1987) showed that people generate the same keyword to describe well-known objects only 20 percent of the time. Poor agreement was also observed in studies of inter-indexer consistency (e.g., Chan, 1989; Tarr & Borko, 1974) in the generation of search terms (e.g., Fidel, 1985; Bates, 1986), and in the generation of hypertext links (Furner, Ellis, & Willett, 1999). Because searchers and authors often use different words, relevant materials are missed. Someone looking for documents an "human-computer interaction" will not find articles that use only the phrase "man-machine studies" or "human factors." People also use the same word to refer to different things (polysemy). Words like "saturn," "jaguar," or "chip" have several different meanings. A short query like "saturn" will thus return many irrelevant documents. The query "Saturn Gar" will return fewer irrelevant items, but it will miss some documents that use only the terms "Saturn automobile." In searching, there is a constant tension between being overly specific and missing relevant information, and being more general and returning irrelevant information.