Search (1 result)

  • author_ss:"Chandrasekar, R."
  • theme_ss:"Computerlinguistik"
  1. Chandrasekar, R.; Bangalore, S.: Glean : using syntactic information in document filtering (2002) 0.07
    0.07253622 = product of:
      0.10880433 = sum of:
        0.058109686 = weight(_text_:search in 4257) [ClassicSimilarity], result of:
          0.058109686 = score(doc=4257,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.33256388 = fieldWeight in 4257, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4257)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 4257) [ClassicSimilarity], result of:
              0.10138928 = score(doc=4257,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 4257, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4257)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
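The explain tree above can be re-derived by hand. The following Python sketch recomputes it under stated assumptions. It uses the ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the idf values shown (newer Lucene releases compute idf slightly differently). It also copies queryNorm and fieldNorm from the output rather than recomputing them, because queryNorm depends on the full query and fieldNorm is an index-time encoded length norm. The helper names (`idf`, `tf`, `term_weight`, `coord`) are hypothetical and are not Lucene API.

```python
import math

# Hypothetical helpers reproducing Lucene ClassicSimilarity arithmetic.

def idf(doc_freq: int, max_docs: int) -> float:
    # Variant that matches the idf values in the explain output above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency.
    return math.sqrt(freq)

def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm             # queryWeight = idf * queryNorm
    field_weight = tf(freq) * term_idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight               # score = queryWeight * fieldWeight

def coord(overlap: int, max_overlap: int) -> float:
    # Fraction of a boolean query's clauses that matched.
    return overlap / max_overlap

# Constants copied from the explain output, not recomputed.
MAX_DOCS = 44218
QUERY_NORM = 0.05027291
FIELD_NORM = 0.0390625

search = term_weight(6.0, 3718, MAX_DOCS, QUERY_NORM, FIELD_NORM)
engines = term_weight(4.0, 746, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# "engines" sits in a sub-query where 1 of 2 clauses matched;
# the top-level query matched 2 of its 3 clauses.
score = coord(2, 3) * (search + coord(1, 2) * engines)

print(f"search={search:.6f} engines={engines:.6f} score={score:.6f}")
```

Lucene computes in 32-bit floats and Python in 64-bit doubles, so the last digits may differ slightly from the explain output. The nesting of the final expression mirrors the tree: the two coord factors scale the sub-query sum and the top-level sum respectively.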
    
    Abstract
    In today's networked world, a huge amount of data is available in machine-processable form. Likewise, there are any number of search engines and specialized information retrieval (IR) programs that seek to extract relevant information from these data repositories. Most IR systems and Web search engines have been designed for speed and tend to maximize the quantity of information (recall) rather than the relevance of the information (precision) to the query. As a result, search engine users get inundated with information for practically any query, and are forced to scan a large number of potentially relevant items to get to the information of interest. The Holy Grail of IR is to somehow retrieve those and only those documents pertinent to the user's query. Polysemy and synonymy - the fact that often there are several meanings for a word or phrase, and likewise, many ways to express a concept - make this a very hard task. While conventional IR systems provide usable solutions, there are a number of open problems to be solved, in areas such as syntactic processing, semantic analysis, and user modeling, before we develop systems that "understand" user queries and text collections. Meanwhile, we can use tools and techniques available today to improve the precision of retrieval. In particular, using the approach described in this article, we can approximate understanding using the syntactic structure and patterns of language use that are latent in documents to make IR more effective.
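The abstract's contrast between recall and precision can be made concrete with the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. This is a minimal Python sketch; the function name `precision_recall`, the document IDs, and the result sets are made-up toy data, not taken from the article.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    # Documents that were both retrieved and judged relevant.
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical relevance judgments for one query.
relevant = {"d1", "d2", "d3"}

# A broad engine returns 20 documents, including all 3 relevant ones.
broad = {f"d{i}" for i in range(1, 21)}
# A narrow engine returns only 2 documents, both relevant.
narrow = {"d1", "d2"}

print(precision_recall(broad, relevant))
print(precision_recall(narrow, relevant))
```

The broad result set reaches full recall but a precision of only 0.15, which is the "inundated with information" situation the abstract describes; the narrow set gives up some recall (2 of 3) in exchange for perfect precision.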