Search (1 result, page 1 of 1)

  • × author_ss:"Kulyukin, V."
  • × theme_ss:"Computerlinguistik"
  1. Bookstein, A.; Kulyukin, V.; Raita, T.; Nicholson, J.: Adapting measures of clumping strength to assess term-term similarity (2003) 0.02
    0.023887785 = product of:
      0.04777557 = sum of:
        0.04777557 = product of:
          0.09555114 = sum of:
            0.09555114 = weight(_text_:encyclopedia in 1609) [ClassicSimilarity], result of:
              0.09555114 = score(doc=1609,freq=2.0), product of:
                0.270842 = queryWeight, product of:
                  5.321862 = idf(docFreq=586, maxDocs=44218)
                  0.05089233 = queryNorm
                0.35279295 = fieldWeight in 1609, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.321862 = idf(docFreq=586, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1609)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
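
The score breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical, and queryNorm, fieldNorm, and the two coord factors are copied from the explanation rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                    # tf(freq=2.0) in the explanation
    idf = classic_idf(doc_freq, max_docs)   # idf(docFreq=586, maxDocs=44218)
    query_weight = idf * query_norm         # queryWeight
    field_weight = tf * idf * field_norm    # fieldWeight
    return query_weight * field_weight


# Values taken from the explanation for the term "encyclopedia" in doc 1609.
term_score = classic_term_score(
    freq=2.0,
    doc_freq=586,
    max_docs=44218,
    query_norm=0.05089233,   # copied from the explanation, not derived here
    field_norm=0.046875,     # copied from the explanation, not derived here
)

# Two nested coord(1/2) factors each halve the score.
final_score = term_score * 0.5 * 0.5

print(f"idf         = {classic_idf(586, 44218):.6f}")
print(f"term score  = {term_score:.8f}")
print(f"final score = {final_score:.9f}")
```

The printed values should agree with the figures in the explanation to within rounding; small differences in the last digits are expected because Lucene computes in single precision while this sketch uses Python doubles.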
    
    Abstract
    Automated information retrieval relies heavily on statistical regularities that emerge as terms are deposited to produce text. This paper examines statistical patterns expected of a pair of terms that are semantically related to each other. Guided by a conceptualization of the text generation process, we derive measures of how tightly two terms are semantically associated. Our main objective is to probe whether such measures yield reasonable results. Specifically, we examine how the tendency of a content-bearing term to clump, as quantified by previously developed measures of term clumping, is influenced by the presence of other terms. This approach allows us to present a toolkit from which a range of measures can be constructed. As an illustration, one of several suggested measures is evaluated on a large text corpus built from an on-line encyclopedia.