Search (3 results, page 1 of 1)

  • author_ss:"Raita, T."
  • author_ss:"Bookstein, A."
  1. Bookstein, A.; Klein, S.T.; Raita, T.: Clumping properties of content-bearing words (1998) 0.00
    0.0029446408 = product of:
      0.011778563 = sum of:
        0.011778563 = weight(_text_:information in 442) [ClassicSimilarity], result of:
          0.011778563 = score(doc=442,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.1920054 = fieldWeight in 442, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=442)
      0.25 = coord(1/4)
    
    Abstract
    Information retrieval systems identify content-bearing words, and possibly also assign weights, as part of the process of formulating requests. For optimal retrieval efficiency, it is desirable that this be done automatically. This article defines the notion of serial clustering of words in text, and explores the value of such clustering as an indicator of a word's bearing content. This approach is flexible in the sense that it is sensitive to context: a term may be assessed as content-bearing within one collection, but not another. Our approach, being numerical, may also be of value in assigning weights to terms in requests. Experimental support is obtained from natural text databases in three different languages.
    Source
    Journal of the American Society for Information Science. 49(1998) no.2, pp.102-114
  2. Bookstein, A.; Raita, T.: Discovering term occurrence structure in text (2001) 0.00
    0.0029446408 = product of:
      0.011778563 = sum of:
        0.011778563 = weight(_text_:information in 5751) [ClassicSimilarity], result of:
          0.011778563 = score(doc=5751,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.1920054 = fieldWeight in 5751, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5751)
      0.25 = coord(1/4)
    
    Abstract
    This article examines some consequences for information control of the tendency of occurrences of content-bearing terms to appear together, or clump. Properties of previously defined clumping measures are reviewed and extended, and the significance of these measures for devising retrieval strategies is discussed. A new type of clumping measure, which extends the earlier measures by permitting gaps within a clump, is defined, and several variants are examined. Experiments are carried out that indicate the relation between the new measure and one of the earlier measures, as well as the ability of the two types of measure to predict compression efficiency.
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.6, pp.476-486
  3. Bookstein, A.; Kulyukin, V.; Raita, T.; Nicholson, J.: Adapting measures of clumping strength to assess term-term similarity (2003) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 1609) [ClassicSimilarity], result of:
          0.010095911 = score(doc=1609,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 1609, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1609)
      0.25 = coord(1/4)
    
    Abstract
    Automated information retrieval relies heavily on statistical regularities that emerge as terms are deposited to produce text. This paper examines statistical patterns expected of a pair of terms that are semantically related to each other. Guided by a conceptualization of the text generation process, we derive measures of how tightly two terms are semantically associated. Our main objective is to probe whether such measures yield reasonable results. Specifically, we examine how the tendency of a content-bearing term to clump, as quantified by previously developed measures of term clumping, is influenced by the presence of other terms. This approach allows us to present a toolkit from which a range of measures can be constructed. As an illustration, one of several suggested measures is evaluated on a large text corpus built from an on-line encyclopedia.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.7, pp.611-620
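
The relevance values shown under each result can be recomputed from the leaf numbers in their explain trees. The following is a minimal sketch, not the search engine's own code: the function name and variable names are hypothetical, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed because they reproduce the figures printed in the listing. Other Lucene versions may define idf slightly differently.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term explain tree from its leaf values (hypothetical helper)."""
    tf = math.sqrt(freq)                              # 2.0 for freq=4.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # assumed idf form, about 1.7554779 here
    query_weight = idf * query_norm                   # "queryWeight" node
    field_weight = tf * idf * field_norm              # "fieldWeight" node
    return query_weight * field_weight * coord        # final "product of" node


# Leaf values copied from the explain trees in the listing above.
COMMON = dict(freq=4.0, doc_freq=20772, max_docs=44218,
              query_norm=0.034944877, coord=0.25)

score_doc_442 = classic_similarity_score(field_norm=0.0546875, **COMMON)
score_doc_1609 = classic_similarity_score(field_norm=0.046875, **COMMON)
```

Under these assumptions, score_doc_442 comes out close to the 0.0029446408 reported for results 1 and 2, and score_doc_1609 close to the 0.0025239778 reported for result 3. The only leaf that differs between the trees is fieldNorm, which accounts for the lower score of the third result.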