Document (#23752)

Author
Bookstein, A.
Raita, T.
Title
Discovering term occurrence structure in text
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.6, pp.476-486
Year
2001
Abstract
This article examines some consequences for information control of the tendency of occurrences of content-bearing terms to appear together, or clump. Properties of previously defined clumping measures are reviewed and extended, and the significance of these measures for devising retrieval strategies is discussed. A new type of clumping measure, which extends the earlier measures by permitting gaps within a clump, is defined, and several variants are examined. Experiments are carried out that indicate the relation between the new measure and one of the earlier measures, as well as the ability of the two types of measure to predict compression efficiency.
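The abstract does not give the definition of the new measure. Purely as a hypothetical illustration of what "permitting gaps within a clump" can mean (this is not Bookstein and Raita's measure), the sketch below groups the text units (e.g. sentence numbers) in which a term occurs into clumps, letting up to `max_gap` term-free units separate neighbouring occurrences in the same clump. The names `find_clumps` and `mean_clump_size` are illustrative only, and mean clump size is just one simple summary statistic chosen for the example.

```python
from typing import List


def find_clumps(positions: List[int], max_gap: int = 0) -> List[List[int]]:
    """Group the unit indices in which a term occurs into clumps.

    Two consecutive occurrence units stay in the same clump if at most
    `max_gap` term-free units lie between them; max_gap=0 means a clump
    must be an unbroken run of consecutive units.
    """
    if max_gap < 0:
        raise ValueError("max_gap must be non-negative")
    clumps: List[List[int]] = []
    for pos in sorted(set(positions)):
        # Number of term-free units between this occurrence and the previous one
        if clumps and pos - clumps[-1][-1] - 1 <= max_gap:
            clumps[-1].append(pos)
        else:
            clumps.append([pos])
    return clumps


def mean_clump_size(positions: List[int], max_gap: int = 0) -> float:
    """Average number of occurrence units per clump (0.0 if the term never occurs)."""
    clumps = find_clumps(positions, max_gap)
    return sum(len(c) for c in clumps) / len(clumps) if clumps else 0.0


# Hypothetical sentence numbers in which some term occurs
occurrences = [3, 4, 5, 9, 11, 30]
print(find_clumps(occurrences, max_gap=0))  # [[3, 4, 5], [9], [11], [30]]
print(find_clumps(occurrences, max_gap=1))  # [[3, 4, 5], [9, 11], [30]]
```

Allowing a one-unit gap merges the occurrences at 9 and 11 into a single clump, so the mean clump size rises from 1.5 to 2.0; a term scattered evenly through a text would show little such change.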
Theme
Informetrics

Similar documents (author)

  1. Bookstein, A.: Probability and Fuzzy-set applications to information retrieval (1985) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:bookstein in 781) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 781, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=781)
    
  2. Bookstein, A.: Relevance (1979) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:bookstein in 839) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 839, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=839)
    
  3. Bookstein, A.: Fuzzy requests : an approach to weighted Boolean searches (1979) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:bookstein in 5504) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 5504, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=5504)
    
  4. Bookstein, A.: Informetric distributions : I. Unified overview (1990) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:bookstein in 6902) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 6902, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=6902)
    
  5. Bookstein, A.: ¬The bibliometric distributions (1976) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:bookstein in 5061) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 5061, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=5061)
    
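The author-match scores above can be recomputed by hand. The printed values appear consistent with tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)), with fieldWeight = tf × idf × fieldNorm; these formulas are inferred by recomputing the numbers shown, not stated on this page. fieldNorm is a lossily encoded length norm, so it is copied as given rather than derived. The minimal Python sketch below reproduces the idf and fieldWeight for `author_txt:bookstein`; the function names are illustrative.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as the explain output's numbers imply: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the explain output for the author term "bookstein"
idf = classic_idf(22, 44218)
weight = field_weight(1.0, 22, 44218, 0.625)
print(f"idf={idf:.6f} fieldWeight={weight:.7f}")
```

The result should agree with the printed idf (about 8.561393) and fieldWeight (about 5.3508706); small differences in the last digits can arise because the search engine computes in single precision.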

Similar documents (content)

  1. Bookstein, A.; Kulyukin, V.; Raita, T.; Nicholson, J.: Adapting measures of clumping strength to assess term-term similarity (2003) 0.14
    0.14201596 = sum of:
      0.14201596 = product of:
        0.88759977 = sum of:
          0.051363166 = weight(abstract_txt:previously in 1609) [ClassicSimilarity], result of:
            0.051363166 = score(doc=1609,freq=1.0), product of:
              0.107142515 = queryWeight, product of:
                1.0717093 = boost
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.016292395 = queryNorm
              0.47939107 = fieldWeight in 1609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.078125 = fieldNorm(doc=1609)
          0.08486468 = weight(abstract_txt:tendency in 1609) [ClassicSimilarity], result of:
            0.08486468 = score(doc=1609,freq=1.0), product of:
              0.14974257 = queryWeight, product of:
                1.266977 = boost
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.016292395 = queryNorm
              0.5667372 = fieldWeight in 1609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2542357 = idf(docFreq=84, maxDocs=44218)
                0.078125 = fieldNorm(doc=1609)
          0.4320781 = weight(abstract_txt:clumping in 1609) [ClassicSimilarity], result of:
            0.4320781 = score(doc=1609,freq=1.0), product of:
              0.55835724 = queryWeight, product of:
                3.4599285 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.016292395 = queryNorm
              0.7738381 = fieldWeight in 1609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=1609)
          0.3192938 = weight(abstract_txt:measures in 1609) [ClassicSimilarity], result of:
            0.3192938 = score(doc=1609,freq=5.0), product of:
              0.33626702 = queryWeight, product of:
                3.7972414 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.016292395 = queryNorm
              0.9495246 = fieldWeight in 1609, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=1609)
        0.16 = coord(4/25)
    
  2. Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.08
    0.08231779 = sum of:
      0.08231779 = product of:
        0.5144862 = sum of:
          0.040190738 = weight(abstract_txt:extended in 1808) [ClassicSimilarity], result of:
            0.040190738 = score(doc=1808,freq=1.0), product of:
              0.10557262 = queryWeight, product of:
                1.0638287 = boost
                6.091085 = idf(docFreq=271, maxDocs=44218)
                0.016292395 = queryNorm
              0.3806928 = fieldWeight in 1808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.091085 = idf(docFreq=271, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.050769407 = weight(abstract_txt:defined in 1808) [ClassicSimilarity], result of:
            0.050769407 = score(doc=1808,freq=1.0), product of:
              0.15543377 = queryWeight, product of:
                1.825508 = boost
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.016292395 = queryNorm
              0.32663047 = fieldWeight in 1808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.12129128 = weight(abstract_txt:measure in 1808) [ClassicSimilarity], result of:
            0.12129128 = score(doc=1808,freq=2.0), product of:
              0.2523776 = queryWeight, product of:
                2.848932 = boost
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.016292395 = queryNorm
              0.4805945 = fieldWeight in 1808, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.3022348 = weight(abstract_txt:measures in 1808) [ClassicSimilarity], result of:
            0.3022348 = score(doc=1808,freq=7.0), product of:
              0.33626702 = queryWeight, product of:
                3.7972414 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.016292395 = queryNorm
              0.89879405 = fieldWeight in 1808, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
        0.16 = coord(4/25)
    
  3. Bar-Hillel, Y.; Carnap, R.: ¬An outline of a theory of semantic information (1952) 0.08
    0.0814224 = sum of:
      0.0814224 = product of:
        0.407112 = sum of:
          0.033381883 = weight(abstract_txt:carried in 3369) [ClassicSimilarity], result of:
            0.033381883 = score(doc=3369,freq=1.0), product of:
              0.09328415 = queryWeight, product of:
                5.7256255 = idf(docFreq=391, maxDocs=44218)
                0.016292395 = queryNorm
              0.3578516 = fieldWeight in 3369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7256255 = idf(docFreq=391, maxDocs=44218)
                0.0625 = fieldNorm(doc=3369)
          0.04011814 = weight(abstract_txt:efficiency in 3369) [ClassicSimilarity], result of:
            0.04011814 = score(doc=3369,freq=1.0), product of:
              0.10544545 = queryWeight, product of:
                1.0631878 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.016292395 = queryNorm
              0.38046345 = fieldWeight in 3369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.0625 = fieldNorm(doc=3369)
          0.050769407 = weight(abstract_txt:defined in 3369) [ClassicSimilarity], result of:
            0.050769407 = score(doc=3369,freq=1.0), product of:
              0.15543377 = queryWeight, product of:
                1.825508 = boost
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.016292395 = queryNorm
              0.32663047 = fieldWeight in 3369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.0625 = fieldNorm(doc=3369)
          0.12129128 = weight(abstract_txt:measure in 3369) [ClassicSimilarity], result of:
            0.12129128 = score(doc=3369,freq=2.0), product of:
              0.2523776 = queryWeight, product of:
                2.848932 = boost
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.016292395 = queryNorm
              0.4805945 = fieldWeight in 3369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.0625 = fieldNorm(doc=3369)
          0.1615513 = weight(abstract_txt:measures in 3369) [ClassicSimilarity], result of:
            0.1615513 = score(doc=3369,freq=2.0), product of:
              0.33626702 = queryWeight, product of:
                3.7972414 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.016292395 = queryNorm
              0.48042563 = fieldWeight in 3369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=3369)
        0.2 = coord(5/25)
    
  4. Eck, N.J. van; Waltman, L.: How to normalize cooccurrence data? : an analysis of some well-known similarity measures (2009) 0.08
    0.07651254 = sum of:
      0.07651254 = product of:
        0.6376045 = sum of:
          0.04539628 = weight(abstract_txt:properties in 2942) [ClassicSimilarity], result of:
            0.04539628 = score(doc=2942,freq=1.0), product of:
              0.09867508 = queryWeight, product of:
                1.0284894 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.016292395 = queryNorm
              0.46005818 = fieldWeight in 2942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.078125 = fieldNorm(doc=2942)
          0.21441472 = weight(abstract_txt:measure in 2942) [ClassicSimilarity], result of:
            0.21441472 = score(doc=2942,freq=4.0), product of:
              0.2523776 = queryWeight, product of:
                2.848932 = boost
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.016292395 = queryNorm
              0.84957904 = fieldWeight in 2942, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.078125 = fieldNorm(doc=2942)
          0.37779352 = weight(abstract_txt:measures in 2942) [ClassicSimilarity], result of:
            0.37779352 = score(doc=2942,freq=7.0), product of:
              0.33626702 = queryWeight, product of:
                3.7972414 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.016292395 = queryNorm
              1.1234926 = fieldWeight in 2942, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=2942)
        0.12 = coord(3/25)
    
  5. Heine, M.H.: Distance between sets as an objective measure of retrieval effectiveness (1973) 0.07
    0.07298286 = sum of:
      0.07298286 = product of:
        0.4561429 = sum of:
          0.06420003 = weight(abstract_txt:properties in 5515) [ClassicSimilarity], result of:
            0.06420003 = score(doc=5515,freq=2.0), product of:
              0.09867508 = queryWeight, product of:
                1.0284894 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.016292395 = queryNorm
              0.6506205 = fieldWeight in 5515, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.078125 = fieldNorm(doc=5515)
          0.06346176 = weight(abstract_txt:defined in 5515) [ClassicSimilarity], result of:
            0.06346176 = score(doc=5515,freq=1.0), product of:
              0.15543377 = queryWeight, product of:
                1.825508 = boost
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.016292395 = queryNorm
              0.4082881 = fieldWeight in 5515, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.078125 = fieldNorm(doc=5515)
          0.1856886 = weight(abstract_txt:measure in 5515) [ClassicSimilarity], result of:
            0.1856886 = score(doc=5515,freq=3.0), product of:
              0.2523776 = queryWeight, product of:
                2.848932 = boost
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.016292395 = queryNorm
              0.73575705 = fieldWeight in 5515, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.078125 = fieldNorm(doc=5515)
          0.14279252 = weight(abstract_txt:measures in 5515) [ClassicSimilarity], result of:
            0.14279252 = score(doc=5515,freq=1.0), product of:
              0.33626702 = queryWeight, product of:
                3.7972414 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.016292395 = queryNorm
              0.4246403 = fieldWeight in 5515, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=5515)
        0.16 = coord(4/25)
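For the content-based list, each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm, and the per-document sum is multiplied by coord = (matching query terms) / (all query terms), here out of 25. The sketch below recomputes the 0.14201596 score for doc 1609 from the values printed above. The boosts, queryNorm and fieldNorm are copied as given rather than derived, the idf formula is the one inferred from the printed numbers, and the helper names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as the explain output's numbers imply: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.016292395   # copied from the explain output
FIELD_NORM = 0.078125      # fieldNorm(doc=1609), copied from the explain output
QUERY_TERMS = 25           # denominator of coord(4/25)

# (term, freq, docFreq, boost) for the four terms that matched doc 1609
matches = [
    ("previously", 1.0, 259, 1.0717093),
    ("tendency", 1.0, 84, 1.266977),
    ("clumping", 1.0, 5, 3.4599285),
    ("measures", 5.0, 523, 3.7972414),
]

raw_sum = sum(term_score(f, df, MAX_DOCS, b, QUERY_NORM, FIELD_NORM)
              for _, f, df, b in matches)
coord = len(matches) / QUERY_TERMS
final_score = raw_sum * coord
print(f"sum={raw_sum:.6f} coord={coord} score={final_score:.6f}")
```

The coord factor is why the rare, heavily boosted term "clumping" is not enough on its own to lift the score much: matching only 4 of 25 query terms scales the summed term weights by 0.16.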