Document (#18400)

Author
Adamson, G.W.
Boreham, J.
Title
¬The use of an association measure based on character structure to identify semantically related pairs of words and document titles
Source
Information storage and retrieval. 10(1974), pp. 253-260
Year
1974
Abstract
An automatic classification technique has been developed, based on the character structure of words. Dice's similarity coefficient is computed from the number of matching digrams in pairs of character strings, and used to cluster sets of character strings. A sample of words from a chemical data base was chosen to contain certain stems derived from the names of chemical elements. They were successfully clustered into groups of semantically related words. Each cluster is characterised by the root word from which all its members are derived. A second sample, consisting of titles from Mathematical Reviews, was clustered into well-defined classes, which compare favourably with the subject groupings of Mathematical Reviews.
Theme
Automatic classification
Field
Chemistry
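
The association measure named in the abstract can be illustrated with a minimal Python sketch. It assumes that each string is reduced to its set of unique adjacent character pairs and that case is ignored; the abstract does not say how the authors handled repeated pairs, case, or word boundaries, so those choices are assumptions. The function names and the example words are made up for illustration and are not taken from the paper's sample.

```python
def char_pairs(text):
    """Return the set of adjacent character pairs in a string.

    Assumption: the string is lower-cased and repeated pairs count once.
    """
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice's coefficient: 2 * |shared pairs| / (|pairs of a| + |pairs of b|)."""
    x, y = char_pairs(a), char_pairs(b)
    if not x and not y:
        return 0.0  # neither string has any pair; treat as unrelated
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical word pairs, chosen only to show the arithmetic.
print(dice("bromide", "bromine"))  # 4 shared pairs out of 6 + 6
print(dice("night", "nacht"))      # 1 shared pair out of 4 + 4
```

A matrix of such pairwise values could then be fed to any clustering routine; which clustering method the authors used is not stated in the abstract.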

Similar documents (content)

  1. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.18
    0.17712265 = sum of:
      0.17712265 = product of:
        0.73801106 = sum of:
          0.074035935 = weight(abstract_txt:coefficient in 227) [ClassicSimilarity], result of:
            0.074035935 = score(doc=227,freq=1.0), product of:
              0.15298618 = queryWeight, product of:
                1.1548073 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.017109303 = queryNorm
              0.48393872 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.0625 = fieldNorm(doc=227)
          0.10200843 = weight(abstract_txt:pairs in 227) [ClassicSimilarity], result of:
            0.10200843 = score(doc=227,freq=1.0), product of:
              0.23866636 = queryWeight, product of:
                2.0398302 = boost
                6.838563 = idf(docFreq=125, maxDocs=43254)
                0.017109303 = queryNorm
              0.4274102 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.838563 = idf(docFreq=125, maxDocs=43254)
                0.0625 = fieldNorm(doc=227)
          0.017143149 = weight(abstract_txt:from in 227) [ClassicSimilarity], result of:
            0.017143149 = score(doc=227,freq=1.0), product of:
              0.0986448 = queryWeight, product of:
                2.073507 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017109303 = queryNorm
              0.17378664 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.0625 = fieldNorm(doc=227)
          0.12567188 = weight(abstract_txt:strings in 227) [ClassicSimilarity], result of:
            0.12567188 = score(doc=227,freq=1.0), product of:
              0.274279 = queryWeight, product of:
                2.1867278 = boost
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.017109303 = queryNorm
              0.45818996 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.0625 = fieldNorm(doc=227)
          0.1695844 = weight(abstract_txt:words in 227) [ClassicSimilarity], result of:
            0.1695844 = score(doc=227,freq=3.0), product of:
              0.2925908 = queryWeight, product of:
                3.1940649 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.017109303 = queryNorm
              0.5795958 = fieldWeight in 227, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.0625 = fieldNorm(doc=227)
          0.24956726 = weight(abstract_txt:character in 227) [ClassicSimilarity], result of:
            0.24956726 = score(doc=227,freq=2.0), product of:
              0.43333676 = queryWeight, product of:
                3.8871043 = boost
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.017109303 = queryNorm
              0.57591987 = fieldWeight in 227, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.0625 = fieldNorm(doc=227)
        0.24 = coord(6/25)
    
  2. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.14
    0.14474389 = sum of:
      0.14474389 = product of:
        0.45232466 = sum of:
          0.017339092 = weight(abstract_txt:into in 1802) [ClassicSimilarity], result of:
            0.017339092 = score(doc=1802,freq=2.0), product of:
              0.070415325 = queryWeight, product of:
                1.1079803 = boost
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.017109303 = queryNorm
              0.24624032 = fieldWeight in 1802, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
          0.062720925 = weight(abstract_txt:groupings in 1802) [ClassicSimilarity], result of:
            0.062720925 = score(doc=1802,freq=1.0), product of:
              0.16592988 = queryWeight, product of:
                1.202668 = boost
                8.063927 = idf(docFreq=36, maxDocs=43254)
                0.017109303 = queryNorm
              0.37799656 = fieldWeight in 1802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.063927 = idf(docFreq=36, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
          0.018031474 = weight(abstract_txt:related in 1802) [ClassicSimilarity], result of:
            0.018031474 = score(doc=1802,freq=1.0), product of:
              0.09106408 = queryWeight, product of:
                1.2600042 = boost
                4.224184 = idf(docFreq=1720, maxDocs=43254)
                0.017109303 = queryNorm
              0.19800863 = fieldWeight in 1802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.224184 = idf(docFreq=1720, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
          0.019931687 = weight(abstract_txt:structure in 1802) [ClassicSimilarity], result of:
            0.019931687 = score(doc=1802,freq=1.0), product of:
              0.09735442 = queryWeight, product of:
                1.3027955 = boost
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.017109303 = queryNorm
              0.20473325 = fieldWeight in 1802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
          0.044930477 = weight(abstract_txt:titles in 1802) [ClassicSimilarity], result of:
            0.044930477 = score(doc=1802,freq=1.0), product of:
              0.16737361 = queryWeight, product of:
                1.7082126 = boost
                5.72681 = idf(docFreq=382, maxDocs=43254)
                0.017109303 = queryNorm
              0.2684442 = fieldWeight in 1802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.72681 = idf(docFreq=382, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
          0.022269601 = weight(abstract_txt:from in 1802) [ClassicSimilarity], result of:
            0.022269601 = score(doc=1802,freq=3.0), product of:
              0.0986448 = queryWeight, product of:
                2.073507 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017109303 = queryNorm
              0.22575545 = fieldWeight in 1802, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
          0.16325258 = weight(abstract_txt:strings in 1802) [ClassicSimilarity], result of:
            0.16325258 = score(doc=1802,freq=3.0), product of:
              0.274279 = queryWeight, product of:
                2.1867278 = boost
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.017109303 = queryNorm
              0.59520626 = fieldWeight in 1802, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
          0.10384881 = weight(abstract_txt:words in 1802) [ClassicSimilarity], result of:
            0.10384881 = score(doc=1802,freq=2.0), product of:
              0.2925908 = queryWeight, product of:
                3.1940649 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.017109303 = queryNorm
              0.3549285 = fieldWeight in 1802, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.046875 = fieldNorm(doc=1802)
        0.32 = coord(8/25)
    
  3. Chen, T.T.: ¬The congruity between linkage-based factors and content-based clusters : an experimental study using multiple document corpora (2016) 0.13
    0.13471353 = sum of:
      0.13471353 = product of:
        0.48111975 = sum of:
          0.10470263 = weight(abstract_txt:coefficient in 4240) [ClassicSimilarity], result of:
            0.10470263 = score(doc=4240,freq=2.0), product of:
              0.15298618 = queryWeight, product of:
                1.1548073 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.017109303 = queryNorm
              0.6843927 = fieldWeight in 4240, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.0625 = fieldNorm(doc=4240)
          0.07703538 = weight(abstract_txt:clustered in 4240) [ClassicSimilarity], result of:
            0.07703538 = score(doc=4240,freq=1.0), product of:
              0.15709075 = queryWeight, product of:
                1.1701963 = boost
                7.846204 = idf(docFreq=45, maxDocs=43254)
                0.017109303 = queryNorm
              0.49038774 = fieldWeight in 4240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.846204 = idf(docFreq=45, maxDocs=43254)
                0.0625 = fieldNorm(doc=4240)
          0.024041966 = weight(abstract_txt:related in 4240) [ClassicSimilarity], result of:
            0.024041966 = score(doc=4240,freq=1.0), product of:
              0.09106408 = queryWeight, product of:
                1.2600042 = boost
                4.224184 = idf(docFreq=1720, maxDocs=43254)
                0.017109303 = queryNorm
              0.2640115 = fieldWeight in 4240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.224184 = idf(docFreq=1720, maxDocs=43254)
                0.0625 = fieldNorm(doc=4240)
          0.04603026 = weight(abstract_txt:structure in 4240) [ClassicSimilarity], result of:
            0.04603026 = score(doc=4240,freq=3.0), product of:
              0.09735442 = queryWeight, product of:
                1.3027955 = boost
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.017109303 = queryNorm
              0.4728112 = fieldWeight in 4240, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.0625 = fieldNorm(doc=4240)
          0.10535458 = weight(abstract_txt:derived in 4240) [ClassicSimilarity], result of:
            0.10535458 = score(doc=4240,freq=3.0), product of:
              0.16908133 = queryWeight, product of:
                1.7169049 = boost
                5.755951 = idf(docFreq=371, maxDocs=43254)
                0.017109303 = queryNorm
              0.6231 = fieldWeight in 4240, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.755951 = idf(docFreq=371, maxDocs=43254)
                0.0625 = fieldNorm(doc=4240)
          0.08966864 = weight(abstract_txt:cluster in 4240) [ClassicSimilarity], result of:
            0.08966864 = score(doc=4240,freq=1.0), product of:
              0.21900845 = queryWeight, product of:
                1.9540193 = boost
                6.550881 = idf(docFreq=167, maxDocs=43254)
                0.017109303 = queryNorm
              0.40943006 = fieldWeight in 4240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.550881 = idf(docFreq=167, maxDocs=43254)
                0.0625 = fieldNorm(doc=4240)
          0.034286298 = weight(abstract_txt:from in 4240) [ClassicSimilarity], result of:
            0.034286298 = score(doc=4240,freq=4.0), product of:
              0.0986448 = queryWeight, product of:
                2.073507 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017109303 = queryNorm
              0.34757328 = fieldWeight in 4240, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.0625 = fieldNorm(doc=4240)
        0.28 = coord(7/25)
    
  4. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.12
    0.12151722 = sum of:
      0.12151722 = product of:
        0.7594826 = sum of:
          0.016347453 = weight(abstract_txt:into in 207) [ClassicSimilarity], result of:
            0.016347453 = score(doc=207,freq=1.0), product of:
              0.070415325 = queryWeight, product of:
                1.1079803 = boost
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.017109303 = queryNorm
              0.23215759 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.0625 = fieldNorm(doc=207)
          0.017143149 = weight(abstract_txt:from in 207) [ClassicSimilarity], result of:
            0.017143149 = score(doc=207,freq=1.0), product of:
              0.0986448 = queryWeight, product of:
                2.073507 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017109303 = queryNorm
              0.17378664 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.0625 = fieldNorm(doc=207)
          0.2937288 = weight(abstract_txt:words in 207) [ClassicSimilarity], result of:
            0.2937288 = score(doc=207,freq=9.0), product of:
              0.2925908 = queryWeight, product of:
                3.1940649 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.017109303 = queryNorm
              1.0038894 = fieldWeight in 207, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.0625 = fieldNorm(doc=207)
          0.4322632 = weight(abstract_txt:character in 207) [ClassicSimilarity], result of:
            0.4322632 = score(doc=207,freq=6.0), product of:
              0.43333676 = queryWeight, product of:
                3.8871043 = boost
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.017109303 = queryNorm
              0.99752253 = fieldWeight in 207, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.0625 = fieldNorm(doc=207)
        0.16 = coord(4/25)
    
  5. Beagle, D.: Visualizing keyword distribution across multidisciplinary c-space (2003) 0.11
    0.108037606 = sum of:
      0.108037606 = product of:
        0.33761752 = sum of:
          0.011559394 = weight(abstract_txt:into in 3203) [ClassicSimilarity], result of:
            0.011559394 = score(doc=3203,freq=2.0), product of:
              0.070415325 = queryWeight, product of:
                1.1079803 = boost
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.017109303 = queryNorm
              0.1641602 = fieldWeight in 3203, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7145214 = idf(docFreq=2864, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
          0.03620421 = weight(abstract_txt:diagrams in 3203) [ClassicSimilarity], result of:
            0.03620421 = score(doc=3203,freq=1.0), product of:
              0.15073584 = queryWeight, product of:
                1.1462826 = boost
                7.685861 = idf(docFreq=53, maxDocs=43254)
                0.017109303 = queryNorm
              0.24018316 = fieldWeight in 3203, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.685861 = idf(docFreq=53, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
          0.04181395 = weight(abstract_txt:groupings in 3203) [ClassicSimilarity], result of:
            0.04181395 = score(doc=3203,freq=1.0), product of:
              0.16592988 = queryWeight, product of:
                1.202668 = boost
                8.063927 = idf(docFreq=36, maxDocs=43254)
                0.017109303 = queryNorm
              0.2519977 = fieldWeight in 3203, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.063927 = idf(docFreq=36, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
          0.012020983 = weight(abstract_txt:related in 3203) [ClassicSimilarity], result of:
            0.012020983 = score(doc=3203,freq=1.0), product of:
              0.09106408 = queryWeight, product of:
                1.2600042 = boost
                4.224184 = idf(docFreq=1720, maxDocs=43254)
                0.017109303 = queryNorm
              0.13200575 = fieldWeight in 3203, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.224184 = idf(docFreq=1720, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
          0.07924991 = weight(abstract_txt:titles in 3203) [ClassicSimilarity], result of:
            0.07924991 = score(doc=3203,freq=7.0), product of:
              0.16737361 = queryWeight, product of:
                1.7082126 = boost
                5.72681 = idf(docFreq=382, maxDocs=43254)
                0.017109303 = queryNorm
              0.47349107 = fieldWeight in 3203, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.72681 = idf(docFreq=382, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
          0.0634053 = weight(abstract_txt:cluster in 3203) [ClassicSimilarity], result of:
            0.0634053 = score(doc=3203,freq=2.0), product of:
              0.21900845 = queryWeight, product of:
                1.9540193 = boost
                6.550881 = idf(docFreq=167, maxDocs=43254)
                0.017109303 = queryNorm
              0.28951076 = fieldWeight in 3203, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.550881 = idf(docFreq=167, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
          0.008571574 = weight(abstract_txt:from in 3203) [ClassicSimilarity], result of:
            0.008571574 = score(doc=3203,freq=1.0), product of:
              0.0986448 = queryWeight, product of:
                2.073507 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017109303 = queryNorm
              0.08689332 = fieldWeight in 3203, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
          0.0847922 = weight(abstract_txt:words in 3203) [ClassicSimilarity], result of:
            0.0847922 = score(doc=3203,freq=3.0), product of:
              0.2925908 = queryWeight, product of:
                3.1940649 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.017109303 = queryNorm
              0.2897979 = fieldWeight in 3203, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.03125 = fieldNorm(doc=3203)
        0.32 = coord(8/25)
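
The score trees above follow one pattern: each term score is queryWeight times fieldWeight, the term scores are summed, and the sum is multiplied by the coord factor. The Python sketch below recomputes the total for the first hit (doc 227) from the boosts, document frequencies, and term frequencies printed in its tree. It is an illustration, not Lucene code. The idf expression `1 + ln(maxDocs / (docFreq + 1))` is the one that reproduces the idf values shown in this listing; queryNorm and fieldNorm are copied from the listing rather than derived. The function and variable names are hypothetical.

```python
import math

MAX_DOCS = 43254
QUERY_NORM = 0.017109303  # copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency that matches the values in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (boost, docFreq, termFreq) for the six terms matched in doc 227.
doc_227_terms = {
    "coefficient": (1.1548073, 50, 1),
    "pairs": (2.0398302, 125, 1),
    "from": (2.073507, 7289, 1),
    "strings": (2.1867278, 76, 1),
    "words": (3.1940649, 555, 3),
    "character": (3.8871043, 173, 2),
}

FIELD_NORM_227 = 0.0625  # copied from the listing
coord = 6 / 25           # 6 of the 25 query terms matched

total = coord * sum(
    term_score(boost, doc_freq, freq, FIELD_NORM_227)
    for boost, doc_freq, freq in doc_227_terms.values()
)
print(round(total, 4))  # compare with 0.17712265 in the listing
```

Small differences in the last digits are expected, because the listing was produced with single-precision arithmetic and the sketch uses double precision.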