Document (#18399)

Author
Adamson, G.W.
Boreham, J.
Title
The use of an association measure based on character structure to identify semantically related pairs of words and document titles
Source
Information storage and retrieval. 10(1974), pp. 253-260
Year
1974
Abstract
An automatic classification technique has been developed, based on the character structure of words. Dice's similarity coefficient is computed from the number of matching digrams in pairs of character strings, and used to cluster sets of character strings. A sample of words from a chemical data base was chosen to contain certain stems derived from the names of chemical elements. They were successfully clustered into groups of semantically related words. Each cluster is characterised by the root word from which all its members are derived. A second example of titles from Mathematical Reviews was clustered into well-defined classes, which compare favourably with the subject groupings of Mathematical Reviews.
Theme
Automatic classification
Field
Chemistry
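
Illustration (not part of the catalogue record)
The abstract describes Dice's similarity coefficient computed over matching two-character substrings (digrams) of two strings. The sketch below shows one common way to compute it. The function names, the lower-casing step, and the choice to count repeated digrams as a multiset are assumptions made for illustration; the record does not say how the authors handled these details.

```python
from collections import Counter


def digrams(text):
    """Return a multiset of adjacent character pairs (assumed: case-insensitive)."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice(a, b):
    """Dice coefficient: 2 * shared digrams / (digrams in a + digrams in b)."""
    da, db = digrams(a), digrams(b)
    total = sum(da.values()) + sum(db.values())
    if total == 0:
        # Both strings are shorter than two characters: no digrams to compare.
        return 0.0
    shared = sum((da & db).values())  # multiset intersection
    return 2.0 * shared / total


if __name__ == "__main__":
    # Hypothetical word pairs, chosen only to exercise the function.
    for left, right in [("sulphate", "sulphide"), ("sulphate", "chloride")]:
        print(left, right, round(dice(left, right), 3))
```

Pairs whose coefficient exceeds a chosen threshold could then be passed to any clustering procedure; the threshold and the clustering method are left open here because the record does not specify them.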

Similar documents (content)

  1. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.18
    0.17676666 = sum of:
      0.17676666 = product of:
        0.7365278 = sum of:
          0.07470179 = weight(abstract_txt:coefficient in 5226) [ClassicSimilarity], result of:
            0.07470179 = score(doc=5226,freq=1.0), product of:
              0.15392391 = queryWeight, product of:
                1.1613474 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.017068645 = queryNorm
              0.48531634 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.10029298 = weight(abstract_txt:pairs in 5226) [ClassicSimilarity], result of:
            0.10029298 = score(doc=5226,freq=1.0), product of:
              0.23601654 = queryWeight, product of:
                2.0337396 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.017068645 = queryNorm
              0.42494047 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.016843218 = weight(abstract_txt:from in 5226) [ClassicSimilarity], result of:
            0.016843218 = score(doc=5226,freq=1.0), product of:
              0.09750468 = queryWeight, product of:
                2.06684 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017068645 = queryNorm
              0.17274266 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.12554026 = weight(abstract_txt:strings in 5226) [ClassicSimilarity], result of:
            0.12554026 = score(doc=5226,freq=1.0), product of:
              0.27412635 = queryWeight, product of:
                2.1917927 = boost
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.017068645 = queryNorm
              0.45796496 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.1695549 = weight(abstract_txt:words in 5226) [ClassicSimilarity], result of:
            0.1695549 = score(doc=5226,freq=3.0), product of:
              0.29259837 = queryWeight, product of:
                3.2023962 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.017068645 = queryNorm
              0.57948 = fieldWeight in 5226, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.24959469 = weight(abstract_txt:character in 5226) [ClassicSimilarity], result of:
            0.24959469 = score(doc=5226,freq=2.0), product of:
              0.43343002 = queryWeight, product of:
                3.8976119 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.017068645 = queryNorm
              0.57585925 = fieldWeight in 5226, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
        0.24 = coord(6/25)
    
  2. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.14
    0.14415152 = sum of:
      0.14415152 = product of:
        0.45047352 = sum of:
          0.017184583 = weight(abstract_txt:into in 337) [ClassicSimilarity], result of:
            0.017184583 = score(doc=337,freq=2.0), product of:
              0.07000632 = queryWeight, product of:
                1.1076249 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.017068645 = queryNorm
              0.24547188 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
          0.062035974 = weight(abstract_txt:groupings in 337) [ClassicSimilarity], result of:
            0.062035974 = score(doc=337,freq=1.0), product of:
              0.164743 = queryWeight, product of:
                1.2014691 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017068645 = queryNorm
              0.37656212 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
          0.017790483 = weight(abstract_txt:related in 337) [ClassicSimilarity], result of:
            0.017790483 = score(doc=337,freq=1.0), product of:
              0.09026369 = queryWeight, product of:
                1.2577103 = boost
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.017068645 = queryNorm
              0.19709457 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
          0.019808384 = weight(abstract_txt:structure in 337) [ClassicSimilarity], result of:
            0.019808384 = score(doc=337,freq=1.0), product of:
              0.09696625 = queryWeight, product of:
                1.3035702 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.017068645 = queryNorm
              0.20428121 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
          0.044861782 = weight(abstract_txt:titles in 337) [ClassicSimilarity], result of:
            0.044861782 = score(doc=337,freq=1.0), product of:
              0.16722669 = queryWeight, product of:
                1.7118942 = boost
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.017068645 = queryNorm
              0.26826927 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
          0.021879982 = weight(abstract_txt:from in 337) [ClassicSimilarity], result of:
            0.021879982 = score(doc=337,freq=3.0), product of:
              0.09750468 = queryWeight, product of:
                2.06684 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017068645 = queryNorm
              0.2243993 = fieldWeight in 337, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
          0.16308157 = weight(abstract_txt:strings in 337) [ClassicSimilarity], result of:
            0.16308157 = score(doc=337,freq=3.0), product of:
              0.27412635 = queryWeight, product of:
                2.1917927 = boost
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.017068645 = queryNorm
              0.5949139 = fieldWeight in 337, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
          0.10383075 = weight(abstract_txt:words in 337) [ClassicSimilarity], result of:
            0.10383075 = score(doc=337,freq=2.0), product of:
              0.29259837 = queryWeight, product of:
                3.2023962 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.017068645 = queryNorm
              0.35485756 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.046875 = fieldNorm(doc=337)
        0.32 = coord(8/25)
    
  3. Chen, T.T.: The congruity between linkage-based factors and content-based clusters : an experimental study using multiple document corpora (2016) 0.13
    0.13451022 = sum of:
      0.13451022 = product of:
        0.48039362 = sum of:
          0.10564428 = weight(abstract_txt:coefficient in 2775) [ClassicSimilarity], result of:
            0.10564428 = score(doc=2775,freq=2.0), product of:
              0.15392391 = queryWeight, product of:
                1.1613474 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.017068645 = queryNorm
              0.6863409 = fieldWeight in 2775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.07708397 = weight(abstract_txt:clustered in 2775) [ClassicSimilarity], result of:
            0.07708397 = score(doc=2775,freq=1.0), product of:
              0.1571791 = queryWeight, product of:
                1.1735632 = boost
                7.84674 = idf(docFreq=46, maxDocs=44218)
                0.017068645 = queryNorm
              0.49042124 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.84674 = idf(docFreq=46, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.023720644 = weight(abstract_txt:related in 2775) [ClassicSimilarity], result of:
            0.023720644 = score(doc=2775,freq=1.0), product of:
              0.09026369 = queryWeight, product of:
                1.2577103 = boost
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.017068645 = queryNorm
              0.26279277 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.045745503 = weight(abstract_txt:structure in 2775) [ClassicSimilarity], result of:
            0.045745503 = score(doc=2775,freq=3.0), product of:
              0.09696625 = queryWeight, product of:
                1.3035702 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.017068645 = queryNorm
              0.47176725 = fieldWeight in 2775, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.10486712 = weight(abstract_txt:derived in 2775) [ClassicSimilarity], result of:
            0.10486712 = score(doc=2775,freq=3.0), product of:
              0.1685833 = queryWeight, product of:
                1.7188239 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.017068645 = queryNorm
              0.6220493 = fieldWeight in 2775, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.08964569 = weight(abstract_txt:cluster in 2775) [ClassicSimilarity], result of:
            0.08964569 = score(doc=2775,freq=1.0), product of:
              0.21900214 = queryWeight, product of:
                1.9590625 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017068645 = queryNorm
              0.40933704 = fieldWeight in 2775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
          0.033686437 = weight(abstract_txt:from in 2775) [ClassicSimilarity], result of:
            0.033686437 = score(doc=2775,freq=4.0), product of:
              0.09750468 = queryWeight, product of:
                2.06684 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017068645 = queryNorm
              0.34548533 = fieldWeight in 2775, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=2775)
        0.28 = coord(7/25)
    
  4. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.12
    0.12144535 = sum of:
      0.12144535 = product of:
        0.75903344 = sum of:
          0.01620178 = weight(abstract_txt:into in 5206) [ClassicSimilarity], result of:
            0.01620178 = score(doc=5206,freq=1.0), product of:
              0.07000632 = queryWeight, product of:
                1.1076249 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.017068645 = queryNorm
              0.23143311 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.016843218 = weight(abstract_txt:from in 5206) [ClassicSimilarity], result of:
            0.016843218 = score(doc=5206,freq=1.0), product of:
              0.09750468 = queryWeight, product of:
                2.06684 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017068645 = queryNorm
              0.17274266 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.29367772 = weight(abstract_txt:words in 5206) [ClassicSimilarity], result of:
            0.29367772 = score(doc=5206,freq=9.0), product of:
              0.29259837 = queryWeight, product of:
                3.2023962 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.017068645 = queryNorm
              1.0036888 = fieldWeight in 5206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.4323107 = weight(abstract_txt:character in 5206) [ClassicSimilarity], result of:
            0.4323107 = score(doc=5206,freq=6.0), product of:
              0.43343002 = queryWeight, product of:
                3.8976119 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.017068645 = queryNorm
              0.9974175 = fieldWeight in 5206, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
        0.16 = coord(4/25)
    
  5. Beagle, D.: Visualizing keyword distribution across multidisciplinary c-space (2003) 0.11
    0.10765067 = sum of:
      0.10765067 = product of:
        0.33640835 = sum of:
          0.011456388 = weight(abstract_txt:into in 1202) [ClassicSimilarity], result of:
            0.011456388 = score(doc=1202,freq=2.0), product of:
              0.07000632 = queryWeight, product of:
                1.1076249 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.017068645 = queryNorm
              0.16364792 = fieldWeight in 1202, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
          0.03601747 = weight(abstract_txt:diagrams in 1202) [ClassicSimilarity], result of:
            0.03601747 = score(doc=1202,freq=1.0), product of:
              0.15023838 = queryWeight, product of:
                1.1473596 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.017068645 = queryNorm
              0.23973548 = fieldWeight in 1202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
          0.041357316 = weight(abstract_txt:groupings in 1202) [ClassicSimilarity], result of:
            0.041357316 = score(doc=1202,freq=1.0), product of:
              0.164743 = queryWeight, product of:
                1.2014691 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017068645 = queryNorm
              0.2510414 = fieldWeight in 1202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
          0.011860322 = weight(abstract_txt:related in 1202) [ClassicSimilarity], result of:
            0.011860322 = score(doc=1202,freq=1.0), product of:
              0.09026369 = queryWeight, product of:
                1.2577103 = boost
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.017068645 = queryNorm
              0.13139638 = fieldWeight in 1202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2046843 = idf(docFreq=1793, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
          0.07912874 = weight(abstract_txt:titles in 1202) [ClassicSimilarity], result of:
            0.07912874 = score(doc=1202,freq=7.0), product of:
              0.16722669 = queryWeight, product of:
                1.7118942 = boost
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.017068645 = queryNorm
              0.4731825 = fieldWeight in 1202, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
          0.06338907 = weight(abstract_txt:cluster in 1202) [ClassicSimilarity], result of:
            0.06338907 = score(doc=1202,freq=2.0), product of:
              0.21900214 = queryWeight, product of:
                1.9590625 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017068645 = queryNorm
              0.28944498 = fieldWeight in 1202, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
          0.008421609 = weight(abstract_txt:from in 1202) [ClassicSimilarity], result of:
            0.008421609 = score(doc=1202,freq=1.0), product of:
              0.09750468 = queryWeight, product of:
                2.06684 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.017068645 = queryNorm
              0.08637133 = fieldWeight in 1202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
          0.08477745 = weight(abstract_txt:words in 1202) [ClassicSimilarity], result of:
            0.08477745 = score(doc=1202,freq=3.0), product of:
              0.29259837 = queryWeight, product of:
                3.2023962 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.017068645 = queryNorm
              0.28974 = fieldWeight in 1202, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.03125 = fieldNorm(doc=1202)
        0.32 = coord(8/25)
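
Illustration (not part of the search output)
The score breakdowns above follow a TF-IDF pattern: each term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor. The sketch below recomputes the first term of result 1 from the values shown. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed here because they reproduce the listed numbers; the function names are illustrative, and boost, queryNorm and fieldNorm are taken from the listing as given rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed form; it matches e.g. 7.7650614 for docFreq=50, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Assumed form; it matches e.g. 1.7320508 for freq=3.0.
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """One 'weight(...)' line: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


if __name__ == "__main__":
    # Values copied from the 'coefficient' term of result 1 (doc 5226).
    coefficient = term_score(
        freq=1.0, doc_freq=50, max_docs=44218,
        boost=1.1613474, query_norm=0.017068645, field_norm=0.0625,
    )
    print(coefficient)

    # Final score of result 1: coord(6/25) times the sum of its six term scores.
    print((6 / 25) * 0.7365278)
```

Small differences in the last digits are to be expected, since the listing appears to use single-precision arithmetic while Python uses double precision.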