Document (#19813)

Author
Cheng, K.-H.
Title
Automatic identification for topics of electronic documents
Source
Bulletin of the Library Association of China. 1997, no.59, Dec., S.43-58
Year
1997
Abstract
With the rapid rise in the number of electronic documents on the Internet, how to assign topics to documents effectively has become an important issue. Current research in this area focuses on the behaviour of nouns in documents. Proposes, however, that nouns and verbs together contribute to the process of topic identification. Constructs a mathematical model taking into account the following factors: word importance, word frequency, word co-occurrence, and word distance. Preliminary experiments show that the performance of the proposed model is equivalent to that of a human being.
Footnote
[In Chinese]
Theme
Automatic indexing
Internet
Computational linguistics

Similar documents (author)

  1. Cheng, L.R.L.: Beyond bilingualism : a quest for communicative competence (1996) 5.36
    5.364876 = sum of:
      5.364876 = weight(author_txt:cheng in 6292) [ClassicSimilarity], result of:
        5.364876 = score(doc=6292,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.116498485 = queryNorm
          5.3648763 = fieldWeight in 6292, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.625 = fieldNorm(doc=6292)
    
  2. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 4.29
    4.2919006 = sum of:
      4.2919006 = weight(author_txt:cheng in 3257) [ClassicSimilarity], result of:
        4.2919006 = score(doc=3257,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.116498485 = queryNorm
          4.291901 = fieldWeight in 3257, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.5 = fieldNorm(doc=3257)
    
  3. Cheng, L.-y.: On bibliographic(al) control (1998) 4.29
    4.2919006 = sum of:
      4.2919006 = weight(author_txt:cheng in 5377) [ClassicSimilarity], result of:
        4.2919006 = score(doc=5377,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.116498485 = queryNorm
          4.291901 = fieldWeight in 5377, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.5 = fieldNorm(doc=5377)
    
  4. Harter, S.P.; Cheng, Y.-R.: Colinked descriptors : improving vocabulary selection for end-user searching (1996) 3.76
    3.7554133 = sum of:
      3.7554133 = weight(author_txt:cheng in 5285) [ClassicSimilarity], result of:
        3.7554133 = score(doc=5285,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.116498485 = queryNorm
          3.7554135 = fieldWeight in 5285, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.4375 = fieldNorm(doc=5285)
    
  5. Cheng, W.-N.; Khoo, C.S.G.: Information and argument structures in Sociology research abstracts (2018) 3.76
    3.7554133 = sum of:
      3.7554133 = weight(author_txt:cheng in 751) [ClassicSimilarity], result of:
        3.7554133 = score(doc=751,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.116498485 = queryNorm
          3.7554135 = fieldWeight in 751, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.583802 = idf(docFreq=21, maxDocs=43254)
            0.4375 = fieldNorm(doc=751)
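The author-similarity scores above come from a single-term query on `author_txt:cheng`. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of squared query weights)), that recomputes the listed values. fieldNorm is copied from the explain output rather than derived, because Lucene stores it in a lossy one-byte encoding. The function names are illustrative and are not part of the Lucene API.

```python
import math

# Values reported in the explain output for author_txt:cheng.
MAX_DOCS = 43254
DOC_FREQ = 21


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def single_term_score(freq: float, field_norm: float,
                      doc_freq: int = DOC_FREQ, max_docs: int = MAX_DOCS) -> float:
    """Score of one unboosted query term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    # queryNorm = 1 / sqrt(sum of squared query weights); with a single
    # unboosted term this is 1 / idf, so queryWeight comes out as 1.0.
    query_norm = 1.0 / math.sqrt(idf ** 2)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


# fieldNorm values copied from the five author matches above.
for norm in (0.625, 0.5, 0.5, 0.4375, 0.4375):
    print(f"fieldNorm={norm}: score={single_term_score(1.0, norm):.6f}")
```

For a one-term query, queryNorm cancels the idf inside queryWeight (hence the 0.99999994 in the output), so the ranking of these five entries is decided by fieldNorm alone, which ClassicSimilarity derives mainly from field length.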
    

Similar documents (content)

  1. WordNet : an electronic lexical database (language, speech and communication) (1998) 0.20
    0.19666027 = sum of:
      0.19666027 = product of:
        0.9833013 = sum of:
          0.25622997 = weight(abstract_txt:verbs in 4435) [ClassicSimilarity], result of:
            0.25622997 = score(doc=4435,freq=2.0), product of:
              0.2354223 = queryWeight, product of:
                1.508907 = boost
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.019005928 = queryNorm
              1.0883844 = fieldWeight in 4435, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.09375 = fieldNorm(doc=4435)
          0.050648753 = weight(abstract_txt:electronic in 4435) [ClassicSimilarity], result of:
            0.050648753 = score(doc=4435,freq=1.0), product of:
              0.12681194 = queryWeight, product of:
                1.5661514 = boost
                4.260272 = idf(docFreq=1659, maxDocs=43254)
                0.019005928 = queryNorm
              0.3994005 = fieldWeight in 4435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.260272 = idf(docFreq=1659, maxDocs=43254)
                0.09375 = fieldNorm(doc=4435)
          0.1318586 = weight(abstract_txt:identification in 4435) [ClassicSimilarity], result of:
            0.1318586 = score(doc=4435,freq=1.0), product of:
              0.2399864 = queryWeight, product of:
                2.1545024 = boost
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.019005928 = queryNorm
              0.549442 = fieldWeight in 4435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.09375 = fieldNorm(doc=4435)
          0.33361006 = weight(abstract_txt:nouns in 4435) [ClassicSimilarity], result of:
            0.33361006 = score(doc=4435,freq=1.0), product of:
              0.44559512 = queryWeight, product of:
                2.9357824 = boost
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.019005928 = queryNorm
              0.7486843 = fieldWeight in 4435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.09375 = fieldNorm(doc=4435)
          0.21095389 = weight(abstract_txt:word in 4435) [ClassicSimilarity], result of:
            0.21095389 = score(doc=4435,freq=1.0), product of:
              0.41360202 = queryWeight, product of:
                4.0 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.019005928 = queryNorm
              0.51004076 = fieldWeight in 4435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.09375 = fieldNorm(doc=4435)
        0.2 = coord(5/25)
    
  2. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.17
    0.17086634 = sum of:
      0.17086634 = product of:
        0.61023694 = sum of:
          0.048733097 = weight(abstract_txt:frequency in 189) [ClassicSimilarity], result of:
            0.048733097 = score(doc=189,freq=2.0), product of:
              0.12359389 = queryWeight, product of:
                1.0932945 = boost
                5.947997 = idf(docFreq=306, maxDocs=43254)
                0.019005928 = queryNorm
              0.39430022 = fieldWeight in 189, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.947997 = idf(docFreq=306, maxDocs=43254)
                0.046875 = fieldNorm(doc=189)
          0.03492046 = weight(abstract_txt:taking in 189) [ClassicSimilarity], result of:
            0.03492046 = score(doc=189,freq=1.0), product of:
              0.12469364 = queryWeight, product of:
                1.0981479 = boost
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.019005928 = queryNorm
              0.28005007 = fieldWeight in 189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.046875 = fieldNorm(doc=189)
          0.014910616 = weight(abstract_txt:that in 189) [ClassicSimilarity], result of:
            0.014910616 = score(doc=189,freq=5.0), product of:
              0.05963554 = queryWeight, product of:
                1.3153819 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.019005928 = queryNorm
              0.25002903 = fieldWeight in 189, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.046875 = fieldNorm(doc=189)
          0.06451987 = weight(abstract_txt:assign in 189) [ClassicSimilarity], result of:
            0.06451987 = score(doc=189,freq=1.0), product of:
              0.18775289 = queryWeight, product of:
                1.34751 = boost
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.019005928 = queryNorm
              0.34364247 = fieldWeight in 189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.046875 = fieldNorm(doc=189)
          0.0659293 = weight(abstract_txt:identification in 189) [ClassicSimilarity], result of:
            0.0659293 = score(doc=189,freq=1.0), product of:
              0.2399864 = queryWeight, product of:
                2.1545024 = boost
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.019005928 = queryNorm
              0.274721 = fieldWeight in 189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.046875 = fieldNorm(doc=189)
          0.10215785 = weight(abstract_txt:documents in 189) [ClassicSimilarity], result of:
            0.10215785 = score(doc=189,freq=5.0), product of:
              0.2367748 = queryWeight, product of:
                3.0264702 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.019005928 = queryNorm
              0.43145576 = fieldWeight in 189, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.046875 = fieldNorm(doc=189)
          0.27906576 = weight(abstract_txt:word in 189) [ClassicSimilarity], result of:
            0.27906576 = score(doc=189,freq=7.0), product of:
              0.41360202 = queryWeight, product of:
                4.0 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.019005928 = queryNorm
              0.67472047 = fieldWeight in 189, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.046875 = fieldNorm(doc=189)
        0.28 = coord(7/25)
    
  3. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.15
    0.14607751 = sum of:
      0.14607751 = product of:
        0.73038757 = sum of:
          0.01571717 = weight(abstract_txt:that in 2108) [ClassicSimilarity], result of:
            0.01571717 = score(doc=2108,freq=2.0), product of:
              0.05963554 = queryWeight, product of:
                1.3153819 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.019005928 = queryNorm
              0.26355374 = fieldWeight in 2108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.078125 = fieldNorm(doc=2108)
          0.15098496 = weight(abstract_txt:verbs in 2108) [ClassicSimilarity], result of:
            0.15098496 = score(doc=2108,freq=1.0), product of:
              0.2354223 = queryWeight, product of:
                1.508907 = boost
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.019005928 = queryNorm
              0.6413367 = fieldWeight in 2108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.078125 = fieldNorm(doc=2108)
          0.10988217 = weight(abstract_txt:identification in 2108) [ClassicSimilarity], result of:
            0.10988217 = score(doc=2108,freq=1.0), product of:
              0.2399864 = queryWeight, product of:
                2.1545024 = boost
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.019005928 = queryNorm
              0.4578683 = fieldWeight in 2108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.078125 = fieldNorm(doc=2108)
          0.27800837 = weight(abstract_txt:nouns in 2108) [ClassicSimilarity], result of:
            0.27800837 = score(doc=2108,freq=1.0), product of:
              0.44559512 = queryWeight, product of:
                2.9357824 = boost
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.019005928 = queryNorm
              0.6239036 = fieldWeight in 2108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.078125 = fieldNorm(doc=2108)
          0.1757949 = weight(abstract_txt:word in 2108) [ClassicSimilarity], result of:
            0.1757949 = score(doc=2108,freq=1.0), product of:
              0.41360202 = queryWeight, product of:
                4.0 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.019005928 = queryNorm
              0.42503393 = fieldWeight in 2108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.078125 = fieldNorm(doc=2108)
        0.2 = coord(5/25)
    
  4. Green, R.: WordNet (2009) 0.13
    0.13223396 = sum of:
      0.13223396 = product of:
        0.82646227 = sum of:
          0.013336461 = weight(abstract_txt:that in 1161) [ClassicSimilarity], result of:
            0.013336461 = score(doc=1161,freq=1.0), product of:
              0.05963554 = queryWeight, product of:
                1.3153819 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.019005928 = queryNorm
              0.22363278 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.09375 = fieldNorm(doc=1161)
          0.18118194 = weight(abstract_txt:verbs in 1161) [ClassicSimilarity], result of:
            0.18118194 = score(doc=1161,freq=1.0), product of:
              0.2354223 = queryWeight, product of:
                1.508907 = boost
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.019005928 = queryNorm
              0.76960397 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.09375 = fieldNorm(doc=1161)
          0.33361006 = weight(abstract_txt:nouns in 1161) [ClassicSimilarity], result of:
            0.33361006 = score(doc=1161,freq=1.0), product of:
              0.44559512 = queryWeight, product of:
                2.9357824 = boost
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.019005928 = queryNorm
              0.7486843 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.09375 = fieldNorm(doc=1161)
          0.29833382 = weight(abstract_txt:word in 1161) [ClassicSimilarity], result of:
            0.29833382 = score(doc=1161,freq=2.0), product of:
              0.41360202 = queryWeight, product of:
                4.0 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.019005928 = queryNorm
              0.7213065 = fieldWeight in 1161, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.09375 = fieldNorm(doc=1161)
        0.16 = coord(4/25)
    
  5. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.13
    0.12714408 = sum of:
      0.12714408 = product of:
        0.63572043 = sum of:
          0.055450916 = weight(abstract_txt:numbers in 5268) [ClassicSimilarity], result of:
            0.055450916 = score(doc=5268,freq=1.0), product of:
              0.12073438 = queryWeight, product of:
                1.0805731 = boost
                5.878787 = idf(docFreq=328, maxDocs=43254)
                0.019005928 = queryNorm
              0.45928025 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.878787 = idf(docFreq=328, maxDocs=43254)
                0.078125 = fieldNorm(doc=5268)
          0.011113717 = weight(abstract_txt:that in 5268) [ClassicSimilarity], result of:
            0.011113717 = score(doc=5268,freq=1.0), product of:
              0.05963554 = queryWeight, product of:
                1.3153819 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.019005928 = queryNorm
              0.18636064 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.078125 = fieldNorm(doc=5268)
          0.10988217 = weight(abstract_txt:identification in 5268) [ClassicSimilarity], result of:
            0.10988217 = score(doc=5268,freq=1.0), product of:
              0.2399864 = queryWeight, product of:
                2.1545024 = boost
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.019005928 = queryNorm
              0.4578683 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8607144 = idf(docFreq=334, maxDocs=43254)
                0.078125 = fieldNorm(doc=5268)
          0.10768384 = weight(abstract_txt:documents in 5268) [ClassicSimilarity], result of:
            0.10768384 = score(doc=5268,freq=2.0), product of:
              0.2367748 = queryWeight, product of:
                3.0264702 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.019005928 = queryNorm
              0.45479432 = fieldWeight in 5268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.078125 = fieldNorm(doc=5268)
          0.3515898 = weight(abstract_txt:word in 5268) [ClassicSimilarity], result of:
            0.3515898 = score(doc=5268,freq=4.0), product of:
              0.41360202 = queryWeight, product of:
                4.0 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.019005928 = queryNorm
              0.85006785 = fieldWeight in 5268, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.078125 = fieldNorm(doc=5268)
        0.2 = coord(5/25)
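The content-similarity scores combine several matching query terms. The following is a minimal sketch, assuming the same ClassicSimilarity formulas plus the coord factor (matching query clauses divided by all query clauses, 25 here), that recomputes the first entry (WordNet, doc 4435) from the values in its explain tree. queryNorm is copied from the output rather than recomputed, since it depends on all 25 query terms and only the matching ones are shown. The per-term boosts are likewise taken as given. Function and variable names are illustrative.

```python
import math

MAX_DOCS = 43254
# Copied from the explain output; it depends on all 25 query terms.
QUERY_NORM = 0.019005928


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def clause_score(boost: float, doc_freq: int, freq: float, field_norm: float,
                 query_norm: float = QUERY_NORM) -> float:
    """One term clause: (boost * idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(clauses: list[tuple[float, int, float, float]],
                   total_query_terms: int) -> float:
    """Sum of matching clause scores, scaled by coord = matched / total."""
    coord = len(clauses) / total_query_terms
    return sum(clause_score(*c) for c in clauses) * coord


# (boost, docFreq, freq, fieldNorm) for doc 4435, copied from its explain tree.
wordnet_clauses = [
    (1.508907, 31, 2.0, 0.09375),    # verbs
    (1.5661514, 1659, 1.0, 0.09375), # electronic
    (2.1545024, 334, 1.0, 0.09375),  # identification
    (2.9357824, 39, 1.0, 0.09375),   # nouns
    (4.0, 509, 1.0, 0.09375),        # word
]

print(f"{document_score(wordnet_clauses, 25):.6f}")
```

The coord factor penalizes documents that match few of the query terms: Kim and Wilbur's paper matches 7 of 25 (coord 0.28), while Green's WordNet entry matches only 4 (coord 0.16).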