Document (#19812)

Author
Cheng, K.-H.
Title
Automatic identification for topics of electronic documents
Source
Bulletin of the Library Association of China. 1997, no.59, Dec., p.43-58
Year
1997
Abstract
With the rapid rise in the number of electronic documents on the Internet, how to assign topics to documents effectively has become an important issue. Current research in this area focuses on the behaviour of nouns in documents. Proposes, however, that nouns and verbs together contribute to the process of topic identification. Constructs a mathematical model taking into account the following factors: word importance, word frequency, word co-occurrence, and word distance. Preliminary experiments show that the performance of the proposed model is equivalent to that of a human being.
Footnote
[In Chinese]
Theme
Automatic indexing
Internet
Computational linguistics

Similar documents (author)

  1. Cheng, L.R.L.: Beyond bilingualism : a quest for communicative competence (1996) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:cheng in 5223) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 5223, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=5223)
    
  2. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 4.16
    4.164796 = sum of:
      4.164796 = weight(author_txt:cheng in 2188) [ClassicSimilarity], result of:
        4.164796 = fieldWeight in 2188, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.5 = fieldNorm(doc=2188)
    
  3. Cheng, L.-y.: On bibliographic(al) control (1998) 4.16
    4.164796 = sum of:
      4.164796 = weight(author_txt:cheng in 3376) [ClassicSimilarity], result of:
        4.164796 = fieldWeight in 3376, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.5 = fieldNorm(doc=3376)
    
  4. Harter, S.P.; Cheng, Y.-R.: Colinked descriptors : improving vocabulary selection for end-user searching (1996) 3.64
    3.6441965 = sum of:
      3.6441965 = weight(author_txt:cheng in 4216) [ClassicSimilarity], result of:
        3.6441965 = fieldWeight in 4216, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.4375 = fieldNorm(doc=4216)
    
  5. Cheng, W.-N.; Khoo, C.S.G.: Information and argument structures in Sociology research abstracts (2018) 3.64
    3.6441965 = sum of:
      3.6441965 = weight(author_txt:cheng in 4750) [ClassicSimilarity], result of:
        3.6441965 = fieldWeight in 4750, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.4375 = fieldNorm(doc=4750)
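
The author scores above all follow the same pattern: fieldWeight = tf * idf * fieldNorm. The sketch below reproduces that arithmetic. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are not stated in the listing itself but agree with its figures to the printed precision. The function names are illustrative only, and fieldNorm is taken as given from the listing rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed tf formula: square root of the raw term frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: the term "cheng" occurs once in author_txt,
# docFreq=28, maxDocs=44218; only fieldNorm differs between the hits.
for doc_id, norm in [(5223, 0.625), (2188, 0.5), (3376, 0.5), (4216, 0.4375), (4750, 0.4375)]:
    print(doc_id, field_weight(1.0, 28, 44218, norm))
```

Because tf and idf are identical for all five hits, the ranking is decided by fieldNorm alone, which is why records with shorter author fields score higher here.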
    

Similar documents (content)

  1. WordNet : an electronic lexical database (language, speech and communication) (1998) 0.20
    0.19741756 = sum of:
      0.19741756 = product of:
        0.9870878 = sum of:
          0.2582584 = weight(abstract_txt:verbs in 2434) [ClassicSimilarity], result of:
            0.2582584 = score(doc=2434,freq=2.0), product of:
              0.2366506 = queryWeight, product of:
                1.5164827 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.018958744 = queryNorm
              1.0913068 = fieldWeight in 2434, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.09375 = fieldNorm(doc=2434)
          0.05140911 = weight(abstract_txt:electronic in 2434) [ClassicSimilarity], result of:
            0.05140911 = score(doc=2434,freq=1.0), product of:
              0.12807116 = queryWeight, product of:
                1.5776997 = boost
                4.281712 = idf(docFreq=1660, maxDocs=44218)
                0.018958744 = queryNorm
              0.40141052 = fieldWeight in 2434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.281712 = idf(docFreq=1660, maxDocs=44218)
                0.09375 = fieldNorm(doc=2434)
          0.1307588 = weight(abstract_txt:identification in 2434) [ClassicSimilarity], result of:
            0.1307588 = score(doc=2434,freq=1.0), product of:
              0.23863745 = queryWeight, product of:
                2.1536145 = boost
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.018958744 = queryNorm
              0.5479392 = fieldWeight in 2434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.09375 = fieldNorm(doc=2434)
          0.3363266 = weight(abstract_txt:nouns in 2434) [ClassicSimilarity], result of:
            0.3363266 = score(doc=2434,freq=1.0), product of:
              0.44798702 = queryWeight, product of:
                2.950743 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.018958744 = queryNorm
              0.7507508 = fieldWeight in 2434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.09375 = fieldNorm(doc=2434)
          0.21033484 = weight(abstract_txt:word in 2434) [ClassicSimilarity], result of:
            0.21033484 = score(doc=2434,freq=1.0), product of:
              0.41277063 = queryWeight, product of:
                4.0056043 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018958744 = queryNorm
              0.50956833 = fieldWeight in 2434, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.09375 = fieldNorm(doc=2434)
        0.2 = coord(5/25)
    
  2. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.17
    0.17031598 = sum of:
      0.17031598 = product of:
        0.60827136 = sum of:
          0.048712987 = weight(abstract_txt:frequency in 5188) [ClassicSimilarity], result of:
            0.048712987 = score(doc=5188,freq=2.0), product of:
              0.12355333 = queryWeight, product of:
                1.0957485 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.018958744 = queryNorm
              0.3942669 = fieldWeight in 5188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.03472493 = weight(abstract_txt:taking in 5188) [ClassicSimilarity], result of:
            0.03472493 = score(doc=5188,freq=1.0), product of:
              0.12422115 = queryWeight, product of:
                1.0987059 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.018958744 = queryNorm
              0.2795412 = fieldWeight in 5188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.014611353 = weight(abstract_txt:that in 5188) [ClassicSimilarity], result of:
            0.014611353 = score(doc=5188,freq=5.0), product of:
              0.05883178 = queryWeight, product of:
                1.309635 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018958744 = queryNorm
              0.24835816 = fieldWeight in 5188, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.06408346 = weight(abstract_txt:assign in 5188) [ClassicSimilarity], result of:
            0.06408346 = score(doc=5188,freq=1.0), product of:
              0.18689539 = queryWeight, product of:
                1.347668 = boost
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.018958744 = queryNorm
              0.3428841 = fieldWeight in 5188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.0653794 = weight(abstract_txt:identification in 5188) [ClassicSimilarity], result of:
            0.0653794 = score(doc=5188,freq=1.0), product of:
              0.23863745 = queryWeight, product of:
                2.1536145 = boost
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.018958744 = queryNorm
              0.2739696 = fieldWeight in 5188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.10251236 = weight(abstract_txt:documents in 5188) [ClassicSimilarity], result of:
            0.10251236 = score(doc=5188,freq=5.0), product of:
              0.23730966 = queryWeight, product of:
                3.037186 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018958744 = queryNorm
              0.43197718 = fieldWeight in 5188, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.27824682 = weight(abstract_txt:word in 5188) [ClassicSimilarity], result of:
            0.27824682 = score(doc=5188,freq=7.0), product of:
              0.41277063 = queryWeight, product of:
                4.0056043 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018958744 = queryNorm
              0.6740955 = fieldWeight in 5188, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
        0.28 = coord(7/25)
    
  3. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.15
    0.14641976 = sum of:
      0.14641976 = product of:
        0.7320988 = sum of:
          0.015401718 = weight(abstract_txt:that in 643) [ClassicSimilarity], result of:
            0.015401718 = score(doc=643,freq=2.0), product of:
              0.05883178 = queryWeight, product of:
                1.309635 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018958744 = queryNorm
              0.26179248 = fieldWeight in 643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.15218022 = weight(abstract_txt:verbs in 643) [ClassicSimilarity], result of:
            0.15218022 = score(doc=643,freq=1.0), product of:
              0.2366506 = queryWeight, product of:
                1.5164827 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.018958744 = queryNorm
              0.6430587 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.10896567 = weight(abstract_txt:identification in 643) [ClassicSimilarity], result of:
            0.10896567 = score(doc=643,freq=1.0), product of:
              0.23863745 = queryWeight, product of:
                2.1536145 = boost
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.018958744 = queryNorm
              0.45661598 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.28027216 = weight(abstract_txt:nouns in 643) [ClassicSimilarity], result of:
            0.28027216 = score(doc=643,freq=1.0), product of:
              0.44798702 = queryWeight, product of:
                2.950743 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.018958744 = queryNorm
              0.6256256 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.17527904 = weight(abstract_txt:word in 643) [ClassicSimilarity], result of:
            0.17527904 = score(doc=643,freq=1.0), product of:
              0.41277063 = queryWeight, product of:
                4.0056043 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018958744 = queryNorm
              0.4246403 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
        0.2 = coord(5/25)
    
  4. Green, R.: WordNet (2009) 0.13
    0.13271521 = sum of:
      0.13271521 = product of:
        0.82947004 = sum of:
          0.013068791 = weight(abstract_txt:that in 4696) [ClassicSimilarity], result of:
            0.013068791 = score(doc=4696,freq=1.0), product of:
              0.05883178 = queryWeight, product of:
                1.309635 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018958744 = queryNorm
              0.22213829 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
          0.18261628 = weight(abstract_txt:verbs in 4696) [ClassicSimilarity], result of:
            0.18261628 = score(doc=4696,freq=1.0), product of:
              0.2366506 = queryWeight, product of:
                1.5164827 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.018958744 = queryNorm
              0.77167046 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
          0.3363266 = weight(abstract_txt:nouns in 4696) [ClassicSimilarity], result of:
            0.3363266 = score(doc=4696,freq=1.0), product of:
              0.44798702 = queryWeight, product of:
                2.950743 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.018958744 = queryNorm
              0.7507508 = fieldWeight in 4696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
          0.29745838 = weight(abstract_txt:word in 4696) [ClassicSimilarity], result of:
            0.29745838 = score(doc=4696,freq=2.0), product of:
              0.41277063 = queryWeight, product of:
                4.0056043 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018958744 = queryNorm
              0.72063845 = fieldWeight in 4696, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.09375 = fieldNorm(doc=4696)
        0.16 = coord(4/25)
    
  5. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.13
    0.12680529 = sum of:
      0.12680529 = product of:
        0.6340264 = sum of:
          0.055554487 = weight(abstract_txt:numbers in 4199) [ClassicSimilarity], result of:
            0.055554487 = score(doc=4199,freq=1.0), product of:
              0.120878264 = queryWeight, product of:
                1.0838215 = boost
                5.8827567 = idf(docFreq=334, maxDocs=44218)
                0.018958744 = queryNorm
              0.45959038 = fieldWeight in 4199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8827567 = idf(docFreq=334, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.01089066 = weight(abstract_txt:that in 4199) [ClassicSimilarity], result of:
            0.01089066 = score(doc=4199,freq=1.0), product of:
              0.05883178 = queryWeight, product of:
                1.309635 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018958744 = queryNorm
              0.18511525 = fieldWeight in 4199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.10896567 = weight(abstract_txt:identification in 4199) [ClassicSimilarity], result of:
            0.10896567 = score(doc=4199,freq=1.0), product of:
              0.23863745 = queryWeight, product of:
                2.1536145 = boost
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.018958744 = queryNorm
              0.45661598 = fieldWeight in 4199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.10805751 = weight(abstract_txt:documents in 4199) [ClassicSimilarity], result of:
            0.10805751 = score(doc=4199,freq=2.0), product of:
              0.23730966 = queryWeight, product of:
                3.037186 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018958744 = queryNorm
              0.4553439 = fieldWeight in 4199, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.35055807 = weight(abstract_txt:word in 4199) [ClassicSimilarity], result of:
            0.35055807 = score(doc=4199,freq=4.0), product of:
              0.41277063 = queryWeight, product of:
                4.0056043 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018958744 = queryNorm
              0.8492806 = fieldWeight in 4199, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
        0.2 = coord(5/25)
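
The content scores above combine, for each matching term, a queryWeight (boost * idf * queryNorm) with a fieldWeight (tf * idf * fieldNorm). The sum over matching terms is then scaled by coord, the fraction of the 25 query terms that matched. The following sketch recomputes two of the document scores from the figures in the listing. As before, the tf and idf formulas are assumptions that happen to agree with the printed values, and all function and variable names are illustrative.

```python
import math

QUERY_NORM = 0.018958744  # queryNorm as printed in the listing
MAX_DOCS = 44218          # maxDocs as printed in the listing
QUERY_TERMS = 25          # denominator of coord(n/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # assumed tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, field_norm: float) -> float:
    # terms: (boost, freq, docFreq) for each query term found in the document.
    total = sum(term_score(b, f, df, field_norm) for b, f, df in terms)
    coord = len(terms) / QUERY_TERMS
    return total * coord


# doc=2434 (WordNet, 1998): verbs, electronic, identification, nouns, word
doc_2434 = [
    (1.5164827, 2.0, 31),
    (1.5776997, 1.0, 1660),
    (2.1536145, 1.0, 347),
    (2.950743, 1.0, 39),
    (4.0056043, 1.0, 523),
]

# doc=4696 (Green, 2009): that, verbs, nouns, word
doc_4696 = [
    (1.309635, 1.0, 11241),
    (1.5164827, 1.0, 31),
    (2.950743, 1.0, 39),
    (4.0056043, 2.0, 523),
]

print(document_score(doc_2434, 0.09375))
print(document_score(doc_4696, 0.09375))
```

The results should agree with the listed totals for these two records to about four decimal places; small differences are expected because the listing appears to use single-precision arithmetic.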