Document (#19813)

Author
Cheng, K.-H.
Title
Automatic identification for topics of electronic documents
Source
Bulletin of the Library Association of China. 1997, no.59, Dec., p.43-58
Year
1997
Abstract
With the rapid rise in the number of electronic documents on the Internet, how to assign topics to documents effectively has become an important issue. Current research in this area focuses on the behaviour of nouns in documents. Proposes, however, that nouns and verbs together contribute to the process of topic identification. Constructs a mathematical model taking into account the following factors: word importance, word frequency, word co-occurrence, and word distance. Preliminary experiments show that the performance of the proposed model is equivalent to that of a human being.
Footnote
[In Chinese]
Theme
Automatic indexing
Internet
Computational linguistics

Similar documents (author)

  1. Cheng, L.R.L.: Beyond bilingualism : a quest for communicative competence (1996) 5.38
    5.380101 = sum of:
      5.380101 = weight(author_txt:cheng in 5292) [ClassicSimilarity], result of:
        5.380101 = fieldWeight in 5292, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.608162 = idf(docFreq=20, maxDocs=42306)
          0.625 = fieldNorm(doc=5292)
    
  2. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 4.30
    4.304081 = sum of:
      4.304081 = weight(author_txt:cheng in 2257) [ClassicSimilarity], result of:
        4.304081 = fieldWeight in 2257, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.608162 = idf(docFreq=20, maxDocs=42306)
          0.5 = fieldNorm(doc=2257)
    
  3. Cheng, L.-y.: On bibliographic(al) control (1998) 4.30
    4.304081 = sum of:
      4.304081 = weight(author_txt:cheng in 4377) [ClassicSimilarity], result of:
        4.304081 = fieldWeight in 4377, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.608162 = idf(docFreq=20, maxDocs=42306)
          0.5 = fieldNorm(doc=4377)
    
  4. Harter, S.P.; Cheng, Y.-R.: Colinked descriptors : improving vocabulary selection for end-user searching (1996) 3.77
    3.7660708 = sum of:
      3.7660708 = weight(author_txt:cheng in 4285) [ClassicSimilarity], result of:
        3.7660708 = fieldWeight in 4285, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.608162 = idf(docFreq=20, maxDocs=42306)
          0.4375 = fieldNorm(doc=4285)
    
  5. Cheng, W.-N.; Khoo, C.S.G.: Information and argument structures in Sociology research abstracts (2018) 3.77
    3.7660708 = sum of:
      3.7660708 = weight(author_txt:cheng in 1669) [ClassicSimilarity], result of:
        3.7660708 = fieldWeight in 1669, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.608162 = idf(docFreq=20, maxDocs=42306)
          0.4375 = fieldNorm(doc=1669)
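
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the listing uses the ClassicSimilarity variant in which idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq); that assumption is checked here only against the numbers printed in this record. The function names are illustrative and not part of any library.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor, assumed form: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: docFreq=20, maxDocs=42306, termFreq=1.0.
idf_cheng = classic_idf(20, 42306)
score_first = field_weight(1.0, 20, 42306, 0.625)    # listing shows 5.380101
score_fourth = field_weight(1.0, 20, 42306, 0.4375)  # listing shows 3.7660708

print(idf_cheng, score_first, score_fourth)
```

Since tf and idf are identical for all five hits, the ranking in this list is decided entirely by fieldNorm, which is larger for shorter author fields.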
    

Similar documents (content)

  1. WordNet : an electronic lexical database (language, speech and communication) (1998) 0.20
    0.19586267 = sum of:
      0.19586267 = product of:
        0.9793133 = sum of:
          0.25371003 = weight(abstract_txt:verbs in 3435) [ClassicSimilarity], result of:
            0.25371003 = score(doc=3435,freq=2.0), product of:
              0.23373803 = queryWeight, product of:
                1.4976648 = boost
                8.186948 = idf(docFreq=31, maxDocs=42306)
                0.019063065 = queryNorm
              1.0854461 = fieldWeight in 3435, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.186948 = idf(docFreq=31, maxDocs=42306)
                0.09375 = fieldNorm(doc=3435)
          0.049859174 = weight(abstract_txt:electronic in 3435) [ClassicSimilarity], result of:
            0.049859174 = score(doc=3435,freq=1.0), product of:
              0.1254164 = queryWeight, product of:
                1.5514654 = boost
                4.240524 = idf(docFreq=1655, maxDocs=42306)
                0.019063065 = queryNorm
              0.3975491 = fieldWeight in 3435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.240524 = idf(docFreq=1655, maxDocs=42306)
                0.09375 = fieldNorm(doc=3435)
          0.13259198 = weight(abstract_txt:identification in 3435) [ClassicSimilarity], result of:
            0.13259198 = score(doc=3435,freq=1.0), product of:
              0.24073307 = queryWeight, product of:
                2.149477 = boost
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.019063065 = queryNorm
              0.55078423 = fieldWeight in 3435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.09375 = fieldNorm(doc=3435)
          0.33025426 = weight(abstract_txt:nouns in 3435) [ClassicSimilarity], result of:
            0.33025426 = score(doc=3435,freq=1.0), product of:
              0.4423403 = queryWeight, product of:
                2.913689 = boost
                7.9638047 = idf(docFreq=39, maxDocs=42306)
                0.019063065 = queryNorm
              0.7466067 = fieldWeight in 3435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9638047 = idf(docFreq=39, maxDocs=42306)
                0.09375 = fieldNorm(doc=3435)
          0.21289788 = weight(abstract_txt:word in 3435) [ClassicSimilarity], result of:
            0.21289788 = score(doc=3435,freq=1.0), product of:
              0.4158932 = queryWeight, product of:
                3.9954972 = boost
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.019063065 = queryNorm
              0.5119052 = fieldWeight in 3435, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.09375 = fieldNorm(doc=3435)
        0.2 = coord(5/25)
    
  2. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.17
    0.17211653 = sum of:
      0.17211653 = product of:
        0.6147019 = sum of:
          0.048916034 = weight(abstract_txt:frequency in 189) [ClassicSimilarity], result of:
            0.048916034 = score(doc=189,freq=2.0), product of:
              0.123829775 = queryWeight, product of:
                1.0900903 = boost
                5.958952 = idf(docFreq=296, maxDocs=42306)
                0.019063065 = queryNorm
              0.39502645 = fieldWeight in 189, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.958952 = idf(docFreq=296, maxDocs=42306)
                0.046875 = fieldNorm(doc=189)
          0.03506653 = weight(abstract_txt:taking in 189) [ClassicSimilarity], result of:
            0.03506653 = score(doc=189,freq=1.0), product of:
              0.124967225 = queryWeight, product of:
                1.0950854 = boost
                5.9862576 = idf(docFreq=288, maxDocs=42306)
                0.019063065 = queryNorm
              0.28060582 = fieldWeight in 189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9862576 = idf(docFreq=288, maxDocs=42306)
                0.046875 = fieldNorm(doc=189)
          0.015251053 = weight(abstract_txt:that in 189) [ClassicSimilarity], result of:
            0.015251053 = score(doc=189,freq=5.0), product of:
              0.060504064 = queryWeight, product of:
                1.3197839 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.019063065 = queryNorm
              0.25206658 = fieldWeight in 189, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.046875 = fieldNorm(doc=189)
          0.06559829 = weight(abstract_txt:assign in 189) [ClassicSimilarity], result of:
            0.06559829 = score(doc=189,freq=1.0), product of:
              0.18972705 = queryWeight, product of:
                1.3493187 = boost
                7.376018 = idf(docFreq=71, maxDocs=42306)
                0.019063065 = queryNorm
              0.34575084 = fieldWeight in 189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.376018 = idf(docFreq=71, maxDocs=42306)
                0.046875 = fieldNorm(doc=189)
          0.06629599 = weight(abstract_txt:identification in 189) [ClassicSimilarity], result of:
            0.06629599 = score(doc=189,freq=1.0), product of:
              0.24073307 = queryWeight, product of:
                2.149477 = boost
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.019063065 = queryNorm
              0.27539212 = fieldWeight in 189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.046875 = fieldNorm(doc=189)
          0.10193663 = weight(abstract_txt:documents in 189) [ClassicSimilarity], result of:
            0.10193663 = score(doc=189,freq=5.0), product of:
              0.23629314 = queryWeight, product of:
                3.0116568 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.019063065 = queryNorm
              0.43139905 = fieldWeight in 189, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.046875 = fieldNorm(doc=189)
          0.2816374 = weight(abstract_txt:word in 189) [ClassicSimilarity], result of:
            0.2816374 = score(doc=189,freq=7.0), product of:
              0.4158932 = queryWeight, product of:
                3.9954972 = boost
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.019063065 = queryNorm
              0.67718685 = fieldWeight in 189, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.046875 = fieldNorm(doc=189)
        0.28 = coord(7/25)
    
  3. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.15
    0.14573924 = sum of:
      0.14573924 = product of:
        0.7286962 = sum of:
          0.01607602 = weight(abstract_txt:that in 2644) [ClassicSimilarity], result of:
            0.01607602 = score(doc=2644,freq=2.0), product of:
              0.060504064 = queryWeight, product of:
                1.3197839 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.019063065 = queryNorm
              0.2657015 = fieldWeight in 2644, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=2644)
          0.14950009 = weight(abstract_txt:verbs in 2644) [ClassicSimilarity], result of:
            0.14950009 = score(doc=2644,freq=1.0), product of:
              0.23373803 = queryWeight, product of:
                1.4976648 = boost
                8.186948 = idf(docFreq=31, maxDocs=42306)
                0.019063065 = queryNorm
              0.6396053 = fieldWeight in 2644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.186948 = idf(docFreq=31, maxDocs=42306)
                0.078125 = fieldNorm(doc=2644)
          0.110493325 = weight(abstract_txt:identification in 2644) [ClassicSimilarity], result of:
            0.110493325 = score(doc=2644,freq=1.0), product of:
              0.24073307 = queryWeight, product of:
                2.149477 = boost
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.019063065 = queryNorm
              0.45898688 = fieldWeight in 2644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.078125 = fieldNorm(doc=2644)
          0.27521187 = weight(abstract_txt:nouns in 2644) [ClassicSimilarity], result of:
            0.27521187 = score(doc=2644,freq=1.0), product of:
              0.4423403 = queryWeight, product of:
                2.913689 = boost
                7.9638047 = idf(docFreq=39, maxDocs=42306)
                0.019063065 = queryNorm
              0.62217224 = fieldWeight in 2644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9638047 = idf(docFreq=39, maxDocs=42306)
                0.078125 = fieldNorm(doc=2644)
          0.1774149 = weight(abstract_txt:word in 2644) [ClassicSimilarity], result of:
            0.1774149 = score(doc=2644,freq=1.0), product of:
              0.4158932 = queryWeight, product of:
                3.9954972 = boost
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.019063065 = queryNorm
              0.42658764 = fieldWeight in 2644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.078125 = fieldNorm(doc=2644)
        0.2 = coord(5/25)
    
  4. Green, R.: WordNet (2009) 0.13
    0.13190053 = sum of:
      0.13190053 = product of:
        0.8243784 = sum of:
          0.013640955 = weight(abstract_txt:that in 1697) [ClassicSimilarity], result of:
            0.013640955 = score(doc=1697,freq=1.0), product of:
              0.060504064 = queryWeight, product of:
                1.3197839 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.019063065 = queryNorm
              0.2254552 = fieldWeight in 1697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.09375 = fieldNorm(doc=1697)
          0.17940012 = weight(abstract_txt:verbs in 1697) [ClassicSimilarity], result of:
            0.17940012 = score(doc=1697,freq=1.0), product of:
              0.23373803 = queryWeight, product of:
                1.4976648 = boost
                8.186948 = idf(docFreq=31, maxDocs=42306)
                0.019063065 = queryNorm
              0.7675264 = fieldWeight in 1697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.186948 = idf(docFreq=31, maxDocs=42306)
                0.09375 = fieldNorm(doc=1697)
          0.33025426 = weight(abstract_txt:nouns in 1697) [ClassicSimilarity], result of:
            0.33025426 = score(doc=1697,freq=1.0), product of:
              0.4423403 = queryWeight, product of:
                2.913689 = boost
                7.9638047 = idf(docFreq=39, maxDocs=42306)
                0.019063065 = queryNorm
              0.7466067 = fieldWeight in 1697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9638047 = idf(docFreq=39, maxDocs=42306)
                0.09375 = fieldNorm(doc=1697)
          0.30108306 = weight(abstract_txt:word in 1697) [ClassicSimilarity], result of:
            0.30108306 = score(doc=1697,freq=2.0), product of:
              0.4158932 = queryWeight, product of:
                3.9954972 = boost
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.019063065 = queryNorm
              0.72394323 = fieldWeight in 1697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.09375 = fieldNorm(doc=1697)
        0.16 = coord(4/25)
    
  5. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.13
    0.1278255 = sum of:
      0.1278255 = product of:
        0.6391275 = sum of:
          0.05498627 = weight(abstract_txt:numbers in 4268) [ClassicSimilarity], result of:
            0.05498627 = score(doc=4268,freq=1.0), product of:
              0.11998803 = queryWeight, product of:
                1.0730474 = boost
                5.865787 = idf(docFreq=325, maxDocs=42306)
                0.019063065 = queryNorm
              0.45826462 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.865787 = idf(docFreq=325, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.0113674635 = weight(abstract_txt:that in 4268) [ClassicSimilarity], result of:
            0.0113674635 = score(doc=4268,freq=1.0), product of:
              0.060504064 = queryWeight, product of:
                1.3197839 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.019063065 = queryNorm
              0.18787934 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.110493325 = weight(abstract_txt:identification in 4268) [ClassicSimilarity], result of:
            0.110493325 = score(doc=4268,freq=1.0), product of:
              0.24073307 = queryWeight, product of:
                2.149477 = boost
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.019063065 = queryNorm
              0.45898688 = fieldWeight in 4268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.875032 = idf(docFreq=322, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.10745065 = weight(abstract_txt:documents in 4268) [ClassicSimilarity], result of:
            0.10745065 = score(doc=4268,freq=2.0), product of:
              0.23629314 = queryWeight, product of:
                3.0116568 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.019063065 = queryNorm
              0.45473453 = fieldWeight in 4268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
          0.3548298 = weight(abstract_txt:word in 4268) [ClassicSimilarity], result of:
            0.3548298 = score(doc=4268,freq=4.0), product of:
              0.4158932 = queryWeight, product of:
                3.9954972 = boost
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.019063065 = queryNorm
              0.8531753 = fieldWeight in 4268, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.078125 = fieldNorm(doc=4268)
        0.2 = coord(5/25)
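
The content scores combine more factors than the author scores. As a rough sketch, each matching term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm; the contributions are summed and multiplied by coord (matching terms divided by the 25 query terms). The code below recomputes the first content hit (doc 3435) from the values printed in its tree. The idf and tf formulas are assumptions chosen because they reproduce the listed numbers, and the helper names are illustrative.

```python
import math

QUERY_NORM = 0.019063065  # queryNorm as printed in the listing
MAX_DOCS = 42306          # maxDocs as printed in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """One term's contribution: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (boost, termFreq, docFreq) for the five terms matched in doc 3435.
terms_doc_3435 = [
    (1.4976648, 2.0, 31),    # verbs
    (1.5514654, 1.0, 1655),  # electronic
    (2.149477, 1.0, 322),    # identification
    (2.913689, 1.0, 39),     # nouns
    (3.9954972, 1.0, 488),   # word
]

total = sum(term_score(b, f, df, 0.09375) for b, f, df in terms_doc_3435)
coord = 5 / 25           # 5 of the 25 query terms matched
score = total * coord    # listing shows 0.19586267

print(total, score)
```

The coord factor explains why the second hit, which matches 7 of 25 terms, can rank below the first despite matching more terms: its smaller fieldNorm (0.046875) lowers every per-term contribution.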