Document (#10863)

Author
Cohen, J.D.
Title
Highlights: language- and domain-independent automatic indexing terms for abstracting
Source
Journal of the American Society for Information Science. 46(1995) no.3, pp.162-174
Year
1995
Abstract
Presents a model for drawing index terms from text. The approach uses no stop list, stemmer, or other language- or domain-specific component, allowing operation in any language or domain with only trivial modification. The method uses n-gram counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, called 'highlights', are suitable for identifying the topic for perusal and selection. An extension, also described and demonstrated, selects index terms that represent a subset of documents and distinguish it from the rest of the corpus. Presents some experimental results showing operation in English, Spanish, German, Georgian, Russian and Japanese.
Theme
Automatic indexing
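
Illustration
The abstract describes index terms derived from n-gram counts without a stop list or stemmer. The following is a minimal sketch of that general idea only, not the algorithm from the paper: it counts character n-grams, compares them with a background text, and ranks words by the average score of their n-grams. The function names, the n-gram length of 5, the add-one smoothing, and the log-ratio score are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=5):
    """Return overlapping character n-grams of lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_scores(document, background, n=5):
    """Score each document n-gram by how over-represented it is versus the background.

    Assumption: a count-weighted log ratio with add-one smoothing; the paper's
    actual scoring function is not given in the abstract.
    """
    doc = Counter(char_ngrams(document, n))
    bg = Counter(char_ngrams(background, n))
    doc_total = sum(doc.values())
    bg_total = sum(bg.values())
    scores = {}
    for gram, count in doc.items():
        p_doc = count / doc_total
        p_bg = (bg[gram] + 1) / (bg_total + len(bg) + 1)
        scores[gram] = count * math.log(p_doc / p_bg)
    return scores


def highlight_words(document, background, n=5, top=5):
    """Rank the document's words by the mean score of the n-grams they contain."""
    scores = ngram_scores(document, background, n)
    word_scores = {}
    for word in set(document.lower().split()):
        padded = f" {word} "
        grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
        if grams:  # words too short to form an n-gram are skipped
            word_scores[word] = sum(scores.get(g, 0.0) for g in grams) / len(grams)
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(word_scores, key=lambda w: (-word_scores[w], w))[:top]


document = "the stemmer and the stemming of the stems"
background = "the cat and the dog and the bird of the sea " * 20
top_words = highlight_words(document, background, top=3)
```

In this toy input the words sharing the n-gram " stem" reinforce one another, which is the stemmer-like effect the abstract mentions, while "the" and "and" score low because they are just as common in the background text.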

Similar documents (author)

  1. Cohen, W.W.: The whirl approach to information integration (1998) 5.49
    5.4902954 = sum of:
      5.4902954 = weight(author_txt:cohen in 5639) [ClassicSimilarity], result of:
        5.4902954 = score(doc=5639,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.11383721 = queryNorm
          5.490296 = fieldWeight in 5639, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.625 = fieldNorm(doc=5639)
    
  2. Cohen, J.: The hermeneutics of the reference question (1993) 5.49
    5.4902954 = sum of:
      5.4902954 = weight(author_txt:cohen in 428) [ClassicSimilarity], result of:
        5.4902954 = score(doc=428,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.11383721 = queryNorm
          5.490296 = fieldWeight in 428, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.625 = fieldNorm(doc=428)
    
  3. Cohen, P.: Different approaches to quality (1996) 5.49
    5.4902954 = sum of:
      5.4902954 = weight(author_txt:cohen in 360) [ClassicSimilarity], result of:
        5.4902954 = score(doc=360,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.11383721 = queryNorm
          5.490296 = fieldWeight in 360, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.625 = fieldNorm(doc=360)
    
  4. Cohen, J.D.: Massive query resolution for rapid selective dissemination of information (1999) 5.49
    5.4902954 = sum of:
      5.4902954 = weight(author_txt:cohen in 5055) [ClassicSimilarity], result of:
        5.4902954 = score(doc=5055,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.11383721 = queryNorm
          5.490296 = fieldWeight in 5055, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.625 = fieldNorm(doc=5055)
    
  5. Cohen, M.: 99 philosophische Rätsel (2004) 5.49
    5.4902954 = sum of:
      5.4902954 = weight(author_txt:cohen in 3292) [ClassicSimilarity], result of:
        5.4902954 = score(doc=3292,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.11383721 = queryNorm
          5.490296 = fieldWeight in 3292, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.784473 = idf(docFreq=17, maxDocs=43254)
            0.625 = fieldNorm(doc=3292)
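
The five author matches above all carry the same score because each explain tree multiplies the same factors. As a rough check, the sketch below recomputes that score from the values printed in the listing, assuming the standard Lucene ClassicSimilarity formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The function names are made up for this example.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as used by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight, mirroring the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # 0.99999994 in the listing
    field_weight = classic_tf(freq) * idf * field_norm  # 5.490296 in the listing
    return query_weight * field_weight


# Values copied from the explain output for author_txt:cohen.
score = term_score(freq=1.0, doc_freq=17, max_docs=43254,
                   query_norm=0.11383721, field_norm=0.625)
print(round(score, 2))  # 5.49
```

The result agrees with the listed 5.4902954 up to floating-point rounding; Lucene computes in single precision, so the last digits differ.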
    

Similar documents (content)

  1. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.10
    0.09635944 = sum of:
      0.09635944 = product of:
        0.6022465 = sum of:
          0.10053471 = weight(abstract_txt:stop in 4951) [ClassicSimilarity], result of:
            0.10053471 = score(doc=4951,freq=1.0), product of:
              0.17085685 = queryWeight, product of:
                1.1820862 = boost
                7.53171 = idf(docFreq=62, maxDocs=43254)
                0.019190649 = queryNorm
              0.58841485 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.53171 = idf(docFreq=62, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.07355056 = weight(abstract_txt:language in 4951) [ClassicSimilarity], result of:
            0.07355056 = score(doc=4951,freq=2.0), product of:
              0.15879719 = queryWeight, product of:
                1.9738538 = boost
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.019190649 = queryNorm
              0.46317294 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.06290043 = weight(abstract_txt:terms in 4951) [ClassicSimilarity], result of:
            0.06290043 = score(doc=4951,freq=1.0), product of:
              0.19840112 = queryWeight, product of:
                2.5476222 = boost
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.019190649 = queryNorm
              0.31703666 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.36526084 = weight(abstract_txt:stemmer in 4951) [ClassicSimilarity], result of:
            0.36526084 = score(doc=4951,freq=1.0), product of:
              0.5087454 = queryWeight, product of:
                2.8846836 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.019190649 = queryNorm
              0.71796393 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
        0.16 = coord(4/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.10
    0.095668964 = sum of:
      0.095668964 = product of:
        1.195862 = sum of:
          0.052414827 = weight(abstract_txt:presents in 4586) [ClassicSimilarity], result of:
            0.052414827 = score(doc=4586,freq=1.0), product of:
              0.11142495 = queryWeight, product of:
                1.3500168 = boost
                4.3008432 = idf(docFreq=1593, maxDocs=43254)
                0.019190649 = queryNorm
              0.47040474 = fieldWeight in 4586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3008432 = idf(docFreq=1593, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
          1.1434473 = weight(abstract_txt:stemmer in 4586) [ClassicSimilarity], result of:
            1.1434473 = score(doc=4586,freq=5.0), product of:
              0.5087454 = queryWeight, product of:
                2.8846836 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.019190649 = queryNorm
              2.2475827 = fieldWeight in 4586, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
        0.08 = coord(2/25)
    
  3. Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.09
    0.086052686 = sum of:
      0.086052686 = product of:
        0.43026343 = sum of:
          0.06679969 = weight(abstract_txt:achieving in 680) [ClassicSimilarity], result of:
            0.06679969 = score(doc=680,freq=1.0), product of:
              0.15096562 = queryWeight, product of:
                1.1111482 = boost
                7.0797253 = idf(docFreq=98, maxDocs=43254)
                0.019190649 = queryNorm
              0.44248283 = fieldWeight in 680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0797253 = idf(docFreq=98, maxDocs=43254)
                0.0625 = fieldNorm(doc=680)
          0.0825476 = weight(abstract_txt:japanese in 680) [ClassicSimilarity], result of:
            0.0825476 = score(doc=680,freq=1.0), product of:
              0.173846 = queryWeight, product of:
                1.1923817 = boost
                7.5973077 = idf(docFreq=58, maxDocs=43254)
                0.019190649 = queryNorm
              0.47483173 = fieldWeight in 680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5973077 = idf(docFreq=58, maxDocs=43254)
                0.0625 = fieldNorm(doc=680)
          0.083212964 = weight(abstract_txt:language in 680) [ClassicSimilarity], result of:
            0.083212964 = score(doc=680,freq=4.0), product of:
              0.15879719 = queryWeight, product of:
                1.9738538 = boost
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.019190649 = queryNorm
              0.5240204 = fieldWeight in 680, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.0625 = fieldNorm(doc=680)
          0.085183464 = weight(abstract_txt:index in 680) [ClassicSimilarity], result of:
            0.085183464 = score(doc=680,freq=2.0), product of:
              0.20321809 = queryWeight, product of:
                2.2329283 = boost
                4.7423973 = idf(docFreq=1024, maxDocs=43254)
                0.019190649 = queryNorm
              0.41917264 = fieldWeight in 680, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7423973 = idf(docFreq=1024, maxDocs=43254)
                0.0625 = fieldNorm(doc=680)
          0.11251971 = weight(abstract_txt:terms in 680) [ClassicSimilarity], result of:
            0.11251971 = score(doc=680,freq=5.0), product of:
              0.19840112 = queryWeight, product of:
                2.5476222 = boost
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.019190649 = queryNorm
              0.5671324 = fieldWeight in 680, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.0625 = fieldNorm(doc=680)
        0.2 = coord(5/25)
    
  4. King, S.V.: ELNET: the electronic library database system (1992) 0.08
    0.0811283 = sum of:
      0.0811283 = product of:
        0.5070519 = sum of:
          0.20429488 = weight(abstract_txt:japanese in 4263) [ClassicSimilarity], result of:
            0.20429488 = score(doc=4263,freq=2.0), product of:
              0.173846 = queryWeight, product of:
                1.1923817 = boost
                7.5973077 = idf(docFreq=58, maxDocs=43254)
                0.019190649 = queryNorm
              1.1751485 = fieldWeight in 4263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5973077 = idf(docFreq=58, maxDocs=43254)
                0.109375 = fieldNorm(doc=4263)
          0.07281134 = weight(abstract_txt:language in 4263) [ClassicSimilarity], result of:
            0.07281134 = score(doc=4263,freq=1.0), product of:
              0.15879719 = queryWeight, product of:
                1.9738538 = boost
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.019190649 = queryNorm
              0.45851782 = fieldWeight in 4263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.109375 = fieldNorm(doc=4263)
          0.10540916 = weight(abstract_txt:index in 4263) [ClassicSimilarity], result of:
            0.10540916 = score(doc=4263,freq=1.0), product of:
              0.20321809 = queryWeight, product of:
                2.2329283 = boost
                4.7423973 = idf(docFreq=1024, maxDocs=43254)
                0.019190649 = queryNorm
              0.5186997 = fieldWeight in 4263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7423973 = idf(docFreq=1024, maxDocs=43254)
                0.109375 = fieldNorm(doc=4263)
          0.1245365 = weight(abstract_txt:terms in 4263) [ClassicSimilarity], result of:
            0.1245365 = score(doc=4263,freq=2.0), product of:
              0.19840112 = queryWeight, product of:
                2.5476222 = boost
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.019190649 = queryNorm
              0.62770057 = fieldWeight in 4263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.109375 = fieldNorm(doc=4263)
        0.16 = coord(4/25)
    
  5. Panzer, M.: Dewey: how to make it work for you (2013) 0.08
    0.080064155 = sum of:
      0.080064155 = product of:
        0.50040096 = sum of:
          0.14074859 = weight(abstract_txt:stop in 798) [ClassicSimilarity], result of:
            0.14074859 = score(doc=798,freq=1.0), product of:
              0.17085685 = queryWeight, product of:
                1.1820862 = boost
                7.53171 = idf(docFreq=62, maxDocs=43254)
                0.019190649 = queryNorm
              0.8237808 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.53171 = idf(docFreq=62, maxDocs=43254)
                0.109375 = fieldNorm(doc=798)
          0.16618258 = weight(abstract_txt:highlights in 798) [ClassicSimilarity], result of:
            0.16618258 = score(doc=798,freq=1.0), product of:
              0.24047506 = queryWeight, product of:
                1.9832752 = boost
                6.318259 = idf(docFreq=211, maxDocs=43254)
                0.019190649 = queryNorm
              0.6910595 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.318259 = idf(docFreq=211, maxDocs=43254)
                0.109375 = fieldNorm(doc=798)
          0.10540916 = weight(abstract_txt:index in 798) [ClassicSimilarity], result of:
            0.10540916 = score(doc=798,freq=1.0), product of:
              0.20321809 = queryWeight, product of:
                2.2329283 = boost
                4.7423973 = idf(docFreq=1024, maxDocs=43254)
                0.019190649 = queryNorm
              0.5186997 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7423973 = idf(docFreq=1024, maxDocs=43254)
                0.109375 = fieldNorm(doc=798)
          0.0880606 = weight(abstract_txt:terms in 798) [ClassicSimilarity], result of:
            0.0880606 = score(doc=798,freq=1.0), product of:
              0.19840112 = queryWeight, product of:
                2.5476222 = boost
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.019190649 = queryNorm
              0.44385132 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.109375 = fieldNorm(doc=798)
        0.16 = coord(4/25)
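
The content-based scores combine several weighted terms and then scale the sum by coord, the fraction of the 25 query terms that matched. The sketch below recomputes the score of entry 2 (Fox, B.; Fox, C.J.) from the numbers in its explain tree, again assuming the standard ClassicSimilarity formulas. The constant and function names are invented for this example.

```python
import math

# Values copied from the explain output above.
MAX_DOCS = 43254
QUERY_NORM = 0.019190649
QUERY_TERMS = 25  # denominator of coord(n/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, boost, field_norm):
    """Per-term weight: (boost * idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (freq, docFreq, boost, fieldNorm) for the two terms matched in doc 4586.
matches = [
    (1.0, 1593, 1.3500168, 0.109375),  # abstract_txt:presents
    (5.0, 11, 2.8846836, 0.109375),    # abstract_txt:stemmer
]

raw_sum = sum(term_weight(*m) for m in matches)  # about 1.195862
coord = len(matches) / QUERY_TERMS               # 2/25 = 0.08
score = raw_sum * coord
print(round(score, 3))  # 0.096
```

The rare term "stemmer" (docFreq=11) contributes almost the entire sum, but the low coord factor pulls the final score down to about 0.10, level with entry 1, which matches four terms at lower individual weights.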