Document (#10863)

Author
Cohen, J.D.
Title
Highlights: language- and domain-independent automatic indexing terms for abstracting
Source
Journal of the American Society for Information Science. 46(1995) no.3, pp.162-174
Year
1995
Abstract
Presents a model for drawing index terms from text. The approach uses no stop list, stemmer, or other language- and domain-specific component, allowing operation in any language or domain with only trivial modification. The method uses n-gram counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, called 'highlights', are suitable for identifying the topic for perusal and selection. An extension, also described and demonstrated, selects index terms that represent a subset of documents and distinguish them from the rest of the corpus. Presents some experimental results, showing operation in English, Spanish, German, Georgian, Russian and Japanese.
Theme
Automatic indexing
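
The abstract describes picking index terms from character n-gram counts, with no stop list or stemmer. The sketch below illustrates that general idea only; it is not Cohen's published scoring, which the abstract does not spell out. The function names (`normalize`, `char_ngrams`, `highlight_words`), the scoring formula, and the add-one smoothing are all assumptions made for illustration. Each word is scored by how much more often its n-grams occur in the document than in a background text.

```python
import math
from collections import Counter


def normalize(text):
    """Lowercase, collapse whitespace, and pad with one space on each side."""
    return " " + " ".join(text.lower().split()) + " "


def char_ngrams(s, n):
    """Return all overlapping character n-grams of s."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def highlight_words(document, background, n=5, top_k=5):
    """Rank the document's words by how unusual their n-grams are.

    Hypothetical scoring: each n-gram contributes
    p_doc * log(p_doc / p_bg), and a word's score is the mean over the
    n-grams of the space-padded word.
    """
    doc_counts = Counter(char_ngrams(normalize(document), n))
    bg_counts = Counter(char_ngrams(normalize(background), n))
    doc_total = sum(doc_counts.values()) or 1
    bg_total = sum(bg_counts.values()) or 1

    def gram_score(gram):
        p_doc = doc_counts[gram] / doc_total
        if p_doc == 0:
            return 0.0
        # Crude add-one smoothing so unseen background n-grams do not divide by zero.
        p_bg = (bg_counts[gram] + 1) / (bg_total + 1)
        return p_doc * math.log(p_doc / p_bg)

    scores = {}
    for word in set(document.lower().split()):
        grams = char_ngrams(" " + word + " ", n)
        if grams:  # words shorter than n - 2 characters yield no n-grams
            scores[word] = sum(gram_score(g) for g in grams) / len(grams)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


document = "the stemmer sat on the stemmer and the stemmer sat"
background = "the cat sat on the mat and the dog sat on the log " * 20
print(highlight_words(document, background, n=4, top_k=1))
```

Because the score works on character sequences rather than whole words, inflected forms that share most of their n-grams (for example "stemmer" and "stemming") receive related scores, which is the stemmer-like effect the abstract mentions. Nothing in the sketch depends on a particular language beyond splitting on whitespace.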

Similar documents (author)

  1. Cohen, W.W.: The whirl approach to information integration (1998) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:cohen in 5639) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 5639, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=5639)
    
  2. Cohen, J.: The hermeneutics of the reference question (1993) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:cohen in 7428) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 7428, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=7428)
    
  3. Cohen, P.: Different approaches to quality (1996) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:cohen in 7290) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 7290, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=7290)
    
  4. Cohen, J.D.: Massive query resolution for rapid selective dissemination of information (1999) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:cohen in 3054) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 3054, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=3054)
    
  5. Cohen, M.: 99 philosophische Rätsel (2004) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:cohen in 1291) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 1291, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=1291)
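
The author-match scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF definitions that the displayed numbers are consistent with: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The fieldNorm value is taken as given from the listing, since it is a stored, length-dependent factor. The function names are illustrative, not Lucene API names.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author term "cohen", 17 matching documents.
score = field_weight(freq=1.0, doc_freq=17, max_docs=44218, field_norm=0.625)
print(f"{score:.4f}")
```

All five author matches share the same freq, docFreq, and fieldNorm, which is why they tie at 5.50 and the ranking among them carries no information.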
    

Similar documents (content)

  1. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.10
    0.09663446 = sum of:
      0.09663446 = product of:
        0.6039654 = sum of:
          0.10027256 = weight(abstract_txt:stop in 2950) [ClassicSimilarity], result of:
            0.10027256 = score(doc=2950,freq=1.0), product of:
              0.17061998 = queryWeight, product of:
                1.1820083 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.019188773 = queryNorm
              0.5876953 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.07309896 = weight(abstract_txt:language in 2950) [ClassicSimilarity], result of:
            0.07309896 = score(doc=2950,freq=2.0), product of:
              0.1582024 = queryWeight, product of:
                1.9713907 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019188773 = queryNorm
              0.46205974 = fieldWeight in 2950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.062308315 = weight(abstract_txt:terms in 2950) [ClassicSimilarity], result of:
            0.062308315 = score(doc=2950,freq=1.0), product of:
              0.19722372 = queryWeight, product of:
                2.5416465 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.019188773 = queryNorm
              0.3159271 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.36828554 = weight(abstract_txt:stemmer in 2950) [ClassicSimilarity], result of:
            0.36828554 = score(doc=2950,freq=1.0), product of:
              0.51173085 = queryWeight, product of:
                2.8949518 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.019188773 = queryNorm
              0.71968603 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
        0.16 = coord(4/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.10
    0.09643006 = sum of:
      0.09643006 = product of:
        1.2053758 = sum of:
          0.052459743 = weight(abstract_txt:presents in 2585) [ClassicSimilarity], result of:
            0.052459743 = score(doc=2585,freq=1.0), product of:
              0.111528 = queryWeight, product of:
                1.351489 = boost
                4.300552 = idf(docFreq=1629, maxDocs=44218)
                0.019188773 = queryNorm
              0.47037286 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.300552 = idf(docFreq=1629, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          1.1529161 = weight(abstract_txt:stemmer in 2585) [ClassicSimilarity], result of:
            1.1529161 = score(doc=2585,freq=5.0), product of:
              0.51173085 = queryWeight, product of:
                2.8949518 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.019188773 = queryNorm
              2.2529736 = fieldWeight in 2585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
        0.08 = coord(2/25)
    
  3. Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.09
    0.08595925 = sum of:
      0.08595925 = product of:
        0.42979622 = sum of:
          0.06664944 = weight(abstract_txt:achieving in 4215) [ClassicSimilarity], result of:
            0.06664944 = score(doc=4215,freq=1.0), product of:
              0.1507924 = queryWeight, product of:
                1.1112078 = boost
                7.071914 = idf(docFreq=101, maxDocs=44218)
                0.019188773 = queryNorm
              0.44199464 = fieldWeight in 4215, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.071914 = idf(docFreq=101, maxDocs=44218)
                0.0625 = fieldNorm(doc=4215)
          0.08335647 = weight(abstract_txt:japanese in 4215) [ClassicSimilarity], result of:
            0.08335647 = score(doc=4215,freq=1.0), product of:
              0.17504165 = queryWeight, product of:
                1.1972263 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.019188773 = queryNorm
              0.47620937 = fieldWeight in 4215, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=4215)
          0.08270203 = weight(abstract_txt:language in 4215) [ClassicSimilarity], result of:
            0.08270203 = score(doc=4215,freq=4.0), product of:
              0.1582024 = queryWeight, product of:
                1.9713907 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019188773 = queryNorm
              0.5227609 = fieldWeight in 4215, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=4215)
          0.08562776 = weight(abstract_txt:index in 4215) [ClassicSimilarity], result of:
            0.08562776 = score(doc=4215,freq=2.0), product of:
              0.20399615 = queryWeight, product of:
                2.2386034 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.019188773 = queryNorm
              0.41975182 = fieldWeight in 4215, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.0625 = fieldNorm(doc=4215)
          0.11146051 = weight(abstract_txt:terms in 4215) [ClassicSimilarity], result of:
            0.11146051 = score(doc=4215,freq=5.0), product of:
              0.19722372 = queryWeight, product of:
                2.5416465 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.019188773 = queryNorm
              0.5651476 = fieldWeight in 4215, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4215)
        0.2 = coord(5/25)
    
  4. King, S.V.: ELNET: the electronic library database system (1992) 0.08
    0.08127745 = sum of:
      0.08127745 = product of:
        0.5079841 = sum of:
          0.20629673 = weight(abstract_txt:japanese in 4263) [ClassicSimilarity], result of:
            0.20629673 = score(doc=4263,freq=2.0), product of:
              0.17504165 = queryWeight, product of:
                1.1972263 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.019188773 = queryNorm
              1.178558 = fieldWeight in 4263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.109375 = fieldNorm(doc=4263)
          0.07236428 = weight(abstract_txt:language in 4263) [ClassicSimilarity], result of:
            0.07236428 = score(doc=4263,freq=1.0), product of:
              0.1582024 = queryWeight, product of:
                1.9713907 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019188773 = queryNorm
              0.45741582 = fieldWeight in 4263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.109375 = fieldNorm(doc=4263)
          0.105958946 = weight(abstract_txt:index in 4263) [ClassicSimilarity], result of:
            0.105958946 = score(doc=4263,freq=1.0), product of:
              0.20399615 = queryWeight, product of:
                2.2386034 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.019188773 = queryNorm
              0.5194164 = fieldWeight in 4263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.109375 = fieldNorm(doc=4263)
          0.12336417 = weight(abstract_txt:terms in 4263) [ClassicSimilarity], result of:
            0.12336417 = score(doc=4263,freq=2.0), product of:
              0.19722372 = queryWeight, product of:
                2.5416465 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.019188773 = queryNorm
              0.6255037 = fieldWeight in 4263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=4263)
        0.16 = coord(4/25)
    
  5. Panzer, M.: Dewey: how to make it work for you (2013) 0.08
    0.07940781 = sum of:
      0.07940781 = product of:
        0.49629885 = sum of:
          0.14038159 = weight(abstract_txt:stop in 5797) [ClassicSimilarity], result of:
            0.14038159 = score(doc=5797,freq=1.0), product of:
              0.17061998 = queryWeight, product of:
                1.1820083 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.019188773 = queryNorm
              0.82277346 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.109375 = fieldNorm(doc=5797)
          0.16272667 = weight(abstract_txt:highlights in 5797) [ClassicSimilarity], result of:
            0.16272667 = score(doc=5797,freq=1.0), product of:
              0.23721325 = queryWeight, product of:
                1.9710155 = boost
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.019188773 = queryNorm
              0.68599313 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.109375 = fieldNorm(doc=5797)
          0.105958946 = weight(abstract_txt:index in 5797) [ClassicSimilarity], result of:
            0.105958946 = score(doc=5797,freq=1.0), product of:
              0.20399615 = queryWeight, product of:
                2.2386034 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.019188773 = queryNorm
              0.5194164 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.109375 = fieldNorm(doc=5797)
          0.087231636 = weight(abstract_txt:terms in 5797) [ClassicSimilarity], result of:
            0.087231636 = score(doc=5797,freq=1.0), product of:
              0.19722372 = queryWeight, product of:
                2.5416465 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.019188773 = queryNorm
              0.4422979 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=5797)
        0.16 = coord(4/25)
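
The content-match trees combine more factors than the author matches do. As a worked check, the sketch below recomputes the score of entry 5 (doc 5797) from the values in the listing, assuming the structure the explain output shows: each matched term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is multiplied by coord (matched terms / query terms). The function names are illustrative, and boost, queryNorm, and fieldNorm are copied from the listing rather than derived.

```python
import math

QUERY_NORM = 0.019188773  # queryNorm, copied from the listing
MAX_DOCS = 44218          # maxDocs, copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency consistent with the displayed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score contribution of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm, query_terms):
    """Sum the per-term scores, then scale by coord = matched / query terms."""
    total = sum(term_score(boost, freq, doc_freq, field_norm)
                for boost, freq, doc_freq in matches)
    coord = len(matches) / query_terms
    return total * coord


# (boost, freq, docFreq) for the terms matched in doc 5797:
# stop, highlights, index, terms.
doc_5797 = [
    (1.1820083, 1.0, 64),
    (1.9710155, 1.0, 226),
    (2.2386034, 1.0, 1040),
    (2.5416465, 1.0, 2106),
]
score = document_score(doc_5797, field_norm=0.109375, query_terms=25)
print(f"{score:.5f}")
```

Note that idf enters each term's score twice, once through queryWeight and once through fieldWeight, so rare terms such as "stemmer" (docFreq=11) dominate the ranking, while the coord factor penalizes documents that match only a few of the 25 query terms.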