Document (#5815)

Author
Damerau, F.J.
Title
Generating and evaluating domain-oriented multi-word terms from texts
Source
Information processing and management. 29(1993) no.4, p.433-447
Year
1993
Abstract
Examines techniques for automatically generating domain vocabularies from large text collections. Focuses on the problem of generating multi-word vocabulary terms (specifically pairs). Discusses statistical issues associated with word co-occurrences likely to be of use in a natural language interface. Aims to provide a more objective evaluation of the selection procedures. Because substantial experimentation with subjects using a working query system is absent, all evaluation is necessarily subjective. Uses a surrogate for such experimentation by relying on pre-existing dictionaries as indicators of domain relevance.
Theme
Automatic indexing

Similar documents (content)

  1. Spiteri, L.F.: Word association testing and thesaurus construction : a pilot study (2005) 0.21
    0.21186186 = sum of:
      0.21186186 = product of:
        0.8827578 = sum of:
          0.06898044 = weight(abstract_txt:indicators in 217) [ClassicSimilarity], result of:
            0.06898044 = score(doc=217,freq=1.0), product of:
              0.12132825 = queryWeight, product of:
                1.0871358 = boost
                6.0644684 = idf(docFreq=269, maxDocs=42740)
                0.018402863 = queryNorm
              0.5685439 = fieldWeight in 217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0644684 = idf(docFreq=269, maxDocs=42740)
                0.09375 = fieldNorm(doc=217)
          0.09908613 = weight(abstract_txt:pairs in 217) [ClassicSimilarity], result of:
            0.09908613 = score(doc=217,freq=1.0), product of:
              0.15446137 = queryWeight, product of:
                1.2266277 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.018402863 = queryNorm
              0.6414946 = fieldWeight in 217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.09375 = fieldNorm(doc=217)
          0.058542006 = weight(abstract_txt:terms in 217) [ClassicSimilarity], result of:
            0.058542006 = score(doc=217,freq=2.0), product of:
              0.10875679 = queryWeight, product of:
                1.4556133 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.018402863 = queryNorm
              0.5382837 = fieldWeight in 217, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.09375 = fieldNorm(doc=217)
          0.10123192 = weight(abstract_txt:domain in 217) [ClassicSimilarity], result of:
            0.10123192 = score(doc=217,freq=1.0), product of:
              0.22597654 = queryWeight, product of:
                2.569775 = boost
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.018402863 = queryNorm
              0.44797534 = fieldWeight in 217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.09375 = fieldNorm(doc=217)
          0.2607635 = weight(abstract_txt:word in 217) [ClassicSimilarity], result of:
            0.2607635 = score(doc=217,freq=3.0), product of:
              0.29442576 = queryWeight, product of:
                2.9332654 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.018402863 = queryNorm
              0.88566804 = fieldWeight in 217, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.09375 = fieldNorm(doc=217)
          0.29415378 = weight(abstract_txt:generating in 217) [ClassicSimilarity], result of:
            0.29415378 = score(doc=217,freq=1.0), product of:
              0.46015203 = queryWeight, product of:
                3.6670272 = boost
                6.8187037 = idf(docFreq=126, maxDocs=42740)
                0.018402863 = queryNorm
              0.6392535 = fieldWeight in 217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8187037 = idf(docFreq=126, maxDocs=42740)
                0.09375 = fieldNorm(doc=217)
        0.24 = coord(6/25)
    
  2. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.12
    0.120685585 = sum of:
      0.120685585 = product of:
        0.7542849 = sum of:
          0.06899241 = weight(abstract_txt:terms in 2564) [ClassicSimilarity], result of:
            0.06899241 = score(doc=2564,freq=4.0), product of:
              0.10875679 = queryWeight, product of:
                1.4556133 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.018402863 = queryNorm
              0.6343734 = fieldWeight in 2564, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.26899692 = weight(abstract_txt:multi in 2564) [ClassicSimilarity], result of:
            0.26899692 = score(doc=2564,freq=6.0), product of:
              0.2353549 = queryWeight, product of:
                2.1413095 = boost
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.018402863 = queryNorm
              1.1429417 = fieldWeight in 2564, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.08435994 = weight(abstract_txt:domain in 2564) [ClassicSimilarity], result of:
            0.08435994 = score(doc=2564,freq=1.0), product of:
              0.22597654 = queryWeight, product of:
                2.569775 = boost
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.018402863 = queryNorm
              0.3733128 = fieldWeight in 2564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.33193564 = weight(abstract_txt:word in 2564) [ClassicSimilarity], result of:
            0.33193564 = score(doc=2564,freq=7.0), product of:
              0.29442576 = queryWeight, product of:
                2.9332654 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.018402863 = queryNorm
              1.1274002 = fieldWeight in 2564, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
        0.16 = coord(4/25)
    
  3. He, Q.: A study of the strength indexes in co-word analysis (2000) 0.11
    0.11357205 = sum of:
      0.11357205 = product of:
        0.56786025 = sum of:
          0.06859493 = weight(abstract_txt:likely in 1112) [ClassicSimilarity], result of:
            0.06859493 = score(doc=1112,freq=2.0), product of:
              0.10833867 = queryWeight, product of:
                1.0272936 = boost
                5.730645 = idf(docFreq=376, maxDocs=42740)
                0.018402863 = queryNorm
              0.6331528 = fieldWeight in 1112, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.730645 = idf(docFreq=376, maxDocs=42740)
                0.078125 = fieldNorm(doc=1112)
          0.14301851 = weight(abstract_txt:pairs in 1112) [ClassicSimilarity], result of:
            0.14301851 = score(doc=1112,freq=3.0), product of:
              0.15446137 = queryWeight, product of:
                1.2266277 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.018402863 = queryNorm
              0.9259177 = fieldWeight in 1112, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.078125 = fieldNorm(doc=1112)
          0.10444767 = weight(abstract_txt:occurrences in 1112) [ClassicSimilarity], result of:
            0.10444767 = score(doc=1112,freq=1.0), product of:
              0.1806611 = queryWeight, product of:
                1.3265852 = boost
                7.400211 = idf(docFreq=70, maxDocs=42740)
                0.018402863 = queryNorm
              0.57814145 = fieldWeight in 1112, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.400211 = idf(docFreq=70, maxDocs=42740)
                0.078125 = fieldNorm(doc=1112)
          0.034496207 = weight(abstract_txt:terms in 1112) [ClassicSimilarity], result of:
            0.034496207 = score(doc=1112,freq=1.0), product of:
              0.10875679 = queryWeight, product of:
                1.4556133 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.018402863 = queryNorm
              0.3171867 = fieldWeight in 1112, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.078125 = fieldNorm(doc=1112)
          0.21730289 = weight(abstract_txt:word in 1112) [ClassicSimilarity], result of:
            0.21730289 = score(doc=1112,freq=3.0), product of:
              0.29442576 = queryWeight, product of:
                2.9332654 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.018402863 = queryNorm
              0.73805666 = fieldWeight in 1112, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.078125 = fieldNorm(doc=1112)
        0.2 = coord(5/25)
    
  4. Lamirel, J.-C.: Multi-view data analysis and concept extraction methods for text (2013) 0.11
    0.107480064 = sum of:
      0.107480064 = product of:
        0.4478336 = sum of:
          0.03614695 = weight(abstract_txt:oriented in 3073) [ClassicSimilarity], result of:
            0.03614695 = score(doc=3073,freq=1.0), product of:
              0.10333638 = queryWeight, product of:
                1.0032969 = boost
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.018402863 = queryNorm
              0.3497989 = fieldWeight in 3073, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.0625 = fieldNorm(doc=3073)
          0.040633008 = weight(abstract_txt:objective in 3073) [ClassicSimilarity], result of:
            0.040633008 = score(doc=3073,freq=1.0), product of:
              0.111718416 = queryWeight, product of:
                1.0431943 = boost
                5.819346 = idf(docFreq=344, maxDocs=42740)
                0.018402863 = queryNorm
              0.36370912 = fieldWeight in 3073, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.819346 = idf(docFreq=344, maxDocs=42740)
                0.0625 = fieldNorm(doc=3073)
          0.058946934 = weight(abstract_txt:subjective in 3073) [ClassicSimilarity], result of:
            0.058946934 = score(doc=3073,freq=1.0), product of:
              0.1431681 = queryWeight, product of:
                1.1809349 = boost
                6.5877166 = idf(docFreq=159, maxDocs=42740)
                0.018402863 = queryNorm
              0.4117323 = fieldWeight in 3073, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5877166 = idf(docFreq=159, maxDocs=42740)
                0.0625 = fieldNorm(doc=3073)
          0.052886438 = weight(abstract_txt:evaluation in 3073) [ClassicSimilarity], result of:
            0.052886438 = score(doc=3073,freq=2.0), product of:
              0.13317877 = queryWeight, product of:
                1.6107767 = boost
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.018402863 = queryNorm
              0.3971086 = fieldWeight in 3073, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.0625 = fieldNorm(doc=3073)
          0.124244355 = weight(abstract_txt:multi in 3073) [ClassicSimilarity], result of:
            0.124244355 = score(doc=3073,freq=2.0), product of:
              0.2353549 = queryWeight, product of:
                2.1413095 = boost
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.018402863 = queryNorm
              0.5279021 = fieldWeight in 3073, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.0625 = fieldNorm(doc=3073)
          0.1349759 = weight(abstract_txt:domain in 3073) [ClassicSimilarity], result of:
            0.1349759 = score(doc=3073,freq=4.0), product of:
              0.22597654 = queryWeight, product of:
                2.569775 = boost
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.018402863 = queryNorm
              0.59730047 = fieldWeight in 3073, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.0625 = fieldNorm(doc=3073)
        0.24 = coord(6/25)
    
  5. Tomov, D.T.: Some critical remarks on the stop word lists of ISI publications (2001) 0.10
    0.09620621 = sum of:
      0.09620621 = product of:
        0.48103106 = sum of:
          0.03614695 = weight(abstract_txt:oriented in 479) [ClassicSimilarity], result of:
            0.03614695 = score(doc=479,freq=1.0), product of:
              0.10333638 = queryWeight, product of:
                1.0032969 = boost
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.018402863 = queryNorm
              0.3497989 = fieldWeight in 479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.0625 = fieldNorm(doc=479)
          0.07917358 = weight(abstract_txt:dictionaries in 479) [ClassicSimilarity], result of:
            0.07917358 = score(doc=479,freq=1.0), product of:
              0.17428459 = queryWeight, product of:
                1.3029637 = boost
                7.268441 = idf(docFreq=80, maxDocs=42740)
                0.018402863 = queryNorm
              0.45427758 = fieldWeight in 479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.268441 = idf(docFreq=80, maxDocs=42740)
                0.0625 = fieldNorm(doc=479)
          0.039028004 = weight(abstract_txt:terms in 479) [ClassicSimilarity], result of:
            0.039028004 = score(doc=479,freq=2.0), product of:
              0.10875679 = queryWeight, product of:
                1.4556133 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.018402863 = queryNorm
              0.35885578 = fieldWeight in 479, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0625 = fieldNorm(doc=479)
          0.12594673 = weight(abstract_txt:absent in 479) [ClassicSimilarity], result of:
            0.12594673 = score(doc=479,freq=1.0), product of:
              0.2374999 = queryWeight, product of:
                1.5210186 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.018402863 = queryNorm
              0.5303023 = fieldWeight in 479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.0625 = fieldNorm(doc=479)
          0.2007358 = weight(abstract_txt:word in 479) [ClassicSimilarity], result of:
            0.2007358 = score(doc=479,freq=4.0), product of:
              0.29442576 = queryWeight, product of:
                2.9332654 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.018402863 = queryNorm
              0.68178755 = fieldWeight in 479, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=479)
        0.2 = coord(5/25)
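
The score trees above all follow the same arithmetic, which can be sketched in a few lines of Python. This is a minimal sketch rather than Lucene's own code: the function names are hypothetical, and the idf formula `1 + ln(maxDocs / (docFreq + 1))` is inferred from the numbers in the listing (it reproduces the idf values shown). Each term score is `queryWeight * fieldWeight`, with `tf = sqrt(freq)`, and the sum over matching terms is scaled by `coord(matched/25)`. The inputs are copied from entry 5 (doc 479).

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the listing:
    # 1 + ln(maxDocs / (docFreq + 1)). Inferred from the values shown.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def term_score(freq, boost, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(term_scores, total_clauses):
    # Sum of matching term scores, scaled by coord = matched / total clauses.
    coord = len(term_scores) / total_clauses
    return sum(term_scores) * coord

QUERY_NORM = 0.018402863   # queryNorm, as listed
MAX_DOCS = 42740           # maxDocs, as listed
FIELD_NORM_479 = 0.0625    # fieldNorm(doc=479), as listed

# (freq, boost, docFreq) for the five terms matched in doc 479
terms_479 = [
    (1.0, 1.0032969, 430),   # oriented
    (1.0, 1.3029637, 80),    # dictionaries
    (2.0, 1.4556133, 2003),  # terms
    (1.0, 1.5210186, 23),    # absent
    (4.0, 2.9332654, 496),   # word
]

scores = [
    term_score(f, b, df, MAX_DOCS, QUERY_NORM, FIELD_NORM_479)
    for f, b, df in terms_479
]
total = document_score(scores, total_clauses=25)
print(total)  # close to the 0.09620621 listed for entry 5
```

Small differences in the last digits are expected, since the listing appears to use single-precision floats while Python uses double precision.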