Document (#32953)

Author
Ercan, G.
Cicekli, I.
Title
Using lexical chains for keyword extraction
Source
Information processing and management. 43(2007) no.6, pp. 1705-1714
Year
2007
Abstract
Keywords can be considered as condensed versions of documents and short forms of their summaries. In this paper, the problem of automatic extraction of keywords from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text, and it can be said that a lexical chain represents the semantic content of a portion of the text. Although lexical chains have been extensively used in text summarization, their use for the keyword extraction problem has not been fully investigated. In this paper, a keyword extraction technique that uses lexical chains is described, and encouraging results are obtained.
Theme
Automatic abstracting

Similar documents (content)

  1. Wang, F.L.; Yang, C.C.: The impact analysis of language differences on an automatic multilingual text summarization system (2006) 0.17
    0.17137082 = sum of:
      0.17137082 = product of:
        0.6120386 = sum of:
          0.117597245 = weight(abstract_txt:summarization in 50) [ClassicSimilarity], result of:
            0.117597245 = score(doc=50,freq=5.0), product of:
              0.11781413 = queryWeight, product of:
                1.1435828 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.014424308 = queryNorm
              0.9981591 = fieldWeight in 50, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.0625 = fieldNorm(doc=50)
          0.023960073 = weight(abstract_txt:been in 50) [ClassicSimilarity], result of:
            0.023960073 = score(doc=50,freq=3.0), product of:
              0.060937446 = queryWeight, product of:
                1.1631241 = boost
                3.6321454 = idf(docFreq=3110, maxDocs=43254)
                0.014424308 = queryNorm
              0.39319128 = fieldWeight in 50, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6321454 = idf(docFreq=3110, maxDocs=43254)
                0.0625 = fieldNorm(doc=50)
          0.03487615 = weight(abstract_txt:documents in 50) [ClassicSimilarity], result of:
            0.03487615 = score(doc=50,freq=3.0), product of:
              0.07826685 = queryWeight, product of:
                1.3181744 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.014424308 = queryNorm
              0.4456056 = fieldWeight in 50, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=50)
          0.025554584 = weight(abstract_txt:problem in 50) [ClassicSimilarity], result of:
            0.025554584 = score(doc=50,freq=1.0), product of:
              0.09174417 = queryWeight, product of:
                1.4271617 = boost
                4.4566684 = idf(docFreq=1363, maxDocs=43254)
                0.014424308 = queryNorm
              0.27854177 = fieldWeight in 50, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4566684 = idf(docFreq=1363, maxDocs=43254)
                0.0625 = fieldNorm(doc=50)
          0.07045073 = weight(abstract_txt:text in 50) [ClassicSimilarity], result of:
            0.07045073 = score(doc=50,freq=6.0), product of:
              0.1136326 = queryWeight, product of:
                1.9452751 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.014424308 = queryNorm
              0.619987 = fieldWeight in 50, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0625 = fieldNorm(doc=50)
          0.13834237 = weight(abstract_txt:extraction in 50) [ClassicSimilarity], result of:
            0.13834237 = score(doc=50,freq=1.0), product of:
              0.3563795 = queryWeight, product of:
                3.9779131 = boost
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.014424308 = queryNorm
              0.38818833 = fieldWeight in 50, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.0625 = fieldNorm(doc=50)
          0.2012575 = weight(abstract_txt:lexical in 50) [ClassicSimilarity], result of:
            0.2012575 = score(doc=50,freq=1.0), product of:
              0.49288696 = queryWeight, product of:
                5.230312 = boost
                6.5331817 = idf(docFreq=170, maxDocs=43254)
                0.014424308 = queryNorm
              0.40832385 = fieldWeight in 50, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5331817 = idf(docFreq=170, maxDocs=43254)
                0.0625 = fieldNorm(doc=50)
        0.28 = coord(7/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.15
    0.15367007 = sum of:
      0.15367007 = product of:
        0.5488217 = sum of:
          0.049955506 = weight(abstract_txt:summaries in 3720) [ClassicSimilarity], result of:
            0.049955506 = score(doc=3720,freq=1.0), product of:
              0.11384436 = queryWeight, product of:
                1.1241511 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.014424308 = queryNorm
              0.43880528 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.13914295 = weight(abstract_txt:summarization in 3720) [ClassicSimilarity], result of:
            0.13914295 = score(doc=3720,freq=7.0), product of:
              0.11781413 = queryWeight, product of:
                1.1435828 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.014424308 = queryNorm
              1.1810378 = fieldWeight in 3720, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.019563315 = weight(abstract_txt:been in 3720) [ClassicSimilarity], result of:
            0.019563315 = score(doc=3720,freq=2.0), product of:
              0.060937446 = queryWeight, product of:
                1.1631241 = boost
                3.6321454 = idf(docFreq=3110, maxDocs=43254)
                0.014424308 = queryNorm
              0.32103932 = fieldWeight in 3720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6321454 = idf(docFreq=3110, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.028476255 = weight(abstract_txt:documents in 3720) [ClassicSimilarity], result of:
            0.028476255 = score(doc=3720,freq=2.0), product of:
              0.07826685 = queryWeight, product of:
                1.3181744 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.014424308 = queryNorm
              0.36383545 = fieldWeight in 3720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.08727663 = weight(abstract_txt:condensed in 3720) [ClassicSimilarity], result of:
            0.08727663 = score(doc=3720,freq=1.0), product of:
              0.16514087 = queryWeight, product of:
                1.35393 = boost
                8.455969 = idf(docFreq=24, maxDocs=43254)
                0.014424308 = queryNorm
              0.52849805 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.455969 = idf(docFreq=24, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.02876139 = weight(abstract_txt:text in 3720) [ClassicSimilarity], result of:
            0.02876139 = score(doc=3720,freq=1.0), product of:
              0.1136326 = queryWeight, product of:
                1.9452751 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.014424308 = queryNorm
              0.25310862 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.19564565 = weight(abstract_txt:extraction in 3720) [ClassicSimilarity], result of:
            0.19564565 = score(doc=3720,freq=2.0), product of:
              0.3563795 = queryWeight, product of:
                3.9779131 = boost
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.014424308 = queryNorm
              0.5489812 = fieldWeight in 3720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
        0.28 = coord(7/25)
    
  3. Naing, M.-M.; Lim, E.-P.; Chiang, R.H.L.: Extracting link chains of relationship instances from a Web site (2006) 0.15
    0.15140611 = sum of:
      0.15140611 = product of:
        0.9462882 = sum of:
          0.03194323 = weight(abstract_txt:problem in 1112) [ClassicSimilarity], result of:
            0.03194323 = score(doc=1112,freq=1.0), product of:
              0.09174417 = queryWeight, product of:
                1.4271617 = boost
                4.4566684 = idf(docFreq=1363, maxDocs=43254)
                0.014424308 = queryNorm
              0.34817722 = fieldWeight in 1112, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4566684 = idf(docFreq=1363, maxDocs=43254)
                0.078125 = fieldNorm(doc=1112)
          0.23643878 = weight(abstract_txt:chain in 1112) [ClassicSimilarity], result of:
            0.23643878 = score(doc=1112,freq=3.0), product of:
              0.24160059 = queryWeight, product of:
                2.31597 = boost
                7.232194 = idf(docFreq=84, maxDocs=43254)
                0.014424308 = queryNorm
              0.97863495 = fieldWeight in 1112, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.232194 = idf(docFreq=84, maxDocs=43254)
                0.078125 = fieldNorm(doc=1112)
          0.34585592 = weight(abstract_txt:extraction in 1112) [ClassicSimilarity], result of:
            0.34585592 = score(doc=1112,freq=4.0), product of:
              0.3563795 = queryWeight, product of:
                3.9779131 = boost
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.014424308 = queryNorm
              0.97047085 = fieldWeight in 1112, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.078125 = fieldNorm(doc=1112)
          0.33205032 = weight(abstract_txt:chains in 1112) [ClassicSimilarity], result of:
            0.33205032 = score(doc=1112,freq=1.0), product of:
              0.50021756 = queryWeight, product of:
                4.0813985 = boost
                8.496791 = idf(docFreq=23, maxDocs=43254)
                0.014424308 = queryNorm
              0.6638118 = fieldWeight in 1112, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.496791 = idf(docFreq=23, maxDocs=43254)
                0.078125 = fieldNorm(doc=1112)
        0.16 = coord(4/25)
    
  4. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 0.15
    0.14969254 = sum of:
      0.14969254 = product of:
        0.7484627 = sum of:
          0.011623629 = weight(abstract_txt:their in 3831) [ClassicSimilarity], result of:
            0.011623629 = score(doc=3831,freq=1.0), product of:
              0.046761356 = queryWeight, product of:
                1.0188904 = boost
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.014424308 = queryNorm
              0.24857341 = fieldWeight in 3831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.078125 = fieldNorm(doc=3831)
          0.03194323 = weight(abstract_txt:problem in 3831) [ClassicSimilarity], result of:
            0.03194323 = score(doc=3831,freq=1.0), product of:
              0.09174417 = queryWeight, product of:
                1.4271617 = boost
                4.4566684 = idf(docFreq=1363, maxDocs=43254)
                0.014424308 = queryNorm
              0.34817722 = fieldWeight in 3831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4566684 = idf(docFreq=1363, maxDocs=43254)
                0.078125 = fieldNorm(doc=3831)
          0.07966033 = weight(abstract_txt:keywords in 3831) [ClassicSimilarity], result of:
            0.07966033 = score(doc=3831,freq=1.0), product of:
              0.16871512 = queryWeight, product of:
                1.9353563 = boost
                6.043633 = idf(docFreq=278, maxDocs=43254)
                0.014424308 = queryNorm
              0.47215882 = fieldWeight in 3831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.043633 = idf(docFreq=278, maxDocs=43254)
                0.078125 = fieldNorm(doc=3831)
          0.23855682 = weight(abstract_txt:keyword in 3831) [ClassicSimilarity], result of:
            0.23855682 = score(doc=3831,freq=4.0), product of:
              0.25277314 = queryWeight, product of:
                2.901316 = boost
                6.0400553 = idf(docFreq=279, maxDocs=43254)
                0.014424308 = queryNorm
              0.9437586 = fieldWeight in 3831, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.0400553 = idf(docFreq=279, maxDocs=43254)
                0.078125 = fieldNorm(doc=3831)
          0.38667867 = weight(abstract_txt:extraction in 3831) [ClassicSimilarity], result of:
            0.38667867 = score(doc=3831,freq=5.0), product of:
              0.3563795 = queryWeight, product of:
                3.9779131 = boost
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.014424308 = queryNorm
              1.0850194 = fieldWeight in 3831, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.078125 = fieldNorm(doc=3831)
        0.2 = coord(5/25)
    
  5. Morris, J.: Individual differences in the interpretation of text : implications for information science (2009) 0.14
    0.143748 = sum of:
      0.143748 = product of:
        0.89842504 = sum of:
          0.058735963 = weight(abstract_txt:semantically in 319) [ClassicSimilarity], result of:
            0.058735963 = score(doc=319,freq=1.0), product of:
              0.10929123 = queryWeight, product of:
                1.1014419 = boost
                6.8790545 = idf(docFreq=120, maxDocs=43254)
                0.014424308 = queryNorm
              0.5374261 = fieldWeight in 319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8790545 = idf(docFreq=120, maxDocs=43254)
                0.078125 = fieldNorm(doc=319)
          0.071903475 = weight(abstract_txt:text in 319) [ClassicSimilarity], result of:
            0.071903475 = score(doc=319,freq=4.0), product of:
              0.1136326 = queryWeight, product of:
                1.9452751 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.014424308 = queryNorm
              0.63277155 = fieldWeight in 319, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.078125 = fieldNorm(doc=319)
          0.33205032 = weight(abstract_txt:chains in 319) [ClassicSimilarity], result of:
            0.33205032 = score(doc=319,freq=1.0), product of:
              0.50021756 = queryWeight, product of:
                4.0813985 = boost
                8.496791 = idf(docFreq=23, maxDocs=43254)
                0.014424308 = queryNorm
              0.6638118 = fieldWeight in 319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.496791 = idf(docFreq=23, maxDocs=43254)
                0.078125 = fieldNorm(doc=319)
          0.4357353 = weight(abstract_txt:lexical in 319) [ClassicSimilarity], result of:
            0.4357353 = score(doc=319,freq=3.0), product of:
              0.49288696 = queryWeight, product of:
                5.230312 = boost
                6.5331817 = idf(docFreq=170, maxDocs=43254)
                0.014424308 = queryNorm
              0.8840471 = fieldWeight in 319, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5331817 = idf(docFreq=170, maxDocs=43254)
                0.078125 = fieldNorm(doc=319)
        0.16 = coord(4/25)
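
The score trees above all follow the same arithmetic, which can be checked by hand. The sketch below recomputes the score of entry 1 (doc 50) from the boost, docFreq, termFreq, fieldNorm, queryNorm and coord values printed in the dump. It assumes the classic TF-IDF form that the printed numbers appear consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm. The function and variable names are illustrative and are not part of any library API, and small differences from the dump are expected because the search engine works in single precision.

```python
import math

# Constants copied from the explain output above.
QUERY_NORM = 0.014424308
MAX_DOCS = 43254


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed form; it reproduces the idf values printed in the dump.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight for a single matching term.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    # Sum the per-term scores, then scale by coord(matched/total).
    raw = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms)
    return raw * (matched / total)


# (term, boost, docFreq, termFreq) for entry 1, doc 50.
TERMS_DOC_50 = [
    ("summarization", 1.1435828, 92, 5.0),
    ("been", 1.1631241, 3110, 3.0),
    ("documents", 1.3181744, 1916, 3.0),
    ("problem", 1.4271617, 1363, 1.0),
    ("text", 1.9452751, 2048, 6.0),
    ("extraction", 3.9779131, 235, 1.0),
    ("lexical", 5.230312, 170, 1.0),
]

score_doc_50 = document_score(TERMS_DOC_50, field_norm=0.0625, matched=7, total=25)
print(f"recomputed score for doc 50: {score_doc_50:.5f}")
```

The same function can be applied to the other entries by substituting their term lists, fieldNorm (0.078125 for entries 3 to 5) and coord factor. The coord factor explains why entries that match few query terms strongly, such as entry 3 with coord(4/25), still rank below entry 1.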