Document (#32952)

Author
Ercan, G.
Cicekli, I.
Title
Using lexical chains for keyword extraction
Source
Information processing and management. 43(2007) no.6, pp. 1705-1714
Year
2007
Abstract
Keywords can be considered condensed versions of documents and short forms of their summaries. In this paper, the problem of automatically extracting keywords from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text, and it can be said to represent the semantic content of a portion of that text. Although lexical chains have been used extensively in text summarization, their use for keyword extraction has not been fully investigated. In this paper, a keyword extraction technique that uses lexical chains is described, and encouraging results are obtained.
Theme
Automatic abstracting

Similar documents (content)

  1. Wang, F.L.; Yang, C.C.: The impact analysis of language differences on an automatic multilingual text summarization system (2006) 0.17
    0.17111202 = sum of:
      0.17111202 = product of:
        0.6111143 = sum of:
          0.1174791 = weight(abstract_txt:summarization in 5049) [ClassicSimilarity], result of:
            0.1174791 = score(doc=5049,freq=5.0), product of:
              0.11785593 = queryWeight, product of:
                1.1411788 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0144795 = queryNorm
              0.9968026 = fieldWeight in 5049, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=5049)
          0.023745857 = weight(abstract_txt:been in 5049) [ClassicSimilarity], result of:
            0.023745857 = score(doc=5049,freq=3.0), product of:
              0.060635813 = queryWeight, product of:
                1.1575975 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0144795 = queryNorm
              0.3916144 = fieldWeight in 5049, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0625 = fieldNorm(doc=5049)
          0.035110522 = weight(abstract_txt:documents in 5049) [ClassicSimilarity], result of:
            0.035110522 = score(doc=5049,freq=3.0), product of:
              0.07869772 = queryWeight, product of:
                1.3187852 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0144795 = queryNorm
              0.44614407 = fieldWeight in 5049, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=5049)
          0.025700275 = weight(abstract_txt:problem in 5049) [ClassicSimilarity], result of:
            0.025700275 = score(doc=5049,freq=1.0), product of:
              0.09218697 = queryWeight, product of:
                1.4273411 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0144795 = queryNorm
              0.27878425 = fieldWeight in 5049, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0625 = fieldNorm(doc=5049)
          0.07036076 = weight(abstract_txt:text in 5049) [ClassicSimilarity], result of:
            0.07036076 = score(doc=5049,freq=6.0), product of:
              0.11365225 = queryWeight, product of:
                1.9410094 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0144795 = queryNorm
              0.6190881 = fieldWeight in 5049, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5049)
          0.13746828 = weight(abstract_txt:extraction in 5049) [ClassicSimilarity], result of:
            0.13746828 = score(doc=5049,freq=1.0), product of:
              0.35524067 = queryWeight, product of:
                3.9625006 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0144795 = queryNorm
              0.38697222 = fieldWeight in 5049, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=5049)
          0.20124955 = weight(abstract_txt:lexical in 5049) [ClassicSimilarity], result of:
            0.20124955 = score(doc=5049,freq=1.0), product of:
              0.49337938 = queryWeight, product of:
                5.220998 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0144795 = queryNorm
              0.4079002 = fieldWeight in 5049, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0625 = fieldNorm(doc=5049)
        0.28 = coord(7/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.15
    0.1536651 = sum of:
      0.1536651 = product of:
        0.5488039 = sum of:
          0.050378826 = weight(abstract_txt:summaries in 1719) [ClassicSimilarity], result of:
            0.050378826 = score(doc=1719,freq=1.0), product of:
              0.11460399 = queryWeight, product of:
                1.1253247 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0144795 = queryNorm
              0.4395905 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.13900314 = weight(abstract_txt:summarization in 1719) [ClassicSimilarity], result of:
            0.13900314 = score(doc=1719,freq=7.0), product of:
              0.11785593 = queryWeight, product of:
                1.1411788 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0144795 = queryNorm
              1.1794327 = fieldWeight in 1719, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.019388411 = weight(abstract_txt:been in 1719) [ClassicSimilarity], result of:
            0.019388411 = score(doc=1719,freq=2.0), product of:
              0.060635813 = queryWeight, product of:
                1.1575975 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0144795 = queryNorm
              0.31975183 = fieldWeight in 1719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.028667621 = weight(abstract_txt:documents in 1719) [ClassicSimilarity], result of:
            0.028667621 = score(doc=1719,freq=2.0), product of:
              0.07869772 = queryWeight, product of:
                1.3187852 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0144795 = queryNorm
              0.36427513 = fieldWeight in 1719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.08823179 = weight(abstract_txt:condensed in 1719) [ClassicSimilarity], result of:
            0.08823179 = score(doc=1719,freq=1.0), product of:
              0.16651413 = queryWeight, product of:
                1.3564492 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.0144795 = queryNorm
              0.5298757 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.02872466 = weight(abstract_txt:text in 1719) [ClassicSimilarity], result of:
            0.02872466 = score(doc=1719,freq=1.0), product of:
              0.11365225 = queryWeight, product of:
                1.9410094 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0144795 = queryNorm
              0.25274166 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.19440949 = weight(abstract_txt:extraction in 1719) [ClassicSimilarity], result of:
            0.19440949 = score(doc=1719,freq=2.0), product of:
              0.35524067 = queryWeight, product of:
                3.9625006 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0144795 = queryNorm
              0.54726136 = fieldWeight in 1719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
        0.28 = coord(7/25)
    
  3. Naing, M.-M.; Lim, E.-P.; Chiang, R.H.L.: Extracting link chains of relationship instances from a Web site (2006) 0.15
    0.15117621 = sum of:
      0.15117621 = product of:
        0.9448514 = sum of:
          0.032125346 = weight(abstract_txt:problem in 6111) [ClassicSimilarity], result of:
            0.032125346 = score(doc=6111,freq=1.0), product of:
              0.09218697 = queryWeight, product of:
                1.4273411 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0144795 = queryNorm
              0.3484803 = fieldWeight in 6111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.23818615 = weight(abstract_txt:chain in 6111) [ClassicSimilarity], result of:
            0.23818615 = score(doc=6111,freq=3.0), product of:
              0.24303843 = queryWeight, product of:
                2.3175573 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0144795 = queryNorm
              0.98003495 = fieldWeight in 6111, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.34367067 = weight(abstract_txt:extraction in 6111) [ClassicSimilarity], result of:
            0.34367067 = score(doc=6111,freq=4.0), product of:
              0.35524067 = queryWeight, product of:
                3.9625006 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0144795 = queryNorm
              0.96743053 = fieldWeight in 6111, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.33086923 = weight(abstract_txt:chains in 6111) [ClassicSimilarity], result of:
            0.33086923 = score(doc=6111,freq=1.0), product of:
              0.49954242 = queryWeight, product of:
                4.069348 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.0144795 = queryNorm
              0.66234463 = fieldWeight in 6111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
        0.16 = coord(4/25)
    
  4. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 0.15
    0.14909258 = sum of:
      0.14909258 = product of:
        0.7454629 = sum of:
          0.011416632 = weight(abstract_txt:their in 1830) [ClassicSimilarity], result of:
            0.011416632 = score(doc=1830,freq=1.0), product of:
              0.04625191 = queryWeight, product of:
                1.0110155 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.0144795 = queryNorm
              0.24683589 = fieldWeight in 1830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.078125 = fieldNorm(doc=1830)
          0.032125346 = weight(abstract_txt:problem in 1830) [ClassicSimilarity], result of:
            0.032125346 = score(doc=1830,freq=1.0), product of:
              0.09218697 = queryWeight, product of:
                1.4273411 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0144795 = queryNorm
              0.3484803 = fieldWeight in 1830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.078125 = fieldNorm(doc=1830)
          0.07870883 = weight(abstract_txt:keywords in 1830) [ClassicSimilarity], result of:
            0.07870883 = score(doc=1830,freq=1.0), product of:
              0.16754058 = queryWeight, product of:
                1.9242123 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0144795 = queryNorm
              0.46978965 = fieldWeight in 1830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.078125 = fieldNorm(doc=1830)
          0.23897661 = weight(abstract_txt:keyword in 1830) [ClassicSimilarity], result of:
            0.23897661 = score(doc=1830,freq=4.0), product of:
              0.2533291 = queryWeight, product of:
                2.897885 = boost
                6.037405 = idf(docFreq=286, maxDocs=44218)
                0.0144795 = queryNorm
              0.94334453 = fieldWeight in 1830, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.037405 = idf(docFreq=286, maxDocs=44218)
                0.078125 = fieldNorm(doc=1830)
          0.3842355 = weight(abstract_txt:extraction in 1830) [ClassicSimilarity], result of:
            0.3842355 = score(doc=1830,freq=5.0), product of:
              0.35524067 = queryWeight, product of:
                3.9625006 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0144795 = queryNorm
              1.0816202 = fieldWeight in 1830, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.078125 = fieldNorm(doc=1830)
        0.2 = coord(5/25)
    
  5. Morris, J.: Individual differences in the interpretation of text : implications for information science (2009) 0.14
    0.14356045 = sum of:
      0.14356045 = product of:
        0.89725286 = sum of:
          0.05885394 = weight(abstract_txt:semantically in 3318) [ClassicSimilarity], result of:
            0.05885394 = score(doc=3318,freq=1.0), product of:
              0.10954975 = queryWeight, product of:
                1.1002306 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.0144795 = queryNorm
              0.5372348 = fieldWeight in 3318, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.078125 = fieldNorm(doc=3318)
          0.071811646 = weight(abstract_txt:text in 3318) [ClassicSimilarity], result of:
            0.071811646 = score(doc=3318,freq=4.0), product of:
              0.11365225 = queryWeight, product of:
                1.9410094 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0144795 = queryNorm
              0.6318542 = fieldWeight in 3318, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3318)
          0.33086923 = weight(abstract_txt:chains in 3318) [ClassicSimilarity], result of:
            0.33086923 = score(doc=3318,freq=1.0), product of:
              0.49954242 = queryWeight, product of:
                4.069348 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.0144795 = queryNorm
              0.66234463 = fieldWeight in 3318, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.078125 = fieldNorm(doc=3318)
          0.43571806 = weight(abstract_txt:lexical in 3318) [ClassicSimilarity], result of:
            0.43571806 = score(doc=3318,freq=3.0), product of:
              0.49337938 = queryWeight, product of:
                5.220998 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0144795 = queryNorm
              0.88312984 = fieldWeight in 3318, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.078125 = fieldNorm(doc=3318)
        0.16 = coord(4/25)
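
The score trees above all follow the same arithmetic, and a short sketch may make it easier to read them. The Python below recomputes the total for the first entry (doc 5049) from the values printed in its tree. It assumes the classic Lucene TF-IDF definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the numbers shown here. The function and variable names are illustrative only and are not part of any library.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.0144795


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assumed to be 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm           # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    """Sum of the term scores, scaled by coord = matched / total_clauses."""
    coord = matched / total_clauses
    return coord * sum(
        term_score(freq, doc_freq, boost, field_norm)
        for freq, doc_freq, boost in terms
    )


# (termFreq, docFreq, boost) for each term matched in doc 5049.
DOC_5049_TERMS = [
    (5.0, 95, 1.1411788),    # summarization
    (3.0, 3226, 1.1575975),  # been
    (3.0, 1949, 1.3187852),  # documents
    (1.0, 1388, 1.4273411),  # problem
    (6.0, 2106, 1.9410094),  # text
    (1.0, 245, 3.9625006),   # extraction
    (1.0, 175, 5.220998),    # lexical
]

score_5049 = document_score(
    DOC_5049_TERMS, field_norm=0.0625, matched=7, total_clauses=25
)
print(round(score_5049, 4))
```

The result should agree with the 0.17111202 listed for entry 1 up to small rounding differences, since the listing appears to use single-precision arithmetic while Python uses double precision. The same functions can be applied to the other entries by substituting their termFreq, docFreq, boost, fieldNorm, and coord values.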