Document (#32945)

Author
Soricut, R.
Marcu, D.
Title
Abstractive headline generation using WIDL-expressions
Source
Information processing and management. 43(2007) no.6, pp. 1536-1548
Year
2007
Abstract
We present a new paradigm for the automatic creation of document headlines that is based on direct transformation of relevant textual information into well-formed textual output. Starting from an input document, we automatically create compact representations of weighted finite sets of strings, called WIDL-expressions, which encode the most important topics in the document. A generic natural language generation engine performs the headline generation task, driven by both statistical knowledge encapsulated in WIDL-expressions (representing topic biases induced by the input document) and statistical knowledge encapsulated in language models (representing biases induced by the target language). Our evaluation shows similar performance in quality with a state-of-the-art, extractive approach to headline generation, and significant improvements in quality over previously proposed solutions to abstractive headline generation.
Theme
Automatic abstracting

Similar documents (content)

  1. Aker, A.; Gaizauskas, R.: Generating descriptive multi-document summaries of geo-located entities using entity type models (2015) 0.09
    0.093328714 = sum of:
      0.093328714 = product of:
        0.46664357 = sum of:
          0.02984851 = weight(abstract_txt:quality in 3727) [ClassicSimilarity], result of:
            0.02984851 = score(doc=3727,freq=1.0), product of:
              0.10168588 = queryWeight, product of:
                1.465832 = boost
                4.696583 = idf(docFreq=1040, maxDocs=41962)
                0.014770475 = queryNorm
              0.29353642 = fieldWeight in 3727, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.696583 = idf(docFreq=1040, maxDocs=41962)
                0.0625 = fieldNorm(doc=3727)
          0.17178458 = weight(abstract_txt:extractive in 3727) [ClassicSimilarity], result of:
            0.17178458 = score(doc=3727,freq=2.0), product of:
              0.20572245 = queryWeight, product of:
                1.4742792 = boost
                9.447295 = idf(docFreq=8, maxDocs=41962)
                0.014770475 = queryNorm
              0.8350308 = fieldWeight in 3727, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.447295 = idf(docFreq=8, maxDocs=41962)
                0.0625 = fieldNorm(doc=3727)
          0.055516683 = weight(abstract_txt:language in 3727) [ClassicSimilarity], result of:
            0.055516683 = score(doc=3727,freq=3.0), product of:
              0.12206315 = queryWeight, product of:
                1.9669431 = boost
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.014770475 = queryNorm
              0.45481935 = fieldWeight in 3727, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.0625 = fieldNorm(doc=3727)
          0.07832061 = weight(abstract_txt:document in 3727) [ClassicSimilarity], result of:
            0.07832061 = score(doc=3727,freq=3.0), product of:
              0.16899188 = queryWeight, product of:
                2.6724021 = boost
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.014770475 = queryNorm
              0.46345782 = fieldWeight in 3727, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.0625 = fieldNorm(doc=3727)
          0.1311732 = weight(abstract_txt:generation in 3727) [ClassicSimilarity], result of:
            0.1311732 = score(doc=3727,freq=1.0), product of:
              0.3702732 = queryWeight, product of:
                4.422675 = boost
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.014770475 = queryNorm
              0.35426056 = fieldWeight in 3727, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.0625 = fieldNorm(doc=3727)
        0.2 = coord(5/25)
    
  2. Stede, M.: Lexicalization in natural language generation (2002) 0.09
    0.086807676 = sum of:
      0.086807676 = product of:
        0.43403837 = sum of:
          0.046908665 = weight(abstract_txt:generic in 246) [ClassicSimilarity], result of:
            0.046908665 = score(doc=246,freq=2.0), product of:
              0.09465035 = queryWeight, product of:
                6.4080777 = idf(docFreq=187, maxDocs=41962)
                0.014770475 = queryNorm
              0.49559948 = fieldWeight in 246, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4080777 = idf(docFreq=187, maxDocs=41962)
                0.0546875 = fieldNorm(doc=246)
          0.011872284 = weight(abstract_txt:knowledge in 246) [ClassicSimilarity], result of:
            0.011872284 = score(doc=246,freq=1.0), product of:
              0.060116872 = queryWeight, product of:
                1.127073 = boost
                3.6111858 = idf(docFreq=3081, maxDocs=41962)
                0.014770475 = queryNorm
              0.19748673 = fieldWeight in 246, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6111858 = idf(docFreq=3081, maxDocs=41962)
                0.0546875 = fieldNorm(doc=246)
          0.08299156 = weight(abstract_txt:input in 246) [ClassicSimilarity], result of:
            0.08299156 = score(doc=246,freq=2.0), product of:
              0.17444271 = queryWeight, product of:
                1.9199075 = boost
                6.1514583 = idf(docFreq=242, maxDocs=41962)
                0.014770475 = queryNorm
              0.47575256 = fieldWeight in 246, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1514583 = idf(docFreq=242, maxDocs=41962)
                0.0546875 = fieldNorm(doc=246)
          0.062712766 = weight(abstract_txt:language in 246) [ClassicSimilarity], result of:
            0.062712766 = score(doc=246,freq=5.0), product of:
              0.12206315 = queryWeight, product of:
                1.9669431 = boost
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.014770475 = queryNorm
              0.51377314 = fieldWeight in 246, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.0546875 = fieldNorm(doc=246)
          0.2295531 = weight(abstract_txt:generation in 246) [ClassicSimilarity], result of:
            0.2295531 = score(doc=246,freq=4.0), product of:
              0.3702732 = queryWeight, product of:
                4.422675 = boost
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.014770475 = queryNorm
              0.619956 = fieldWeight in 246, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.0546875 = fieldNorm(doc=246)
        0.2 = coord(5/25)
    
  3. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.08
    0.079657726 = sum of:
      0.079657726 = product of:
        0.4978608 = sum of:
          0.13979886 = weight(abstract_txt:encode in 6820) [ClassicSimilarity], result of:
            0.13979886 = score(doc=6820,freq=1.0), product of:
              0.17241584 = queryWeight, product of:
                1.3496696 = boost
                8.6487875 = idf(docFreq=19, maxDocs=41962)
                0.014770475 = queryNorm
              0.8108238 = fieldWeight in 6820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6487875 = idf(docFreq=19, maxDocs=41962)
                0.09375 = fieldNorm(doc=6820)
          0.09330834 = weight(abstract_txt:textual in 6820) [ClassicSimilarity], result of:
            0.09330834 = score(doc=6820,freq=1.0), product of:
              0.16590711 = queryWeight, product of:
                1.8723471 = boost
                5.999073 = idf(docFreq=282, maxDocs=41962)
                0.014770475 = queryNorm
              0.5624131 = fieldWeight in 6820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.999073 = idf(docFreq=282, maxDocs=41962)
                0.09375 = fieldNorm(doc=6820)
          0.06799378 = weight(abstract_txt:language in 6820) [ClassicSimilarity], result of:
            0.06799378 = score(doc=6820,freq=2.0), product of:
              0.12206315 = queryWeight, product of:
                1.9669431 = boost
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.014770475 = queryNorm
              0.5570377 = fieldWeight in 6820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.09375 = fieldNorm(doc=6820)
          0.19675979 = weight(abstract_txt:generation in 6820) [ClassicSimilarity], result of:
            0.19675979 = score(doc=6820,freq=1.0), product of:
              0.3702732 = queryWeight, product of:
                4.422675 = boost
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.014770475 = queryNorm
              0.53139085 = fieldWeight in 6820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.09375 = fieldNorm(doc=6820)
        0.16 = coord(4/25)
    
  4. Helbig, H.: Knowledge representation and the semantics of natural language (2014) 0.08
    0.07947564 = sum of:
      0.07947564 = product of:
        0.4967228 = sum of:
          0.023501026 = weight(abstract_txt:knowledge in 4397) [ClassicSimilarity], result of:
            0.023501026 = score(doc=4397,freq=3.0), product of:
              0.060116872 = queryWeight, product of:
                1.127073 = boost
                3.6111858 = idf(docFreq=3081, maxDocs=41962)
                0.014770475 = queryNorm
              0.3909223 = fieldWeight in 4397, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6111858 = idf(docFreq=3081, maxDocs=41962)
                0.0625 = fieldNorm(doc=4397)
          0.09065837 = weight(abstract_txt:language in 4397) [ClassicSimilarity], result of:
            0.09065837 = score(doc=4397,freq=8.0), product of:
              0.12206315 = queryWeight, product of:
                1.9669431 = boost
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.014770475 = queryNorm
              0.7427169 = fieldWeight in 4397, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.2014413 = idf(docFreq=1707, maxDocs=41962)
                0.0625 = fieldNorm(doc=4397)
          0.19705647 = weight(abstract_txt:expressions in 4397) [ClassicSimilarity], result of:
            0.19705647 = score(doc=4397,freq=2.0), product of:
              0.32513204 = queryWeight, product of:
                3.21018 = boost
                6.857028 = idf(docFreq=119, maxDocs=41962)
                0.014770475 = queryNorm
              0.60608137 = fieldWeight in 4397, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.857028 = idf(docFreq=119, maxDocs=41962)
                0.0625 = fieldNorm(doc=4397)
          0.18550691 = weight(abstract_txt:generation in 4397) [ClassicSimilarity], result of:
            0.18550691 = score(doc=4397,freq=2.0), product of:
              0.3702732 = queryWeight, product of:
                4.422675 = boost
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.014770475 = queryNorm
              0.5010001 = fieldWeight in 4397, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.668169 = idf(docFreq=393, maxDocs=41962)
                0.0625 = fieldNorm(doc=4397)
        0.16 = coord(4/25)
    
  5. Kalczynski, P.J.; Chou, A.: Temporal Document Retrieval Model for business news archives (2005) 0.07
    0.07121319 = sum of:
      0.07121319 = product of:
        0.5934433 = sum of:
          0.10607541 = weight(abstract_txt:representing in 3031) [ClassicSimilarity], result of:
            0.10607541 = score(doc=3031,freq=2.0), product of:
              0.16197154 = queryWeight, product of:
                1.8500063 = boost
                5.927492 = idf(docFreq=303, maxDocs=41962)
                0.014770475 = queryNorm
              0.65490156 = fieldWeight in 3031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.927492 = idf(docFreq=303, maxDocs=41962)
                0.078125 = fieldNorm(doc=3031)
          0.09790076 = weight(abstract_txt:document in 3031) [ClassicSimilarity], result of:
            0.09790076 = score(doc=3031,freq=3.0), product of:
              0.16899188 = queryWeight, product of:
                2.6724021 = boost
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.014770475 = queryNorm
              0.5793223 = fieldWeight in 3031, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.078125 = fieldNorm(doc=3031)
          0.38946706 = weight(abstract_txt:expressions in 3031) [ClassicSimilarity], result of:
            0.38946706 = score(doc=3031,freq=5.0), product of:
              0.32513204 = queryWeight, product of:
                3.21018 = boost
                6.857028 = idf(docFreq=119, maxDocs=41962)
                0.014770475 = queryNorm
              1.1978735 = fieldWeight in 3031, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.857028 = idf(docFreq=119, maxDocs=41962)
                0.078125 = fieldNorm(doc=3031)
        0.12 = coord(3/25)
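
The score trees above all have the same shape: each matching term contributes queryWeight times fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matched query terms over total query terms). The following is a minimal sketch of that arithmetic, assuming idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which reproduce the idf and tf values listed in the trees. The function names are illustrative only and are not Lucene's API; a term whose tree shows no boost line is treated as having boost 1.0.

```python
import math

MAX_DOCS = 41962          # maxDocs as reported in the trees above
QUERY_NORM = 0.014770475  # queryNorm as reported in the trees above


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed formula; matches the listed values)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0,
               query_norm=QUERY_NORM, max_docs=MAX_DOCS):
    """Contribution of one query term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def doc_score(term_scores, total_query_terms):
    """Sum of term contributions scaled by coord(matched / total)."""
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# Entry 5 (doc 3031): three of the 25 query terms match.
entry5_terms = [
    term_score(freq=2.0, doc_freq=303, field_norm=0.078125, boost=1.8500063),   # representing
    term_score(freq=3.0, doc_freq=1576, field_norm=0.078125, boost=2.6724021),  # document
    term_score(freq=5.0, doc_freq=119, field_norm=0.078125, boost=3.21018),     # expressions
]
entry5_score = doc_score(entry5_terms, total_query_terms=25)
print(entry5_score)
```

The printed value should agree with the 0.07121319 reported for entry 5, up to small differences between this double-precision arithmetic and the single-precision values shown in the listing.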