Document (#32944)

Author
Soricut, R.
Marcu, D.
Title
Abstractive headline generation using WIDL-expressions
Source
Information Processing and Management. 43(2007) no.6, pp.1536-1548
Year
2007
Abstract
We present a new paradigm for the automatic creation of document headlines that is based on direct transformation of relevant textual information into well-formed textual output. Starting from an input document, we automatically create compact representations of weighted finite sets of strings, called WIDL-expressions, which encode the most important topics in the document. A generic natural language generation engine performs the headline generation task, driven by both statistical knowledge encapsulated in WIDL-expressions (representing topic biases induced by the input document) and statistical knowledge encapsulated in language models (representing biases induced by the target language). Our evaluation shows quality comparable to that of a state-of-the-art extractive approach to headline generation, and significant quality improvements over previously proposed solutions to abstractive headline generation.
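The abstract combines two sources of statistical knowledge: weights attached to the strings encoded by a WIDL-expression (topic biases from the input document) and scores from a language model (biases of the target language). The Python sketch below illustrates that combination on a toy scale. It is not the authors' implementation, and several parts are assumptions: the classes `Word`, `Choice`, and `Seq` are hypothetical and cover only weighted disjunction and concatenation (the interleave and lock operators that, as we understand it, give WIDL-expressions their compactness are omitted); the bigram model with a fixed `unseen` penalty is a stand-in for a real language model; and the log-linear mix controlled by `lm_weight` is an illustrative choice. The code enumerates every string exhaustively, which is feasible only for tiny sets and is not meant to reflect how the paper's generation engine searches.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Dict, Tuple, Union

# Hypothetical, simplified stand-in for a WIDL-expression: it supports only
# weighted disjunction (Choice) and concatenation (Seq). Interleave and lock
# operators are deliberately omitted.


@dataclass(frozen=True)
class Word:
    text: str


@dataclass(frozen=True)
class Choice:
    # Weighted disjunction: each alternative carries a probability.
    options: Tuple[Tuple["Expr", float], ...]


@dataclass(frozen=True)
class Seq:
    # Concatenation: parts appear in the given order.
    parts: Tuple["Expr", ...]


Expr = Union[Word, Choice, Seq]
Strings = Dict[Tuple[str, ...], float]  # token tuple -> log-probability


def _keep_best(table: Strings, key: Tuple[str, ...], score: float) -> None:
    # If a string is derivable in several ways, keep its best score.
    if key not in table or score > table[key]:
        table[key] = score


def expand(expr: Expr) -> Strings:
    """Enumerate the finite set of strings encoded by expr, with log-weights."""
    if isinstance(expr, Word):
        return {(expr.text,): 0.0}
    if isinstance(expr, Choice):
        out: Strings = {}
        for sub, p in expr.options:
            for tokens, lp in expand(sub).items():
                _keep_best(out, tokens, lp + math.log(p))
        return out
    if isinstance(expr, Seq):
        result: Strings = {(): 0.0}
        for part in expr.parts:
            nxt: Strings = {}
            for (a, la), (b, lb) in product(result.items(), expand(part).items()):
                _keep_best(nxt, a + b, la + lb)
            result = nxt
        return result
    raise TypeError(f"unsupported expression: {expr!r}")


def bigram_logprob(tokens, bigrams, unseen=-5.0):
    """Toy bigram language model: unseen bigrams get a fixed log-penalty."""
    padded = ("<s>",) + tokens + ("</s>",)
    return sum(bigrams.get(pair, unseen) for pair in zip(padded, padded[1:]))


def best_headline(expr, bigrams, lm_weight=1.0):
    """Pick the string maximising WIDL log-weight + lm_weight * LM log-prob."""
    scored = {
        tokens: lp + lm_weight * bigram_logprob(tokens, bigrams)
        for tokens, lp in expand(expr).items()
    }
    return max(scored.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    expr = Seq((
        Choice(((Word("council"), 0.7), (Word("city"), 0.3))),
        Choice(((Word("approves"), 0.5), (Word("passes"), 0.5))),
        Word("budget"),
    ))
    lm = {("<s>", "city"): -0.5, ("city", "passes"): -0.5,
          ("passes", "budget"): -0.5, ("budget", "</s>"): -0.5}
    print(best_headline(expr, lm))
```

In this toy run the document-side weights alone favour "council ...", but the language model prefers "city passes budget", so the combined score selects `('city', 'passes', 'budget')`. Setting `lm_weight=0.0` drops the language-model bias and the choice reverts to a "council" headline, which mirrors the interplay of the two bias sources described in the abstract.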
Theme
Automatic abstracting

Similar documents (content)

  1. Aker, A.; Gaizauskas, R.: Generating descriptive multi-document summaries of geo-located entities using entity type models (2015) 0.09
    
  2. Stede, M.: Lexicalization in natural language generation (2002) 0.09
    
  3. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.08
    
  4. Helbig, H.: Knowledge representation and the semantics of natural language (2014) 0.08
    
  5. Kalczynski, P.J.; Chou, A.: Temporal Document Retrieval Model for business news archives (2005) 0.07