Document (#40123)

Author
Finegan-Dollak, C.
Radev, D.R.
Title
Sentence simplification, compression, and disaggregation for summarization of sophisticated documents
Source
Journal of the Association for Information Science and Technology. 67(2016) no.10, p.2437-2453
Year
2016
Abstract
Sophisticated documents like legal cases and biomedical articles can contain unusually long sentences. Extractive summarizers can select such sentences, potentially adding hundreds of unnecessary words to the summary, or exclude them and lose important content. Sentence simplification or compression seems on the surface to be a promising solution. However, compression removes words before the selection algorithm can use them, and simplification generates sentences that may be ambiguous in an extractive summary. We therefore compare the performance of an extractive summarizer selecting from the sentences of the original document with that of the summarizer selecting from sentences shortened in three ways: simplification, compression, and disaggregation, which splits one sentence into several according to rules designed to keep all meaning. We find that on legal cases and biomedical articles, these shortening methods generate ungrammatical output. Human evaluators performed an extrinsic evaluation consisting of comprehension questions about the summaries. Evaluators given compressed, simplified, or disaggregated versions of the summaries answered fewer questions correctly than did those given summaries with unaltered sentences. Error analysis suggests 2 causes: Altered sentences sometimes interact with the sentence selection algorithm, and alterations to sentences sometimes obscure information in the summary. We discuss future work to alleviate these problems.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23576/full.
Theme
Automatic abstracting

Similar documents (author)

  1. Otterbacher, J.; Radev, D.: Exploring fact-focused relevance and novelty detection (2008) 4.57
    4.565969 = sum of:
      4.565969 = weight(author_txt:radev in 2210) [ClassicSimilarity], result of:
        4.565969 = fieldWeight in 2210, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.5 = fieldNorm(doc=2210)
    
  2. Radev, D.R.; Libner, K.; Fan, W.: Getting answers to natural language questions on the Web (2002) 3.42
    3.4244766 = sum of:
      3.4244766 = weight(author_txt:radev in 5204) [ClassicSimilarity], result of:
        3.4244766 = fieldWeight in 5204, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.375 = fieldNorm(doc=5204)
    
  3. Otterbacher, J.; Radev, D.; Kareem, O.: Hierarchical summarization for delivering information to mobile devices (2008) 3.42
    3.4244766 = sum of:
      3.4244766 = weight(author_txt:radev in 2071) [ClassicSimilarity], result of:
        3.4244766 = fieldWeight in 2071, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.375 = fieldNorm(doc=2071)
    
  4. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank : passage retrieval using random walks with question-based priors (2009) 3.42
    3.4244766 = sum of:
      3.4244766 = weight(author_txt:radev in 2450) [ClassicSimilarity], result of:
        3.4244766 = fieldWeight in 2450, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.375 = fieldNorm(doc=2450)
    
  5. Lam, W.; Chan, K.; Radev, D.; Saggion, H.; Teufel, S.: Context-based generic cross-lingual retrieval of documents and automated summaries (2005) 2.85
    2.8537307 = sum of:
      2.8537307 = weight(author_txt:radev in 1965) [ClassicSimilarity], result of:
        2.8537307 = fieldWeight in 1965, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.3125 = fieldNorm(doc=1965)
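
The author-match scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF formulation that the listed figures are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldWeight = tf * idf * fieldNorm. The function names are illustrative only and are not part of any library API; the formulas are inferred from the numbers in the listing, so small rounding differences are to be expected.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed formula, inferred from the listing:
    # idf(docFreq=12, maxDocs=44218) is reported as 9.131938.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq).
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# fieldNorm values copied from the listing (docs 2210, 5204, 1965).
for doc_id, norm in [(2210, 0.5), (5204, 0.375), (1965, 0.3125)]:
    weight = field_weight(1.0, 12, 44218, norm)
    print(doc_id, round(weight, 4))
```

Because every entry matches the same term once, tf and idf are identical across the five results; only fieldNorm (0.5, 0.375, 0.3125) separates their scores.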
    

Similar documents (content)

  1. Ling, X.; Jiang, J.; He, X.; Mei, Q.; Zhai, C.; Schatz, B.: Generating gene summaries from biomedical literature : a study of semi-structured summarization (2007) 0.27
    0.27054042 = sum of:
      0.27054042 = product of:
        0.9662157 = sum of:
          0.01714423 = weight(abstract_txt:given in 946) [ClassicSimilarity], result of:
            0.01714423 = score(doc=946,freq=1.0), product of:
              0.05834942 = queryWeight, product of:
                1.0111033 = boost
                4.701121 = idf(docFreq=1091, maxDocs=44218)
                0.012275511 = queryNorm
              0.29382005 = fieldWeight in 946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.701121 = idf(docFreq=1091, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.018046172 = weight(abstract_txt:articles in 946) [ClassicSimilarity], result of:
            0.018046172 = score(doc=946,freq=1.0), product of:
              0.06037836 = queryWeight, product of:
                1.0285323 = boost
                4.7821565 = idf(docFreq=1006, maxDocs=44218)
                0.012275511 = queryNorm
              0.29888478 = fieldWeight in 946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7821565 = idf(docFreq=1006, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.13388477 = weight(abstract_txt:biomedical in 946) [ClassicSimilarity], result of:
            0.13388477 = score(doc=946,freq=5.0), product of:
              0.13431422 = queryWeight, product of:
                1.5340456 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.012275511 = queryNorm
              0.9968026 = fieldWeight in 946, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.13414592 = weight(abstract_txt:summary in 946) [ClassicSimilarity], result of:
            0.13414592 = score(doc=946,freq=4.0), product of:
              0.16583897 = queryWeight, product of:
                2.0876908 = boost
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.012275511 = queryNorm
              0.80889255 = fieldWeight in 946, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.1217937 = weight(abstract_txt:summaries in 946) [ClassicSimilarity], result of:
            0.1217937 = score(doc=946,freq=2.0), product of:
              0.19591223 = queryWeight, product of:
                2.2691002 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.012275511 = queryNorm
              0.62167484 = fieldWeight in 946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.14967588 = weight(abstract_txt:sentence in 946) [ClassicSimilarity], result of:
            0.14967588 = score(doc=946,freq=2.0), product of:
              0.24739587 = queryWeight, product of:
                2.9443436 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012275511 = queryNorm
              0.60500556 = fieldWeight in 946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.39152506 = weight(abstract_txt:sentences in 946) [ClassicSimilarity], result of:
            0.39152506 = score(doc=946,freq=3.0), product of:
              0.5169444 = queryWeight, product of:
                6.019067 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.012275511 = queryNorm
              0.7573833 = fieldWeight in 946, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
        0.28 = coord(7/25)
    
  2. Bando, L.L.; Scholer, F.; Turpin, A.: Query-biased summary generation assisted by query expansion : temporality (2015) 0.26
    0.2647381 = sum of:
      0.2647381 = product of:
        1.1030755 = sum of:
          0.043839715 = weight(abstract_txt:words in 1820) [ClassicSimilarity], result of:
            0.043839715 = score(doc=1820,freq=3.0), product of:
              0.075653546 = queryWeight, product of:
                1.151309 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012275511 = queryNorm
              0.57948 = fieldWeight in 1820, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=1820)
          0.036297064 = weight(abstract_txt:selection in 1820) [ClassicSimilarity], result of:
            0.036297064 = score(doc=1820,freq=2.0), product of:
              0.076359354 = queryWeight, product of:
                1.1566671 = boost
                5.377919 = idf(docFreq=554, maxDocs=44218)
                0.012275511 = queryNorm
              0.47534537 = fieldWeight in 1820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.377919 = idf(docFreq=554, maxDocs=44218)
                0.0625 = fieldNorm(doc=1820)
          0.09485548 = weight(abstract_txt:summary in 1820) [ClassicSimilarity], result of:
            0.09485548 = score(doc=1820,freq=2.0), product of:
              0.16583897 = queryWeight, product of:
                2.0876908 = boost
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.012275511 = queryNorm
              0.5719734 = fieldWeight in 1820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.0625 = fieldNorm(doc=1820)
          0.2109529 = weight(abstract_txt:summaries in 1820) [ClassicSimilarity], result of:
            0.2109529 = score(doc=1820,freq=6.0), product of:
              0.19591223 = queryWeight, product of:
                2.2691002 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.012275511 = queryNorm
              1.0767725 = fieldWeight in 1820, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=1820)
          0.21167366 = weight(abstract_txt:sentence in 1820) [ClassicSimilarity], result of:
            0.21167366 = score(doc=1820,freq=4.0), product of:
              0.24739587 = queryWeight, product of:
                2.9443436 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012275511 = queryNorm
              0.8556071 = fieldWeight in 1820, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=1820)
          0.5054567 = weight(abstract_txt:sentences in 1820) [ClassicSimilarity], result of:
            0.5054567 = score(doc=1820,freq=5.0), product of:
              0.5169444 = queryWeight, product of:
                6.019067 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.012275511 = queryNorm
              0.9777776 = fieldWeight in 1820, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=1820)
        0.24 = coord(6/25)
    
  3. Aker, A.; Gaizauskas, R.: Generating descriptive multi-document summaries of geo-located entities using entity type models (2015) 0.21
    0.20706333 = sum of:
      0.20706333 = product of:
        0.8627639 = sum of:
          0.035794977 = weight(abstract_txt:words in 1726) [ClassicSimilarity], result of:
            0.035794977 = score(doc=1726,freq=2.0), product of:
              0.075653546 = queryWeight, product of:
                1.151309 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012275511 = queryNorm
              0.47314343 = fieldWeight in 1726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.22342584 = weight(abstract_txt:summarizer in 1726) [ClassicSimilarity], result of:
            0.22342584 = score(doc=1726,freq=3.0), product of:
              0.2240473 = queryWeight, product of:
                1.9812858 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012275511 = queryNorm
              0.9972262 = fieldWeight in 1726, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.06707296 = weight(abstract_txt:summary in 1726) [ClassicSimilarity], result of:
            0.06707296 = score(doc=1726,freq=1.0), product of:
              0.16583897 = queryWeight, product of:
                2.0876908 = boost
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.012275511 = queryNorm
              0.40444627 = fieldWeight in 1726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.14916621 = weight(abstract_txt:summaries in 1726) [ClassicSimilarity], result of:
            0.14916621 = score(doc=1726,freq=3.0), product of:
              0.19591223 = queryWeight, product of:
                2.2691002 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.012275511 = queryNorm
              0.7613931 = fieldWeight in 1726, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.10583683 = weight(abstract_txt:sentence in 1726) [ClassicSimilarity], result of:
            0.10583683 = score(doc=1726,freq=1.0), product of:
              0.24739587 = queryWeight, product of:
                2.9443436 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012275511 = queryNorm
              0.42780355 = fieldWeight in 1726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.2814671 = weight(abstract_txt:extractive in 1726) [ClassicSimilarity], result of:
            0.2814671 = score(doc=1726,freq=2.0), product of:
              0.34244967 = queryWeight, product of:
                3.0 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.012275511 = queryNorm
              0.82192254 = fieldWeight in 1726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
        0.24 = coord(6/25)
    
  4. Ye, S.; Chua, T.-S.; Kan, M.-Y.; Qiu, L.: Document concept lattice for text understanding and summarization (2007) 0.19
    0.18856941 = sum of:
      0.18856941 = product of:
        0.7857059 = sum of:
          0.01714423 = weight(abstract_txt:given in 941) [ClassicSimilarity], result of:
            0.01714423 = score(doc=941,freq=1.0), product of:
              0.05834942 = queryWeight, product of:
                1.0111033 = boost
                4.701121 = idf(docFreq=1091, maxDocs=44218)
                0.012275511 = queryNorm
              0.29382005 = fieldWeight in 941, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.701121 = idf(docFreq=1091, maxDocs=44218)
                0.0625 = fieldNorm(doc=941)
          0.025665902 = weight(abstract_txt:selection in 941) [ClassicSimilarity], result of:
            0.025665902 = score(doc=941,freq=1.0), product of:
              0.076359354 = queryWeight, product of:
                1.1566671 = boost
                5.377919 = idf(docFreq=554, maxDocs=44218)
                0.012275511 = queryNorm
              0.33611995 = fieldWeight in 941, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.377919 = idf(docFreq=554, maxDocs=44218)
                0.0625 = fieldNorm(doc=941)
          0.045632865 = weight(abstract_txt:selecting in 941) [ClassicSimilarity], result of:
            0.045632865 = score(doc=941,freq=1.0), product of:
              0.11206665 = queryWeight, product of:
                1.4012494 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.012275511 = queryNorm
              0.407194 = fieldWeight in 941, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=941)
          0.12899497 = weight(abstract_txt:summarizer in 941) [ClassicSimilarity], result of:
            0.12899497 = score(doc=941,freq=1.0), product of:
              0.2240473 = queryWeight, product of:
                1.9812858 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.012275511 = queryNorm
              0.5757488 = fieldWeight in 941, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=941)
          0.11617376 = weight(abstract_txt:summary in 941) [ClassicSimilarity], result of:
            0.11617376 = score(doc=941,freq=3.0), product of:
              0.16583897 = queryWeight, product of:
                2.0876908 = boost
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.012275511 = queryNorm
              0.70052147 = fieldWeight in 941, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.0625 = fieldNorm(doc=941)
          0.4520942 = weight(abstract_txt:sentences in 941) [ClassicSimilarity], result of:
            0.4520942 = score(doc=941,freq=4.0), product of:
              0.5169444 = queryWeight, product of:
                6.019067 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.012275511 = queryNorm
              0.8745509 = fieldWeight in 941, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=941)
        0.24 = coord(6/25)
    
  5. Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.18
    0.18084303 = sum of:
      0.18084303 = product of:
        0.7535126 = sum of:
          0.025310872 = weight(abstract_txt:words in 948) [ClassicSimilarity], result of:
            0.025310872 = score(doc=948,freq=1.0), product of:
              0.075653546 = queryWeight, product of:
                1.151309 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012275511 = queryNorm
              0.33456293 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=948)
          0.06707296 = weight(abstract_txt:summary in 948) [ClassicSimilarity], result of:
            0.06707296 = score(doc=948,freq=1.0), product of:
              0.16583897 = queryWeight, product of:
                2.0876908 = boost
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.012275511 = queryNorm
              0.40444627 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.0625 = fieldNorm(doc=948)
          0.14916621 = weight(abstract_txt:summaries in 948) [ClassicSimilarity], result of:
            0.14916621 = score(doc=948,freq=3.0), product of:
              0.19591223 = queryWeight, product of:
                2.2691002 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.012275511 = queryNorm
              0.7613931 = fieldWeight in 948, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=948)
          0.10583683 = weight(abstract_txt:sentence in 948) [ClassicSimilarity], result of:
            0.10583683 = score(doc=948,freq=1.0), product of:
              0.24739587 = queryWeight, product of:
                2.9443436 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012275511 = queryNorm
              0.42780355 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=948)
          0.1990273 = weight(abstract_txt:extractive in 948) [ClassicSimilarity], result of:
            0.1990273 = score(doc=948,freq=1.0), product of:
              0.34244967 = queryWeight, product of:
                3.0 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.012275511 = queryNorm
              0.581187 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=948)
          0.20709851 = weight(abstract_txt:simplification in 948) [ClassicSimilarity], result of:
            0.20709851 = score(doc=948,freq=1.0), product of:
              0.38703704 = queryWeight, product of:
                3.6827185 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.012275511 = queryNorm
              0.53508705 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.0625 = fieldNorm(doc=948)
        0.24 = coord(6/25)
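
The content-match scores combine more factors than the author matches. As a rough sketch, assuming the same classic TF-IDF formulation as the figures suggest, each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm; the per-term contributions are summed and multiplied by coord (matched terms / query terms). The names below are hypothetical helpers, and the input values are copied from the listing for doc 946.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.012275511  # queryNorm as reported in the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=946) as reported in the listing


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula, inferred from the listed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost):
    # score = queryWeight * fieldWeight for a single query term.
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq, docFreq, boost) copied from the listing for doc 946.
TERMS_DOC_946 = [
    ("given", 1.0, 1091, 1.0111033),
    ("articles", 1.0, 1006, 1.0285323),
    ("biomedical", 5.0, 95, 1.5340456),
    ("summary", 4.0, 185, 2.0876908),
    ("summaries", 2.0, 105, 2.2691002),
    ("sentence", 2.0, 127, 2.9443436),
    ("sentences", 3.0, 109, 6.019067),
]


def document_score(terms, matched, total_clauses):
    # coord rewards documents that match more of the query terms.
    coord = matched / total_clauses
    return coord * sum(term_score(f, df, b) for _, f, df, b in terms)


score_946 = document_score(TERMS_DOC_946, matched=7, total_clauses=25)
print(round(score_946, 4))  # the listing reports 0.27054042 for doc 946
```

The coord factor explains why the first result ranks highest even though its raw term sum (0.9662157) is lower than that of the second result (1.1030755): it matches 7 of the 25 query terms rather than 6.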