Document (#40124)

Author
Finegan-Dollak, C.
Radev, D.R.
Title
Sentence simplification, compression, and disaggregation for summarization of sophisticated documents
Source
Journal of the Association for Information Science and Technology. 67(2016) no.10, pp.2437-2453
Year
2016
Abstract
Sophisticated documents like legal cases and biomedical articles can contain unusually long sentences. Extractive summarizers can select such sentences, potentially adding hundreds of unnecessary words to the summary, or exclude them and lose important content. Sentence simplification or compression seems on the surface to be a promising solution. However, compression removes words before the selection algorithm can use them, and simplification generates sentences that may be ambiguous in an extractive summary. We therefore compare the performance of an extractive summarizer selecting from the sentences of the original document with that of the summarizer selecting from sentences shortened in three ways: simplification, compression, and disaggregation, which splits one sentence into several according to rules designed to keep all meaning. We find that on legal cases and biomedical articles, these shortening methods generate ungrammatical output. Human evaluators performed an extrinsic evaluation consisting of comprehension questions about the summaries. Evaluators given compressed, simplified, or disaggregated versions of the summaries answered fewer questions correctly than did those given summaries with unaltered sentences. Error analysis suggests two causes: altered sentences sometimes interact with the sentence selection algorithm, and alterations to sentences sometimes obscure information in the summary. We discuss future work to alleviate these problems.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23576/full.
Theme
Automatic abstracting

Similar documents (author)

  1. Otterbacher, J.; Radev, D.: Exploring fact-focused relevance and novelty detection (2008) 4.55
    4.5489707 = sum of:
      4.5489707 = weight(author_txt:radev in 4211) [ClassicSimilarity], result of:
        4.5489707 = fieldWeight in 4211, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.5 = fieldNorm(doc=4211)
    
  2. Radev, D.R.; Libner, K.; Fan, W.: Getting answers to natural language questions on the Web (2002) 3.41
    3.411728 = sum of:
      3.411728 = weight(author_txt:radev in 205) [ClassicSimilarity], result of:
        3.411728 = fieldWeight in 205, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.375 = fieldNorm(doc=205)
    
  3. Otterbacher, J.; Radev, D.; Kareem, O.: Hierarchical summarization for delivering information to mobile devices (2008) 3.41
    3.411728 = sum of:
      3.411728 = weight(author_txt:radev in 4072) [ClassicSimilarity], result of:
        3.411728 = fieldWeight in 4072, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.375 = fieldNorm(doc=4072)
    
  4. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank : passage retrieval using random walks with question-based priors (2009) 3.41
    3.411728 = sum of:
      3.411728 = weight(author_txt:radev in 4451) [ClassicSimilarity], result of:
        3.411728 = fieldWeight in 4451, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.375 = fieldNorm(doc=4451)
    
  5. Lam, W.; Chan, K.; Radev, D.; Saggion, H.; Teufel, S.: Context-based generic cross-lingual retrieval of documents and automated summaries (2005) 2.84
    2.8431067 = sum of:
      2.8431067 = weight(author_txt:radev in 2966) [ClassicSimilarity], result of:
        2.8431067 = fieldWeight in 2966, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.097941 = idf(docFreq=12, maxDocs=42740)
          0.3125 = fieldNorm(doc=2966)
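
The author scores above can be reproduced by hand. The following is a minimal sketch, not the search engine's own code: it assumes that tf is the square root of the term frequency and that idf is 1 + ln(maxDocs / (docFreq + 1)), which is the form that matches the idf values printed in this record. The function names are hypothetical and chosen only for illustration.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed tf: square root of the raw term frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf: matches the values printed in this record,
    # e.g. idf(docFreq=12, maxDocs=42740) is about 9.097941.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# The five author matches differ only in fieldNorm (0.5, 0.375, 0.3125),
# which is why their scores differ although tf and idf are identical.
author_scores = [field_weight(1.0, 12, 42740, norm) for norm in (0.5, 0.375, 0.3125)]
for norm, score in zip((0.5, 0.375, 0.3125), author_scores):
    print(f"fieldNorm={norm}: score={score:.4f}")
```

Because tf and idf are the same for every match on `author_txt:radev`, the ranking among these five records is decided by fieldNorm alone.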
    

Similar documents (content)

  1. Ling, X.; Jiang, J.; He, X.; Mei, Q.; Zhai, C.; Schatz, B.: Generating gene summaries from biomedical literature : a study of semi-structured summarization (2007) 0.27
    0.27089414 = sum of:
      0.27089414 = product of:
        0.96747905 = sum of:
          0.017079739 = weight(abstract_txt:given in 2947) [ClassicSimilarity], result of:
            0.017079739 = score(doc=2947,freq=1.0), product of:
              0.05811235 = queryWeight, product of:
                1.0151204 = boost
                4.702543 = idf(docFreq=1053, maxDocs=42740)
                0.012173575 = queryNorm
              0.29390892 = fieldWeight in 2947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.702543 = idf(docFreq=1053, maxDocs=42740)
                0.0625 = fieldNorm(doc=2947)
          0.018406397 = weight(abstract_txt:articles in 2947) [ClassicSimilarity], result of:
            0.018406397 = score(doc=2947,freq=1.0), product of:
              0.06108391 = queryWeight, product of:
                1.0407507 = boost
                4.821275 = idf(docFreq=935, maxDocs=42740)
                0.012173575 = queryNorm
              0.3013297 = fieldWeight in 2947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.821275 = idf(docFreq=935, maxDocs=42740)
                0.0625 = fieldNorm(doc=2947)
          0.136905 = weight(abstract_txt:biomedical in 2947) [ClassicSimilarity], result of:
            0.136905 = score(doc=2947,freq=5.0), product of:
              0.13611433 = queryWeight, product of:
                1.5535858 = boost
                7.1969824 = idf(docFreq=86, maxDocs=42740)
                0.012173575 = queryNorm
              1.0058088 = fieldWeight in 2947, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.1969824 = idf(docFreq=86, maxDocs=42740)
                0.0625 = fieldNorm(doc=2947)
          0.13344543 = weight(abstract_txt:summary in 2947) [ClassicSimilarity], result of:
            0.13344543 = score(doc=2947,freq=4.0), product of:
              0.16500376 = queryWeight, product of:
                2.0949607 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.012173575 = queryNorm
              0.8087417 = fieldWeight in 2947, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0625 = fieldNorm(doc=2947)
          0.12045424 = weight(abstract_txt:summaries in 2947) [ClassicSimilarity], result of:
            0.12045424 = score(doc=2947,freq=2.0), product of:
              0.1941703 = queryWeight, product of:
                2.272586 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.012173575 = queryNorm
              0.6203536 = fieldWeight in 2947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.0625 = fieldNorm(doc=2947)
          0.15098074 = weight(abstract_txt:sentence in 2947) [ClassicSimilarity], result of:
            0.15098074 = score(doc=2947,freq=2.0), product of:
              0.2484441 = queryWeight, product of:
                2.9683332 = boost
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.012173575 = queryNorm
              0.6077051 = fieldWeight in 2947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.0625 = fieldNorm(doc=2947)
          0.39020747 = weight(abstract_txt:sentences in 2947) [ClassicSimilarity], result of:
            0.39020747 = score(doc=2947,freq=3.0), product of:
              0.5149807 = queryWeight, product of:
                6.0437818 = boost
                6.9994516 = idf(docFreq=105, maxDocs=42740)
                0.012173575 = queryNorm
              0.75771284 = fieldWeight in 2947, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9994516 = idf(docFreq=105, maxDocs=42740)
                0.0625 = fieldNorm(doc=2947)
        0.28 = coord(7/25)
    
  2. Bando, L.L.; Scholer, F.; Turpin, A.: Query-biased summary generation assisted by query expansion : temporality (2015) 0.26
    0.26399848 = sum of:
      0.26399848 = product of:
        1.0999937 = sum of:
          0.043768246 = weight(abstract_txt:words in 3821) [ClassicSimilarity], result of:
            0.043768246 = score(doc=3821,freq=3.0), product of:
              0.0754536 = queryWeight, product of:
                1.1567068 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.012173575 = queryNorm
              0.58006835 = fieldWeight in 3821, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=3821)
          0.035957746 = weight(abstract_txt:selection in 3821) [ClassicSimilarity], result of:
            0.035957746 = score(doc=3821,freq=2.0), product of:
              0.07576453 = queryWeight, product of:
                1.1590877 = boost
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.012173575 = queryNorm
              0.47459868 = fieldWeight in 3821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.0625 = fieldNorm(doc=3821)
          0.094360165 = weight(abstract_txt:summary in 3821) [ClassicSimilarity], result of:
            0.094360165 = score(doc=3821,freq=2.0), product of:
              0.16500376 = queryWeight, product of:
                2.0949607 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.012173575 = queryNorm
              0.57186675 = fieldWeight in 3821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0625 = fieldNorm(doc=3821)
          0.20863287 = weight(abstract_txt:summaries in 3821) [ClassicSimilarity], result of:
            0.20863287 = score(doc=3821,freq=6.0), product of:
              0.1941703 = queryWeight, product of:
                2.272586 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.012173575 = queryNorm
              1.074484 = fieldWeight in 3821, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.0625 = fieldNorm(doc=3821)
          0.21351902 = weight(abstract_txt:sentence in 3821) [ClassicSimilarity], result of:
            0.21351902 = score(doc=3821,freq=4.0), product of:
              0.2484441 = queryWeight, product of:
                2.9683332 = boost
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.012173575 = queryNorm
              0.8594248 = fieldWeight in 3821, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.0625 = fieldNorm(doc=3821)
          0.5037557 = weight(abstract_txt:sentences in 3821) [ClassicSimilarity], result of:
            0.5037557 = score(doc=3821,freq=5.0), product of:
              0.5149807 = queryWeight, product of:
                6.0437818 = boost
                6.9994516 = idf(docFreq=105, maxDocs=42740)
                0.012173575 = queryNorm
              0.9782031 = fieldWeight in 3821, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.9994516 = idf(docFreq=105, maxDocs=42740)
                0.0625 = fieldNorm(doc=3821)
        0.24 = coord(6/25)
    
  3. Aker, A.; Gaizauskas, R.: Generating descriptive multi-document summaries of geo-located entities using entity type models (2015) 0.21
    0.20931959 = sum of:
      0.20931959 = product of:
        0.87216496 = sum of:
          0.035736624 = weight(abstract_txt:words in 3727) [ClassicSimilarity], result of:
            0.035736624 = score(doc=3727,freq=2.0), product of:
              0.0754536 = queryWeight, product of:
                1.1567068 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.012173575 = queryNorm
              0.4736238 = fieldWeight in 3727, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=3727)
          0.21993053 = weight(abstract_txt:summarizer in 3727) [ClassicSimilarity], result of:
            0.21993053 = score(doc=3727,freq=3.0), product of:
              0.2213592 = queryWeight, product of:
                1.9812171 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.012173575 = queryNorm
              0.99354595 = fieldWeight in 3727, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=3727)
          0.06672271 = weight(abstract_txt:summary in 3727) [ClassicSimilarity], result of:
            0.06672271 = score(doc=3727,freq=1.0), product of:
              0.16500376 = queryWeight, product of:
                2.0949607 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.012173575 = queryNorm
              0.40437084 = fieldWeight in 3727, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0625 = fieldNorm(doc=3727)
          0.14752571 = weight(abstract_txt:summaries in 3727) [ClassicSimilarity], result of:
            0.14752571 = score(doc=3727,freq=3.0), product of:
              0.1941703 = queryWeight, product of:
                2.272586 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.012173575 = queryNorm
              0.75977486 = fieldWeight in 3727, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.0625 = fieldNorm(doc=3727)
          0.10675951 = weight(abstract_txt:sentence in 3727) [ClassicSimilarity], result of:
            0.10675951 = score(doc=3727,freq=1.0), product of:
              0.2484441 = queryWeight, product of:
                2.9683332 = boost
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.012173575 = queryNorm
              0.4297124 = fieldWeight in 3727, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.0625 = fieldNorm(doc=3727)
          0.29548994 = weight(abstract_txt:extractive in 3727) [ClassicSimilarity], result of:
            0.29548994 = score(doc=3727,freq=2.0), product of:
              0.35318035 = queryWeight, product of:
                3.064977 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.012173575 = queryNorm
              0.83665454 = fieldWeight in 3727, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=3727)
        0.24 = coord(6/25)
    
  4. Ye, S.; Chua, T.-S.; Kan, M.-Y.; Qiu, L.: Document concept lattice for text understanding and summarization (2007) 0.19
    0.18760043 = sum of:
      0.18760043 = product of:
        0.7816685 = sum of:
          0.017079739 = weight(abstract_txt:given in 2942) [ClassicSimilarity], result of:
            0.017079739 = score(doc=2942,freq=1.0), product of:
              0.05811235 = queryWeight, product of:
                1.0151204 = boost
                4.702543 = idf(docFreq=1053, maxDocs=42740)
                0.012173575 = queryNorm
              0.29390892 = fieldWeight in 2942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.702543 = idf(docFreq=1053, maxDocs=42740)
                0.0625 = fieldNorm(doc=2942)
          0.025425965 = weight(abstract_txt:selection in 2942) [ClassicSimilarity], result of:
            0.025425965 = score(doc=2942,freq=1.0), product of:
              0.07576453 = queryWeight, product of:
                1.1590877 = boost
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.012173575 = queryNorm
              0.33559194 = fieldWeight in 2942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.0625 = fieldNorm(doc=2942)
          0.046045937 = weight(abstract_txt:selecting in 2942) [ClassicSimilarity], result of:
            0.046045937 = score(doc=2942,freq=1.0), product of:
              0.11256633 = queryWeight, product of:
                1.4128224 = boost
                6.5448966 = idf(docFreq=166, maxDocs=42740)
                0.012173575 = queryNorm
              0.40905604 = fieldWeight in 2942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5448966 = idf(docFreq=166, maxDocs=42740)
                0.0625 = fieldNorm(doc=2942)
          0.12697695 = weight(abstract_txt:summarizer in 2942) [ClassicSimilarity], result of:
            0.12697695 = score(doc=2942,freq=1.0), product of:
              0.2213592 = queryWeight, product of:
                1.9812171 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.012173575 = queryNorm
              0.573624 = fieldWeight in 2942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=2942)
          0.11556712 = weight(abstract_txt:summary in 2942) [ClassicSimilarity], result of:
            0.11556712 = score(doc=2942,freq=3.0), product of:
              0.16500376 = queryWeight, product of:
                2.0949607 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.012173575 = queryNorm
              0.7003908 = fieldWeight in 2942, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0625 = fieldNorm(doc=2942)
          0.4505728 = weight(abstract_txt:sentences in 2942) [ClassicSimilarity], result of:
            0.4505728 = score(doc=2942,freq=4.0), product of:
              0.5149807 = queryWeight, product of:
                6.0437818 = boost
                6.9994516 = idf(docFreq=105, maxDocs=42740)
                0.012173575 = queryNorm
              0.87493145 = fieldWeight in 2942, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.9994516 = idf(docFreq=105, maxDocs=42740)
                0.0625 = fieldNorm(doc=2942)
        0.24 = coord(6/25)
    
  5. Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.18
    0.18290615 = sum of:
      0.18290615 = product of:
        0.762109 = sum of:
          0.025269609 = weight(abstract_txt:words in 2949) [ClassicSimilarity], result of:
            0.025269609 = score(doc=2949,freq=1.0), product of:
              0.0754536 = queryWeight, product of:
                1.1567068 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.012173575 = queryNorm
              0.3349026 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.06672271 = weight(abstract_txt:summary in 2949) [ClassicSimilarity], result of:
            0.06672271 = score(doc=2949,freq=1.0), product of:
              0.16500376 = queryWeight, product of:
                2.0949607 = boost
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.012173575 = queryNorm
              0.40437084 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4699335 = idf(docFreq=179, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.14752571 = weight(abstract_txt:summaries in 2949) [ClassicSimilarity], result of:
            0.14752571 = score(doc=2949,freq=3.0), product of:
              0.1941703 = queryWeight, product of:
                2.272586 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.012173575 = queryNorm
              0.75977486 = fieldWeight in 2949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.10675951 = weight(abstract_txt:sentence in 2949) [ClassicSimilarity], result of:
            0.10675951 = score(doc=2949,freq=1.0), product of:
              0.2484441 = queryWeight, product of:
                2.9683332 = boost
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.012173575 = queryNorm
              0.4297124 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8753986 = idf(docFreq=119, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.20894295 = weight(abstract_txt:extractive in 2949) [ClassicSimilarity], result of:
            0.20894295 = score(doc=2949,freq=1.0), product of:
              0.35318035 = queryWeight, product of:
                3.064977 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.012173575 = queryNorm
              0.5916041 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.20688848 = weight(abstract_txt:simplification in 2949) [ClassicSimilarity], result of:
            0.20688848 = score(doc=2949,freq=1.0), product of:
              0.38617295 = queryWeight, product of:
                3.7007456 = boost
                8.571848 = idf(docFreq=21, maxDocs=42740)
                0.012173575 = queryNorm
              0.5357405 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.571848 = idf(docFreq=21, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
        0.24 = coord(6/25)
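
The content scores combine one more layer than the author scores: each term score is queryWeight times fieldWeight, the term scores are summed, and the sum is scaled by the coord factor (matched query terms over total query terms). The sketch below recomputes the score of the first content match (doc 2947) from the boost, idf, queryNorm, and fieldNorm values printed above. It is an illustration under those assumptions, and the names `term_score` and `DOC_2947_TERMS` are hypothetical.

```python
import math

QUERY_NORM = 0.012173575  # queryNorm as printed in the explain output
FIELD_NORM = 0.0625       # fieldNorm(doc=2947)

# (term, freq, boost, idf) copied from the explain tree for doc 2947.
DOC_2947_TERMS = [
    ("given", 1.0, 1.0151204, 4.702543),
    ("articles", 1.0, 1.0407507, 4.821275),
    ("biomedical", 5.0, 1.5535858, 7.1969824),
    ("summary", 4.0, 2.0949607, 6.4699335),
    ("summaries", 2.0, 2.272586, 7.0185),
    ("sentence", 2.0, 2.9683332, 6.8753986),
    ("sentences", 3.0, 6.0437818, 6.9994516),
]


def term_score(freq: float, boost: float, idf: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


raw_sum = sum(term_score(freq, boost, idf) for _, freq, boost, idf in DOC_2947_TERMS)
coord = len(DOC_2947_TERMS) / 25  # 7 of 25 query terms matched
total = raw_sum * coord
print(f"sum={raw_sum:.6f} coord={coord:.2f} score={total:.6f}")
```

Note that idf enters each term score twice, once through queryWeight and once through fieldWeight, so rare terms such as "extractive" or "simplification" dominate the sums in the lower-ranked matches.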