Document (#38728)

Author
Aker, A.
Gaizauskas, R.
Title
Generating descriptive multi-document summaries of geo-located entities using entity type models
Source
Journal of the Association for Information Science and Technology. 66(2015) no.4, pp. 721-738
Year
2015
Abstract
In this article, we investigate the application of entity type models in extractive multi-document summarization using automatic caption generation for images of geo-located entities (e.g., Westminster Abbey) as an application scenario. Entity type models contain sets of patterns aiming to capture the ways geo-located entities are described in natural language. They are automatically derived from texts about geo-located entities of the same type (e.g., churches, lakes). We integrate entity type models into a multi-document summarizer and use them to address the 2 major tasks in extractive multi-document summarization: sentence scoring and summary composition. We experiment with 3 different representation methods for entity type models: signature words, n-gram language models, and dependency patterns. We evaluate the summarizer with integrated entity type models relative to (a) a summarizer using standard text-related features commonly used in text summarization and (b) the Wikipedia location descriptions. Our results show that entity type models significantly improve the quality of output summaries over that of summaries generated using standard summarization features and Wikipedia summaries. The representation of entity type models using dependency patterns is superior to the representations using signature words and n-gram language models.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23211/abstract.

Similar documents (content)

  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.28
    0.2838199 = sum of:
      0.2838199 = product of:
        0.88693726 = sum of:
          0.02528605 = weight(abstract_txt:representation in 1564) [ClassicSimilarity], result of:
            0.02528605 = score(doc=1564,freq=1.0), product of:
              0.06554424 = queryWeight, product of:
                1.1287782 = boost
                4.9380608 = idf(docFreq=858, maxDocs=44083)
                0.011758975 = queryNorm
              0.385786 = fieldWeight in 1564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9380608 = idf(docFreq=858, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
          0.03221943 = weight(abstract_txt:words in 1564) [ClassicSimilarity], result of:
            0.03221943 = score(doc=1564,freq=1.0), product of:
              0.07703577 = queryWeight, product of:
                1.2237356 = boost
                5.3534703 = idf(docFreq=566, maxDocs=44083)
                0.011758975 = queryNorm
              0.41823986 = fieldWeight in 1564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3534703 = idf(docFreq=566, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
          0.023061669 = weight(abstract_txt:language in 1564) [ClassicSimilarity], result of:
            0.023061669 = score(doc=1564,freq=1.0), product of:
              0.07056209 = queryWeight, product of:
                1.434408 = boost
                4.1833987 = idf(docFreq=1826, maxDocs=44083)
                0.011758975 = queryNorm
              0.32682803 = fieldWeight in 1564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1833987 = idf(docFreq=1826, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
          0.04696044 = weight(abstract_txt:document in 1564) [ClassicSimilarity], result of:
            0.04696044 = score(doc=1564,freq=2.0), product of:
              0.09903042 = queryWeight, product of:
                1.9621882 = boost
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.011758975 = queryNorm
              0.47420216 = fieldWeight in 1564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
          0.026217284 = weight(abstract_txt:using in 1564) [ClassicSimilarity], result of:
            0.026217284 = score(doc=1564,freq=1.0), product of:
              0.09683806 = queryWeight, product of:
                2.37643 = boost
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.011758975 = queryNorm
              0.27073327 = fieldWeight in 1564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
          0.21679494 = weight(abstract_txt:multi in 1564) [ClassicSimilarity], result of:
            0.21679494 = score(doc=1564,freq=6.0), product of:
              0.19037336 = queryWeight, product of:
                2.7205672 = boost
                5.950826 = idf(docFreq=311, maxDocs=44083)
                0.011758975 = queryNorm
              1.1387882 = fieldWeight in 1564, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.950826 = idf(docFreq=311, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
          0.2527788 = weight(abstract_txt:summaries in 1564) [ClassicSimilarity], result of:
            0.2527788 = score(doc=1564,freq=3.0), product of:
              0.2657116 = queryWeight, product of:
                3.2141166 = boost
                7.0303903 = idf(docFreq=105, maxDocs=44083)
                0.011758975 = queryNorm
              0.9513276 = fieldWeight in 1564, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0303903 = idf(docFreq=105, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
          0.26361862 = weight(abstract_txt:summarization in 1564) [ClassicSimilarity], result of:
            0.26361862 = score(doc=1564,freq=3.0), product of:
              0.2732546 = queryWeight, product of:
                3.2594185 = boost
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.011758975 = queryNorm
              0.9647362 = fieldWeight in 1564, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.078125 = fieldNorm(doc=1564)
        0.32 = coord(8/25)
    
  2. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.22
    0.22275531 = sum of:
      0.22275531 = product of:
        0.7955547 = sum of:
          0.08260143 = weight(abstract_txt:wikipedia in 3694) [ClassicSimilarity], result of:
            0.08260143 = score(doc=3694,freq=4.0), product of:
              0.10548537 = queryWeight, product of:
                1.4319818 = boost
                6.264484 = idf(docFreq=227, maxDocs=44083)
                0.011758975 = queryNorm
              0.7830605 = fieldWeight in 3694, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.264484 = idf(docFreq=227, maxDocs=44083)
                0.0625 = fieldNorm(doc=3694)
          0.026564835 = weight(abstract_txt:document in 3694) [ClassicSimilarity], result of:
            0.026564835 = score(doc=3694,freq=1.0), product of:
              0.09903042 = queryWeight, product of:
                1.9621882 = boost
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.011758975 = queryNorm
              0.26824924 = fieldWeight in 3694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.0625 = fieldNorm(doc=3694)
          0.03632773 = weight(abstract_txt:using in 3694) [ClassicSimilarity], result of:
            0.03632773 = score(doc=3694,freq=3.0), product of:
              0.09683806 = queryWeight, product of:
                2.37643 = boost
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.011758975 = queryNorm
              0.375139 = fieldWeight in 3694, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.0625 = fieldNorm(doc=3694)
          0.070804924 = weight(abstract_txt:multi in 3694) [ClassicSimilarity], result of:
            0.070804924 = score(doc=3694,freq=1.0), product of:
              0.19037336 = queryWeight, product of:
                2.7205672 = boost
                5.950826 = idf(docFreq=311, maxDocs=44083)
                0.011758975 = queryNorm
              0.37192664 = fieldWeight in 3694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.950826 = idf(docFreq=311, maxDocs=44083)
                0.0625 = fieldNorm(doc=3694)
          0.19679742 = weight(abstract_txt:summarizer in 3694) [ClassicSimilarity], result of:
            0.19679742 = score(doc=3694,freq=1.0), product of:
              0.34192476 = queryWeight, product of:
                3.1575646 = boost
                9.208922 = idf(docFreq=11, maxDocs=44083)
                0.011758975 = queryNorm
              0.57555765 = fieldWeight in 3694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.208922 = idf(docFreq=11, maxDocs=44083)
                0.0625 = fieldNorm(doc=3694)
          0.29825044 = weight(abstract_txt:summarization in 3694) [ClassicSimilarity], result of:
            0.29825044 = score(doc=3694,freq=6.0), product of:
              0.2732546 = queryWeight, product of:
                3.2594185 = boost
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.011758975 = queryNorm
              1.0914745 = fieldWeight in 3694, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.0625 = fieldNorm(doc=3694)
          0.08420794 = weight(abstract_txt:models in 3694) [ClassicSimilarity], result of:
            0.08420794 = score(doc=3694,freq=1.0), product of:
              0.29003197 = queryWeight, product of:
                5.3094473 = boost
                4.645443 = idf(docFreq=1150, maxDocs=44083)
                0.011758975 = queryNorm
              0.2903402 = fieldWeight in 3694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.645443 = idf(docFreq=1150, maxDocs=44083)
                0.0625 = fieldNorm(doc=3694)
        0.28 = coord(7/25)
    
  3. Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.19
    0.19381307 = sum of:
      0.19381307 = product of:
        0.6921895 = sum of:
          0.025775544 = weight(abstract_txt:words in 1949) [ClassicSimilarity], result of:
            0.025775544 = score(doc=1949,freq=1.0), product of:
              0.07703577 = queryWeight, product of:
                1.2237356 = boost
                5.3534703 = idf(docFreq=566, maxDocs=44083)
                0.011758975 = queryNorm
              0.3345919 = fieldWeight in 1949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3534703 = idf(docFreq=566, maxDocs=44083)
                0.0625 = fieldNorm(doc=1949)
          0.026564835 = weight(abstract_txt:document in 1949) [ClassicSimilarity], result of:
            0.026564835 = score(doc=1949,freq=1.0), product of:
              0.09903042 = queryWeight, product of:
                1.9621882 = boost
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.011758975 = queryNorm
              0.26824924 = fieldWeight in 1949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.0625 = fieldNorm(doc=1949)
          0.13495247 = weight(abstract_txt:extractive in 1949) [ClassicSimilarity], result of:
            0.13495247 = score(doc=1949,freq=1.0), product of:
              0.23227784 = queryWeight, product of:
                2.124933 = boost
                9.295935 = idf(docFreq=10, maxDocs=44083)
                0.011758975 = queryNorm
              0.5809959 = fieldWeight in 1949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.295935 = idf(docFreq=10, maxDocs=44083)
                0.0625 = fieldNorm(doc=1949)
          0.020973826 = weight(abstract_txt:using in 1949) [ClassicSimilarity], result of:
            0.020973826 = score(doc=1949,freq=1.0), product of:
              0.09683806 = queryWeight, product of:
                2.37643 = boost
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.011758975 = queryNorm
              0.2165866 = fieldWeight in 1949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.0625 = fieldNorm(doc=1949)
          0.070804924 = weight(abstract_txt:multi in 1949) [ClassicSimilarity], result of:
            0.070804924 = score(doc=1949,freq=1.0), product of:
              0.19037336 = queryWeight, product of:
                2.7205672 = boost
                5.950826 = idf(docFreq=311, maxDocs=44083)
                0.011758975 = queryNorm
              0.37192664 = fieldWeight in 1949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.950826 = idf(docFreq=311, maxDocs=44083)
                0.0625 = fieldNorm(doc=1949)
          0.20222303 = weight(abstract_txt:summaries in 1949) [ClassicSimilarity], result of:
            0.20222303 = score(doc=1949,freq=3.0), product of:
              0.2657116 = queryWeight, product of:
                3.2141166 = boost
                7.0303903 = idf(docFreq=105, maxDocs=44083)
                0.011758975 = queryNorm
              0.7610621 = fieldWeight in 1949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0303903 = idf(docFreq=105, maxDocs=44083)
                0.0625 = fieldNorm(doc=1949)
          0.21089488 = weight(abstract_txt:summarization in 1949) [ClassicSimilarity], result of:
            0.21089488 = score(doc=1949,freq=3.0), product of:
              0.2732546 = queryWeight, product of:
                3.2594185 = boost
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.011758975 = queryNorm
              0.77178895 = fieldWeight in 1949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.0625 = fieldNorm(doc=1949)
        0.28 = coord(7/25)
    
  4. Kar, M.; Nunes, S.; Ribeiro, C.: Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model (2015) 0.19
    0.19094314 = sum of:
      0.19094314 = product of:
        0.59669733 = sum of:
          0.013299095 = weight(abstract_txt:standard in 3677) [ClassicSimilarity], result of:
            0.013299095 = score(doc=3677,freq=1.0), product of:
              0.060033605 = queryWeight, product of:
                1.0802855 = boost
                4.72592 = idf(docFreq=1061, maxDocs=44083)
                0.011758975 = queryNorm
              0.22152752 = fieldWeight in 3677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.72592 = idf(docFreq=1061, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
          0.019331658 = weight(abstract_txt:words in 3677) [ClassicSimilarity], result of:
            0.019331658 = score(doc=3677,freq=1.0), product of:
              0.07703577 = queryWeight, product of:
                1.2237356 = boost
                5.3534703 = idf(docFreq=566, maxDocs=44083)
                0.011758975 = queryNorm
              0.25094393 = fieldWeight in 3677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3534703 = idf(docFreq=566, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
          0.030975534 = weight(abstract_txt:wikipedia in 3677) [ClassicSimilarity], result of:
            0.030975534 = score(doc=3677,freq=1.0), product of:
              0.10548537 = queryWeight, product of:
                1.4319818 = boost
                6.264484 = idf(docFreq=227, maxDocs=44083)
                0.011758975 = queryNorm
              0.29364768 = fieldWeight in 3677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.264484 = idf(docFreq=227, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
          0.044550583 = weight(abstract_txt:document in 3677) [ClassicSimilarity], result of:
            0.044550583 = score(doc=3677,freq=5.0), product of:
              0.09903042 = queryWeight, product of:
                1.9621882 = boost
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.011758975 = queryNorm
              0.44986767 = fieldWeight in 3677, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
          0.10121436 = weight(abstract_txt:extractive in 3677) [ClassicSimilarity], result of:
            0.10121436 = score(doc=3677,freq=1.0), product of:
              0.23227784 = queryWeight, product of:
                2.124933 = boost
                9.295935 = idf(docFreq=10, maxDocs=44083)
                0.011758975 = queryNorm
              0.43574694 = fieldWeight in 3677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.295935 = idf(docFreq=10, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
          0.03146074 = weight(abstract_txt:using in 3677) [ClassicSimilarity], result of:
            0.03146074 = score(doc=3677,freq=4.0), product of:
              0.09683806 = queryWeight, product of:
                2.37643 = boost
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.011758975 = queryNorm
              0.3248799 = fieldWeight in 3677, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4653857 = idf(docFreq=3745, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
          0.15166727 = weight(abstract_txt:summaries in 3677) [ClassicSimilarity], result of:
            0.15166727 = score(doc=3677,freq=3.0), product of:
              0.2657116 = queryWeight, product of:
                3.2141166 = boost
                7.0303903 = idf(docFreq=105, maxDocs=44083)
                0.011758975 = queryNorm
              0.57079655 = fieldWeight in 3677, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0303903 = idf(docFreq=105, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
          0.2041981 = weight(abstract_txt:summarization in 3677) [ClassicSimilarity], result of:
            0.2041981 = score(doc=3677,freq=5.0), product of:
              0.2732546 = queryWeight, product of:
                3.2594185 = boost
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.011758975 = queryNorm
              0.7472815 = fieldWeight in 3677, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.1294813 = idf(docFreq=95, maxDocs=44083)
                0.046875 = fieldNorm(doc=3677)
        0.32 = coord(8/25)
    
  5. Liu, X.; Zheng, W.; Fang, H.: An exploration of ranking models and feedback method for related entity finding (2013) 0.18
    0.18000306 = sum of:
      0.18000306 = product of:
        0.75001276 = sum of:
          0.017732127 = weight(abstract_txt:standard in 3715) [ClassicSimilarity], result of:
            0.017732127 = score(doc=3715,freq=1.0), product of:
              0.060033605 = queryWeight, product of:
                1.0802855 = boost
                4.72592 = idf(docFreq=1061, maxDocs=44083)
                0.011758975 = queryNorm
              0.29537 = fieldWeight in 3715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.72592 = idf(docFreq=1061, maxDocs=44083)
                0.0625 = fieldNorm(doc=3715)
          0.026564835 = weight(abstract_txt:document in 3715) [ClassicSimilarity], result of:
            0.026564835 = score(doc=3715,freq=1.0), product of:
              0.09903042 = queryWeight, product of:
                1.9621882 = boost
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.011758975 = queryNorm
              0.26824924 = fieldWeight in 3715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.291988 = idf(docFreq=1638, maxDocs=44083)
                0.0625 = fieldNorm(doc=3715)
          0.17642747 = weight(abstract_txt:entities in 3715) [ClassicSimilarity], result of:
            0.17642747 = score(doc=3715,freq=7.0), product of:
              0.18291192 = queryWeight, product of:
                2.6667197 = boost
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.011758975 = queryNorm
              0.9645488 = fieldWeight in 3715, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.0625 = fieldNorm(doc=3715)
          0.09424691 = weight(abstract_txt:type in 3715) [ClassicSimilarity], result of:
            0.09424691 = score(doc=3715,freq=1.0), product of:
              0.30185807 = queryWeight, product of:
                5.1386495 = boost
                4.9955616 = idf(docFreq=810, maxDocs=44083)
                0.011758975 = queryNorm
              0.3122226 = fieldWeight in 3715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9955616 = idf(docFreq=810, maxDocs=44083)
                0.0625 = fieldNorm(doc=3715)
          0.14585243 = weight(abstract_txt:models in 3715) [ClassicSimilarity], result of:
            0.14585243 = score(doc=3715,freq=3.0), product of:
              0.29003197 = queryWeight, product of:
                5.3094473 = boost
                4.645443 = idf(docFreq=1150, maxDocs=44083)
                0.011758975 = queryNorm
              0.502884 = fieldWeight in 3715, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.645443 = idf(docFreq=1150, maxDocs=44083)
                0.0625 = fieldNorm(doc=3715)
          0.28918898 = weight(abstract_txt:entity in 3715) [ClassicSimilarity], result of:
            0.28918898 = score(doc=3715,freq=3.0), product of:
              0.42493382 = queryWeight, product of:
                5.748202 = boost
                6.286658 = idf(docFreq=222, maxDocs=44083)
                0.011758975 = queryNorm
              0.68055063 = fieldWeight in 3715, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.286658 = idf(docFreq=222, maxDocs=44083)
                0.0625 = fieldNorm(doc=3715)
        0.24 = coord(6/25)
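
The score trees above follow the TF-IDF arithmetic of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor (matched query terms / total query terms). The following is a minimal sketch that recomputes the first entry (doc 1564) from the numbers in the listing. The function names are hypothetical and chosen for illustration; boost, queryNorm, and fieldNorm are copied from the listing rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # One "weight(abstract_txt:<term> in <doc>)" node of the tree.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm           # queryWeight
    field_weight = math.sqrt(freq) * i * field_norm  # fieldWeight, tf = sqrt(freq)
    return query_weight * field_weight


def document_score(term_scores, matched, total_query_terms):
    # Sum of the per-term scores, scaled by coord(matched/total).
    return sum(term_scores) * matched / total_query_terms


# Term "representation" in doc 1564, values taken from the listing.
representation = term_score(
    freq=1.0, doc_freq=858, max_docs=44083,
    boost=1.1287782, query_norm=0.011758975, field_norm=0.078125,
)

# The eight per-term scores listed for doc 1564, combined with coord(8/25).
entry1_terms = [
    0.02528605, 0.03221943, 0.023061669, 0.04696044,
    0.026217284, 0.21679494, 0.2527788, 0.26361862,
]
entry1_total = document_score(entry1_terms, matched=8, total_query_terms=25)

print(representation, entry1_total)
```

Up to floating-point rounding (Lucene computes these values in single precision), the two printed numbers should agree with the 0.02528605 and 0.2838199 shown for the first entry.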