Document (#38728)

Author
Aker, A.
Gaizauskas, R.
Title
Generating descriptive multi-document summaries of geo-located entities using entity type models
Source
Journal of the Association for Information Science and Technology. 66(2015) no.4, pp.721-738
Year
2015
Abstract
In this article, we investigate the application of entity type models in extractive multi-document summarization using automatic caption generation for images of geo-located entities (e.g., Westminster Abbey) as an application scenario. Entity type models contain sets of patterns aiming to capture the ways geo-located entities are described in natural language. They are automatically derived from texts about geo-located entities of the same type (e.g., churches, lakes). We integrate entity type models into a multi-document summarizer and use them to address the 2 major tasks in extractive multi-document summarization: sentence scoring and summary composition. We experiment with 3 different representation methods for entity type models: signature words, n-gram language models, and dependency patterns. We evaluate the summarizer with integrated entity type models relative to (a) a summarizer using standard text-related features commonly used in text summarization and (b) the Wikipedia location descriptions. Our results show that entity type models significantly improve the quality of output summaries over that of summaries generated using standard summarization features and Wikipedia summaries. The representation of entity type models using dependency patterns is superior to the representations using signature words and n-gram language models.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23211/abstract.

Similar documents (content)

  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.28
    0.28253907 = sum of:
      0.28253907 = product of:
        0.8829346 = sum of:
          0.02545674 = weight(abstract_txt:representation in 2564) [ClassicSimilarity], result of:
            0.02545674 = score(doc=2564,freq=1.0), product of:
              0.06566622 = queryWeight, product of:
                1.123973 = boost
                4.9621596 = idf(docFreq=812, maxDocs=42740)
                0.011773766 = queryNorm
              0.38766873 = fieldWeight in 2564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9621596 = idf(docFreq=812, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.03205576 = weight(abstract_txt:words in 2564) [ClassicSimilarity], result of:
            0.03205576 = score(doc=2564,freq=1.0), product of:
              0.076573335 = queryWeight, product of:
                1.2137344 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.011773766 = queryNorm
              0.41862828 = fieldWeight in 2564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.023051916 = weight(abstract_txt:language in 2564) [ClassicSimilarity], result of:
            0.023051916 = score(doc=2564,freq=1.0), product of:
              0.07035721 = queryWeight, product of:
                1.4249014 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.011773766 = queryNorm
              0.32764113 = fieldWeight in 2564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.046227463 = weight(abstract_txt:document in 2564) [ClassicSimilarity], result of:
            0.046227463 = score(doc=2564,freq=2.0), product of:
              0.09774027 = queryWeight, product of:
                1.9392626 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.011773766 = queryNorm
              0.4729623 = fieldWeight in 2564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.02633054 = weight(abstract_txt:using in 2564) [ClassicSimilarity], result of:
            0.02633054 = score(doc=2564,freq=1.0), product of:
              0.09686208 = queryWeight, product of:
                2.3644078 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.011773766 = queryNorm
              0.2718354 = fieldWeight in 2564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.217456 = weight(abstract_txt:multi in 2564) [ClassicSimilarity], result of:
            0.217456 = score(doc=2564,freq=6.0), product of:
              0.19025992 = queryWeight, product of:
                2.705662 = boost
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.011773766 = queryNorm
              1.1429417 = fieldWeight in 2564, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.249525 = weight(abstract_txt:summaries in 2564) [ClassicSimilarity], result of:
            0.249525 = score(doc=2564,freq=3.0), product of:
              0.26273572 = queryWeight, product of:
                3.1795044 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.011773766 = queryNorm
              0.9497186 = fieldWeight in 2564, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
          0.26283118 = weight(abstract_txt:summarization in 2564) [ClassicSimilarity], result of:
            0.26283118 = score(doc=2564,freq=3.0), product of:
              0.27199507 = queryWeight, product of:
                3.2350454 = boost
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.011773766 = queryNorm
              0.96630865 = fieldWeight in 2564, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.078125 = fieldNorm(doc=2564)
        0.32 = coord(8/25)
    
  2. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.22
    0.22216555 = sum of:
      0.22216555 = product of:
        0.7934484 = sum of:
          0.08304796 = weight(abstract_txt:wikipedia in 4694) [ClassicSimilarity], result of:
            0.08304796 = score(doc=4694,freq=4.0), product of:
              0.10558758 = queryWeight, product of:
                1.4252508 = boost
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.011773766 = queryNorm
              0.78653157 = fieldWeight in 4694, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.026150202 = weight(abstract_txt:document in 4694) [ClassicSimilarity], result of:
            0.026150202 = score(doc=4694,freq=1.0), product of:
              0.09774027 = queryWeight, product of:
                1.9392626 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.011773766 = queryNorm
              0.26754788 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.03648467 = weight(abstract_txt:using in 4694) [ClassicSimilarity], result of:
            0.03648467 = score(doc=4694,freq=3.0), product of:
              0.09686208 = queryWeight, product of:
                2.3644078 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.011773766 = queryNorm
              0.3766662 = fieldWeight in 4694, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.07102083 = weight(abstract_txt:multi in 4694) [ClassicSimilarity], result of:
            0.07102083 = score(doc=4694,freq=1.0), product of:
              0.19025992 = queryWeight, product of:
                2.705662 = boost
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.011773766 = queryNorm
              0.37328318 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.19329193 = weight(abstract_txt:summarizer in 4694) [ClassicSimilarity], result of:
            0.19329193 = score(doc=4694,freq=1.0), product of:
              0.33696625 = queryWeight, product of:
                3.1183417 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.011773766 = queryNorm
              0.573624 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.29735956 = weight(abstract_txt:summarization in 4694) [ClassicSimilarity], result of:
            0.29735956 = score(doc=4694,freq=6.0), product of:
              0.27199507 = queryWeight, product of:
                3.2350454 = boost
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.011773766 = queryNorm
              1.0932535 = fieldWeight in 4694, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
          0.08609326 = weight(abstract_txt:models in 4694) [ClassicSimilarity], result of:
            0.08609326 = score(doc=4694,freq=1.0), product of:
              0.2935731 = queryWeight, product of:
                5.3140793 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.011773766 = queryNorm
              0.29326004 = fieldWeight in 4694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.0625 = fieldNorm(doc=4694)
        0.28 = coord(7/25)
    
  3. Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.19
    0.1946357 = sum of:
      0.1946357 = product of:
        0.6951275 = sum of:
          0.02564461 = weight(abstract_txt:words in 2949) [ClassicSimilarity], result of:
            0.02564461 = score(doc=2949,freq=1.0), product of:
              0.076573335 = queryWeight, product of:
                1.2137344 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.011773766 = queryNorm
              0.3349026 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.026150202 = weight(abstract_txt:document in 2949) [ClassicSimilarity], result of:
            0.026150202 = score(doc=2949,freq=1.0), product of:
              0.09774027 = queryWeight, product of:
                1.9392626 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.011773766 = queryNorm
              0.26754788 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.14136246 = weight(abstract_txt:extractive in 2949) [ClassicSimilarity], result of:
            0.14136246 = score(doc=2949,freq=1.0), product of:
              0.23894773 = queryWeight, product of:
                2.144057 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.011773766 = queryNorm
              0.5916041 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.021064434 = weight(abstract_txt:using in 2949) [ClassicSimilarity], result of:
            0.021064434 = score(doc=2949,freq=1.0), product of:
              0.09686208 = queryWeight, product of:
                2.3644078 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.011773766 = queryNorm
              0.21746832 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.07102083 = weight(abstract_txt:multi in 2949) [ClassicSimilarity], result of:
            0.07102083 = score(doc=2949,freq=1.0), product of:
              0.19025992 = queryWeight, product of:
                2.705662 = boost
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.011773766 = queryNorm
              0.37328318 = fieldWeight in 2949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.972531 = idf(docFreq=295, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.19962 = weight(abstract_txt:summaries in 2949) [ClassicSimilarity], result of:
            0.19962 = score(doc=2949,freq=3.0), product of:
              0.26273572 = queryWeight, product of:
                3.1795044 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.011773766 = queryNorm
              0.75977486 = fieldWeight in 2949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
          0.21026495 = weight(abstract_txt:summarization in 2949) [ClassicSimilarity], result of:
            0.21026495 = score(doc=2949,freq=3.0), product of:
              0.27199507 = queryWeight, product of:
                3.2350454 = boost
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.011773766 = queryNorm
              0.7730469 = fieldWeight in 2949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.0625 = fieldNorm(doc=2949)
        0.28 = coord(7/25)
    
  4. Kar, M.; Nunes, S.; Ribeiro, C.: Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model (2015) 0.19
    0.19146007 = sum of:
      0.19146007 = product of:
        0.59831274 = sum of:
          0.013159433 = weight(abstract_txt:standard in 4677) [ClassicSimilarity], result of:
            0.013159433 = score(doc=4677,freq=1.0), product of:
              0.05945624 = queryWeight, product of:
                1.0695069 = boost
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.011773766 = queryNorm
              0.22132972 = fieldWeight in 4677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
          0.019233458 = weight(abstract_txt:words in 4677) [ClassicSimilarity], result of:
            0.019233458 = score(doc=4677,freq=1.0), product of:
              0.076573335 = queryWeight, product of:
                1.2137344 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.011773766 = queryNorm
              0.25117695 = fieldWeight in 4677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
          0.031142987 = weight(abstract_txt:wikipedia in 4677) [ClassicSimilarity], result of:
            0.031142987 = score(doc=4677,freq=1.0), product of:
              0.10558758 = queryWeight, product of:
                1.4252508 = boost
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.011773766 = queryNorm
              0.29494935 = fieldWeight in 4677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
          0.04385522 = weight(abstract_txt:document in 4677) [ClassicSimilarity], result of:
            0.04385522 = score(doc=4677,freq=5.0), product of:
              0.09774027 = queryWeight, product of:
                1.9392626 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.011773766 = queryNorm
              0.44869143 = fieldWeight in 4677, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
          0.106021844 = weight(abstract_txt:extractive in 4677) [ClassicSimilarity], result of:
            0.106021844 = score(doc=4677,freq=1.0), product of:
              0.23894773 = queryWeight, product of:
                2.144057 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.011773766 = queryNorm
              0.4437031 = fieldWeight in 4677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
          0.03159665 = weight(abstract_txt:using in 4677) [ClassicSimilarity], result of:
            0.03159665 = score(doc=4677,freq=4.0), product of:
              0.09686208 = queryWeight, product of:
                2.3644078 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.011773766 = queryNorm
              0.32620248 = fieldWeight in 4677, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
          0.14971499 = weight(abstract_txt:summaries in 4677) [ClassicSimilarity], result of:
            0.14971499 = score(doc=4677,freq=3.0), product of:
              0.26273572 = queryWeight, product of:
                3.1795044 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.011773766 = queryNorm
              0.56983113 = fieldWeight in 4677, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
          0.20358817 = weight(abstract_txt:summarization in 4677) [ClassicSimilarity], result of:
            0.20358817 = score(doc=4677,freq=5.0), product of:
              0.27199507 = queryWeight, product of:
                3.2350454 = boost
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.011773766 = queryNorm
              0.7484995 = fieldWeight in 4677, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.141102 = idf(docFreq=91, maxDocs=42740)
                0.046875 = fieldNorm(doc=4677)
        0.32 = coord(8/25)
    
  5. Liu, X.; Zheng, W.; Fang, H.: An exploration of ranking models and feedback method for related entity finding (2013) 0.18
    0.18237253 = sum of:
      0.18237253 = product of:
        0.75988555 = sum of:
          0.01754591 = weight(abstract_txt:standard in 4715) [ClassicSimilarity], result of:
            0.01754591 = score(doc=4715,freq=1.0), product of:
              0.05945624 = queryWeight, product of:
                1.0695069 = boost
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.011773766 = queryNorm
              0.2951063 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.0625 = fieldNorm(doc=4715)
          0.026150202 = weight(abstract_txt:document in 4715) [ClassicSimilarity], result of:
            0.026150202 = score(doc=4715,freq=1.0), product of:
              0.09774027 = queryWeight, product of:
                1.9392626 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.011773766 = queryNorm
              0.26754788 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=4715)
          0.17893836 = weight(abstract_txt:entities in 4715) [ClassicSimilarity], result of:
            0.17893836 = score(doc=4715,freq=7.0), product of:
              0.18415909 = queryWeight, product of:
                2.661929 = boost
                5.8759933 = idf(docFreq=325, maxDocs=42740)
                0.011773766 = queryNorm
              0.971651 = fieldWeight in 4715, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.8759933 = idf(docFreq=325, maxDocs=42740)
                0.0625 = fieldNorm(doc=4715)
          0.09454132 = weight(abstract_txt:type in 4715) [ClassicSimilarity], result of:
            0.09454132 = score(doc=4715,freq=1.0), product of:
              0.30169314 = queryWeight, product of:
                5.110623 = boost
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.011773766 = queryNorm
              0.31336913 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.0625 = fieldNorm(doc=4715)
          0.14911792 = weight(abstract_txt:models in 4715) [ClassicSimilarity], result of:
            0.14911792 = score(doc=4715,freq=3.0), product of:
              0.2935731 = queryWeight, product of:
                5.3140793 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.011773766 = queryNorm
              0.5079413 = fieldWeight in 4715, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.0625 = fieldNorm(doc=4715)
          0.29359186 = weight(abstract_txt:entity in 4715) [ClassicSimilarity], result of:
            0.29359186 = score(doc=4715,freq=3.0), product of:
              0.4281104 = queryWeight, product of:
                5.739747 = boost
                6.3350143 = idf(docFreq=205, maxDocs=42740)
                0.011773766 = queryNorm
              0.6857854 = fieldWeight in 4715, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3350143 = idf(docFreq=205, maxDocs=42740)
                0.0625 = fieldNorm(doc=4715)
        0.24 = coord(6/25)
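
The score breakdowns above follow the classic TF-IDF pattern that Lucene's ClassicSimilarity reports: each term score is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and the summed term scores are scaled by the coord factor. The sketch below recomputes the first hit (doc 2564) from the numbers listed for it. The function names and the tuple layout are illustrative assumptions, not part of any library, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are inferred from the listed values, with which they agree. Small deviations in the last digits are to be expected, since the listing shows single-precision values.

```python
import math

# Constants copied from the explanation for doc 2564.
MAX_DOCS = 42740
QUERY_NORM = 0.011773766
FIELD_NORM = 0.078125


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost,
               query_norm=QUERY_NORM, field_norm=FIELD_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    idf_value = idf(doc_freq)
    query_weight = boost * idf_value * query_norm
    field_weight = math.sqrt(freq) * idf_value * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


# (term, freq, docFreq, boost) as listed for doc 2564.
TERMS_2564 = [
    ("representation", 1.0, 812, 1.123973),
    ("words", 1.0, 546, 1.2137344),
    ("language", 1.0, 1752, 1.4249014),
    ("document", 2.0, 1606, 1.9392626),
    ("using", 1.0, 3580, 2.3644078),
    ("multi", 6.0, 295, 2.705662),
    ("summaries", 3.0, 103, 3.1795044),
    ("summarization", 3.0, 91, 3.2350454),
]

term_scores = {
    term: term_score(freq, doc_freq, boost)
    for term, freq, doc_freq, boost in TERMS_2564
}

# coord(8/25): 8 of the 25 query terms matched this document.
coord = 8 / 25
total = sum(term_scores.values()) * coord

for term, value in term_scores.items():
    print(f"{term:15s} {value:.8f}")
print(f"{'total':15s} {total:.8f}")
```

The same function applies to the other hits once FIELD_NORM and the coord fraction are replaced with the values listed for each document (for example 0.0625 and 7/25 for doc 4694).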