Document (#18147)

Author
Salton, G.
Title
Automatic text structuring and summarization
Source
Information processing and management. 33(1997) no.2, pp.193-207
Year
1997
Abstract
Applies ideas from automatic link generation research to automatic text summarisation. Using techniques for inter-document link generation, generates intra-document links between passages of a document. Based on the intra-document linkage pattern of a text, characterises the structure of the text. Applies the knowledge of text structure to do automatic text summarisation by passage extraction. Evaluates a set of 50 summaries generated using these techniques by comparing them to paragraph extracts constructed by humans. The automatic summarisation methods perform well, especially in view of the fact that the summaries generated by 2 humans for the same article are surprisingly dissimilar.
Footnote
Contribution to a special issue on methods and tools for the automatic construction of hypertext
Theme
Automatic abstracting
Hypertext

Similar documents (author)

  1. Salton, G.: Another look at automatic text-retrieval systems (1986) 4.85
    4.8517637 = sum of:
      4.8517637 = weight(author_txt:salton in 1356) [ClassicSimilarity], result of:
        4.8517637 = fieldWeight in 1356, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.762822 = idf(docFreq=49, maxDocs=43254)
          0.625 = fieldNorm(doc=1356)
    
  2. Salton, G.: A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART) (1972) 4.85
    4.8517637 = sum of:
      4.8517637 = weight(author_txt:salton in 2325) [ClassicSimilarity], result of:
        4.8517637 = fieldWeight in 2325, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.762822 = idf(docFreq=49, maxDocs=43254)
          0.625 = fieldNorm(doc=2325)
    
  3. Salton, G.: Future prospects for text-based information retrieval (1990) 4.85
    4.8517637 = sum of:
      4.8517637 = weight(author_txt:salton in 2327) [ClassicSimilarity], result of:
        4.8517637 = fieldWeight in 2327, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.762822 = idf(docFreq=49, maxDocs=43254)
          0.625 = fieldNorm(doc=2327)
    
  4. Salton, G.: Fast document classification in automatic information retrieval (1978) 4.85
    4.8517637 = sum of:
      4.8517637 = weight(author_txt:salton in 2331) [ClassicSimilarity], result of:
        4.8517637 = fieldWeight in 2331, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.762822 = idf(docFreq=49, maxDocs=43254)
          0.625 = fieldNorm(doc=2331)
    
  5. Salton, G.: Expert systems and information retrieval (1987) 4.85
    4.8517637 = sum of:
      4.8517637 = weight(author_txt:salton in 2837) [ClassicSimilarity], result of:
        4.8517637 = fieldWeight in 2837, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.762822 = idf(docFreq=49, maxDocs=43254)
          0.625 = fieldNorm(doc=2837)
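
The author-match trees above all reduce to the same product, fieldWeight = tf * idf * fieldNorm. The following is a minimal Python sketch of that arithmetic, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and not part of any library, and the fieldNorm value is copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor, assumed form: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, mirroring the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:salton entries in the listing.
idf = classic_idf(doc_freq=49, max_docs=43254)
score = field_weight(freq=1.0, doc_freq=49, max_docs=43254, field_norm=0.625)
print(f"idf={idf:.6f} score={score:.6f}")
```

The listing reports single-precision values, so results computed this way may differ from it in the last digits.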
    

Similar documents (content)

  1. Szlávik, Z.; Tombros, A.; Lalmas, M.: Summarisation of the logical structure of XML documents (2012) 0.21
    0.20871459 = sum of:
      0.20871459 = product of:
        1.0435729 = sum of:
          0.01106205 = weight(abstract_txt:using in 4196) [ClassicSimilarity], result of:
            0.01106205 = score(doc=4196,freq=1.0), product of:
              0.0509598 = queryWeight, product of:
                1.0294591 = boost
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.014252489 = queryNorm
              0.21707405 = fieldWeight in 4196, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.0625 = fieldNorm(doc=4196)
          0.058202595 = weight(abstract_txt:structure in 4196) [ClassicSimilarity], result of:
            0.058202595 = score(doc=4196,freq=7.0), product of:
              0.08058722 = queryWeight, product of:
                1.2945783 = boost
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.014252489 = queryNorm
              0.72223103 = fieldWeight in 4196, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.0625 = fieldNorm(doc=4196)
          0.15826616 = weight(abstract_txt:summaries in 4196) [ClassicSimilarity], result of:
            0.15826616 = score(doc=4196,freq=3.0), product of:
              0.20823589 = queryWeight, product of:
                2.0810046 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.014252489 = queryNorm
              0.760033 = fieldWeight in 4196, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.0625 = fieldNorm(doc=4196)
          0.041518774 = weight(abstract_txt:document in 4196) [ClassicSimilarity], result of:
            0.041518774 = score(doc=4196,freq=1.0), product of:
              0.15506376 = queryWeight, product of:
                2.5396004 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.014252489 = queryNorm
              0.26775292 = fieldWeight in 4196, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0625 = fieldNorm(doc=4196)
          0.7745233 = weight(abstract_txt:summarisation in 4196) [ClassicSimilarity], result of:
            0.7745233 = score(doc=4196,freq=6.0), product of:
              0.54534787 = queryWeight, product of:
                4.1245604 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.014252489 = queryNorm
              1.4202372 = fieldWeight in 4196, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.0625 = fieldNorm(doc=4196)
        0.2 = coord(5/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.19
    0.18930738 = sum of:
      0.18930738 = product of:
        0.5915856 = sum of:
          0.01106205 = weight(abstract_txt:using in 3720) [ClassicSimilarity], result of:
            0.01106205 = score(doc=3720,freq=1.0), product of:
              0.0509598 = queryWeight, product of:
                1.0294591 = boost
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.014252489 = queryNorm
              0.21707405 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.12725513 = weight(abstract_txt:summarization in 3720) [ClassicSimilarity], result of:
            0.12725513 = score(doc=3720,freq=7.0), product of:
              0.10774856 = queryWeight, product of:
                1.0584881 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.014252489 = queryNorm
              1.1810378 = fieldWeight in 3720, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.043997027 = weight(abstract_txt:structure in 3720) [ClassicSimilarity], result of:
            0.043997027 = score(doc=3720,freq=4.0), product of:
              0.08058722 = queryWeight, product of:
                1.2945783 = boost
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.014252489 = queryNorm
              0.54595536 = fieldWeight in 3720, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.02446172 = weight(abstract_txt:techniques in 3720) [ClassicSimilarity], result of:
            0.02446172 = score(doc=3720,freq=1.0), product of:
              0.086495854 = queryWeight, product of:
                1.341198 = boost
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.014252489 = queryNorm
              0.282808 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.09137501 = weight(abstract_txt:summaries in 3720) [ClassicSimilarity], result of:
            0.09137501 = score(doc=3720,freq=1.0), product of:
              0.20823589 = queryWeight, product of:
                2.0810046 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.014252489 = queryNorm
              0.43880528 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.10984834 = weight(abstract_txt:document in 3720) [ClassicSimilarity], result of:
            0.10984834 = score(doc=3720,freq=7.0), product of:
              0.15506376 = queryWeight, product of:
                2.5396004 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.014252489 = queryNorm
              0.7084076 = fieldWeight in 3720, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.052608263 = weight(abstract_txt:text in 3720) [ClassicSimilarity], result of:
            0.052608263 = score(doc=3720,freq=1.0), product of:
              0.20784856 = queryWeight, product of:
                3.601052 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.014252489 = queryNorm
              0.25310862 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
          0.13097803 = weight(abstract_txt:automatic in 3720) [ClassicSimilarity], result of:
            0.13097803 = score(doc=3720,freq=2.0), product of:
              0.2851716 = queryWeight, product of:
                3.850511 = boost
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.014252489 = queryNorm
              0.45929548 = fieldWeight in 3720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.0625 = fieldNorm(doc=3720)
        0.32 = coord(8/25)
    
  3. Sweeney, S.; Crestani, F.; Losada, D.E.: 'Show me more' : incremental length summarisation using novelty detection (2008) 0.19
    0.1854921 = sum of:
      0.1854921 = product of:
        0.9274605 = sum of:
          0.09137501 = weight(abstract_txt:summaries in 4055) [ClassicSimilarity], result of:
            0.09137501 = score(doc=4055,freq=1.0), product of:
              0.20823589 = queryWeight, product of:
                2.0810046 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.014252489 = queryNorm
              0.43880528 = fieldWeight in 4055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.0625 = fieldNorm(doc=4055)
          0.08303755 = weight(abstract_txt:document in 4055) [ClassicSimilarity], result of:
            0.08303755 = score(doc=4055,freq=4.0), product of:
              0.15506376 = queryWeight, product of:
                2.5396004 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.014252489 = queryNorm
              0.53550583 = fieldWeight in 4055, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0625 = fieldNorm(doc=4055)
          0.07439932 = weight(abstract_txt:text in 4055) [ClassicSimilarity], result of:
            0.07439932 = score(doc=4055,freq=2.0), product of:
              0.20784856 = queryWeight, product of:
                3.601052 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.014252489 = queryNorm
              0.35794964 = fieldWeight in 4055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0625 = fieldNorm(doc=4055)
          0.13097803 = weight(abstract_txt:automatic in 4055) [ClassicSimilarity], result of:
            0.13097803 = score(doc=4055,freq=2.0), product of:
              0.2851716 = queryWeight, product of:
                3.850511 = boost
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.014252489 = queryNorm
              0.45929548 = fieldWeight in 4055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.0625 = fieldNorm(doc=4055)
          0.5476706 = weight(abstract_txt:summarisation in 4055) [ClassicSimilarity], result of:
            0.5476706 = score(doc=4055,freq=3.0), product of:
              0.54534787 = queryWeight, product of:
                4.1245604 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.014252489 = queryNorm
              1.0042592 = fieldWeight in 4055, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.0625 = fieldNorm(doc=4055)
        0.2 = coord(5/25)
    
  4. Lihui, C.; Lian, C.W.: Using Web structure and summarisation techniques for Web content mining (2005) 0.17
    0.17313774 = sum of:
      0.17313774 = product of:
        0.72140723 = sum of:
          0.01106205 = weight(abstract_txt:using in 3047) [ClassicSimilarity], result of:
            0.01106205 = score(doc=3047,freq=1.0), product of:
              0.0509598 = queryWeight, product of:
                1.0294591 = boost
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.014252489 = queryNorm
              0.21707405 = fieldWeight in 3047, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.0625 = fieldNorm(doc=3047)
          0.031110596 = weight(abstract_txt:structure in 3047) [ClassicSimilarity], result of:
            0.031110596 = score(doc=3047,freq=2.0), product of:
              0.08058722 = queryWeight, product of:
                1.2945783 = boost
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.014252489 = queryNorm
              0.38604873 = fieldWeight in 3047, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.0625 = fieldNorm(doc=3047)
          0.04236894 = weight(abstract_txt:techniques in 3047) [ClassicSimilarity], result of:
            0.04236894 = score(doc=3047,freq=3.0), product of:
              0.086495854 = queryWeight, product of:
                1.341198 = boost
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.014252489 = queryNorm
              0.48983783 = fieldWeight in 3047, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.0625 = fieldNorm(doc=3047)
          0.058716413 = weight(abstract_txt:document in 3047) [ClassicSimilarity], result of:
            0.058716413 = score(doc=3047,freq=2.0), product of:
              0.15506376 = queryWeight, product of:
                2.5396004 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.014252489 = queryNorm
              0.37865978 = fieldWeight in 3047, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0625 = fieldNorm(doc=3047)
          0.13097803 = weight(abstract_txt:automatic in 3047) [ClassicSimilarity], result of:
            0.13097803 = score(doc=3047,freq=2.0), product of:
              0.2851716 = queryWeight, product of:
                3.850511 = boost
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.014252489 = queryNorm
              0.45929548 = fieldWeight in 3047, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.0625 = fieldNorm(doc=3047)
          0.4471712 = weight(abstract_txt:summarisation in 3047) [ClassicSimilarity], result of:
            0.4471712 = score(doc=3047,freq=2.0), product of:
              0.54534787 = queryWeight, product of:
                4.1245604 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.014252489 = queryNorm
              0.81997424 = fieldWeight in 3047, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.0625 = fieldNorm(doc=3047)
        0.24 = coord(6/25)
    
  5. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.17
    0.16594392 = sum of:
      0.16594392 = product of:
        0.691433 = sum of:
          0.009679294 = weight(abstract_txt:using in 4766) [ClassicSimilarity], result of:
            0.009679294 = score(doc=4766,freq=1.0), product of:
              0.0509598 = queryWeight, product of:
                1.0294591 = boost
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.014252489 = queryNorm
              0.1899398 = fieldWeight in 4766, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.0546875 = fieldNorm(doc=4766)
          0.24188635 = weight(abstract_txt:passage in 4766) [ClassicSimilarity], result of:
            0.24188635 = score(doc=4766,freq=14.0), product of:
              0.14344546 = queryWeight, product of:
                1.2213037 = boost
                8.240858 = idf(docFreq=30, maxDocs=43254)
                0.014252489 = queryNorm
              1.68626 = fieldWeight in 4766, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.240858 = idf(docFreq=30, maxDocs=43254)
                0.0546875 = fieldNorm(doc=4766)
          0.24780658 = weight(abstract_txt:passages in 4766) [ClassicSimilarity], result of:
            0.24780658 = score(doc=4766,freq=14.0), product of:
              0.14577658 = queryWeight, product of:
                1.2311873 = boost
                8.307549 = idf(docFreq=28, maxDocs=43254)
                0.014252489 = queryNorm
              1.6999066 = fieldWeight in 4766, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.307549 = idf(docFreq=28, maxDocs=43254)
                0.0546875 = fieldNorm(doc=4766)
          0.037072822 = weight(abstract_txt:techniques in 4766) [ClassicSimilarity], result of:
            0.037072822 = score(doc=4766,freq=3.0), product of:
              0.086495854 = queryWeight, product of:
                1.341198 = boost
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.014252489 = queryNorm
              0.4286081 = fieldWeight in 4766, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.524928 = idf(docFreq=1273, maxDocs=43254)
                0.0546875 = fieldNorm(doc=4766)
          0.06292355 = weight(abstract_txt:document in 4766) [ClassicSimilarity], result of:
            0.06292355 = score(doc=4766,freq=3.0), product of:
              0.15506376 = queryWeight, product of:
                2.5396004 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.014252489 = queryNorm
              0.40579146 = fieldWeight in 4766, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0546875 = fieldNorm(doc=4766)
          0.09206446 = weight(abstract_txt:text in 4766) [ClassicSimilarity], result of:
            0.09206446 = score(doc=4766,freq=4.0), product of:
              0.20784856 = queryWeight, product of:
                3.601052 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.014252489 = queryNorm
              0.4429401 = fieldWeight in 4766, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0546875 = fieldNorm(doc=4766)
        0.24 = coord(6/25)
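
The content-match trees above combine per-term scores (queryWeight * fieldWeight), sum them, and scale the sum by coord, the fraction of query terms that matched. The following Python sketch reproduces the total for the first entry (doc 4196) under the same assumed ClassicSimilarity formulas. The helper names are hypothetical, and boost, queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 43254          # maxDocs from the listing
QUERY_NORM = 0.014252489  # queryNorm from the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=4196) from the listing
QUERY_TERMS = 25          # denominator of coord(5/25)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq, docFreq, boost) as listed for doc 4196.
matched = [
    ("using", 1.0, 3646, 1.0294591),
    ("structure", 7.0, 1490, 1.2945783),
    ("summaries", 3.0, 104, 2.0810046),
    ("document", 1.0, 1620, 2.5396004),
    ("summarisation", 6.0, 10, 4.1245604),
]

raw = sum(term_score(freq, df, boost) for _, freq, df, boost in matched)
coord = len(matched) / QUERY_TERMS
total = raw * coord
print(f"raw={raw:.6f} coord={coord:.2f} total={total:.6f}")
```

The coord factor explains why the second entry, with a smaller raw sum but 8 of 25 terms matched, ranks close to the first, which matched only 5.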