Document (#18146)

Author
Salton, G.
Title
Automatic text structuring and summarization
Source
Information processing and management. 33(1997) no.2, pp. 193-207
Year
1997
Abstract
Applies ideas from automatic link generation research to automatic text summarisation. Using techniques for inter-document link generation, generates intra-document links between passages of a document. Based on the intra-document linkage pattern of a text, characterises the structure of the text. Applies this knowledge of text structure to automatic text summarisation by passage extraction. Evaluates a set of 50 summaries generated using these techniques by comparing them to paragraph extracts constructed by humans. The automatic summarisation methods perform well, especially in view of the fact that the summaries generated by 2 humans for the same article are surprisingly dissimilar.
Footnote
Contribution to a special issue on methods and tools for the automatic construction of hypertext
Theme
Automatic abstracting
Hypertext

Similar documents (author)

  1. Salton, G.: Another look at automatic text-retrieval systems (1986) 4.87
    4.8655405 = sum of:
      4.8655405 = weight(author_txt:salton in 1356) [ClassicSimilarity], result of:
        4.8655405 = fieldWeight in 1356, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.7848644 = idf(docFreq=49, maxDocs=44218)
          0.625 = fieldNorm(doc=1356)
    
  2. Salton, G.: A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART) (1972) 4.87
    4.8655405 = sum of:
      4.8655405 = weight(author_txt:salton in 2325) [ClassicSimilarity], result of:
        4.8655405 = fieldWeight in 2325, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.7848644 = idf(docFreq=49, maxDocs=44218)
          0.625 = fieldNorm(doc=2325)
    
  3. Salton, G.: Future prospects for text-based information retrieval (1990) 4.87
    4.8655405 = sum of:
      4.8655405 = weight(author_txt:salton in 2327) [ClassicSimilarity], result of:
        4.8655405 = fieldWeight in 2327, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.7848644 = idf(docFreq=49, maxDocs=44218)
          0.625 = fieldNorm(doc=2327)
    
  4. Salton, G.: Fast document classification in automatic information retrieval (1978) 4.87
    4.8655405 = sum of:
      4.8655405 = weight(author_txt:salton in 2331) [ClassicSimilarity], result of:
        4.8655405 = fieldWeight in 2331, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.7848644 = idf(docFreq=49, maxDocs=44218)
          0.625 = fieldNorm(doc=2331)
    
  5. Salton, G.: Expert systems and information retrieval (1987) 4.87
    4.8655405 = sum of:
      4.8655405 = weight(author_txt:salton in 2837) [ClassicSimilarity], result of:
        4.8655405 = fieldWeight in 2837, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.7848644 = idf(docFreq=49, maxDocs=44218)
          0.625 = fieldNorm(doc=2837)
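
The author scores above all reduce to the same three-factor product, which can be checked with a short sketch. The formulas used here (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))) are assumptions inferred from the numbers in the listing; they agree with classic TF-IDF scoring but are not confirmed by the record itself. The function names are hypothetical.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor; assumed to be sqrt(freq), as the listed values suggest."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency; assumed form 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, mirroring the explanation tree."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author term "salton", freq=1, docFreq=49,
# maxDocs=44218, fieldNorm=0.625.
author_idf = idf(49, 44218)
author_score = field_weight(1.0, 49, 44218, 0.625)
print(author_idf, author_score)
```

Because every author match has the same frequency, document frequency and field norm, all five entries receive the same score, so the ranking among them carries no information.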
    

Similar documents (content)

  1. Szlávik, Z.; Tombros, A.; Lalmas, M.: Summarisation of the logical structure of XML documents (2012) 0.21
    0.20702736 = sum of:
      0.20702736 = product of:
        1.0351368 = sum of:
          0.011041914 = weight(abstract_txt:using in 2731) [ClassicSimilarity], result of:
            0.011041914 = score(doc=2731,freq=1.0), product of:
              0.051014893 = queryWeight, product of:
                1.0296062 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.014307326 = queryNorm
              0.21644491 = fieldWeight in 2731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=2731)
          0.05821739 = weight(abstract_txt:structure in 2731) [ClassicSimilarity], result of:
            0.05821739 = score(doc=2731,freq=7.0), product of:
              0.08078609 = queryWeight, product of:
                1.2956597 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.014307326 = queryNorm
              0.72063637 = fieldWeight in 2731, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.0625 = fieldNorm(doc=2731)
          0.16021669 = weight(abstract_txt:summaries in 2731) [ClassicSimilarity], result of:
            0.16021669 = score(doc=2731,freq=3.0), product of:
              0.21042572 = queryWeight, product of:
                2.0910869 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.014307326 = queryNorm
              0.7613931 = fieldWeight in 2731, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=2731)
          0.042056784 = weight(abstract_txt:document in 2731) [ClassicSimilarity], result of:
            0.042056784 = score(doc=2731,freq=1.0), product of:
              0.15675983 = queryWeight, product of:
                2.552437 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014307326 = queryNorm
              0.26828802 = fieldWeight in 2731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=2731)
          0.76360404 = weight(abstract_txt:summarisation in 2731) [ClassicSimilarity], result of:
            0.76360404 = score(doc=2731,freq=6.0), product of:
              0.54145145 = queryWeight, product of:
                4.1081667 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.014307326 = queryNorm
              1.4102908 = fieldWeight in 2731, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=2731)
        0.2 = coord(5/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.19
    0.19062893 = sum of:
      0.19062893 = product of:
        0.5957154 = sum of:
          0.011041914 = weight(abstract_txt:using in 1719) [ClassicSimilarity], result of:
            0.011041914 = score(doc=1719,freq=1.0), product of:
              0.051014893 = queryWeight, product of:
                1.0296062 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.014307326 = queryNorm
              0.21644491 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.12761265 = weight(abstract_txt:summarization in 1719) [ClassicSimilarity], result of:
            0.12761265 = score(doc=1719,freq=7.0), product of:
              0.10819832 = queryWeight, product of:
                1.0602735 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.014307326 = queryNorm
              1.1794327 = fieldWeight in 1719, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.044008214 = weight(abstract_txt:structure in 1719) [ClassicSimilarity], result of:
            0.044008214 = score(doc=1719,freq=4.0), product of:
              0.08078609 = queryWeight, product of:
                1.2956597 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.014307326 = queryNorm
              0.5447499 = fieldWeight in 1719, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.024711188 = weight(abstract_txt:techniques in 1719) [ClassicSimilarity], result of:
            0.024711188 = score(doc=1719,freq=1.0), product of:
              0.08728303 = queryWeight, product of:
                1.3467518 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.014307326 = queryNorm
              0.2831156 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.09250115 = weight(abstract_txt:summaries in 1719) [ClassicSimilarity], result of:
            0.09250115 = score(doc=1719,freq=1.0), product of:
              0.21042572 = queryWeight, product of:
                2.0910869 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.014307326 = queryNorm
              0.4395905 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.11127179 = weight(abstract_txt:document in 1719) [ClassicSimilarity], result of:
            0.11127179 = score(doc=1719,freq=7.0), product of:
              0.15675983 = queryWeight, product of:
                2.552437 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014307326 = queryNorm
              0.70982337 = fieldWeight in 1719, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.052741684 = weight(abstract_txt:text in 1719) [ClassicSimilarity], result of:
            0.052741684 = score(doc=1719,freq=1.0), product of:
              0.20867823 = queryWeight, product of:
                3.6067982 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014307326 = queryNorm
              0.25274166 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.13182683 = weight(abstract_txt:automatic in 1719) [ClassicSimilarity], result of:
            0.13182683 = score(doc=1719,freq=2.0), product of:
              0.2870604 = queryWeight, product of:
                3.861707 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.014307326 = queryNorm
              0.45923027 = fieldWeight in 1719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
        0.32 = coord(8/25)
    
  3. Sweeney, S.; Crestani, F.; Losada, D.E.: 'Show me more' : incremental length summarisation using novelty detection (2008) 0.18
    0.18459582 = sum of:
      0.18459582 = product of:
        0.9229791 = sum of:
          0.09250115 = weight(abstract_txt:summaries in 2054) [ClassicSimilarity], result of:
            0.09250115 = score(doc=2054,freq=1.0), product of:
              0.21042572 = queryWeight, product of:
                2.0910869 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.014307326 = queryNorm
              0.4395905 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.08411357 = weight(abstract_txt:document in 2054) [ClassicSimilarity], result of:
            0.08411357 = score(doc=2054,freq=4.0), product of:
              0.15675983 = queryWeight, product of:
                2.552437 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014307326 = queryNorm
              0.53657603 = fieldWeight in 2054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.07458801 = weight(abstract_txt:text in 2054) [ClassicSimilarity], result of:
            0.07458801 = score(doc=2054,freq=2.0), product of:
              0.20867823 = queryWeight, product of:
                3.6067982 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014307326 = queryNorm
              0.3574307 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.13182683 = weight(abstract_txt:automatic in 2054) [ClassicSimilarity], result of:
            0.13182683 = score(doc=2054,freq=2.0), product of:
              0.2870604 = queryWeight, product of:
                3.861707 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.014307326 = queryNorm
              0.45923027 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.53994954 = weight(abstract_txt:summarisation in 2054) [ClassicSimilarity], result of:
            0.53994954 = score(doc=2054,freq=3.0), product of:
              0.54145145 = queryWeight, product of:
                4.1081667 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.014307326 = queryNorm
              0.9972262 = fieldWeight in 2054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
        0.2 = coord(5/25)
    
  4. Lihui, C.; Lian, C.W.: Using Web structure and summarisation techniques for Web content mining (2005) 0.17
    0.17211181 = sum of:
      0.17211181 = product of:
        0.71713257 = sum of:
          0.011041914 = weight(abstract_txt:using in 1046) [ClassicSimilarity], result of:
            0.011041914 = score(doc=1046,freq=1.0), product of:
              0.051014893 = queryWeight, product of:
                1.0296062 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.014307326 = queryNorm
              0.21644491 = fieldWeight in 1046, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=1046)
          0.031118507 = weight(abstract_txt:structure in 1046) [ClassicSimilarity], result of:
            0.031118507 = score(doc=1046,freq=2.0), product of:
              0.08078609 = queryWeight, product of:
                1.2956597 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.014307326 = queryNorm
              0.38519636 = fieldWeight in 1046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.0625 = fieldNorm(doc=1046)
          0.042801034 = weight(abstract_txt:techniques in 1046) [ClassicSimilarity], result of:
            0.042801034 = score(doc=1046,freq=3.0), product of:
              0.08728303 = queryWeight, product of:
                1.3467518 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.014307326 = queryNorm
              0.4903706 = fieldWeight in 1046, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0625 = fieldNorm(doc=1046)
          0.059477273 = weight(abstract_txt:document in 1046) [ClassicSimilarity], result of:
            0.059477273 = score(doc=1046,freq=2.0), product of:
              0.15675983 = queryWeight, product of:
                2.552437 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014307326 = queryNorm
              0.37941656 = fieldWeight in 1046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=1046)
          0.13182683 = weight(abstract_txt:automatic in 1046) [ClassicSimilarity], result of:
            0.13182683 = score(doc=1046,freq=2.0), product of:
              0.2870604 = queryWeight, product of:
                3.861707 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.014307326 = queryNorm
              0.45923027 = fieldWeight in 1046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=1046)
          0.44086698 = weight(abstract_txt:summarisation in 1046) [ClassicSimilarity], result of:
            0.44086698 = score(doc=1046,freq=2.0), product of:
              0.54145145 = queryWeight, product of:
                4.1081667 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.014307326 = queryNorm
              0.81423175 = fieldWeight in 1046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=1046)
        0.24 = coord(6/25)
    
  5. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.17
    0.16730839 = sum of:
      0.16730839 = product of:
        0.6971183 = sum of:
          0.0096616745 = weight(abstract_txt:using in 2765) [ClassicSimilarity], result of:
            0.0096616745 = score(doc=2765,freq=1.0), product of:
              0.051014893 = queryWeight, product of:
                1.0296062 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.014307326 = queryNorm
              0.18938929 = fieldWeight in 2765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.24551718 = weight(abstract_txt:passage in 2765) [ClassicSimilarity], result of:
            0.24551718 = score(doc=2765,freq=14.0), product of:
              0.14521025 = queryWeight, product of:
                1.2283052 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.014307326 = queryNorm
              1.6907703 = fieldWeight in 2765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.24845162 = weight(abstract_txt:passages in 2765) [ClassicSimilarity], result of:
            0.24845162 = score(doc=2765,freq=14.0), product of:
              0.146365 = queryWeight, product of:
                1.2331795 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.014307326 = queryNorm
              1.6974797 = fieldWeight in 2765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.037450902 = weight(abstract_txt:techniques in 2765) [ClassicSimilarity], result of:
            0.037450902 = score(doc=2765,freq=3.0), product of:
              0.08728303 = queryWeight, product of:
                1.3467518 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.014307326 = queryNorm
              0.4290743 = fieldWeight in 2765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.06373893 = weight(abstract_txt:document in 2765) [ClassicSimilarity], result of:
            0.06373893 = score(doc=2765,freq=3.0), product of:
              0.15675983 = queryWeight, product of:
                2.552437 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014307326 = queryNorm
              0.4066024 = fieldWeight in 2765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.09229794 = weight(abstract_txt:text in 2765) [ClassicSimilarity], result of:
            0.09229794 = score(doc=2765,freq=4.0), product of:
              0.20867823 = queryWeight, product of:
                3.6067982 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014307326 = queryNorm
              0.4422979 = fieldWeight in 2765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
        0.24 = coord(6/25)
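
The content scores combine several matching query terms. As a rough check, the sketch below recomputes the total for the first entry (doc 2731) from the values in its explanation tree. It assumes that each term score is queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and that the sum is scaled by coord = matched terms / query terms. These formulas are inferred from the listing and the helper names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs, from the listing
QUERY_NORM = 0.014307326  # queryNorm, from the listing
QUERY_TERMS = 25          # denominator of coord(5/25)
FIELD_NORM = 0.0625       # fieldNorm(doc=2731)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) copied from the explanation for doc 2731.
matched = [
    ("using", 1.0296062, 3765, 1.0),
    ("structure", 1.2956597, 1538, 7.0),
    ("summaries", 2.0910869, 105, 3.0),
    ("document", 2.552437, 1642, 1.0),
    ("summarisation", 4.1081667, 11, 6.0),
]

raw_sum = sum(term_score(b, df, f, FIELD_NORM) for _, b, df, f in matched)
coord = len(matched) / QUERY_TERMS  # fraction of query terms found in the abstract
total = raw_sum * coord
print(raw_sum, coord, total)
```

The coord factor explains why the second entry (8 of 25 terms matched) can rank below the first (5 of 25) despite the larger multiplier: the rare term "summarisation" (docFreq=11) dominates the first entry's sum, while the second entry matches only the more common spelling "summarization".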