Document (#39732)

Author
Szlávik, Z.
Tombros, A.
Lalmas, M.
Title
Summarisation of the logical structure of XML documents
Source
Information processing and management. 48(2012) no.5, p.956-968
Year
2012
Abstract
Summarisation is traditionally used to produce summaries of the textual contents of documents. In this paper, it is argued that summarisation methods can also be applied to the logical structure of XML documents. Structure summarisation selects the most important elements of the logical structure and focuses the user's attention on the sections, subsections, etc. that are believed to be of particular interest. Structure summaries are shown to users as hierarchical tables of contents. This paper discusses methods for structure summarisation that use various features of XML elements to select the document portions on which a user's attention should be focused. An evaluation methodology for structure summarisation is also introduced, and results from several summariser versions are presented and compared. We show that data sets used in information retrieval evaluation can be used effectively to produce high-quality (query-independent) structure summaries. We also discuss the choice and effectiveness of particular summariser features with respect to several evaluation measures.
Content
Contribution to a special issue on "Large-Scale and Distributed Systems for Information Retrieval". Cf.: doi:10.1016/j.ipm.2011.11.002.
Object
XML

Similar documents (author)

  1. Tombros, T.; Crestani, F.: Users' perception of relevance of spoken documents (2000) 1.86
    1.8576884 = sum of:
      1.8576884 = product of:
        3.7153769 = sum of:
          3.7153769 = weight(author_txt:tombros in 4996) [ClassicSimilarity], result of:
            3.7153769 = score(doc=4996,freq=1.0), product of:
              0.7822124 = queryWeight, product of:
                1.120506 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.07348561 = queryNorm
              4.749831 = fieldWeight in 4996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.5 = fieldNorm(doc=4996)
        0.5 = coord(1/2)
    
  2. Tao, Y.; Tombros, A.: How collaborators make sense of tasks together : a comparative analysis of collaborative sensemaking behavior in collaborative information-seeking tasks (2017) 1.86
    1.8576884 = sum of:
      1.8576884 = product of:
        3.7153769 = sum of:
          3.7153769 = weight(author_txt:tombros in 3429) [ClassicSimilarity], result of:
            3.7153769 = score(doc=3429,freq=1.0), product of:
              0.7822124 = queryWeight, product of:
                1.120506 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.07348561 = queryNorm
              4.749831 = fieldWeight in 3429, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.5 = fieldNorm(doc=3429)
        0.5 = coord(1/2)
    
  3. Lalmas, M.: Logical models in information retrieval : introduction and overview (1998) 1.65
    1.6505941 = sum of:
      1.6505941 = product of:
        3.3011882 = sum of:
          3.3011882 = weight(author_txt:lalmas in 2668) [ClassicSimilarity], result of:
            3.3011882 = score(doc=2668,freq=1.0), product of:
              0.6230118 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.07348561 = queryNorm
              5.298757 = fieldWeight in 2668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.625 = fieldNorm(doc=2668)
        0.5 = coord(1/2)
    
  4. Lalmas, M.: XML information retrieval (2009) 1.65
    1.6505941 = sum of:
      1.6505941 = product of:
        3.3011882 = sum of:
          3.3011882 = weight(author_txt:lalmas in 3880) [ClassicSimilarity], result of:
            3.3011882 = score(doc=3880,freq=1.0), product of:
              0.6230118 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.07348561 = queryNorm
              5.298757 = fieldWeight in 3880, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.625 = fieldNorm(doc=3880)
        0.5 = coord(1/2)
    
  5. Lalmas, M.: XML retrieval (2009) 1.65
    1.6505941 = sum of:
      1.6505941 = product of:
        3.3011882 = sum of:
          3.3011882 = weight(author_txt:lalmas in 4998) [ClassicSimilarity], result of:
            3.3011882 = score(doc=4998,freq=1.0), product of:
              0.6230118 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.07348561 = queryNorm
              5.298757 = fieldWeight in 4998, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.625 = fieldNorm(doc=4998)
        0.5 = coord(1/2)
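
The single-term explanations in the author list above can be recomputed by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq); the function names are illustrative and not part of any library, and boost, queryNorm, fieldNorm and coord are simply copied from the explain output rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Assumed ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    # score = queryWeight * fieldWeight, mirroring the explain tree
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm           # boost * idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values read from the explain output; coord(1/2) = 0.5 is applied last.
tombros = 0.5 * term_score(1.0, 8, 44218, 0.5, 0.07348561, boost=1.120506)
lalmas = 0.5 * term_score(1.0, 24, 44218, 0.625, 0.07348561)
print(tombros, lalmas)
```

Under these assumptions the two results should land close to the listed 1.8576884 (author_txt:tombros) and 1.6505941 (author_txt:lalmas); small differences in the last digits are expected because the index stores single-precision values.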
    

Similar documents (content)

  1. Salton, G.: Automatic text structuring and summarization (1997) 0.33
    0.32811284 = sum of:
      0.32811284 = product of:
        1.6405642 = sum of:
          0.019314963 = weight(abstract_txt:methods in 145) [ClassicSimilarity], result of:
            0.019314963 = score(doc=145,freq=1.0), product of:
              0.04968377 = queryWeight, product of:
                1.048126 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.011431231 = queryNorm
              0.388758 = fieldWeight in 145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.09375 = fieldNorm(doc=145)
          0.01081054 = weight(abstract_txt:that in 145) [ClassicSimilarity], result of:
            0.01081054 = score(doc=145,freq=1.0), product of:
              0.048665814 = queryWeight, product of:
                1.7967135 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.011431231 = queryNorm
              0.22213829 = fieldWeight in 145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=145)
          0.19993132 = weight(abstract_txt:summaries in 145) [ClassicSimilarity], result of:
            0.19993132 = score(doc=145,freq=2.0), product of:
              0.21440074 = queryWeight, product of:
                2.6666436 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011431231 = queryNorm
              0.9325123 = fieldWeight in 145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.09375 = fieldNorm(doc=145)
          0.12682536 = weight(abstract_txt:structure in 145) [ClassicSimilarity], result of:
            0.12682536 = score(doc=145,freq=2.0), product of:
              0.21949907 = queryWeight, product of:
                4.4060817 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.011431231 = queryNorm
              0.57779455 = fieldWeight in 145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.09375 = fieldNorm(doc=145)
          1.283682 = weight(abstract_txt:summarisation in 145) [ClassicSimilarity], result of:
            1.283682 = score(doc=145,freq=3.0), product of:
              0.8581684 = queryWeight, product of:
                8.149416 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.011431231 = queryNorm
              1.4958392 = fieldWeight in 145, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.09375 = fieldNorm(doc=145)
        0.2 = coord(5/25)
    
  2. White, R.W.; Jose, J.M.; Ruthven, I.: A task-oriented study on the influencing effects of query-biased summarisation in web searching (2003) 0.25
    0.250322 = sum of:
      0.250322 = product of:
        1.0430084 = sum of:
          0.010268707 = weight(abstract_txt:used in 1081) [ClassicSimilarity], result of:
            0.010268707 = score(doc=1081,freq=1.0), product of:
              0.048908807 = queryWeight, product of:
                1.2736361 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.011431231 = queryNorm
              0.2099562 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=1081)
          0.035635702 = weight(abstract_txt:user's in 1081) [ClassicSimilarity], result of:
            0.035635702 = score(doc=1081,freq=1.0), product of:
              0.09793464 = queryWeight, product of:
                1.4715478 = boost
                5.8219566 = idf(docFreq=355, maxDocs=44218)
                0.011431231 = queryNorm
              0.3638723 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8219566 = idf(docFreq=355, maxDocs=44218)
                0.0625 = fieldNorm(doc=1081)
          0.034584578 = weight(abstract_txt:evaluation in 1081) [ClassicSimilarity], result of:
            0.034584578 = score(doc=1081,freq=2.0), product of:
              0.08722109 = queryWeight, product of:
                1.7008367 = boost
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.011431231 = queryNorm
              0.3965162 = fieldWeight in 1081, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.0625 = fieldNorm(doc=1081)
          0.0124829365 = weight(abstract_txt:that in 1081) [ClassicSimilarity], result of:
            0.0124829365 = score(doc=1081,freq=3.0), product of:
              0.048665814 = queryWeight, product of:
                1.7967135 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.011431231 = queryNorm
              0.2565032 = fieldWeight in 1081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1081)
          0.09424853 = weight(abstract_txt:summaries in 1081) [ClassicSimilarity], result of:
            0.09424853 = score(doc=1081,freq=1.0), product of:
              0.21440074 = queryWeight, product of:
                2.6666436 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011431231 = queryNorm
              0.4395905 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=1081)
          0.855788 = weight(abstract_txt:summarisation in 1081) [ClassicSimilarity], result of:
            0.855788 = score(doc=1081,freq=3.0), product of:
              0.8581684 = queryWeight, product of:
                8.149416 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.011431231 = queryNorm
              0.9972262 = fieldWeight in 1081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=1081)
        0.24 = coord(6/25)
    
  3. Sweeney, S.; Crestani, F.; Losada, D.E.: 'Show me more' : incremental length summarisation using novelty detection (2008) 0.21
    0.20654014 = sum of:
      0.20654014 = product of:
        1.0327007 = sum of:
          0.010630975 = weight(abstract_txt:also in 2054) [ClassicSimilarity], result of:
            0.010630975 = score(doc=2054,freq=1.0), product of:
              0.05005244 = queryWeight, product of:
                1.2884408 = boost
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.011431231 = queryNorm
              0.21239673 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.052965146 = weight(abstract_txt:produce in 2054) [ClassicSimilarity], result of:
            0.052965146 = score(doc=2054,freq=2.0), product of:
              0.10123474 = queryWeight, product of:
                1.4961357 = boost
                5.9192348 = idf(docFreq=322, maxDocs=44218)
                0.011431231 = queryNorm
              0.5231914 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9192348 = idf(docFreq=322, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.019068 = weight(abstract_txt:that in 2054) [ClassicSimilarity], result of:
            0.019068 = score(doc=2054,freq=7.0), product of:
              0.048665814 = queryWeight, product of:
                1.7967135 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.011431231 = queryNorm
              0.3918151 = fieldWeight in 2054, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.09424853 = weight(abstract_txt:summaries in 2054) [ClassicSimilarity], result of:
            0.09424853 = score(doc=2054,freq=1.0), product of:
              0.21440074 = queryWeight, product of:
                2.6666436 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011431231 = queryNorm
              0.4395905 = fieldWeight in 2054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
          0.855788 = weight(abstract_txt:summarisation in 2054) [ClassicSimilarity], result of:
            0.855788 = score(doc=2054,freq=3.0), product of:
              0.8581684 = queryWeight, product of:
                8.149416 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.011431231 = queryNorm
              0.9972262 = fieldWeight in 2054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=2054)
        0.2 = coord(5/25)
    
  4. Endres-Niggemeyer, B.: Summarising text for intelligent communication : results of the Dagstuhl seminar (1994) 0.17
    0.1716088 = sum of:
      0.1716088 = product of:
        0.7150367 = sum of:
          0.0160958 = weight(abstract_txt:methods in 8867) [ClassicSimilarity], result of:
            0.0160958 = score(doc=8867,freq=1.0), product of:
              0.04968377 = queryWeight, product of:
                1.048126 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.011431231 = queryNorm
              0.32396498 = fieldWeight in 8867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.078125 = fieldNorm(doc=8867)
          0.019270398 = weight(abstract_txt:particular in 8867) [ClassicSimilarity], result of:
            0.019270398 = score(doc=8867,freq=1.0), product of:
              0.056018725 = queryWeight, product of:
                1.1129427 = boost
                4.4031897 = idf(docFreq=1470, maxDocs=44218)
                0.011431231 = queryNorm
              0.3439992 = fieldWeight in 8867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4031897 = idf(docFreq=1470, maxDocs=44218)
                0.078125 = fieldNorm(doc=8867)
          0.01985083 = weight(abstract_txt:order in 8867) [ClassicSimilarity], result of:
            0.01985083 = score(doc=8867,freq=1.0), product of:
              0.057138026 = queryWeight, product of:
                1.1240065 = boost
                4.446962 = idf(docFreq=1407, maxDocs=44218)
                0.011431231 = queryNorm
              0.3474189 = fieldWeight in 8867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.446962 = idf(docFreq=1407, maxDocs=44218)
                0.078125 = fieldNorm(doc=8867)
          0.033199042 = weight(abstract_txt:attention in 8867) [ClassicSimilarity], result of:
            0.033199042 = score(doc=8867,freq=1.0), product of:
              0.08050506 = queryWeight, product of:
                1.3341904 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.011431231 = queryNorm
              0.41238457 = fieldWeight in 8867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.078125 = fieldNorm(doc=8867)
          0.009008784 = weight(abstract_txt:that in 8867) [ClassicSimilarity], result of:
            0.009008784 = score(doc=8867,freq=1.0), product of:
              0.048665814 = queryWeight, product of:
                1.7967135 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.011431231 = queryNorm
              0.18511525 = fieldWeight in 8867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=8867)
          0.6176118 = weight(abstract_txt:summarisation in 8867) [ClassicSimilarity], result of:
            0.6176118 = score(doc=8867,freq=1.0), product of:
              0.8581684 = queryWeight, product of:
                8.149416 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.011431231 = queryNorm
              0.71968603 = fieldWeight in 8867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=8867)
        0.24 = coord(6/25)
    
  5. Sparck Jones, K.: Automatic summarising : the state of the art (2007) 0.17
    0.1658705 = sum of:
      0.1658705 = product of:
        0.8293525 = sum of:
          0.012835884 = weight(abstract_txt:used in 932) [ClassicSimilarity], result of:
            0.012835884 = score(doc=932,freq=1.0), product of:
              0.048908807 = queryWeight, product of:
                1.2736361 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.011431231 = queryNorm
              0.26244524 = fieldWeight in 932, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.078125 = fieldNorm(doc=932)
          0.06835377 = weight(abstract_txt:evaluation in 932) [ClassicSimilarity], result of:
            0.06835377 = score(doc=932,freq=5.0), product of:
              0.08722109 = queryWeight, product of:
                1.7008367 = boost
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.011431231 = queryNorm
              0.78368396 = fieldWeight in 932, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.078125 = fieldNorm(doc=932)
          0.012740344 = weight(abstract_txt:that in 932) [ClassicSimilarity], result of:
            0.012740344 = score(doc=932,freq=2.0), product of:
              0.048665814 = queryWeight, product of:
                1.7967135 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.011431231 = queryNorm
              0.26179248 = fieldWeight in 932, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=932)
          0.11781066 = weight(abstract_txt:summaries in 932) [ClassicSimilarity], result of:
            0.11781066 = score(doc=932,freq=1.0), product of:
              0.21440074 = queryWeight, product of:
                2.6666436 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011431231 = queryNorm
              0.5494881 = fieldWeight in 932, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=932)
          0.6176118 = weight(abstract_txt:summarisation in 932) [ClassicSimilarity], result of:
            0.6176118 = score(doc=932,freq=1.0), product of:
              0.8581684 = queryWeight, product of:
                8.149416 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.011431231 = queryNorm
              0.71968603 = fieldWeight in 932, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=932)
        0.2 = coord(5/25)
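
For the content-based list, each score is the sum of several per-term scores scaled by coord (matched query terms over total query terms). The sketch below recomputes the first entry (doc=145) under the same assumed ClassicSimilarity definitions, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The tuple layout and function name are hypothetical; the boosts, document frequencies, queryNorm and fieldNorm are copied from the explain output.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.011431231   # queryNorm shown for abstract_txt terms
FIELD_NORM = 0.09375       # fieldNorm(doc=145)

# (term, boost, docFreq, freq) as listed in the explanation for doc 145
MATCHED_TERMS = [
    ("methods", 1.048126, 1900, 1.0),
    ("that", 1.7967135, 11241, 1.0),
    ("summaries", 2.6666436, 105, 2.0),
    ("structure", 4.4060817, 1538, 2.0),
    ("summarisation", 8.149416, 11, 3.0),
]


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    # Assumed ClassicSimilarity pieces: idf and tf as stated in the lead-in
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# Sum the per-term scores, then scale by coord(5/25): 5 of 25 query terms matched.
raw_sum = sum(term_score(b, df, f) for _, b, df, f in MATCHED_TERMS)
coord = len(MATCHED_TERMS) / 25
doc_score = raw_sum * coord
print(raw_sum, doc_score)
```

If the assumptions hold, raw_sum should come out near 1.6405642 and doc_score near 0.32811284, the values shown for the first entry. The rare term "summarisation" (docFreq=11) contributes most of the sum, which is why it dominates every ranking in this list.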