Document (#39733)

Author
Szlávik, Z.
Tombros, A.
Lalmas, M.
Title
Summarisation of the logical structure of XML documents
Source
Information processing and management. 48(2012) no.5, pp. 956-968
Year
2012
Abstract
Summarisation is traditionally used to produce summaries of the textual contents of documents. In this paper, it is argued that summarisation methods can also be applied to the logical structure of XML documents. Structure summarisation selects the most important elements of the logical structure and ensures that the user's attention is focused on sections, subsections, etc. that are believed to be of particular interest. Structure summaries are shown to users as hierarchical tables of contents. This paper discusses methods for structure summarisation that use various features of XML elements in order to select the document portions that a user's attention should be focused on. An evaluation methodology for structure summarisation is also introduced, and summarisation results using various summariser versions are presented and compared to one another. We show that data sets used in information retrieval evaluation can be used effectively in order to produce high-quality (query-independent) structure summaries. We also discuss the choice and effectiveness of particular summariser features with respect to several evaluation measures.
Content
Contribution to a special issue on "Large-Scale and Distributed Systems for Information Retrieval". Cf.: doi:10.1016/j.ipm.2011.11.002.
Object
XML

Similar documents (author)

  1. Tombros, T.; Crestani, F.: Users' perception of relevance of spoken documents (2000) 1.85
    1.8516622 = sum of:
      1.8516622 = product of:
        3.7033243 = sum of:
          3.7033243 = weight(author_txt:tombros in 5997) [ClassicSimilarity], result of:
            3.7033243 = score(doc=5997,freq=1.0), product of:
              0.7824752 = queryWeight, product of:
                1.1209912 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.07374238 = queryNorm
              4.732833 = fieldWeight in 5997, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.5 = fieldNorm(doc=5997)
        0.5 = coord(1/2)
    
  2. Tao, Y.; Tombros, A.: How collaborators make sense of tasks together : a comparative analysis of collaborative sensemaking behavior in collaborative information-seeking tasks (2017) 1.85
    1.8516622 = sum of:
      1.8516622 = product of:
        3.7033243 = sum of:
          3.7033243 = weight(author_txt:tombros in 5430) [ClassicSimilarity], result of:
            3.7033243 = score(doc=5430,freq=1.0), product of:
              0.7824752 = queryWeight, product of:
                1.1209912 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.07374238 = queryNorm
              4.732833 = fieldWeight in 5430, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.5 = fieldNorm(doc=5430)
        0.5 = coord(1/2)
    
  3. Lalmas, M.: Logical models in information retrieval : introduction and overview (1998) 1.64
    1.6431043 = sum of:
      1.6431043 = product of:
        3.2862086 = sum of:
          3.2862086 = weight(author_txt:lalmas in 3669) [ClassicSimilarity], result of:
            3.2862086 = score(doc=3669,freq=1.0), product of:
              0.62268174 = queryWeight, product of:
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.07374238 = queryNorm
              5.277509 = fieldWeight in 3669, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.625 = fieldNorm(doc=3669)
        0.5 = coord(1/2)
    
  4. Lalmas, M.: XML information retrieval (2009) 1.64
    1.6431043 = sum of:
      1.6431043 = product of:
        3.2862086 = sum of:
          3.2862086 = weight(author_txt:lalmas in 881) [ClassicSimilarity], result of:
            3.2862086 = score(doc=881,freq=1.0), product of:
              0.62268174 = queryWeight, product of:
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.07374238 = queryNorm
              5.277509 = fieldWeight in 881, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.625 = fieldNorm(doc=881)
        0.5 = coord(1/2)
    
  5. Lalmas, M.: XML retrieval (2009) 1.64
    1.6431043 = sum of:
      1.6431043 = product of:
        3.2862086 = sum of:
          3.2862086 = weight(author_txt:lalmas in 1999) [ClassicSimilarity], result of:
            3.2862086 = score(doc=1999,freq=1.0), product of:
              0.62268174 = queryWeight, product of:
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.07374238 = queryNorm
              5.277509 = fieldWeight in 1999, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.625 = fieldNorm(doc=1999)
        0.5 = coord(1/2)
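
The score trees above can be checked by hand. The following is a minimal sketch, assuming the tf and idf formulas commonly documented for Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which reproduce the idf values printed in this record. The function names are illustrative and not part of any library; the numeric inputs are copied from the trees for documents 5997 and 3669.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf formula; it matches the idf values shown in the trees.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    # Assumed tf formula: square root of the term frequency.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the "product of" lines.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.07374238  # queryNorm printed in the author-based trees
MAX_DOCS = 42740

# author_txt:tombros in doc 5997 (boost 1.1209912, fieldNorm 0.5)
tombros = term_score(1.0, 8, MAX_DOCS, 0.5, QUERY_NORM, boost=1.1209912)
# author_txt:lalmas in doc 3669 (no boost line, fieldNorm 0.625)
lalmas = term_score(1.0, 24, MAX_DOCS, 0.625, QUERY_NORM)

# coord(1/2): one of two query clauses matched, so the sum is halved.
tombros_final = tombros * 0.5
lalmas_final = lalmas * 0.5

print(round(tombros_final, 4), round(lalmas_final, 4))
```

Small differences in the last digits are expected, since the trees were presumably computed in single precision while Python uses double precision.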
    

Similar documents (content)

  1. Salton, G.: Automatic text structuring and summarization (1997) 0.33
    0.32941648 = sum of:
      0.32941648 = product of:
        1.6470823 = sum of:
          0.019480025 = weight(abstract_txt:methods in 1146) [ClassicSimilarity], result of:
            0.019480025 = score(doc=1146,freq=1.0), product of:
              0.04980081 = queryWeight, product of:
                1.0464898 = boost
                4.172361 = idf(docFreq=1790, maxDocs=42740)
                0.011405636 = queryNorm
              0.39115882 = fieldWeight in 1146, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.172361 = idf(docFreq=1790, maxDocs=42740)
                0.09375 = fieldNorm(doc=1146)
          0.011048313 = weight(abstract_txt:that in 1146) [ClassicSimilarity], result of:
            0.011048313 = score(doc=1146,freq=1.0), product of:
              0.04921314 = queryWeight, product of:
                1.8018475 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.011405636 = queryNorm
              0.22449924 = fieldWeight in 1146, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.09375 = fieldNorm(doc=1146)
          0.19669035 = weight(abstract_txt:summaries in 1146) [ClassicSimilarity], result of:
            0.19669035 = score(doc=1146,freq=2.0), product of:
              0.21137446 = queryWeight, product of:
                2.6405156 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.011405636 = queryNorm
              0.93053037 = fieldWeight in 1146, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.09375 = fieldNorm(doc=1146)
          0.12683453 = weight(abstract_txt:structure in 1146) [ClassicSimilarity], result of:
            0.12683453 = score(doc=1146,freq=2.0), product of:
              0.21878205 = queryWeight, product of:
                4.386849 = boost
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.011405636 = queryNorm
              0.57973003 = fieldWeight in 1146, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.09375 = fieldNorm(doc=1146)
          1.2930291 = weight(abstract_txt:summarisation in 1146) [ClassicSimilarity], result of:
            1.2930291 = score(doc=1146,freq=3.0), product of:
              0.85947084 = queryWeight, product of:
                8.133293 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.011405636 = queryNorm
              1.5044478 = fieldWeight in 1146, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.09375 = fieldNorm(doc=1146)
        0.2 = coord(5/25)
    
  2. White, R.W.; Jose, J.M.; Ruthven, I.: A task-oriented study on the influencing effects of query-biased summarisation in web searching (2003) 0.25
    0.25137228 = sum of:
      0.25137228 = product of:
        1.0473845 = sum of:
          0.010308863 = weight(abstract_txt:used in 3082) [ClassicSimilarity], result of:
            0.010308863 = score(doc=3082,freq=1.0), product of:
              0.04887369 = queryWeight, product of:
                1.2696968 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.011405636 = queryNorm
              0.21092868 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.0625 = fieldNorm(doc=3082)
          0.03518253 = weight(abstract_txt:user's in 3082) [ClassicSimilarity], result of:
            0.03518253 = score(doc=3082,freq=1.0), product of:
              0.09678073 = queryWeight, product of:
                1.4588522 = boost
                5.8164515 = idf(docFreq=345, maxDocs=42740)
                0.011405636 = queryNorm
              0.36352822 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8164515 = idf(docFreq=345, maxDocs=42740)
                0.0625 = fieldNorm(doc=3082)
          0.034395516 = weight(abstract_txt:evaluation in 3082) [ClassicSimilarity], result of:
            0.034395516 = score(doc=3082,freq=2.0), product of:
              0.086614884 = queryWeight, product of:
                1.6902803 = boost
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.011405636 = queryNorm
              0.3971086 = fieldWeight in 3082, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.0625 = fieldNorm(doc=3082)
          0.012757492 = weight(abstract_txt:that in 3082) [ClassicSimilarity], result of:
            0.012757492 = score(doc=3082,freq=3.0), product of:
              0.04921314 = queryWeight, product of:
                1.8018475 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.011405636 = queryNorm
              0.2592294 = fieldWeight in 3082, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=3082)
          0.092720725 = weight(abstract_txt:summaries in 3082) [ClassicSimilarity], result of:
            0.092720725 = score(doc=3082,freq=1.0), product of:
              0.21137446 = queryWeight, product of:
                2.6405156 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.011405636 = queryNorm
              0.43865624 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.0625 = fieldNorm(doc=3082)
          0.86201936 = weight(abstract_txt:summarisation in 3082) [ClassicSimilarity], result of:
            0.86201936 = score(doc=3082,freq=3.0), product of:
              0.85947084 = queryWeight, product of:
                8.133293 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.011405636 = queryNorm
              1.0029652 = fieldWeight in 3082, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.0625 = fieldNorm(doc=3082)
        0.24 = coord(6/25)
    
  3. Sweeney, S.; Crestani, F.; Losada, D.E.: 'Show me more' : incremental length summarisation using novelty detection (2008) 0.21
    0.2075386 = sum of:
      0.2075386 = product of:
        1.037693 = sum of:
          0.010836809 = weight(abstract_txt:also in 4055) [ClassicSimilarity], result of:
            0.010836809 = score(doc=4055,freq=1.0), product of:
              0.050528403 = queryWeight, product of:
                1.2910119 = boost
                3.4315145 = idf(docFreq=3756, maxDocs=42740)
                0.011405636 = queryNorm
              0.21446966 = fieldWeight in 4055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4315145 = idf(docFreq=3756, maxDocs=42740)
                0.0625 = fieldNorm(doc=4055)
          0.052628692 = weight(abstract_txt:produce in 4055) [ClassicSimilarity], result of:
            0.052628692 = score(doc=4055,freq=2.0), product of:
              0.10047143 = queryWeight, product of:
                1.4864084 = boost
                5.926318 = idf(docFreq=309, maxDocs=42740)
                0.011405636 = queryNorm
              0.5238175 = fieldWeight in 4055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.926318 = idf(docFreq=309, maxDocs=42740)
                0.0625 = fieldNorm(doc=4055)
          0.019487392 = weight(abstract_txt:that in 4055) [ClassicSimilarity], result of:
            0.019487392 = score(doc=4055,freq=7.0), product of:
              0.04921314 = queryWeight, product of:
                1.8018475 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.011405636 = queryNorm
              0.39597943 = fieldWeight in 4055, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=4055)
          0.092720725 = weight(abstract_txt:summaries in 4055) [ClassicSimilarity], result of:
            0.092720725 = score(doc=4055,freq=1.0), product of:
              0.21137446 = queryWeight, product of:
                2.6405156 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.011405636 = queryNorm
              0.43865624 = fieldWeight in 4055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.0625 = fieldNorm(doc=4055)
          0.86201936 = weight(abstract_txt:summarisation in 4055) [ClassicSimilarity], result of:
            0.86201936 = score(doc=4055,freq=3.0), product of:
              0.85947084 = queryWeight, product of:
                8.133293 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.011405636 = queryNorm
              1.0029652 = fieldWeight in 4055, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.0625 = fieldNorm(doc=4055)
        0.2 = coord(5/25)
    
  4. Endres-Niggemeyer, B.: Summarising text for intelligent communication : results of the Dagstuhl seminar (1994) 0.17
    0.1729289 = sum of:
      0.1729289 = product of:
        0.7205371 = sum of:
          0.016233357 = weight(abstract_txt:methods in 867) [ClassicSimilarity], result of:
            0.016233357 = score(doc=867,freq=1.0), product of:
              0.04980081 = queryWeight, product of:
                1.0464898 = boost
                4.172361 = idf(docFreq=1790, maxDocs=42740)
                0.011405636 = queryNorm
              0.3259657 = fieldWeight in 867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.172361 = idf(docFreq=1790, maxDocs=42740)
                0.078125 = fieldNorm(doc=867)
          0.01914208 = weight(abstract_txt:particular in 867) [ClassicSimilarity], result of:
            0.01914208 = score(doc=867,freq=1.0), product of:
              0.05558492 = queryWeight, product of:
                1.1055931 = boost
                4.4080057 = idf(docFreq=1414, maxDocs=42740)
                0.011405636 = queryNorm
              0.34437543 = fieldWeight in 867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4080057 = idf(docFreq=1414, maxDocs=42740)
                0.078125 = fieldNorm(doc=867)
          0.019910328 = weight(abstract_txt:order in 867) [ClassicSimilarity], result of:
            0.019910328 = score(doc=867,freq=1.0), product of:
              0.057062373 = queryWeight, product of:
                1.1201901 = boost
                4.466204 = idf(docFreq=1334, maxDocs=42740)
                0.011405636 = queryNorm
              0.3489222 = fieldWeight in 867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.466204 = idf(docFreq=1334, maxDocs=42740)
                0.078125 = fieldNorm(doc=867)
          0.033935532 = weight(abstract_txt:attention in 867) [ClassicSimilarity], result of:
            0.033935532 = score(doc=867,freq=1.0), product of:
              0.08142054 = queryWeight, product of:
                1.3380854 = boost
                5.334954 = idf(docFreq=559, maxDocs=42740)
                0.011405636 = queryNorm
              0.41679326 = fieldWeight in 867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.334954 = idf(docFreq=559, maxDocs=42740)
                0.078125 = fieldNorm(doc=867)
          0.009206927 = weight(abstract_txt:that in 867) [ClassicSimilarity], result of:
            0.009206927 = score(doc=867,freq=1.0), product of:
              0.04921314 = queryWeight, product of:
                1.8018475 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.011405636 = queryNorm
              0.18708271 = fieldWeight in 867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=867)
          0.6221089 = weight(abstract_txt:summarisation in 867) [ClassicSimilarity], result of:
            0.6221089 = score(doc=867,freq=1.0), product of:
              0.85947084 = queryWeight, product of:
                8.133293 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.011405636 = queryNorm
              0.7238278 = fieldWeight in 867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.078125 = fieldNorm(doc=867)
        0.24 = coord(6/25)
    
  5. Sparck Jones, K.: Automatic summarising : the state of the art (2007) 0.17
    0.16637932 = sum of:
      0.16637932 = product of:
        0.83189654 = sum of:
          0.012886078 = weight(abstract_txt:used in 2933) [ClassicSimilarity], result of:
            0.012886078 = score(doc=2933,freq=1.0), product of:
              0.04887369 = queryWeight, product of:
                1.2696968 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.011405636 = queryNorm
              0.26366085 = fieldWeight in 2933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.078125 = fieldNorm(doc=2933)
          0.06798011 = weight(abstract_txt:evaluation in 2933) [ClassicSimilarity], result of:
            0.06798011 = score(doc=2933,freq=5.0), product of:
              0.086614884 = queryWeight, product of:
                1.6902803 = boost
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.011405636 = queryNorm
              0.7848548 = fieldWeight in 2933, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.078125 = fieldNorm(doc=2933)
          0.013020561 = weight(abstract_txt:that in 2933) [ClassicSimilarity], result of:
            0.013020561 = score(doc=2933,freq=2.0), product of:
              0.04921314 = queryWeight, product of:
                1.8018475 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.011405636 = queryNorm
              0.2645749 = fieldWeight in 2933, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=2933)
          0.115900904 = weight(abstract_txt:summaries in 2933) [ClassicSimilarity], result of:
            0.115900904 = score(doc=2933,freq=1.0), product of:
              0.21137446 = queryWeight, product of:
                2.6405156 = boost
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.011405636 = queryNorm
              0.5483203 = fieldWeight in 2933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0185 = idf(docFreq=103, maxDocs=42740)
                0.078125 = fieldNorm(doc=2933)
          0.6221089 = weight(abstract_txt:summarisation in 2933) [ClassicSimilarity], result of:
            0.6221089 = score(doc=2933,freq=1.0), product of:
              0.85947084 = queryWeight, product of:
                8.133293 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.011405636 = queryNorm
              0.7238278 = fieldWeight in 2933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.078125 = fieldNorm(doc=2933)
        0.2 = coord(5/25)
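
For the content-based matches, each tree sums several per-term scores and then scales the sum by the coord factor (matched terms / query terms). The sketch below recomputes the first entry (doc 1146) under the same assumed ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The helper name and the tuple layout are illustrative only; the numbers are copied from the tree.

```python
import math

def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost):
    # Assumed ClassicSimilarity pieces: idf, then queryWeight * fieldWeight.
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

MAX_DOCS = 42740
QUERY_NORM = 0.011405636   # queryNorm printed in the content-based trees
FIELD_NORM = 0.09375       # fieldNorm(doc=1146)

# (term, freq, docFreq, boost) for the five abstract terms matched in doc 1146
matched_terms = [
    ("methods",       1.0, 1790,  1.0464898),
    ("that",          1.0, 10595, 1.8018475),
    ("summaries",     2.0, 103,   2.6405156),
    ("structure",     2.0, 1465,  4.386849),
    ("summarisation", 3.0, 10,    8.133293),
]

per_term = {
    term: term_score(freq, df, MAX_DOCS, FIELD_NORM, QUERY_NORM, boost)
    for term, freq, df, boost in matched_terms
}

# coord(5/25): 5 of the 25 query terms occur in this document's abstract.
coord = len(matched_terms) / 25
doc_score = sum(per_term.values()) * coord

print(round(doc_score, 4))
```

The rare, heavily boosted term "summarisation" contributes most of the sum, while the common word "that" adds very little despite its boost, which is the behaviour the idf factor is meant to produce.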