Document (#37565)

Author
Huo, W.
Title
Automatic multi-word term extraction and its application to Web-page summarization
Imprint
Guelph, Ontario : University of Guelph
Year
2012
Pages
vii, 104 p.
Abstract
In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
Content
A thesis presented to the University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
Theme
Computational linguistics

Similar documents (content)

  1. Xiong, S.; Ji, D.: Query-focused multi-document summarization using hypergraph-based ranking (2016) 0.31
    0.31090304 = sum of:
      0.31090304 = product of:
        1.110368 = sum of:
          0.036728214 = weight(abstract_txt:learn in 4437) [ClassicSimilarity], result of:
            0.036728214 = score(doc=4437,freq=1.0), product of:
              0.0745175 = queryWeight, product of:
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.011811547 = queryNorm
              0.49288037 = fieldWeight in 4437, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.078125 = fieldNorm(doc=4437)
          0.012549954 = weight(abstract_txt:results in 4437) [ClassicSimilarity], result of:
            0.012549954 = score(doc=4437,freq=1.0), product of:
              0.04588772 = queryWeight, product of:
                1.1097728 = boost
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.011811547 = queryNorm
              0.27349263 = fieldWeight in 4437, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.078125 = fieldNorm(doc=4437)
          0.03252766 = weight(abstract_txt:document in 4437) [ClassicSimilarity], result of:
            0.03252766 = score(doc=4437,freq=2.0), product of:
              0.06872166 = queryWeight, product of:
                1.3581029 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.011811547 = queryNorm
              0.47332472 = fieldWeight in 4437, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.078125 = fieldNorm(doc=4437)
          0.15185952 = weight(abstract_txt:summaries in 4437) [ClassicSimilarity], result of:
            0.15185952 = score(doc=4437,freq=1.0), product of:
              0.27685997 = queryWeight, product of:
                3.3385782 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.011811547 = queryNorm
              0.5485066 = fieldWeight in 4437, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.078125 = fieldNorm(doc=4437)
          0.30145648 = weight(abstract_txt:summarization in 4437) [ClassicSimilarity], result of:
            0.30145648 = score(doc=4437,freq=2.0), product of:
              0.38201886 = queryWeight, product of:
                4.528384 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.011811547 = queryNorm
              0.7891141 = fieldWeight in 4437, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.078125 = fieldNorm(doc=4437)
          0.3087746 = weight(abstract_txt:multi in 4437) [ClassicSimilarity], result of:
            0.3087746 = score(doc=4437,freq=2.0), product of:
              0.4677805 = queryWeight, product of:
                6.6288915 = boost
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.011811547 = queryNorm
              0.66008437 = fieldWeight in 4437, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.078125 = fieldNorm(doc=4437)
          0.26647165 = weight(abstract_txt:word in 4437) [ClassicSimilarity], result of:
            0.26647165 = score(doc=4437,freq=2.0), product of:
              0.443315 = queryWeight, product of:
                6.8987765 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.011811547 = queryNorm
              0.60108876 = fieldWeight in 4437, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.078125 = fieldNorm(doc=4437)
        0.28 = coord(7/25)
    
  2. Chang, Y.-W.: Influence of human behavior and the principle of least effort on library and information science research (2016) 0.31
    0.31090304 = sum of:
      0.31090304 = product of:
        1.110368 = sum of:
          0.036728214 = weight(abstract_txt:learn in 4438) [ClassicSimilarity], result of:
            0.036728214 = score(doc=4438,freq=1.0), product of:
              0.0745175 = queryWeight, product of:
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.011811547 = queryNorm
              0.49288037 = fieldWeight in 4438, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.078125 = fieldNorm(doc=4438)
          0.012549954 = weight(abstract_txt:results in 4438) [ClassicSimilarity], result of:
            0.012549954 = score(doc=4438,freq=1.0), product of:
              0.04588772 = queryWeight, product of:
                1.1097728 = boost
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.011811547 = queryNorm
              0.27349263 = fieldWeight in 4438, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.078125 = fieldNorm(doc=4438)
          0.03252766 = weight(abstract_txt:document in 4438) [ClassicSimilarity], result of:
            0.03252766 = score(doc=4438,freq=2.0), product of:
              0.06872166 = queryWeight, product of:
                1.3581029 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.011811547 = queryNorm
              0.47332472 = fieldWeight in 4438, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.078125 = fieldNorm(doc=4438)
          0.15185952 = weight(abstract_txt:summaries in 4438) [ClassicSimilarity], result of:
            0.15185952 = score(doc=4438,freq=1.0), product of:
              0.27685997 = queryWeight, product of:
                3.3385782 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.011811547 = queryNorm
              0.5485066 = fieldWeight in 4438, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.078125 = fieldNorm(doc=4438)
          0.30145648 = weight(abstract_txt:summarization in 4438) [ClassicSimilarity], result of:
            0.30145648 = score(doc=4438,freq=2.0), product of:
              0.38201886 = queryWeight, product of:
                4.528384 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.011811547 = queryNorm
              0.7891141 = fieldWeight in 4438, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.078125 = fieldNorm(doc=4438)
          0.3087746 = weight(abstract_txt:multi in 4438) [ClassicSimilarity], result of:
            0.3087746 = score(doc=4438,freq=2.0), product of:
              0.4677805 = queryWeight, product of:
                6.6288915 = boost
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.011811547 = queryNorm
              0.66008437 = fieldWeight in 4438, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.078125 = fieldNorm(doc=4438)
          0.26647165 = weight(abstract_txt:word in 4438) [ClassicSimilarity], result of:
            0.26647165 = score(doc=4438,freq=2.0), product of:
              0.443315 = queryWeight, product of:
                6.8987765 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.011811547 = queryNorm
              0.60108876 = fieldWeight in 4438, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.078125 = fieldNorm(doc=4438)
        0.28 = coord(7/25)
    
  3. Vilares, J.; Alonso, M.A.; Doval, Y.; Vilares, M.: Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval (2016) 0.31
    0.31090304 = sum of:
      0.31090304 = product of:
        1.110368 = sum of:
          0.036728214 = weight(abstract_txt:learn in 4439) [ClassicSimilarity], result of:
            0.036728214 = score(doc=4439,freq=1.0), product of:
              0.0745175 = queryWeight, product of:
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.011811547 = queryNorm
              0.49288037 = fieldWeight in 4439, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.078125 = fieldNorm(doc=4439)
          0.012549954 = weight(abstract_txt:results in 4439) [ClassicSimilarity], result of:
            0.012549954 = score(doc=4439,freq=1.0), product of:
              0.04588772 = queryWeight, product of:
                1.1097728 = boost
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.011811547 = queryNorm
              0.27349263 = fieldWeight in 4439, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.078125 = fieldNorm(doc=4439)
          0.03252766 = weight(abstract_txt:document in 4439) [ClassicSimilarity], result of:
            0.03252766 = score(doc=4439,freq=2.0), product of:
              0.06872166 = queryWeight, product of:
                1.3581029 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.011811547 = queryNorm
              0.47332472 = fieldWeight in 4439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.078125 = fieldNorm(doc=4439)
          0.15185952 = weight(abstract_txt:summaries in 4439) [ClassicSimilarity], result of:
            0.15185952 = score(doc=4439,freq=1.0), product of:
              0.27685997 = queryWeight, product of:
                3.3385782 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.011811547 = queryNorm
              0.5485066 = fieldWeight in 4439, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.078125 = fieldNorm(doc=4439)
          0.30145648 = weight(abstract_txt:summarization in 4439) [ClassicSimilarity], result of:
            0.30145648 = score(doc=4439,freq=2.0), product of:
              0.38201886 = queryWeight, product of:
                4.528384 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.011811547 = queryNorm
              0.7891141 = fieldWeight in 4439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.078125 = fieldNorm(doc=4439)
          0.3087746 = weight(abstract_txt:multi in 4439) [ClassicSimilarity], result of:
            0.3087746 = score(doc=4439,freq=2.0), product of:
              0.4677805 = queryWeight, product of:
                6.6288915 = boost
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.011811547 = queryNorm
              0.66008437 = fieldWeight in 4439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.078125 = fieldNorm(doc=4439)
          0.26647165 = weight(abstract_txt:word in 4439) [ClassicSimilarity], result of:
            0.26647165 = score(doc=4439,freq=2.0), product of:
              0.443315 = queryWeight, product of:
                6.8987765 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.011811547 = queryNorm
              0.60108876 = fieldWeight in 4439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.078125 = fieldNorm(doc=4439)
        0.28 = coord(7/25)
    
  4. Pandey, S.; Khanna, P.; Yokota, H.: A semantics and image retrieval system for hierarchical image databases (2016) 0.31
    0.31090304 = sum of:
      0.31090304 = product of:
        1.110368 = sum of:
          0.036728214 = weight(abstract_txt:learn in 4649) [ClassicSimilarity], result of:
            0.036728214 = score(doc=4649,freq=1.0), product of:
              0.0745175 = queryWeight, product of:
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.011811547 = queryNorm
              0.49288037 = fieldWeight in 4649, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.308869 = idf(docFreq=213, maxDocs=43254)
                0.078125 = fieldNorm(doc=4649)
          0.012549954 = weight(abstract_txt:results in 4649) [ClassicSimilarity], result of:
            0.012549954 = score(doc=4649,freq=1.0), product of:
              0.04588772 = queryWeight, product of:
                1.1097728 = boost
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.011811547 = queryNorm
              0.27349263 = fieldWeight in 4649, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.078125 = fieldNorm(doc=4649)
          0.03252766 = weight(abstract_txt:document in 4649) [ClassicSimilarity], result of:
            0.03252766 = score(doc=4649,freq=2.0), product of:
              0.06872166 = queryWeight, product of:
                1.3581029 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.011811547 = queryNorm
              0.47332472 = fieldWeight in 4649, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.078125 = fieldNorm(doc=4649)
          0.15185952 = weight(abstract_txt:summaries in 4649) [ClassicSimilarity], result of:
            0.15185952 = score(doc=4649,freq=1.0), product of:
              0.27685997 = queryWeight, product of:
                3.3385782 = boost
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.011811547 = queryNorm
              0.5485066 = fieldWeight in 4649, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0208845 = idf(docFreq=104, maxDocs=43254)
                0.078125 = fieldNorm(doc=4649)
          0.30145648 = weight(abstract_txt:summarization in 4649) [ClassicSimilarity], result of:
            0.30145648 = score(doc=4649,freq=2.0), product of:
              0.38201886 = queryWeight, product of:
                4.528384 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.011811547 = queryNorm
              0.7891141 = fieldWeight in 4649, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.078125 = fieldNorm(doc=4649)
          0.3087746 = weight(abstract_txt:multi in 4649) [ClassicSimilarity], result of:
            0.3087746 = score(doc=4649,freq=2.0), product of:
              0.4677805 = queryWeight, product of:
                6.6288915 = boost
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.011811547 = queryNorm
              0.66008437 = fieldWeight in 4649, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.078125 = fieldNorm(doc=4649)
          0.26647165 = weight(abstract_txt:word in 4649) [ClassicSimilarity], result of:
            0.26647165 = score(doc=4649,freq=2.0), product of:
              0.443315 = queryWeight, product of:
                6.8987765 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.011811547 = queryNorm
              0.60108876 = fieldWeight in 4649, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.078125 = fieldNorm(doc=4649)
        0.28 = coord(7/25)
    
  5. Zajic, D.M.; Dorr, B.J.; Lin, J.: Single-document and multi-document summarization techniques for email threads using sentence compression (2008) 0.31
    0.30765384 = sum of:
      0.30765384 = product of:
        0.9614183 = sum of:
          0.009656839 = weight(abstract_txt:these in 4106) [ClassicSimilarity], result of:
            0.009656839 = score(doc=4106,freq=1.0), product of:
              0.038532313 = queryWeight, product of:
                1.0169472 = boost
                3.2078931 = idf(docFreq=4754, maxDocs=43254)
                0.011811547 = queryNorm
              0.25061664 = fieldWeight in 4106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2078931 = idf(docFreq=4754, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
          0.049427383 = weight(abstract_txt:applies in 4106) [ClassicSimilarity], result of:
            0.049427383 = score(doc=4106,freq=1.0), product of:
              0.09083157 = queryWeight, product of:
                1.1040514 = boost
                6.965315 = idf(docFreq=110, maxDocs=43254)
                0.011811547 = queryNorm
              0.54416525 = fieldWeight in 4106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.965315 = idf(docFreq=110, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
          0.012549954 = weight(abstract_txt:results in 4106) [ClassicSimilarity], result of:
            0.012549954 = score(doc=4106,freq=1.0), product of:
              0.04588772 = queryWeight, product of:
                1.1097728 = boost
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.011811547 = queryNorm
              0.27349263 = fieldWeight in 4106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5007057 = idf(docFreq=3547, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
          0.009433562 = weight(abstract_txt:from in 4106) [ClassicSimilarity], result of:
            0.009433562 = score(doc=4106,freq=1.0), product of:
              0.043425947 = queryWeight, product of:
                1.3222274 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.011811547 = queryNorm
              0.2172333 = fieldWeight in 4106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
          0.03252766 = weight(abstract_txt:document in 4106) [ClassicSimilarity], result of:
            0.03252766 = score(doc=4106,freq=2.0), product of:
              0.06872166 = queryWeight, product of:
                1.3581029 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.011811547 = queryNorm
              0.47332472 = fieldWeight in 4106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
          0.065512836 = weight(abstract_txt:generate in 4106) [ClassicSimilarity], result of:
            0.065512836 = score(doc=4106,freq=1.0), product of:
              0.13808696 = queryWeight, product of:
                1.9251394 = boost
                6.0727262 = idf(docFreq=270, maxDocs=43254)
                0.011811547 = queryNorm
              0.47443175 = fieldWeight in 4106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0727262 = idf(docFreq=270, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
          0.5639734 = weight(abstract_txt:summarization in 4106) [ClassicSimilarity], result of:
            0.5639734 = score(doc=4106,freq=7.0), product of:
              0.38201886 = queryWeight, product of:
                4.528384 = boost
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.011811547 = queryNorm
              1.4762973 = fieldWeight in 4106, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1422453 = idf(docFreq=92, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
          0.2183366 = weight(abstract_txt:multi in 4106) [ClassicSimilarity], result of:
            0.2183366 = score(doc=4106,freq=1.0), product of:
              0.4677805 = queryWeight, product of:
                6.6288915 = boost
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.011811547 = queryNorm
              0.46675012 = fieldWeight in 4106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9744015 = idf(docFreq=298, maxDocs=43254)
                0.078125 = fieldNorm(doc=4106)
        0.32 = coord(8/25)
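
The score breakdowns above follow the explain format of Lucene's ClassicSimilarity (TF-IDF). As a minimal sketch of how the numbers fit together, the Python below recomputes the score of entry 1 (doc=4437) from the frequencies, boosts, queryNorm, fieldNorm and coord factor listed there. The function and variable names are hypothetical, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption inferred from the listed idf values, not taken from the catalog itself. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math

def classic_tf(freq):
    # tf(freq) as shown in the breakdown: square root of the term frequency
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces e.g. idf(docFreq=213, maxDocs=43254) ~ 6.308869
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, max_docs, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm             # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight ..., product of"
    return query_weight * field_weight

def document_score(terms, matched, total, max_docs, query_norm, field_norm):
    # Sum of the per-term scores, scaled by coord(matched/total)
    raw = sum(
        term_score(freq, doc_freq, boost, max_docs, query_norm, field_norm)
        for (_term, freq, doc_freq, boost) in terms
    )
    return raw * (matched / total)

# Values copied from the breakdown for doc=4437
MAX_DOCS = 43254
QUERY_NORM = 0.011811547
FIELD_NORM = 0.078125

# (term, termFreq, docFreq, boost); "learn" has no boost line, so 1.0 is assumed
TERMS_4437 = [
    ("learn", 1.0, 213, 1.0),
    ("results", 1.0, 3547, 1.1097728),
    ("document", 2.0, 1620, 1.3581029),
    ("summaries", 1.0, 104, 3.3385782),
    ("summarization", 2.0, 92, 4.528384),
    ("multi", 2.0, 298, 6.6288915),
    ("word", 2.0, 509, 6.8987765),
]

score_4437 = document_score(TERMS_4437, 7, 25, MAX_DOCS, QUERY_NORM, FIELD_NORM)
print(f"doc 4437: {score_4437:.6f}")
```

Under these assumptions the result should agree with the listed 0.31090304 up to floating-point rounding. The same functions can be applied to entry 5 (doc=4106) by substituting its eight terms and coord(8/25).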