Document (#37564)

Author
Huo, W.
Title
Automatic multi-word term extraction and its application to Web-page summarization
Imprint
Guelph, Ontario : University of Guelph
Year
2012
Pages
vii, 104 p.
Abstract
In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
Content
A thesis presented to the University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
Theme
Computational linguistics

Similar documents (content)

  1. Xiong, S.; Ji, D.: Query-focused multi-document summarization using hypergraph-based ranking (2016) 0.31
    0.3107999 = sum of:
      0.3107999 = product of:
        1.1099997 = sum of:
          0.03643399 = weight(abstract_txt:learn in 2972) [ClassicSimilarity], result of:
            0.03643399 = score(doc=2972,freq=1.0), product of:
              0.07425106 = queryWeight, product of:
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.011821936 = queryNorm
              0.49068648 = fieldWeight in 2972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.078125 = fieldNorm(doc=2972)
          0.012420493 = weight(abstract_txt:results in 2972) [ClassicSimilarity], result of:
            0.012420493 = score(doc=2972,freq=1.0), product of:
              0.045652796 = queryWeight, product of:
                1.1089127 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.011821936 = queryNorm
              0.27206424 = fieldWeight in 2972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=2972)
          0.03289829 = weight(abstract_txt:document in 2972) [ClassicSimilarity], result of:
            0.03289829 = score(doc=2972,freq=2.0), product of:
              0.06936606 = queryWeight, product of:
                1.3669014 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.011821936 = queryNorm
              0.4742707 = fieldWeight in 2972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2972)
          0.15349375 = weight(abstract_txt:summaries in 2972) [ClassicSimilarity], result of:
            0.15349375 = score(doc=2972,freq=1.0), product of:
              0.27933952 = queryWeight, product of:
                3.3595066 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011821936 = queryNorm
              0.5494881 = fieldWeight in 2972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=2972)
          0.30183664 = weight(abstract_txt:summarization in 2972) [ClassicSimilarity], result of:
            0.30183664 = score(doc=2972,freq=2.0), product of:
              0.38302118 = queryWeight, product of:
                4.542449 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011821936 = queryNorm
              0.78804165 = fieldWeight in 2972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.078125 = fieldNorm(doc=2972)
          0.30576164 = weight(abstract_txt:multi in 2972) [ClassicSimilarity], result of:
            0.30576164 = score(doc=2972,freq=2.0), product of:
              0.46556053 = queryWeight, product of:
                6.624998 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.011821936 = queryNorm
              0.6567602 = fieldWeight in 2972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.078125 = fieldNorm(doc=2972)
          0.26715487 = weight(abstract_txt:word in 2972) [ClassicSimilarity], result of:
            0.26715487 = score(doc=2972,freq=2.0), product of:
              0.44486362 = queryWeight, product of:
                6.9232035 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.011821936 = queryNorm
              0.60053205 = fieldWeight in 2972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=2972)
        0.28 = coord(7/25)
    
  2. Chang, Y.-W.: Influence of human behavior and the principle of least effort on library and information science research (2016) 0.31
    0.3107999 = sum of:
      0.3107999 = product of:
        1.1099997 = sum of:
          0.03643399 = weight(abstract_txt:learn in 2973) [ClassicSimilarity], result of:
            0.03643399 = score(doc=2973,freq=1.0), product of:
              0.07425106 = queryWeight, product of:
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.011821936 = queryNorm
              0.49068648 = fieldWeight in 2973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.078125 = fieldNorm(doc=2973)
          0.012420493 = weight(abstract_txt:results in 2973) [ClassicSimilarity], result of:
            0.012420493 = score(doc=2973,freq=1.0), product of:
              0.045652796 = queryWeight, product of:
                1.1089127 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.011821936 = queryNorm
              0.27206424 = fieldWeight in 2973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=2973)
          0.03289829 = weight(abstract_txt:document in 2973) [ClassicSimilarity], result of:
            0.03289829 = score(doc=2973,freq=2.0), product of:
              0.06936606 = queryWeight, product of:
                1.3669014 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.011821936 = queryNorm
              0.4742707 = fieldWeight in 2973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2973)
          0.15349375 = weight(abstract_txt:summaries in 2973) [ClassicSimilarity], result of:
            0.15349375 = score(doc=2973,freq=1.0), product of:
              0.27933952 = queryWeight, product of:
                3.3595066 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011821936 = queryNorm
              0.5494881 = fieldWeight in 2973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=2973)
          0.30183664 = weight(abstract_txt:summarization in 2973) [ClassicSimilarity], result of:
            0.30183664 = score(doc=2973,freq=2.0), product of:
              0.38302118 = queryWeight, product of:
                4.542449 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011821936 = queryNorm
              0.78804165 = fieldWeight in 2973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.078125 = fieldNorm(doc=2973)
          0.30576164 = weight(abstract_txt:multi in 2973) [ClassicSimilarity], result of:
            0.30576164 = score(doc=2973,freq=2.0), product of:
              0.46556053 = queryWeight, product of:
                6.624998 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.011821936 = queryNorm
              0.6567602 = fieldWeight in 2973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.078125 = fieldNorm(doc=2973)
          0.26715487 = weight(abstract_txt:word in 2973) [ClassicSimilarity], result of:
            0.26715487 = score(doc=2973,freq=2.0), product of:
              0.44486362 = queryWeight, product of:
                6.9232035 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.011821936 = queryNorm
              0.60053205 = fieldWeight in 2973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=2973)
        0.28 = coord(7/25)
    
  3. Vilares, J.; Alonso, M.A.; Doval, Y.; Vilares, M.: Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval (2016) 0.31
    0.3107999 = sum of:
      0.3107999 = product of:
        1.1099997 = sum of:
          0.03643399 = weight(abstract_txt:learn in 2974) [ClassicSimilarity], result of:
            0.03643399 = score(doc=2974,freq=1.0), product of:
              0.07425106 = queryWeight, product of:
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.011821936 = queryNorm
              0.49068648 = fieldWeight in 2974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.078125 = fieldNorm(doc=2974)
          0.012420493 = weight(abstract_txt:results in 2974) [ClassicSimilarity], result of:
            0.012420493 = score(doc=2974,freq=1.0), product of:
              0.045652796 = queryWeight, product of:
                1.1089127 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.011821936 = queryNorm
              0.27206424 = fieldWeight in 2974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=2974)
          0.03289829 = weight(abstract_txt:document in 2974) [ClassicSimilarity], result of:
            0.03289829 = score(doc=2974,freq=2.0), product of:
              0.06936606 = queryWeight, product of:
                1.3669014 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.011821936 = queryNorm
              0.4742707 = fieldWeight in 2974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2974)
          0.15349375 = weight(abstract_txt:summaries in 2974) [ClassicSimilarity], result of:
            0.15349375 = score(doc=2974,freq=1.0), product of:
              0.27933952 = queryWeight, product of:
                3.3595066 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011821936 = queryNorm
              0.5494881 = fieldWeight in 2974, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=2974)
          0.30183664 = weight(abstract_txt:summarization in 2974) [ClassicSimilarity], result of:
            0.30183664 = score(doc=2974,freq=2.0), product of:
              0.38302118 = queryWeight, product of:
                4.542449 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011821936 = queryNorm
              0.78804165 = fieldWeight in 2974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.078125 = fieldNorm(doc=2974)
          0.30576164 = weight(abstract_txt:multi in 2974) [ClassicSimilarity], result of:
            0.30576164 = score(doc=2974,freq=2.0), product of:
              0.46556053 = queryWeight, product of:
                6.624998 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.011821936 = queryNorm
              0.6567602 = fieldWeight in 2974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.078125 = fieldNorm(doc=2974)
          0.26715487 = weight(abstract_txt:word in 2974) [ClassicSimilarity], result of:
            0.26715487 = score(doc=2974,freq=2.0), product of:
              0.44486362 = queryWeight, product of:
                6.9232035 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.011821936 = queryNorm
              0.60053205 = fieldWeight in 2974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=2974)
        0.28 = coord(7/25)
    
  4. Pandey, S.; Khanna, P.; Yokota, H.: A semantics and image retrieval system for hierarchical image databases (2016) 0.31
    0.3107999 = sum of:
      0.3107999 = product of:
        1.1099997 = sum of:
          0.03643399 = weight(abstract_txt:learn in 3184) [ClassicSimilarity], result of:
            0.03643399 = score(doc=3184,freq=1.0), product of:
              0.07425106 = queryWeight, product of:
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.011821936 = queryNorm
              0.49068648 = fieldWeight in 3184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.078125 = fieldNorm(doc=3184)
          0.012420493 = weight(abstract_txt:results in 3184) [ClassicSimilarity], result of:
            0.012420493 = score(doc=3184,freq=1.0), product of:
              0.045652796 = queryWeight, product of:
                1.1089127 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.011821936 = queryNorm
              0.27206424 = fieldWeight in 3184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=3184)
          0.03289829 = weight(abstract_txt:document in 3184) [ClassicSimilarity], result of:
            0.03289829 = score(doc=3184,freq=2.0), product of:
              0.06936606 = queryWeight, product of:
                1.3669014 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.011821936 = queryNorm
              0.4742707 = fieldWeight in 3184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=3184)
          0.15349375 = weight(abstract_txt:summaries in 3184) [ClassicSimilarity], result of:
            0.15349375 = score(doc=3184,freq=1.0), product of:
              0.27933952 = queryWeight, product of:
                3.3595066 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011821936 = queryNorm
              0.5494881 = fieldWeight in 3184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=3184)
          0.30183664 = weight(abstract_txt:summarization in 3184) [ClassicSimilarity], result of:
            0.30183664 = score(doc=3184,freq=2.0), product of:
              0.38302118 = queryWeight, product of:
                4.542449 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011821936 = queryNorm
              0.78804165 = fieldWeight in 3184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.078125 = fieldNorm(doc=3184)
          0.30576164 = weight(abstract_txt:multi in 3184) [ClassicSimilarity], result of:
            0.30576164 = score(doc=3184,freq=2.0), product of:
              0.46556053 = queryWeight, product of:
                6.624998 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.011821936 = queryNorm
              0.6567602 = fieldWeight in 3184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.078125 = fieldNorm(doc=3184)
          0.26715487 = weight(abstract_txt:word in 3184) [ClassicSimilarity], result of:
            0.26715487 = score(doc=3184,freq=2.0), product of:
              0.44486362 = queryWeight, product of:
                6.9232035 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.011821936 = queryNorm
              0.60053205 = fieldWeight in 3184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=3184)
        0.28 = coord(7/25)
    
  5. Zajic, D.M.; Dorr, B.J.; Lin, J.: Single-document and multi-document summarization techniques for email threads using sentence compression (2008) 0.31
    0.30674192 = sum of:
      0.30674192 = product of:
        0.9585686 = sum of:
          0.009530208 = weight(abstract_txt:these in 2105) [ClassicSimilarity], result of:
            0.009530208 = score(doc=2105,freq=1.0), product of:
              0.03826277 = queryWeight, product of:
                1.0152006 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.011821936 = queryNorm
              0.24907261 = fieldWeight in 2105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
          0.048681248 = weight(abstract_txt:applies in 2105) [ClassicSimilarity], result of:
            0.048681248 = score(doc=2105,freq=1.0), product of:
              0.09007535 = queryWeight, product of:
                1.1014167 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.011821936 = queryNorm
              0.5404503 = fieldWeight in 2105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
          0.012420493 = weight(abstract_txt:results in 2105) [ClassicSimilarity], result of:
            0.012420493 = score(doc=2105,freq=1.0), product of:
              0.045652796 = queryWeight, product of:
                1.1089127 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.011821936 = queryNorm
              0.27206424 = fieldWeight in 2105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
          0.00931418 = weight(abstract_txt:from in 2105) [ClassicSimilarity], result of:
            0.00931418 = score(doc=2105,freq=1.0), product of:
              0.043135516 = queryWeight, product of:
                1.3201606 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.011821936 = queryNorm
              0.21592833 = fieldWeight in 2105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
          0.03289829 = weight(abstract_txt:document in 2105) [ClassicSimilarity], result of:
            0.03289829 = score(doc=2105,freq=2.0), product of:
              0.06936606 = queryWeight, product of:
                1.3669014 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.011821936 = queryNorm
              0.4742707 = fieldWeight in 2105, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
          0.06483336 = weight(abstract_txt:generate in 2105) [ClassicSimilarity], result of:
            0.06483336 = score(doc=2105,freq=1.0), product of:
              0.13737483 = queryWeight, product of:
                1.923611 = boost
                6.0408955 = idf(docFreq=285, maxDocs=44218)
                0.011821936 = queryNorm
              0.47194496 = fieldWeight in 2105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0408955 = idf(docFreq=285, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
          0.5646847 = weight(abstract_txt:summarization in 2105) [ClassicSimilarity], result of:
            0.5646847 = score(doc=2105,freq=7.0), product of:
              0.38302118 = queryWeight, product of:
                4.542449 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011821936 = queryNorm
              1.474291 = fieldWeight in 2105, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
          0.2162061 = weight(abstract_txt:multi in 2105) [ClassicSimilarity], result of:
            0.2162061 = score(doc=2105,freq=1.0), product of:
              0.46556053 = queryWeight, product of:
                6.624998 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.011821936 = queryNorm
              0.46439958 = fieldWeight in 2105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.078125 = fieldNorm(doc=2105)
        0.32 = coord(8/25)
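
The score trees above follow the classic TF-IDF pattern that Lucene labels ClassicSimilarity: each term weight is queryWeight times fieldWeight, the term weights are summed, and the sum is scaled by the coord factor (matched query terms over total query terms). The following is a minimal Python sketch of that arithmetic, not the Lucene implementation itself. The function names are hypothetical, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are inferred from the values listed in the trees.

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the trees:
    # e.g. docFreq=224, maxDocs=44218 gives roughly 6.2808.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total):
    # Sum of the per-term weights, scaled by coord(matched/total).
    return sum(term_scores) * (matched / total)


if __name__ == "__main__":
    # Term "learn" in doc 2972 (no boost listed in the tree).
    learn = term_score(freq=1.0, doc_freq=224, max_docs=44218,
                       query_norm=0.011821936, field_norm=0.078125)
    print("learn weight:", learn)

    # The seven term weights listed for doc 2972, with coord(7/25).
    weights = [0.03643399, 0.012420493, 0.03289829, 0.15349375,
               0.30183664, 0.30576164, 0.26715487]
    print("document score:", document_score(weights, 7, 25))
```

Run as a script, this should reproduce the listed values for entry 1 up to floating-point rounding; Lucene computes in single precision, so the last digits may differ slightly.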