Document (#36667)

Author
Pirkola, A.
Title
Constructing topic-specific search keyphrase suggestion tools for Web information retrieval
Source
Information und Wissen: global, sozial und frei? Proceedings des 12. Internationalen Symposiums für Informationswissenschaft (ISI 2011) ; Hildesheim, 9. - 11. März 2011. Eds.: J. Griesbaum, T. Mandl and C. Womser-Hacker
Imprint
Boizenburg : VWH, Verl. W. Hülsbusch
Year
2010
Pages
pp. 172-183
Series
Schriften zur Informationswissenschaft; Vol. 58
Abstract
We devised a method to extract keyphrases from Web pages to construct a keyphrase list for a specific topic. The keyphrases are identified and out-of-topic phrases removed based on their frequencies in text corpora of various densities of text discussing the topic. The list is intended as a search aid for Web information retrieval, so that the user can browse the list, identify different aspects of the topic, and select from it keyphrases (e.g. find synonymous phrases) for a query. A keyphrase list containing a large set of keyphrases related to climate change was constructed using the proposed method. We argue that there is a need for such keyphrase suggestion tools, because the major Web search engines do not provide users with terminological search aids that help them identify different topic aspects and find synonyms.
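
The abstract only outlines the frequency-based filtering, so the following is a minimal sketch under stated assumptions, not the author's procedure. It assumes two hypothetical corpora (one dense in on-topic text, one sparse), word bigrams as candidate phrases, and an invented ratio threshold; the names topic_keyphrases, min_ratio and min_count are illustrative only.

```python
import re
from collections import Counter


def phrase_counts(corpus):
    """Count word bigrams over all documents (a simplification: bigrams may
    cross sentence boundaries, and longer phrases are ignored)."""
    counts = Counter()
    for doc in corpus:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


def topic_keyphrases(dense_corpus, sparse_corpus, min_ratio=1.5, min_count=2):
    """Keep phrases that are relatively more frequent in the topic-dense corpus.

    min_ratio and min_count are hypothetical thresholds, not values from the paper.
    """
    dense = phrase_counts(dense_corpus)
    sparse = phrase_counts(sparse_corpus)
    dense_total = sum(dense.values()) or 1
    sparse_total = sum(sparse.values()) or 1
    kept = []
    for phrase, count in dense.items():
        if count < min_count:
            continue
        dense_rf = count / dense_total
        # Add-one smoothing avoids division by zero for phrases unseen in the sparse corpus.
        sparse_rf = (sparse[phrase] + 1) / (sparse_total + 1)
        if dense_rf / sparse_rf >= min_ratio:
            kept.append(phrase)
    return sorted(kept)


# Toy data, invented for illustration.
dense_docs = [
    "Climate change drives sea level rise. Read more about climate change.",
    "Sea level rise follows climate change. Read more here.",
]
sparse_docs = [
    "Read more about our shop. Read more here.",
    "Read more about cookies. Read more news here.",
]
print(topic_keyphrases(dense_docs, sparse_docs))
```

On this toy data the navigation phrase "read more" is frequent in both corpora and is filtered out, while the topical phrases survive. Real corpora would need stopword handling and longer phrases.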

Similar documents (author)

  1. Pirkola, A.: Morphological typology of languages for IR (2001) 5.91
    5.9096622 = sum of:
      5.9096622 = weight(author_txt:pirkola in 477) [ClassicSimilarity], result of:
        5.9096622 = fieldWeight in 477, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.625 = fieldNorm(doc=477)
    
  2. Pirkola, A.; Jarvelin, K.: ¬The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 4.73
    4.72773 = sum of:
      4.72773 = weight(author_txt:pirkola in 4157) [ClassicSimilarity], result of:
        4.72773 = fieldWeight in 4157, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.5 = fieldNorm(doc=4157)
    
  3. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 4.73
    4.72773 = sum of:
      4.72773 = weight(author_txt:pirkola in 908) [ClassicSimilarity], result of:
        4.72773 = fieldWeight in 908, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.5 = fieldNorm(doc=908)
    
  4. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 3.55
    3.5457973 = sum of:
      3.5457973 = weight(author_txt:pirkola in 3075) [ClassicSimilarity], result of:
        3.5457973 = fieldWeight in 3075, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.375 = fieldNorm(doc=3075)
    
  5. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 2.95
    2.9548311 = sum of:
      2.9548311 = weight(author_txt:pirkola in 4909) [ClassicSimilarity], result of:
        2.9548311 = fieldWeight in 4909, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.3125 = fieldNorm(doc=4909)
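
The author-match scores above can be recomputed from the values in the explain trees. The sketch below assumes the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the idf of 9.45546 shown; the function names are illustrative, and small differences in the last digits are expected because the original values were computed in single precision.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)."""
    return math.sqrt(term_freq) * classic_idf(doc_freq, max_docs) * field_norm


# fieldNorm values of the five author matches listed above.
for norm in (0.625, 0.5, 0.375, 0.3125):
    print(norm, round(field_weight(1.0, 8, 42306, norm), 4))
```

Because tf and idf are identical for all five records, the ranking is decided by fieldNorm alone, which is larger for shorter author fields.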
    

Similar documents (content)

  1. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.38
    0.37621158 = sum of:
      0.37621158 = product of:
        1.5675483 = sum of:
          0.0065030376 = weight(abstract_txt:that in 291) [ClassicSimilarity], result of:
            0.0065030376 = score(doc=291,freq=2.0), product of:
              0.030593717 = queryWeight, product of:
                1.0178082 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.012499059 = queryNorm
              0.2125612 = fieldWeight in 291, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=291)
          0.020734621 = weight(abstract_txt:text in 291) [ClassicSimilarity], result of:
            0.020734621 = score(doc=291,freq=2.0), product of:
              0.057896867 = queryWeight, product of:
                1.1432251 = boost
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.012499059 = queryNorm
              0.35813028 = fieldWeight in 291, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.0625 = fieldNorm(doc=291)
          0.021536862 = weight(abstract_txt:search in 291) [ClassicSimilarity], result of:
            0.021536862 = score(doc=291,freq=1.0), product of:
              0.094261125 = queryWeight, product of:
                2.0629349 = boost
                3.6556938 = idf(docFreq=2971, maxDocs=42306)
                0.012499059 = queryNorm
              0.22848086 = fieldWeight in 291, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6556938 = idf(docFreq=2971, maxDocs=42306)
                0.0625 = fieldNorm(doc=291)
          0.105827324 = weight(abstract_txt:phrases in 291) [ClassicSimilarity], result of:
            0.105827324 = score(doc=291,freq=1.0), product of:
              0.2475312 = queryWeight, product of:
                2.8951082 = boost
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.012499059 = queryNorm
              0.42753124 = fieldWeight in 291, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.0625 = fieldNorm(doc=291)
          0.7674683 = weight(abstract_txt:keyphrases in 291) [ClassicSimilarity], result of:
            0.7674683 = score(doc=291,freq=7.0), product of:
              0.4848109 = queryWeight, product of:
                4.051688 = boost
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.012499059 = queryNorm
              1.583026 = fieldWeight in 291, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.0625 = fieldNorm(doc=291)
          0.6454782 = weight(abstract_txt:keyphrase in 291) [ClassicSimilarity], result of:
            0.6454782 = score(doc=291,freq=3.0), product of:
              0.6306063 = queryWeight, product of:
                5.3357854 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.012499059 = queryNorm
              1.0235835 = fieldWeight in 291, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.0625 = fieldNorm(doc=291)
        0.24 = coord(6/25)
    
  2. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.29
    0.2922909 = sum of:
      0.2922909 = product of:
        1.0438961 = sum of:
          0.008128797 = weight(abstract_txt:that in 3872) [ClassicSimilarity], result of:
            0.008128797 = score(doc=3872,freq=2.0), product of:
              0.030593717 = queryWeight, product of:
                1.0178082 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.012499059 = queryNorm
              0.2657015 = fieldWeight in 3872, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=3872)
          0.022442054 = weight(abstract_txt:specific in 3872) [ClassicSimilarity], result of:
            0.022442054 = score(doc=3872,freq=1.0), product of:
              0.066267826 = queryWeight, product of:
                1.223082 = boost
                4.334808 = idf(docFreq=1506, maxDocs=42306)
                0.012499059 = queryNorm
              0.33865687 = fieldWeight in 3872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.334808 = idf(docFreq=1506, maxDocs=42306)
                0.078125 = fieldNorm(doc=3872)
          0.025691506 = weight(abstract_txt:method in 3872) [ClassicSimilarity], result of:
            0.025691506 = score(doc=3872,freq=1.0), product of:
              0.07251937 = queryWeight, product of:
                1.2794732 = boost
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.012499059 = queryNorm
              0.35427094 = fieldWeight in 3872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.078125 = fieldNorm(doc=3872)
          0.026921079 = weight(abstract_txt:search in 3872) [ClassicSimilarity], result of:
            0.026921079 = score(doc=3872,freq=1.0), product of:
              0.094261125 = queryWeight, product of:
                2.0629349 = boost
                3.6556938 = idf(docFreq=2971, maxDocs=42306)
                0.012499059 = queryNorm
              0.28560108 = fieldWeight in 3872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6556938 = idf(docFreq=2971, maxDocs=42306)
                0.078125 = fieldNorm(doc=3872)
          0.13228415 = weight(abstract_txt:phrases in 3872) [ClassicSimilarity], result of:
            0.13228415 = score(doc=3872,freq=1.0), product of:
              0.2475312 = queryWeight, product of:
                2.8951082 = boost
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.012499059 = queryNorm
              0.53441405 = fieldWeight in 3872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.078125 = fieldNorm(doc=3872)
          0.3625947 = weight(abstract_txt:keyphrases in 3872) [ClassicSimilarity], result of:
            0.3625947 = score(doc=3872,freq=1.0), product of:
              0.4848109 = queryWeight, product of:
                4.051688 = boost
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.012499059 = queryNorm
              0.74790955 = fieldWeight in 3872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.078125 = fieldNorm(doc=3872)
          0.46583378 = weight(abstract_txt:keyphrase in 3872) [ClassicSimilarity], result of:
            0.46583378 = score(doc=3872,freq=1.0), product of:
              0.6306063 = queryWeight, product of:
                5.3357854 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.012499059 = queryNorm
              0.7387078 = fieldWeight in 3872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.078125 = fieldNorm(doc=3872)
        0.28 = coord(7/25)
    
  3. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.27
    0.27091694 = sum of:
      0.27091694 = product of:
        1.3545847 = sum of:
          0.013006075 = weight(abstract_txt:that in 1602) [ClassicSimilarity], result of:
            0.013006075 = score(doc=1602,freq=8.0), product of:
              0.030593717 = queryWeight, product of:
                1.0178082 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.012499059 = queryNorm
              0.4251224 = fieldWeight in 1602, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=1602)
          0.01972093 = weight(abstract_txt:tools in 1602) [ClassicSimilarity], result of:
            0.01972093 = score(doc=1602,freq=1.0), product of:
              0.0705482 = queryWeight, product of:
                1.2619646 = boost
                4.4726143 = idf(docFreq=1312, maxDocs=42306)
                0.012499059 = queryNorm
              0.2795384 = fieldWeight in 1602, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4726143 = idf(docFreq=1312, maxDocs=42306)
                0.0625 = fieldNorm(doc=1602)
          0.027358621 = weight(abstract_txt:identify in 1602) [ClassicSimilarity], result of:
            0.027358621 = score(doc=1602,freq=1.0), product of:
              0.08775337 = queryWeight, product of:
                1.4074601 = boost
                4.988275 = idf(docFreq=783, maxDocs=42306)
                0.012499059 = queryNorm
              0.3117672 = fieldWeight in 1602, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.988275 = idf(docFreq=783, maxDocs=42306)
                0.0625 = fieldNorm(doc=1602)
          0.7674683 = weight(abstract_txt:keyphrases in 1602) [ClassicSimilarity], result of:
            0.7674683 = score(doc=1602,freq=7.0), product of:
              0.4848109 = queryWeight, product of:
                4.051688 = boost
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.012499059 = queryNorm
              1.583026 = fieldWeight in 1602, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.0625 = fieldNorm(doc=1602)
          0.52703077 = weight(abstract_txt:keyphrase in 1602) [ClassicSimilarity], result of:
            0.52703077 = score(doc=1602,freq=2.0), product of:
              0.6306063 = queryWeight, product of:
                5.3357854 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.012499059 = queryNorm
              0.8357524 = fieldWeight in 1602, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.0625 = fieldNorm(doc=1602)
        0.2 = coord(5/25)
    
  4. Martín-Moncunill, D.; García-Barriocanal, E.; Sicilia, M.-A.; Sánchez-Alonso, S.: Evaluating the practical applicability of thesaurus-based keyphrase extraction in the agricultural domain : insights from the VOA3R project (2015) 0.20
    0.20058711 = sum of:
      0.20058711 = product of:
        1.2536695 = sum of:
          0.0065030376 = weight(abstract_txt:that in 4107) [ClassicSimilarity], result of:
            0.0065030376 = score(doc=4107,freq=2.0), product of:
              0.030593717 = queryWeight, product of:
                1.0178082 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.012499059 = queryNorm
              0.2125612 = fieldWeight in 4107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=4107)
          0.021536862 = weight(abstract_txt:search in 4107) [ClassicSimilarity], result of:
            0.021536862 = score(doc=4107,freq=1.0), product of:
              0.094261125 = queryWeight, product of:
                2.0629349 = boost
                3.6556938 = idf(docFreq=2971, maxDocs=42306)
                0.012499059 = queryNorm
              0.22848086 = fieldWeight in 4107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6556938 = idf(docFreq=2971, maxDocs=42306)
                0.0625 = fieldNorm(doc=4107)
          0.5801515 = weight(abstract_txt:keyphrases in 4107) [ClassicSimilarity], result of:
            0.5801515 = score(doc=4107,freq=4.0), product of:
              0.4848109 = queryWeight, product of:
                4.051688 = boost
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.012499059 = queryNorm
              1.1966553 = fieldWeight in 4107, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.0625 = fieldNorm(doc=4107)
          0.6454782 = weight(abstract_txt:keyphrase in 4107) [ClassicSimilarity], result of:
            0.6454782 = score(doc=4107,freq=3.0), product of:
              0.6306063 = queryWeight, product of:
                5.3357854 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.012499059 = queryNorm
              1.0235835 = fieldWeight in 4107, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.0625 = fieldNorm(doc=4107)
        0.16 = coord(4/25)
    
  5. Martinez-Romo, J.; Araujo, L.; Fernandez, A.D.: SemGraph : extracting keyphrases following a novel semantic graph-based approach (2016) 0.17
    0.1659536 = sum of:
      0.1659536 = product of:
        0.829768 = sum of:
          0.008128797 = weight(abstract_txt:that in 4833) [ClassicSimilarity], result of:
            0.008128797 = score(doc=4833,freq=2.0), product of:
              0.030593717 = queryWeight, product of:
                1.0178082 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.012499059 = queryNorm
              0.2657015 = fieldWeight in 4833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=4833)
          0.025918275 = weight(abstract_txt:text in 4833) [ClassicSimilarity], result of:
            0.025918275 = score(doc=4833,freq=2.0), product of:
              0.057896867 = queryWeight, product of:
                1.1432251 = boost
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.012499059 = queryNorm
              0.44766283 = fieldWeight in 4833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.078125 = fieldNorm(doc=4833)
          0.03633327 = weight(abstract_txt:method in 4833) [ClassicSimilarity], result of:
            0.03633327 = score(doc=4833,freq=2.0), product of:
              0.07251937 = queryWeight, product of:
                1.2794732 = boost
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.012499059 = queryNorm
              0.5010147 = fieldWeight in 4833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.078125 = fieldNorm(doc=4833)
          0.034198277 = weight(abstract_txt:identify in 4833) [ClassicSimilarity], result of:
            0.034198277 = score(doc=4833,freq=1.0), product of:
              0.08775337 = queryWeight, product of:
                1.4074601 = boost
                4.988275 = idf(docFreq=783, maxDocs=42306)
                0.012499059 = queryNorm
              0.389709 = fieldWeight in 4833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.988275 = idf(docFreq=783, maxDocs=42306)
                0.078125 = fieldNorm(doc=4833)
          0.7251894 = weight(abstract_txt:keyphrases in 4833) [ClassicSimilarity], result of:
            0.7251894 = score(doc=4833,freq=4.0), product of:
              0.4848109 = queryWeight, product of:
                4.051688 = boost
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.012499059 = queryNorm
              1.4958191 = fieldWeight in 4833, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.573242 = idf(docFreq=7, maxDocs=42306)
                0.078125 = fieldNorm(doc=4833)
        0.2 = coord(5/25)
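
The content-similarity scores follow the same pattern throughout: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matched terms / query terms). As a sketch under the same assumed formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), the following recomputes the score of entry 4 (doc 4107) from the boost, docFreq and termFreq values listed in its tree; function and variable names are illustrative.

```python
import math

MAX_DOCS = 42306          # maxDocs from the explain output
QUERY_NORM = 0.012499059  # queryNorm from the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, term_freq, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(term_freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum of term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(b, df, tf, field_norm) for b, df, tf in terms)
    return total * (matched / query_terms)


# (boost, docFreq, termFreq) for the four terms matched in doc 4107.
terms_4107 = [
    (1.0178082, 10381, 2.0),  # that
    (2.0629349, 2971, 1.0),   # search
    (4.051688, 7, 4.0),       # keyphrases
    (5.3357854, 8, 3.0),      # keyphrase
]
score_4107 = document_score(terms_4107, field_norm=0.0625, matched=4, query_terms=25)
print(round(score_4107, 4))
```

The result should agree with the listed 0.20058711 to within rounding. The two rare terms "keyphrases" and "keyphrase" dominate the sum, while the frequent word "that" contributes almost nothing.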