Document (#13158)

Author
Pirkola, A.
Jarvelin, K.
Title
¬The effect of anaphor and ellipsis resolution on proximity searching in a text database
Source
Information processing and management. 32(1996) no.2, pp. 199-216
Year
1995
Abstract
So far, methods for ellipsis and anaphor resolution have been developed and the effects of anaphor resolution have been analyzed in the context of statistical information retrieval of scientific abstracts. No significant improvements have been observed. Analyzes the effects of ellipsis and anaphor resolution on proximity searching in a full text database. Anaphora and ellipsis are classified on the basis of the type of their correlates / antecedents rather than, as is traditional, on the basis of their own linguistic type. The classification differentiates proper names and common nouns of basic words, compound words, and phrases. The study was carried out in a newspaper article database containing 55,000 full text articles. A set of 154 keyword pairs in different categories was created. Human resolution of keyword ellipsis and anaphora was performed to identify sentences and paragraphs which would match proximity searches after resolution. Findings indicate that ellipsis and anaphor resolution is most relevant for proper name phrases and only marginal in the other keyword categories. Therefore, the recall effect of restricted resolution of proper name phrases only was analyzed for keyword pairs containing at least one proper name phrase. Findings indicate a recall increase of 38.2% in sentence searches, and 28.8% in paragraph searches, when proper name ellipses were resolved. The recall increase was 17.6% in sentence searches, and 19.8% in paragraph searches, when proper name anaphora were resolved. Some simple and computationally justifiable resolution method might be developed only for proper name phrases to support keyword based full text information retrieval. Discusses elements of such a method.
Theme
Retrievalstudien
Volltextretrieval
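
The recall effect described in the abstract can be illustrated with a toy sentence-level proximity search. This is only a sketch under stated assumptions, not the method used in the study: the two-sentence text, the keyword pair, the naive sentence splitter, and the helper names `sentences` and `sentence_matches` are all hypothetical, and the "resolved" text is expanded by hand, as the human resolution in the study was.

```python
import re


def sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_matches(text, key_a, key_b):
    """Return the sentences that contain both search keys (case-insensitive)."""
    a, b = key_a.lower(), key_b.lower()
    return [s for s in sentences(text) if a in s.lower() and b in s.lower()]


# Hypothetical newspaper-style text with an elliptical proper name phrase.
original = ("The Bank of Finland met on Monday. "
            "The Bank decided to cut interest rates.")

# Hand-resolved copy: the elliptical 'The Bank' is expanded to the full phrase.
resolved = ("The Bank of Finland met on Monday. "
            "The Bank of Finland decided to cut interest rates.")

before = sentence_matches(original, "Bank of Finland", "interest rates")
after = sentence_matches(resolved, "Bank of Finland", "interest rates")

print(len(before), len(after))
```

In the unresolved text no single sentence contains both keys, so the sentence-level search misses the passage; after resolution the second sentence matches. The same check applied to whole paragraphs would already match here, which fits the abstract's finding that the gain from ellipsis resolution was larger for sentence searches than for paragraph searches.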

Similar documents (author)

  1. Pirkola, A.: Morphological typology of languages for IR (2001) 5.91
    5.9096622 = sum of:
      5.9096622 = weight(author_txt:pirkola in 477) [ClassicSimilarity], result of:
        5.9096622 = fieldWeight in 477, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.625 = fieldNorm(doc=477)
    
  2. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 5.91
    5.9096622 = sum of:
      5.9096622 = weight(author_txt:pirkola in 1666) [ClassicSimilarity], result of:
        5.9096622 = fieldWeight in 1666, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.625 = fieldNorm(doc=1666)
    
  3. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 4.73
    4.72773 = sum of:
      4.72773 = weight(author_txt:pirkola in 908) [ClassicSimilarity], result of:
        4.72773 = fieldWeight in 908, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.5 = fieldNorm(doc=908)
    
  4. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 3.55
    3.5457973 = sum of:
      3.5457973 = weight(author_txt:pirkola in 3075) [ClassicSimilarity], result of:
        3.5457973 = fieldWeight in 3075, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.375 = fieldNorm(doc=3075)
    
  5. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 2.95
    2.9548311 = sum of:
      2.9548311 = weight(author_txt:pirkola in 4909) [ClassicSimilarity], result of:
        2.9548311 = fieldWeight in 4909, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.45546 = idf(docFreq=8, maxDocs=42306)
          0.3125 = fieldNorm(doc=4909)
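
The five author scores above follow Lucene's ClassicSimilarity pattern: fieldWeight = tf × idf × fieldNorm, with tf = sqrt(termFreq). The sketch below recomputes them in plain Python. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption about the Lucene version in use; it reproduces the listed idf of 9.45546. The function names are illustrative and are not part of any Lucene API.

```python
import math


def classic_tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Assumed idf variant; matches the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# fieldNorm values copied from the five author matches, in listing order.
field_norms = [0.625, 0.5, 0.375, 0.3125]
author_scores = [field_weight(1.0, 8, 42306, n) for n in field_norms]

for norm, score in zip(field_norms, author_scores):
    print(f"fieldNorm={norm}: score={score:.4f}")
```

Since tf and idf are identical for every match on "pirkola", the ranking is decided by fieldNorm alone, which is smaller for records with more tokens in the author field. This is why single-author records rank above multi-author ones.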
    

Similar documents (content)

  1. Steinberger, J.; Poesio, M.; Kabadjov, M.A.; Jezek, K.: Two uses of anaphora resolution in summarization (2007) 0.34
    0.34180778 = sum of:
      0.34180778 = product of:
        2.1362987 = sum of:
          0.06255712 = weight(abstract_txt:sentence in 2950) [ClassicSimilarity], result of:
            0.06255712 = score(doc=2950,freq=1.0), product of:
              0.11649437 = queryWeight, product of:
                1.282114 = boost
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.0132189365 = queryNorm
              0.53699696 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.078125 = fieldNorm(doc=2950)
          1.3298936 = weight(title_txt:anaphora in 2950) [ClassicSimilarity], result of:
            1.3298936 = score(doc=2950,freq=1.0), product of:
              0.35963997 = queryWeight, product of:
                2.7590134 = boost
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.0132189365 = queryNorm
              3.697847 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.375 = fieldNorm(doc=2950)
          0.18917967 = weight(abstract_txt:proper in 2950) [ClassicSimilarity], result of:
            0.18917967 = score(doc=2950,freq=1.0), product of:
              0.3698788 = queryWeight, product of:
                4.2740335 = boost
                6.5467386 = idf(docFreq=164, maxDocs=42306)
                0.0132189365 = queryNorm
              0.51146394 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5467386 = idf(docFreq=164, maxDocs=42306)
                0.078125 = fieldNorm(doc=2950)
          0.5546683 = weight(abstract_txt:resolution in 2950) [ClassicSimilarity], result of:
            0.5546683 = score(doc=2950,freq=3.0), product of:
              0.57126784 = queryWeight, product of:
                6.022826 = boost
                7.1753473 = idf(docFreq=87, maxDocs=42306)
                0.0132189365 = queryNorm
              0.9709427 = fieldWeight in 2950, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1753473 = idf(docFreq=87, maxDocs=42306)
                0.078125 = fieldNorm(doc=2950)
        0.16 = coord(4/25)
    
  2. Colavizza, G.; Boyack, K.W.; Eck, N.J. van; Waltman, L.: ¬The closer the better : similarity of publication pairs at different cocitation levels (2018) 0.17
    0.17076357 = sum of:
      0.17076357 = product of:
        0.6098699 = sum of:
          0.030843014 = weight(abstract_txt:indicate in 1133) [ClassicSimilarity], result of:
            0.030843014 = score(doc=1133,freq=1.0), product of:
              0.072704 = queryWeight, product of:
                1.0128691 = boost
                5.430108 = idf(docFreq=503, maxDocs=42306)
                0.0132189365 = queryNorm
              0.42422718 = fieldWeight in 1133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.430108 = idf(docFreq=503, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.034904923 = weight(abstract_txt:increase in 1133) [ClassicSimilarity], result of:
            0.034904923 = score(doc=1133,freq=1.0), product of:
              0.07895475 = queryWeight, product of:
                1.0555123 = boost
                5.658723 = idf(docFreq=400, maxDocs=42306)
                0.0132189365 = queryNorm
              0.4420877 = fieldWeight in 1133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.658723 = idf(docFreq=400, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.061879795 = weight(abstract_txt:pairs in 1133) [ClassicSimilarity], result of:
            0.061879795 = score(doc=1133,freq=1.0), product of:
              0.115651965 = queryWeight, product of:
                1.27747 = boost
                6.8486633 = idf(docFreq=121, maxDocs=42306)
                0.0132189365 = queryNorm
              0.5350518 = fieldWeight in 1133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8486633 = idf(docFreq=121, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.08846913 = weight(abstract_txt:sentence in 1133) [ClassicSimilarity], result of:
            0.08846913 = score(doc=1133,freq=2.0), product of:
              0.11649437 = queryWeight, product of:
                1.282114 = boost
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.0132189365 = queryNorm
              0.7594284 = fieldWeight in 1133, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.873561 = idf(docFreq=118, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.025627103 = weight(abstract_txt:text in 1133) [ClassicSimilarity], result of:
            0.025627103 = score(doc=1133,freq=1.0), product of:
              0.08095869 = queryWeight, product of:
                1.5115443 = boost
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.0132189365 = queryNorm
              0.31654543 = fieldWeight in 1133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.25708947 = weight(abstract_txt:paragraph in 1133) [ClassicSimilarity], result of:
            0.25708947 = score(doc=1133,freq=3.0), product of:
              0.20723808 = queryWeight, product of:
                1.7100506 = boost
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.0132189365 = queryNorm
              1.2405514 = fieldWeight in 1133, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
          0.11105644 = weight(abstract_txt:proximity in 1133) [ClassicSimilarity], result of:
            0.11105644 = score(doc=1133,freq=1.0), product of:
              0.19551498 = queryWeight, product of:
                2.0342758 = boost
                7.2706575 = idf(docFreq=79, maxDocs=42306)
                0.0132189365 = queryNorm
              0.5680201 = fieldWeight in 1133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2706575 = idf(docFreq=79, maxDocs=42306)
                0.078125 = fieldNorm(doc=1133)
        0.28 = coord(7/25)
    
  3. Boyack, K.W.; Small, H.; Klavans, R.: Improving the accuracy of co-citation clustering using full text (2013) 0.15
    0.1505268 = sum of:
      0.1505268 = product of:
        0.47039628 = sum of:
          0.013985297 = weight(abstract_txt:been in 3037) [ClassicSimilarity], result of:
            0.013985297 = score(doc=3037,freq=1.0), product of:
              0.04912079 = queryWeight, product of:
                1.0196532 = boost
                3.6443186 = idf(docFreq=3005, maxDocs=42306)
                0.0132189365 = queryNorm
              0.28471237 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6443186 = idf(docFreq=3005, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
          0.031790778 = weight(abstract_txt:effect in 3037) [ClassicSimilarity], result of:
            0.031790778 = score(doc=3037,freq=1.0), product of:
              0.07418587 = queryWeight, product of:
                1.0231394 = boost
                5.4851675 = idf(docFreq=476, maxDocs=42306)
                0.0132189365 = queryNorm
              0.42852873 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4851675 = idf(docFreq=476, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
          0.034904923 = weight(abstract_txt:increase in 3037) [ClassicSimilarity], result of:
            0.034904923 = score(doc=3037,freq=1.0), product of:
              0.07895475 = queryWeight, product of:
                1.0555123 = boost
                5.658723 = idf(docFreq=400, maxDocs=42306)
                0.0132189365 = queryNorm
              0.4420877 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.658723 = idf(docFreq=400, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
          0.031729657 = weight(abstract_txt:only in 3037) [ClassicSimilarity], result of:
            0.031729657 = score(doc=3037,freq=2.0), product of:
              0.06731592 = queryWeight, product of:
                1.1936547 = boost
                4.2662134 = idf(docFreq=1613, maxDocs=42306)
                0.0132189365 = queryNorm
              0.47135442 = fieldWeight in 3037, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2662134 = idf(docFreq=1613, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
          0.061879795 = weight(abstract_txt:pairs in 3037) [ClassicSimilarity], result of:
            0.061879795 = score(doc=3037,freq=1.0), product of:
              0.115651965 = queryWeight, product of:
                1.27747 = boost
                6.8486633 = idf(docFreq=121, maxDocs=42306)
                0.0132189365 = queryNorm
              0.5350518 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8486633 = idf(docFreq=121, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
          0.059363004 = weight(abstract_txt:full in 3037) [ClassicSimilarity], result of:
            0.059363004 = score(doc=3037,freq=3.0), product of:
              0.089286886 = queryWeight, product of:
                1.3747181 = boost
                4.9133477 = idf(docFreq=844, maxDocs=42306)
                0.0132189365 = queryNorm
              0.6648569 = fieldWeight in 3037, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9133477 = idf(docFreq=844, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
          0.044387445 = weight(abstract_txt:text in 3037) [ClassicSimilarity], result of:
            0.044387445 = score(doc=3037,freq=3.0), product of:
              0.08095869 = queryWeight, product of:
                1.5115443 = boost
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.0132189365 = queryNorm
              0.5482728 = fieldWeight in 3037, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
          0.1923554 = weight(abstract_txt:proximity in 3037) [ClassicSimilarity], result of:
            0.1923554 = score(doc=3037,freq=3.0), product of:
              0.19551498 = queryWeight, product of:
                2.0342758 = boost
                7.2706575 = idf(docFreq=79, maxDocs=42306)
                0.0132189365 = queryNorm
              0.9838397 = fieldWeight in 3037, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2706575 = idf(docFreq=79, maxDocs=42306)
                0.078125 = fieldNorm(doc=3037)
        0.32 = coord(8/25)
    
  4. Bonzi, S.: Representation of concepts in text : a comparison of within-document frequency, anaphora, and synonymy (1991) 0.14
    0.13659106 = sum of:
      0.13659106 = product of:
        1.1382589 = sum of:
          0.030752525 = weight(abstract_txt:text in 4933) [ClassicSimilarity], result of:
            0.030752525 = score(doc=4933,freq=1.0), product of:
              0.08095869 = queryWeight, product of:
                1.5115443 = boost
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.0132189365 = queryNorm
              0.37985453 = fieldWeight in 4933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0517817 = idf(docFreq=1999, maxDocs=42306)
                0.09375 = fieldNorm(doc=4933)
          0.88659567 = weight(title_txt:anaphora in 4933) [ClassicSimilarity], result of:
            0.88659567 = score(doc=4933,freq=1.0), product of:
              0.35963997 = queryWeight, product of:
                2.7590134 = boost
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.0132189365 = queryNorm
              2.4652312 = fieldWeight in 4933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.25 = fieldNorm(doc=4933)
          0.22091076 = weight(abstract_txt:keyword in 4933) [ClassicSimilarity], result of:
            0.22091076 = score(doc=4933,freq=3.0), product of:
              0.22512157 = queryWeight, product of:
                2.8180764 = boost
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.0132189365 = queryNorm
              0.98129535 = fieldWeight in 4933, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.0432124 = idf(docFreq=272, maxDocs=42306)
                0.09375 = fieldNorm(doc=4933)
        0.12 = coord(3/25)
    
  5. Wu, D.-S.; Liang, T.: Chinese pronominal anaphora resolution using lexical knowledge and entropy-based weight (2008) 0.11
    0.11440459 = sum of:
      0.11440459 = product of:
        1.4300574 = sum of:
          0.88659567 = weight(title_txt:anaphora in 187) [ClassicSimilarity], result of:
            0.88659567 = score(doc=187,freq=1.0), product of:
              0.35963997 = queryWeight, product of:
                2.7590134 = boost
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.0132189365 = queryNorm
              2.4652312 = fieldWeight in 187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.25 = fieldNorm(doc=187)
          0.54346174 = weight(abstract_txt:resolution in 187) [ClassicSimilarity], result of:
            0.54346174 = score(doc=187,freq=2.0), product of:
              0.57126784 = queryWeight, product of:
                6.022826 = boost
                7.1753473 = idf(docFreq=87, maxDocs=42306)
                0.0132189365 = queryNorm
              0.95132565 = fieldWeight in 187, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1753473 = idf(docFreq=87, maxDocs=42306)
                0.09375 = fieldNorm(doc=187)
        0.08 = coord(2/25)
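
The content-based scores combine several term matches: each term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm, and the sum is multiplied by coord (matched query terms divided by total query terms). The sketch below recomputes the score of result 5 from the values in its explain tree. As before, the idf formula is an assumption that agrees with the listed values, and the function names are illustrative only.

```python
import math

MAX_DOCS = 42306
QUERY_NORM = 0.0132189365  # copied from the listing


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf variant; reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, freq, doc_freq, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM          # query-side factor
    field_weight = math.sqrt(freq) * idf * field_norm  # document-side factor
    return query_weight * field_weight


# Matching terms of result 5: (boost, termFreq, docFreq, fieldNorm).
terms = [
    (2.7590134, 1.0, 5, 0.25),     # title_txt:anaphora
    (6.022826, 2.0, 87, 0.09375),  # abstract_txt:resolution
]

coord = 2 / 25  # two of the 25 query terms matched
total = coord * sum(term_score(*t) for t in terms)
print(round(total, 4))
```

The coord factor explains why result 5 ranks last despite a larger raw sum (1.43) than results 2 to 4: only 2 of 25 query terms match, so the sum is scaled by 0.08.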