Document (#13155)

Author
Pirkola, A.
Jarvelin, K.
Title
The effect of anaphor and ellipsis resolution on proximity searching in a text database
Source
Information processing and management. 32(1996) no.2, pp.199-216
Year
1995
Abstract
So far, methods for ellipsis and anaphor resolution have been developed, and the effects of anaphor resolution have been analyzed, in the context of statistical information retrieval of scientific abstracts. No significant improvements have been observed. Analyzes the effects of ellipsis and anaphor resolution on proximity searching in a full-text database. Anaphora and ellipses are classified on the basis of the type of their correlates/antecedents rather than, as is traditional, on the basis of their own linguistic type. The classification distinguishes proper names and common nouns, each occurring as basic words, compound words, or phrases. The study was carried out in a newspaper article database containing 55,000 full-text articles. A set of 154 keyword pairs in different categories was created. Manual resolution of keyword ellipses and anaphora was performed to identify the sentences and paragraphs that would match proximity searches after resolution. Findings indicate that ellipsis and anaphor resolution is most relevant for proper name phrases and only marginal in the other keyword categories. Therefore, the recall effect of resolution restricted to proper name phrases was analyzed for keyword pairs containing at least one proper name phrase. Findings indicate a recall increase of 38.2% in sentence searches and 28.8% in paragraph searches when proper name ellipses were resolved. The recall increase was 17.6% in sentence searches and 19.8% in paragraph searches when proper name anaphora were resolved. A simple, computationally justifiable resolution method might be developed for proper name phrases only, to support keyword-based full-text information retrieval. Discusses elements of such a method.
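The proximity-search setting described in the abstract can be illustrated with a toy sketch. It counts the sentences in which both keywords of a pair co-occur, first without and then with a naive resolution of proper name ellipsis. The heuristic used here (a later mention of the last word of a proper name phrase already seen in full counts as a mention of the full phrase), the function names and the sample sentences are all hypothetical assumptions made for illustration; they are not the resolution method the authors discuss.

```python
import re
from typing import List, Set, Tuple


def tokens(text: str) -> List[str]:
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def has_phrase(toks: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens in `toks`."""
    n = len(phrase)
    return any(toks[i:i + n] == phrase for i in range(len(toks) - n + 1))


def count_matches(units: List[str], pair: Tuple[str, str],
                  proper_names: Set[str], resolve: bool) -> int:
    """Count text units (sentences or paragraphs) containing both keywords.

    With resolve=True, a unit that contains only the last word of a proper
    name phrase already seen in full in an earlier unit is treated as
    mentioning the full phrase (hypothetical ellipsis heuristic).
    """
    seen: Set[str] = set()
    matches = 0
    for unit in units:
        toks = tokens(unit)
        found = set()
        for kw in pair:
            kw_toks = tokens(kw)
            if has_phrase(toks, kw_toks):
                found.add(kw)
                seen.add(kw)  # full form seen; later short forms may refer to it
            elif (resolve and kw in proper_names and kw in seen
                  and kw_toks[-1] in toks):
                found.add(kw)  # elliptical short form resolved to the full phrase
        if len(found) == len(pair):
            matches += 1
    return matches


def relative_increase(before: int, after: int) -> float:
    """Percentage increase of `after` over `before`."""
    return 100.0 * (after - before) / before


# Hypothetical sample: the second sentence uses the short form "Virtanen".
sentences = [
    "Anna Virtanen met the visiting delegation in Helsinki.",
    "Virtanen later discussed trade with the delegation.",
    "The delegation left on Friday.",
]
pair = ("Anna Virtanen", "delegation")
names = {"Anna Virtanen"}

before = count_matches(sentences, pair, names, resolve=False)
after = count_matches(sentences, pair, names, resolve=True)
print(before, after, relative_increase(before, after))  # 1 2 100.0
```

Passing paragraphs instead of sentences gives the paragraph-level variant. `relative_increase` reports the percentage gain in matching units for this toy input only; the abstract does not state exactly how its recall percentages were computed.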
Theme
Retrieval studies
Full-text retrieval

Similar documents (author)

  1. Pirkola, A.: Morphological typology of languages for IR (2001) 5.93
  2. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 5.93
  3. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 4.74
  4. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 3.56
  5. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 2.96

Similar documents (content)

  1. Steinberger, J.; Poesio, M.; Kabadjov, M.A.; Jezek, K.: Two uses of anaphora resolution in summarization (2007) 0.34
  2. Colavizza, G.; Boyack, K.W.; Eck, N.J. van; Waltman, L.: The closer the better : similarity of publication pairs at different cocitation levels (2018) 0.17
  3. Boyack, K.W.; Small, H.; Klavans, R.: Improving the accuracy of co-citation clustering using full text (2013) 0.15
  4. Bonzi, S.: Representation of concepts in text : a comparison of within-document frequency, anaphora, and synonymy (1991) 0.14
  5. Wu, D.-S.; Liang, T.: Chinese pronominal anaphora resolution using lexical knowledge and entropy-based weight (2008) 0.12