Document (#23766)

Author
Kaszkiel, M.
Zobel, J.
Title
Effective ranking with arbitrary passages
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.4, pp.344-364
Year
2001
Abstract
Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
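The core idea in the abstract, ranking each document by its best-matching overlapping passage rather than by the whole text, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed window width and step, the whitespace tokeniser, the log-tf times idf passage score, and scoring a document by its single best passage are all choices made for brevity, and the names `arbitrary_passages`, `score_passage` and `rank_documents` are hypothetical. Only fixed-length passages are shown.

```python
import math
from collections import Counter


def arbitrary_passages(words, width, step):
    """Yield (start, window) pairs of fixed-length passages.

    Windows overlap whenever step < width. The last window may be shorter
    than width; a document shorter than width yields a single passage.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    start = 0
    while True:
        yield start, words[start:start + width]
        if start + width >= len(words):
            break
        start += step


def score_passage(passage, query_terms, idf):
    # Assumed scoring: sum of (1 + ln tf) * idf over query terms in the passage.
    # No length normalisation, since passages share (almost) the same length.
    counts = Counter(passage)
    return sum((1 + math.log(counts[t])) * idf.get(t, 0.0)
               for t in query_terms if counts[t] > 0)


def rank_documents(docs, query, width=4, step=2):
    """Rank documents by the score of their best passage (hypothetical helper)."""
    tokenised = {name: text.lower().split() for name, text in docs.items()}
    n = len(tokenised)
    # Document frequency and a simple smoothed idf over whole documents.
    df = Counter(t for words in tokenised.values() for t in set(words))
    idf = {t: math.log(1 + n / df[t]) for t in df}
    query_terms = query.lower().split()
    results = []
    for name, words in tokenised.items():
        best = max(score_passage(p, query_terms, idf)
                   for _, p in arbitrary_passages(words, width, step))
        results.append((best, name))
    # Highest score first; ties broken alphabetically by document name.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


if __name__ == "__main__":
    docs = {
        "short": "passage ranking for long documents",
        "long": ("court transcripts discuss many issues at length and only "
                 "one section covers passage ranking in detail"),
        "other": "newspaper articles about sport",
    }
    # "long" and "short" tie at the top; "other" scores zero.
    for score, name in rank_documents(docs, "passage ranking"):
        print(f"{name}: {score:.3f}")
```

Because every passage has the same width (apart from a possibly shorter final window), the passage score needs no document-length normalisation, which is one of the difficulties the abstract says passages avoid. In the example, the long document ranks level with the short one because a single short block of it matches the query; a whole-document score with length normalisation would typically push it down.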
Theme
Retrieval algorithms

Similar documents (author)

  1. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 4.68
  2. Uitdenbogerd, A.L.; Zobel, J.: An architecture for effective music information retrieval (2004) 4.68
  3. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 4.68
  4. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 4.68
  5. Hawking, D.; Zobel, J.: Does topic metadata help with Web search? (2007) 4.68

Similar documents (content)

  1. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.57
  2. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 0.21
  3. Wan, X.; Yang, J.; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks (2008) 0.20
  4. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.16
  5. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank: passage retrieval using random walks with question-based priors (2009) 0.15