Document (#9897)

Cox, K.
¬An experiment to test the utility of repeating phrases for information retrieval systems
Journal of information science. 20(1994) no.5, pp.348-355
Describes a method of evaluating the utility of repeating phrases for information retrieval systems and calculating recall and precision. The method compares 2 different techniques by asking people to perform tasks and then examining the outcomes of those tasks. Shows that people found phrases generated automatically by finding repeating content words in documents to be good content indicators and discriminators, and that these phrases were better than words alone or phrases generated randomly from content words.

Similar documents (content)

  1. Fagan, J.L.: ¬The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval (1989) 0.12
    0.12446436 = sum of:
      0.12446436 = product of:
        0.51860154 = sum of:
          0.031880785 = weight(abstract_txt:indicators in 1845) [ClassicSimilarity], result of:
            0.031880785 = score(doc=1845,freq=1.0), product of:
              0.08463448 = queryWeight, product of:
                1.0782026 = boost
                6.027006 = idf(docFreq=289, maxDocs=44218)
                0.013024027 = queryNorm
              0.37668788 = fieldWeight in 1845, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.027006 = idf(docFreq=289, maxDocs=44218)
                0.0625 = fieldNorm(doc=1845)
          0.016358733 = weight(abstract_txt:systems in 1845) [ClassicSimilarity], result of:
            0.016358733 = score(doc=1845,freq=2.0), product of:
              0.054245174 = queryWeight, product of:
                1.2207375 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.013024027 = queryNorm
              0.3015703 = fieldWeight in 1845, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.0625 = fieldNorm(doc=1845)
          0.02117051 = weight(abstract_txt:retrieval in 1845) [ClassicSimilarity], result of:
            0.02117051 = score(doc=1845,freq=3.0), product of:
              0.056275383 = queryWeight, product of:
                1.2433716 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.013024027 = queryNorm
              0.37619486 = fieldWeight in 1845, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=1845)
          0.055259477 = weight(abstract_txt:content in 1845) [ClassicSimilarity], result of:
            0.055259477 = score(doc=1845,freq=3.0), product of:
              0.12212349 = queryWeight, product of:
                2.2432978 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.013024027 = queryNorm
              0.45248854 = fieldWeight in 1845, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=1845)
          0.06700986 = weight(abstract_txt:words in 1845) [ClassicSimilarity], result of:
            0.06700986 = score(doc=1845,freq=1.0), product of:
              0.20029075 = queryWeight, product of:
                2.8728821 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.013024027 = queryNorm
              0.33456293 = fieldWeight in 1845, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=1845)
          0.32692215 = weight(abstract_txt:phrases in 1845) [ClassicSimilarity], result of:
            0.32692215 = score(doc=1845,freq=3.0), product of:
              0.43968046 = queryWeight, product of:
                4.91502 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.013024027 = queryNorm
              0.7435449 = fieldWeight in 1845, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=1845)
        0.24 = coord(6/25)
  2. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.11
    0.1106712 = sum of:
      0.1106712 = product of:
        0.553356 = sum of:
          0.02075051 = weight(abstract_txt:recall in 5188) [ClassicSimilarity], result of:
            0.02075051 = score(doc=5188,freq=1.0), product of:
              0.077002764 = queryWeight, product of:
                1.0284421 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.013024027 = queryNorm
              0.26947746 = fieldWeight in 5188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.01751601 = weight(abstract_txt:those in 5188) [ClassicSimilarity], result of:
            0.01751601 = score(doc=5188,freq=1.0), product of:
              0.08665373 = queryWeight, product of:
                1.5428914 = boost
                4.312277 = idf(docFreq=1610, maxDocs=44218)
                0.013024027 = queryNorm
              0.20213798 = fieldWeight in 5188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.312277 = idf(docFreq=1610, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.05350476 = weight(abstract_txt:content in 5188) [ClassicSimilarity], result of:
            0.05350476 = score(doc=5188,freq=5.0), product of:
              0.12212349 = queryWeight, product of:
                2.2432978 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.013024027 = queryNorm
              0.43812016 = fieldWeight in 5188, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.08704837 = weight(abstract_txt:words in 5188) [ClassicSimilarity], result of:
            0.08704837 = score(doc=5188,freq=3.0), product of:
              0.20029075 = queryWeight, product of:
                2.8728821 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.013024027 = queryNorm
              0.43461 = fieldWeight in 5188, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.37453637 = weight(abstract_txt:phrases in 5188) [ClassicSimilarity], result of:
            0.37453637 = score(doc=5188,freq=7.0), product of:
              0.43968046 = queryWeight, product of:
                4.91502 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.013024027 = queryNorm
              0.85183764 = fieldWeight in 5188, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
        0.2 = coord(5/25)
  3. Lin, X.: Searching and browsing on map displays (1995) 0.11
    0.10784026 = sum of:
      0.10784026 = product of:
        0.3851438 = sum of:
          0.043740403 = weight(abstract_txt:compares in 3852) [ClassicSimilarity], result of:
            0.043740403 = score(doc=3852,freq=1.0), product of:
              0.07974847 = queryWeight, product of:
                1.0466173 = boost
                5.8504486 = idf(docFreq=345, maxDocs=44218)
                0.013024027 = queryNorm
              0.54847956 = fieldWeight in 3852, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8504486 = idf(docFreq=345, maxDocs=44218)
                0.09375 = fieldNorm(doc=3852)
          0.05046788 = weight(abstract_txt:perform in 3852) [ClassicSimilarity], result of:
            0.05046788 = score(doc=3852,freq=1.0), product of:
              0.08772914 = queryWeight, product of:
                1.0977379 = boost
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.013024027 = queryNorm
              0.5752693 = fieldWeight in 3852, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.09375 = fieldNorm(doc=3852)
          0.018334199 = weight(abstract_txt:retrieval in 3852) [ClassicSimilarity], result of:
            0.018334199 = score(doc=3852,freq=1.0), product of:
              0.056275383 = queryWeight, product of:
                1.2433716 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.013024027 = queryNorm
              0.3257943 = fieldWeight in 3852, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=3852)
          0.08684074 = weight(abstract_txt:randomly in 3852) [ClassicSimilarity], result of:
            0.08684074 = score(doc=3852,freq=1.0), product of:
              0.12597455 = queryWeight, product of:
                1.3154311 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.013024027 = queryNorm
              0.68935144 = fieldWeight in 3852, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.09375 = fieldNorm(doc=3852)
          0.052112956 = weight(abstract_txt:people in 3852) [ClassicSimilarity], result of:
            0.052112956 = score(doc=3852,freq=1.0), product of:
              0.11292089 = queryWeight, product of:
                1.7612818 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.013024027 = queryNorm
              0.4614997 = fieldWeight in 3852, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.09375 = fieldNorm(doc=3852)
          0.060127962 = weight(abstract_txt:tasks in 3852) [ClassicSimilarity], result of:
            0.060127962 = score(doc=3852,freq=1.0), product of:
              0.12422094 = queryWeight, product of:
                1.8473072 = boost
                5.1630983 = idf(docFreq=687, maxDocs=44218)
                0.013024027 = queryNorm
              0.48404047 = fieldWeight in 3852, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1630983 = idf(docFreq=687, maxDocs=44218)
                0.09375 = fieldNorm(doc=3852)
          0.073519625 = weight(abstract_txt:generated in 3852) [ClassicSimilarity], result of:
            0.073519625 = score(doc=3852,freq=1.0), product of:
              0.14204066 = queryWeight, product of:
                1.9753681 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.013024027 = queryNorm
              0.51759565 = fieldWeight in 3852, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.09375 = fieldNorm(doc=3852)
        0.28 = coord(7/25)
  4. Sanderson, M.; Lawrie, D.: Building, testing, and applying concept hierarchies (2000) 0.09
    0.09077488 = sum of:
      0.09077488 = product of:
        0.45387438 = sum of:
          0.033029873 = weight(abstract_txt:experiment in 37) [ClassicSimilarity], result of:
            0.033029873 = score(doc=37,freq=1.0), product of:
              0.07467799 = queryWeight, product of:
                1.0127984 = boost
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.013024027 = queryNorm
              0.4422973 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.078125 = fieldNorm(doc=37)
          0.061266348 = weight(abstract_txt:generated in 37) [ClassicSimilarity], result of:
            0.061266348 = score(doc=37,freq=1.0), product of:
              0.14204066 = queryWeight, product of:
                1.9753681 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.013024027 = queryNorm
              0.43132967 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.078125 = fieldNorm(doc=37)
          0.039880097 = weight(abstract_txt:content in 37) [ClassicSimilarity], result of:
            0.039880097 = score(doc=37,freq=1.0), product of:
              0.12212349 = queryWeight, product of:
                2.2432978 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.013024027 = queryNorm
              0.3265555 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.078125 = fieldNorm(doc=37)
          0.083762325 = weight(abstract_txt:words in 37) [ClassicSimilarity], result of:
            0.083762325 = score(doc=37,freq=1.0), product of:
              0.20029075 = queryWeight, product of:
                2.8728821 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.013024027 = queryNorm
              0.41820365 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=37)
          0.23593575 = weight(abstract_txt:phrases in 37) [ClassicSimilarity], result of:
            0.23593575 = score(doc=37,freq=1.0), product of:
              0.43968046 = queryWeight, product of:
                4.91502 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.013024027 = queryNorm
              0.5366073 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=37)
        0.2 = coord(5/25)
  5. Pirkola, A.; Jarvelin, K.: ¬The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 0.09
    0.08599525 = sum of:
      0.08599525 = product of:
        0.53747034 = sum of:
          0.047921244 = weight(abstract_txt:recall in 4088) [ClassicSimilarity], result of:
            0.047921244 = score(doc=4088,freq=3.0), product of:
              0.077002764 = queryWeight, product of:
                1.0284421 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.013024027 = queryNorm
              0.6223315 = fieldWeight in 4088, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
          0.017285649 = weight(abstract_txt:retrieval in 4088) [ClassicSimilarity], result of:
            0.017285649 = score(doc=4088,freq=2.0), product of:
              0.056275383 = queryWeight, product of:
                1.2433716 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.013024027 = queryNorm
              0.3071618 = fieldWeight in 4088, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
          0.09476625 = weight(abstract_txt:words in 4088) [ClassicSimilarity], result of:
            0.09476625 = score(doc=4088,freq=2.0), product of:
              0.20029075 = queryWeight, product of:
                2.8728821 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.013024027 = queryNorm
              0.47314343 = fieldWeight in 4088, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
          0.3774972 = weight(abstract_txt:phrases in 4088) [ClassicSimilarity], result of:
            0.3774972 = score(doc=4088,freq=4.0), product of:
              0.43968046 = queryWeight, product of:
                4.91502 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.013024027 = queryNorm
              0.8585717 = fieldWeight in 4088, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
        0.16 = coord(4/25)
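
The score trees above all follow the same arithmetic, which can be reproduced by hand. The sketch below is a minimal illustration, not the catalogue's own code: it assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and takes boost, queryNorm and fieldNorm as given from the listing. The function names and the TERMS_1845 table are hypothetical helpers introduced here; the numbers are copied from entry 1 (doc 1845).

```python
import math


def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity formula: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # One "weight(abstract_txt:term in doc)" node of the explain tree.
    tf = math.sqrt(freq)                         # tf(freq=...)
    term_idf = idf(doc_freq, max_docs)           # idf(docFreq=..., maxDocs=...)
    query_weight = boost * term_idf * query_norm  # queryWeight
    field_weight = tf * term_idf * field_norm     # fieldWeight
    return query_weight * field_weight


def document_score(term_scores, matched, total_clauses):
    # Sum of the matching term scores, scaled by coord(matched/total).
    return sum(term_scores) * (matched / total_clauses)


# Values shared by every node in the listing.
MAX_DOCS = 44218
QUERY_NORM = 0.013024027

# Values for entry 1 (doc 1845): (term, freq, docFreq, boost).
FIELD_NORM_1845 = 0.0625
TERMS_1845 = [
    ("indicators", 1.0, 289, 1.0782026),
    ("systems", 2.0, 3963, 1.2207375),
    ("retrieval", 3.0, 3720, 1.2433716),
    ("content", 3.0, 1838, 2.2432978),
    ("words", 1.0, 568, 2.8728821),
    ("phrases", 3.0, 124, 4.91502),
]

scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM_1845)
    for term, freq, doc_freq, boost in TERMS_1845
}
total = document_score(scores.values(), matched=6, total_clauses=25)

for term, value in scores.items():
    print(f"{term:12s} {value:.8f}")
print(f"{'total':12s} {total:.8f}")
```

Under these assumptions the per-term values should agree with the listing up to floating-point rounding (for example 0.32692215 for "phrases"), and the total should come out close to the 0.12446436 shown for entry 1. The same functions apply to the other entries once their fieldNorm, term frequencies and coord fraction are substituted.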