Document (#16044)

Author
Kelledy, F.
Smeaton, A.F.
Title
Signature files and beyond
Source
Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
Imprint
London : Taylor Graham
Year
1996
Pages
pp. 124-144
Abstract
Proposes that signature files be used as a viable alternative to other indexing strategies, such as inverted files, for searching large volumes of text. Demonstrates through simulation that search times can be further reduced by enhancing the basic signature file concept with deterministic partitioning algorithms, which eliminate the need for an exhaustive search of the entire signature file. Reports research evaluating the performance of some deterministic partitioning algorithms in a non-simulated environment, using 276 MB of raw newspaper text (taken from the Wall Street Journal) and real user queries. Presents a selection of results that illustrate trends and highlight important aspects of the performance of these methods under realistic rather than simulated operating conditions. The research reported here shows certain aspects of this approach to signature files to be wanting and in need of improvement. Suggests lines of future research on the partitioning of signature files.
Theme
Retrieval algorithms
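The abstract above refers to signature files and to deterministic partitioning that avoids scanning the whole signature file. The following is a minimal Python sketch of that general idea, not the algorithms evaluated by Kelledy and Smeaton. It assumes superimposed coding (each word sets M hash-selected bits in an F-bit signature, and a text's signature is the OR of its word signatures). It also assumes key-based (fixed-prefix) partitioning, one well-known deterministic scheme in which the low KEY_BITS bits of each signature act as a partition key. The constants F, M, KEY_BITS, the SHA-256 bit selection, and the class PartitionedSignatureFile are all hypothetical illustrative choices.

```python
import hashlib
from collections import defaultdict

F = 64        # signature width in bits (hypothetical choice)
M = 3         # bits set per word (hypothetical choice)
KEY_BITS = 4  # low-order signature bits used as the partition key (hypothetical)
KEY_MASK = (1 << KEY_BITS) - 1


def word_signature(word: str) -> int:
    """Superimposed coding: set M hash-selected bits of an F-bit signature."""
    sig = 0
    for i in range(M):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % F)
    return sig


def text_signature(words) -> int:
    """OR together the signatures of all (lower-cased) words."""
    sig = 0
    for w in words:
        sig |= word_signature(w.lower())
    return sig


class PartitionedSignatureFile:
    """Hypothetical signature file split into partitions by a signature key."""

    def __init__(self):
        self.partitions = defaultdict(list)  # key -> [(doc_id, signature)]
        self.words = {}                      # doc_id -> set of words, for false-drop checks

    def add(self, doc_id, text: str) -> None:
        tokens = text.split()
        sig = text_signature(tokens)
        self.partitions[sig & KEY_MASK].append((doc_id, sig))
        self.words[doc_id] = {t.lower() for t in tokens}

    def search(self, query_words):
        """Return (matching doc ids, number of signatures actually scanned)."""
        q = text_signature(query_words)
        qkey = q & KEY_MASK
        hits, scanned = [], 0
        for key, entries in self.partitions.items():
            # A matching document's signature contains every query bit, so its
            # key must contain every query key bit; other partitions are skipped.
            if key & qkey != qkey:
                continue
            for doc_id, sig in entries:
                scanned += 1
                if sig & q == q:  # candidate: may still be a false drop
                    if all(w.lower() in self.words[doc_id] for w in query_words):
                        hits.append(doc_id)
        return hits, scanned


if __name__ == "__main__":
    sf = PartitionedSignatureFile()
    sf.add("d1", "Signature files compete with inverted files")
    sf.add("d2", "Inverted files dominate text retrieval")
    sf.add("d3", "Partitioning signature files avoids exhaustive search")
    print(sf.search(["signature", "files"]))
```

Skipping a partition is safe because every bit of the query signature must also be set in a matching document's signature. Candidates that pass the bit test are then checked against the stored words to discard false drops. With long texts most signature bits tend to be set, so many documents can share the same key. This sketch does not attempt to address that limitation.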

Similar documents (author)

  1. Smeaton, A.F.: Prospects for intelligent, language-based information retrieval (1991) 5.35
  2. Smeaton, A.F.: Retrieving information from hypertext : issues and problems (1991) 5.35
  3. Smeaton, A.F.: Progress in the application of natural language processing to information retrieval tasks (1992) 5.35
  4. Smeaton, A.F.: Information retrieval and hypertext : competing technologies or complementary access methods (1992) 5.35
  5. Smeaton, A.F.: Natural language processing used in information retrieval tasks : an overview of achievements to date (1995) 5.35

Similar documents (content)

  1. Lam, W.; Wong, K.-F.; Wong, C.-Y.: Chinese document indexing based on new partitioned signature file : model and evaluation (2001) 0.44
  2. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.31
  3. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.25
  4. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.15
  5. Robertson, A.M.; Willett, P.: Applications of n-grams in textual information systems (1998) 0.15