Document (#16040)

Author
Almerri, J.
McGregor, D.R.
Title
Codon signatures : a document retrieval method
Source
Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
Imprint
London : Taylor Graham
Year
1996
Pages
pp. 154-173
Abstract
The performance of an information retrieval system depends on its ability to distinguish between relevant and non-relevant documents in response to users' information needs. Proposes a new method called Codon Signatures (CS) that is able to use a relationship between terms and concepts. The Codon Signature is designed to improve retrieval performance (recall and precision) by creating the Codon structure, a representation of semantic meaning in context. It is also designed to reduce the amount of storage space required by the index. Presents a theoretical analysis of CS parameters and performance. The method was tested against three document base collections and gave acceptable results regarding information effectiveness and efficiency, compared to a conventional Signature Files method.
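
For orientation, the conventional signature file baseline named in the abstract can be sketched with superimposed coding. This is not the Codon Signature method, whose structure this record does not describe. The signature width, the number of bits per term, the SHA-256 hashing and all function names below are illustrative assumptions.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (assumed for illustration)
BITS_PER_TERM = 3   # bit positions set per term (assumed for illustration)


def term_signature(term: str) -> int:
    """Hash one term to a bit pattern with at most BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term.lower()}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << position
    return sig


def document_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of all terms in the text."""
    sig = 0
    for term in text.split():
        sig |= term_signature(term)
    return sig


def may_contain(doc_sig: int, query: str) -> bool:
    """True if every query bit is set in the document signature.

    A True result can be a false drop (bits set by other terms), so a real
    system would verify candidates against the text. A False result is exact.
    """
    query_sig = document_signature(query)
    return (doc_sig & query_sig) == query_sig


doc_sig = document_signature("codon signatures a document retrieval method")
print(may_contain(doc_sig, "retrieval method"))
```

The filter never misses a document that contains all query terms. The cost of the compact index is that some non-matching documents pass the bit test and have to be checked afterwards.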

Similar documents (content)

  1. Lam, W.; Wong, K.-F.; Wong, C.-Y.: Chinese document indexing based on new partitioned signature file : model and evaluation (2001) 0.45
    0.44839564 = sum of:
      0.44839564 = product of:
        1.1209891 = sum of:
          0.054055322 = weight(abstract_txt:files in 303) [ClassicSimilarity], result of:
            0.054055322 = score(doc=303,freq=2.0), product of:
              0.10690714 = queryWeight, product of:
                1.0155215 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.01840267 = queryNorm
              0.50562876 = fieldWeight in 303, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.040705845 = weight(abstract_txt:storage in 303) [ClassicSimilarity], result of:
            0.040705845 = score(doc=303,freq=1.0), product of:
              0.1114882 = queryWeight, product of:
                1.0370513 = boost
                5.8418155 = idf(docFreq=348, maxDocs=44218)
                0.01840267 = queryNorm
              0.36511347 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8418155 = idf(docFreq=348, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.046058737 = weight(abstract_txt:efficiency in 303) [ClassicSimilarity], result of:
            0.046058737 = score(doc=303,freq=1.0), product of:
              0.12105956 = queryWeight, product of:
                1.0806507 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.01840267 = queryNorm
              0.38046345 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.016964663 = weight(abstract_txt:between in 303) [ClassicSimilarity], result of:
            0.016964663 = score(doc=303,freq=1.0), product of:
              0.07837265 = queryWeight, product of:
                1.2296543 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.01840267 = queryNorm
              0.21646151 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.008691406 = weight(abstract_txt:information in 303) [ClassicSimilarity], result of:
            0.008691406 = score(doc=303,freq=1.0), product of:
              0.057441376 = queryWeight, product of:
                1.2893144 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01840267 = queryNorm
              0.15130915 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.032300334 = weight(abstract_txt:document in 303) [ClassicSimilarity], result of:
            0.032300334 = score(doc=303,freq=1.0), product of:
              0.120394245 = queryWeight, product of:
                1.5240655 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01840267 = queryNorm
              0.26828802 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.044525802 = weight(abstract_txt:retrieval in 303) [ClassicSimilarity], result of:
            0.044525802 = score(doc=303,freq=3.0), product of:
              0.11835835 = queryWeight, product of:
                1.8507419 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01840267 = queryNorm
              0.37619486 = fieldWeight in 303, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.060812898 = weight(abstract_txt:performance in 303) [ClassicSimilarity], result of:
            0.060812898 = score(doc=303,freq=1.0), product of:
              0.2101335 = queryWeight, product of:
                2.4660056 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.01840267 = queryNorm
              0.28940126 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.6679325 = weight(abstract_txt:signature in 303) [ClassicSimilarity], result of:
            0.6679325 = score(doc=303,freq=7.0), product of:
              0.47415838 = queryWeight, product of:
                3.0245621 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.01840267 = queryNorm
              1.4086696 = fieldWeight in 303, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.14894152 = weight(abstract_txt:method in 303) [ClassicSimilarity], result of:
            0.14894152 = score(doc=303,freq=4.0), product of:
              0.26472905 = queryWeight, product of:
                3.196072 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.01840267 = queryNorm
              0.56261873 = fieldWeight in 303, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
        0.4 = coord(10/25)
    
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.30
    0.30029216 = sum of:
      0.30029216 = product of:
        1.2512174 = sum of:
          0.057573423 = weight(abstract_txt:efficiency in 690) [ClassicSimilarity], result of:
            0.057573423 = score(doc=690,freq=1.0), product of:
              0.12105956 = queryWeight, product of:
                1.0806507 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.01840267 = queryNorm
              0.47557932 = fieldWeight in 690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.021205828 = weight(abstract_txt:between in 690) [ClassicSimilarity], result of:
            0.021205828 = score(doc=690,freq=1.0), product of:
              0.07837265 = queryWeight, product of:
                1.2296543 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.01840267 = queryNorm
              0.2705769 = fieldWeight in 690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.069932275 = weight(abstract_txt:document in 690) [ClassicSimilarity], result of:
            0.069932275 = score(doc=690,freq=3.0), product of:
              0.120394245 = queryWeight, product of:
                1.5240655 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01840267 = queryNorm
              0.5808606 = fieldWeight in 690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.07601613 = weight(abstract_txt:performance in 690) [ClassicSimilarity], result of:
            0.07601613 = score(doc=690,freq=1.0), product of:
              0.2101335 = queryWeight, product of:
                2.4660056 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.01840267 = queryNorm
              0.3617516 = fieldWeight in 690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.31556845 = weight(abstract_txt:signature in 690) [ClassicSimilarity], result of:
            0.31556845 = score(doc=690,freq=1.0), product of:
              0.47415838 = queryWeight, product of:
                3.0245621 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.01840267 = queryNorm
              0.66553384 = fieldWeight in 690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.7109212 = weight(abstract_txt:signatures in 690) [ClassicSimilarity], result of:
            0.7109212 = score(doc=690,freq=3.0), product of:
              0.5649825 = queryWeight, product of:
                3.301553 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.01840267 = queryNorm
              1.2583067 = fieldWeight in 690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
        0.24 = coord(6/25)
    
  3. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.29
    0.2860399 = sum of:
      0.2860399 = product of:
        1.1918329 = sum of:
          0.05733433 = weight(abstract_txt:files in 2417) [ClassicSimilarity], result of:
            0.05733433 = score(doc=2417,freq=1.0), product of:
              0.10690714 = queryWeight, product of:
                1.0155215 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.01840267 = queryNorm
              0.5363003 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.061058763 = weight(abstract_txt:storage in 2417) [ClassicSimilarity], result of:
            0.061058763 = score(doc=2417,freq=1.0), product of:
              0.1114882 = queryWeight, product of:
                1.0370513 = boost
                5.8418155 = idf(docFreq=348, maxDocs=44218)
                0.01840267 = queryNorm
              0.5476702 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8418155 = idf(docFreq=348, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.096901 = weight(abstract_txt:document in 2417) [ClassicSimilarity], result of:
            0.096901 = score(doc=2417,freq=4.0), product of:
              0.120394245 = queryWeight, product of:
                1.5240655 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01840267 = queryNorm
              0.80486405 = fieldWeight in 2417, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.038560476 = weight(abstract_txt:retrieval in 2417) [ClassicSimilarity], result of:
            0.038560476 = score(doc=2417,freq=1.0), product of:
              0.11835835 = queryWeight, product of:
                1.8507419 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01840267 = queryNorm
              0.3257943 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.09121934 = weight(abstract_txt:performance in 2417) [ClassicSimilarity], result of:
            0.09121934 = score(doc=2417,freq=1.0), product of:
              0.2101335 = queryWeight, product of:
                2.4660056 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.01840267 = queryNorm
              0.43410188 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.846759 = weight(abstract_txt:signature in 2417) [ClassicSimilarity], result of:
            0.846759 = score(doc=2417,freq=5.0), product of:
              0.47415838 = queryWeight, product of:
                3.0245621 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.01840267 = queryNorm
              1.7858148 = fieldWeight in 2417, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
        0.24 = coord(6/25)
    
  4. Burgin, R.: The Monte Carlo method and the evaluation of retrieval system performance (1999) 0.21
    0.2104703 = sum of:
      0.2104703 = product of:
        0.75167966 = sum of:
          0.0514214 = weight(abstract_txt:between in 2946) [ClassicSimilarity], result of:
            0.0514214 = score(doc=2946,freq=3.0), product of:
              0.07837265 = queryWeight, product of:
                1.2296543 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.01840267 = queryNorm
              0.6561141 = fieldWeight in 2946, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.109375 = fieldNorm(doc=2946)
          0.1704209 = weight(abstract_txt:distinguish in 2946) [ClassicSimilarity], result of:
            0.1704209 = score(doc=2946,freq=2.0), product of:
              0.15828422 = queryWeight, product of:
                1.2356759 = boost
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.01840267 = queryNorm
              1.0766765 = fieldWeight in 2946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.109375 = fieldNorm(doc=2946)
          0.1332048 = weight(abstract_txt:acceptable in 2946) [ClassicSimilarity], result of:
            0.1332048 = score(doc=2946,freq=1.0), product of:
              0.16921763 = queryWeight, product of:
                1.2776402 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.01840267 = queryNorm
              0.78718036 = fieldWeight in 2946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.109375 = fieldNorm(doc=2946)
          0.01520996 = weight(abstract_txt:information in 2946) [ClassicSimilarity], result of:
            0.01520996 = score(doc=2946,freq=1.0), product of:
              0.057441376 = queryWeight, product of:
                1.2893144 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01840267 = queryNorm
              0.264791 = fieldWeight in 2946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.109375 = fieldNorm(doc=2946)
          0.10059449 = weight(abstract_txt:retrieval in 2946) [ClassicSimilarity], result of:
            0.10059449 = score(doc=2946,freq=5.0), product of:
              0.11835835 = queryWeight, product of:
                1.8507419 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01840267 = queryNorm
              0.8499146 = fieldWeight in 2946, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.109375 = fieldNorm(doc=2946)
          0.15050425 = weight(abstract_txt:performance in 2946) [ClassicSimilarity], result of:
            0.15050425 = score(doc=2946,freq=2.0), product of:
              0.2101335 = queryWeight, product of:
                2.4660056 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.01840267 = queryNorm
              0.7162316 = fieldWeight in 2946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.109375 = fieldNorm(doc=2946)
          0.13032383 = weight(abstract_txt:method in 2946) [ClassicSimilarity], result of:
            0.13032383 = score(doc=2946,freq=1.0), product of:
              0.26472905 = queryWeight, product of:
                3.196072 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.01840267 = queryNorm
              0.4922914 = fieldWeight in 2946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.109375 = fieldNorm(doc=2946)
        0.28 = coord(7/25)
    
  5. Yang, L.; Ji, D.; Leong, M.: Document reranking by term distribution and maximal marginal relevance for Chinese information retrieval (2007) 0.18
    0.17819272 = sum of:
      0.17819272 = product of:
        0.5568523 = sum of:
          0.04849162 = weight(abstract_txt:recall in 907) [ClassicSimilarity], result of:
            0.04849162 = score(doc=907,freq=1.0), product of:
              0.107968114 = queryWeight, product of:
                1.0205482 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.01840267 = queryNorm
              0.44912907 = fieldWeight in 907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
          0.05051151 = weight(abstract_txt:against in 907) [ClassicSimilarity], result of:
            0.05051151 = score(doc=907,freq=1.0), product of:
              0.1109459 = queryWeight, product of:
                1.034526 = boost
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.01840267 = queryNorm
              0.4552805 = fieldWeight in 907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8275905 = idf(docFreq=353, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
          0.010864257 = weight(abstract_txt:information in 907) [ClassicSimilarity], result of:
            0.010864257 = score(doc=907,freq=1.0), product of:
              0.057441376 = queryWeight, product of:
                1.2893144 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01840267 = queryNorm
              0.18913643 = fieldWeight in 907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
          0.08075083 = weight(abstract_txt:document in 907) [ClassicSimilarity], result of:
            0.08075083 = score(doc=907,freq=4.0), product of:
              0.120394245 = queryWeight, product of:
                1.5240655 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.01840267 = queryNorm
              0.67072004 = fieldWeight in 907, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
          0.07190728 = weight(abstract_txt:relevant in 907) [ClassicSimilarity], result of:
            0.07190728 = score(doc=907,freq=2.0), product of:
              0.14039974 = queryWeight, product of:
                1.645826 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.01840267 = queryNorm
              0.5121611 = fieldWeight in 907, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
          0.032133732 = weight(abstract_txt:retrieval in 907) [ClassicSimilarity], result of:
            0.032133732 = score(doc=907,freq=1.0), product of:
              0.11835835 = queryWeight, product of:
                1.8507419 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.01840267 = queryNorm
              0.27149525 = fieldWeight in 907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
          0.07601613 = weight(abstract_txt:performance in 907) [ClassicSimilarity], result of:
            0.07601613 = score(doc=907,freq=1.0), product of:
              0.2101335 = queryWeight, product of:
                2.4660056 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.01840267 = queryNorm
              0.3617516 = fieldWeight in 907, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
          0.18617691 = weight(abstract_txt:method in 907) [ClassicSimilarity], result of:
            0.18617691 = score(doc=907,freq=4.0), product of:
              0.26472905 = queryWeight, product of:
                3.196072 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.01840267 = queryNorm
              0.7032734 = fieldWeight in 907, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=907)
        0.32 = coord(8/25)
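
The score explanations above follow a TF-IDF pattern: each term score is queryWeight times fieldWeight, and the sum over matching terms is scaled by the coord factor. The sketch below reproduces the arithmetic for the "files" term of the first hit (doc 303). The function names are illustrative, and the idf formula is the one that reproduces the listed values; it is not taken from the source.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                            # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm    # queryWeight
    field_weight = tf * term_idf * field_norm       # fieldWeight
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of query terms found in the document."""
    return matched / total


# Values copied from the explanation of hit 1 (doc 303), term "files".
files_score = term_score(freq=2.0, doc_freq=393, max_docs=44218,
                         boost=1.0155215, query_norm=0.01840267,
                         field_norm=0.0625)
print(files_score)   # close to the listed 0.054055322

# Final score of hit 1: listed sum of term scores times coord(10/25).
print(1.1209891 * coord(10, 25))   # close to the listed 0.44839564
```

The listed figures are single-precision floats, so the double-precision results here agree only to about six significant digits.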