Document (#25304)

Author
Lam, W.
Wong, K.-F.
Wong, C.-Y.
Title
Chinese document indexing based on new partitioned signature file : model and evaluation
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.7, pp. 584-597
Year
2001
Abstract
In this article we investigate the use of signature files in Chinese information retrieval systems and propose a new partitioning method for Chinese signature files based on the characteristics of Chinese words. Our partitioning method, called Partitioned Signature File for Chinese (PSFC), offers faster search than the traditional single signature file approach. We devise a general scheme for controlling the trade-off between false drops and storage overhead while maintaining the search space reduction in PSFC. An analytical study is presented to support the claims of our method. We also propose two new hashing methods for Chinese signature files so that the signature file is more suitable for dynamic environments while retrieval performance is maintained. Furthermore, we have implemented PSFC and the new hashing methods and evaluated them on a large-scale real-world Chinese document corpus, namely the TREC-5 (Text REtrieval Conference) Chinese collection. The experimental results confirm the features of PSFC and demonstrate its superiority over the traditional single signature file method.
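
For readers unfamiliar with signature files, the following is a minimal sketch of the traditional single signature file approach that the abstract uses as its baseline, assuming generic superimposed coding. It is not the authors' PSFC partitioning scheme or either of their hashing methods. The signature width, the number of bits per term, the SHA-256 based hash, and the sample documents are all illustrative assumptions.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (illustrative assumption)
BITS_PER_TERM = 3   # hash positions set per term (illustrative assumption)


def term_signature(term: str) -> int:
    """Map a term to a bit mask with at most BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
        pos = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << pos
    return sig


def document_signature(terms) -> int:
    """Superimpose (bitwise OR) the signatures of all terms."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


def candidates(query_terms, signatures):
    """Return ids of documents whose signature covers every query bit.

    A match is only a candidate: unrelated terms can set the same bits,
    which is the "false drop" the abstract refers to, so each hit still
    has to be checked against the document text.
    """
    q = document_signature(query_terms)
    return [doc_id for doc_id, s in signatures.items() if s & q == q]


# Hypothetical toy collection of already segmented Chinese words.
docs = {
    "d1": ["签名", "文件", "检索"],
    "d2": ["中文", "分词"],
}
signatures = {doc_id: document_signature(t) for doc_id, t in docs.items()}
hits = candidates(["签名"], signatures)
```

A document that contains the query term is always returned, because every bit of the query signature is also set in that document's signature. A single signature file has to scan every signature on each query, which is the search space a partitioned scheme aims to reduce.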

Similar documents (author)

  1. Wong, S.K.M.: On modelling information retrieval with probabilistic inference (1995) 5.13
    5.125237 = sum of:
      5.125237 = weight(author_txt:wong in 1938) [ClassicSimilarity], result of:
        5.125237 = fieldWeight in 1938, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.625 = fieldNorm(doc=1938)
    
  2. Wong, K.: Frühe Spuren des menschlichen Geistes (2005) 5.13
    5.125237 = sum of:
      5.125237 = weight(author_txt:wong in 983) [ClassicSimilarity], result of:
        5.125237 = fieldWeight in 983, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.625 = fieldNorm(doc=983)
    
  3. Salton, G.; Wong, A.: Generation and search of clustered files (1978) 4.10
    4.1001897 = sum of:
      4.1001897 = weight(author_txt:wong in 2411) [ClassicSimilarity], result of:
        4.1001897 = fieldWeight in 2411, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.5 = fieldNorm(doc=2411)
    
  4. Wong, S.K.M.; Yao, Y.Y.: An information-theoretic measure of term specifics (1992) 4.10
    4.1001897 = sum of:
      4.1001897 = weight(author_txt:wong in 4807) [ClassicSimilarity], result of:
        4.1001897 = fieldWeight in 4807, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.5 = fieldNorm(doc=4807)
    
  5. Wong, W.Y.P.; Lee, D.L.: Implementation of partial document ranking using inverted files (1993) 4.10
    4.1001897 = sum of:
      4.1001897 = weight(author_txt:wong in 6539) [ClassicSimilarity], result of:
        4.1001897 = fieldWeight in 6539, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.5 = fieldNorm(doc=6539)
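
The author scores above can be reproduced by hand. The worked equation below assumes the standard formulas of Lucene's ClassicSimilarity (square-root term frequency and a logarithmic inverse document frequency); the input values are taken from the first entry in the list.

```latex
\begin{align*}
\mathrm{tf} &= \sqrt{\mathrm{freq}} = \sqrt{1} = 1 \\
\mathrm{idf} &= 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq} + 1}
             = 1 + \ln\frac{44218}{33} \approx 8.2004 \\
\mathrm{fieldWeight} &= \mathrm{tf} \cdot \mathrm{idf} \cdot \mathrm{fieldNorm}
             \approx 1 \cdot 8.2004 \cdot 0.625 \approx 5.1252
\end{align*}
```

With fieldNorm = 0.5 in place of 0.625, the same product gives approximately 4.1002, which matches entries 3 to 5.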
    

Similar documents (content)

  1. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.64
    0.6390664 = sum of:
      0.6390664 = product of:
        1.9970825 = sum of:
          0.011645426 = weight(abstract_txt:search in 2417) [ClassicSimilarity], result of:
            0.011645426 = score(doc=2417,freq=1.0), product of:
              0.03395738 = queryWeight, product of:
                1.013374 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.009160401 = queryNorm
              0.34294242 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.07491421 = weight(abstract_txt:false in 2417) [ClassicSimilarity], result of:
            0.07491421 = score(doc=2417,freq=2.0), product of:
              0.073992334 = queryWeight, product of:
                1.0577451 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.009160401 = queryNorm
              1.012459 = fieldWeight in 2417, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.037635643 = weight(abstract_txt:document in 2417) [ClassicSimilarity], result of:
            0.037635643 = score(doc=2417,freq=4.0), product of:
              0.04676025 = queryWeight, product of:
                1.1891621 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.009160401 = queryNorm
              0.80486405 = fieldWeight in 2417, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.0149766095 = weight(abstract_txt:retrieval in 2417) [ClassicSimilarity], result of:
            0.0149766095 = score(doc=2417,freq=1.0), product of:
              0.045969523 = queryWeight, product of:
                1.4440535 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.009160401 = queryNorm
              0.3257943 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.044536475 = weight(abstract_txt:files in 2417) [ClassicSimilarity], result of:
            0.044536475 = score(doc=2417,freq=1.0), product of:
              0.08304391 = queryWeight, product of:
                1.5847347 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.009160401 = queryNorm
              0.5363003 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.25003055 = weight(abstract_txt:partitioned in 2417) [ClassicSimilarity], result of:
            0.25003055 = score(doc=2417,freq=2.0), product of:
              0.20820093 = queryWeight, product of:
                2.5092504 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.009160401 = queryNorm
              1.2009099 = fieldWeight in 2417, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.24784315 = weight(abstract_txt:file in 2417) [ClassicSimilarity], result of:
            0.24784315 = score(doc=2417,freq=4.0), product of:
              0.23693849 = queryWeight, product of:
                4.636402 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.009160401 = queryNorm
              1.0460231 = fieldWeight in 2417, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          1.3155004 = weight(abstract_txt:signature in 2417) [ClassicSimilarity], result of:
            1.3155004 = score(doc=2417,freq=5.0), product of:
              0.7366388 = queryWeight, product of:
                9.439738 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009160401 = queryNorm
              1.7858148 = fieldWeight in 2417, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
        0.32 = coord(8/25)
    
  2. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.38
    0.38122743 = sum of:
      0.38122743 = product of:
        1.5884477 = sum of:
          0.013724267 = weight(abstract_txt:search in 6973) [ClassicSimilarity], result of:
            0.013724267 = score(doc=6973,freq=2.0), product of:
              0.03395738 = queryWeight, product of:
                1.013374 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.009160401 = queryNorm
              0.4041615 = fieldWeight in 6973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
          0.014136715 = weight(abstract_txt:methods in 6973) [ClassicSimilarity], result of:
            0.014136715 = score(doc=6973,freq=1.0), product of:
              0.043636553 = queryWeight, product of:
                1.1487563 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.009160401 = queryNorm
              0.32396498 = fieldWeight in 6973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
          0.07422745 = weight(abstract_txt:files in 6973) [ClassicSimilarity], result of:
            0.07422745 = score(doc=6973,freq=4.0), product of:
              0.08304391 = queryWeight, product of:
                1.5847347 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.009160401 = queryNorm
              0.89383376 = fieldWeight in 6973, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
          0.24406596 = weight(abstract_txt:partitioning in 6973) [ClassicSimilarity], result of:
            0.24406596 = score(doc=6973,freq=3.0), product of:
              0.20210752 = queryWeight, product of:
                2.4722586 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.009160401 = queryNorm
              1.2076045 = fieldWeight in 6973, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
          0.14604299 = weight(abstract_txt:file in 6973) [ClassicSimilarity], result of:
            0.14604299 = score(doc=6973,freq=2.0), product of:
              0.23693849 = queryWeight, product of:
                4.636402 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.009160401 = queryNorm
              0.6163751 = fieldWeight in 6973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
          1.0962503 = weight(abstract_txt:signature in 6973) [ClassicSimilarity], result of:
            1.0962503 = score(doc=6973,freq=5.0), product of:
              0.7366388 = queryWeight, product of:
                9.439738 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009160401 = queryNorm
              1.488179 = fieldWeight in 6973, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
        0.24 = coord(6/25)
    
  3. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.31
    0.3120266 = sum of:
      0.3120266 = product of:
        1.5601329 = sum of:
          0.037301376 = weight(abstract_txt:faster in 1029) [ClassicSimilarity], result of:
            0.037301376 = score(doc=1029,freq=1.0), product of:
              0.06613398 = queryWeight, product of:
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.009160401 = queryNorm
              0.56402737 = fieldWeight in 1029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
          0.064282864 = weight(abstract_txt:files in 1029) [ClassicSimilarity], result of:
            0.064282864 = score(doc=1029,freq=3.0), product of:
              0.08304391 = queryWeight, product of:
                1.5847347 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.009160401 = queryNorm
              0.7740828 = fieldWeight in 1029, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
          0.051130712 = weight(abstract_txt:method in 1029) [ClassicSimilarity], result of:
            0.051130712 = score(doc=1029,freq=2.0), product of:
              0.10281883 = queryWeight, product of:
                2.4937563 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.009160401 = queryNorm
              0.49728936 = fieldWeight in 1029, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
          0.20653597 = weight(abstract_txt:file in 1029) [ClassicSimilarity], result of:
            0.20653597 = score(doc=1029,freq=4.0), product of:
              0.23693849 = queryWeight, product of:
                4.636402 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.009160401 = queryNorm
              0.871686 = fieldWeight in 1029, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
          1.200882 = weight(abstract_txt:signature in 1029) [ClassicSimilarity], result of:
            1.200882 = score(doc=1029,freq=6.0), product of:
              0.7366388 = queryWeight, product of:
                9.439738 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009160401 = queryNorm
              1.6302183 = fieldWeight in 1029, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
        0.2 = coord(5/25)
    
  4. Faloutsos, C.: Signature files (1992) 0.17
    0.17180549 = sum of:
      0.17180549 = product of:
        1.4317124 = sum of:
          0.034279715 = weight(abstract_txt:methods in 3499) [ClassicSimilarity], result of:
            0.034279715 = score(doc=3499,freq=3.0), product of:
              0.043636553 = queryWeight, product of:
                1.1487563 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.009160401 = queryNorm
              0.78557336 = fieldWeight in 3499, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.109375 = fieldNorm(doc=3499)
          0.024710143 = weight(abstract_txt:retrieval in 3499) [ClassicSimilarity], result of:
            0.024710143 = score(doc=3499,freq=2.0), product of:
              0.045969523 = queryWeight, product of:
                1.4440535 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.009160401 = queryNorm
              0.53753316 = fieldWeight in 3499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.109375 = fieldNorm(doc=3499)
          1.3727225 = weight(abstract_txt:signature in 3499) [ClassicSimilarity], result of:
            1.3727225 = score(doc=3499,freq=4.0), product of:
              0.7366388 = queryWeight, product of:
                9.439738 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009160401 = queryNorm
              1.8634948 = fieldWeight in 3499, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.109375 = fieldNorm(doc=3499)
        0.12 = coord(3/25)
    
  5. Almerri, J.; McGregor, D.R.: Codon signatures : a document retrieval method (1996) 0.17
    0.16527939 = sum of:
      0.16527939 = product of:
        0.82639694 = sum of:
          0.015681518 = weight(abstract_txt:document in 6970) [ClassicSimilarity], result of:
            0.015681518 = score(doc=6970,freq=1.0), product of:
              0.04676025 = queryWeight, product of:
                1.1891621 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.009160401 = queryNorm
              0.33536002 = fieldWeight in 6970, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=6970)
          0.017650103 = weight(abstract_txt:retrieval in 6970) [ClassicSimilarity], result of:
            0.017650103 = score(doc=6970,freq=2.0), product of:
              0.045969523 = queryWeight, product of:
                1.4440535 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.009160401 = queryNorm
              0.38395226 = fieldWeight in 6970, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=6970)
          0.037113726 = weight(abstract_txt:files in 6970) [ClassicSimilarity], result of:
            0.037113726 = score(doc=6970,freq=1.0), product of:
              0.08304391 = queryWeight, product of:
                1.5847347 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.009160401 = queryNorm
              0.44691688 = fieldWeight in 6970, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.078125 = fieldNorm(doc=6970)
          0.062622085 = weight(abstract_txt:method in 6970) [ClassicSimilarity], result of:
            0.062622085 = score(doc=6970,freq=3.0), product of:
              0.10281883 = queryWeight, product of:
                2.4937563 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.009160401 = queryNorm
              0.60905266 = fieldWeight in 6970, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=6970)
          0.6933295 = weight(abstract_txt:signature in 6970) [ClassicSimilarity], result of:
            0.6933295 = score(doc=6970,freq=2.0), product of:
              0.7366388 = queryWeight, product of:
                9.439738 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009160401 = queryNorm
              0.94120693 = fieldWeight in 6970, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=6970)
        0.2 = coord(5/25)
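
The content scores above combine a query-side weight, a document-side weight, and a coordination factor. The sketch below recomputes the first entry (doc 2417), assuming the standard formulas of Lucene's ClassicSimilarity; the function names are hypothetical, and the boost, queryNorm, and fieldNorm values are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.009160401   # copied from the listing
FIELD_NORM = 0.09375       # fieldNorm(doc=2417), copied from the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = math.sqrt(freq) * i * FIELD_NORM
    return query_weight * field_weight


# (freq, docFreq, boost) for the eight query terms matched in doc 2417.
matched = {
    "search":      (1.0, 3098, 1.013374),
    "false":       (2.0, 57,   1.0577451),
    "document":    (4.0, 1642, 1.1891621),
    "retrieval":   (1.0, 3720, 1.4440535),
    "files":       (1.0, 393,  1.5847347),
    "partitioned": (2.0, 13,   2.5092504),
    "file":        (4.0, 453,  4.636402),
    "signature":   (5.0, 23,   9.439738),
}

scores = {term: term_score(*args) for term, args in matched.items()}

# coord(8/25): 8 of the 25 query terms occur in the document.
coord = len(matched) / 25
total = coord * sum(scores.values())
print(f"signature: {scores['signature']:.4f}  total: {total:.4f}")
```

The results should agree with the listing to roughly four decimal places; small differences are expected because Lucene computes these values in single precision. The term "signature" dominates the sum because it is both rare in the collection (docFreq=23) and heavily boosted in the query.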