Document (#5818)

Author
Shaw, R.J.
Willett, P.
Title
On the non-random nature of nearest-neighbour document clusters
Source
Information processing and management. 29(1993) no.4, p.449-452
Year
1993
Abstract
It has been suggested that the observed values of retrieval effectiveness that are obtained in searches of files of nearest-neighbour clusters can be explained by assuming that the pairwise inter-document similarities used to construct the clusters have been generated randomly. Such similarities are significantly different from those obtained by a random generation procedure.

Similar documents (author)

  1. Shaw, R.R.: Classification systems (1962/63) 1.77
    1.7668729 = sum of:
      1.7668729 = product of:
        3.5337458 = sum of:
          3.5337458 = weight(author_txt:shaw in 603) [ClassicSimilarity], result of:
            3.5337458 = score(doc=603,freq=1.0), product of:
              0.70710677 = queryWeight, product of:
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.08843307 = queryNorm
              4.9974713 = fieldWeight in 603, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.625 = fieldNorm(doc=603)
        0.5 = coord(1/2)
    
  2. Willett, P.: Recent trends in hierarchic document clustering : a critical review (1988) 1.77
    1.7668729 = sum of:
      1.7668729 = product of:
        3.5337458 = sum of:
          3.5337458 = weight(author_txt:willett in 2604) [ClassicSimilarity], result of:
            3.5337458 = score(doc=2604,freq=1.0), product of:
              0.70710677 = queryWeight, product of:
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.08843307 = queryNorm
              4.9974713 = fieldWeight in 2604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.625 = fieldNorm(doc=2604)
        0.5 = coord(1/2)
    
  3. Shaw, W.M.: Subject and citation indexing : pt.1: the clustering structure of composite representations in the cystic fibrosis document collection (1991) 1.77
    1.7668729 = sum of:
      1.7668729 = product of:
        3.5337458 = sum of:
          3.5337458 = weight(author_txt:shaw in 4841) [ClassicSimilarity], result of:
            3.5337458 = score(doc=4841,freq=1.0), product of:
              0.70710677 = queryWeight, product of:
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.08843307 = queryNorm
              4.9974713 = fieldWeight in 4841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.625 = fieldNorm(doc=4841)
        0.5 = coord(1/2)
    
  4. Shaw, W.M.: Subject and citation indexing : pt.2: the optimal, cluster-based retrieval performance of composite representations (1991) 1.77
    1.7668729 = sum of:
      1.7668729 = product of:
        3.5337458 = sum of:
          3.5337458 = weight(author_txt:shaw in 4842) [ClassicSimilarity], result of:
            3.5337458 = score(doc=4842,freq=1.0), product of:
              0.70710677 = queryWeight, product of:
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.08843307 = queryNorm
              4.9974713 = fieldWeight in 4842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.625 = fieldNorm(doc=4842)
        0.5 = coord(1/2)
    
  5. Willett, P.: Best-match text retrieval (1993) 1.77
    1.7668729 = sum of:
      1.7668729 = product of:
        3.5337458 = sum of:
          3.5337458 = weight(author_txt:willett in 7818) [ClassicSimilarity], result of:
            3.5337458 = score(doc=7818,freq=1.0), product of:
              0.70710677 = queryWeight, product of:
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.08843307 = queryNorm
              4.9974713 = fieldWeight in 7818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.995954 = idf(docFreq=38, maxDocs=42596)
                0.625 = fieldNorm(doc=7818)
        0.5 = coord(1/2)
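
The author scores above all reduce to the same arithmetic, so a small sketch may help when reading the trees. It assumes the older ClassicSimilarity definitions, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which appear to reproduce the figures shown. The function names `idf` and `term_score` are hypothetical helpers for illustration, not part of Lucene.

```python
import math

MAX_DOCS = 42596  # maxDocs as reported in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, query_norm, field_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values taken from the author_txt entries above (docFreq=38, fieldNorm=0.625).
raw = term_score(freq=1.0, doc_freq=38, query_norm=0.08843307, field_norm=0.625)

# coord(1/2): one of the two author terms in the query matched.
author_score = raw * (1 / 2)

print(round(raw, 5), round(author_score, 5))
```

Under these assumptions the result should land close to the 1.7668729 reported for each of the five author matches; they tie because every entry shares the same docFreq, fieldNorm, and coord.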
    

Similar documents (content)

  1. Sembok, T.M.T.; Rijsbergen, C.J. van: IMAGING: a relevant feedback retrieval with nearest neighbour clusters (1994) 0.29
    0.2872767 = sum of:
      0.2872767 = product of:
        1.7954793 = sum of:
          0.17995131 = weight(abstract_txt:obtained in 1140) [ClassicSimilarity], result of:
            0.17995131 = score(doc=1140,freq=1.0), product of:
              0.19928913 = queryWeight, product of:
                2.2587194 = boost
                5.7789826 = idf(docFreq=357, maxDocs=42596)
                0.01526757 = queryNorm
              0.902966 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7789826 = idf(docFreq=357, maxDocs=42596)
                0.15625 = fieldNorm(doc=1140)
          0.5071675 = weight(abstract_txt:nearest in 1140) [ClassicSimilarity], result of:
            0.5071675 = score(doc=1140,freq=1.0), product of:
              0.39763188 = queryWeight, product of:
                3.1905172 = boost
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.01526757 = queryNorm
              1.27547 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.15625 = fieldNorm(doc=1140)
          0.72005147 = weight(abstract_txt:neighbour in 1140) [ClassicSimilarity], result of:
            0.72005147 = score(doc=1140,freq=1.0), product of:
              0.5022916 = queryWeight, product of:
                3.5859025 = boost
                9.174609 = idf(docFreq=11, maxDocs=42596)
                0.01526757 = queryNorm
              1.4335327 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.174609 = idf(docFreq=11, maxDocs=42596)
                0.15625 = fieldNorm(doc=1140)
          0.38830903 = weight(abstract_txt:clusters in 1140) [ClassicSimilarity], result of:
            0.38830903 = score(doc=1140,freq=1.0), product of:
              0.38094503 = queryWeight, product of:
                3.8246992 = boost
                6.5237174 = idf(docFreq=169, maxDocs=42596)
                0.01526757 = queryNorm
              1.0193309 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5237174 = idf(docFreq=169, maxDocs=42596)
                0.15625 = fieldNorm(doc=1140)
        0.16 = coord(4/25)
    
  2. Mohan, K.C.: Boolean and nearest neighbour text searching in a multi-strategy retrieval system (1996) 0.21
    0.2050484 = sum of:
      0.2050484 = product of:
        1.025242 = sum of:
          0.043724637 = weight(abstract_txt:effectiveness in 325) [ClassicSimilarity], result of:
            0.043724637 = score(doc=325,freq=1.0), product of:
              0.07812482 = queryWeight, product of:
                5.1170435 = idf(docFreq=693, maxDocs=42596)
                0.01526757 = queryNorm
              0.55967665 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1170435 = idf(docFreq=693, maxDocs=42596)
                0.109375 = fieldNorm(doc=325)
          0.09109941 = weight(abstract_txt:explained in 325) [ClassicSimilarity], result of:
            0.09109941 = score(doc=325,freq=1.0), product of:
              0.12744279 = queryWeight, product of:
                1.2772124 = boost
                6.5355515 = idf(docFreq=167, maxDocs=42596)
                0.01526757 = queryNorm
              0.7148259 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5355515 = idf(docFreq=167, maxDocs=42596)
                0.109375 = fieldNorm(doc=325)
          0.031364612 = weight(abstract_txt:been in 325) [ClassicSimilarity], result of:
            0.031364612 = score(doc=325,freq=1.0), product of:
              0.078875385 = queryWeight, product of:
                1.4209907 = boost
                3.6356356 = idf(docFreq=3052, maxDocs=42596)
                0.01526757 = queryNorm
              0.39764765 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6356356 = idf(docFreq=3052, maxDocs=42596)
                0.109375 = fieldNorm(doc=325)
          0.35501724 = weight(abstract_txt:nearest in 325) [ClassicSimilarity], result of:
            0.35501724 = score(doc=325,freq=1.0), product of:
              0.39763188 = queryWeight, product of:
                3.1905172 = boost
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.01526757 = queryNorm
              0.89282894 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.109375 = fieldNorm(doc=325)
          0.50403607 = weight(abstract_txt:neighbour in 325) [ClassicSimilarity], result of:
            0.50403607 = score(doc=325,freq=1.0), product of:
              0.5022916 = queryWeight, product of:
                3.5859025 = boost
                9.174609 = idf(docFreq=11, maxDocs=42596)
                0.01526757 = queryNorm
              1.0034729 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.174609 = idf(docFreq=11, maxDocs=42596)
                0.109375 = fieldNorm(doc=325)
        0.2 = coord(5/25)
    
  3. Small, H.G.: Structural dynamics of scientific literature (2015) 0.20
    0.19940361 = sum of:
      0.19940361 = product of:
        0.71215576 = sum of:
          0.052580915 = weight(abstract_txt:observed in 3357) [ClassicSimilarity], result of:
            0.052580915 = score(doc=3357,freq=1.0), product of:
              0.110562794 = queryWeight, product of:
                1.1896248 = boost
                6.087362 = idf(docFreq=262, maxDocs=42596)
                0.01526757 = queryNorm
              0.47557515 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.087362 = idf(docFreq=262, maxDocs=42596)
                0.078125 = fieldNorm(doc=3357)
          0.06692133 = weight(abstract_txt:procedure in 3357) [ClassicSimilarity], result of:
            0.06692133 = score(doc=3357,freq=1.0), product of:
              0.1298474 = queryWeight, product of:
                1.2892054 = boost
                6.5969205 = idf(docFreq=157, maxDocs=42596)
                0.01526757 = queryNorm
              0.51538444 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5969205 = idf(docFreq=157, maxDocs=42596)
                0.078125 = fieldNorm(doc=3357)
          0.009615577 = weight(abstract_txt:that in 3357) [ClassicSimilarity], result of:
            0.009615577 = score(doc=3357,freq=1.0), product of:
              0.051374495 = queryWeight, product of:
                1.4045588 = boost
                2.3957293 = idf(docFreq=10548, maxDocs=42596)
                0.01526757 = queryNorm
              0.18716635 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3957293 = idf(docFreq=10548, maxDocs=42596)
                0.078125 = fieldNorm(doc=3357)
          0.022403294 = weight(abstract_txt:been in 3357) [ClassicSimilarity], result of:
            0.022403294 = score(doc=3357,freq=1.0), product of:
              0.078875385 = queryWeight, product of:
                1.4209907 = boost
                3.6356356 = idf(docFreq=3052, maxDocs=42596)
                0.01526757 = queryNorm
              0.28403404 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6356356 = idf(docFreq=3052, maxDocs=42596)
                0.078125 = fieldNorm(doc=3357)
          0.036516316 = weight(abstract_txt:document in 3357) [ClassicSimilarity], result of:
            0.036516316 = score(doc=3357,freq=1.0), product of:
              0.10924248 = queryWeight, product of:
                1.672308 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.01526757 = queryNorm
              0.33426848 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.078125 = fieldNorm(doc=3357)
          0.089975655 = weight(abstract_txt:obtained in 3357) [ClassicSimilarity], result of:
            0.089975655 = score(doc=3357,freq=1.0), product of:
              0.19928913 = queryWeight, product of:
                2.2587194 = boost
                5.7789826 = idf(docFreq=357, maxDocs=42596)
                0.01526757 = queryNorm
              0.451483 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7789826 = idf(docFreq=357, maxDocs=42596)
                0.078125 = fieldNorm(doc=3357)
          0.43414268 = weight(abstract_txt:clusters in 3357) [ClassicSimilarity], result of:
            0.43414268 = score(doc=3357,freq=5.0), product of:
              0.38094503 = queryWeight, product of:
                3.8246992 = boost
                6.5237174 = idf(docFreq=169, maxDocs=42596)
                0.01526757 = queryNorm
              1.1396465 = fieldWeight in 3357, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.5237174 = idf(docFreq=169, maxDocs=42596)
                0.078125 = fieldNorm(doc=3357)
        0.28 = coord(7/25)
    
  4. Al-Hawamdeh, S.; Smith, G.; Willett, P.; Vere, R. de: Using nearest-neighbour searching techniques to access full-text documents (1991) 0.15
    0.14778207 = sum of:
      0.14778207 = product of:
        0.923638 = sum of:
          0.013461808 = weight(abstract_txt:that in 2300) [ClassicSimilarity], result of:
            0.013461808 = score(doc=2300,freq=1.0), product of:
              0.051374495 = queryWeight, product of:
                1.4045588 = boost
                2.3957293 = idf(docFreq=10548, maxDocs=42596)
                0.01526757 = queryNorm
              0.2620329 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3957293 = idf(docFreq=10548, maxDocs=42596)
                0.109375 = fieldNorm(doc=2300)
          0.05112284 = weight(abstract_txt:document in 2300) [ClassicSimilarity], result of:
            0.05112284 = score(doc=2300,freq=1.0), product of:
              0.10924248 = queryWeight, product of:
                1.672308 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.01526757 = queryNorm
              0.46797585 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.109375 = fieldNorm(doc=2300)
          0.35501724 = weight(abstract_txt:nearest in 2300) [ClassicSimilarity], result of:
            0.35501724 = score(doc=2300,freq=1.0), product of:
              0.39763188 = queryWeight, product of:
                3.1905172 = boost
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.01526757 = queryNorm
              0.89282894 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.109375 = fieldNorm(doc=2300)
          0.50403607 = weight(abstract_txt:neighbour in 2300) [ClassicSimilarity], result of:
            0.50403607 = score(doc=2300,freq=1.0), product of:
              0.5022916 = queryWeight, product of:
                3.5859025 = boost
                9.174609 = idf(docFreq=11, maxDocs=42596)
                0.01526757 = queryNorm
              1.0034729 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.174609 = idf(docFreq=11, maxDocs=42596)
                0.109375 = fieldNorm(doc=2300)
        0.16 = coord(4/25)
    
  5. Rasmussen, E.: Clustering algorithms (1992) 0.13
    0.12601247 = sum of:
      0.12601247 = product of:
        0.525052 = sum of:
          0.024985507 = weight(abstract_txt:effectiveness in 4514) [ClassicSimilarity], result of:
            0.024985507 = score(doc=4514,freq=1.0), product of:
              0.07812482 = queryWeight, product of:
                5.1170435 = idf(docFreq=693, maxDocs=42596)
                0.01526757 = queryNorm
              0.31981522 = fieldWeight in 4514, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1170435 = idf(docFreq=693, maxDocs=42596)
                0.0625 = fieldNorm(doc=4514)
          0.010878783 = weight(abstract_txt:that in 4514) [ClassicSimilarity], result of:
            0.010878783 = score(doc=4514,freq=2.0), product of:
              0.051374495 = queryWeight, product of:
                1.4045588 = boost
                2.3957293 = idf(docFreq=10548, maxDocs=42596)
                0.01526757 = queryNorm
              0.21175455 = fieldWeight in 4514, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3957293 = idf(docFreq=10548, maxDocs=42596)
                0.0625 = fieldNorm(doc=4514)
          0.025346434 = weight(abstract_txt:been in 4514) [ClassicSimilarity], result of:
            0.025346434 = score(doc=4514,freq=2.0), product of:
              0.078875385 = queryWeight, product of:
                1.4209907 = boost
                3.6356356 = idf(docFreq=3052, maxDocs=42596)
                0.01526757 = queryNorm
              0.32134783 = fieldWeight in 4514, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6356356 = idf(docFreq=3052, maxDocs=42596)
                0.0625 = fieldNorm(doc=4514)
          0.041313495 = weight(abstract_txt:document in 4514) [ClassicSimilarity], result of:
            0.041313495 = score(doc=4514,freq=2.0), product of:
              0.10924248 = queryWeight, product of:
                1.672308 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.01526757 = queryNorm
              0.3781816 = fieldWeight in 4514, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.0625 = fieldNorm(doc=4514)
          0.20286702 = weight(abstract_txt:nearest in 4514) [ClassicSimilarity], result of:
            0.20286702 = score(doc=4514,freq=1.0), product of:
              0.39763188 = queryWeight, product of:
                3.1905172 = boost
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.01526757 = queryNorm
              0.510188 = fieldWeight in 4514, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.163008 = idf(docFreq=32, maxDocs=42596)
                0.0625 = fieldNorm(doc=4514)
          0.21966074 = weight(abstract_txt:clusters in 4514) [ClassicSimilarity], result of:
            0.21966074 = score(doc=4514,freq=2.0), product of:
              0.38094503 = queryWeight, product of:
                3.8246992 = boost
                6.5237174 = idf(docFreq=169, maxDocs=42596)
                0.01526757 = queryNorm
              0.5766206 = fieldWeight in 4514, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5237174 = idf(docFreq=169, maxDocs=42596)
                0.0625 = fieldNorm(doc=4514)
        0.24 = coord(6/25)
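
For the content matches, each document score is the sum of its per-term scores multiplied by the coord factor (matched terms over the 25 query terms). The sketch below recomputes the first entry (doc 1140) from the boost, docFreq, and fieldNorm values listed above. It assumes the same older ClassicSimilarity formulas, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq); `term_score` and the variable names are hypothetical, chosen only for illustration.

```python
import math

MAX_DOCS = 42596         # maxDocs from the explain output
QUERY_NORM = 0.01526757  # queryNorm shared by all abstract_txt terms
FIELD_NORM = 0.15625     # fieldNorm(doc=1140)
QUERY_TERMS = 25         # denominator of coord(4/25)

# (term, boost, docFreq, freq in doc 1140), copied from the tree above.
matched_terms = [
    ("obtained", 2.2587194, 357, 1.0),
    ("nearest", 3.1905172, 32, 1.0),
    ("neighbour", 3.5859025, 11, 1.0),
    ("clusters", 3.8246992, 169, 1.0),
]


def term_score(boost, doc_freq, freq):
    # Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1)).
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


total = sum(term_score(boost, df, freq) for _, boost, df, freq in matched_terms)
coord = len(matched_terms) / QUERY_TERMS
content_score = total * coord

print(round(total, 5), round(content_score, 5))
```

The coord factor explains why a document matching more query terms can outrank one with rarer matches: doc 3357 matches seven terms (coord 0.28) but with a smaller fieldNorm, so its total still ends up below that of doc 1140.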