Document (#5818)

Author
Shaw, R.J.
Willett, P.
Title
On the non-random nature of nearest-neighbour document clusters
Source
Information processing and management. 29(1993) no.4, p.449-452
Year
1993
Abstract
It has been suggested that the observed values of retrieval effectiveness that are obtained in searches of files of nearest-neighbour clusters can be explained by assuming that the pairwise inter-document similarities used to construct the clusters have been generated randomly. Such similarities are significantly different from those obtained by a random generation procedure.

Similar documents (author)

  1. Willett, P.: Recent trends in hierarchic document clustering : a critical review (1988) 1.78
    1.7846438 = sum of:
      1.7846438 = product of:
        3.5692875 = sum of:
          3.5692875 = weight(author_txt:willett in 2604) [ClassicSimilarity], result of:
            3.5692875 = score(doc=2604,freq=1.0), product of:
              0.71174586 = queryWeight, product of:
                1.0066043 = boost
                8.023735 = idf(docFreq=36, maxDocs=41550)
                0.08812307 = queryNorm
              5.0148344 = fieldWeight in 2604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.023735 = idf(docFreq=36, maxDocs=41550)
                0.625 = fieldNorm(doc=2604)
        0.5 = coord(1/2)
    
  2. Willett, P.: Best-match text retrieval (1993) 1.78
    1.7846438 = sum of:
      1.7846438 = product of:
        3.5692875 = sum of:
          3.5692875 = weight(author_txt:willett in 7818) [ClassicSimilarity], result of:
            3.5692875 = score(doc=7818,freq=1.0), product of:
              0.71174586 = queryWeight, product of:
                1.0066043 = boost
                8.023735 = idf(docFreq=36, maxDocs=41550)
                0.08812307 = queryNorm
              5.0148344 = fieldWeight in 7818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.023735 = idf(docFreq=36, maxDocs=41550)
                0.625 = fieldNorm(doc=7818)
        0.5 = coord(1/2)
    
  3. Willett, P.: From chemical documentation to chemoinformatics : 50 years of chemical information science (2009) 1.78
    1.7846438 = sum of:
      1.7846438 = product of:
        3.5692875 = sum of:
          3.5692875 = weight(author_txt:willett in 656) [ClassicSimilarity], result of:
            3.5692875 = score(doc=656,freq=1.0), product of:
              0.71174586 = queryWeight, product of:
                1.0066043 = boost
                8.023735 = idf(docFreq=36, maxDocs=41550)
                0.08812307 = queryNorm
              5.0148344 = fieldWeight in 656, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.023735 = idf(docFreq=36, maxDocs=41550)
                0.625 = fieldNorm(doc=656)
        0.5 = coord(1/2)
    
  4. Shaw, R.R.: Classification systems (1962/63) 1.75
    1.7497468 = sum of:
      1.7497468 = product of:
        3.4994936 = sum of:
          3.4994936 = weight(author_txt:shaw in 603) [ClassicSimilarity], result of:
            3.4994936 = score(doc=603,freq=1.0), product of:
              0.70243704 = queryWeight, product of:
                7.9710913 = idf(docFreq=38, maxDocs=41550)
                0.08812307 = queryNorm
              4.981932 = fieldWeight in 603, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9710913 = idf(docFreq=38, maxDocs=41550)
                0.625 = fieldNorm(doc=603)
        0.5 = coord(1/2)
    
  5. Shaw, W.M.: Subject and citation indexing : pt.1: the clustering structure of composite representations in the cystic fibrosis document collection (1991) 1.75
    1.7497468 = sum of:
      1.7497468 = product of:
        3.4994936 = sum of:
          3.4994936 = weight(author_txt:shaw in 4841) [ClassicSimilarity], result of:
            3.4994936 = score(doc=4841,freq=1.0), product of:
              0.70243704 = queryWeight, product of:
                7.9710913 = idf(docFreq=38, maxDocs=41550)
                0.08812307 = queryNorm
              4.981932 = fieldWeight in 4841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9710913 = idf(docFreq=38, maxDocs=41550)
                0.625 = fieldNorm(doc=4841)
        0.5 = coord(1/2)
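
Note on the author scores: each explanation tree above follows the same TF-IDF pattern, so the figures can be checked by hand. The sketch below is a minimal re-computation for entry 1 (doc 2604), assuming idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which are the formulas that reproduce the printed values. The function names are hypothetical and are not part of any library; the inputs are copied from the listing.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight (hypothetical helper)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm           # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the tree for "author_txt:willett in 2604".
term = classic_term_score(
    freq=1.0,
    doc_freq=36,
    max_docs=41550,
    boost=1.0066043,
    query_norm=0.08812307,
    field_norm=0.625,
)

# coord(1/2): only one of the two query terms (willett, shaw) matched.
total = term * (1 / 2)

print(f"term score = {term:.5f}, final score = {total:.5f}")
```

The results should agree with the listed 3.5692875 and 1.7846438 up to floating-point rounding, since the listing was produced with single-precision arithmetic. The "shaw" entries show no boost line, which corresponds to a boost of 1.0 in this sketch.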
    

Similar documents (content)

  1. Sembok, T.M.T.; Rijsbergen, C.J. van: IMAGING: a relevant feedback retrieval with nearest neighbour clusters (1994) 0.29
    0.28714588 = sum of:
      0.28714588 = product of:
        1.7946619 = sum of:
          0.1811389 = weight(abstract_txt:obtained in 1140) [ClassicSimilarity], result of:
            0.1811389 = score(doc=1140,freq=1.0), product of:
              0.20008402 = queryWeight, product of:
                2.262001 = boost
                5.794011 = idf(docFreq=343, maxDocs=41550)
                0.015266528 = queryNorm
              0.9053142 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.794011 = idf(docFreq=343, maxDocs=41550)
                0.15625 = fieldNorm(doc=1140)
          0.51359534 = weight(abstract_txt:nearest in 1140) [ClassicSimilarity], result of:
            0.51359534 = score(doc=1140,freq=1.0), product of:
              0.40082237 = queryWeight, product of:
                3.2015667 = boost
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.015266528 = queryNorm
              1.281354 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.15625 = fieldNorm(doc=1140)
          0.7133471 = weight(abstract_txt:neighbour in 1140) [ClassicSimilarity], result of:
            0.7133471 = score(doc=1140,freq=1.0), product of:
              0.498967 = queryWeight, product of:
                3.5720909 = boost
                9.149746 = idf(docFreq=11, maxDocs=41550)
                0.015266528 = queryNorm
              1.4296478 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.149746 = idf(docFreq=11, maxDocs=41550)
                0.15625 = fieldNorm(doc=1140)
          0.38658056 = weight(abstract_txt:clusters in 1140) [ClassicSimilarity], result of:
            0.38658056 = score(doc=1140,freq=1.0), product of:
              0.37966013 = queryWeight, product of:
                3.8161876 = boost
                6.516659 = idf(docFreq=166, maxDocs=41550)
                0.015266528 = queryNorm
              1.0182279 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.516659 = idf(docFreq=166, maxDocs=41550)
                0.15625 = fieldNorm(doc=1140)
        0.16 = coord(4/25)
    
  2. Mohan, K.C.: Boolean and nearest neighbour text searching in a multi-strategy retrieval system (1996) 0.21
    0.20506501 = sum of:
      0.20506501 = product of:
        1.0253251 = sum of:
          0.043821916 = weight(abstract_txt:effectiveness in 324) [ClassicSimilarity], result of:
            0.043821916 = score(doc=324,freq=1.0), product of:
              0.07820901 = queryWeight, product of:
                5.1229076 = idf(docFreq=672, maxDocs=41550)
                0.015266528 = queryNorm
              0.560318 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1229076 = idf(docFreq=672, maxDocs=41550)
                0.109375 = fieldNorm(doc=324)
          0.09095697 = weight(abstract_txt:explained in 324) [ClassicSimilarity], result of:
            0.09095697 = score(doc=324,freq=1.0), product of:
              0.12725842 = queryWeight, product of:
                1.275601 = boost
                6.534786 = idf(docFreq=163, maxDocs=41550)
                0.015266528 = queryNorm
              0.71474224 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.534786 = idf(docFreq=163, maxDocs=41550)
                0.109375 = fieldNorm(doc=324)
          0.03168656 = weight(abstract_txt:been in 324) [ClassicSimilarity], result of:
            0.03168656 = score(doc=324,freq=1.0), product of:
              0.0793821 = queryWeight, product of:
                1.4247802 = boost
                3.649509 = idf(docFreq=2936, maxDocs=41550)
                0.015266528 = queryNorm
              0.39916503 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.649509 = idf(docFreq=2936, maxDocs=41550)
                0.109375 = fieldNorm(doc=324)
          0.35951674 = weight(abstract_txt:nearest in 324) [ClassicSimilarity], result of:
            0.35951674 = score(doc=324,freq=1.0), product of:
              0.40082237 = queryWeight, product of:
                3.2015667 = boost
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.015266528 = queryNorm
              0.8969478 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.109375 = fieldNorm(doc=324)
          0.49934292 = weight(abstract_txt:neighbour in 324) [ClassicSimilarity], result of:
            0.49934292 = score(doc=324,freq=1.0), product of:
              0.498967 = queryWeight, product of:
                3.5720909 = boost
                9.149746 = idf(docFreq=11, maxDocs=41550)
                0.015266528 = queryNorm
              1.0007534 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.149746 = idf(docFreq=11, maxDocs=41550)
                0.109375 = fieldNorm(doc=324)
        0.2 = coord(5/25)
    
  3. Small, H.G.: Structural dynamics of scientific literature (2015) 0.20
    0.19924925 = sum of:
      0.19924925 = product of:
        0.7116045 = sum of:
          0.05339844 = weight(abstract_txt:observed in 4356) [ClassicSimilarity], result of:
            0.05339844 = score(doc=4356,freq=1.0), product of:
              0.11166069 = queryWeight, product of:
                1.194873 = boost
                6.121224 = idf(docFreq=247, maxDocs=41550)
                0.015266528 = queryNorm
              0.4782206 = fieldWeight in 4356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.121224 = idf(docFreq=247, maxDocs=41550)
                0.078125 = fieldNorm(doc=4356)
          0.06666727 = weight(abstract_txt:procedure in 4356) [ClassicSimilarity], result of:
            0.06666727 = score(doc=4356,freq=1.0), product of:
              0.12946619 = queryWeight, product of:
                1.2866185 = boost
                6.5912275 = idf(docFreq=154, maxDocs=41550)
                0.015266528 = queryNorm
              0.51493967 = fieldWeight in 4356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5912275 = idf(docFreq=154, maxDocs=41550)
                0.078125 = fieldNorm(doc=4356)
          0.009742043 = weight(abstract_txt:that in 4356) [ClassicSimilarity], result of:
            0.009742043 = score(doc=4356,freq=1.0), product of:
              0.051803015 = queryWeight, product of:
                1.409645 = boost
                2.4071603 = idf(docFreq=10172, maxDocs=41550)
                0.015266528 = queryNorm
              0.18805939 = fieldWeight in 4356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4071603 = idf(docFreq=10172, maxDocs=41550)
                0.078125 = fieldNorm(doc=4356)
          0.022633256 = weight(abstract_txt:been in 4356) [ClassicSimilarity], result of:
            0.022633256 = score(doc=4356,freq=1.0), product of:
              0.0793821 = queryWeight, product of:
                1.4247802 = boost
                3.649509 = idf(docFreq=2936, maxDocs=41550)
                0.015266528 = queryNorm
              0.2851179 = fieldWeight in 4356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.649509 = idf(docFreq=2936, maxDocs=41550)
                0.078125 = fieldNorm(doc=4356)
          0.03638384 = weight(abstract_txt:document in 4356) [ClassicSimilarity], result of:
            0.03638384 = score(doc=4356,freq=1.0), product of:
              0.10893404 = queryWeight, product of:
                1.6690463 = boost
                4.275185 = idf(docFreq=1570, maxDocs=41550)
                0.015266528 = queryNorm
              0.33399883 = fieldWeight in 4356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.275185 = idf(docFreq=1570, maxDocs=41550)
                0.078125 = fieldNorm(doc=4356)
          0.09056945 = weight(abstract_txt:obtained in 4356) [ClassicSimilarity], result of:
            0.09056945 = score(doc=4356,freq=1.0), product of:
              0.20008402 = queryWeight, product of:
                2.262001 = boost
                5.794011 = idf(docFreq=343, maxDocs=41550)
                0.015266528 = queryNorm
              0.4526571 = fieldWeight in 4356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.794011 = idf(docFreq=343, maxDocs=41550)
                0.078125 = fieldNorm(doc=4356)
          0.43221018 = weight(abstract_txt:clusters in 4356) [ClassicSimilarity], result of:
            0.43221018 = score(doc=4356,freq=5.0), product of:
              0.37966013 = queryWeight, product of:
                3.8161876 = boost
                6.516659 = idf(docFreq=166, maxDocs=41550)
                0.015266528 = queryNorm
              1.1384134 = fieldWeight in 4356, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.516659 = idf(docFreq=166, maxDocs=41550)
                0.078125 = fieldNorm(doc=4356)
        0.28 = coord(7/25)
    
  4. Al-Hawamdeh, S.; Smith, G.; Willett, P.; Vere, R. de: Using nearest-neighbour searching techniques to access full-text documents (1991) 0.15
    0.14774975 = sum of:
      0.14774975 = product of:
        0.9234359 = sum of:
          0.0136388615 = weight(abstract_txt:that in 2300) [ClassicSimilarity], result of:
            0.0136388615 = score(doc=2300,freq=1.0), product of:
              0.051803015 = queryWeight, product of:
                1.409645 = boost
                2.4071603 = idf(docFreq=10172, maxDocs=41550)
                0.015266528 = queryNorm
              0.26328316 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4071603 = idf(docFreq=10172, maxDocs=41550)
                0.109375 = fieldNorm(doc=2300)
          0.05093738 = weight(abstract_txt:document in 2300) [ClassicSimilarity], result of:
            0.05093738 = score(doc=2300,freq=1.0), product of:
              0.10893404 = queryWeight, product of:
                1.6690463 = boost
                4.275185 = idf(docFreq=1570, maxDocs=41550)
                0.015266528 = queryNorm
              0.46759838 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.275185 = idf(docFreq=1570, maxDocs=41550)
                0.109375 = fieldNorm(doc=2300)
          0.35951674 = weight(abstract_txt:nearest in 2300) [ClassicSimilarity], result of:
            0.35951674 = score(doc=2300,freq=1.0), product of:
              0.40082237 = queryWeight, product of:
                3.2015667 = boost
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.015266528 = queryNorm
              0.8969478 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.109375 = fieldNorm(doc=2300)
          0.49934292 = weight(abstract_txt:neighbour in 2300) [ClassicSimilarity], result of:
            0.49934292 = score(doc=2300,freq=1.0), product of:
              0.498967 = queryWeight, product of:
                3.5720909 = boost
                9.149746 = idf(docFreq=11, maxDocs=41550)
                0.015266528 = queryNorm
              1.0007534 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.149746 = idf(docFreq=11, maxDocs=41550)
                0.109375 = fieldNorm(doc=2300)
        0.16 = coord(4/25)
    
  5. Rasmussen, E.: Clustering algorithms (1992) 0.13
    0.12646903 = sum of:
      0.12646903 = product of:
        0.5269543 = sum of:
          0.025041096 = weight(abstract_txt:effectiveness in 4991) [ClassicSimilarity], result of:
            0.025041096 = score(doc=4991,freq=1.0), product of:
              0.07820901 = queryWeight, product of:
                5.1229076 = idf(docFreq=672, maxDocs=41550)
                0.015266528 = queryNorm
              0.32018173 = fieldWeight in 4991, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1229076 = idf(docFreq=672, maxDocs=41550)
                0.0625 = fieldNorm(doc=4991)
          0.011021865 = weight(abstract_txt:that in 4991) [ClassicSimilarity], result of:
            0.011021865 = score(doc=4991,freq=2.0), product of:
              0.051803015 = queryWeight, product of:
                1.409645 = boost
                2.4071603 = idf(docFreq=10172, maxDocs=41550)
                0.015266528 = queryNorm
              0.21276492 = fieldWeight in 4991, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4071603 = idf(docFreq=10172, maxDocs=41550)
                0.0625 = fieldNorm(doc=4991)
          0.025606604 = weight(abstract_txt:been in 4991) [ClassicSimilarity], result of:
            0.025606604 = score(doc=4991,freq=2.0), product of:
              0.0793821 = queryWeight, product of:
                1.4247802 = boost
                3.649509 = idf(docFreq=2936, maxDocs=41550)
                0.015266528 = queryNorm
              0.32257405 = fieldWeight in 4991, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.649509 = idf(docFreq=2936, maxDocs=41550)
                0.0625 = fieldNorm(doc=4991)
          0.04116362 = weight(abstract_txt:document in 4991) [ClassicSimilarity], result of:
            0.04116362 = score(doc=4991,freq=2.0), product of:
              0.10893404 = queryWeight, product of:
                1.6690463 = boost
                4.275185 = idf(docFreq=1570, maxDocs=41550)
                0.015266528 = queryNorm
              0.37787655 = fieldWeight in 4991, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.275185 = idf(docFreq=1570, maxDocs=41550)
                0.0625 = fieldNorm(doc=4991)
          0.20543814 = weight(abstract_txt:nearest in 4991) [ClassicSimilarity], result of:
            0.20543814 = score(doc=4991,freq=1.0), product of:
              0.40082237 = queryWeight, product of:
                3.2015667 = boost
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.015266528 = queryNorm
              0.5125416 = fieldWeight in 4991, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200665 = idf(docFreq=30, maxDocs=41550)
                0.0625 = fieldNorm(doc=4991)
          0.21868297 = weight(abstract_txt:clusters in 4991) [ClassicSimilarity], result of:
            0.21868297 = score(doc=4991,freq=2.0), product of:
              0.37966013 = queryWeight, product of:
                3.8161876 = boost
                6.516659 = idf(docFreq=166, maxDocs=41550)
                0.015266528 = queryNorm
              0.5759967 = fieldWeight in 4991, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.516659 = idf(docFreq=166, maxDocs=41550)
                0.0625 = fieldNorm(doc=4991)
        0.24 = coord(6/25)
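
Note on the content scores: here several abstract terms match, so the per-term scores are summed and then scaled by the coordination factor (matched terms out of the 25 query terms). The sketch below re-computes entry 1 (doc 1140) under the same assumed formulas, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The names `term_score` and `matched` are illustrative only; the numbers are copied from the listing.

```python
import math

MAX_DOCS = 41550
QUERY_NORM = 0.015266528   # queryNorm shared by all terms of the content query
FIELD_NORM_1140 = 0.15625  # fieldNorm(doc=1140)
QUERY_TERMS = 25           # denominator of coord(4/25)

# (term, boost, docFreq, termFreq) for the four terms matched in doc 1140.
matched = [
    ("obtained", 2.262001, 343, 1.0),
    ("nearest", 3.2015667, 30, 1.0),
    ("neighbour", 3.5720909, 11, 1.0),
    ("clusters", 3.8161876, 166, 1.0),
]


def term_score(boost, doc_freq, freq, field_norm):
    """queryWeight * fieldWeight for a single term (hypothetical helper)."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


raw_sum = sum(term_score(b, df, f, FIELD_NORM_1140) for _, b, df, f in matched)
coord = len(matched) / QUERY_TERMS  # 4/25 = 0.16
score = raw_sum * coord

print(f"sum = {raw_sum:.5f}, coord = {coord:.2f}, score = {score:.5f}")

# A repeated term raises tf as sqrt(freq): "clusters" occurs 5 times in doc 4356.
clusters_4356 = term_score(3.8161876, 166, 5.0, 0.078125)
print(f"clusters in 4356 = {clusters_4356:.5f}")
```

The sum and final score should match the listed 1.7946619 and 0.28714588 up to rounding. The coordination factor explains why entry 3, which matches seven terms, still ranks below entries 1 and 2: its matches are mostly low-idf words such as "that" and "been", and its fieldNorm is smaller.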