Document (#5818)

Author
Shaw, R.J.
Willett, P.
Title
On the non-random nature of nearest-neighbour document clusters
Source
Information processing and management. 29(1993) no.4, p.449-452
Year
1993
Abstract
It has been suggested that the observed values of retrieval effectiveness that are obtained in searches of files of nearest-neighbour clusters can be explained by assuming that the pairwise inter-document similarities used to construct the clusters have been generated randomly. However, these similarities are significantly different from those obtained by a random generation procedure.

Similar documents (author)

  1. Willett, P.: Recent trends in hierarchic document clustering : a critical review (1988) 1.78
    1.7750387 = sum of:
      1.7750387 = product of:
        3.5500774 = sum of:
          3.5500774 = weight(author_txt:willett in 2604) [ClassicSimilarity], result of:
            3.5500774 = score(doc=2604,freq=1.0), product of:
              0.70940065 = queryWeight, product of:
                1.0032547 = boost
                8.006933 = idf(docFreq=37, maxDocs=41962)
                0.088310875 = queryNorm
              5.0043335 = fieldWeight in 2604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.006933 = idf(docFreq=37, maxDocs=41962)
                0.625 = fieldNorm(doc=2604)
        0.5 = coord(1/2)
    
  2. Willett, P.: Best-match text retrieval (1993) 1.78
    1.7750387 = sum of:
      1.7750387 = product of:
        3.5500774 = sum of:
          3.5500774 = weight(author_txt:willett in 7818) [ClassicSimilarity], result of:
            3.5500774 = score(doc=7818,freq=1.0), product of:
              0.70940065 = queryWeight, product of:
                1.0032547 = boost
                8.006933 = idf(docFreq=37, maxDocs=41962)
                0.088310875 = queryNorm
              5.0043335 = fieldWeight in 7818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.006933 = idf(docFreq=37, maxDocs=41962)
                0.625 = fieldNorm(doc=7818)
        0.5 = coord(1/2)
    
  3. Willett, P.: From chemical documentation to chemoinformatics : 50 years of chemical information science (2009) 1.78
    1.7750387 = sum of:
      1.7750387 = product of:
        3.5500774 = sum of:
          3.5500774 = weight(author_txt:willett in 657) [ClassicSimilarity], result of:
            3.5500774 = score(doc=657,freq=1.0), product of:
              0.70940065 = queryWeight, product of:
                1.0032547 = boost
                8.006933 = idf(docFreq=37, maxDocs=41962)
                0.088310875 = queryNorm
              5.0043335 = fieldWeight in 657, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.006933 = idf(docFreq=37, maxDocs=41962)
                0.625 = fieldNorm(doc=657)
        0.5 = coord(1/2)
    
  4. Shaw, R.R.: Classification systems (1962/63) 1.76
    1.7578194 = sum of:
      1.7578194 = product of:
        3.5156388 = sum of:
          3.5156388 = weight(author_txt:shaw in 603) [ClassicSimilarity], result of:
            3.5156388 = score(doc=603,freq=1.0), product of:
              0.7048054 = queryWeight, product of:
                7.980958 = idf(docFreq=38, maxDocs=41962)
                0.088310875 = queryNorm
              4.9880986 = fieldWeight in 603, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.980958 = idf(docFreq=38, maxDocs=41962)
                0.625 = fieldNorm(doc=603)
        0.5 = coord(1/2)
    
  5. Shaw, W.M.: Subject and citation indexing : pt.1: the clustering structure of composite representations in the cystic fibrosis document collection (1991) 1.76
    1.7578194 = sum of:
      1.7578194 = product of:
        3.5156388 = sum of:
          3.5156388 = weight(author_txt:shaw in 4841) [ClassicSimilarity], result of:
            3.5156388 = score(doc=4841,freq=1.0), product of:
              0.7048054 = queryWeight, product of:
                7.980958 = idf(docFreq=38, maxDocs=41962)
                0.088310875 = queryNorm
              4.9880986 = fieldWeight in 4841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.980958 = idf(docFreq=38, maxDocs=41962)
                0.625 = fieldNorm(doc=4841)
        0.5 = coord(1/2)
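
Note on the score listings: each tree above multiplies a queryWeight (boost x idf x queryNorm) by a fieldWeight (tf x idf x fieldNorm) and then scales by coord. The sketch below reproduces that arithmetic for entry 1, using only the numbers printed in the listing. The function names are hypothetical. The idf formula 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq) are assumptions, chosen because they agree with the listed values (8.006933 for docFreq=37, 7.980958 for docFreq=38, 2.236068 for freq=5).

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf formula; it matches the idf values printed in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       boost: float, query_norm: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                      # assumed tf formula
    query_weight = boost * idf * query_norm   # query-side factor
    field_weight = tf * idf * field_norm      # document-side factor
    return query_weight * field_weight


# Values copied from entry 1 (author_txt:willett in 2604).
term = classic_term_score(freq=1.0, doc_freq=37, max_docs=41962,
                          boost=1.0032547, query_norm=0.088310875,
                          field_norm=0.625)
score = term * 0.5  # coord(1/2): one of the two author terms matched

# Should land close to the listed 3.5500774 and 1.7750387; the listing
# appears to use 32-bit floats, so the last digits may differ.
print(term, score)
```

Because idf enters both factors, a term's contribution grows with the square of its idf. That is why the slightly rarer term willett (docFreq=37) scores just above shaw (docFreq=38), helped here by its small boost.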
    

Similar documents (content)

  1. Sembok, T.M.T.; Rijsbergen, C.J. van: IMAGING: a relevant feedback retrieval with nearest neighbour clusters (1994) 0.29
    0.28692663 = sum of:
      0.28692663 = product of:
        1.7932914 = sum of:
          0.18134685 = weight(abstract_txt:obtained in 1140) [ClassicSimilarity], result of:
            0.18134685 = score(doc=1140,freq=1.0), product of:
              0.20017311 = queryWeight, product of:
                2.261198 = boost
                5.798081 = idf(docFreq=345, maxDocs=41962)
                0.015268025 = queryNorm
              0.9059501 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.798081 = idf(docFreq=345, maxDocs=41962)
                0.15625 = fieldNorm(doc=1140)
          0.5090065 = weight(abstract_txt:nearest in 1140) [ClassicSimilarity], result of:
            0.5090065 = score(doc=1140,freq=1.0), product of:
              0.39830396 = queryWeight, product of:
                3.18965 = boost
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.015268025 = queryNorm
              1.2779349 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.15625 = fieldNorm(doc=1140)
          0.71497124 = weight(abstract_txt:neighbour in 1140) [ClassicSimilarity], result of:
            0.71497124 = score(doc=1140,freq=1.0), product of:
              0.49956432 = queryWeight, product of:
                3.5721645 = boost
                9.159613 = idf(docFreq=11, maxDocs=41962)
                0.015268025 = queryNorm
              1.4311895 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.159613 = idf(docFreq=11, maxDocs=41962)
                0.15625 = fieldNorm(doc=1140)
          0.3879669 = weight(abstract_txt:clusters in 1140) [ClassicSimilarity], result of:
            0.3879669 = score(doc=1140,freq=1.0), product of:
              0.38044563 = queryWeight, product of:
                3.8179276 = boost
                6.526526 = idf(docFreq=166, maxDocs=41962)
                0.015268025 = queryNorm
              1.0197697 = fieldWeight in 1140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.526526 = idf(docFreq=166, maxDocs=41962)
                0.15625 = fieldNorm(doc=1140)
        0.16 = coord(4/25)
    
  2. Mohan, K.C.: Boolean and nearest neighbour text searching in a multi-strategy retrieval system (1996) 0.20
    0.20475142 = sum of:
      0.20475142 = product of:
        1.0237571 = sum of:
          0.043918982 = weight(abstract_txt:effectiveness in 325) [ClassicSimilarity], result of:
            0.043918982 = score(doc=325,freq=1.0), product of:
              0.07829942 = queryWeight, product of:
                5.1283264 = idf(docFreq=675, maxDocs=41962)
                0.015268025 = queryNorm
              0.5609107 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1283264 = idf(docFreq=675, maxDocs=41962)
                0.109375 = fieldNorm(doc=325)
          0.09128202 = weight(abstract_txt:explained in 325) [ClassicSimilarity], result of:
            0.09128202 = score(doc=325,freq=1.0), product of:
              0.12752065 = queryWeight, product of:
                1.2761773 = boost
                6.5446534 = idf(docFreq=163, maxDocs=41962)
                0.015268025 = queryNorm
              0.71582144 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5446534 = idf(docFreq=163, maxDocs=41962)
                0.109375 = fieldNorm(doc=325)
          0.031771705 = weight(abstract_txt:been in 325) [ClassicSimilarity], result of:
            0.031771705 = score(doc=325,freq=1.0), product of:
              0.07949882 = queryWeight, product of:
                1.425004 = boost
                3.6539428 = idf(docFreq=2952, maxDocs=41962)
                0.015268025 = queryNorm
              0.39965 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6539428 = idf(docFreq=2952, maxDocs=41962)
                0.109375 = fieldNorm(doc=325)
          0.35630456 = weight(abstract_txt:nearest in 325) [ClassicSimilarity], result of:
            0.35630456 = score(doc=325,freq=1.0), product of:
              0.39830396 = queryWeight, product of:
                3.18965 = boost
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.015268025 = queryNorm
              0.89455444 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.109375 = fieldNorm(doc=325)
          0.5004798 = weight(abstract_txt:neighbour in 325) [ClassicSimilarity], result of:
            0.5004798 = score(doc=325,freq=1.0), product of:
              0.49956432 = queryWeight, product of:
                3.5721645 = boost
                9.159613 = idf(docFreq=11, maxDocs=41962)
                0.015268025 = queryNorm
              1.0018326 = fieldWeight in 325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.159613 = idf(docFreq=11, maxDocs=41962)
                0.109375 = fieldNorm(doc=325)
        0.2 = coord(5/25)
    
  3. Small, H.G.: Structural dynamics of scientific literature (2015) 0.20
    0.19987203 = sum of:
      0.19987203 = product of:
        0.7138287 = sum of:
          0.053500157 = weight(abstract_txt:observed in 4357) [ClassicSimilarity], result of:
            0.053500157 = score(doc=4357,freq=1.0), product of:
              0.111766696 = queryWeight, product of:
                1.1947497 = boost
                6.1270666 = idf(docFreq=248, maxDocs=41962)
                0.015268025 = queryNorm
              0.4786771 = fieldWeight in 4357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1270666 = idf(docFreq=248, maxDocs=41962)
                0.078125 = fieldNorm(doc=4357)
          0.066902936 = weight(abstract_txt:procedure in 4357) [ClassicSimilarity], result of:
            0.066902936 = score(doc=4357,freq=1.0), product of:
              0.12972963 = queryWeight, product of:
                1.287183 = boost
                6.6010947 = idf(docFreq=154, maxDocs=41962)
                0.015268025 = queryNorm
              0.51571053 = fieldWeight in 4357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6010947 = idf(docFreq=154, maxDocs=41962)
                0.078125 = fieldNorm(doc=4357)
          0.0097942315 = weight(abstract_txt:that in 4357) [ClassicSimilarity], result of:
            0.0097942315 = score(doc=4357,freq=1.0), product of:
              0.051971238 = queryWeight, product of:
                1.4111166 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.015268025 = queryNorm
              0.18845485 = fieldWeight in 4357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.078125 = fieldNorm(doc=4357)
          0.022694074 = weight(abstract_txt:been in 4357) [ClassicSimilarity], result of:
            0.022694074 = score(doc=4357,freq=1.0), product of:
              0.07949882 = queryWeight, product of:
                1.425004 = boost
                3.6539428 = idf(docFreq=2952, maxDocs=41962)
                0.015268025 = queryNorm
              0.2854643 = fieldWeight in 4357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6539428 = idf(docFreq=2952, maxDocs=41962)
                0.078125 = fieldNorm(doc=4357)
          0.03650362 = weight(abstract_txt:document in 4357) [ClassicSimilarity], result of:
            0.03650362 = score(doc=4357,freq=1.0), product of:
              0.10913809 = queryWeight, product of:
                1.6696441 = boost
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.015268025 = queryNorm
              0.33447188 = fieldWeight in 4357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.078125 = fieldNorm(doc=4357)
          0.090673424 = weight(abstract_txt:obtained in 4357) [ClassicSimilarity], result of:
            0.090673424 = score(doc=4357,freq=1.0), product of:
              0.20017311 = queryWeight, product of:
                2.261198 = boost
                5.798081 = idf(docFreq=345, maxDocs=41962)
                0.015268025 = queryNorm
              0.45297506 = fieldWeight in 4357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.798081 = idf(docFreq=345, maxDocs=41962)
                0.078125 = fieldNorm(doc=4357)
          0.43376023 = weight(abstract_txt:clusters in 4357) [ClassicSimilarity], result of:
            0.43376023 = score(doc=4357,freq=5.0), product of:
              0.38044563 = queryWeight, product of:
                3.8179276 = boost
                6.526526 = idf(docFreq=166, maxDocs=41962)
                0.015268025 = queryNorm
              1.1401372 = fieldWeight in 4357, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.526526 = idf(docFreq=166, maxDocs=41962)
                0.078125 = fieldNorm(doc=4357)
        0.28 = coord(7/25)
    
  4. Al-Hawamdeh, S.; Smith, G.; Willett, P.; Vere, R. de: Using nearest-neighbour searching techniques to access full-text documents (1991) 0.15
    0.14745621 = sum of:
      0.14745621 = product of:
        0.92160136 = sum of:
          0.013711926 = weight(abstract_txt:that in 2300) [ClassicSimilarity], result of:
            0.013711926 = score(doc=2300,freq=1.0), product of:
              0.051971238 = queryWeight, product of:
                1.4111166 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.015268025 = queryNorm
              0.2638368 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.109375 = fieldNorm(doc=2300)
          0.051105067 = weight(abstract_txt:document in 2300) [ClassicSimilarity], result of:
            0.051105067 = score(doc=2300,freq=1.0), product of:
              0.10913809 = queryWeight, product of:
                1.6696441 = boost
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.015268025 = queryNorm
              0.46826062 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.109375 = fieldNorm(doc=2300)
          0.35630456 = weight(abstract_txt:nearest in 2300) [ClassicSimilarity], result of:
            0.35630456 = score(doc=2300,freq=1.0), product of:
              0.39830396 = queryWeight, product of:
                3.18965 = boost
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.015268025 = queryNorm
              0.89455444 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.109375 = fieldNorm(doc=2300)
          0.5004798 = weight(abstract_txt:neighbour in 2300) [ClassicSimilarity], result of:
            0.5004798 = score(doc=2300,freq=1.0), product of:
              0.49956432 = queryWeight, product of:
                3.5721645 = boost
                9.159613 = idf(docFreq=11, maxDocs=41962)
                0.015268025 = queryNorm
              1.0018326 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.159613 = idf(docFreq=11, maxDocs=41962)
                0.109375 = fieldNorm(doc=2300)
        0.16 = coord(4/25)
    
  5. Rasmussen, E.: Clustering algorithms (1992) 0.13
    0.12629324 = sum of:
      0.12629324 = product of:
        0.5262219 = sum of:
          0.025096562 = weight(abstract_txt:effectiveness in 4927) [ClassicSimilarity], result of:
            0.025096562 = score(doc=4927,freq=1.0), product of:
              0.07829942 = queryWeight, product of:
                5.1283264 = idf(docFreq=675, maxDocs=41962)
                0.015268025 = queryNorm
              0.3205204 = fieldWeight in 4927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1283264 = idf(docFreq=675, maxDocs=41962)
                0.0625 = fieldNorm(doc=4927)
          0.011080909 = weight(abstract_txt:that in 4927) [ClassicSimilarity], result of:
            0.011080909 = score(doc=4927,freq=2.0), product of:
              0.051971238 = queryWeight, product of:
                1.4111166 = boost
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.015268025 = queryNorm
              0.21321233 = fieldWeight in 4927, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4122221 = idf(docFreq=10221, maxDocs=41962)
                0.0625 = fieldNorm(doc=4927)
          0.025675412 = weight(abstract_txt:been in 4927) [ClassicSimilarity], result of:
            0.025675412 = score(doc=4927,freq=2.0), product of:
              0.07949882 = queryWeight, product of:
                1.425004 = boost
                3.6539428 = idf(docFreq=2952, maxDocs=41962)
                0.015268025 = queryNorm
              0.32296595 = fieldWeight in 4927, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6539428 = idf(docFreq=2952, maxDocs=41962)
                0.0625 = fieldNorm(doc=4927)
          0.04129913 = weight(abstract_txt:document in 4927) [ClassicSimilarity], result of:
            0.04129913 = score(doc=4927,freq=2.0), product of:
              0.10913809 = queryWeight, product of:
                1.6696441 = boost
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.015268025 = queryNorm
              0.3784117 = fieldWeight in 4927, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.0625 = fieldNorm(doc=4927)
          0.20360261 = weight(abstract_txt:nearest in 4927) [ClassicSimilarity], result of:
            0.20360261 = score(doc=4927,freq=1.0), product of:
              0.39830396 = queryWeight, product of:
                3.18965 = boost
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.015268025 = queryNorm
              0.51117396 = fieldWeight in 4927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.178783 = idf(docFreq=31, maxDocs=41962)
                0.0625 = fieldNorm(doc=4927)
          0.21946722 = weight(abstract_txt:clusters in 4927) [ClassicSimilarity], result of:
            0.21946722 = score(doc=4927,freq=2.0), product of:
              0.38044563 = queryWeight, product of:
                3.8179276 = boost
                6.526526 = idf(docFreq=166, maxDocs=41962)
                0.015268025 = queryNorm
              0.57686883 = fieldWeight in 4927, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.526526 = idf(docFreq=166, maxDocs=41962)
                0.0625 = fieldNorm(doc=4927)
        0.24 = coord(6/25)
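
Note on coord in the content listings: the final score is the sum of the per-term weights multiplied by coord, the fraction of the 25 query terms that matched. The sketch below takes the per-term weights exactly as printed for entries 1 and 3 and applies that step. The helper name is hypothetical, and the code illustrates the listed arithmetic only, not the search engine's own implementation.

```python
def coord_score(term_weights, query_terms):
    """Sum the matching term weights, then scale by matched / query_terms."""
    matched = len(term_weights)
    return sum(term_weights) * (matched / query_terms)


# Per-term weights copied from entry 1 (doc 1140): obtained, nearest,
# neighbour, clusters.
entry_1 = [0.18134685, 0.5090065, 0.71497124, 0.3879669]

# Per-term weights copied from entry 3 (doc 4357): observed, procedure,
# that, been, document, obtained, clusters.
entry_3 = [0.053500157, 0.066902936, 0.0097942315, 0.022694074,
           0.03650362, 0.090673424, 0.43376023]

score_1 = coord_score(entry_1, query_terms=25)  # coord(4/25) = 0.16
score_3 = coord_score(entry_3, query_terms=25)  # coord(7/25) = 0.28

# Expected to be close to the listed 0.28692663 and 0.19987203.
print(score_1, score_3)
```

Entry 3 matches more query terms and so gets the larger coord, but its matches are mostly common words with low idf (that, been, document). Entry 1 matches only four terms, yet two of them are the rare nearest and neighbour, so its raw sum and its final score are higher.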