Document (#6628)

Author
Can, F.
Title
Incremental clustering for dynamic information processing
Source
ACM transactions on information systems. 11(1993) no.2, pp.143-164
Year
1993
Abstract
Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. Introduces an algorithm for incremental clustering and discusses the complexity and cost analysis of the algorithm together with an investigation of its expected behaviour. Shows through empirical testing that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
Theme
Automatic indexing
Retrieval algorithms

Similar documents (content)

  1. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.20
    0.19951609 = sum of:
      0.19951609 = product of:
        0.83131707 = sum of:
          0.043135326 = weight(abstract_txt:complexity in 2449) [ClassicSimilarity], result of:
            0.043135326 = score(doc=2449,freq=1.0), product of:
              0.115622915 = queryWeight, product of:
                1.0902991 = boost
                5.9691043 = idf(docFreq=293, maxDocs=42306)
                0.017765976 = queryNorm
              0.37306902 = fieldWeight in 2449, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9691043 = idf(docFreq=293, maxDocs=42306)
                0.0625 = fieldNorm(doc=2449)
          0.04758991 = weight(abstract_txt:expected in 2449) [ClassicSimilarity], result of:
            0.04758991 = score(doc=2449,freq=1.0), product of:
              0.12345208 = queryWeight, product of:
                1.1266083 = boost
                6.167887 = idf(docFreq=240, maxDocs=42306)
                0.017765976 = queryNorm
              0.38549295 = fieldWeight in 2449, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.167887 = idf(docFreq=240, maxDocs=42306)
                0.0625 = fieldNorm(doc=2449)
          0.011967648 = weight(abstract_txt:that in 2449) [ClassicSimilarity], result of:
            0.011967648 = score(doc=2449,freq=2.0), product of:
              0.056302126 = queryWeight, product of:
                1.3177916 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.017765976 = queryNorm
              0.2125612 = fieldWeight in 2449, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=2449)
          0.11288072 = weight(abstract_txt:clusters in 2449) [ClassicSimilarity], result of:
            0.11288072 = score(doc=2449,freq=1.0), product of:
              0.2766379 = queryWeight, product of:
                2.3850338 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.017765976 = queryNorm
              0.40804502 = fieldWeight in 2449, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.0625 = fieldNorm(doc=2449)
          0.25575158 = weight(abstract_txt:algorithm in 2449) [ClassicSimilarity], result of:
            0.25575158 = score(doc=2449,freq=5.0), product of:
              0.319461 = queryWeight, product of:
                3.139014 = boost
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.017765976 = queryNorm
              0.8005722 = fieldWeight in 2449, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.0625 = fieldNorm(doc=2449)
          0.35999185 = weight(abstract_txt:clustering in 2449) [ClassicSimilarity], result of:
            0.35999185 = score(doc=2449,freq=6.0), product of:
              0.37757823 = queryWeight, product of:
                3.412619 = boost
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.017765976 = queryNorm
              0.9534232 = fieldWeight in 2449, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.0625 = fieldNorm(doc=2449)
        0.24 = coord(6/25)
    
  2. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.16
    0.16184646 = sum of:
      0.16184646 = product of:
        0.8092323 = sum of:
          0.043875743 = weight(abstract_txt:browsing in 208) [ClassicSimilarity], result of:
            0.043875743 = score(doc=208,freq=1.0), product of:
              0.10077779 = queryWeight, product of:
                1.0179024 = boost
                5.572751 = idf(docFreq=436, maxDocs=42306)
                0.017765976 = queryNorm
              0.43537116 = fieldWeight in 208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.572751 = idf(docFreq=436, maxDocs=42306)
                0.078125 = fieldNorm(doc=208)
          0.0105780065 = weight(abstract_txt:that in 208) [ClassicSimilarity], result of:
            0.0105780065 = score(doc=208,freq=1.0), product of:
              0.056302126 = queryWeight, product of:
                1.3177916 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.017765976 = queryNorm
              0.18787934 = fieldWeight in 208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=208)
          0.24439393 = weight(abstract_txt:clusters in 208) [ClassicSimilarity], result of:
            0.24439393 = score(doc=208,freq=3.0), product of:
              0.2766379 = queryWeight, product of:
                2.3850338 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.017765976 = queryNorm
              0.88344336 = fieldWeight in 208, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.078125 = fieldNorm(doc=208)
          0.14296947 = weight(abstract_txt:algorithm in 208) [ClassicSimilarity], result of:
            0.14296947 = score(doc=208,freq=1.0), product of:
              0.319461 = queryWeight, product of:
                3.139014 = boost
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.017765976 = queryNorm
              0.44753346 = fieldWeight in 208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.078125 = fieldNorm(doc=208)
          0.36741513 = weight(abstract_txt:clustering in 208) [ClassicSimilarity], result of:
            0.36741513 = score(doc=208,freq=4.0), product of:
              0.37757823 = queryWeight, product of:
                3.412619 = boost
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.017765976 = queryNorm
              0.97308344 = fieldWeight in 208, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.078125 = fieldNorm(doc=208)
        0.2 = coord(5/25)
    
  3. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.15
    0.15098473 = sum of:
      0.15098473 = product of:
        0.75492364 = sum of:
          0.043135326 = weight(abstract_txt:complexity in 464) [ClassicSimilarity], result of:
            0.043135326 = score(doc=464,freq=1.0), product of:
              0.115622915 = queryWeight, product of:
                1.0902991 = boost
                5.9691043 = idf(docFreq=293, maxDocs=42306)
                0.017765976 = queryNorm
              0.37306902 = fieldWeight in 464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9691043 = idf(docFreq=293, maxDocs=42306)
                0.0625 = fieldNorm(doc=464)
          0.011967648 = weight(abstract_txt:that in 464) [ClassicSimilarity], result of:
            0.011967648 = score(doc=464,freq=2.0), product of:
              0.056302126 = queryWeight, product of:
                1.3177916 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.017765976 = queryNorm
              0.2125612 = fieldWeight in 464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=464)
          0.11288072 = weight(abstract_txt:clusters in 464) [ClassicSimilarity], result of:
            0.11288072 = score(doc=464,freq=1.0), product of:
              0.2766379 = queryWeight, product of:
                2.3850338 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.017765976 = queryNorm
              0.40804502 = fieldWeight in 464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.0625 = fieldNorm(doc=464)
          0.19810432 = weight(abstract_txt:algorithm in 464) [ClassicSimilarity], result of:
            0.19810432 = score(doc=464,freq=3.0), product of:
              0.319461 = queryWeight, product of:
                3.139014 = boost
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.017765976 = queryNorm
              0.6201205 = fieldWeight in 464, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.0625 = fieldNorm(doc=464)
          0.3888356 = weight(abstract_txt:clustering in 464) [ClassicSimilarity], result of:
            0.3888356 = score(doc=464,freq=7.0), product of:
              0.37757823 = queryWeight, product of:
                3.412619 = boost
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.017765976 = queryNorm
              1.0298147 = fieldWeight in 464, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.0625 = fieldNorm(doc=464)
        0.2 = coord(5/25)
    
  4. Gómez-Núñez, A.J.; Vargas-Quesada, B.; Moya-Anegón, F. de: Updating the SCImago journal and country rank classification : a new approach using Ward's clustering and alternative combination of citation measures (2016) 0.15
    0.14516547 = sum of:
      0.14516547 = product of:
        0.60485613 = sum of:
          0.03328098 = weight(abstract_txt:evidence in 4500) [ClassicSimilarity], result of:
            0.03328098 = score(doc=4500,freq=1.0), product of:
              0.0972641 = queryWeight, product of:
                5.47474 = idf(docFreq=481, maxDocs=42306)
                0.017765976 = queryNorm
              0.34217125 = fieldWeight in 4500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.47474 = idf(docFreq=481, maxDocs=42306)
                0.0625 = fieldNorm(doc=4500)
          0.040490855 = weight(abstract_txt:introduces in 4500) [ClassicSimilarity], result of:
            0.040490855 = score(doc=4500,freq=1.0), product of:
              0.110847645 = queryWeight, product of:
                1.0675468 = boost
                5.8445415 = idf(docFreq=332, maxDocs=42306)
                0.017765976 = queryNorm
              0.36528385 = fieldWeight in 4500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8445415 = idf(docFreq=332, maxDocs=42306)
                0.0625 = fieldNorm(doc=4500)
          0.07255407 = weight(abstract_txt:updating in 4500) [ClassicSimilarity], result of:
            0.07255407 = score(doc=4500,freq=1.0), product of:
              0.16352959 = queryWeight, product of:
                1.2966474 = boost
                7.0988073 = idf(docFreq=94, maxDocs=42306)
                0.017765976 = queryNorm
              0.44367546 = fieldWeight in 4500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0988073 = idf(docFreq=94, maxDocs=42306)
                0.0625 = fieldNorm(doc=4500)
          0.008462405 = weight(abstract_txt:that in 4500) [ClassicSimilarity], result of:
            0.008462405 = score(doc=4500,freq=1.0), product of:
              0.056302126 = queryWeight, product of:
                1.3177916 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.017765976 = queryNorm
              0.15030347 = fieldWeight in 4500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=4500)
          0.19551514 = weight(abstract_txt:clusters in 4500) [ClassicSimilarity], result of:
            0.19551514 = score(doc=4500,freq=3.0), product of:
              0.2766379 = queryWeight, product of:
                2.3850338 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.017765976 = queryNorm
              0.7067547 = fieldWeight in 4500, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.0625 = fieldNorm(doc=4500)
          0.25455266 = weight(abstract_txt:clustering in 4500) [ClassicSimilarity], result of:
            0.25455266 = score(doc=4500,freq=3.0), product of:
              0.37757823 = queryWeight, product of:
                3.412619 = boost
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.017765976 = queryNorm
              0.674172 = fieldWeight in 4500, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.0625 = fieldNorm(doc=4500)
        0.24 = coord(6/25)
    
  5. Kostoff, R.N.; Block, J.A.: Factor matrix text filtering and clustering (2005) 0.14
    0.1425455 = sum of:
      0.1425455 = product of:
        0.71272755 = sum of:
          0.018321645 = weight(abstract_txt:that in 4684) [ClassicSimilarity], result of:
            0.018321645 = score(doc=4684,freq=3.0), product of:
              0.056302126 = queryWeight, product of:
                1.3177916 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.017765976 = queryNorm
              0.32541656 = fieldWeight in 4684, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=4684)
          0.042920407 = weight(abstract_txt:databases in 4684) [ClassicSimilarity], result of:
            0.042920407 = score(doc=4684,freq=1.0), product of:
              0.1251222 = queryWeight, product of:
                1.6040057 = boost
                4.390757 = idf(docFreq=1424, maxDocs=42306)
                0.017765976 = queryNorm
              0.3430279 = fieldWeight in 4684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.390757 = idf(docFreq=1424, maxDocs=42306)
                0.078125 = fieldNorm(doc=4684)
          0.1411009 = weight(abstract_txt:clusters in 4684) [ClassicSimilarity], result of:
            0.1411009 = score(doc=4684,freq=1.0), product of:
              0.2766379 = queryWeight, product of:
                2.3850338 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.017765976 = queryNorm
              0.51005626 = fieldWeight in 4684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.078125 = fieldNorm(doc=4684)
          0.14296947 = weight(abstract_txt:algorithm in 4684) [ClassicSimilarity], result of:
            0.14296947 = score(doc=4684,freq=1.0), product of:
              0.319461 = queryWeight, product of:
                3.139014 = boost
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.017765976 = queryNorm
              0.44753346 = fieldWeight in 4684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7284284 = idf(docFreq=373, maxDocs=42306)
                0.078125 = fieldNorm(doc=4684)
          0.36741513 = weight(abstract_txt:clustering in 4684) [ClassicSimilarity], result of:
            0.36741513 = score(doc=4684,freq=4.0), product of:
              0.37757823 = queryWeight, product of:
                3.412619 = boost
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.017765976 = queryNorm
              0.97308344 = fieldWeight in 4684, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.078125 = fieldNorm(doc=4684)
        0.2 = coord(5/25)
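
Note on reading the score listings above: each per-term score is the product of a queryWeight (boost × idf × queryNorm) and a fieldWeight (tf × idf × fieldNorm), and the document score is the sum of the matched terms multiplied by coord (matched terms / query terms). The Python sketch below recomputes entry 1 (doc 2449) from the values printed in the listing. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions inferred from the printed numbers; they are not stated in the record itself, and the function and variable names are illustrative only.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 42306
QUERY_NORM = 0.017765976
QUERY_TERMS = 25  # denominator of coord(6/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq), assumed
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 2449, copied from entry 1.
terms_2449 = [
    ("complexity", 1.0, 293, 1.0902991),
    ("expected", 1.0, 240, 1.1266083),
    ("that", 2.0, 10381, 1.3177916),
    ("clusters", 1.0, 167, 2.3850338),
    ("algorithm", 5.0, 373, 3.139014),
    ("clustering", 6.0, 226, 3.412619),
]
FIELD_NORM_2449 = 0.0625

matched_sum = sum(
    term_score(freq, doc_freq, boost, FIELD_NORM_2449)
    for _, freq, doc_freq, boost in terms_2449
)
coord = len(terms_2449) / QUERY_TERMS
total = coord * matched_sum

print(f"sum of term scores: {matched_sum:.6f}")
print(f"coord:              {coord:.2f}")
print(f"document score:     {total:.6f}")
```

Because the listing was produced with single-precision arithmetic, the recomputed values should be compared with the printed ones only up to small rounding differences. The same function can be applied to the other entries by substituting their freq, docFreq, boost, and fieldNorm values.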