Search (1 result, page 1 of 1)

  • author_ss:"Kishida, K."
  • theme_ss:"Automatisches Klassifizieren"
  1. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.02
    0.02320403 = product of:
      0.0812141 = sum of:
        0.03718255 = weight(_text_:processing in 3463) [ClassicSimilarity], result of:
          0.03718255 = score(doc=3463,freq=2.0), product of:
            0.1662677 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.04107254 = queryNorm
            0.22363065 = fieldWeight in 3463, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3463)
        0.044031553 = weight(_text_:techniques in 3463) [ClassicSimilarity], result of:
          0.044031553 = score(doc=3463,freq=2.0), product of:
            0.18093403 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.04107254 = queryNorm
            0.24335694 = fieldWeight in 3463, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3463)
      0.2857143 = coord(2/7)
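
The score breakdown above can be checked by hand. The following is a minimal sketch that recomputes it from the listed inputs (freq, docFreq, maxDocs, queryNorm, fieldNorm). The function names are hypothetical, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption, chosen because it reproduces the idf values shown in the listing; the tf value matches sqrt(freq).

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed formula; it reproduces 4.048147 and 4.405231 from the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the listing for doc 3463.
QUERY_NORM = 0.04107254
FIELD_NORM = 0.0390625
MAX_DOCS = 44218

processing = term_score(2.0, 2097, MAX_DOCS, QUERY_NORM, FIELD_NORM)
techniques = term_score(2.0, 1467, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Two of the seven query terms matched, hence coord(2/7).
coord = 2 / 7
total = coord * (processing + techniques)

print(round(processing, 6), round(techniques, 6), round(total, 6))
```

Up to rounding, the three printed values should agree with the per-term weights and the final score in the listing.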
    
    Abstract
    Document clustering is an important tool, but it is not yet widely used in practice, probably because of its high computational complexity. This article explores techniques for high-speed rough clustering of documents, on the assumption that it is sometimes necessary to obtain a clustering result quickly, even if the result is only an approximate outline of the document clusters. A promising approach to such clustering is to reduce the number of documents that must be checked when generating cluster vectors in the leader-follower clustering algorithm. Based on this idea, the article proposes a modified Crouch algorithm and an incomplete single-pass leader-follower algorithm. It also develops a two-stage grouping technique, in which the first stage applies a quick merging technique to decrease the number of documents to be processed in the second stage. An experiment using part of the Reuters corpus RCV1 showed empirically that both the modified Crouch algorithm and the incomplete single-pass leader-follower algorithm produce clustering results more efficiently than the original methods and also improve the effectiveness of those results. The two-stage grouping technique, on the other hand, did not reduce processing time in this experiment.
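
For orientation, here is a minimal sketch of a generic single-pass leader-follower clustering loop. It is not the modified Crouch algorithm or the incomplete variant proposed in the article, whose details the abstract does not give. The cosine similarity measure, the fixed threshold, the sparse dict representation, and the summing of member vectors into the cluster vector are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    # Cosine similarity between two sparse term-weight dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def leader_follower(docs, threshold):
    # One pass over the documents: each document joins the most similar
    # existing cluster if the similarity reaches the threshold (assumed
    # rule), otherwise it starts a new cluster as its "leader".
    clusters = []
    for idx, doc in enumerate(docs):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(doc, cluster["vector"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"vector": dict(doc), "members": [idx]})
        else:
            best["members"].append(idx)
            # Update the cluster vector by adding the new member's weights.
            for term, weight in doc.items():
                best["vector"][term] = best["vector"].get(term, 0.0) + weight
    return clusters

# Hypothetical toy documents as term-weight dicts.
example_docs = [{"a": 1.0}, {"a": 1.0, "b": 0.1}, {"c": 1.0}]
result = leader_follower(example_docs, threshold=0.5)
print([cluster["members"] for cluster in result])
```

In this generic form every document is compared with every existing cluster vector. According to the abstract, the proposed variants gain speed by reducing the number of documents checked when generating cluster vectors; that reduction is not shown here.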