Search (1 result, page 1 of 1)

Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.00
```
0.0014724231 = product of:
  0.008834538 = sum of:
    0.008834538 = weight(_text_:in in 1661) [ClassicSimilarity], result of:
      0.008834538 = score(doc=1661,freq=4.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.14877784 = fieldWeight in 1661, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1661)
  0.16666667 = coord(1/6)
```
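The score breakdown above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The values for queryNorm, fieldNorm and coord are copied from the listing rather than derived, and the function name `classic_similarity_score` is hypothetical, introduced only for this illustration.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute one term's contribution using the assumed ClassicSimilarity formulas.

    query_norm, field_norm and coord are taken as given inputs (copied from
    the explain output), not recomputed from the index.
    """
    tf = math.sqrt(freq)                              # tf(freq=4.0) -> 2.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    term_score = query_weight * field_weight          # weight(_text_:in in 1661)
    return {
        "tf": tf,
        "idf": idf,
        "query_weight": query_weight,
        "field_weight": field_weight,
        "term_score": term_score,
        "final": term_score * coord,                  # scaled by coord(1/6)
    }


explain = classic_similarity_score(
    freq=4.0,
    doc_freq=30841,
    max_docs=44218,
    query_norm=0.043654136,
    field_norm=0.0546875,
    coord=1 / 6,
)

for name, value in explain.items():
    print(f"{name:12s} {value:.9f}")
```

The printed values should agree with the listing to about seven significant digits. Small differences in the trailing digits are expected, since the listing appears to use single-precision arithmetic while Python uses double precision.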
Abstract

Proposes the use of parallel computing systems to overcome the computational cost of the clustering process. Examines two operations: clustering a document set and classifying it. Uses a subset of the TIPSTER corpus, specifically articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The parallel systems ran faster than the sequential systems and produced the same clustering and classification scheme. Results show near-linear speedup in higher-threshold clustering applications.