Document (#19662)

Author
Ruocco, A.S.
Frieder, O.
Title
Clustering and classification of large document bases in a parallel environment
Source
Journal of the American Society for Information Science. 48(1997) no.10, pp. 932-943
Year
1997
Abstract
Proposes the use of parallel computing systems to overcome the computational cost of the clustering process. Examines two operations: clustering a document set and classifying the document set. Uses a subset of the TIPSTER corpus, specifically articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The parallel systems ran faster than the sequential systems and produced the same clustering and classification scheme. Results show near-linear speedup in higher-threshold clustering applications.
Theme
Automatic classification

Similar documents (author)

  1. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (1998) 4.46
    4.462149 = sum of:
      4.462149 = weight(author_txt:frieder in 2182) [ClassicSimilarity], result of:
        4.462149 = fieldWeight in 2182, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.5 = fieldNorm(doc=2182)
    
  2. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (2004) 4.46
    4.462149 = sum of:
      4.462149 = weight(author_txt:frieder in 1486) [ClassicSimilarity], result of:
        4.462149 = fieldWeight in 1486, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.5 = fieldNorm(doc=1486)
    
  3. Soo, J.; Frieder, O.: On searching misspelled collections (2015) 4.46
    4.462149 = sum of:
      4.462149 = weight(author_txt:frieder in 1862) [ClassicSimilarity], result of:
        4.462149 = fieldWeight in 1862, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.5 = fieldNorm(doc=1862)
    
  4. Aljlayl, M.; Frieder, O.; Grossman, D.: On bidirectional English-Arabic search (2002) 3.35
    3.346612 = sum of:
      3.346612 = weight(author_txt:frieder in 5227) [ClassicSimilarity], result of:
        3.346612 = fieldWeight in 5227, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.375 = fieldNorm(doc=5227)
    
  5. Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 3.35
    3.346612 = sum of:
      3.346612 = weight(author_txt:frieder in 2380) [ClassicSimilarity], result of:
        3.346612 = fieldWeight in 2380, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.375 = fieldNorm(doc=2380)
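
The author-match scores above can be re-derived by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); that idf form is the one that reproduces the 8.924298 shown in the listing. The function names are illustrative and are not part of any library.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author_txt:frieder, docFreq=15, maxDocs=44218.
score_norm_half = field_weight(1.0, 15, 44218, 0.5)     # results 1 to 3
score_norm_375 = field_weight(1.0, 15, 44218, 0.375)    # results 4 and 5

print(round(score_norm_half, 4), round(score_norm_375, 4))
```

The only factor that differs between the first three hits and the last two is fieldNorm (0.5 versus 0.375), which is smaller for records with a longer author field.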
    

Similar documents (content)

  1. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.30
    0.30356342 = sum of:
      0.30356342 = product of:
        1.0841551 = sum of:
          0.049223695 = weight(abstract_txt:operations in 448) [ClassicSimilarity], result of:
            0.049223695 = score(doc=448,freq=1.0), product of:
              0.12057056 = queryWeight, product of:
                1.0794291 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.017099926 = queryNorm
              0.40825632 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.06001791 = weight(abstract_txt:near in 448) [ClassicSimilarity], result of:
            0.06001791 = score(doc=448,freq=1.0), product of:
              0.13760865 = queryWeight, product of:
                1.1531781 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.017099926 = queryNorm
              0.43614927 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.065536335 = weight(abstract_txt:subset in 448) [ClassicSimilarity], result of:
            0.065536335 = score(doc=448,freq=1.0), product of:
              0.14591947 = queryWeight, product of:
                1.1874905 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.017099926 = queryNorm
              0.44912672 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.031212045 = weight(abstract_txt:large in 448) [ClassicSimilarity], result of:
            0.031212045 = score(doc=448,freq=1.0), product of:
              0.112120055 = queryWeight, product of:
                1.4720757 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.017099926 = queryNorm
              0.27838057 = fieldWeight in 448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.07902314 = weight(abstract_txt:document in 448) [ClassicSimilarity], result of:
            0.07902314 = score(doc=448,freq=2.0), product of:
              0.20827541 = queryWeight, product of:
                2.8374126 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017099926 = queryNorm
              0.37941656 = fieldWeight in 448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.2795692 = weight(abstract_txt:parallel in 448) [ClassicSimilarity], result of:
            0.2795692 = score(doc=448,freq=4.0), product of:
              0.34871596 = queryWeight, product of:
                3.179582 = boost
                6.4136834 = idf(docFreq=196, maxDocs=44218)
                0.017099926 = queryNorm
              0.8017104 = fieldWeight in 448, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4136834 = idf(docFreq=196, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
          0.5195727 = weight(abstract_txt:clustering in 448) [ClassicSimilarity], result of:
            0.5195727 = score(doc=448,freq=6.0), product of:
              0.5459618 = queryWeight, product of:
                5.136173 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017099926 = queryNorm
              0.95166487 = fieldWeight in 448, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=448)
        0.28 = coord(7/25)
    
  2. Rooney, N.; Patterson, D.; Galushka, M.; Dobrynin, V.; Smirnova, E.: An investigation into the stability of contextual document clustering (2008) 0.14
    0.13518235 = sum of:
      0.13518235 = product of:
        0.6759117 = sum of:
          0.039137382 = weight(abstract_txt:times in 1356) [ClassicSimilarity], result of:
            0.039137382 = score(doc=1356,freq=1.0), product of:
              0.103479184 = queryWeight, product of:
                6.0514402 = idf(docFreq=282, maxDocs=44218)
                0.017099926 = queryNorm
              0.37821501 = fieldWeight in 1356, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0514402 = idf(docFreq=282, maxDocs=44218)
                0.0625 = fieldNorm(doc=1356)
          0.056648992 = weight(abstract_txt:corpus in 1356) [ClassicSimilarity], result of:
            0.056648992 = score(doc=1356,freq=2.0), product of:
              0.10509368 = queryWeight, product of:
                1.0077709 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.017099926 = queryNorm
              0.5390333 = fieldWeight in 1356, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=1356)
          0.044140495 = weight(abstract_txt:large in 1356) [ClassicSimilarity], result of:
            0.044140495 = score(doc=1356,freq=2.0), product of:
              0.112120055 = queryWeight, product of:
                1.4720757 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.017099926 = queryNorm
              0.39368957 = fieldWeight in 1356, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.0625 = fieldNorm(doc=1356)
          0.111755595 = weight(abstract_txt:document in 1356) [ClassicSimilarity], result of:
            0.111755595 = score(doc=1356,freq=4.0), product of:
              0.20827541 = queryWeight, product of:
                2.8374126 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017099926 = queryNorm
              0.53657603 = fieldWeight in 1356, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=1356)
          0.42422926 = weight(abstract_txt:clustering in 1356) [ClassicSimilarity], result of:
            0.42422926 = score(doc=1356,freq=4.0), product of:
              0.5459618 = queryWeight, product of:
                5.136173 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017099926 = queryNorm
              0.77703106 = fieldWeight in 1356, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=1356)
        0.2 = coord(5/25)
    
  3. Mather, L.A.: A linear algebra measure of cluster quality (2000) 0.12
    0.12049536 = sum of:
      0.12049536 = product of:
        0.753096 = sum of:
          0.074436046 = weight(abstract_txt:linear in 4767) [ClassicSimilarity], result of:
            0.074436046 = score(doc=4767,freq=2.0), product of:
              0.12607743 = queryWeight, product of:
                1.1038046 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.017099926 = queryNorm
              0.59039944 = fieldWeight in 4767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.12948412 = weight(abstract_txt:matrices in 4767) [ClassicSimilarity], result of:
            0.12948412 = score(doc=4767,freq=2.0), product of:
              0.18235856 = queryWeight, product of:
                1.3275063 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017099926 = queryNorm
              0.7100523 = fieldWeight in 4767, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.12494656 = weight(abstract_txt:document in 4767) [ClassicSimilarity], result of:
            0.12494656 = score(doc=4767,freq=5.0), product of:
              0.20827541 = queryWeight, product of:
                2.8374126 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017099926 = queryNorm
              0.59991026 = fieldWeight in 4767, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
          0.42422926 = weight(abstract_txt:clustering in 4767) [ClassicSimilarity], result of:
            0.42422926 = score(doc=4767,freq=4.0), product of:
              0.5459618 = queryWeight, product of:
                5.136173 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017099926 = queryNorm
              0.77703106 = fieldWeight in 4767, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=4767)
        0.16 = coord(4/25)
    
  4. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.12
    0.11693577 = sum of:
      0.11693577 = product of:
        0.73084855 = sum of:
          0.040056884 = weight(abstract_txt:corpus in 3463) [ClassicSimilarity], result of:
            0.040056884 = score(doc=3463,freq=1.0), product of:
              0.10509368 = queryWeight, product of:
                1.0077709 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.017099926 = queryNorm
              0.3811541 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.05056592 = weight(abstract_txt:speed in 3463) [ClassicSimilarity], result of:
            0.05056592 = score(doc=3463,freq=1.0), product of:
              0.122752525 = queryWeight, product of:
                1.0891526 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.017099926 = queryNorm
              0.41193387 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.07902314 = weight(abstract_txt:document in 3463) [ClassicSimilarity], result of:
            0.07902314 = score(doc=3463,freq=2.0), product of:
              0.20827541 = queryWeight, product of:
                2.8374126 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017099926 = queryNorm
              0.37941656 = fieldWeight in 3463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.5612026 = weight(abstract_txt:clustering in 3463) [ClassicSimilarity], result of:
            0.5612026 = score(doc=3463,freq=7.0), product of:
              0.5459618 = queryWeight, product of:
                5.136173 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017099926 = queryNorm
              1.0279155 = fieldWeight in 3463, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
        0.16 = coord(4/25)
    
  5. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.12
    0.11671565 = sum of:
      0.11671565 = product of:
        0.7294728 = sum of:
          0.0632074 = weight(abstract_txt:speed in 6207) [ClassicSimilarity], result of:
            0.0632074 = score(doc=6207,freq=1.0), product of:
              0.122752525 = queryWeight, product of:
                1.0891526 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.017099926 = queryNorm
              0.5149173 = fieldWeight in 6207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.03719995 = weight(abstract_txt:systems in 6207) [ClassicSimilarity], result of:
            0.03719995 = score(doc=6207,freq=2.0), product of:
              0.098683335 = queryWeight, product of:
                1.6914378 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.017099926 = queryNorm
              0.37696287 = fieldWeight in 6207, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.098778926 = weight(abstract_txt:document in 6207) [ClassicSimilarity], result of:
            0.098778926 = score(doc=6207,freq=2.0), product of:
              0.20827541 = queryWeight, product of:
                2.8374126 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017099926 = queryNorm
              0.4742707 = fieldWeight in 6207, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.53028655 = weight(abstract_txt:clustering in 6207) [ClassicSimilarity], result of:
            0.53028655 = score(doc=6207,freq=4.0), product of:
              0.5459618 = queryWeight, product of:
                5.136173 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.017099926 = queryNorm
              0.9712888 = fieldWeight in 6207, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
        0.16 = coord(4/25)
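
The content-based scores above combine, for each matching term, a queryWeight (boost * idf * queryNorm) with a fieldWeight (tf * idf * fieldNorm), sum the products, and scale the sum by coord (matched terms / query terms). As a rough check, the sketch below recomputes the score of result 1 (Cathey et al., 2007) from the boosts, document frequencies, and term frequencies printed in its explanation. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption that happens to reproduce the listed idf values; the helper names are illustrative only.

```python
import math

# Constants copied from the explanation of result 1 (doc=448).
QUERY_NORM = 0.017099926
MAX_DOCS = 44218
FIELD_NORM = 0.0625
COORD = 7 / 25  # 7 of the 25 query terms matched

# (term, boost, docFreq, termFreq) as printed in the listing.
TERMS = [
    ("operations", 1.0794291, 174, 1.0),
    ("near", 1.1531781, 111, 1.0),
    ("subset", 1.1874905, 90, 1.0),
    ("large", 1.4720757, 1397, 1.0),
    ("document", 2.8374126, 1642, 2.0),
    ("parallel", 3.179582, 196, 4.0),
    ("clustering", 5.136173, 239, 6.0),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = math.sqrt(freq) * i * FIELD_NORM
    return query_weight * field_weight


raw_sum = sum(term_score(b, df, f) for _, b, df, f in TERMS)
total = raw_sum * COORD

print(f"sum of term scores: {raw_sum:.4f}")
print(f"score after coord:  {total:.4f}")
```

Because idf enters both queryWeight and fieldWeight, each term's contribution grows with the square of its idf, which is why the rarer, heavily boosted term "clustering" dominates every score in this list.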