Document (#19663)

Author
Ruocco, A.S.
Frieder, O.
Title
Clustering and classification of large document bases in a parallel environment
Source
Journal of the American Society for Information Science. 48(1997) no.10, pp.932-943
Year
1997
Abstract
Proposes the use of parallel computing systems to overcome the computational cost of the clustering process. Examines two operations: clustering a document set and classifying the document set. Uses a subset of the TIPSTER corpus, specifically articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The parallel systems ran faster than the sequential systems and produced the same clustering and classification scheme. Results show near-linear speedup in higher-threshold clustering applications.
Theme
Automatic classification

Similar documents (author)

  1. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (1998) 4.45
    4.445151 = sum of:
      4.445151 = weight(author_txt:frieder in 3183) [ClassicSimilarity], result of:
        4.445151 = fieldWeight in 3183, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.890302 = idf(docFreq=15, maxDocs=42740)
          0.5 = fieldNorm(doc=3183)
    
  2. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (2004) 4.45
    4.445151 = sum of:
      4.445151 = weight(author_txt:frieder in 3487) [ClassicSimilarity], result of:
        4.445151 = fieldWeight in 3487, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.890302 = idf(docFreq=15, maxDocs=42740)
          0.5 = fieldNorm(doc=3487)
    
  3. Soo, J.; Frieder, O.: On searching misspelled collections (2015) 4.45
    4.445151 = sum of:
      4.445151 = weight(author_txt:frieder in 3863) [ClassicSimilarity], result of:
        4.445151 = fieldWeight in 3863, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.890302 = idf(docFreq=15, maxDocs=42740)
          0.5 = fieldNorm(doc=3863)
    
  4. Aljlayl, M.; Frieder, O.; Grossman, D.: On bidirectional English-Arabic search (2002) 3.33
    3.3338633 = sum of:
      3.3338633 = weight(author_txt:frieder in 228) [ClassicSimilarity], result of:
        3.3338633 = fieldWeight in 228, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.890302 = idf(docFreq=15, maxDocs=42740)
          0.375 = fieldNorm(doc=228)
    
  5. Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 3.33
    3.3338633 = sum of:
      3.3338633 = weight(author_txt:frieder in 4381) [ClassicSimilarity], result of:
        3.3338633 = fieldWeight in 4381, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.890302 = idf(docFreq=15, maxDocs=42740)
          0.375 = fieldNorm(doc=4381)
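
The author-match scores above all follow one pattern: fieldWeight = tf x idf x fieldNorm. The sketch below recomputes them in Python. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), a form chosen because it reproduces the listed idf of 8.890302; the function names are hypothetical and are not part of any library API.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor, assumed to be the square root of the raw count."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, mirroring the explain tree above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author term "frieder", docFreq=15, maxDocs=42740.
idf_frieder = classic_idf(15, 42740)
w_half = field_weight(1.0, 15, 42740, 0.5)    # entries 1-3 (fieldNorm=0.5)
w_375 = field_weight(1.0, 15, 42740, 0.375)   # entries 4-5 (fieldNorm=0.375)

print(f"idf={idf_frieder:.4f} w_half={w_half:.4f} w_375={w_375:.4f}")
```

Under these assumptions, the only factor that differs between the 4.45 and 3.33 groups is fieldNorm (0.5 versus 0.375), which the explain trees list per document.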
    

Similar documents (content)

  1. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.30
    0.30294773 = sum of:
      0.30294773 = product of:
        1.0819561 = sum of:
          0.049227532 = weight(abstract_txt:operations in 2449) [ClassicSimilarity], result of:
            0.049227532 = score(doc=2449,freq=1.0), product of:
              0.120454095 = queryWeight, product of:
                1.0755885 = boost
                6.5389266 = idf(docFreq=167, maxDocs=42740)
                0.017126514 = queryNorm
              0.4086829 = fieldWeight in 2449, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5389266 = idf(docFreq=167, maxDocs=42740)
                0.0625 = fieldNorm(doc=2449)
          0.059895836 = weight(abstract_txt:near in 2449) [ClassicSimilarity], result of:
            0.059895836 = score(doc=2449,freq=1.0), product of:
              0.13728212 = queryWeight, product of:
                1.1482656 = boost
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.017126514 = queryNorm
              0.43629745 = fieldWeight in 2449, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.0625 = fieldNorm(doc=2449)
          0.06660046 = weight(abstract_txt:subset in 2449) [ClassicSimilarity], result of:
            0.06660046 = score(doc=2449,freq=1.0), product of:
              0.14734463 = queryWeight, product of:
                1.1896043 = boost
                7.232074 = idf(docFreq=83, maxDocs=42740)
                0.017126514 = queryNorm
              0.4520046 = fieldWeight in 2449, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.232074 = idf(docFreq=83, maxDocs=42740)
                0.0625 = fieldNorm(doc=2449)
          0.031418864 = weight(abstract_txt:large in 2449) [ClassicSimilarity], result of:
            0.031418864 = score(doc=2449,freq=1.0), product of:
              0.112500176 = queryWeight, product of:
                1.4700326 = boost
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.017126514 = queryNorm
              0.27927837 = fieldWeight in 2449, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.0625 = fieldNorm(doc=2449)
          0.078131855 = weight(abstract_txt:document in 2449) [ClassicSimilarity], result of:
            0.078131855 = score(doc=2449,freq=2.0), product of:
              0.206496 = queryWeight, product of:
                2.816574 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.017126514 = queryNorm
              0.37836984 = fieldWeight in 2449, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=2449)
          0.27763718 = weight(abstract_txt:parallel in 2449) [ClassicSimilarity], result of:
            0.27763718 = score(doc=2449,freq=4.0), product of:
              0.3467542 = queryWeight, product of:
                3.1608715 = boost
                6.405395 = idf(docFreq=191, maxDocs=42740)
                0.017126514 = queryNorm
              0.8006744 = fieldWeight in 2449, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.405395 = idf(docFreq=191, maxDocs=42740)
                0.0625 = fieldNorm(doc=2449)
          0.51904434 = weight(abstract_txt:clustering in 2449) [ClassicSimilarity], result of:
            0.51904434 = score(doc=2449,freq=6.0), product of:
              0.54503626 = queryWeight, product of:
                5.1160297 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.017126514 = queryNorm
              0.9523116 = fieldWeight in 2449, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=2449)
        0.28 = coord(7/25)
    
  2. Rooney, N.; Patterson, D.; Galushka, M.; Dobrynin, V.; Smirnova, E.: An investigation into the stability of contextual document clustering (2008) 0.14
    0.13514827 = sum of:
      0.13514827 = product of:
        0.6757413 = sum of:
          0.039561216 = weight(abstract_txt:times in 3357) [ClassicSimilarity], result of:
            0.039561216 = score(doc=3357,freq=1.0), product of:
              0.10411883 = queryWeight, product of:
                6.0793943 = idf(docFreq=265, maxDocs=42740)
                0.017126514 = queryNorm
              0.37996215 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0793943 = idf(docFreq=265, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.057454042 = weight(abstract_txt:corpus in 3357) [ClassicSimilarity], result of:
            0.057454042 = score(doc=3357,freq=2.0), product of:
              0.105979025 = queryWeight, product of:
                1.0088935 = boost
                6.1334615 = idf(docFreq=251, maxDocs=42740)
                0.017126514 = queryNorm
              0.54212654 = fieldWeight in 3357, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1334615 = idf(docFreq=251, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.044432983 = weight(abstract_txt:large in 3357) [ClassicSimilarity], result of:
            0.044432983 = score(doc=3357,freq=2.0), product of:
              0.112500176 = queryWeight, product of:
                1.4700326 = boost
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.017126514 = queryNorm
              0.39495924 = fieldWeight in 3357, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.110495135 = weight(abstract_txt:document in 3357) [ClassicSimilarity], result of:
            0.110495135 = score(doc=3357,freq=4.0), product of:
              0.206496 = queryWeight, product of:
                2.816574 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.017126514 = queryNorm
              0.53509575 = fieldWeight in 3357, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.4237979 = weight(abstract_txt:clustering in 3357) [ClassicSimilarity], result of:
            0.4237979 = score(doc=3357,freq=4.0), product of:
              0.54503626 = queryWeight, product of:
                5.1160297 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.017126514 = queryNorm
              0.7775591 = fieldWeight in 3357, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
        0.2 = coord(5/25)
    
  3. Mather, L.A.: A linear algebra measure of cluster quality (2000) 0.12
    0.12026297 = sum of:
      0.12026297 = product of:
        0.7516436 = sum of:
          0.07560551 = weight(abstract_txt:linear in 5768) [ClassicSimilarity], result of:
            0.07560551 = score(doc=5768,freq=2.0), product of:
              0.12726486 = queryWeight, product of:
                1.1055785 = boost
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.017126514 = queryNorm
              0.59408003 = fieldWeight in 5768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.0625 = fieldNorm(doc=5768)
          0.12870288 = weight(abstract_txt:matrices in 5768) [ClassicSimilarity], result of:
            0.12870288 = score(doc=5768,freq=2.0), product of:
              0.18143944 = queryWeight, product of:
                1.3200829 = boost
                8.025305 = idf(docFreq=37, maxDocs=42740)
                0.017126514 = queryNorm
              0.70934343 = fieldWeight in 5768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.025305 = idf(docFreq=37, maxDocs=42740)
                0.0625 = fieldNorm(doc=5768)
          0.12353731 = weight(abstract_txt:document in 5768) [ClassicSimilarity], result of:
            0.12353731 = score(doc=5768,freq=5.0), product of:
              0.206496 = queryWeight, product of:
                2.816574 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.017126514 = queryNorm
              0.5982552 = fieldWeight in 5768, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=5768)
          0.4237979 = weight(abstract_txt:clustering in 5768) [ClassicSimilarity], result of:
            0.4237979 = score(doc=5768,freq=4.0), product of:
              0.54503626 = queryWeight, product of:
                5.1160297 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.017126514 = queryNorm
              0.7775591 = fieldWeight in 5768, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=5768)
        0.16 = coord(4/25)
    
  4. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.12
    0.11673357 = sum of:
      0.11673357 = product of:
        0.7295849 = sum of:
          0.040626142 = weight(abstract_txt:corpus in 464) [ClassicSimilarity], result of:
            0.040626142 = score(doc=464,freq=1.0), product of:
              0.105979025 = queryWeight, product of:
                1.0088935 = boost
                6.1334615 = idf(docFreq=251, maxDocs=42740)
                0.017126514 = queryNorm
              0.38334134 = fieldWeight in 464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1334615 = idf(docFreq=251, maxDocs=42740)
                0.0625 = fieldNorm(doc=464)
          0.05019502 = weight(abstract_txt:speed in 464) [ClassicSimilarity], result of:
            0.05019502 = score(doc=464,freq=1.0), product of:
              0.122027196 = queryWeight, product of:
                1.0825891 = boost
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.017126514 = queryNorm
              0.4113429 = fieldWeight in 464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.0625 = fieldNorm(doc=464)
          0.078131855 = weight(abstract_txt:document in 464) [ClassicSimilarity], result of:
            0.078131855 = score(doc=464,freq=2.0), product of:
              0.206496 = queryWeight, product of:
                2.816574 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.017126514 = queryNorm
              0.37836984 = fieldWeight in 464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=464)
          0.5606319 = weight(abstract_txt:clustering in 464) [ClassicSimilarity], result of:
            0.5606319 = score(doc=464,freq=7.0), product of:
              0.54503626 = queryWeight, product of:
                5.1160297 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.017126514 = queryNorm
              1.0286139 = fieldWeight in 464, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=464)
        0.16 = coord(4/25)
    
  5. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.12
    0.116373114 = sum of:
      0.116373114 = product of:
        0.727332 = sum of:
          0.062743776 = weight(abstract_txt:speed in 208) [ClassicSimilarity], result of:
            0.062743776 = score(doc=208,freq=1.0), product of:
              0.122027196 = queryWeight, product of:
                1.0825891 = boost
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.017126514 = queryNorm
              0.51417863 = fieldWeight in 208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
          0.03717605 = weight(abstract_txt:systems in 208) [ClassicSimilarity], result of:
            0.03717605 = score(doc=208,freq=2.0), product of:
              0.098540656 = queryWeight, product of:
                1.6850147 = boost
                3.414623 = idf(docFreq=3820, maxDocs=42740)
                0.017126514 = queryNorm
              0.3772661 = fieldWeight in 208, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.414623 = idf(docFreq=3820, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
          0.09766482 = weight(abstract_txt:document in 208) [ClassicSimilarity], result of:
            0.09766482 = score(doc=208,freq=2.0), product of:
              0.206496 = queryWeight, product of:
                2.816574 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.017126514 = queryNorm
              0.4729623 = fieldWeight in 208, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
          0.52974737 = weight(abstract_txt:clustering in 208) [ClassicSimilarity], result of:
            0.52974737 = score(doc=208,freq=4.0), product of:
              0.54503626 = queryWeight, product of:
                5.1160297 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.017126514 = queryNorm
              0.97194886 = fieldWeight in 208, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
        0.16 = coord(4/25)
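
The content-match trees above combine three ingredients per term (queryWeight = boost x idf x queryNorm, fieldWeight = tf x idf x fieldNorm, and their product), then multiply the sum over matched terms by coord = matched terms / query terms. The Python sketch below recomputes the first result (doc 2449) from the values in its tree. The idf and tf forms are assumptions that happen to reproduce the listed numbers, and the helper names are hypothetical.

```python
import math

# Constants copied from the explain tree for doc 2449.
MAX_DOCS = 42740
QUERY_NORM = 0.017126514
FIELD_NORM = 0.0625
QUERY_TERM_COUNT = 25  # denominator of coord(7/25)

# (term, boost, docFreq, freq in doc 2449), copied from the listing.
TERMS = [
    ("operations", 1.0755885, 167, 1.0),
    ("near", 1.1482656, 107, 1.0),
    ("subset", 1.1896043, 83, 1.0),
    ("large", 1.4700326, 1331, 1.0),
    ("document", 2.816574, 1606, 2.0),
    ("parallel", 3.1608715, 191, 4.0),
    ("clustering", 5.1160297, 230, 6.0),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """score = queryWeight * fieldWeight for one matched term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


# Sum the per-term scores, then scale by the fraction of query terms matched.
matched_sum = sum(term_score(b, df, f) for _, b, df, f in TERMS)
coord = len(TERMS) / QUERY_TERM_COUNT
total = coord * matched_sum

print(f"sum={matched_sum:.4f} coord={coord:.2f} total={total:.4f}")
```

Because idf enters both queryWeight and fieldWeight, each term's contribution scales with idf squared, which is why the comparatively rare term "clustering" dominates every result in this list.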