Document (#20953)

Author
Galal, G.M.
Cook, D.J.
Holder, L.B.
Title
Exploiting parallelism in a structural scientific discovery system to improve scalability
Source
Journal of the American Society for Information Science. 50(1999) no.1, pp. 65-73
Year
1999
Abstract
The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. In particular, scientific discovery systems focus on the utilization of richer data representation, sometimes without regard for scalability. This research investigates approaches for scaling a particular knowledge discovery in databases (KDD) system, SUBDUE, using parallel and distributed resources. SUBDUE has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate scalability of parallel versions of the SUBDUE system are performed using CAD circuit databases and artificially generated databases, and potential achievements and obstacles are discussed.
Theme
Data Mining
Object
SUBDUE

Similar documents (author)

  1. Cook, M.: The management of information from archives (1999) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:cook in 6786) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 6786, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=6786)
    
  2. Cook, M.: New directions in records management (1994) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:cook in 690) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 690, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=690)
    
  3. Cook, K.: The incredible expanding OPAC (1994) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:cook in 2400) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 2400, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=2400)
    
  4. Cook, T.: Keeping our electronic memory : approaches for securing computer-generated records (1995) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:cook in 6372) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 6372, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=6372)
    
  5. Cook, M.: The International Description Standards : new departures (1996) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:cook in 7872) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 7872, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=7872)
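
The author matches above all share one pattern: the score is a single fieldWeight, the product of tf, idf and fieldNorm. The following Python sketch reproduces that arithmetic. It assumes the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values listed here; the function names are illustrative and are not taken from the catalogue software.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency (assumed)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency in the form that matches the listing (assumed)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in each explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author match on "cook": freq=1, docFreq=11,
# maxDocs=44218, fieldNorm=0.625.
author_idf = classic_idf(11, 44218)
author_score = field_weight(1.0, 11, 44218, 0.625)
print(round(author_idf, 4), round(author_score, 4))
```

Because every listed author entry has the same freq, docFreq and fieldNorm, all five receive the same score of about 5.76.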
    

Similar documents (content)

  1. Chen, Z.: Knowledge discovery and system-user partnership : on a production 'adversarial partnership' approach (1994) 0.14
    0.13721591 = sum of:
      0.13721591 = product of:
        0.68607956 = sum of:
          0.057623412 = weight(abstract_txt:potential in 6759) [ClassicSimilarity], result of:
            0.057623412 = score(doc=6759,freq=1.0), product of:
              0.11371564 = queryWeight, product of:
                1.2601473 = boost
                4.632983 = idf(docFreq=1168, maxDocs=44218)
                0.019477721 = queryNorm
              0.5067325 = fieldWeight in 6759, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.632983 = idf(docFreq=1168, maxDocs=44218)
                0.109375 = fieldNorm(doc=6759)
          0.033334196 = weight(abstract_txt:system in 6759) [ClassicSimilarity], result of:
            0.033334196 = score(doc=6759,freq=1.0), product of:
              0.09037423 = queryWeight, product of:
                1.3758756 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.019477721 = queryNorm
              0.36884624 = fieldWeight in 6759, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.109375 = fieldNorm(doc=6759)
          0.043038864 = weight(abstract_txt:data in 6759) [ClassicSimilarity], result of:
            0.043038864 = score(doc=6759,freq=1.0), product of:
              0.11794279 = queryWeight, product of:
                1.8149385 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019477721 = queryNorm
              0.36491305 = fieldWeight in 6759, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.109375 = fieldNorm(doc=6759)
          0.1408285 = weight(abstract_txt:databases in 6759) [ClassicSimilarity], result of:
            0.1408285 = score(doc=6759,freq=2.0), product of:
              0.20632313 = queryWeight, product of:
                2.4004915 = boost
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.019477721 = queryNorm
              0.6825628 = fieldWeight in 6759, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.109375 = fieldNorm(doc=6759)
          0.41125458 = weight(abstract_txt:discovery in 6759) [ClassicSimilarity], result of:
            0.41125458 = score(doc=6759,freq=4.0), product of:
              0.33456823 = queryWeight, product of:
                3.0568109 = boost
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.019477721 = queryNorm
              1.2292099 = fieldWeight in 6759, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.109375 = fieldNorm(doc=6759)
        0.2 = coord(5/25)
    
  2. Janée, G.; Frew, J.; Hill, L.L.: Issues in georeferenced digital libraries (2004) 0.11
    0.10794553 = sum of:
      0.10794553 = product of:
        0.6746596 = sum of:
          0.028572166 = weight(abstract_txt:system in 1165) [ClassicSimilarity], result of:
            0.028572166 = score(doc=1165,freq=1.0), product of:
              0.09037423 = queryWeight, product of:
                1.3758756 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.019477721 = queryNorm
              0.3161539 = fieldWeight in 1165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.09375 = fieldNorm(doc=1165)
          0.036890455 = weight(abstract_txt:data in 1165) [ClassicSimilarity], result of:
            0.036890455 = score(doc=1165,freq=1.0), product of:
              0.11794279 = queryWeight, product of:
                1.8149385 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019477721 = queryNorm
              0.31278262 = fieldWeight in 1165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=1165)
          0.24925792 = weight(abstract_txt:discovery in 1165) [ClassicSimilarity], result of:
            0.24925792 = score(doc=1165,freq=2.0), product of:
              0.33456823 = queryWeight, product of:
                3.0568109 = boost
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.019477721 = queryNorm
              0.7450137 = fieldWeight in 1165, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.09375 = fieldNorm(doc=1165)
          0.3599391 = weight(abstract_txt:scalability in 1165) [ClassicSimilarity], result of:
            0.3599391 = score(doc=1165,freq=1.0), product of:
              0.48929244 = queryWeight, product of:
                3.2014086 = boost
                7.84674 = idf(docFreq=46, maxDocs=44218)
                0.019477721 = queryNorm
              0.7356318 = fieldWeight in 1165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.84674 = idf(docFreq=46, maxDocs=44218)
                0.09375 = fieldNorm(doc=1165)
        0.16 = coord(4/25)
    
  3. Hsu, C.-N.; Chang, C.-H.; Hsieh, C.-H.; Lu, J.-J.; Chang, C.-C.: Reconfigurable Web wrapper agents for biological information integration (2005) 0.11
    0.10555724 = sum of:
      0.10555724 = product of:
        0.5277862 = sum of:
          0.08423616 = weight(abstract_txt:automate in 5263) [ClassicSimilarity], result of:
            0.08423616 = score(doc=5263,freq=1.0), product of:
              0.16882442 = queryWeight, product of:
                1.08571 = boost
                7.983315 = idf(docFreq=40, maxDocs=44218)
                0.019477721 = queryNorm
              0.4989572 = fieldWeight in 5263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.983315 = idf(docFreq=40, maxDocs=44218)
                0.0625 = fieldNorm(doc=5263)
          0.086664446 = weight(abstract_txt:overwhelming in 5263) [ClassicSimilarity], result of:
            0.086664446 = score(doc=5263,freq=1.0), product of:
              0.17205352 = queryWeight, product of:
                1.0960441 = boost
                8.059301 = idf(docFreq=37, maxDocs=44218)
                0.019477721 = queryNorm
              0.50370634 = fieldWeight in 5263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.059301 = idf(docFreq=37, maxDocs=44218)
                0.0625 = fieldNorm(doc=5263)
          0.125645 = weight(abstract_txt:discover in 5263) [ClassicSimilarity], result of:
            0.125645 = score(doc=5263,freq=2.0), product of:
              0.22039397 = queryWeight, product of:
                1.7543292 = boost
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.019477721 = queryNorm
              0.57009274 = fieldWeight in 5263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.0625 = fieldNorm(doc=5263)
          0.06506864 = weight(abstract_txt:data in 5263) [ClassicSimilarity], result of:
            0.06506864 = score(doc=5263,freq=7.0), product of:
              0.11794279 = queryWeight, product of:
                1.8149385 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019477721 = queryNorm
              0.55169666 = fieldWeight in 5263, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=5263)
          0.16617194 = weight(abstract_txt:discovery in 5263) [ClassicSimilarity], result of:
            0.16617194 = score(doc=5263,freq=2.0), product of:
              0.33456823 = queryWeight, product of:
                3.0568109 = boost
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.019477721 = queryNorm
              0.4966758 = fieldWeight in 5263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.0625 = fieldNorm(doc=5263)
        0.2 = coord(5/25)
    
  4. Networked knowledge organization systems (2001) 0.10
    0.09876255 = sum of:
      0.09876255 = product of:
        0.49381274 = sum of:
          0.0493915 = weight(abstract_txt:potential in 6473) [ClassicSimilarity], result of:
            0.0493915 = score(doc=6473,freq=1.0), product of:
              0.11371564 = queryWeight, product of:
                1.2601473 = boost
                4.632983 = idf(docFreq=1168, maxDocs=44218)
                0.019477721 = queryNorm
              0.43434218 = fieldWeight in 6473, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.632983 = idf(docFreq=1168, maxDocs=44218)
                0.09375 = fieldNorm(doc=6473)
          0.036890455 = weight(abstract_txt:data in 6473) [ClassicSimilarity], result of:
            0.036890455 = score(doc=6473,freq=1.0), product of:
              0.11794279 = queryWeight, product of:
                1.8149385 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019477721 = queryNorm
              0.31278262 = fieldWeight in 6473, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=6473)
          0.07291787 = weight(abstract_txt:approaches in 6473) [ClassicSimilarity], result of:
            0.07291787 = score(doc=6473,freq=1.0), product of:
              0.16877384 = queryWeight, product of:
                1.8802233 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.019477721 = queryNorm
              0.43204486 = fieldWeight in 6473, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.09375 = fieldNorm(doc=6473)
          0.08535497 = weight(abstract_txt:databases in 6473) [ClassicSimilarity], result of:
            0.08535497 = score(doc=6473,freq=1.0), product of:
              0.20632313 = queryWeight, product of:
                2.4004915 = boost
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.019477721 = queryNorm
              0.41369557 = fieldWeight in 6473, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.09375 = fieldNorm(doc=6473)
          0.24925792 = weight(abstract_txt:discovery in 6473) [ClassicSimilarity], result of:
            0.24925792 = score(doc=6473,freq=2.0), product of:
              0.33456823 = queryWeight, product of:
                3.0568109 = boost
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.019477721 = queryNorm
              0.7450137 = fieldWeight in 6473, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.09375 = fieldNorm(doc=6473)
        0.2 = coord(5/25)
    
  5. Barrio, P.; Gravano, L.: Sampling strategies for information extraction over the deep web (2017) 0.09
    0.08978112 = sum of:
      0.08978112 = product of:
        0.37408802 = sum of:
          0.081447914 = weight(abstract_txt:expensive in 3412) [ClassicSimilarity], result of:
            0.081447914 = score(doc=3412,freq=2.0), product of:
              0.14322127 = queryWeight, product of:
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.019477721 = queryNorm
              0.5686859 = fieldWeight in 3412, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.058212984 = weight(abstract_txt:richer in 3412) [ClassicSimilarity], result of:
            0.058212984 = score(doc=3412,freq=1.0), product of:
              0.14424832 = queryWeight, product of:
                1.0035791 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.019477721 = queryNorm
              0.4035609 = fieldWeight in 3412, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.08371977 = weight(abstract_txt:computationally in 3412) [ClassicSimilarity], result of:
            0.08371977 = score(doc=3412,freq=1.0), product of:
              0.1837876 = queryWeight, product of:
                1.1328028 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.019477721 = queryNorm
              0.45552456 = fieldWeight in 3412, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.07773887 = weight(abstract_txt:discover in 3412) [ClassicSimilarity], result of:
            0.07773887 = score(doc=3412,freq=1.0), product of:
              0.22039397 = queryWeight, product of:
                1.7543292 = boost
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.019477721 = queryNorm
              0.35272688 = fieldWeight in 3412, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.030433072 = weight(abstract_txt:data in 3412) [ClassicSimilarity], result of:
            0.030433072 = score(doc=3412,freq=2.0), product of:
              0.11794279 = queryWeight, product of:
                1.8149385 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019477721 = queryNorm
              0.2580325 = fieldWeight in 3412, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.042535424 = weight(abstract_txt:approaches in 3412) [ClassicSimilarity], result of:
            0.042535424 = score(doc=3412,freq=1.0), product of:
              0.16877384 = queryWeight, product of:
                1.8802233 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.019477721 = queryNorm
              0.25202617 = fieldWeight in 3412, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
        0.24 = coord(6/25)
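
The content matches add two query-side factors to the fieldWeight: each term's queryWeight (boost * idf * queryNorm) and a final coord factor (matched terms / query terms). The sketch below recomputes the total for entry 1 (doc 6759) from the boosts, document frequencies and term frequencies listed in its explain tree. It assumes the same tf and idf formulas that reproduce the listed values; the constant and function names are hypothetical.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.019477721
FIELD_NORM = 0.109375   # fieldNorm(doc=6759)
QUERY_TERMS = 25        # denominator of coord(5/25)

# (term, boost, docFreq, freq in doc 6759), copied from the explain output.
TERMS = [
    ("potential", 1.2601473, 1168, 1.0),
    ("system",    1.3758756, 4123, 1.0),
    ("data",      1.8149385, 4274, 1.0),
    ("databases", 2.4004915, 1456, 2.0),
    ("discovery", 3.0568109,  435, 4.0),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


raw_sum = sum(term_score(boost, df, freq) for _, boost, df, freq in TERMS)
coord = len(TERMS) / QUERY_TERMS   # 5 of 25 query terms matched
total = raw_sum * coord
print(round(raw_sum, 4), round(total, 4))
```

The coord factor explains why a document matching only a few of the 25 query terms ends up with a small final score even when individual term weights are high.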