Document (#20302)

Author
Trybula, W.J.
Title
Data mining and knowledge discovery
Source
Annual review of information science and technology. 32(1997), S.197-229
Year
1997
Abstract
State-of-the-art review of the recently developed concepts of data mining (defined as the automated process of evaluating data and finding relationships) and knowledge discovery (defined as the automated process of extracting information, especially unpredicted relationships or previously unknown patterns among the data), with particular reference to numerical data. Includes: the knowledge acquisition process; data mining; evaluation methods; and knowledge discovery. Concludes that existing work in the field is confusing because the terminology is inconsistent and poorly defined. Although methods are available for analyzing and cleaning databases, better coordinated efforts should be directed toward providing users with improved means of structuring search mechanisms to explore the data for relationships.
Theme
Data Mining
Literature review

Similar documents (content)

  1. Benoit, G.: Data mining (2002) 0.62
    0.61939156 = sum of:
      0.61939156 = product of:
        1.2903991 = sum of:
          0.04744714 = weight(abstract_txt:previously in 297) [ClassicSimilarity], result of:
            0.04744714 = score(doc=297,freq=1.0), product of:
              0.12361145 = queryWeight, product of:
                1.0489676 = boost
                6.1414557 = idf(docFreq=252, maxDocs=43254)
                0.019187806 = queryNorm
              0.38384098 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1414557 = idf(docFreq=252, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.06868687 = weight(abstract_txt:extracting in 297) [ClassicSimilarity], result of:
            0.06868687 = score(doc=297,freq=1.0), product of:
              0.15818593 = queryWeight, product of:
                1.1866336 = boost
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.019187806 = queryNorm
              0.43421608 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.08696607 = weight(abstract_txt:inconsistent in 297) [ClassicSimilarity], result of:
            0.08696607 = score(doc=297,freq=1.0), product of:
              0.18513359 = queryWeight, product of:
                1.2837348 = boost
                7.515962 = idf(docFreq=63, maxDocs=43254)
                0.019187806 = queryNorm
              0.46974763 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.515962 = idf(docFreq=63, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.09508832 = weight(abstract_txt:poorly in 297) [ClassicSimilarity], result of:
            0.09508832 = score(doc=297,freq=1.0), product of:
              0.19648835 = queryWeight, product of:
                1.3225166 = boost
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.019187806 = queryNorm
              0.48393872 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7430196 = idf(docFreq=50, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.1043226 = weight(abstract_txt:confusing in 297) [ClassicSimilarity], result of:
            0.1043226 = score(doc=297,freq=1.0), product of:
              0.20901187 = queryWeight, product of:
                1.364012 = boost
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.019187806 = queryNorm
              0.49912286 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.029609464 = weight(abstract_txt:methods in 297) [ClassicSimilarity], result of:
            0.029609464 = score(doc=297,freq=1.0), product of:
              0.11373192 = queryWeight, product of:
                1.4229475 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019187806 = queryNorm
              0.26034436 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.041245434 = weight(abstract_txt:process in 297) [ClassicSimilarity], result of:
            0.041245434 = score(doc=297,freq=1.0), product of:
              0.16238391 = queryWeight, product of:
                2.0824034 = boost
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.019187806 = queryNorm
              0.2539995 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.07515741 = weight(abstract_txt:knowledge in 297) [ClassicSimilarity], result of:
            0.07515741 = score(doc=297,freq=4.0), product of:
              0.16797058 = queryWeight, product of:
                2.4455657 = boost
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.019187806 = queryNorm
              0.4474439 = fieldWeight in 297, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.08794279 = weight(abstract_txt:defined in 297) [ClassicSimilarity], result of:
            0.08794279 = score(doc=297,freq=1.0), product of:
              0.26900432 = queryWeight, product of:
                2.6802356 = boost
                5.230714 = idf(docFreq=628, maxDocs=43254)
                0.019187806 = queryNorm
              0.32691962 = fieldWeight in 297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.230714 = idf(docFreq=628, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.1936043 = weight(abstract_txt:discovery in 297) [ClassicSimilarity], result of:
            0.1936043 = score(doc=297,freq=3.0), product of:
              0.31564242 = queryWeight, product of:
                2.903294 = boost
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.019187806 = queryNorm
              0.61336595 = fieldWeight in 297, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.32721806 = weight(abstract_txt:mining in 297) [ClassicSimilarity], result of:
            0.32721806 = score(doc=297,freq=5.0), product of:
              0.37774083 = queryWeight, product of:
                3.1760716 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019187806 = queryNorm
              0.86625016 = fieldWeight in 297, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
          0.13311069 = weight(abstract_txt:data in 297) [ClassicSimilarity], result of:
            0.13311069 = score(doc=297,freq=6.0), product of:
              0.25884685 = queryWeight, product of:
                4.0160875 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.019187806 = queryNorm
              0.514245 = fieldWeight in 297, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.0625 = fieldNorm(doc=297)
        0.48 = coord(12/25)
    
  2. Fayyad, U.M.: Data mining and knowledge discovery : making sense out of data (1996) 0.26
    0.25555122 = sum of:
      0.25555122 = product of:
        1.0647968 = sum of:
          0.13737375 = weight(abstract_txt:extracting in 77) [ClassicSimilarity], result of:
            0.13737375 = score(doc=77,freq=1.0), product of:
              0.15818593 = queryWeight, product of:
                1.1866336 = boost
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.019187806 = queryNorm
              0.86843216 = fieldWeight in 77, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9474573 = idf(docFreq=112, maxDocs=43254)
                0.125 = fieldNorm(doc=77)
          0.1166597 = weight(abstract_txt:process in 77) [ClassicSimilarity], result of:
            0.1166597 = score(doc=77,freq=2.0), product of:
              0.16238391 = queryWeight, product of:
                2.0824034 = boost
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.019187806 = queryNorm
              0.7184191 = fieldWeight in 77, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.125 = fieldNorm(doc=77)
          0.10628863 = weight(abstract_txt:knowledge in 77) [ClassicSimilarity], result of:
            0.10628863 = score(doc=77,freq=2.0), product of:
              0.16797058 = queryWeight, product of:
                2.4455657 = boost
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.019187806 = queryNorm
              0.6327812 = fieldWeight in 77, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.125 = fieldNorm(doc=77)
          0.223555 = weight(abstract_txt:discovery in 77) [ClassicSimilarity], result of:
            0.223555 = score(doc=77,freq=1.0), product of:
              0.31564242 = queryWeight, product of:
                2.903294 = boost
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.019187806 = queryNorm
              0.708254 = fieldWeight in 77, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.125 = fieldNorm(doc=77)
          0.29267272 = weight(abstract_txt:mining in 77) [ClassicSimilarity], result of:
            0.29267272 = score(doc=77,freq=1.0), product of:
              0.37774083 = queryWeight, product of:
                3.1760716 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019187806 = queryNorm
              0.7747977 = fieldWeight in 77, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.125 = fieldNorm(doc=77)
          0.18824692 = weight(abstract_txt:data in 77) [ClassicSimilarity], result of:
            0.18824692 = score(doc=77,freq=3.0), product of:
              0.25884685 = queryWeight, product of:
                4.0160875 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.019187806 = queryNorm
              0.7272521 = fieldWeight in 77, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.125 = fieldNorm(doc=77)
        0.24 = coord(6/25)
    
  3. Berry, M.W.; Esau, R.; Kiefer, B.: The use of text mining techniques in electronic discovery for legal matters (2012) 0.22
    0.22225 = sum of:
      0.22225 = product of:
        0.79375 = sum of:
          0.0685583 = weight(abstract_txt:analyzing in 1556) [ClassicSimilarity], result of:
            0.0685583 = score(doc=1556,freq=1.0), product of:
              0.120567754 = queryWeight, product of:
                1.0359727 = boost
                6.0653734 = idf(docFreq=272, maxDocs=43254)
                0.019187806 = queryNorm
              0.5686288 = fieldWeight in 1556, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0653734 = idf(docFreq=272, maxDocs=43254)
                0.09375 = fieldNorm(doc=1556)
          0.0711707 = weight(abstract_txt:previously in 1556) [ClassicSimilarity], result of:
            0.0711707 = score(doc=1556,freq=1.0), product of:
              0.12361145 = queryWeight, product of:
                1.0489676 = boost
                6.1414557 = idf(docFreq=252, maxDocs=43254)
                0.019187806 = queryNorm
              0.57576144 = fieldWeight in 1556, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1414557 = idf(docFreq=252, maxDocs=43254)
                0.09375 = fieldNorm(doc=1556)
          0.044414192 = weight(abstract_txt:methods in 1556) [ClassicSimilarity], result of:
            0.044414192 = score(doc=1556,freq=1.0), product of:
              0.11373192 = queryWeight, product of:
                1.4229475 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019187806 = queryNorm
              0.39051652 = fieldWeight in 1556, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.09375 = fieldNorm(doc=1556)
          0.10715878 = weight(abstract_txt:process in 1556) [ClassicSimilarity], result of:
            0.10715878 = score(doc=1556,freq=3.0), product of:
              0.16238391 = queryWeight, product of:
                2.0824034 = boost
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.019187806 = queryNorm
              0.6599101 = fieldWeight in 1556, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.09375 = fieldNorm(doc=1556)
          0.16766626 = weight(abstract_txt:discovery in 1556) [ClassicSimilarity], result of:
            0.16766626 = score(doc=1556,freq=1.0), product of:
              0.31564242 = queryWeight, product of:
                2.903294 = boost
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.019187806 = queryNorm
              0.5311905 = fieldWeight in 1556, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.09375 = fieldNorm(doc=1556)
          0.21950454 = weight(abstract_txt:mining in 1556) [ClassicSimilarity], result of:
            0.21950454 = score(doc=1556,freq=1.0), product of:
              0.37774083 = queryWeight, product of:
                3.1760716 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019187806 = queryNorm
              0.58109826 = fieldWeight in 1556, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.09375 = fieldNorm(doc=1556)
          0.11527722 = weight(abstract_txt:data in 1556) [ClassicSimilarity], result of:
            0.11527722 = score(doc=1556,freq=2.0), product of:
              0.25884685 = queryWeight, product of:
                4.0160875 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.019187806 = queryNorm
              0.44534916 = fieldWeight in 1556, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.09375 = fieldNorm(doc=1556)
        0.28 = coord(7/25)
    
  4. Raghavan, V.V.; Deogun, J.S.; Sever, H.: Knowledge discovery and data mining : introduction (1998) 0.21
    0.20787649 = sum of:
      0.20787649 = product of:
        1.0393825 = sum of:
          0.087494776 = weight(abstract_txt:process in 4900) [ClassicSimilarity], result of:
            0.087494776 = score(doc=4900,freq=2.0), product of:
              0.16238391 = queryWeight, product of:
                2.0824034 = boost
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.019187806 = queryNorm
              0.5388143 = fieldWeight in 4900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.09375 = fieldNorm(doc=4900)
          0.11273611 = weight(abstract_txt:knowledge in 4900) [ClassicSimilarity], result of:
            0.11273611 = score(doc=4900,freq=4.0), product of:
              0.16797058 = queryWeight, product of:
                2.4455657 = boost
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.019187806 = queryNorm
              0.6711658 = fieldWeight in 4900, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.09375 = fieldNorm(doc=4900)
          0.23711587 = weight(abstract_txt:discovery in 4900) [ClassicSimilarity], result of:
            0.23711587 = score(doc=4900,freq=2.0), product of:
              0.31564242 = queryWeight, product of:
                2.903294 = boost
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.019187806 = queryNorm
              0.75121677 = fieldWeight in 4900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.09375 = fieldNorm(doc=4900)
          0.43900907 = weight(abstract_txt:mining in 4900) [ClassicSimilarity], result of:
            0.43900907 = score(doc=4900,freq=4.0), product of:
              0.37774083 = queryWeight, product of:
                3.1760716 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019187806 = queryNorm
              1.1621965 = fieldWeight in 4900, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.09375 = fieldNorm(doc=4900)
          0.16302663 = weight(abstract_txt:data in 4900) [ClassicSimilarity], result of:
            0.16302663 = score(doc=4900,freq=4.0), product of:
              0.25884685 = queryWeight, product of:
                4.0160875 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.019187806 = queryNorm
              0.62981886 = fieldWeight in 4900, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.09375 = fieldNorm(doc=4900)
        0.2 = coord(5/25)
    
  5. Bath, P.A.: Data mining in health and medical information (2003) 0.21
    0.20542388 = sum of:
      0.20542388 = product of:
        0.7336567 = sum of:
          0.0685583 = weight(abstract_txt:analyzing in 264) [ClassicSimilarity], result of:
            0.0685583 = score(doc=264,freq=1.0), product of:
              0.120567754 = queryWeight, product of:
                1.0359727 = boost
                6.0653734 = idf(docFreq=272, maxDocs=43254)
                0.019187806 = queryNorm
              0.5686288 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0653734 = idf(docFreq=272, maxDocs=43254)
                0.09375 = fieldNorm(doc=264)
          0.044414192 = weight(abstract_txt:methods in 264) [ClassicSimilarity], result of:
            0.044414192 = score(doc=264,freq=1.0), product of:
              0.11373192 = queryWeight, product of:
                1.4229475 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019187806 = queryNorm
              0.39051652 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.09375 = fieldNorm(doc=264)
          0.061868154 = weight(abstract_txt:process in 264) [ClassicSimilarity], result of:
            0.061868154 = score(doc=264,freq=1.0), product of:
              0.16238391 = queryWeight, product of:
                2.0824034 = boost
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.019187806 = queryNorm
              0.38099927 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.063992 = idf(docFreq=2019, maxDocs=43254)
                0.09375 = fieldNorm(doc=264)
          0.056368057 = weight(abstract_txt:knowledge in 264) [ClassicSimilarity], result of:
            0.056368057 = score(doc=264,freq=1.0), product of:
              0.16797058 = queryWeight, product of:
                2.4455657 = boost
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.019187806 = queryNorm
              0.3355829 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5795512 = idf(docFreq=3278, maxDocs=43254)
                0.09375 = fieldNorm(doc=264)
          0.16766626 = weight(abstract_txt:discovery in 264) [ClassicSimilarity], result of:
            0.16766626 = score(doc=264,freq=1.0), product of:
              0.31564242 = queryWeight, product of:
                2.903294 = boost
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.019187806 = queryNorm
              0.5311905 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.666032 = idf(docFreq=406, maxDocs=43254)
                0.09375 = fieldNorm(doc=264)
          0.21950454 = weight(abstract_txt:mining in 264) [ClassicSimilarity], result of:
            0.21950454 = score(doc=264,freq=1.0), product of:
              0.37774083 = queryWeight, product of:
                3.1760716 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019187806 = queryNorm
              0.58109826 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.09375 = fieldNorm(doc=264)
          0.11527722 = weight(abstract_txt:data in 264) [ClassicSimilarity], result of:
            0.11527722 = score(doc=264,freq=2.0), product of:
              0.25884685 = queryWeight, product of:
                4.0160875 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.019187806 = queryNorm
              0.44534916 = fieldWeight in 264, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.09375 = fieldNorm(doc=264)
        0.28 = coord(7/25)
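
The score trees above all follow the same arithmetic, which can be sketched as below. This is a minimal illustration, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), coord = matched terms / query terms). The function names `idf`, `term_score`, and `document_score` are hypothetical helpers, not part of Lucene; boost, queryNorm, and fieldNorm are taken as given from the listing rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency as printed in the "idf(docFreq=..., maxDocs=...)" lines.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # One "weight(abstract_txt:term in doc)" node: queryWeight * fieldWeight.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm           # "queryWeight, product of"
    field_weight = math.sqrt(freq) * term_idf * field_norm  # "fieldWeight in doc, product of"
    return query_weight * field_weight


def document_score(term_scores, total_query_terms):
    # Top-level node: sum of matching term scores, scaled by coord(matched/total).
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# Term "mining" in doc 297, using the values shown in the first tree.
mining_297 = term_score(
    freq=5.0,
    doc_freq=238,
    max_docs=43254,
    boost=3.1760716,
    query_norm=0.019187806,
    field_norm=0.0625,
)
print(round(mining_297, 4))
```

Feeding the twelve per-term scores listed for document 297 into `document_score(..., 25)` gives roughly the 0.62 shown for the first result. Small differences in the last digits are expected, because Lucene computes these values in single precision.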