Document (#20913)

Author
Wu, X.
Title
Rule induction with extension matrices
Source
Journal of the American Society for Information Science. 49(1998) no.5, pp.435-454
Year
1998
Abstract
Presents HCV (Version 2.0), a heuristic, attribute-based, noise-tolerant data mining program based on the newly developed extension matrix approach. Gives a simple example of attribute-based induction to show the difference between the rules in variable-valued logic produced by HCV, the decision tree generated by C4.5, and the rules decompiled from that tree by C4.5rules. Outlines the extension matrix approach for data mining. Describes the HCV algorithm in detail. Outlines the techniques developed and implemented in the HCV program for noise handling and for discretization of continuous domains. Follows these with a performance comparison of HCV with well-known ID3-like algorithms, including C4.5 and C4.5rules, on a collection of standard databases, including the MONK's problems.
Footnote
Contribution to a special issue devoted to knowledge discovery and data mining
Theme
Data Mining

Similar documents (content)

  1. Yang, H.; King, I.; Lyu, M.R.: The generalized dependency degree between attributes (2007) 0.22
    0.21966977 = sum of:
      0.21966977 = product of:
        0.9152908 = sum of:
          0.03547898 = weight(abstract_txt:tree in 1322) [ClassicSimilarity], result of:
            0.03547898 = score(doc=1322,freq=1.0), product of:
              0.082932875 = queryWeight, product of:
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012116087 = queryNorm
              0.42780355 = fieldWeight in 1322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=1322)
          0.008979269 = weight(abstract_txt:with in 1322) [ClassicSimilarity], result of:
            0.008979269 = score(doc=1322,freq=3.0), product of:
              0.033182316 = queryWeight, product of:
                1.0955964 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012116087 = queryNorm
              0.27060407 = fieldWeight in 1322, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=1322)
          0.039020736 = weight(abstract_txt:decision in 1322) [ClassicSimilarity], result of:
            0.039020736 = score(doc=1322,freq=1.0), product of:
              0.1113319 = queryWeight, product of:
                1.6385566 = boost
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.012116087 = queryNorm
              0.35049015 = fieldWeight in 1322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.0625 = fieldNorm(doc=1322)
          0.113541715 = weight(abstract_txt:attribute in 1322) [ClassicSimilarity], result of:
            0.113541715 = score(doc=1322,freq=2.0), product of:
              0.18010107 = queryWeight, product of:
                2.0840578 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.012116087 = queryNorm
              0.6304333 = fieldWeight in 1322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=1322)
          0.110088535 = weight(abstract_txt:rules in 1322) [ClassicSimilarity], result of:
            0.110088535 = score(doc=1322,freq=3.0), product of:
              0.19418707 = queryWeight, product of:
                3.06039 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.012116087 = queryNorm
              0.56692 = fieldWeight in 1322, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.0625 = fieldNorm(doc=1322)
          0.60818154 = weight(abstract_txt:c4.5 in 1322) [ClassicSimilarity], result of:
            0.60818154 = score(doc=1322,freq=2.0), product of:
              0.69466937 = queryWeight, product of:
                5.788362 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.012116087 = queryNorm
              0.8754978 = fieldWeight in 1322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=1322)
        0.24 = coord(6/25)
    
  2. Tang, X.-B.; Liu, G.-C.; Yang, J.; Wei, W.: Knowledge-based financial statement fraud detection system : based on an ontology and a decision tree (2018) 0.16
    0.15673566 = sum of:
      0.15673566 = product of:
        0.7836783 = sum of:
          0.044348724 = weight(abstract_txt:tree in 4306) [ClassicSimilarity], result of:
            0.044348724 = score(doc=4306,freq=1.0), product of:
              0.082932875 = queryWeight, product of:
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.012116087 = queryNorm
              0.53475446 = fieldWeight in 4306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.078125 = fieldNorm(doc=4306)
          0.020429792 = weight(abstract_txt:developed in 4306) [ClassicSimilarity], result of:
            0.020429792 = score(doc=4306,freq=1.0), product of:
              0.062324475 = queryWeight, product of:
                1.2259731 = boost
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.012116087 = queryNorm
              0.32779726 = fieldWeight in 4306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.078125 = fieldNorm(doc=4306)
          0.06897956 = weight(abstract_txt:decision in 4306) [ClassicSimilarity], result of:
            0.06897956 = score(doc=4306,freq=2.0), product of:
              0.1113319 = queryWeight, product of:
                1.6385566 = boost
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.012116087 = queryNorm
              0.61958486 = fieldWeight in 4306, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.078125 = fieldNorm(doc=4306)
          0.112358645 = weight(abstract_txt:rules in 4306) [ClassicSimilarity], result of:
            0.112358645 = score(doc=4306,freq=2.0), product of:
              0.19418707 = queryWeight, product of:
                3.06039 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.012116087 = queryNorm
              0.5786103 = fieldWeight in 4306, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.078125 = fieldNorm(doc=4306)
          0.5375616 = weight(abstract_txt:c4.5 in 4306) [ClassicSimilarity], result of:
            0.5375616 = score(doc=4306,freq=1.0), product of:
              0.69466937 = queryWeight, product of:
                5.788362 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.012116087 = queryNorm
              0.7738381 = fieldWeight in 4306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=4306)
        0.2 = coord(5/25)
    
  3. Fletcher, G.P.; Hinde, C.J.: Using a neural network as a tool for constructing rule based systems (1995) 0.09
    0.091620326 = sum of:
      0.091620326 = product of:
        0.57262707 = sum of:
          0.009072322 = weight(abstract_txt:with in 3214) [ClassicSimilarity], result of:
            0.009072322 = score(doc=3214,freq=1.0), product of:
              0.033182316 = queryWeight, product of:
                1.0955964 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012116087 = queryNorm
              0.27340835 = fieldWeight in 3214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.109375 = fieldNorm(doc=3214)
          0.18268345 = weight(abstract_txt:noise in 3214) [ClassicSimilarity], result of:
            0.18268345 = score(doc=3214,freq=1.0), product of:
              0.2145508 = queryWeight, product of:
                2.274661 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.012116087 = queryNorm
              0.8514695 = fieldWeight in 3214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.109375 = fieldNorm(doc=3214)
          0.2696419 = weight(abstract_txt:induction in 3214) [ClassicSimilarity], result of:
            0.2696419 = score(doc=3214,freq=1.0), product of:
              0.27813494 = queryWeight, product of:
                2.5898786 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.012116087 = queryNorm
              0.96946436 = fieldWeight in 3214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.109375 = fieldNorm(doc=3214)
          0.11122938 = weight(abstract_txt:rules in 3214) [ClassicSimilarity], result of:
            0.11122938 = score(doc=3214,freq=1.0), product of:
              0.19418707 = queryWeight, product of:
                3.06039 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.012116087 = queryNorm
              0.572795 = fieldWeight in 3214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.109375 = fieldNorm(doc=3214)
        0.16 = coord(4/25)
    
  4. Kolluri, V.; Metzler, D.P.: Knowledge guided rule learning (1999) 0.09
    0.090294145 = sum of:
      0.090294145 = product of:
        0.3224791 = sum of:
          0.042209953 = weight(abstract_txt:continuous in 6550) [ClassicSimilarity], result of:
            0.042209953 = score(doc=6550,freq=2.0), product of:
              0.08953065 = queryWeight, product of:
                1.0390166 = boost
                7.11192 = idf(docFreq=97, maxDocs=44218)
                0.012116087 = queryNorm
              0.47145814 = fieldWeight in 6550, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.11192 = idf(docFreq=97, maxDocs=44218)
                0.046875 = fieldNorm(doc=6550)
          0.0054986575 = weight(abstract_txt:with in 6550) [ClassicSimilarity], result of:
            0.0054986575 = score(doc=6550,freq=2.0), product of:
              0.033182316 = queryWeight, product of:
                1.0955964 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012116087 = queryNorm
              0.16571048 = fieldWeight in 6550, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.046875 = fieldNorm(doc=6550)
          0.012257876 = weight(abstract_txt:developed in 6550) [ClassicSimilarity], result of:
            0.012257876 = score(doc=6550,freq=1.0), product of:
              0.062324475 = queryWeight, product of:
                1.2259731 = boost
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.012116087 = queryNorm
              0.19667837 = fieldWeight in 6550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.046875 = fieldNorm(doc=6550)
          0.039081406 = weight(abstract_txt:mining in 6550) [ClassicSimilarity], result of:
            0.039081406 = score(doc=6550,freq=1.0), product of:
              0.13500875 = queryWeight, product of:
                1.804399 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.012116087 = queryNorm
              0.28947312 = fieldWeight in 6550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.046875 = fieldNorm(doc=6550)
          0.10429472 = weight(abstract_txt:attribute in 6550) [ClassicSimilarity], result of:
            0.10429472 = score(doc=6550,freq=3.0), product of:
              0.18010107 = queryWeight, product of:
                2.0840578 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.012116087 = queryNorm
              0.57908994 = fieldWeight in 6550, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.046875 = fieldNorm(doc=6550)
          0.07146675 = weight(abstract_txt:extension in 6550) [ClassicSimilarity], result of:
            0.07146675 = score(doc=6550,freq=1.0), product of:
              0.23110798 = queryWeight, product of:
                2.891377 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.012116087 = queryNorm
              0.30923533 = fieldWeight in 6550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.046875 = fieldNorm(doc=6550)
          0.04766974 = weight(abstract_txt:rules in 6550) [ClassicSimilarity], result of:
            0.04766974 = score(doc=6550,freq=1.0), product of:
              0.19418707 = queryWeight, product of:
                3.06039 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.012116087 = queryNorm
              0.24548358 = fieldWeight in 6550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.046875 = fieldNorm(doc=6550)
        0.28 = coord(7/25)
    
  5. Methodologies for knowledge discovery and data mining : Third Pacific-Asia Conference, PAKDD'99, Beijing, China, April 26-28, 1999, Proceedings (1999) 0.08
    0.083024904 = sum of:
      0.083024904 = product of:
        0.51890564 = sum of:
          0.009072322 = weight(abstract_txt:with in 3821) [ClassicSimilarity], result of:
            0.009072322 = score(doc=3821,freq=1.0), product of:
              0.033182316 = queryWeight, product of:
                1.0955964 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012116087 = queryNorm
              0.27340835 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.109375 = fieldNorm(doc=3821)
          0.12896205 = weight(abstract_txt:mining in 3821) [ClassicSimilarity], result of:
            0.12896205 = score(doc=3821,freq=2.0), product of:
              0.13500875 = queryWeight, product of:
                1.804399 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.012116087 = queryNorm
              0.95521253 = fieldWeight in 3821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.109375 = fieldNorm(doc=3821)
          0.2696419 = weight(abstract_txt:induction in 3821) [ClassicSimilarity], result of:
            0.2696419 = score(doc=3821,freq=1.0), product of:
              0.27813494 = queryWeight, product of:
                2.5898786 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.012116087 = queryNorm
              0.96946436 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.109375 = fieldNorm(doc=3821)
          0.11122938 = weight(abstract_txt:rules in 3821) [ClassicSimilarity], result of:
            0.11122938 = score(doc=3821,freq=1.0), product of:
              0.19418707 = queryWeight, product of:
                3.06039 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.012116087 = queryNorm
              0.572795 = fieldWeight in 3821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.109375 = fieldNorm(doc=3821)
        0.16 = coord(4/25)
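
The score breakdowns above follow the classic TF-IDF pattern labelled ClassicSimilarity: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matched query terms / total query terms). The sketch below recomputes the score of entry 3 (doc 3214) from the values listed there. The function names are illustrative assumptions, not part of any library. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are consistent with the figures shown (for example tf(freq=3.0) = 1.7320508). Small differences from the listed numbers are expected because the listing appears to use single-precision arithmetic.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 44218          # maxDocs
QUERY_NORM = 0.012116087  # queryNorm
QUERY_TERMS = 25          # denominator of coord(n/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float,
               boost: float = 1.0) -> float:
    """Contribution of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM              # "queryWeight" line
    field_weight = math.sqrt(freq) * term_idf * field_norm    # "fieldWeight" line
    return query_weight * field_weight


def document_score(terms, field_norm: float) -> float:
    """Sum of term contributions, scaled by coord = matched / total terms."""
    total = sum(term_score(freq, doc_freq, field_norm, boost)
                for freq, doc_freq, boost in terms)
    coord = len(terms) / QUERY_TERMS
    return coord * total


# Entry 3 (doc 3214): (termFreq, docFreq, boost) for each matching term.
entry3_terms = [
    (1.0, 9868, 1.0955964),  # with
    (1.0, 49, 2.274661),     # noise
    (1.0, 16, 2.5898786),    # induction
    (1.0, 638, 3.06039),     # rules
]
entry3_score = document_score(entry3_terms, field_norm=0.109375)
print(round(entry3_score, 4))
```

Under these assumptions the result should land close to the 0.0916 reported for entry 3. The same functions can be applied to the other entries by substituting their termFreq, docFreq, boost and fieldNorm values.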