Document (#32910)

Author
Yoon, Y.
Lee, G.G.
Title
Efficient implementation of associative classifiers for document classification
Source
Information processing and management. 43(2007) no.2, pp. 393-405
Year
2007
Abstract
In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. Associative classifiers have many favorable characteristics such as rapid training, good classification accuracy, and excellent interpretability. However, associative classifiers also have some obstacles to overcome when they are applied in the area of text classification. The target text collection generally has a very high dimension, so the training process might take a very long time. We propose a feature selection based on the mutual information between the word and class variables to reduce the space dimension of the associative classifiers. In addition, the training process of the associative classifier produces a huge number of classification rules, which makes the prediction with a new document ineffective. We resolve this by introducing a new efficient method for storing and pruning classification rules. This method can also be used when predicting a test document. Experimental results using the 20-newsgroups dataset show many benefits of the associative classification in both training and predicting when applied to a real-world problem.
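The feature selection step named in the abstract can be illustrated with a minimal sketch. It scores each word by the mutual information between a binary "word occurs in the document" variable and the class variable, then keeps the top k words. This is an illustration under stated assumptions, not the authors' implementation: the function names, the binary-occurrence model, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def mutual_information(docs, labels):
    """Return {word: I(W;C)} in bits, where W is the binary presence of the word.

    docs is a list of token lists, labels the matching list of class names.
    (Assumed representation; the paper's own data structures are not given here.)
    """
    n = len(docs)
    class_count = Counter(labels)
    # word_class[w][c] = number of documents of class c that contain w
    word_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for w in set(tokens):
            word_class[w][label] += 1

    scores = {}
    for w, per_class in word_class.items():
        n_w = sum(per_class.values())  # documents containing w
        mi = 0.0
        for c, n_c in class_count.items():
            # Two cells per class: word present, word absent.
            for present, joint in ((True, per_class[c]), (False, n_c - per_class[c])):
                if joint == 0:
                    continue  # a zero cell contributes nothing to the sum
                p_joint = joint / n
                p_word = (n_w if present else n - n_w) / n
                p_class = n_c / n
                mi += p_joint * math.log2(p_joint / (p_word * p_class))
        scores[w] = mi
    return scores

def select_features(docs, labels, k):
    """Keep the k words with the highest mutual information (ties broken alphabetically)."""
    scores = mutual_information(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Toy data, invented for illustration only.
docs = [["ball", "game"], ["ball", "team"], ["cpu", "disk"], ["cpu", "game"]]
labels = ["sport", "sport", "comp", "comp"]
scores = mutual_information(docs, labels)
top = select_features(docs, labels, 2)
```

In the toy data, "ball" and "cpu" each occur in exactly one class and get the highest score, while "game" is spread evenly over both classes and scores zero, so it would be dropped first.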
Footnote
Contribution to: Special issue on AIRS2005: Information Retrieval Research in Asia
Theme
Automatic classification

Similar documents (author)

  1. Yoon, L.L.: The performance of cited references as an approach to information retrieval (1994) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 8219) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 8219, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=8219)
    
  2. Yoon, J.W.: Utilizing quantitative users' reactions to represent affective meanings of an image (2010) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 3584) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 3584, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=3584)
    
  3. Yoon, J.W.: Towards a user-oriented thesaurus for non-domain-specific image collections (2009) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 4221) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 4221, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=4221)
    
  4. Yoon, K.: Conceptual syntagmatic associations in user tagging (2012) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 240) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 240, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=240)
    
  5. Yoon, A.: Data reusers' trust development (2017) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 3532) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 3532, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=3532)
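
The author scores above all follow the same arithmetic: fieldWeight = tf * idf * fieldNorm. The sketch below reproduces it, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical, and the fieldNorm value is taken from the listing rather than recomputed.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def classic_tf(freq):
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:yoon entries in the listing.
score = field_weight(freq=1.0, doc_freq=15, max_docs=44218, field_norm=0.625)
# score is approximately 5.5777, matching the listing up to float rounding
```

Because every author match has the same term frequency, document frequency, and field norm, all five entries receive the identical score of 5.58.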
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.21
    0.21062316 = sum of:
      0.21062316 = product of:
        0.8775965 = sum of:
          0.042687528 = weight(abstract_txt:text in 2697) [ClassicSimilarity], result of:
            0.042687528 = score(doc=2697,freq=2.0), product of:
              0.09554306 = queryWeight, product of:
                1.6379825 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014424244 = queryNorm
              0.44678837 = fieldWeight in 2697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.032584857 = weight(abstract_txt:when in 2697) [ClassicSimilarity], result of:
            0.032584857 = score(doc=2697,freq=1.0), product of:
              0.10054313 = queryWeight, product of:
                1.6802964 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.014424244 = queryNorm
              0.32408836 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.036104333 = weight(abstract_txt:document in 2697) [ClassicSimilarity], result of:
            0.036104333 = score(doc=2697,freq=1.0), product of:
              0.10765842 = queryWeight, product of:
                1.7387363 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014424244 = queryNorm
              0.33536002 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.16261524 = weight(abstract_txt:training in 2697) [ClassicSimilarity], result of:
            0.16261524 = score(doc=2697,freq=4.0), product of:
              0.20358334 = queryWeight, product of:
                2.760897 = boost
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.014424244 = queryNorm
              0.79876494 = fieldWeight in 2697, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.4487263 = weight(abstract_txt:classifiers in 2697) [ClassicSimilarity], result of:
            0.4487263 = score(doc=2697,freq=3.0), product of:
              0.4408275 = queryWeight, product of:
                4.0626874 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.014424244 = queryNorm
              1.0179181 = fieldWeight in 2697, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.15487824 = weight(abstract_txt:classification in 2697) [ClassicSimilarity], result of:
            0.15487824 = score(doc=2697,freq=4.0), product of:
              0.2482971 = queryWeight, product of:
                4.3120112 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014424244 = queryNorm
              0.6237618 = fieldWeight in 2697, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
        0.24 = coord(6/25)
    
  2. Liu, R.-L.: Dynamic category profiling for text filtering and classification (2007) 0.17
    0.17491709 = sum of:
      0.17491709 = product of:
        0.87458545 = sum of:
          0.03018464 = weight(abstract_txt:text in 900) [ClassicSimilarity], result of:
            0.03018464 = score(doc=900,freq=1.0), product of:
              0.09554306 = queryWeight, product of:
                1.6379825 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014424244 = queryNorm
              0.3159271 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.051059235 = weight(abstract_txt:document in 900) [ClassicSimilarity], result of:
            0.051059235 = score(doc=900,freq=2.0), product of:
              0.10765842 = queryWeight, product of:
                1.7387363 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014424244 = queryNorm
              0.4742707 = fieldWeight in 900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.08130762 = weight(abstract_txt:training in 900) [ClassicSimilarity], result of:
            0.08130762 = score(doc=900,freq=1.0), product of:
              0.20358334 = queryWeight, product of:
                2.760897 = boost
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.014424244 = queryNorm
              0.39938247 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.6345948 = weight(abstract_txt:classifiers in 900) [ClassicSimilarity], result of:
            0.6345948 = score(doc=900,freq=6.0), product of:
              0.4408275 = queryWeight, product of:
                4.0626874 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.014424244 = queryNorm
              1.4395536 = fieldWeight in 900, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.07743912 = weight(abstract_txt:classification in 900) [ClassicSimilarity], result of:
            0.07743912 = score(doc=900,freq=1.0), product of:
              0.2482971 = queryWeight, product of:
                4.3120112 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014424244 = queryNorm
              0.3118809 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
        0.2 = coord(5/25)
    
  3. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.17
    0.16502701 = sum of:
      0.16502701 = product of:
        0.68761253 = sum of:
          0.048059296 = weight(abstract_txt:method in 5273) [ClassicSimilarity], result of:
            0.048059296 = score(doc=5273,freq=3.0), product of:
              0.07890828 = queryWeight, product of:
                1.2154171 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.014424244 = queryNorm
              0.60905266 = fieldWeight in 5273, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=5273)
          0.033551414 = weight(abstract_txt:applied in 5273) [ClassicSimilarity], result of:
            0.033551414 = score(doc=5273,freq=1.0), product of:
              0.08956093 = queryWeight, product of:
                1.2948617 = boost
                4.79515 = idf(docFreq=993, maxDocs=44218)
                0.014424244 = queryNorm
              0.3746211 = fieldWeight in 5273, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.79515 = idf(docFreq=993, maxDocs=44218)
                0.078125 = fieldNorm(doc=5273)
          0.036274575 = weight(abstract_txt:very in 5273) [ClassicSimilarity], result of:
            0.036274575 = score(doc=5273,freq=1.0), product of:
              0.09434371 = queryWeight, product of:
                1.3289864 = boost
                4.921521 = idf(docFreq=875, maxDocs=44218)
                0.014424244 = queryNorm
              0.38449383 = fieldWeight in 5273, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.921521 = idf(docFreq=875, maxDocs=44218)
                0.078125 = fieldNorm(doc=5273)
          0.03018464 = weight(abstract_txt:text in 5273) [ClassicSimilarity], result of:
            0.03018464 = score(doc=5273,freq=1.0), product of:
              0.09554306 = queryWeight, product of:
                1.6379825 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014424244 = queryNorm
              0.3159271 = fieldWeight in 5273, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=5273)
          0.36638346 = weight(abstract_txt:classifiers in 5273) [ClassicSimilarity], result of:
            0.36638346 = score(doc=5273,freq=2.0), product of:
              0.4408275 = queryWeight, product of:
                4.0626874 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.014424244 = queryNorm
              0.83112663 = fieldWeight in 5273, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=5273)
          0.17315914 = weight(abstract_txt:classification in 5273) [ClassicSimilarity], result of:
            0.17315914 = score(doc=5273,freq=5.0), product of:
              0.2482971 = queryWeight, product of:
                4.3120112 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014424244 = queryNorm
              0.69738686 = fieldWeight in 5273, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=5273)
        0.24 = coord(6/25)
    
  4. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.15
    0.15183114 = sum of:
      0.15183114 = product of:
        0.63262975 = sum of:
          0.036274575 = weight(abstract_txt:very in 1071) [ClassicSimilarity], result of:
            0.036274575 = score(doc=1071,freq=1.0), product of:
              0.09434371 = queryWeight, product of:
                1.3289864 = boost
                4.921521 = idf(docFreq=875, maxDocs=44218)
                0.014424244 = queryNorm
              0.38449383 = fieldWeight in 1071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.921521 = idf(docFreq=875, maxDocs=44218)
                0.078125 = fieldNorm(doc=1071)
          0.03018464 = weight(abstract_txt:text in 1071) [ClassicSimilarity], result of:
            0.03018464 = score(doc=1071,freq=1.0), product of:
              0.09554306 = queryWeight, product of:
                1.6379825 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014424244 = queryNorm
              0.3159271 = fieldWeight in 1071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1071)
          0.036104333 = weight(abstract_txt:document in 1071) [ClassicSimilarity], result of:
            0.036104333 = score(doc=1071,freq=1.0), product of:
              0.10765842 = queryWeight, product of:
                1.7387363 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014424244 = queryNorm
              0.33536002 = fieldWeight in 1071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=1071)
          0.08130762 = weight(abstract_txt:training in 1071) [ClassicSimilarity], result of:
            0.08130762 = score(doc=1071,freq=1.0), product of:
              0.20358334 = queryWeight, product of:
                2.760897 = boost
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.014424244 = queryNorm
              0.39938247 = fieldWeight in 1071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.112096 = idf(docFreq=723, maxDocs=44218)
                0.078125 = fieldNorm(doc=1071)
          0.25907224 = weight(abstract_txt:classifiers in 1071) [ClassicSimilarity], result of:
            0.25907224 = score(doc=1071,freq=1.0), product of:
              0.4408275 = queryWeight, product of:
                4.0626874 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.014424244 = queryNorm
              0.5876953 = fieldWeight in 1071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=1071)
          0.18968633 = weight(abstract_txt:classification in 1071) [ClassicSimilarity], result of:
            0.18968633 = score(doc=1071,freq=6.0), product of:
              0.2482971 = queryWeight, product of:
                4.3120112 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014424244 = queryNorm
              0.76394904 = fieldWeight in 1071, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=1071)
        0.24 = coord(6/25)
    
  5. Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.15
    0.1502008 = sum of:
      0.1502008 = product of:
        0.751004 = sum of:
          0.058946755 = weight(abstract_txt:efficient in 3331) [ClassicSimilarity], result of:
            0.058946755 = score(doc=3331,freq=1.0), product of:
              0.13040212 = queryWeight, product of:
                1.562451 = boost
                5.7860904 = idf(docFreq=368, maxDocs=44218)
                0.014424244 = queryNorm
              0.45203832 = fieldWeight in 3331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7860904 = idf(docFreq=368, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.06036928 = weight(abstract_txt:text in 3331) [ClassicSimilarity], result of:
            0.06036928 = score(doc=3331,freq=4.0), product of:
              0.09554306 = queryWeight, product of:
                1.6379825 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014424244 = queryNorm
              0.6318542 = fieldWeight in 3331, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.036104333 = weight(abstract_txt:document in 3331) [ClassicSimilarity], result of:
            0.036104333 = score(doc=3331,freq=1.0), product of:
              0.10765842 = queryWeight, product of:
                1.7387363 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014424244 = queryNorm
              0.33536002 = fieldWeight in 3331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.5181445 = weight(abstract_txt:classifiers in 3331) [ClassicSimilarity], result of:
            0.5181445 = score(doc=3331,freq=4.0), product of:
              0.4408275 = queryWeight, product of:
                4.0626874 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.014424244 = queryNorm
              1.1753906 = fieldWeight in 3331, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.07743912 = weight(abstract_txt:classification in 3331) [ClassicSimilarity], result of:
            0.07743912 = score(doc=3331,freq=1.0), product of:
              0.2482971 = queryWeight, product of:
                4.3120112 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014424244 = queryNorm
              0.3118809 = fieldWeight in 3331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
        0.2 = coord(5/25)
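
The content scores combine more factors than the author scores. Each matching term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm, and the sum is scaled by coord = matched terms / query terms. The sketch below recomputes the score of entry 5 (doc 3331) from the boosts, frequencies, and document frequencies printed in its tree. It assumes the standard ClassicSimilarity formulas; the function and variable names are hypothetical, and queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.014424244  # queryNorm from the listing
FIELD_NORM = 0.078125     # fieldNorm of the abstract field for doc 3331

def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(boost, freq, doc_freq):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight

def document_score(terms, matched, total_query_terms):
    """Sum of the term scores, scaled by the coordination factor."""
    coord = matched / total_query_terms
    return sum(term_score(*t) for t in terms) * coord

# (boost, freq, docFreq) for the five matching terms of doc 3331, from the listing.
terms_3331 = [
    (1.562451, 1.0, 368),    # efficient
    (1.6379825, 4.0, 2106),  # text
    (1.7387363, 1.0, 1642),  # document
    (4.0626874, 4.0, 64),    # classifiers
    (4.3120112, 1.0, 2218),  # classification
]

score_3331 = document_score(terms_3331, matched=5, total_query_terms=25)
# score_3331 is approximately 0.1502, the value shown for entry 5
```

The rare term "classifiers" (docFreq=64) dominates every content score, because idf enters both the query weight and the field weight and is therefore effectively squared.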