Document (#32911)

Author
Yoon, Y.
Lee, G.G.
Title
Efficient implementation of associative classifiers for document classification
Source
Information processing and management. 43(2007) no.2, p.393-405
Year
2007
Abstract
In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. Associative classifiers have many favorable characteristics, such as rapid training, good classification accuracy, and excellent interpretability. However, associative classifiers also face obstacles when they are applied to text classification. The target text collection generally has a very high dimension, so the training process might take a very long time. We propose a feature selection method based on the mutual information between the word and class variables to reduce the space dimension of the associative classifiers. In addition, the training process of the associative classifier produces a huge number of classification rules, which makes prediction for a new document inefficient. We resolve this by introducing a new efficient method for storing and pruning classification rules. This method can also be used when predicting a test document. Experimental results using the 20-newsgroups dataset show many benefits of associative classification in both training and prediction when applied to a real-world problem.
Footnote
Contribution in: Special issue on AIRS2005: Information Retrieval Research in Asia
Theme
Automatic classification
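
The abstract mentions feature selection based on the mutual information between the word and class variables, but does not give the formula. The following is a minimal sketch of the standard mutual-information criterion for text categorization, not necessarily the authors' exact formulation. The function names (`mutual_information`, `select_features`), the binary word-presence model, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Return {word: I(W; C)} in bits, where W is binary word presence
    in a document and C is the document's class label.
    (Assumed formulation; the abstract does not specify one.)"""
    n = len(docs)
    class_count = Counter(labels)   # label -> number of documents
    word_count = Counter()          # word -> number of documents containing it
    joint = Counter()               # (word, label) -> documents of label containing word
    for tokens, label in zip(docs, labels):
        for w in set(tokens):
            word_count[w] += 1
            joint[(w, label)] += 1

    scores = {}
    for w, n_w in word_count.items():
        mi = 0.0
        for c, n_c in class_count.items():
            n_wc = joint[(w, c)]
            # Two cells per class: word present, word absent.
            for n_joint, n_word in ((n_wc, n_w), (n_c - n_wc, n - n_w)):
                if n_joint > 0:  # empty cells contribute nothing
                    mi += (n_joint / n) * math.log2(n_joint * n / (n_word * n_c))
        scores[w] = mi
    return scores


def select_features(docs, labels, k):
    """Keep the k words with the highest mutual information (ties broken alphabetically)."""
    scores = mutual_information(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpus: two classes, four tokenized documents.
docs = [["ball", "game"], ["ball", "team"], ["cpu", "game"], ["cpu", "disk"]]
labels = ["sport", "sport", "tech", "tech"]
top_words = select_features(docs, labels, k=2)
print(top_words)
```

Words that occur equally often in every class (here "game") score zero and are dropped first, which is how such a filter shrinks the item space before rule mining.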

Similar documents (author)

  1. Yoon, L.L.: The performance of cited references as an approach to information retrieval (1994) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 219) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 219, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=219)
    
  2. Yoon, J.W.: Utilizing quantitative users' reactions to represent affective meanings of an image (2010) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 4764) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 4764, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=4764)
    
  3. Yoon, J.W.: Towards a user-oriented thesaurus for non-domain-specific image collections (2009) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 222) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 222, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=222)
    
  4. Yoon, K.: Conceptual syntagmatic associations in user tagging (2012) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 1241) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 1241, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=1241)
    
  5. Yoon, A.: Data reusers' trust development (2017) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 4533) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 4533, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=4533)
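
The author-match scores above all follow the same arithmetic: tf times idf times fieldNorm. As a sketch, the values can be reproduced in a few lines, assuming the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are chosen because they reproduce the numbers printed in the listing, and the function names are illustrative only.

```python
import math


def classic_tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Inverse document frequency (assumed form that matches the listing).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: termFreq=1.0, docFreq=13, maxDocs=42596, fieldNorm=0.625
author_score = field_weight(1.0, 13, 42596, 0.625)
print(round(author_score, 4))
```

Because every hit matches the single query term "yoon" once in a field with the same norm, all five entries receive an identical score.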
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.21
    0.21171583 = sum of:
      0.21171583 = product of:
        0.88214934 = sum of:
          0.042562116 = weight(abstract_txt:text in 3877) [ClassicSimilarity], result of:
            0.042562116 = score(doc=3877,freq=2.0), product of:
              0.09513788 = queryWeight, product of:
                1.6294786 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.014419164 = queryNorm
              0.44737297 = fieldWeight in 3877, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.032913864 = weight(abstract_txt:when in 3877) [ClassicSimilarity], result of:
            0.032913864 = score(doc=3877,freq=1.0), product of:
              0.10098741 = queryWeight, product of:
                1.6788255 = boost
                4.171782 = idf(docFreq=1785, maxDocs=42596)
                0.014419164 = queryNorm
              0.32592046 = fieldWeight in 3877, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.171782 = idf(docFreq=1785, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.035508323 = weight(abstract_txt:document in 3877) [ClassicSimilarity], result of:
            0.035508323 = score(doc=3877,freq=1.0), product of:
              0.10622696 = queryWeight, product of:
                1.7218261 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.014419164 = queryNorm
              0.33426848 = fieldWeight in 3877, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.16321476 = weight(abstract_txt:training in 3877) [ClassicSimilarity], result of:
            0.16321476 = score(doc=3877,freq=4.0), product of:
              0.20361692 = queryWeight, product of:
                2.7526324 = boost
                5.130097 = idf(docFreq=684, maxDocs=42596)
                0.014419164 = queryNorm
              0.8015776 = fieldWeight in 3877, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.130097 = idf(docFreq=684, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.45328322 = weight(abstract_txt:classifiers in 3877) [ClassicSimilarity], result of:
            0.45328322 = score(doc=3877,freq=3.0), product of:
              0.44279248 = queryWeight, product of:
                4.059209 = boost
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.014419164 = queryNorm
              1.0236923 = fieldWeight in 3877, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.15466702 = weight(abstract_txt:classification in 3877) [ClassicSimilarity], result of:
            0.15466702 = score(doc=3877,freq=4.0), product of:
              0.24750426 = queryWeight, product of:
                4.2918806 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.014419164 = queryNorm
              0.6249065 = fieldWeight in 3877, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
        0.24 = coord(6/25)
    
  2. Liu, R.-L.: Dynamic category profiling for text filtering and classification (2007) 0.18
    0.1760585 = sum of:
      0.1760585 = product of:
        0.8802925 = sum of:
          0.030095963 = weight(abstract_txt:text in 2080) [ClassicSimilarity], result of:
            0.030095963 = score(doc=2080,freq=1.0), product of:
              0.09513788 = queryWeight, product of:
                1.6294786 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.014419164 = queryNorm
              0.31634048 = fieldWeight in 2080, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=2080)
          0.05021635 = weight(abstract_txt:document in 2080) [ClassicSimilarity], result of:
            0.05021635 = score(doc=2080,freq=2.0), product of:
              0.10622696 = queryWeight, product of:
                1.7218261 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.014419164 = queryNorm
              0.472727 = fieldWeight in 2080, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.078125 = fieldNorm(doc=2080)
          0.08160738 = weight(abstract_txt:training in 2080) [ClassicSimilarity], result of:
            0.08160738 = score(doc=2080,freq=1.0), product of:
              0.20361692 = queryWeight, product of:
                2.7526324 = boost
                5.130097 = idf(docFreq=684, maxDocs=42596)
                0.014419164 = queryNorm
              0.4007888 = fieldWeight in 2080, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.130097 = idf(docFreq=684, maxDocs=42596)
                0.078125 = fieldNorm(doc=2080)
          0.64103925 = weight(abstract_txt:classifiers in 2080) [ClassicSimilarity], result of:
            0.64103925 = score(doc=2080,freq=6.0), product of:
              0.44279248 = queryWeight, product of:
                4.059209 = boost
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.014419164 = queryNorm
              1.4477195 = fieldWeight in 2080, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.078125 = fieldNorm(doc=2080)
          0.07733351 = weight(abstract_txt:classification in 2080) [ClassicSimilarity], result of:
            0.07733351 = score(doc=2080,freq=1.0), product of:
              0.24750426 = queryWeight, product of:
                4.2918806 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.014419164 = queryNorm
              0.31245324 = fieldWeight in 2080, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=2080)
        0.2 = coord(5/25)
    
  3. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.17
    0.16597973 = sum of:
      0.16597973 = product of:
        0.6915822 = sum of:
          0.04846205 = weight(abstract_txt:method in 274) [ClassicSimilarity], result of:
            0.04846205 = score(doc=274,freq=3.0), product of:
              0.07916714 = queryWeight, product of:
                1.2136649 = boost
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.014419164 = queryNorm
              0.6121485 = fieldWeight in 274, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.078125 = fieldNorm(doc=274)
          0.03384344 = weight(abstract_txt:applied in 274) [ClassicSimilarity], result of:
            0.03384344 = score(doc=274,freq=1.0), product of:
              0.08987396 = queryWeight, product of:
                1.2931331 = boost
                4.8200393 = idf(docFreq=933, maxDocs=42596)
                0.014419164 = queryNorm
              0.37656558 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8200393 = idf(docFreq=933, maxDocs=42596)
                0.078125 = fieldNorm(doc=274)
          0.03615356 = weight(abstract_txt:very in 274) [ClassicSimilarity], result of:
            0.03615356 = score(doc=274,freq=1.0), product of:
              0.09391859 = queryWeight, product of:
                1.3219106 = boost
                4.9273047 = idf(docFreq=838, maxDocs=42596)
                0.014419164 = queryNorm
              0.3849457 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9273047 = idf(docFreq=838, maxDocs=42596)
                0.078125 = fieldNorm(doc=274)
          0.030095963 = weight(abstract_txt:text in 274) [ClassicSimilarity], result of:
            0.030095963 = score(doc=274,freq=1.0), product of:
              0.09513788 = queryWeight, product of:
                1.6294786 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.014419164 = queryNorm
              0.31634048 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=274)
          0.37010422 = weight(abstract_txt:classifiers in 274) [ClassicSimilarity], result of:
            0.37010422 = score(doc=274,freq=2.0), product of:
              0.44279248 = queryWeight, product of:
                4.059209 = boost
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.014419164 = queryNorm
              0.83584124 = fieldWeight in 274, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.078125 = fieldNorm(doc=274)
          0.17292297 = weight(abstract_txt:classification in 274) [ClassicSimilarity], result of:
            0.17292297 = score(doc=274,freq=5.0), product of:
              0.24750426 = queryWeight, product of:
                4.2918806 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.014419164 = queryNorm
              0.69866663 = fieldWeight in 274, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=274)
        0.24 = coord(6/25)
    
  4. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.15
    0.15227905 = sum of:
      0.15227905 = product of:
        0.63449603 = sum of:
          0.03615356 = weight(abstract_txt:very in 2072) [ClassicSimilarity], result of:
            0.03615356 = score(doc=2072,freq=1.0), product of:
              0.09391859 = queryWeight, product of:
                1.3219106 = boost
                4.9273047 = idf(docFreq=838, maxDocs=42596)
                0.014419164 = queryNorm
              0.3849457 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9273047 = idf(docFreq=838, maxDocs=42596)
                0.078125 = fieldNorm(doc=2072)
          0.030095963 = weight(abstract_txt:text in 2072) [ClassicSimilarity], result of:
            0.030095963 = score(doc=2072,freq=1.0), product of:
              0.09513788 = queryWeight, product of:
                1.6294786 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.014419164 = queryNorm
              0.31634048 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=2072)
          0.035508323 = weight(abstract_txt:document in 2072) [ClassicSimilarity], result of:
            0.035508323 = score(doc=2072,freq=1.0), product of:
              0.10622696 = queryWeight, product of:
                1.7218261 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.014419164 = queryNorm
              0.33426848 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.078125 = fieldNorm(doc=2072)
          0.08160738 = weight(abstract_txt:training in 2072) [ClassicSimilarity], result of:
            0.08160738 = score(doc=2072,freq=1.0), product of:
              0.20361692 = queryWeight, product of:
                2.7526324 = boost
                5.130097 = idf(docFreq=684, maxDocs=42596)
                0.014419164 = queryNorm
              0.4007888 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.130097 = idf(docFreq=684, maxDocs=42596)
                0.078125 = fieldNorm(doc=2072)
          0.2617032 = weight(abstract_txt:classifiers in 2072) [ClassicSimilarity], result of:
            0.2617032 = score(doc=2072,freq=1.0), product of:
              0.44279248 = queryWeight, product of:
                4.059209 = boost
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.014419164 = queryNorm
              0.591029 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.078125 = fieldNorm(doc=2072)
          0.18942763 = weight(abstract_txt:classification in 2072) [ClassicSimilarity], result of:
            0.18942763 = score(doc=2072,freq=6.0), product of:
              0.24750426 = queryWeight, product of:
                4.2918806 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.014419164 = queryNorm
              0.765351 = fieldWeight in 2072, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=2072)
        0.24 = coord(6/25)
    
  5. Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.15
    0.15100466 = sum of:
      0.15100466 = product of:
        0.7550233 = sum of:
          0.058583155 = weight(abstract_txt:efficient in 4511) [ClassicSimilarity], result of:
            0.058583155 = score(doc=4511,freq=1.0), product of:
              0.12956849 = queryWeight, product of:
                1.5526587 = boost
                5.787398 = idf(docFreq=354, maxDocs=42596)
                0.014419164 = queryNorm
              0.45214045 = fieldWeight in 4511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.787398 = idf(docFreq=354, maxDocs=42596)
                0.078125 = fieldNorm(doc=4511)
          0.060191926 = weight(abstract_txt:text in 4511) [ClassicSimilarity], result of:
            0.060191926 = score(doc=4511,freq=4.0), product of:
              0.09513788 = queryWeight, product of:
                1.6294786 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.014419164 = queryNorm
              0.63268095 = fieldWeight in 4511, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=4511)
          0.035508323 = weight(abstract_txt:document in 4511) [ClassicSimilarity], result of:
            0.035508323 = score(doc=4511,freq=1.0), product of:
              0.10622696 = queryWeight, product of:
                1.7218261 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.014419164 = queryNorm
              0.33426848 = fieldWeight in 4511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.078125 = fieldNorm(doc=4511)
          0.5234064 = weight(abstract_txt:classifiers in 4511) [ClassicSimilarity], result of:
            0.5234064 = score(doc=4511,freq=4.0), product of:
              0.44279248 = queryWeight, product of:
                4.059209 = boost
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.014419164 = queryNorm
              1.182058 = fieldWeight in 4511, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.078125 = fieldNorm(doc=4511)
          0.07733351 = weight(abstract_txt:classification in 4511) [ClassicSimilarity], result of:
            0.07733351 = score(doc=4511,freq=1.0), product of:
              0.24750426 = queryWeight, product of:
                4.2918806 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.014419164 = queryNorm
              0.31245324 = fieldWeight in 4511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=4511)
        0.2 = coord(5/25)
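
The content-match scores combine several term clauses: each clause is queryWeight (boost times idf times queryNorm) multiplied by fieldWeight (tf times idf times fieldNorm), the clauses are summed, and the sum is scaled by the coordination factor (matched terms over query terms). The sketch below recomputes the first hit (doc 3877) from the values printed in the listing. It assumes the classic definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the helper names are hypothetical.

```python
import math

MAX_DOCS = 42596
QUERY_NORM = 0.014419164   # queryNorm from the listing
FIELD_NORM = 0.078125      # fieldNorm(doc=3877) from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed classic idf; it reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def clause_score(boost, doc_freq, freq):
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


# (term, boost, docFreq, termFreq) for doc 3877, copied from the listing.
matched_terms = [
    ("text",           1.6294786, 2018, 2.0),
    ("when",           1.6788255, 1785, 1.0),
    ("document",       1.7218261, 1604, 1.0),
    ("training",       2.7526324,  684, 4.0),
    ("classifiers",    4.059209,    59, 3.0),
    ("classification", 4.2918806, 2121, 4.0),
]

clause_sum = sum(clause_score(b, df, f) for _, b, df, f in matched_terms)
coord = len(matched_terms) / 25      # 6 of 25 query terms matched
total_score = clause_sum * coord     # the listing reports 0.21171583
print(round(total_score, 4))
```

The coord factor explains why hit 2, whose clause sum is nearly as large, ranks lower: it matches only 5 of the 25 query terms, so its sum is scaled by 0.2 rather than 0.24.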