Document (#32911)

Author
Yoon, Y.
Lee, G.G.
Title
Efficient implementation of associative classifiers for document classification
Source
Information processing and management. 43(2007) no.2, pp. 393-405
Year
2007
Abstract
In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. Associative classifiers have many favorable characteristics, such as rapid training, good classification accuracy, and excellent interpretability. However, associative classifiers also face obstacles when applied to text classification. The target text collection generally has very high dimensionality, so the training process might take a very long time. We propose a feature selection method based on the mutual information between the word and class variables to reduce the dimensionality of the feature space for associative classifiers. In addition, the training process of the associative classifier produces a huge number of classification rules, which makes prediction for a new document inefficient. We resolve this by introducing a new efficient method for storing and pruning classification rules. This method can also be used when predicting the class of a test document. Experimental results on the 20-newsgroups dataset show many benefits of associative classification in both training and prediction when applied to a real-world problem.
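
The abstract names mutual information between the word and class variables as the feature selection criterion but does not give the exact formulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it treats each word as a binary "occurs in the document" variable, scores it by I(W; C) in bits, and keeps the top k words. The toy corpus, the function names, and the parameter k are all hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, labels, word):
    """I(W; C) in bits, where W = 'word occurs in the document' and C = class."""
    n = len(docs)
    # Joint counts over (word present?, class label).
    joint = Counter((word in doc, label) for doc, label in zip(docs, labels))
    w_marg, c_marg = Counter(), Counter()
    for (w, c), count in joint.items():
        w_marg[w] += count
        c_marg[c] += count
    mi = 0.0
    for (w, c), count in joint.items():
        p_wc = count / n
        mi += p_wc * math.log2(p_wc / ((w_marg[w] / n) * (c_marg[c] / n)))
    return mi

def select_features(docs, labels, k):
    """Return the k words with the highest mutual information with the class."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda w: mutual_information(docs, labels, w),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"goal", "team", "the"},
    {"team", "match", "the"},
    {"cpu", "disk", "the"},
    {"cpu", "ram", "the"},
]
labels = ["sport", "sport", "tech", "tech"]

top = select_features(docs, labels, k=2)
print(top)
```

In this toy corpus, a word that appears in every document ("the") carries no information about the class and scores zero, while words confined to a single class score highest. Dropping low-scoring words before rule mining is what shrinks the feature space.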
Footnote
Contribution to: Special issue on AIRS2005: Information Retrieval Research in Asia
Theme
Automatic classification

Similar documents (author)

  1. Yoon, L.L.: The performance of cited references as an approach to information retrieval (1994) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 219) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 219, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=219)
    
  2. Yoon, J.W.: Utilizing quantitative users' reactions to represent affective meanings of an image (2010) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 585) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 585, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=585)
    
  3. Yoon, J.W.: Towards a user-oriented thesaurus for non-domain-specific image collections (2009) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 1222) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 1222, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=1222)
    
  4. Yoon, K.: Conceptual syntagmatic associations in user tagging (2012) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 2241) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 2241, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=2241)
    
  5. Yoon, A.: Data reusers' trust development (2017) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 5533) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 5533, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=5533)
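
The author scores above can be reproduced by hand. The following sketch assumes that tf is the square root of the term frequency and that idf is 1 + ln(maxDocs / (docFreq + 1)). Both assumptions are consistent with the numbers in the listing, but the function names are illustrative and not part of any library.

```python
import math

def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the listing: author_txt:yoon, freq=1.0,
# docFreq=13, maxDocs=42740, fieldNorm=0.625.
idf_yoon = classic_idf(13, 42740)
score = field_weight(freq=1.0, doc_freq=13, max_docs=42740, field_norm=0.625)
print(idf_yoon, score)
```

Because all five author matches share the same term frequency, document frequency, and field norm, they receive the same score, which is why every entry shows 5.64.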
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.21
    0.21177083 = sum of:
      0.21177083 = product of:
        0.88237846 = sum of:
          0.042565074 = weight(abstract_txt:text in 4698) [ClassicSimilarity], result of:
            0.042565074 = score(doc=4698,freq=2.0), product of:
              0.09512331 = queryWeight, product of:
                1.6291037 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.014417064 = queryNorm
              0.44747257 = fieldWeight in 4698, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.03285512 = weight(abstract_txt:when in 4698) [ClassicSimilarity], result of:
            0.03285512 = score(doc=4698,freq=1.0), product of:
              0.1008471 = queryWeight, product of:
                1.6774013 = boost
                4.1701303 = idf(docFreq=1794, maxDocs=42740)
                0.014417064 = queryNorm
              0.32579142 = fieldWeight in 4698, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1701303 = idf(docFreq=1794, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.035540104 = weight(abstract_txt:document in 4698) [ClassicSimilarity], result of:
            0.035540104 = score(doc=4698,freq=1.0), product of:
              0.10626914 = queryWeight, product of:
                1.7219037 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.014417064 = queryNorm
              0.33443484 = fieldWeight in 4698, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.16316098 = weight(abstract_txt:training in 4698) [ClassicSimilarity], result of:
            0.16316098 = score(doc=4698,freq=4.0), product of:
              0.2035316 = queryWeight, product of:
                2.7516332 = boost
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.014417064 = queryNorm
              0.8016494 = fieldWeight in 4698, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.4536187 = weight(abstract_txt:classifiers in 4698) [ClassicSimilarity], result of:
            0.4536187 = score(doc=4698,freq=3.0), product of:
              0.4429226 = queryWeight, product of:
                4.059182 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.014417064 = queryNorm
              1.024149 = fieldWeight in 4698, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.15463847 = weight(abstract_txt:classification in 4698) [ClassicSimilarity], result of:
            0.15463847 = score(doc=4698,freq=4.0), product of:
              0.24742448 = queryWeight, product of:
                4.29053 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.014417064 = queryNorm
              0.6249926 = fieldWeight in 4698, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
        0.24 = coord(6/25)
    
  2. Liu, R.-L.: Dynamic category profiling for text filtering and classification (2007) 0.18
    0.17615457 = sum of:
      0.17615457 = product of:
        0.8807728 = sum of:
          0.030098053 = weight(abstract_txt:text in 2901) [ClassicSimilarity], result of:
            0.030098053 = score(doc=2901,freq=1.0), product of:
              0.09512331 = queryWeight, product of:
                1.6291037 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.014417064 = queryNorm
              0.3164109 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.050261296 = weight(abstract_txt:document in 2901) [ClassicSimilarity], result of:
            0.050261296 = score(doc=2901,freq=2.0), product of:
              0.10626914 = queryWeight, product of:
                1.7219037 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.014417064 = queryNorm
              0.4729623 = fieldWeight in 2901, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.08158049 = weight(abstract_txt:training in 2901) [ClassicSimilarity], result of:
            0.08158049 = score(doc=2901,freq=1.0), product of:
              0.2035316 = queryWeight, product of:
                2.7516332 = boost
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.014417064 = queryNorm
              0.4008247 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.6415137 = weight(abstract_txt:classifiers in 2901) [ClassicSimilarity], result of:
            0.6415137 = score(doc=2901,freq=6.0), product of:
              0.4429226 = queryWeight, product of:
                4.059182 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.014417064 = queryNorm
              1.4483653 = fieldWeight in 2901, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.077319235 = weight(abstract_txt:classification in 2901) [ClassicSimilarity], result of:
            0.077319235 = score(doc=2901,freq=1.0), product of:
              0.24742448 = queryWeight, product of:
                4.29053 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.014417064 = queryNorm
              0.3124963 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
        0.2 = coord(5/25)
    
  3. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.17
    0.16599445 = sum of:
      0.16599445 = product of:
        0.69164354 = sum of:
          0.04836299 = weight(abstract_txt:method in 274) [ClassicSimilarity], result of:
            0.04836299 = score(doc=274,freq=3.0), product of:
              0.07904346 = queryWeight, product of:
                1.2125303 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.014417064 = queryNorm
              0.6118531 = fieldWeight in 274, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.078125 = fieldNorm(doc=274)
          0.033759486 = weight(abstract_txt:applied in 274) [ClassicSimilarity], result of:
            0.033759486 = score(doc=274,freq=1.0), product of:
              0.08970738 = queryWeight, product of:
                1.2917359 = boost
                4.817011 = idf(docFreq=939, maxDocs=42740)
                0.014417064 = queryNorm
              0.37632897 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.817011 = idf(docFreq=939, maxDocs=42740)
                0.078125 = fieldNorm(doc=274)
          0.036153812 = weight(abstract_txt:very in 274) [ClassicSimilarity], result of:
            0.036153812 = score(doc=274,freq=1.0), product of:
              0.09390031 = queryWeight, product of:
                1.321579 = boost
                4.928299 = idf(docFreq=840, maxDocs=42740)
                0.014417064 = queryNorm
              0.38502336 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.928299 = idf(docFreq=840, maxDocs=42740)
                0.078125 = fieldNorm(doc=274)
          0.030098053 = weight(abstract_txt:text in 274) [ClassicSimilarity], result of:
            0.030098053 = score(doc=274,freq=1.0), product of:
              0.09512331 = queryWeight, product of:
                1.6291037 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.014417064 = queryNorm
              0.3164109 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=274)
          0.3703781 = weight(abstract_txt:classifiers in 274) [ClassicSimilarity], result of:
            0.3703781 = score(doc=274,freq=2.0), product of:
              0.4429226 = queryWeight, product of:
                4.059182 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.014417064 = queryNorm
              0.83621407 = fieldWeight in 274, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.078125 = fieldNorm(doc=274)
          0.17289108 = weight(abstract_txt:classification in 274) [ClassicSimilarity], result of:
            0.17289108 = score(doc=274,freq=5.0), product of:
              0.24742448 = queryWeight, product of:
                4.29053 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.014417064 = queryNorm
              0.698763 = fieldWeight in 274, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=274)
        0.24 = coord(6/25)
    
  4. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.15
    0.15231888 = sum of:
      0.15231888 = product of:
        0.63466203 = sum of:
          0.036153812 = weight(abstract_txt:very in 3072) [ClassicSimilarity], result of:
            0.036153812 = score(doc=3072,freq=1.0), product of:
              0.09390031 = queryWeight, product of:
                1.321579 = boost
                4.928299 = idf(docFreq=840, maxDocs=42740)
                0.014417064 = queryNorm
              0.38502336 = fieldWeight in 3072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.928299 = idf(docFreq=840, maxDocs=42740)
                0.078125 = fieldNorm(doc=3072)
          0.030098053 = weight(abstract_txt:text in 3072) [ClassicSimilarity], result of:
            0.030098053 = score(doc=3072,freq=1.0), product of:
              0.09512331 = queryWeight, product of:
                1.6291037 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.014417064 = queryNorm
              0.3164109 = fieldWeight in 3072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=3072)
          0.035540104 = weight(abstract_txt:document in 3072) [ClassicSimilarity], result of:
            0.035540104 = score(doc=3072,freq=1.0), product of:
              0.10626914 = queryWeight, product of:
                1.7219037 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.014417064 = queryNorm
              0.33443484 = fieldWeight in 3072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=3072)
          0.08158049 = weight(abstract_txt:training in 3072) [ClassicSimilarity], result of:
            0.08158049 = score(doc=3072,freq=1.0), product of:
              0.2035316 = queryWeight, product of:
                2.7516332 = boost
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.014417064 = queryNorm
              0.4008247 = fieldWeight in 3072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.078125 = fieldNorm(doc=3072)
          0.26189685 = weight(abstract_txt:classifiers in 3072) [ClassicSimilarity], result of:
            0.26189685 = score(doc=3072,freq=1.0), product of:
              0.4429226 = queryWeight, product of:
                4.059182 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.014417064 = queryNorm
              0.5912926 = fieldWeight in 3072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.078125 = fieldNorm(doc=3072)
          0.18939269 = weight(abstract_txt:classification in 3072) [ClassicSimilarity], result of:
            0.18939269 = score(doc=3072,freq=6.0), product of:
              0.24742448 = queryWeight, product of:
                4.29053 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.014417064 = queryNorm
              0.76545656 = fieldWeight in 3072, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=3072)
        0.24 = coord(6/25)
    
  5. Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.15
    0.15106586 = sum of:
      0.15106586 = product of:
        0.75532925 = sum of:
          0.05848007 = weight(abstract_txt:efficient in 332) [ClassicSimilarity], result of:
            0.05848007 = score(doc=332,freq=1.0), product of:
              0.12939064 = queryWeight, product of:
                1.5513545 = boost
                5.785155 = idf(docFreq=356, maxDocs=42740)
                0.014417064 = queryNorm
              0.4519652 = fieldWeight in 332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.785155 = idf(docFreq=356, maxDocs=42740)
                0.078125 = fieldNorm(doc=332)
          0.060196105 = weight(abstract_txt:text in 332) [ClassicSimilarity], result of:
            0.060196105 = score(doc=332,freq=4.0), product of:
              0.09512331 = queryWeight, product of:
                1.6291037 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.014417064 = queryNorm
              0.6328218 = fieldWeight in 332, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=332)
          0.035540104 = weight(abstract_txt:document in 332) [ClassicSimilarity], result of:
            0.035540104 = score(doc=332,freq=1.0), product of:
              0.10626914 = queryWeight, product of:
                1.7219037 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.014417064 = queryNorm
              0.33443484 = fieldWeight in 332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=332)
          0.5237937 = weight(abstract_txt:classifiers in 332) [ClassicSimilarity], result of:
            0.5237937 = score(doc=332,freq=4.0), product of:
              0.4429226 = queryWeight, product of:
                4.059182 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.014417064 = queryNorm
              1.1825852 = fieldWeight in 332, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.078125 = fieldNorm(doc=332)
          0.077319235 = weight(abstract_txt:classification in 332) [ClassicSimilarity], result of:
            0.077319235 = score(doc=332,freq=1.0), product of:
              0.24742448 = queryWeight, product of:
                4.29053 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.014417064 = queryNorm
              0.3124963 = fieldWeight in 332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=332)
        0.2 = coord(5/25)
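
The content scores combine several matching terms. As a rough check, the sketch below recomputes the total for the first content result (doc 4698) from the values shown in its explanation. It assumes queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and idf = 1 + ln(maxDocs / (docFreq + 1)), with the sum of the per-term scores then scaled by coord. The constant and function names are illustrative only.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.014417064
FIELD_NORM = 0.078125   # fieldNorm(doc=4698)
QUERY_TERMS = 25        # denominator of coord(6/25)

# term: (boost, docFreq, freq), copied from the explanation of result 1
TERMS = {
    "text":           (1.6291037, 2023, 2.0),
    "when":           (1.6774013, 1794, 1.0),
    "document":       (1.7219037, 1606, 1.0),
    "training":       (2.7516332, 686, 4.0),
    "classifiers":    (4.059182, 59, 3.0),
    "classification": (4.29053, 2127, 4.0),
}

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight

# coord rewards documents that match more of the query terms.
coord = len(TERMS) / QUERY_TERMS
total = coord * sum(term_score(*values) for values in TERMS.values())
print(total)
```

The rare term "classifiers" (docFreq=59) dominates the sum because idf enters the per-term score twice, once through queryWeight and once through fieldWeight.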