Document (#34120)

Author
Li, T.
Zhu, S.
Ogihara, M.
Title
Text categorization via generalized discriminant analysis
Source
Information processing and management. 44(2008) no.5, pp. 1684-1697
Year
2008
Abstract
Text categorization is an important research area that has received much attention owing to the growth of online information and of the Internet. Automated text categorization is generally cast as a multi-class classification problem, yet much previous work has focused on binary document classification. Support vector machines (SVMs) excel in binary classification, but the elegant theory behind large-margin hyperplanes cannot be easily extended to multi-class text classification. In addition, training time and scalability are important concerns. On the other hand, techniques that extend naturally to multi-class classification are generally not as accurate as SVMs. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization problems via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflect the inherent similarity in the data. While most previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. By using generalized singular value decomposition (GSVD), a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values is identified. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.
Theme
Automatic classification

Similar documents (content)

  1. Tao, J.; Zhou, L.; Hickey, K.: Making sense of the black-boxes : toward interpretable text classification using deep learning models (2023) 0.27
    0.2683538 = sum of:
      0.2683538 = product of:
        0.83860564 = sum of:
          0.016150696 = weight(abstract_txt:problems in 990) [ClassicSimilarity], result of:
            0.016150696 = score(doc=990,freq=1.0), product of:
              0.060130727 = queryWeight, product of:
                1.0629497 = boost
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.013163427 = queryNorm
              0.26859307 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.019927632 = weight(abstract_txt:proposed in 990) [ClassicSimilarity], result of:
            0.019927632 = score(doc=990,freq=1.0), product of:
              0.06917345 = queryWeight, product of:
                1.140077 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.013163427 = queryNorm
              0.2880821 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.11886167 = weight(abstract_txt:binary in 990) [ClassicSimilarity], result of:
            0.11886167 = score(doc=990,freq=1.0), product of:
              0.26043174 = queryWeight, product of:
                2.7092998 = boost
                7.3024383 = idf(docFreq=80, maxDocs=44218)
                0.013163427 = queryNorm
              0.4564024 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3024383 = idf(docFreq=80, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.08073998 = weight(abstract_txt:text in 990) [ClassicSimilarity], result of:
            0.08073998 = score(doc=990,freq=4.0), product of:
              0.15972827 = queryWeight, product of:
                3.0006545 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013163427 = queryNorm
              0.5054833 = fieldWeight in 990, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.14565635 = weight(abstract_txt:categorization in 990) [ClassicSimilarity], result of:
            0.14565635 = score(doc=990,freq=1.0), product of:
              0.35359156 = queryWeight, product of:
                4.0755424 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.013163427 = queryNorm
              0.41193387 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.18133786 = weight(abstract_txt:multi in 990) [ClassicSimilarity], result of:
            0.18133786 = score(doc=990,freq=2.0), product of:
              0.34513715 = queryWeight, product of:
                4.410836 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.013163427 = queryNorm
              0.52540815 = fieldWeight in 990, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.13026884 = weight(abstract_txt:classification in 990) [ClassicSimilarity], result of:
            0.13026884 = score(doc=990,freq=5.0), product of:
              0.23349458 = queryWeight, product of:
                4.443336 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.013163427 = queryNorm
              0.5579095 = fieldWeight in 990, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.1456627 = weight(abstract_txt:class in 990) [ClassicSimilarity], result of:
            0.1456627 = score(doc=990,freq=1.0), product of:
              0.39557046 = queryWeight, product of:
                5.1004725 = boost
                5.8917522 = idf(docFreq=331, maxDocs=44218)
                0.013163427 = queryNorm
              0.36823452 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8917522 = idf(docFreq=331, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
        0.32 = coord(8/25)
    
  2. Stamatatos, E.: Author identification : using text sampling to handle the class imbalance problem (2008) 0.21
    0.21064658 = sum of:
      0.21064658 = product of:
        0.87769413 = sum of:
          0.02708961 = weight(abstract_txt:problem in 2063) [ClassicSimilarity], result of:
            0.02708961 = score(doc=2063,freq=1.0), product of:
              0.097170524 = queryWeight, product of:
                1.6549214 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.013163427 = queryNorm
              0.27878425 = fieldWeight in 2063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0625 = fieldNorm(doc=2063)
          0.09888588 = weight(abstract_txt:text in 2063) [ClassicSimilarity], result of:
            0.09888588 = score(doc=2063,freq=6.0), product of:
              0.15972827 = queryWeight, product of:
                3.0006545 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013163427 = queryNorm
              0.6190881 = fieldWeight in 2063, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2063)
          0.14565635 = weight(abstract_txt:categorization in 2063) [ClassicSimilarity], result of:
            0.14565635 = score(doc=2063,freq=1.0), product of:
              0.35359156 = queryWeight, product of:
                4.0755424 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.013163427 = queryNorm
              0.41193387 = fieldWeight in 2063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.0625 = fieldNorm(doc=2063)
          0.22209261 = weight(abstract_txt:multi in 2063) [ClassicSimilarity], result of:
            0.22209261 = score(doc=2063,freq=3.0), product of:
              0.34513715 = queryWeight, product of:
                4.410836 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.013163427 = queryNorm
              0.6434909 = fieldWeight in 2063, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0625 = fieldNorm(doc=2063)
          0.058257993 = weight(abstract_txt:classification in 2063) [ClassicSimilarity], result of:
            0.058257993 = score(doc=2063,freq=1.0), product of:
              0.23349458 = queryWeight, product of:
                4.443336 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.013163427 = queryNorm
              0.2495047 = fieldWeight in 2063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=2063)
          0.3257117 = weight(abstract_txt:class in 2063) [ClassicSimilarity], result of:
            0.3257117 = score(doc=2063,freq=5.0), product of:
              0.39557046 = queryWeight, product of:
                5.1004725 = boost
                5.8917522 = idf(docFreq=331, maxDocs=44218)
                0.013163427 = queryNorm
              0.8233974 = fieldWeight in 2063, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.8917522 = idf(docFreq=331, maxDocs=44218)
                0.0625 = fieldNorm(doc=2063)
        0.24 = coord(6/25)
    
  3. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.15
    0.14797531 = sum of:
      0.14797531 = product of:
        0.6165638 = sum of:
          0.017436678 = weight(abstract_txt:proposed in 4095) [ClassicSimilarity], result of:
            0.017436678 = score(doc=4095,freq=1.0), product of:
              0.06917345 = queryWeight, product of:
                1.140077 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.013163427 = queryNorm
              0.25207183 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.02370341 = weight(abstract_txt:problem in 4095) [ClassicSimilarity], result of:
            0.02370341 = score(doc=4095,freq=1.0), product of:
              0.097170524 = queryWeight, product of:
                1.6549214 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.013163427 = queryNorm
              0.24393621 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.07064748 = weight(abstract_txt:text in 4095) [ClassicSimilarity], result of:
            0.07064748 = score(doc=4095,freq=4.0), product of:
              0.15972827 = queryWeight, product of:
                3.0006545 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013163427 = queryNorm
              0.4422979 = fieldWeight in 4095, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.22439417 = weight(abstract_txt:multi in 4095) [ClassicSimilarity], result of:
            0.22439417 = score(doc=4095,freq=4.0), product of:
              0.34513715 = queryWeight, product of:
                4.410836 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.013163427 = queryNorm
              0.6501594 = fieldWeight in 4095, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.15292723 = weight(abstract_txt:classification in 4095) [ClassicSimilarity], result of:
            0.15292723 = score(doc=4095,freq=9.0), product of:
              0.23349458 = queryWeight, product of:
                4.443336 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.013163427 = queryNorm
              0.65494984 = fieldWeight in 4095, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.12745485 = weight(abstract_txt:class in 4095) [ClassicSimilarity], result of:
            0.12745485 = score(doc=4095,freq=1.0), product of:
              0.39557046 = queryWeight, product of:
                5.1004725 = boost
                5.8917522 = idf(docFreq=331, maxDocs=44218)
                0.013163427 = queryNorm
              0.3222052 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8917522 = idf(docFreq=331, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
        0.24 = coord(6/25)
    
  4. Muneer, I.; Sharjeel, M.; Iqbal, M.; Adeel Nawab, R.M.; Rayson, P.: CLEU - A Cross-language english-urdu corpus and benchmark for text reuse experiments (2019) 0.14
    0.13789698 = sum of:
      0.13789698 = product of:
        0.49248922 = sum of:
          0.028181927 = weight(abstract_txt:proposed in 5299) [ClassicSimilarity], result of:
            0.028181927 = score(doc=5299,freq=2.0), product of:
              0.06917345 = queryWeight, product of:
                1.140077 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.013163427 = queryNorm
              0.4074096 = fieldWeight in 5299, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=5299)
          0.025063807 = weight(abstract_txt:much in 5299) [ClassicSimilarity], result of:
            0.025063807 = score(doc=5299,freq=1.0), product of:
              0.08059974 = queryWeight, product of:
                1.2306408 = boost
                4.9754615 = idf(docFreq=829, maxDocs=44218)
                0.013163427 = queryNorm
              0.31096634 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9754615 = idf(docFreq=829, maxDocs=44218)
                0.0625 = fieldNorm(doc=5299)
          0.02708961 = weight(abstract_txt:problem in 5299) [ClassicSimilarity], result of:
            0.02708961 = score(doc=5299,freq=1.0), product of:
              0.097170524 = queryWeight, product of:
                1.6549214 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.013163427 = queryNorm
              0.27878425 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0625 = fieldNorm(doc=5299)
          0.11886167 = weight(abstract_txt:binary in 5299) [ClassicSimilarity], result of:
            0.11886167 = score(doc=5299,freq=1.0), product of:
              0.26043174 = queryWeight, product of:
                2.7092998 = boost
                7.3024383 = idf(docFreq=80, maxDocs=44218)
                0.013163427 = queryNorm
              0.4564024 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3024383 = idf(docFreq=80, maxDocs=44218)
                0.0625 = fieldNorm(doc=5299)
          0.10680895 = weight(abstract_txt:text in 5299) [ClassicSimilarity], result of:
            0.10680895 = score(doc=5299,freq=7.0), product of:
              0.15972827 = queryWeight, product of:
                3.0006545 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013163427 = queryNorm
              0.6686916 = fieldWeight in 5299, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5299)
          0.12822524 = weight(abstract_txt:multi in 5299) [ClassicSimilarity], result of:
            0.12822524 = score(doc=5299,freq=1.0), product of:
              0.34513715 = queryWeight, product of:
                4.410836 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.013163427 = queryNorm
              0.37151965 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0625 = fieldNorm(doc=5299)
          0.058257993 = weight(abstract_txt:classification in 5299) [ClassicSimilarity], result of:
            0.058257993 = score(doc=5299,freq=1.0), product of:
              0.23349458 = queryWeight, product of:
                4.443336 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.013163427 = queryNorm
              0.2495047 = fieldWeight in 5299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=5299)
        0.28 = coord(7/25)
    
  5. Aphinyanaphongs, Y.; Fu, L.D.; Li, Z.; Peskin, E.R.; Efstathiadis, E.; Aliferis, C.F.; Statnikov, A.: A comprehensive empirical comparison of modern supervised classification and feature selection methods for text categorization (2014) 0.12
    0.12078352 = sum of:
      0.12078352 = product of:
        0.6039176 = sum of:
          0.019044872 = weight(abstract_txt:important in 1496) [ClassicSimilarity], result of:
            0.019044872 = score(doc=1496,freq=1.0), product of:
              0.057838142 = queryWeight, product of:
                1.0424894 = boost
                4.2147684 = idf(docFreq=1775, maxDocs=44218)
                0.013163427 = queryNorm
              0.32927877 = fieldWeight in 1496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2147684 = idf(docFreq=1775, maxDocs=44218)
                0.078125 = fieldNorm(doc=1496)
          0.0364689 = weight(abstract_txt:previous in 1496) [ClassicSimilarity], result of:
            0.0364689 = score(doc=1496,freq=1.0), product of:
              0.08918888 = queryWeight, product of:
                1.294553 = boost
                5.2338576 = idf(docFreq=640, maxDocs=44218)
                0.013163427 = queryNorm
              0.40889513 = fieldWeight in 1496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2338576 = idf(docFreq=640, maxDocs=44218)
                0.078125 = fieldNorm(doc=1496)
          0.08740359 = weight(abstract_txt:text in 1496) [ClassicSimilarity], result of:
            0.08740359 = score(doc=1496,freq=3.0), product of:
              0.15972827 = queryWeight, product of:
                3.0006545 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013163427 = queryNorm
              0.54720175 = fieldWeight in 1496, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1496)
          0.31535524 = weight(abstract_txt:categorization in 1496) [ClassicSimilarity], result of:
            0.31535524 = score(doc=1496,freq=3.0), product of:
              0.35359156 = queryWeight, product of:
                4.0755424 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.013163427 = queryNorm
              0.891863 = fieldWeight in 1496, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.078125 = fieldNorm(doc=1496)
          0.145645 = weight(abstract_txt:classification in 1496) [ClassicSimilarity], result of:
            0.145645 = score(doc=1496,freq=4.0), product of:
              0.23349458 = queryWeight, product of:
                4.443336 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.013163427 = queryNorm
              0.6237618 = fieldWeight in 1496, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=1496)
        0.2 = coord(5/25)
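
The score trees above can be checked by hand. The following is a minimal sketch, not the search engine's own code, of how a [ClassicSimilarity] explanation combines its parts. The function names (idf, term_score, document_score) are hypothetical and chosen for illustration only. The formulas are the ones the listed figures are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final coord factor of matched terms over query terms.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the trees above:
    1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score contribution of one matching term (hypothetical helper)."""
    tf = math.sqrt(freq)                            # tf(freq) line
    term_idf = idf(doc_freq, max_docs)              # idf(docFreq, maxDocs) line
    query_weight = boost * term_idf * query_norm    # queryWeight line
    field_weight = tf * term_idf * field_norm       # fieldWeight line
    return query_weight * field_weight              # score(doc, freq) line


def document_score(term_scores, matched, total):
    """Sum of the term scores, scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# Term "problems" in document 990, using the values listed in the first tree.
problems_990 = term_score(
    freq=1.0, doc_freq=1634, max_docs=44218,
    boost=1.0629497, query_norm=0.013163427, field_norm=0.0625,
)

# The eight term scores listed for document 990, combined with coord(8/25).
scores_990 = [
    0.016150696, 0.019927632, 0.11886167, 0.08073998,
    0.14565635, 0.18133786, 0.13026884, 0.1456627,
]
total_990 = document_score(scores_990, matched=8, total=25)

print(problems_990, total_990)
```

Because the engine works in single precision, values recomputed this way should agree with the listing only to about six significant digits.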