Document (#37563)

Author
Hotho, A.
Bloehdorn, S.
Title
Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts
Source
Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
Imprint
Washington, DC : IEEE Computer Society
Year
2004
Pages
pp. 331-334
Abstract
Document representations for text classification are typically based on the classical bag-of-words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper, we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for the actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
Content
Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Theme
Automatic classification
Computational linguistics

Similar documents (content)

  1. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.13
    0.13225445 = sum of:
      0.13225445 = product of:
        0.5510602 = sum of:
          0.045747135 = weight(abstract_txt:mining in 5051) [ClassicSimilarity], result of:
            0.045747135 = score(doc=5051,freq=1.0), product of:
              0.13545932 = queryWeight, product of:
                1.0126197 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.02166185 = queryNorm
              0.33771864 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.021801135 = weight(abstract_txt:based in 5051) [ClassicSimilarity], result of:
            0.021801135 = score(doc=5051,freq=3.0), product of:
              0.07219746 = queryWeight, product of:
                1.0454853 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.02166185 = queryNorm
              0.3019654 = fieldWeight in 5051, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.053225216 = weight(abstract_txt:document in 5051) [ClassicSimilarity], result of:
            0.053225216 = score(doc=5051,freq=3.0), product of:
              0.13090236 = queryWeight, product of:
                1.4077667 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.02166185 = queryNorm
              0.4066024 = fieldWeight in 5051, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.15766504 = weight(abstract_txt:words in 5051) [ClassicSimilarity], result of:
            0.15766504 = score(doc=5051,freq=7.0), product of:
              0.20356381 = queryWeight, product of:
                1.7555258 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.02166185 = queryNorm
              0.774524 = fieldWeight in 5051, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.13367559 = weight(abstract_txt:classification in 5051) [ClassicSimilarity], result of:
            0.13367559 = score(doc=5051,freq=13.0), product of:
              0.16982189 = queryWeight, product of:
                1.9638097 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.02166185 = queryNorm
              0.78715175 = fieldWeight in 5051, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.13894613 = weight(abstract_txt:text in 5051) [ClassicSimilarity], result of:
            0.13894613 = score(doc=5051,freq=13.0), product of:
              0.17425686 = queryWeight, product of:
                1.9892873 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02166185 = queryNorm
              0.7973639 = fieldWeight in 5051, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
        0.24 = coord(6/25)
    
  2. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.13
    0.12672241 = sum of:
      0.12672241 = product of:
        0.63361204 = sum of:
          0.07782047 = weight(abstract_txt:extracted in 601) [ClassicSimilarity], result of:
            0.07782047 = score(doc=601,freq=1.0), product of:
              0.13476385 = queryWeight, product of:
                1.0100169 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.02166185 = queryNorm
              0.5774581 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.09375 = fieldNorm(doc=601)
          0.021577528 = weight(abstract_txt:based in 601) [ClassicSimilarity], result of:
            0.021577528 = score(doc=601,freq=1.0), product of:
              0.07219746 = queryWeight, product of:
                1.0454853 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.02166185 = queryNorm
              0.29886824 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.09375 = fieldNorm(doc=601)
          0.03499015 = weight(abstract_txt:approach in 601) [ClassicSimilarity], result of:
            0.03499015 = score(doc=601,freq=1.0), product of:
              0.099651694 = queryWeight, product of:
                1.2282854 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.02166185 = queryNorm
              0.3511245 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.09375 = fieldNorm(doc=601)
          0.052679304 = weight(abstract_txt:document in 601) [ClassicSimilarity], result of:
            0.052679304 = score(doc=601,freq=1.0), product of:
              0.13090236 = queryWeight, product of:
                1.4077667 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.02166185 = queryNorm
              0.40243202 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.09375 = fieldNorm(doc=601)
          0.44654462 = weight(abstract_txt:boosting in 601) [ClassicSimilarity], result of:
            0.44654462 = score(doc=601,freq=1.0), product of:
              0.5442069 = queryWeight, product of:
                2.8703773 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.02166185 = queryNorm
              0.820542 = fieldWeight in 601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.09375 = fieldNorm(doc=601)
        0.2 = coord(5/25)
    
  3. Perovsek, M.; Kranjc, J.; Erjavec, T.; Cestnik, B.; Lavrac, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.11
    0.114625834 = sum of:
      0.114625834 = product of:
        0.47760764 = sum of:
          0.1307061 = weight(abstract_txt:mining in 2697) [ClassicSimilarity], result of:
            0.1307061 = score(doc=2697,freq=4.0), product of:
              0.13545932 = queryWeight, product of:
                1.0126197 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.02166185 = queryNorm
              0.9649104 = fieldWeight in 2697, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.017981272 = weight(abstract_txt:based in 2697) [ClassicSimilarity], result of:
            0.017981272 = score(doc=2697,freq=1.0), product of:
              0.07219746 = queryWeight, product of:
                1.0454853 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.02166185 = queryNorm
              0.24905685 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.09616772 = weight(abstract_txt:corpora in 2697) [ClassicSimilarity], result of:
            0.09616772 = score(doc=2697,freq=1.0), product of:
              0.17524724 = queryWeight, product of:
                1.1517748 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.02166185 = queryNorm
              0.5487546 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.03581895 = weight(abstract_txt:through in 2697) [ClassicSimilarity], result of:
            0.03581895 = score(doc=2697,freq=1.0), product of:
              0.11430104 = queryWeight, product of:
                1.3154733 = boost
                4.011184 = idf(docFreq=2176, maxDocs=44218)
                0.02166185 = queryNorm
              0.31337377 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.011184 = idf(docFreq=2176, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.062083155 = weight(abstract_txt:document in 2697) [ClassicSimilarity], result of:
            0.062083155 = score(doc=2697,freq=2.0), product of:
              0.13090236 = queryWeight, product of:
                1.4077667 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.02166185 = queryNorm
              0.4742707 = fieldWeight in 2697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.13485044 = weight(abstract_txt:text in 2697) [ClassicSimilarity], result of:
            0.13485044 = score(doc=2697,freq=6.0), product of:
              0.17425686 = queryWeight, product of:
                1.9892873 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02166185 = queryNorm
              0.77386016 = fieldWeight in 2697, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
        0.24 = coord(6/25)
    
  4. Pearce, C.; Nicholas, C.: TELLTALE: Experiments in a dynamic hypertext environment for degraded and multilingual data (1996) 0.11
    0.11291086 = sum of:
      0.11291086 = product of:
        0.47046193 = sum of:
          0.017981272 = weight(abstract_txt:based in 4071) [ClassicSimilarity], result of:
            0.017981272 = score(doc=4071,freq=1.0), product of:
              0.07219746 = queryWeight, product of:
                1.0454853 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.02166185 = queryNorm
              0.24905685 = fieldWeight in 4071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=4071)
          0.077343486 = weight(abstract_txt:typically in 4071) [ClassicSimilarity], result of:
            0.077343486 = score(doc=4071,freq=1.0), product of:
              0.15155867 = queryWeight, product of:
                1.0711057 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.02166185 = queryNorm
              0.5103204 = fieldWeight in 4071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.078125 = fieldNorm(doc=4071)
          0.13600169 = weight(abstract_txt:corpora in 4071) [ClassicSimilarity], result of:
            0.13600169 = score(doc=4071,freq=2.0), product of:
              0.17524724 = queryWeight, product of:
                1.1517748 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.02166185 = queryNorm
              0.7760561 = fieldWeight in 4071, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4071)
          0.04389942 = weight(abstract_txt:document in 4071) [ClassicSimilarity], result of:
            0.04389942 = score(doc=4071,freq=1.0), product of:
              0.13090236 = queryWeight, product of:
                1.4077667 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.02166185 = queryNorm
              0.33536002 = fieldWeight in 4071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=4071)
          0.08513113 = weight(abstract_txt:words in 4071) [ClassicSimilarity], result of:
            0.08513113 = score(doc=4071,freq=1.0), product of:
              0.20356381 = queryWeight, product of:
                1.7555258 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.02166185 = queryNorm
              0.41820365 = fieldWeight in 4071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=4071)
          0.110104926 = weight(abstract_txt:text in 4071) [ClassicSimilarity], result of:
            0.110104926 = score(doc=4071,freq=4.0), product of:
              0.17425686 = queryWeight, product of:
                1.9892873 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02166185 = queryNorm
              0.6318542 = fieldWeight in 4071, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4071)
        0.24 = coord(6/25)
    
  5. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.10
    0.103129745 = sum of:
      0.103129745 = product of:
        0.36832052 = sum of:
          0.017981272 = weight(abstract_txt:based in 4309) [ClassicSimilarity], result of:
            0.017981272 = score(doc=4309,freq=1.0), product of:
              0.07219746 = queryWeight, product of:
                1.0454853 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.02166185 = queryNorm
              0.24905685 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.077343486 = weight(abstract_txt:typically in 4309) [ClassicSimilarity], result of:
            0.077343486 = score(doc=4309,freq=1.0), product of:
              0.15155867 = queryWeight, product of:
                1.0711057 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.02166185 = queryNorm
              0.5103204 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.05050393 = weight(abstract_txt:approach in 4309) [ClassicSimilarity], result of:
            0.05050393 = score(doc=4309,freq=3.0), product of:
              0.099651694 = queryWeight, product of:
                1.2282854 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.02166185 = queryNorm
              0.5068045 = fieldWeight in 4309, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.062083155 = weight(abstract_txt:document in 4309) [ClassicSimilarity], result of:
            0.062083155 = score(doc=4309,freq=2.0), product of:
              0.13090236 = queryWeight, product of:
                1.4077667 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.02166185 = queryNorm
              0.4742707 = fieldWeight in 4309, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.052392006 = weight(abstract_txt:concepts in 4309) [ClassicSimilarity], result of:
            0.052392006 = score(doc=4309,freq=1.0), product of:
              0.14728267 = queryWeight, product of:
                1.4932508 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.02166185 = queryNorm
              0.35572416 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.0529642 = weight(abstract_txt:classification in 4309) [ClassicSimilarity], result of:
            0.0529642 = score(doc=4309,freq=1.0), product of:
              0.16982189 = queryWeight, product of:
                1.9638097 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.02166185 = queryNorm
              0.3118809 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.055052463 = weight(abstract_txt:text in 4309) [ClassicSimilarity], result of:
            0.055052463 = score(doc=4309,freq=1.0), product of:
              0.17425686 = queryWeight, product of:
                1.9892873 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02166185 = queryNorm
              0.3159271 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
        0.28 = coord(7/25)
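
The score trees above follow one pattern: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor (matched terms / query terms). The Python sketch below recomputes the first entry (doc 5051) from the numbers printed in its tree. It is an illustration and not the search engine's own code: the function names are hypothetical, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions that happen to reproduce the listed values.

```python
import math

# Constants copied from the explain tree of entry 1 (doc 5051).
MAX_DOCS = 44218
QUERY_NORM = 0.02166185
FIELD_NORM_5051 = 0.0546875

# (term, termFreq, docFreq, boost) as printed in the tree for doc 5051.
TERMS_5051 = [
    ("mining", 1.0, 249, 1.0126197),
    ("based", 3.0, 4958, 1.0454853),
    ("document", 3.0, 1642, 1.4077667),
    ("words", 7.0, 568, 5.353007 and 1.7555258),
    ("classification", 13.0, 2218, 1.9638097),
    ("text", 13.0, 2106, 1.9892873),
]


def idf(doc_freq, max_docs):
    """Assumed idf formula; it matches the idf values shown in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm,
               max_docs=MAX_DOCS, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm          # query side
    field_weight = math.sqrt(freq) * term_idf * field_norm  # document side
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum of term scores scaled by coord = matched / query_terms."""
    total = sum(term_score(freq, df, boost, field_norm)
                for _term, freq, df, boost in terms)
    return total * (matched / query_terms)


if __name__ == "__main__":
    score = document_score(TERMS_5051, FIELD_NORM_5051, matched=6, query_terms=25)
    # The listing reports 0.13225445 for this entry.
    print(f"recomputed score for doc 5051: {score:.6f}")
```

The same functions apply to the other entries once their termFreq, docFreq, boost, fieldNorm and coord values are substituted. Small differences in the last digits are expected, because the listing appears to use single-precision arithmetic.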