Document (#30274)

Author
Yoon, Y.
Lee, C.
Lee, G.G.
Title
An effective procedure for constructing a hierarchical text classification system
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.3, pp.431-442
Year
2006
Abstract
In text categorization tasks, classification over some class hierarchies gives better results than classification without the hierarchy. Because large document collections are now commonly divided into several subgroups within a hierarchy, a hierarchical classification method is an appropriate choice. However, there is no systematic method for building a hierarchical classification system that performs well on large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers that can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially when constructing classifiers for a hierarchy tree with many levels.
Theme
Automatic classification

Similar documents (author)

  1. Yoon, L.L.: The performance of cited references as an approach to information retrieval (1994) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 8219) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 8219, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=8219)
    
  2. Yoon, J.W.: Utilizing quantitative users' reactions to represent affective meanings of an image (2010) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 3584) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 3584, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=3584)
    
  3. Yoon, J.W.: Towards a user-oriented thesaurus for non-domain-specific image collections (2009) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 4221) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 4221, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=4221)
    
  4. Yoon, K.: Conceptual syntagmatic associations in user tagging (2012) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 240) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 240, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=240)
    
  5. Yoon, A.: Data reusers' trust development (2017) 5.58
    5.5776863 = sum of:
      5.5776863 = weight(author_txt:yoon in 3532) [ClassicSimilarity], result of:
        5.5776863 = fieldWeight in 3532, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.924298 = idf(docFreq=15, maxDocs=44218)
          0.625 = fieldNorm(doc=3532)
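
The author-match scores above all reduce to one product, tf x idf x fieldNorm. The following is a minimal sketch of that arithmetic, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the form that reproduces the idf values printed in the trace; the function names are hypothetical.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed term-frequency factor: square root of the raw term count.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it matches idf(docFreq=15, maxDocs=44218) in the trace.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as listed in each "product of:" block.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:yoon entries above.
idf = classic_idf(doc_freq=15, max_docs=44218)
score = field_weight(freq=1.0, doc_freq=15, max_docs=44218, field_norm=0.625)
print(f"idf={idf:.6f} score={score:.6f}")
```

Because every author hit shares the same freq, docFreq, and fieldNorm, all five entries receive the same score, which is why the list shows 5.58 throughout.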
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.28
    0.27984762 = sum of:
      0.27984762 = product of:
        1.1660318 = sum of:
          0.039676502 = weight(abstract_txt:text in 2697) [ClassicSimilarity], result of:
            0.039676502 = score(doc=2697,freq=2.0), product of:
              0.08880379 = queryWeight, product of:
                1.380358 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015909003 = queryNorm
              0.44678837 = fieldWeight in 2697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.05301754 = weight(abstract_txt:large in 2697) [ClassicSimilarity], result of:
            0.05301754 = score(doc=2697,freq=2.0), product of:
              0.10773471 = queryWeight, product of:
                1.5203857 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.015909003 = queryNorm
              0.49211198 = fieldWeight in 2697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.312806 = weight(abstract_txt:classifiers in 2697) [ClassicSimilarity], result of:
            0.312806 = score(doc=2697,freq=3.0), product of:
              0.3072998 = queryWeight, product of:
                2.5677757 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.015909003 = queryNorm
              1.0179181 = fieldWeight in 2697, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.252835 = weight(abstract_txt:hierarchy in 2697) [ClassicSimilarity], result of:
            0.252835 = score(doc=2697,freq=2.0), product of:
              0.34940666 = queryWeight, product of:
                3.353414 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.015909003 = queryNorm
              0.7236124 = fieldWeight in 2697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.16194789 = weight(abstract_txt:classification in 2697) [ClassicSimilarity], result of:
            0.16194789 = score(doc=2697,freq=4.0), product of:
              0.259631 = queryWeight, product of:
                4.0880375 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015909003 = queryNorm
              0.6237618 = fieldWeight in 2697, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.34574887 = weight(abstract_txt:hierarchical in 2697) [ClassicSimilarity], result of:
            0.34574887 = score(doc=2697,freq=3.0), product of:
              0.4458609 = queryWeight, product of:
                4.890414 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.015909003 = queryNorm
              0.7754636 = fieldWeight in 2697, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
        0.24 = coord(6/25)
    
  2. Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.24
    0.23908795 = sum of:
      0.23908795 = product of:
        0.99619985 = sum of:
          0.05442281 = weight(abstract_txt:tree in 1808) [ClassicSimilarity], result of:
            0.05442281 = score(doc=1808,freq=1.0), product of:
              0.12721449 = queryWeight, product of:
                1.1682324 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.015909003 = queryNorm
              0.42780355 = fieldWeight in 1808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.022444418 = weight(abstract_txt:text in 1808) [ClassicSimilarity], result of:
            0.022444418 = score(doc=1808,freq=1.0), product of:
              0.08880379 = queryWeight, product of:
                1.380358 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015909003 = queryNorm
              0.25274166 = fieldWeight in 1808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.046421766 = weight(abstract_txt:method in 1808) [ClassicSimilarity], result of:
            0.046421766 = score(doc=1808,freq=1.0), product of:
              0.16502033 = queryWeight, product of:
                2.304572 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015909003 = queryNorm
              0.28130937 = fieldWeight in 1808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.32306468 = weight(abstract_txt:classifiers in 1808) [ClassicSimilarity], result of:
            0.32306468 = score(doc=1808,freq=5.0), product of:
              0.3072998 = queryWeight, product of:
                2.5677757 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.015909003 = queryNorm
              1.0513014 = fieldWeight in 1808, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.15867588 = weight(abstract_txt:classification in 1808) [ClassicSimilarity], result of:
            0.15867588 = score(doc=1808,freq=6.0), product of:
              0.259631 = queryWeight, product of:
                4.0880375 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015909003 = queryNorm
              0.6111592 = fieldWeight in 1808, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
          0.39117023 = weight(abstract_txt:hierarchical in 1808) [ClassicSimilarity], result of:
            0.39117023 = score(doc=1808,freq=6.0), product of:
              0.4458609 = queryWeight, product of:
                4.890414 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.015909003 = queryNorm
              0.8773369 = fieldWeight in 1808, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.0625 = fieldNorm(doc=1808)
        0.24 = coord(6/25)
    
  3. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.24
    0.23855975 = sum of:
      0.23855975 = product of:
        0.993999 = sum of:
          0.07871627 = weight(abstract_txt:performs in 2760) [ClassicSimilarity], result of:
            0.07871627 = score(doc=2760,freq=1.0), product of:
              0.14021213 = queryWeight, product of:
                1.226461 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.015909003 = queryNorm
              0.5614084 = fieldWeight in 2760, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.078125 = fieldNorm(doc=2760)
          0.06872171 = weight(abstract_txt:text in 2760) [ClassicSimilarity], result of:
            0.06872171 = score(doc=2760,freq=6.0), product of:
              0.08880379 = queryWeight, product of:
                1.380358 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015909003 = queryNorm
              0.77386016 = fieldWeight in 2760, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2760)
          0.032541666 = weight(abstract_txt:system in 2760) [ClassicSimilarity], result of:
            0.032541666 = score(doc=2760,freq=1.0), product of:
              0.123515785 = queryWeight, product of:
                2.3022485 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.015909003 = queryNorm
              0.2634616 = fieldWeight in 2760, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.078125 = fieldNorm(doc=2760)
          0.252835 = weight(abstract_txt:hierarchy in 2760) [ClassicSimilarity], result of:
            0.252835 = score(doc=2760,freq=2.0), product of:
              0.34940666 = queryWeight, product of:
                3.353414 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.015909003 = queryNorm
              0.7236124 = fieldWeight in 2760, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=2760)
          0.16194789 = weight(abstract_txt:classification in 2760) [ClassicSimilarity], result of:
            0.16194789 = score(doc=2760,freq=4.0), product of:
              0.259631 = queryWeight, product of:
                4.0880375 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015909003 = queryNorm
              0.6237618 = fieldWeight in 2760, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=2760)
          0.3992364 = weight(abstract_txt:hierarchical in 2760) [ClassicSimilarity], result of:
            0.3992364 = score(doc=2760,freq=4.0), product of:
              0.4458609 = queryWeight, product of:
                4.890414 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.015909003 = queryNorm
              0.8954282 = fieldWeight in 2760, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.078125 = fieldNorm(doc=2760)
        0.24 = coord(6/25)
    
  4. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.23
    0.23303087 = sum of:
      0.23303087 = product of:
        0.8322531 = sum of:
          0.105196424 = weight(abstract_txt:categorization in 4797) [ClassicSimilarity], result of:
            0.105196424 = score(doc=4797,freq=3.0), product of:
              0.117951326 = queryWeight, product of:
                1.124896 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.015909003 = queryNorm
              0.891863 = fieldWeight in 4797, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.13160059 = weight(abstract_txt:hierarchies in 4797) [ClassicSimilarity], result of:
            0.13160059 = score(doc=4797,freq=3.0), product of:
              0.13694328 = queryWeight, product of:
                1.2120801 = boost
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.015909003 = queryNorm
              0.9609861 = fieldWeight in 4797, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.039676502 = weight(abstract_txt:text in 4797) [ClassicSimilarity], result of:
            0.039676502 = score(doc=4797,freq=2.0), product of:
              0.08880379 = queryWeight, product of:
                1.380358 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015909003 = queryNorm
              0.44678837 = fieldWeight in 4797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.037489064 = weight(abstract_txt:large in 4797) [ClassicSimilarity], result of:
            0.037489064 = score(doc=4797,freq=1.0), product of:
              0.10773471 = queryWeight, product of:
                1.5203857 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.015909003 = queryNorm
              0.34797573 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.058027208 = weight(abstract_txt:method in 4797) [ClassicSimilarity], result of:
            0.058027208 = score(doc=4797,freq=1.0), product of:
              0.16502033 = queryWeight, product of:
                2.304572 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015909003 = queryNorm
              0.3516367 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.11451445 = weight(abstract_txt:classification in 4797) [ClassicSimilarity], result of:
            0.11451445 = score(doc=4797,freq=2.0), product of:
              0.259631 = queryWeight, product of:
                4.0880375 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015909003 = queryNorm
              0.44106615 = fieldWeight in 4797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.34574887 = weight(abstract_txt:hierarchical in 4797) [ClassicSimilarity], result of:
            0.34574887 = score(doc=4797,freq=3.0), product of:
              0.4458609 = queryWeight, product of:
                4.890414 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.015909003 = queryNorm
              0.7754636 = fieldWeight in 4797, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
        0.28 = coord(7/25)
    
  5. Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007) 0.19
    0.19101521 = sum of:
      0.19101521 = product of:
        0.6821972 = sum of:
          0.042795982 = weight(abstract_txt:build in 916) [ClassicSimilarity], result of:
            0.042795982 = score(doc=916,freq=1.0), product of:
              0.09339951 = queryWeight, product of:
                1.0009981 = boost
                5.8650045 = idf(docFreq=340, maxDocs=44218)
                0.015909003 = queryNorm
              0.4582035 = fieldWeight in 916, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8650045 = idf(docFreq=340, maxDocs=44218)
                0.078125 = fieldNorm(doc=916)
          0.075979635 = weight(abstract_txt:hierarchies in 916) [ClassicSimilarity], result of:
            0.075979635 = score(doc=916,freq=1.0), product of:
              0.13694328 = queryWeight, product of:
                1.2120801 = boost
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.015909003 = queryNorm
              0.5548256 = fieldWeight in 916, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.078125 = fieldNorm(doc=916)
          0.04602087 = weight(abstract_txt:system in 916) [ClassicSimilarity], result of:
            0.04602087 = score(doc=916,freq=2.0), product of:
              0.123515785 = queryWeight, product of:
                2.3022485 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.015909003 = queryNorm
              0.372591 = fieldWeight in 916, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.078125 = fieldNorm(doc=916)
          0.058027208 = weight(abstract_txt:method in 916) [ClassicSimilarity], result of:
            0.058027208 = score(doc=916,freq=1.0), product of:
              0.16502033 = queryWeight, product of:
                2.304572 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015909003 = queryNorm
              0.3516367 = fieldWeight in 916, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=916)
          0.17878136 = weight(abstract_txt:hierarchy in 916) [ClassicSimilarity], result of:
            0.17878136 = score(doc=916,freq=1.0), product of:
              0.34940666 = queryWeight, product of:
                3.353414 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.015909003 = queryNorm
              0.5116713 = fieldWeight in 916, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=916)
          0.080973946 = weight(abstract_txt:classification in 916) [ClassicSimilarity], result of:
            0.080973946 = score(doc=916,freq=1.0), product of:
              0.259631 = queryWeight, product of:
                4.0880375 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015909003 = queryNorm
              0.3118809 = fieldWeight in 916, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=916)
          0.1996182 = weight(abstract_txt:hierarchical in 916) [ClassicSimilarity], result of:
            0.1996182 = score(doc=916,freq=1.0), product of:
              0.4458609 = queryWeight, product of:
                4.890414 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.015909003 = queryNorm
              0.4477141 = fieldWeight in 916, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.078125 = fieldNorm(doc=916)
        0.28 = coord(7/25)
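
The content-similarity scores combine several per-term scores and then scale the sum by a coordination factor. As a rough sketch under the same assumptions as the trace (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost x idf x queryNorm, fieldWeight = tf x idf x fieldNorm), the total for the first content hit (doc 2697) can be recomputed as follows. The helper names are hypothetical; the numbers are copied from the trace.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.015909003


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed idf form that reproduces the idf values shown in the trace.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM          # query-side factor
    field_weight = math.sqrt(freq) * idf * field_norm  # document-side factor
    return query_weight * field_weight


# (term, boost, docFreq, freq) for doc 2697, taken from the trace above.
terms_2697 = [
    ("text", 1.380358, 2106, 2.0),
    ("large", 1.5203857, 1397, 2.0),
    ("classifiers", 2.5677757, 64, 3.0),
    ("hierarchy", 3.353414, 171, 2.0),
    ("classification", 4.0880375, 2218, 4.0),
    ("hierarchical", 4.890414, 389, 3.0),
]

field_norm = 0.078125
term_sum = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms_2697)
coord = 6 / 25  # 6 of the 25 query terms matched this document
total = term_sum * coord
print(f"sum={term_sum:.6f} total={total:.6f}")
```

The coord factor explains why hits 4 and 5 can rank below hits 1 to 3 despite matching seven terms instead of six: their per-term sums are smaller, and the larger coord(7/25) does not make up the difference.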