Document (#30275)

Author
Yoon, Y.
Lee, C.
Lee, G.G.
Title
An effective procedure for constructing a hierarchical text classification system
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.3, pp. 431-442
Year
2006
Abstract
In text categorization tasks, classification on some class hierarchies yields better results than classification without a hierarchy. Currently, because large numbers of documents are divided into several subgroups within a hierarchy, a hierarchical classification method can appropriately be used. However, there is no systematic method for building a hierarchical classification system that performs well on large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for constructing classifiers applied to hierarchy trees with many levels.
Theme
Automatic classification (Automatisches Klassifizieren)

Similar documents (author)

  1. Yoon, L.L.: The performance of cited references as an approach to information retrieval (1994) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 219) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 219, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=219)
    
  2. Yoon, J.W.: Utilizing quantitative users' reactions to represent affective meanings of an image (2010) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 4764) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 4764, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=4764)
    
  3. Yoon, J.W.: Towards a user-oriented thesaurus for non-domain-specific image collections (2009) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 222) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 222, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=222)
    
  4. Yoon, K.: Conceptual syntagmatic associations in user tagging (2012) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 1241) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 1241, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=1241)
    
  5. Yoon, A.: Data reusers' trust development (2017) 5.64
    5.6377864 = sum of:
      5.6377864 = weight(author_txt:yoon in 4533) [ClassicSimilarity], result of:
        5.6377864 = fieldWeight in 4533, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.020458 = idf(docFreq=13, maxDocs=42596)
          0.625 = fieldNorm(doc=4533)
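The author-similarity scores above are Lucene `ClassicSimilarity` (TF-IDF) explain trees. For a single-term author query the printed weight reduces to fieldWeight = tf × idf × fieldNorm. The sketch below reproduces entry 1's numbers, assuming tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the variant that matches the `maxDocs` values printed here. The helper names (`classic_idf`, `classic_tf`, `field_weight`) are hypothetical and not part of Lucene's API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw in-field count."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as printed in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Inputs copied from entry 1: weight(author_txt:yoon in 219)
idf = classic_idf(doc_freq=13, max_docs=42596)
fw = field_weight(freq=1.0, doc_freq=13, max_docs=42596, field_norm=0.625)
print(f"idf = {idf:.6f}, fieldWeight = {fw:.6f}")
```

Every author match above has freq = 1, docFreq = 13 and fieldNorm = 0.625, which is why all five entries share the same score of 5.64.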
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.28
    0.28056186 = sum of:
      0.28056186 = product of:
        1.1690078 = sum of:
          0.03951654 = weight(abstract_txt:text in 3877) [ClassicSimilarity], result of:
            0.03951654 = score(doc=3877,freq=2.0), product of:
              0.08833019 = queryWeight, product of:
                1.3731861 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.015886016 = queryNorm
              0.44737297 = fieldWeight in 3877, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.053067658 = weight(abstract_txt:large in 3877) [ClassicSimilarity], result of:
            0.053067658 = score(doc=3877,freq=2.0), product of:
              0.10751684 = queryWeight, product of:
                1.5150015 = boost
                4.467334 = idf(docFreq=1328, maxDocs=42596)
                0.015886016 = queryNorm
              0.4935753 = fieldWeight in 3877, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.467334 = idf(docFreq=1328, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.31563604 = weight(abstract_txt:classifiers in 3877) [ClassicSimilarity], result of:
            0.31563604 = score(doc=3877,freq=3.0), product of:
              0.30833098 = queryWeight, product of:
                2.5655675 = boost
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.015886016 = queryNorm
              1.0236923 = fieldWeight in 3877, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.25201 = weight(abstract_txt:hierarchy in 3877) [ClassicSimilarity], result of:
            0.25201 = score(doc=3877,freq=2.0), product of:
              0.34772196 = queryWeight, product of:
                3.336849 = boost
                6.559649 = idf(docFreq=163, maxDocs=42596)
                0.015886016 = queryNorm
              0.7247457 = fieldWeight in 3877, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.559649 = idf(docFreq=163, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.16154964 = weight(abstract_txt:classification in 3877) [ClassicSimilarity], result of:
            0.16154964 = score(doc=3877,freq=4.0), product of:
              0.2585181 = queryWeight, product of:
                4.068937 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.015886016 = queryNorm
              0.6249065 = fieldWeight in 3877, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
          0.34722796 = weight(abstract_txt:hierarchical in 3877) [ClassicSimilarity], result of:
            0.34722796 = score(doc=3877,freq=3.0), product of:
              0.44594634 = queryWeight, product of:
                4.878498 = boost
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.015886016 = queryNorm
              0.7786317 = fieldWeight in 3877, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.078125 = fieldNorm(doc=3877)
        0.24 = coord(6/25)
    
  2. Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.24
    0.24002123 = sum of:
      0.24002123 = product of:
        1.0000885 = sum of:
          0.053858228 = weight(abstract_txt:tree in 2809) [ClassicSimilarity], result of:
            0.053858228 = score(doc=2809,freq=1.0), product of:
              0.12599827 = queryWeight, product of:
                1.1596906 = boost
                6.839234 = idf(docFreq=123, maxDocs=42596)
                0.015886016 = queryNorm
              0.42745212 = fieldWeight in 2809, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.839234 = idf(docFreq=123, maxDocs=42596)
                0.0625 = fieldNorm(doc=2809)
          0.02235393 = weight(abstract_txt:text in 2809) [ClassicSimilarity], result of:
            0.02235393 = score(doc=2809,freq=1.0), product of:
              0.08833019 = queryWeight, product of:
                1.3731861 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.015886016 = queryNorm
              0.25307238 = fieldWeight in 2809, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.0625 = fieldNorm(doc=2809)
          0.04675945 = weight(abstract_txt:method in 2809) [ClassicSimilarity], result of:
            0.04675945 = score(doc=2809,freq=1.0), product of:
              0.16538009 = queryWeight, product of:
                2.301241 = boost
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.015886016 = queryNorm
              0.28273928 = fieldWeight in 2809, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.0625 = fieldNorm(doc=2809)
          0.3259875 = weight(abstract_txt:classifiers in 2809) [ClassicSimilarity], result of:
            0.3259875 = score(doc=2809,freq=5.0), product of:
              0.30833098 = queryWeight, product of:
                2.5655675 = boost
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.015886016 = queryNorm
              1.0572648 = fieldWeight in 2809, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5651712 = idf(docFreq=59, maxDocs=42596)
                0.0625 = fieldNorm(doc=2809)
          0.15828566 = weight(abstract_txt:classification in 2809) [ClassicSimilarity], result of:
            0.15828566 = score(doc=2809,freq=6.0), product of:
              0.2585181 = queryWeight, product of:
                4.068937 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.015886016 = queryNorm
              0.6122808 = fieldWeight in 2809, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.0625 = fieldNorm(doc=2809)
          0.39284363 = weight(abstract_txt:hierarchical in 2809) [ClassicSimilarity], result of:
            0.39284363 = score(doc=2809,freq=6.0), product of:
              0.44594634 = queryWeight, product of:
                4.878498 = boost
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.015886016 = queryNorm
              0.8809213 = fieldWeight in 2809, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.0625 = fieldNorm(doc=2809)
        0.24 = coord(6/25)
    
  3. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.24
    0.2383018 = sum of:
      0.2383018 = product of:
        0.99292415 = sum of:
          0.07796657 = weight(abstract_txt:performs in 3940) [ClassicSimilarity], result of:
            0.07796657 = score(doc=3940,freq=1.0), product of:
              0.13895115 = queryWeight, product of:
                1.2178419 = boost
                7.182179 = idf(docFreq=87, maxDocs=42596)
                0.015886016 = queryNorm
              0.56110775 = fieldWeight in 3940, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.182179 = idf(docFreq=87, maxDocs=42596)
                0.078125 = fieldNorm(doc=3940)
          0.068444654 = weight(abstract_txt:text in 3940) [ClassicSimilarity], result of:
            0.068444654 = score(doc=3940,freq=6.0), product of:
              0.08833019 = queryWeight, product of:
                1.3731861 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.015886016 = queryNorm
              0.7748728 = fieldWeight in 3940, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=3940)
          0.032008972 = weight(abstract_txt:system in 3940) [ClassicSimilarity], result of:
            0.032008972 = score(doc=3940,freq=1.0), product of:
              0.12184032 = queryWeight, product of:
                2.2807903 = boost
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.015886016 = queryNorm
              0.26271248 = fieldWeight in 3940, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.078125 = fieldNorm(doc=3940)
          0.25201 = weight(abstract_txt:hierarchy in 3940) [ClassicSimilarity], result of:
            0.25201 = score(doc=3940,freq=2.0), product of:
              0.34772196 = queryWeight, product of:
                3.336849 = boost
                6.559649 = idf(docFreq=163, maxDocs=42596)
                0.015886016 = queryNorm
              0.7247457 = fieldWeight in 3940, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.559649 = idf(docFreq=163, maxDocs=42596)
                0.078125 = fieldNorm(doc=3940)
          0.16154964 = weight(abstract_txt:classification in 3940) [ClassicSimilarity], result of:
            0.16154964 = score(doc=3940,freq=4.0), product of:
              0.2585181 = queryWeight, product of:
                4.068937 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.015886016 = queryNorm
              0.6249065 = fieldWeight in 3940, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=3940)
          0.40094435 = weight(abstract_txt:hierarchical in 3940) [ClassicSimilarity], result of:
            0.40094435 = score(doc=3940,freq=4.0), product of:
              0.44594634 = queryWeight, product of:
                4.878498 = boost
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.015886016 = queryNorm
              0.89908653 = fieldWeight in 3940, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.078125 = fieldNorm(doc=3940)
        0.24 = coord(6/25)
    
  4. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.23
    0.23400092 = sum of:
      0.23400092 = product of:
        0.83571756 = sum of:
          0.106817685 = weight(abstract_txt:categorization in 798) [ClassicSimilarity], result of:
            0.106817685 = score(doc=798,freq=3.0), product of:
              0.11884427 = queryWeight, product of:
                1.1262867 = boost
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.015886016 = queryNorm
              0.89880383 = fieldWeight in 798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.13194874 = weight(abstract_txt:hierarchies in 798) [ClassicSimilarity], result of:
            0.13194874 = score(doc=798,freq=3.0), product of:
              0.13682106 = queryWeight, product of:
                1.2084712 = boost
                7.126916 = idf(docFreq=92, maxDocs=42596)
                0.015886016 = queryNorm
              0.9643891 = fieldWeight in 798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.126916 = idf(docFreq=92, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.03951654 = weight(abstract_txt:text in 798) [ClassicSimilarity], result of:
            0.03951654 = score(doc=798,freq=2.0), product of:
              0.08833019 = queryWeight, product of:
                1.3731861 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.015886016 = queryNorm
              0.44737297 = fieldWeight in 798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.037524503 = weight(abstract_txt:large in 798) [ClassicSimilarity], result of:
            0.037524503 = score(doc=798,freq=1.0), product of:
              0.10751684 = queryWeight, product of:
                1.5150015 = boost
                4.467334 = idf(docFreq=1328, maxDocs=42596)
                0.015886016 = queryNorm
              0.34901047 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.467334 = idf(docFreq=1328, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.05844931 = weight(abstract_txt:method in 798) [ClassicSimilarity], result of:
            0.05844931 = score(doc=798,freq=1.0), product of:
              0.16538009 = queryWeight, product of:
                2.301241 = boost
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.015886016 = queryNorm
              0.3534241 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.11423284 = weight(abstract_txt:classification in 798) [ClassicSimilarity], result of:
            0.11423284 = score(doc=798,freq=2.0), product of:
              0.2585181 = queryWeight, product of:
                4.068937 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.015886016 = queryNorm
              0.44187558 = fieldWeight in 798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.34722796 = weight(abstract_txt:hierarchical in 798) [ClassicSimilarity], result of:
            0.34722796 = score(doc=798,freq=3.0), product of:
              0.44594634 = queryWeight, product of:
                4.878498 = boost
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.015886016 = queryNorm
              0.7786317 = fieldWeight in 798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
        0.28 = coord(7/25)
    
  5. Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007) 0.19
    0.19110219 = sum of:
      0.19110219 = product of:
        0.6825078 = sum of:
          0.04316539 = weight(abstract_txt:build in 2096) [ClassicSimilarity], result of:
            0.04316539 = score(doc=2096,freq=1.0), product of:
              0.093687214 = queryWeight, product of:
                5.8974643 = idf(docFreq=317, maxDocs=42596)
                0.015886016 = queryNorm
              0.4607394 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8974643 = idf(docFreq=317, maxDocs=42596)
                0.078125 = fieldNorm(doc=2096)
          0.07618064 = weight(abstract_txt:hierarchies in 2096) [ClassicSimilarity], result of:
            0.07618064 = score(doc=2096,freq=1.0), product of:
              0.13682106 = queryWeight, product of:
                1.2084712 = boost
                7.126916 = idf(docFreq=92, maxDocs=42596)
                0.015886016 = queryNorm
              0.5567903 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.126916 = idf(docFreq=92, maxDocs=42596)
                0.078125 = fieldNorm(doc=2096)
          0.045267522 = weight(abstract_txt:system in 2096) [ClassicSimilarity], result of:
            0.045267522 = score(doc=2096,freq=2.0), product of:
              0.12184032 = queryWeight, product of:
                2.2807903 = boost
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.015886016 = queryNorm
              0.37153155 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.078125 = fieldNorm(doc=2096)
          0.05844931 = weight(abstract_txt:method in 2096) [ClassicSimilarity], result of:
            0.05844931 = score(doc=2096,freq=1.0), product of:
              0.16538009 = queryWeight, product of:
                2.301241 = boost
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.015886016 = queryNorm
              0.3534241 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5238285 = idf(docFreq=1255, maxDocs=42596)
                0.078125 = fieldNorm(doc=2096)
          0.17819797 = weight(abstract_txt:hierarchy in 2096) [ClassicSimilarity], result of:
            0.17819797 = score(doc=2096,freq=1.0), product of:
              0.34772196 = queryWeight, product of:
                3.336849 = boost
                6.559649 = idf(docFreq=163, maxDocs=42596)
                0.015886016 = queryNorm
              0.51247257 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.559649 = idf(docFreq=163, maxDocs=42596)
                0.078125 = fieldNorm(doc=2096)
          0.08077482 = weight(abstract_txt:classification in 2096) [ClassicSimilarity], result of:
            0.08077482 = score(doc=2096,freq=1.0), product of:
              0.2585181 = queryWeight, product of:
                4.068937 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.015886016 = queryNorm
              0.31245324 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=2096)
          0.20047218 = weight(abstract_txt:hierarchical in 2096) [ClassicSimilarity], result of:
            0.20047218 = score(doc=2096,freq=1.0), product of:
              0.44594634 = queryWeight, product of:
                4.878498 = boost
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.015886016 = queryNorm
              0.44954327 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7541537 = idf(docFreq=366, maxDocs=42596)
                0.078125 = fieldNorm(doc=2096)
        0.28 = coord(7/25)
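For the content matches, each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm, and the document score is the sum of those contributions multiplied by coord (matched query clauses / 25). Where a term prints no boost line, as with `build` in result 5, the numbers work out with a boost of 1 (0.093687214 = 5.8974643 × 0.015886016). The sketch below, under the same tf and idf assumptions, first recomputes result 1 from its raw inputs and then re-ranks all five results from the printed term weights. All function and variable names are hypothetical illustrations, not Lucene API.

```python
import math
from typing import Dict, List, Sequence

MAX_DOCS = 42596          # maxDocs in every idf(...) line
QUERY_NORM = 0.015886016  # queryNorm, identical in every content explain tree
MAX_OVERLAP = 25          # denominator of coord(n/25)


def classic_idf(doc_freq: int) -> float:
    # 1 + ln(maxDocs / (docFreq + 1)); reproduces the idf values printed above
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def coord_score(weights: Sequence[float]) -> float:
    # Sum of the matching terms' scores, scaled by coord = matched / MAX_OVERLAP
    return sum(weights) * len(weights) / MAX_OVERLAP


# Step 1: recompute result 1 (doc 3877, fieldNorm 0.078125) from its raw inputs
doc_3877 = [
    term_score(2.0, 2018, 1.3731861, 0.078125),  # text
    term_score(2.0, 1328, 1.5150015, 0.078125),  # large
    term_score(3.0, 59, 2.5655675, 0.078125),    # classifiers
    term_score(2.0, 163, 3.336849, 0.078125),    # hierarchy
    term_score(4.0, 2121, 4.068937, 0.078125),   # classification
    term_score(3.0, 366, 4.878498, 0.078125),    # hierarchical
]
print(f"result 1: sum = {sum(doc_3877):.6f}, score = {coord_score(doc_3877):.6f}")

# Step 2: re-rank all five results from the term scores printed in the listing
listed: Dict[str, List[float]] = {
    "Pons-Porrata et al. (2007)": [0.04316539, 0.07618064, 0.045267522, 0.05844931,
                                   0.17819797, 0.08077482, 0.20047218],
    "Li et al. (2007)": [0.106817685, 0.13194874, 0.03951654, 0.037524503,
                         0.05844931, 0.11423284, 0.34722796],
    "Liu (2009)": [0.07796657, 0.068444654, 0.032008972, 0.25201,
                   0.16154964, 0.40094435],
    "Sun et al. (2003)": [0.053858228, 0.02235393, 0.04675945, 0.3259875,
                          0.15828566, 0.39284363],
    "Gauch et al. (2009)": [0.03951654, 0.053067658, 0.31563604, 0.25201,
                            0.16154964, 0.34722796],
}
ranked = sorted(listed.items(), key=lambda item: coord_score(item[1]), reverse=True)
for name, weights in ranked:
    print(f"{name}: {len(weights)}/{MAX_OVERLAP} terms, score = {coord_score(weights):.4f}")
```

Result 4 matches seven of the 25 query terms, against six for results 2 and 3, yet ranks below both: coord raises its multiplier from 0.24 to 0.28, but its summed term weight (about 0.836, versus about 1.000 and 0.993) is too low to compensate.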