Document (#30275)

Author
Yoon, Y.
Lee, C.
Lee, G.G.
Title
An effective procedure for constructing a hierarchical text classification system
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.3, pp. 431-442
Year
2006
Abstract
In text categorization tasks, classification over some class hierarchies yields better results than classification without a hierarchy. Because large document collections are now commonly divided into several subgroups within a hierarchy, a hierarchical classification method is an appropriate choice. However, there is no systematic method for building a hierarchical classification system that performs well on large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for constructing classifiers applied to a hierarchy tree with many levels.
Theme
Automatic classification

Similar documents (author)

  1. Yoon, L.L.: The performance of cited references as an approach to information retrieval (1994) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 219) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 219, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=219)
    
  2. Yoon, J.W.: Utilizing quantitative users' reactions to represent affective meanings of an image (2010) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 585) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 585, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=585)
    
  3. Yoon, J.W.: Towards a user-oriented thesaurus for non-domain-specific image collections (2009) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 1222) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 1222, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=1222)
    
  4. Yoon, K.: Conceptual syntagmatic associations in user tagging (2012) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 2241) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 2241, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=2241)
    
  5. Yoon, A.: Data reusers' trust development (2017) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:yoon in 5533) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 5533, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=5533)
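
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the tf and idf formulas that match the listed figures (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and are not part of any library API.

```python
import math

def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that matches the listing:
    # 1 + ln(maxDocs / (docFreq + 1)). This form is an assumption inferred
    # from the numbers shown above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain output.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:yoon entries in the listing.
idf_yoon = classic_idf(doc_freq=13, max_docs=42740)
score_yoon = field_weight(freq=1.0, doc_freq=13, max_docs=42740, field_norm=0.625)
print(f"idf={idf_yoon:.6f} score={score_yoon:.6f}")  # listing reports 9.023833 and 5.639896
```

All five author matches share the same term frequency, document frequency, and field norm, which is why they all receive the same score of 5.64.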
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.28
    0.28058964 = sum of:
      0.28058964 = product of:
        1.1691235 = sum of:
          0.0395275 = weight(abstract_txt:text in 4698) [ClassicSimilarity], result of:
            0.0395275 = score(doc=4698,freq=2.0), product of:
              0.08833502 = queryWeight, product of:
                1.3741663 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.015872022 = queryNorm
              0.44747257 = fieldWeight in 4698, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.053086862 = weight(abstract_txt:large in 4698) [ClassicSimilarity], result of:
            0.053086862 = score(doc=4698,freq=2.0), product of:
              0.10752879 = queryWeight, product of:
                1.5161257 = boost
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.015872022 = queryNorm
              0.49369904 = fieldWeight in 4698, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.3159353 = weight(abstract_txt:classifiers in 4698) [ClassicSimilarity], result of:
            0.3159353 = score(doc=4698,freq=3.0), product of:
              0.30848572 = queryWeight, product of:
                2.5679724 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.015872022 = queryNorm
              1.024149 = fieldWeight in 4698, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.25230065 = weight(abstract_txt:hierarchy in 4698) [ClassicSimilarity], result of:
            0.25230065 = score(doc=4698,freq=2.0), product of:
              0.34794402 = queryWeight, product of:
                3.3402052 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.015872022 = queryNorm
              0.7251185 = fieldWeight in 4698, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.16155338 = weight(abstract_txt:classification in 4698) [ClassicSimilarity], result of:
            0.16155338 = score(doc=4698,freq=4.0), product of:
              0.25848848 = queryWeight, product of:
                4.0714965 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.015872022 = queryNorm
              0.6249926 = fieldWeight in 4698, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.34671983 = weight(abstract_txt:hierarchical in 4698) [ClassicSimilarity], result of:
            0.34671983 = score(doc=4698,freq=3.0), product of:
              0.44545323 = queryWeight, product of:
                4.8791466 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.015872022 = queryNorm
              0.778353 = fieldWeight in 4698, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
        0.24 = coord(6/25)
    
  2. Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.24
    0.23995328 = sum of:
      0.23995328 = product of:
        0.99980533 = sum of:
          0.053916957 = weight(abstract_txt:tree in 2809) [ClassicSimilarity], result of:
            0.053916957 = score(doc=2809,freq=1.0), product of:
              0.12607345 = queryWeight, product of:
                1.1608328 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.015872022 = queryNorm
              0.42766306 = fieldWeight in 2809, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0625 = fieldNorm(doc=2809)
          0.02236013 = weight(abstract_txt:text in 2809) [ClassicSimilarity], result of:
            0.02236013 = score(doc=2809,freq=1.0), product of:
              0.08833502 = queryWeight, product of:
                1.3741663 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.015872022 = queryNorm
              0.2531287 = fieldWeight in 2809, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=2809)
          0.04667357 = weight(abstract_txt:method in 2809) [ClassicSimilarity], result of:
            0.04667357 = score(doc=2809,freq=1.0), product of:
              0.16515605 = queryWeight, product of:
                2.30126 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015872022 = queryNorm
              0.28260285 = fieldWeight in 2809, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0625 = fieldNorm(doc=2809)
          0.32629657 = weight(abstract_txt:classifiers in 2809) [ClassicSimilarity], result of:
            0.32629657 = score(doc=2809,freq=5.0), product of:
              0.30848572 = queryWeight, product of:
                2.5679724 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.015872022 = queryNorm
              1.0577364 = fieldWeight in 2809, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.0625 = fieldNorm(doc=2809)
          0.15828936 = weight(abstract_txt:classification in 2809) [ClassicSimilarity], result of:
            0.15828936 = score(doc=2809,freq=6.0), product of:
              0.25848848 = queryWeight, product of:
                4.0714965 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.015872022 = queryNorm
              0.61236525 = fieldWeight in 2809, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.0625 = fieldNorm(doc=2809)
          0.39226875 = weight(abstract_txt:hierarchical in 2809) [ClassicSimilarity], result of:
            0.39226875 = score(doc=2809,freq=6.0), product of:
              0.44545323 = queryWeight, product of:
                4.8791466 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.015872022 = queryNorm
              0.88060594 = fieldWeight in 2809, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.0625 = fieldNorm(doc=2809)
        0.24 = coord(6/25)
    
  3. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.24
    0.23825663 = sum of:
      0.23825663 = product of:
        0.992736 = sum of:
          0.078046046 = weight(abstract_txt:performs in 4761) [ClassicSimilarity], result of:
            0.078046046 = score(doc=4761,freq=1.0), product of:
              0.13902748 = queryWeight, product of:
                1.2190125 = boost
                7.1855536 = idf(docFreq=87, maxDocs=42740)
                0.015872022 = queryNorm
              0.5613714 = fieldWeight in 4761, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1855536 = idf(docFreq=87, maxDocs=42740)
                0.078125 = fieldNorm(doc=4761)
          0.06846364 = weight(abstract_txt:text in 4761) [ClassicSimilarity], result of:
            0.06846364 = score(doc=4761,freq=6.0), product of:
              0.08833502 = queryWeight, product of:
                1.3741663 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.015872022 = queryNorm
              0.7750452 = fieldWeight in 4761, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=4761)
          0.032014642 = weight(abstract_txt:system in 4761) [ClassicSimilarity], result of:
            0.032014642 = score(doc=4761,freq=1.0), product of:
              0.12183885 = queryWeight, product of:
                2.282342 = boost
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.015872022 = queryNorm
              0.2627622 = fieldWeight in 4761, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.078125 = fieldNorm(doc=4761)
          0.25230065 = weight(abstract_txt:hierarchy in 4761) [ClassicSimilarity], result of:
            0.25230065 = score(doc=4761,freq=2.0), product of:
              0.34794402 = queryWeight, product of:
                3.3402052 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.015872022 = queryNorm
              0.7251185 = fieldWeight in 4761, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.078125 = fieldNorm(doc=4761)
          0.16155338 = weight(abstract_txt:classification in 4761) [ClassicSimilarity], result of:
            0.16155338 = score(doc=4761,freq=4.0), product of:
              0.25848848 = queryWeight, product of:
                4.0714965 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.015872022 = queryNorm
              0.6249926 = fieldWeight in 4761, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=4761)
          0.40035763 = weight(abstract_txt:hierarchical in 4761) [ClassicSimilarity], result of:
            0.40035763 = score(doc=4761,freq=4.0), product of:
              0.44545323 = queryWeight, product of:
                4.8791466 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.015872022 = queryNorm
              0.89876467 = fieldWeight in 4761, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.078125 = fieldNorm(doc=4761)
        0.24 = coord(6/25)
    
  4. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.23
    0.2339082 = sum of:
      0.2339082 = product of:
        0.83538646 = sum of:
          0.10693886 = weight(abstract_txt:categorization in 1798) [ClassicSimilarity], result of:
            0.10693886 = score(doc=1798,freq=3.0), product of:
              0.11891866 = queryWeight, product of:
                1.1274124 = boost
                6.645611 = idf(docFreq=150, maxDocs=42740)
                0.015872022 = queryNorm
              0.8992606 = fieldWeight in 1798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.645611 = idf(docFreq=150, maxDocs=42740)
                0.078125 = fieldNorm(doc=1798)
          0.13208473 = weight(abstract_txt:hierarchies in 1798) [ClassicSimilarity], result of:
            0.13208473 = score(doc=1798,freq=3.0), product of:
              0.13689724 = queryWeight, product of:
                1.2096373 = boost
                7.130291 = idf(docFreq=92, maxDocs=42740)
                0.015872022 = queryNorm
              0.9648458 = fieldWeight in 1798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.130291 = idf(docFreq=92, maxDocs=42740)
                0.078125 = fieldNorm(doc=1798)
          0.0395275 = weight(abstract_txt:text in 1798) [ClassicSimilarity], result of:
            0.0395275 = score(doc=1798,freq=2.0), product of:
              0.08833502 = queryWeight, product of:
                1.3741663 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.015872022 = queryNorm
              0.44747257 = fieldWeight in 1798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=1798)
          0.03753808 = weight(abstract_txt:large in 1798) [ClassicSimilarity], result of:
            0.03753808 = score(doc=1798,freq=1.0), product of:
              0.10752879 = queryWeight, product of:
                1.5161257 = boost
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.015872022 = queryNorm
              0.34909797 = fieldWeight in 1798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.468454 = idf(docFreq=1331, maxDocs=42740)
                0.078125 = fieldNorm(doc=1798)
          0.05834196 = weight(abstract_txt:method in 1798) [ClassicSimilarity], result of:
            0.05834196 = score(doc=1798,freq=1.0), product of:
              0.16515605 = queryWeight, product of:
                2.30126 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015872022 = queryNorm
              0.35325354 = fieldWeight in 1798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.078125 = fieldNorm(doc=1798)
          0.1142355 = weight(abstract_txt:classification in 1798) [ClassicSimilarity], result of:
            0.1142355 = score(doc=1798,freq=2.0), product of:
              0.25848848 = queryWeight, product of:
                4.0714965 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.015872022 = queryNorm
              0.44193652 = fieldWeight in 1798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=1798)
          0.34671983 = weight(abstract_txt:hierarchical in 1798) [ClassicSimilarity], result of:
            0.34671983 = score(doc=1798,freq=3.0), product of:
              0.44545323 = queryWeight, product of:
                4.8791466 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.015872022 = queryNorm
              0.778353 = fieldWeight in 1798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.078125 = fieldNorm(doc=1798)
        0.28 = coord(7/25)
    
  5. Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007) 0.19
    0.1910498 = sum of:
      0.1910498 = product of:
        0.6823207 = sum of:
          0.043085035 = weight(abstract_txt:build in 2917) [ClassicSimilarity], result of:
            0.043085035 = score(doc=2917,freq=1.0), product of:
              0.093558736 = queryWeight, product of:
                5.8945694 = idf(docFreq=319, maxDocs=42740)
                0.015872022 = queryNorm
              0.46051323 = fieldWeight in 2917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8945694 = idf(docFreq=319, maxDocs=42740)
                0.078125 = fieldNorm(doc=2917)
          0.07625915 = weight(abstract_txt:hierarchies in 2917) [ClassicSimilarity], result of:
            0.07625915 = score(doc=2917,freq=1.0), product of:
              0.13689724 = queryWeight, product of:
                1.2096373 = boost
                7.130291 = idf(docFreq=92, maxDocs=42740)
                0.015872022 = queryNorm
              0.557054 = fieldWeight in 2917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.130291 = idf(docFreq=92, maxDocs=42740)
                0.078125 = fieldNorm(doc=2917)
          0.045275543 = weight(abstract_txt:system in 2917) [ClassicSimilarity], result of:
            0.045275543 = score(doc=2917,freq=2.0), product of:
              0.12183885 = queryWeight, product of:
                2.282342 = boost
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.015872022 = queryNorm
              0.37160185 = fieldWeight in 2917, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.078125 = fieldNorm(doc=2917)
          0.05834196 = weight(abstract_txt:method in 2917) [ClassicSimilarity], result of:
            0.05834196 = score(doc=2917,freq=1.0), product of:
              0.16515605 = queryWeight, product of:
                2.30126 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015872022 = queryNorm
              0.35325354 = fieldWeight in 2917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.078125 = fieldNorm(doc=2917)
          0.17840351 = weight(abstract_txt:hierarchy in 2917) [ClassicSimilarity], result of:
            0.17840351 = score(doc=2917,freq=1.0), product of:
              0.34794402 = queryWeight, product of:
                3.3402052 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.015872022 = queryNorm
              0.51273626 = fieldWeight in 2917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.078125 = fieldNorm(doc=2917)
          0.08077669 = weight(abstract_txt:classification in 2917) [ClassicSimilarity], result of:
            0.08077669 = score(doc=2917,freq=1.0), product of:
              0.25848848 = queryWeight, product of:
                4.0714965 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.015872022 = queryNorm
              0.3124963 = fieldWeight in 2917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=2917)
          0.20017882 = weight(abstract_txt:hierarchical in 2917) [ClassicSimilarity], result of:
            0.20017882 = score(doc=2917,freq=1.0), product of:
              0.44545323 = queryWeight, product of:
                4.8791466 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.015872022 = queryNorm
              0.44938233 = fieldWeight in 2917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.078125 = fieldNorm(doc=2917)
        0.28 = coord(7/25)
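
The content scores combine several matching terms. As a rough sketch under the same assumed formulas, each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm; the sum is then scaled by the coord factor (matching terms / query terms). The helper names below are hypothetical, and the constants are copied from the first hit (doc 4698).

```python
import math

QUERY_NORM = 0.015872022  # queryNorm as shown in the listing
MAX_DOCS = 42740          # maxDocs as shown in the listing

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed idf form that reproduces the listed values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    # Per-term score = queryWeight * fieldWeight.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (boost, termFreq, docFreq) for the six terms matched in doc 4698.
terms_4698 = [
    (1.3741663, 2.0, 2023),  # text
    (1.5161257, 2.0, 1331),  # large
    (2.5679724, 3.0, 59),    # classifiers
    (3.3402052, 2.0, 163),   # hierarchy
    (4.0714965, 4.0, 2127),  # classification
    (4.8791466, 3.0, 368),   # hierarchical
]

raw_sum = sum(term_score(b, f, df, field_norm=0.078125) for b, f, df in terms_4698)
coord = 6 / 25  # 6 of the 25 query terms matched
total = coord * raw_sum
print(f"sum={raw_sum:.6f} total={total:.6f}")  # listing reports 1.1691235 and 0.28058964
```

Small differences in the last digits are expected, since the listed values appear to come from single-precision arithmetic while Python uses double precision.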