Document (#34562)

Author
Yi, K.
Title
Automatic text classification using library classification schemes : trends, issues and challenges
Source
International cataloguing and bibliographic control. 36(2007) no.4, pp.78-82
Year
2007
Abstract
The proliferation of digital resources and their integration into a traditional library setting have created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is the research field that develops the tools, methods, and models for this task. This article describes the currently popular approach to text classification and the major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for addressing them are examined.
Theme
Automatic classification (Automatisches Klassifizieren)

Similar documents (content)

  1. Yi, K.: Challenges in automated classification using library classification schemes (2006) 0.44
    0.43700653 = sum of:
      0.43700653 = product of:
        1.3656454 = sum of:
          0.05609891 = weight(abstract_txt:tool in 811) [ClassicSimilarity], result of:
            0.05609891 = score(doc=811,freq=1.0), product of:
              0.09051771 = queryWeight, product of:
                1.0445518 = boost
                4.9580493 = idf(docFreq=841, maxDocs=44083)
                0.01747804 = queryNorm
              0.61975616 = fieldWeight in 811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9580493 = idf(docFreq=841, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
          0.06840488 = weight(abstract_txt:projects in 811) [ClassicSimilarity], result of:
            0.06840488 = score(doc=811,freq=1.0), product of:
              0.10331309 = queryWeight, product of:
                1.1159401 = boost
                5.2969 = idf(docFreq=599, maxDocs=44083)
                0.01747804 = queryNorm
              0.6621125 = fieldWeight in 811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2969 = idf(docFreq=599, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
          0.08877087 = weight(abstract_txt:popular in 811) [ClassicSimilarity], result of:
            0.08877087 = score(doc=811,freq=1.0), product of:
              0.122916706 = queryWeight, product of:
                1.2172189 = boost
                5.7776275 = idf(docFreq=370, maxDocs=44083)
                0.01747804 = queryNorm
              0.72220343 = fieldWeight in 811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7776275 = idf(docFreq=370, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
          0.103628784 = weight(abstract_txt:library in 811) [ClassicSimilarity], result of:
            0.103628784 = score(doc=811,freq=3.0), product of:
              0.14999051 = queryWeight, product of:
                2.6892123 = boost
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.01747804 = queryNorm
              0.69090223 = fieldWeight in 811, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
          0.19100833 = weight(abstract_txt:challenges in 811) [ClassicSimilarity], result of:
            0.19100833 = score(doc=811,freq=1.0), product of:
              0.2954649 = queryWeight, product of:
                3.268713 = boost
                5.1717367 = idf(docFreq=679, maxDocs=44083)
                0.01747804 = queryNorm
              0.6464671 = fieldWeight in 811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1717367 = idf(docFreq=679, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
          0.23465537 = weight(abstract_txt:schemes in 811) [ClassicSimilarity], result of:
            0.23465537 = score(doc=811,freq=1.0), product of:
              0.33891544 = queryWeight, product of:
                3.500818 = boost
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.01747804 = queryNorm
              0.6923714 = fieldWeight in 811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
          0.15211171 = weight(abstract_txt:text in 811) [ClassicSimilarity], result of:
            0.15211171 = score(doc=811,freq=1.0), product of:
              0.30097404 = queryWeight, product of:
                4.25905 = boost
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.01747804 = queryNorm
              0.5053981 = fieldWeight in 811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
          0.47096652 = weight(abstract_txt:classification in 811) [ClassicSimilarity], result of:
            0.47096652 = score(doc=811,freq=4.0), product of:
              0.47108647 = queryWeight, product of:
                6.739979 = boost
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.01747804 = queryNorm
              0.99974537 = fieldWeight in 811, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.125 = fieldNorm(doc=811)
        0.32 = coord(8/25)
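
The score tree above can be reproduced by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF form that the printed numbers are consistent with, namely tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function and variable names are illustrative; the numeric inputs are copied from the listing for doc 811.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency, in the form that matches the listed values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight for a single query term.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, max_docs, query_norm, field_norm, matched, total):
    # Sum of the per-term scores, scaled by coord = matched / total query terms.
    raw = sum(
        term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm)
        for _term, freq, doc_freq, boost in terms
    )
    return raw * (matched / total)

# Values copied from the explanation for doc 811.
MAX_DOCS = 44083
QUERY_NORM = 0.01747804
FIELD_NORM_811 = 0.125

# (term, freq, docFreq, boost)
TERMS_811 = [
    ("tool", 1.0, 841, 1.0445518),
    ("projects", 1.0, 599, 1.1159401),
    ("popular", 1.0, 370, 1.2172189),
    ("library", 3.0, 4927, 2.6892123),
    ("challenges", 1.0, 679, 3.268713),
    ("schemes", 1.0, 470, 3.500818),
    ("text", 1.0, 2101, 4.25905),
    ("classification", 4.0, 2196, 6.739979),
]

score_811 = document_score(
    TERMS_811, MAX_DOCS, QUERY_NORM, FIELD_NORM_811, matched=8, total=25
)

if __name__ == "__main__":
    print(f"doc 811: {score_811:.6f}")
```

Under these assumptions the result should agree with the listed 0.43700653 up to single-precision rounding. The same functions apply to the other hits once their own fieldNorm, term frequencies, and coord values are substituted.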
    
  2. Batley, S.: Classification in theory and practice (2005) 0.26
    0.2570235 = sum of:
      0.2570235 = product of:
        0.80319846 = sum of:
          0.008598833 = weight(abstract_txt:that in 2171) [ClassicSimilarity], result of:
            0.008598833 = score(doc=2171,freq=5.0), product of:
              0.041480463 = queryWeight, product of:
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.01747804 = queryNorm
              0.20729838 = fieldWeight in 2171, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
          0.0349707 = weight(abstract_txt:examined in 2171) [ClassicSimilarity], result of:
            0.0349707 = score(doc=2171,freq=3.0), product of:
              0.09945495 = queryWeight, product of:
                1.0949049 = boost
                5.1970544 = idf(docFreq=662, maxDocs=44083)
                0.01747804 = queryNorm
              0.35162354 = fieldWeight in 2171, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1970544 = idf(docFreq=662, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
          0.027740896 = weight(abstract_txt:popular in 2171) [ClassicSimilarity], result of:
            0.027740896 = score(doc=2171,freq=1.0), product of:
              0.122916706 = queryWeight, product of:
                1.2172189 = boost
                5.7776275 = idf(docFreq=370, maxDocs=44083)
                0.01747804 = queryNorm
              0.22568858 = fieldWeight in 2171, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7776275 = idf(docFreq=370, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
          0.022940297 = weight(abstract_txt:issues in 2171) [ClassicSimilarity], result of:
            0.022940297 = score(doc=2171,freq=1.0), product of:
              0.13643944 = queryWeight, product of:
                1.8136278 = boost
                4.3042655 = idf(docFreq=1618, maxDocs=44083)
                0.01747804 = queryNorm
              0.16813537 = fieldWeight in 2171, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3042655 = idf(docFreq=1618, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
          0.04180756 = weight(abstract_txt:library in 2171) [ClassicSimilarity], result of:
            0.04180756 = score(doc=2171,freq=5.0), product of:
              0.14999051 = queryWeight, product of:
                2.6892123 = boost
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.01747804 = queryNorm
              0.27873468 = fieldWeight in 2171, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
          0.23188919 = weight(abstract_txt:schemes in 2171) [ClassicSimilarity], result of:
            0.23188919 = score(doc=2171,freq=10.0), product of:
              0.33891544 = queryWeight, product of:
                3.500818 = boost
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.01747804 = queryNorm
              0.6842096 = fieldWeight in 2171, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
          0.08233288 = weight(abstract_txt:text in 2171) [ClassicSimilarity], result of:
            0.08233288 = score(doc=2171,freq=3.0), product of:
              0.30097404 = queryWeight, product of:
                4.25905 = boost
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.01747804 = queryNorm
              0.27355474 = fieldWeight in 2171, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
          0.35291815 = weight(abstract_txt:classification in 2171) [ClassicSimilarity], result of:
            0.35291815 = score(doc=2171,freq=23.0), product of:
              0.47108647 = queryWeight, product of:
                6.739979 = boost
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.01747804 = queryNorm
              0.7491579 = fieldWeight in 2171, product of:
                4.7958317 = tf(freq=23.0), with freq of:
                  23.0 = termFreq=23.0
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.0390625 = fieldNorm(doc=2171)
        0.32 = coord(8/25)
    
  3. Hurt, C.D.: Classification and subject analysis : looking to the future at a distance (1997) 0.25
    0.24675761 = sum of:
      0.24675761 = product of:
        1.0281568 = sum of:
          0.010767441 = weight(abstract_txt:that in 6999) [ClassicSimilarity], result of:
            0.010767441 = score(doc=6999,freq=1.0), product of:
              0.041480463 = queryWeight, product of:
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.01747804 = queryNorm
              0.25957862 = fieldWeight in 6999, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.109375 = fieldNorm(doc=6999)
          0.15064919 = weight(abstract_txt:proliferation in 6999) [ClassicSimilarity], result of:
            0.15064919 = score(doc=6999,freq=1.0), product of:
              0.19116268 = queryWeight, product of:
                1.5179754 = boost
                7.205193 = idf(docFreq=88, maxDocs=44083)
                0.01747804 = queryNorm
              0.788068 = fieldWeight in 6999, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.205193 = idf(docFreq=88, maxDocs=44083)
                0.109375 = fieldNorm(doc=6999)
          0.052351344 = weight(abstract_txt:library in 6999) [ClassicSimilarity], result of:
            0.052351344 = score(doc=6999,freq=1.0), product of:
              0.14999051 = queryWeight, product of:
                2.6892123 = boost
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.01747804 = queryNorm
              0.34903103 = fieldWeight in 6999, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.109375 = fieldNorm(doc=6999)
          0.16713229 = weight(abstract_txt:challenges in 6999) [ClassicSimilarity], result of:
            0.16713229 = score(doc=6999,freq=1.0), product of:
              0.2954649 = queryWeight, product of:
                3.268713 = boost
                5.1717367 = idf(docFreq=679, maxDocs=44083)
                0.01747804 = queryNorm
              0.5656587 = fieldWeight in 6999, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1717367 = idf(docFreq=679, maxDocs=44083)
                0.109375 = fieldNorm(doc=6999)
          0.29037118 = weight(abstract_txt:schemes in 6999) [ClassicSimilarity], result of:
            0.29037118 = score(doc=6999,freq=2.0), product of:
              0.33891544 = queryWeight, product of:
                3.500818 = boost
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.01747804 = queryNorm
              0.85676587 = fieldWeight in 6999, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.109375 = fieldNorm(doc=6999)
          0.35688534 = weight(abstract_txt:classification in 6999) [ClassicSimilarity], result of:
            0.35688534 = score(doc=6999,freq=3.0), product of:
              0.47108647 = queryWeight, product of:
                6.739979 = boost
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.01747804 = queryNorm
              0.75757927 = fieldWeight in 6999, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.109375 = fieldNorm(doc=6999)
        0.24 = coord(6/25)
    
  4. Kumbhar, R.: Library classification trends in the 21st century (2012) 0.23
    0.23017685 = sum of:
      0.23017685 = product of:
        0.82206017 = sum of:
          0.0076910295 = weight(abstract_txt:that in 1737) [ClassicSimilarity], result of:
            0.0076910295 = score(doc=1737,freq=1.0), product of:
              0.041480463 = queryWeight, product of:
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.01747804 = queryNorm
              0.1854133 = fieldWeight in 1737, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.078125 = fieldNorm(doc=1737)
          0.035061818 = weight(abstract_txt:tool in 1737) [ClassicSimilarity], result of:
            0.035061818 = score(doc=1737,freq=1.0), product of:
              0.09051771 = queryWeight, product of:
                1.0445518 = boost
                4.9580493 = idf(docFreq=841, maxDocs=44083)
                0.01747804 = queryNorm
              0.3873476 = fieldWeight in 1737, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9580493 = idf(docFreq=841, maxDocs=44083)
                0.078125 = fieldNorm(doc=1737)
          0.040380683 = weight(abstract_txt:automatic in 1737) [ClassicSimilarity], result of:
            0.040380683 = score(doc=1737,freq=1.0), product of:
              0.09945495 = queryWeight, product of:
                1.0949049 = boost
                5.1970544 = idf(docFreq=662, maxDocs=44083)
                0.01747804 = queryNorm
              0.40601987 = fieldWeight in 1737, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1970544 = idf(docFreq=662, maxDocs=44083)
                0.078125 = fieldNorm(doc=1737)
          0.050405506 = weight(abstract_txt:trends in 1737) [ClassicSimilarity], result of:
            0.050405506 = score(doc=1737,freq=1.0), product of:
              0.115299985 = queryWeight, product of:
                1.1789024 = boost
                5.595755 = idf(docFreq=444, maxDocs=44083)
                0.01747804 = queryNorm
              0.43716836 = fieldWeight in 1737, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.595755 = idf(docFreq=444, maxDocs=44083)
                0.078125 = fieldNorm(doc=1737)
          0.08361512 = weight(abstract_txt:library in 1737) [ClassicSimilarity], result of:
            0.08361512 = score(doc=1737,freq=5.0), product of:
              0.14999051 = queryWeight, product of:
                2.6892123 = boost
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.01747804 = queryNorm
              0.55746937 = fieldWeight in 1737, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.078125 = fieldNorm(doc=1737)
          0.09506982 = weight(abstract_txt:text in 1737) [ClassicSimilarity], result of:
            0.09506982 = score(doc=1737,freq=1.0), product of:
              0.30097404 = queryWeight, product of:
                4.25905 = boost
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.01747804 = queryNorm
              0.3158738 = fieldWeight in 1737, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.078125 = fieldNorm(doc=1737)
          0.5098362 = weight(abstract_txt:classification in 1737) [ClassicSimilarity], result of:
            0.5098362 = score(doc=1737,freq=12.0), product of:
              0.47108647 = queryWeight, product of:
                6.739979 = boost
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.01747804 = queryNorm
              1.0822561 = fieldWeight in 1737, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.078125 = fieldNorm(doc=1737)
        0.28 = coord(7/25)
    
  5. Wang, J.: An extensive study on automated Dewey Decimal Classification (2009) 0.23
    0.22712016 = sum of:
      0.22712016 = product of:
        0.70975053 = sum of:
          0.0061528236 = weight(abstract_txt:that in 4173) [ClassicSimilarity], result of:
            0.0061528236 = score(doc=4173,freq=1.0), product of:
              0.041480463 = queryWeight, product of:
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.01747804 = queryNorm
              0.14833064 = fieldWeight in 4173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3732903 = idf(docFreq=11164, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
          0.032304548 = weight(abstract_txt:automatic in 4173) [ClassicSimilarity], result of:
            0.032304548 = score(doc=4173,freq=1.0), product of:
              0.09945495 = queryWeight, product of:
                1.0949049 = boost
                5.1970544 = idf(docFreq=662, maxDocs=44083)
                0.01747804 = queryNorm
              0.3248159 = fieldWeight in 4173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1970544 = idf(docFreq=662, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
          0.033540368 = weight(abstract_txt:created in 4173) [ClassicSimilarity], result of:
            0.033540368 = score(doc=4173,freq=1.0), product of:
              0.10197549 = queryWeight, product of:
                1.1086925 = boost
                5.2624984 = idf(docFreq=620, maxDocs=44083)
                0.01747804 = queryNorm
              0.32890615 = fieldWeight in 4173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2624984 = idf(docFreq=620, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
          0.081039846 = weight(abstract_txt:automated in 4173) [ClassicSimilarity], result of:
            0.081039846 = score(doc=4173,freq=1.0), product of:
              0.23134476 = queryWeight, product of:
                2.3616092 = boost
                5.6047845 = idf(docFreq=440, maxDocs=44083)
                0.01747804 = queryNorm
              0.35029903 = fieldWeight in 4173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6047845 = idf(docFreq=440, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
          0.051814392 = weight(abstract_txt:library in 4173) [ClassicSimilarity], result of:
            0.051814392 = score(doc=4173,freq=3.0), product of:
              0.14999051 = queryWeight, product of:
                2.6892123 = boost
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.01747804 = queryNorm
              0.34545112 = fieldWeight in 4173, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.191141 = idf(docFreq=4927, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
          0.11732768 = weight(abstract_txt:schemes in 4173) [ClassicSimilarity], result of:
            0.11732768 = score(doc=4173,freq=1.0), product of:
              0.33891544 = queryWeight, product of:
                3.500818 = boost
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.01747804 = queryNorm
              0.3461857 = fieldWeight in 4173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5389714 = idf(docFreq=470, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
          0.076055855 = weight(abstract_txt:text in 4173) [ClassicSimilarity], result of:
            0.076055855 = score(doc=4173,freq=1.0), product of:
              0.30097404 = queryWeight, product of:
                4.25905 = boost
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.01747804 = queryNorm
              0.25269905 = fieldWeight in 4173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0431848 = idf(docFreq=2101, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
          0.31151506 = weight(abstract_txt:classification in 4173) [ClassicSimilarity], result of:
            0.31151506 = score(doc=4173,freq=7.0), product of:
              0.47108647 = queryWeight, product of:
                6.739979 = boost
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.01747804 = queryNorm
              0.66126937 = fieldWeight in 4173, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.9989815 = idf(docFreq=2196, maxDocs=44083)
                0.0625 = fieldNorm(doc=4173)
        0.32 = coord(8/25)