Search (14 results, page 1 of 1)

Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.11

0.10669494 = product of:
  0.21338987 = sum of:
    0.21338987 = sum of:
      0.16601379 = weight(_text_:tree in 5273) [ClassicSimilarity], result of:
        0.16601379 = score(doc=5273,freq=2.0), product of:
          0.32745647 = queryWeight, product of:
            6.5552235 = idf(docFreq=170, maxDocs=44218)
            0.049953517 = queryNorm
          0.5069797 = fieldWeight in 5273, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            6.5552235 = idf(docFreq=170, maxDocs=44218)
            0.0546875 = fieldNorm(doc=5273)
      0.047376085 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
        0.047376085 = score(doc=5273,freq=2.0), product of:
          0.17492871 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049953517 = queryNorm
          0.2708308 = fieldWeight in 5273, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=5273)
  0.5 = coord(1/2)
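
The explain tree above can be recomputed by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function and constant names are illustrative and not part of Lucene; queryNorm, fieldNorm, and coord are copied from the output rather than derived.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for a single term."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # idf * queryNorm
    field_weight = tf * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the explain output for doc 5273.
QUERY_NORM = 0.049953517
MAX_DOCS = 44218
FIELD_NORM = 0.0546875
COORD = 0.5  # coord(1/2)

tree = term_score(2.0, 170, MAX_DOCS, QUERY_NORM, FIELD_NORM)
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)
total = COORD * (tree + twenty_two)

print(round(total, 4))  # prints 0.1067
```

Lucene computes these values in single precision, so the last digits may differ slightly from the listing.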

Abstract: In text categorization tasks, classification on some class hierarchies yields better results than classification without the hierarchy. Currently, because a large number of documents are divided into several subgroups in a hierarchy, we can appropriately use a hierarchical classification method. However, we have no systematic method to build a hierarchical classification system that performs well with large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels.
Date: 22. 7.2006 16:24:52

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10

0.09964347 = sum of:
  0.07933943 = product of:
    0.23801827 = sum of:
      0.23801827 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.23801827 = score(doc=562,freq=2.0), product of:
          0.42350647 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.049953517 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.020304035 = product of:
    0.04060807 = sum of:
      0.04060807 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.04060807 = score(doc=562,freq=2.0), product of:
          0.17492871 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.049953517 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.05
```
0.050309774 = product of:
  0.10061955 = sum of:
    0.10061955 = product of:
      0.2012391 = sum of:
        0.2012391 = weight(_text_:tree in 2218) [ClassicSimilarity], result of:
          0.2012391 = score(doc=2218,freq=4.0), product of:
            0.32745647 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.049953517 = queryNorm
            0.6145522 = fieldWeight in 2218, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.046875 = fieldNorm(doc=2218)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.
Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.05
```
0.050309774 = product of:
  0.10061955 = sum of:
    0.10061955 = product of:
      0.2012391 = sum of:
        0.2012391 = weight(_text_:tree in 2555) [ClassicSimilarity], result of:
          0.2012391 = score(doc=2555,freq=4.0), product of:
            0.32745647 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.049953517 = queryNorm
            0.6145522 = fieldWeight in 2555, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from Θ(n) to Θ(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
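
The single-path idea can be illustrated with a generic top-down descent. This is a minimal sketch and not the authors' system: the `Category` class, the keyword-overlap scoring, and the toy category tree are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical category node described by a bag of keywords."""
    name: str
    keywords: set
    children: list = field(default_factory=list)


def classify(root: Category, words: set) -> list:
    """Follow the best-matching child at each level; return the path taken."""
    path = [root.name]
    node = root
    while node.children:
        # Only the children of the current node are compared,
        # never the whole set of categories.
        node = max(node.children, key=lambda c: len(c.keywords & words))
        path.append(node.name)
    return path


# Toy tree; names and keywords are invented for this example.
physics = Category("physics", {"quantum", "energy", "particle"})
biology = Category("biology", {"cell", "gene", "species"})
music = Category("music", {"melody", "rhythm", "chord"})
painting = Category("painting", {"canvas", "pigment", "brush"})
science = Category("science", physics.keywords | biology.keywords, [physics, biology])
arts = Category("arts", music.keywords | painting.keywords, [music, painting])
root = Category("root", set(), [science, arts])

path = classify(root, {"quantum", "energy"})
print(path)  # ['root', 'science', 'physics']
```

In a reasonably balanced tree the number of comparisons grows with the depth of the tree rather than with the total number of categories, which is the intuition behind the logarithmic bound quoted in the abstract.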
Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.04
```
0.035574384 = product of:
  0.07114877 = sum of:
    0.07114877 = product of:
      0.14229754 = sum of:
        0.14229754 = weight(_text_:tree in 1808) [ClassicSimilarity], result of:
          0.14229754 = score(doc=1808,freq=2.0), product of:
            0.32745647 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.049953517 = queryNorm
            0.43455404 = fieldWeight in 1808, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.046875 = fieldNorm(doc=1808)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Hierarchical text classification or simply hierarchical classification refers to assigning a document to one or more suitable categories from a hierarchical category space. In our literature survey, we have found that the existing hierarchical classification experiments used a variety of measures to evaluate performance. These performance measures often assume independence between categories and do not consider documents misclassified into categories that are similar or not far from the correct categories in the category tree. In this paper, we therefore propose new performance measures for hierarchical classification. The proposed performance measures consist of category similarity measures and distance-based measures that consider the contributions of misclassified documents. Our experiments on hierarchical classification methods based on SVM classifiers and binary Naive Bayes classifiers showed that SVM classifiers perform better than Naive Bayes classifiers on the Reuters-21578 collection according to the extended measures. A new classifier-centric measure called blocking measure is also defined to examine the performance of subtree classifiers in a top-down level-based hierarchical classification method.
Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.03
```
0.029645318 = product of:
  0.059290636 = sum of:
    0.059290636 = product of:
      0.11858127 = sum of:
        0.11858127 = weight(_text_:tree in 3172) [ClassicSimilarity], result of:
          0.11858127 = score(doc=3172,freq=2.0), product of:
            0.32745647 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.049953517 = queryNorm
            0.36212835 = fieldWeight in 3172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-the-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.03
```
0.029645318 = product of:
  0.059290636 = sum of:
    0.059290636 = product of:
      0.11858127 = sum of:
        0.11858127 = weight(_text_:tree in 3614) [ClassicSimilarity], result of:
          0.11858127 = score(doc=3614,freq=2.0), product of:
            0.32745647 = queryWeight, product of:
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.049953517 = queryNorm
            0.36212835 = fieldWeight in 3614, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5552235 = idf(docFreq=170, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification systems for browsing. The classification algorithm was evaluated by the users who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success of browsing showed to be correlated and dependent on classification correctness. Research limitations/implications - Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from start; allowing for searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); when searching for class captions, returning the hierarchical tree expanded around the class in which caption the search term is found. The need for improvements of classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02

0.020304035 = product of:
  0.04060807 = sum of:
    0.04060807 = product of:
      0.08121614 = sum of:
        0.08121614 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.08121614 = score(doc=1046,freq=2.0), product of:
            0.17492871 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049953517 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 5. 5.2003 14:17:22

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01

0.011844021 = product of:
  0.023688043 = sum of:
    0.023688043 = product of:
      0.047376085 = sum of:
        0.047376085 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.047376085 = score(doc=2560,freq=2.0), product of:
            0.17492871 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049953517 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 9.2008 18:31:54

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01

0.010152018 = product of:
  0.020304035 = sum of:
    0.020304035 = product of:
      0.04060807 = sum of:
        0.04060807 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.04060807 = score(doc=2760,freq=2.0), product of:
            0.17492871 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049953517 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 3.2009 19:11:54

Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.01

0.010152018 = product of:
  0.020304035 = sum of:
    0.020304035 = product of:
      0.04060807 = sum of:
        0.04060807 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
          0.04060807 = score(doc=3051,freq=2.0), product of:
            0.17492871 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049953517 = queryNorm
            0.23214069 = fieldWeight in 3051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 8.2009 19:51:28

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01

0.008460015 = product of:
  0.01692003 = sum of:
    0.01692003 = product of:
      0.03384006 = sum of:
        0.03384006 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.03384006 = score(doc=2765,freq=2.0), product of:
            0.17492871 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049953517 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 3.2009 19:14:43

Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.01

0.006768012 = product of:
  0.013536024 = sum of:
    0.013536024 = product of:
      0.027072048 = sum of:
        0.027072048 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.027072048 = score(doc=2741,freq=2.0), product of:
            0.17492871 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049953517 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 12. 9.2004 9:56:22

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.01

0.006768012 = product of:
  0.013536024 = sum of:
    0.013536024 = product of:
      0.027072048 = sum of:
        0.027072048 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.027072048 = score(doc=3284,freq=2.0), product of:
            0.17492871 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049953517 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 1.2010 14:41:24
