Search (59 results, page 1 of 3)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.16
    0.16370177 = product of:
      0.21826902 = sum of:
        0.040865026 = product of:
          0.1634601 = sum of:
            0.1634601 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.1634601 = score(doc=562,freq=2.0), product of:
                0.29084495 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0343058 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.25 = coord(1/4)
        0.1634601 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.1634601 = score(doc=562,freq=2.0), product of:
            0.29084495 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0343058 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.013943886 = product of:
          0.027887773 = sum of:
            0.027887773 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.027887773 = score(doc=562,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
     Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
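
     The breakdown above is raw Lucene "explain" output for the ClassicSimilarity (TF-IDF) ranking model; the otherwise odd query terms "3a" and "2f" are presumably tokens from the percent-encoded URL in the record's Content field ("%3A", "%2F"). As a minimal sketch (plain Python, no Lucene required), the constants from the tree recombine as tf · idf · fieldNorm on the document side and idf · queryNorm on the query side, scaled by the coord factors:

```python
import math

QUERY_NORM = 0.0343058  # queryNorm from the explain tree above

def term_score(freq, doc_freq, max_docs, field_norm):
    """Recompute one weight(...) node of Lucene's ClassicSimilarity."""
    tf = math.sqrt(freq)                               # 1.4142135 for freq=2.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 8.478011 for docFreq=24
    query_weight = idf * QUERY_NORM                    # 0.29084495
    field_weight = tf * idf * field_norm               # 0.56201804
    return query_weight * field_weight                 # 0.1634601

# The three matching clauses of result 1, with their inner coord() factors:
s_3a = term_score(2.0, 24, 44218, 0.046875) * 0.25    # coord(1/4)
s_2f = term_score(2.0, 24, 44218, 0.046875)
s_22 = term_score(2.0, 3622, 44218, 0.046875) * 0.5   # coord(1/2)

print((s_3a + s_2f + s_22) * 0.75)  # outer coord(3/4) -> ~0.16370177
```
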
  2. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02
    0.021064898 = product of:
      0.042129796 = sum of:
        0.018889984 = product of:
          0.07555994 = sum of:
            0.07555994 = weight(_text_:learning in 2748) [ClassicSimilarity], result of:
              0.07555994 = score(doc=2748,freq=2.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.49330387 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.25 = coord(1/4)
        0.023239812 = product of:
          0.046479624 = sum of:
            0.046479624 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.046479624 = score(doc=2748,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    1. 2.2016 18:25:22
  3. Ko, Y.; Seo, J.: Text classification from unlabeled documents with bootstrapping and feature projection techniques (2009) 0.01
    0.008014342 = product of:
      0.032057367 = sum of:
        0.032057367 = product of:
          0.12822947 = sum of:
            0.12822947 = weight(_text_:learning in 2452) [ClassicSimilarity], result of:
              0.12822947 = score(doc=2452,freq=16.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.83716446 = fieldWeight in 2452, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2452)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
     Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, an approach generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are easily collected and plentiful, labeled documents are difficult to obtain because the labeling task must be done by human annotators. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts the text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier using bootstrapping and feature projection techniques. The results of experiments show that the proposed method achieves reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.
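
     The bootstrapping idea sketched in this abstract can be approximated by a self-training loop: seed each category with the documents containing its title word, train a classifier on the seeds, and fold the most confident predictions back into the training set. A minimal sketch using scikit-learn; the seeding rule, per-round promotion budget, and classifier choice are assumptions, and the paper's feature projection step is omitted:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def bootstrap_classifier(docs, title_words, rounds=5, promote=10):
    """Learn a text classifier from unlabeled docs plus one title word per category."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    labels = np.full(len(docs), -1)              # -1 = still unlabeled
    for c, word in enumerate(title_words):       # seed by title-word occurrence
        for i, doc in enumerate(docs):
            if word in doc.lower():
                labels[i] = c
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        mask = labels >= 0
        clf.fit(X[mask], labels[mask])
        proba = clf.predict_proba(X)
        budget = promote                          # per-round promotion budget
        for i in np.argsort(-proba.max(axis=1)):  # most confident first
            if budget == 0:
                break
            if labels[i] == -1:
                labels[i] = clf.classes_[proba[i].argmax()]
                budget -= 1
    return clf, vec
```
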
  4. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.006971943 = product of:
      0.027887773 = sum of:
        0.027887773 = product of:
          0.055775546 = sum of:
            0.055775546 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.055775546 = score(doc=1046,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 14:17:22
  5. Zhou, G.D.; Zhang, M.; Ji, D.H.; Zhu, Q.M.: Hierarchical learning strategy in semantic relation extraction (2008) 0.01
    0.006940623 = product of:
      0.027762491 = sum of:
        0.027762491 = product of:
          0.111049965 = sum of:
            0.111049965 = weight(_text_:learning in 2077) [ClassicSimilarity], result of:
              0.111049965 = score(doc=2077,freq=12.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.7250056 = fieldWeight in 2077, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2077)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
     This paper proposes a novel hierarchical learning strategy to deal with the data-sparseness problem in semantic relation extraction by modeling the commonality among related classes. For each class in the hierarchy, whether manually predefined or automatically clustered, a discriminative function is determined in a top-down way. As an upper-level class normally has many more positive training examples than a lower-level class, its discriminative function can be determined more reliably and can guide the learning of the lower-level discriminative function more effectively, which might otherwise suffer from limited training data. In this paper, two classifier learning approaches, namely the simple perceptron algorithm and state-of-the-art Support Vector Machines, are applied using the hierarchical learning strategy. Moreover, several kinds of class hierarchies, either manually predefined or automatically clustered, are explored and compared. Evaluation on the ACE RDC 2003 and 2004 corpora shows that the hierarchical learning strategy substantially improves performance on least- and medium-frequent relations.
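
     The top-down determination of discriminative functions described above can be pictured as one classifier per internal node of the class hierarchy: the upper-level model routes an instance into a subtree, where a lower-level model makes the final decision. A schematic sketch with scikit-learn perceptrons; the hierarchy layout and training interface are illustrative, and the paper's guidance of lower-level learning by the upper level is not reproduced:

```python
from sklearn.linear_model import Perceptron

class HierarchicalNode:
    """One discriminative function per node, applied top-down."""
    def __init__(self, children):
        # children: child label -> HierarchicalNode, or None for a leaf relation
        self.children = children
        self.clf = Perceptron()

    def fit(self, X, child_labels):
        self.clf.fit(X, child_labels)      # learn to pick a child of this node

    def predict(self, x):
        child = self.clf.predict(x)[0]     # route a single instance downward
        sub = self.children[child]
        return child if sub is None else sub.predict(x)
```
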
  6. Oberhauser, O.: Automatisches Klassifizieren : Verfahren zur Erschließung elektronischer Dokumente (2004) 0.01
    0.0059502195 = product of:
      0.023800878 = sum of:
        0.023800878 = product of:
          0.047601756 = sum of:
            0.047601756 = weight(_text_:lernen in 2487) [ClassicSimilarity], result of:
              0.047601756 = score(doc=2487,freq=2.0), product of:
                0.19222628 = queryWeight, product of:
                  5.6033173 = idf(docFreq=442, maxDocs=44218)
                  0.0343058 = queryNorm
                0.24763398 = fieldWeight in 2487, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.6033173 = idf(docFreq=442, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2487)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     Automatic classification of text documents means the machine assignment of one or more notations of a given classification system to natural-language texts by means of a suitable algorithm. The present work, in the form of a comprehensive literature review, establishes the current state of knowledge on the possible uses of automatic classification for the subject indexing of electronic documents, in particular web resources. This concerns the methodological aspect on the one hand and, on the other, the experience gained in relevant projects and applications. Methodologically, statistical approaches based on machine learning, which build a model - a "classifier" - from already classified example documents that can then be used to classify new documents, are considered state of the art today. The four "large" projects on the automatic classification of web resources carried out in the 1990s at the universities of Lund, Wolverhampton, and Oldenburg and at OCLC (Dublin, OH), which are analysed in detail in this work, still relied on simpler or older methodological approaches, however. These projects represent an important gain in experience, particularly because of their use of established library classification systems, even though they have not so far led to permanent services of satisfactory quality for the indexing of electronic resources. The analysis of the other relevant applications and projects shows that the most active efforts to deploy systems for the automatic classificatory indexing of electronic documents in routine operation are currently found in patent and media documentation. Semi-automatic systems, which support human processors with classification suggestions, dominate here, since the classification quality currently achievable is usually not yet sufficient for full automation. Further interesting applications and projects can be found in the area of web portals, search engines, and (commercial) information services, whereas in libraries, for example, hardly any notable interest in the automatic classification of books or bibliographic records can be registered. The study closes with a discussion of the most important projects and applications as well as of several questions and topics relevant in connection with automatic classification.
  7. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    0.005809953 = product of:
      0.023239812 = sum of:
        0.023239812 = product of:
          0.046479624 = sum of:
            0.046479624 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.046479624 = score(doc=611,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 8.2009 12:54:24
  8. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
    0.0057257228 = product of:
      0.022902891 = sum of:
        0.022902891 = product of:
          0.091611564 = sum of:
            0.091611564 = weight(_text_:learning in 1595) [ClassicSimilarity], result of:
              0.091611564 = score(doc=1595,freq=6.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.59809923 = fieldWeight in 1595, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
     This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
  9. Sebastiani, F.: Machine learning in automated text categorization (2002) 0.01
    0.0056669954 = product of:
      0.022667982 = sum of:
        0.022667982 = product of:
          0.09067193 = sum of:
            0.09067193 = weight(_text_:learning in 3389) [ClassicSimilarity], result of:
              0.09067193 = score(doc=3389,freq=8.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.59196466 = fieldWeight in 3389, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3389)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
     The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert labor, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
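
     The inductive process described here, learning category characteristics from preclassified documents, is exactly what a modern pipeline automates. A minimal sketch with scikit-learn; the vectorizer and learner choices are illustrative, not the specific methods surveyed in the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_categorizer(docs, categories):
    """Build a text classifier from a set of preclassified documents."""
    X_tr, X_te, y_tr, y_te = train_test_split(docs, categories, test_size=0.2)
    pipe = make_pipeline(TfidfVectorizer(), LinearSVC())  # representation + learner
    pipe.fit(X_tr, y_tr)                                  # the inductive step
    print("macro-F1:", f1_score(y_te, pipe.predict(X_te), average="macro"))
    return pipe
```
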
  10. Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.01
    0.0052799117 = product of:
      0.021119647 = sum of:
        0.021119647 = product of:
          0.08447859 = sum of:
            0.08447859 = weight(_text_:learning in 2532) [ClassicSimilarity], result of:
              0.08447859 = score(doc=2532,freq=10.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.55153054 = fieldWeight in 2532, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2532)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
     In current library practice, trained human experts usually carry out document cataloguing and indexing based on a manual approach. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations are made regarding how to practically apply the KNN algorithm to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.
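
     The KNN recommendation above is easy to prototype: index the already-catalogued items as TF-IDF vectors and assign a new item the majority class among its k nearest neighbours. A minimal sketch with scikit-learn; cosine distance and k=5 are assumptions, and lcc_classes stands in for LCC class labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

def knn_document_classifier(catalogued_texts, lcc_classes, k=5):
    """Classify new items by the majority LCC class of their nearest neighbours."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(catalogued_texts)
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(X, lcc_classes)          # "training" is just indexing the catalogue
    return lambda text: knn.predict(vec.transform([text]))[0]
```
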
  11. Borodin, Y.; Polishchuk, V.; Mahmud, J.; Ramakrishnan, I.V.; Stent, A.: Live and learn from mistakes : a lightweight system for document classification (2013) 0.01
    0.0052799117 = product of:
      0.021119647 = sum of:
        0.021119647 = product of:
          0.08447859 = sum of:
            0.08447859 = weight(_text_:learning in 2722) [ClassicSimilarity], result of:
              0.08447859 = score(doc=2722,freq=10.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.55153054 = fieldWeight in 2722, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2722)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    We present a Life-Long Learning from Mistakes (3LM) algorithm for document classification, which could be used in various scenarios such as spam filtering, blog classification, and web resource categorization. We extend the ideas of online clustering and batch-mode centroid-based classification to online learning with negative feedback. The 3LM is a competitive learning algorithm, which avoids over-smoothing, characteristic of the centroid-based classifiers, by using a different class representative, which we call clusterhead. The clusterheads competing for vector-space dominance are drawn toward misclassified documents, eventually bringing the model to a "balanced state" for a fixed distribution of documents. Subsequently, the clusterheads oscillate between the misclassified documents, heuristically minimizing the rate of misclassifications, an NP-complete problem. Further, the 3LM algorithm prevents over-fitting by "leashing" the clusterheads to their respective centroids. A clusterhead provably converges if its class can be separated by a hyper-plane from all other classes. Lifelong learning with fixed learning rate allows 3LM to adapt to possibly changing distribution of the data and continually learn and unlearn document classes. We report on our experiments, which demonstrate high accuracy of document classification on Reuters21578, OHSUMED, and TREC07p-spam datasets. The 3LM algorithm did not show over-fitting, while consistently outperforming centroid-based, Naïve Bayes, C4.5, AdaBoost, kNN, and SVM whose accuracy had been reported on the same three corpora.
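
     One way to read the clusterhead dynamics described above: on a misclassification, the clusterhead of the document's true class is drawn toward the document, while a "leash" term pulls it back toward the class centroid to prevent over-fitting. A schematic numpy sketch of a single update; the rates lr and leash are hypothetical, and the function is an interpretation of the abstract, not the authors' code:

```python
import numpy as np

def clusterhead_update(head, doc_vec, centroid, lr=0.1, leash=0.05):
    """One 3LM-style step after a misclassified document doc_vec."""
    head = head + lr * (doc_vec - head)       # competitive pull toward the document
    head = head + leash * (centroid - head)   # leash back toward the class centroid
    return head / np.linalg.norm(head)        # renormalize (cosine geometry assumed)
```
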
  12. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.00
    0.004907762 = product of:
      0.019631049 = sum of:
        0.019631049 = product of:
          0.078524195 = sum of:
            0.078524195 = weight(_text_:learning in 3390) [ClassicSimilarity], result of:
              0.078524195 = score(doc=3390,freq=6.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.51265645 = fieldWeight in 3390, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3390)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
     The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are very good effectiveness, considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.
  13. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.00
    0.004627082 = product of:
      0.018508328 = sum of:
        0.018508328 = product of:
          0.07403331 = sum of:
            0.07403331 = weight(_text_:learning in 4095) [ClassicSimilarity], result of:
              0.07403331 = score(doc=4095,freq=12.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.4833371 = fieldWeight in 4095, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4095)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
     In traditional text classification, classes are mutually exclusive, i.e. it is not possible to have one text or text fragment classified into more than one class. In multi-label classification, on the other hand, an individual text may belong to several classes simultaneously. This type of classification is required by a large number of current applications, such as big data classification and image and video annotation. Supervised learning is the most used type of machine learning in the classification task. It requires large quantities of labeled data and the intervention of a human tagger in the creation of the training sets. When the data sets become very large or heavily noisy, this operation can be tedious, error-prone, and time-consuming. In this case, semi-supervised learning, which requires only a few labels, is a better choice. In this paper, we study and evaluate several methods to address the problem of multi-label classification using semi-supervised learning and data from social networks. First, we propose a linguistic pre-processing involving tokenisation, recognition of named entities, and hashtag segmentation in order to decrease the noise in this type of massive and unstructured real data, and then we perform word sense disambiguation using WordNet. Second, several experiments related to multi-label classification and semi-supervised learning are carried out on these data sets, and the results of the approaches considered are compared to each other. This paper proposes a method for combining semi-supervised methods with a graph method for the extraction of subjects in social networks using a multi-label classification approach. Experiments show that the proposed model increases the precision of the classification by 4 percentage points compared to a baseline.
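
     The multi-label setting described above, where a text may belong to several classes at once, is commonly handled by binarizing the label sets and training one binary classifier per label. A minimal sketch with scikit-learn; the label sets are illustrative, and the paper's semi-supervised and graph components are not reproduced:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

def multilabel_classifier(texts, label_sets):
    """label_sets: e.g. [("sports", "politics"), ("music",), ...]."""
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(label_sets)          # one indicator column per label
    vec = TfidfVectorizer()
    X = vec.fit_transform(texts)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
    return lambda t: mlb.inverse_transform(clf.predict(vec.transform([t])))[0]
```
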
  14. Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification: An experimental comparison (2010) 0.00
    0.004089802 = product of:
      0.016359208 = sum of:
        0.016359208 = product of:
          0.06543683 = sum of:
            0.06543683 = weight(_text_:learning in 4101) [ClassicSimilarity], result of:
              0.06543683 = score(doc=4101,freq=6.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.42721373 = fieldWeight in 4101, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4101)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Hierarchical text classification (HTC) approaches have recently attracted a lot of interest on the part of researchers in human language technology and machine learning, since they have been shown to bring about equal, if not better, classification accuracy with respect to their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: Given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories which are siblings of c in the hierarchy. However, this policy has always been taken for granted and never been subjected to careful scrutiny since first proposed 15 years ago. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BEST LOCAL (k)) is being proposed for the first time in this article. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naïve Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-V2.
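
     The default "siblings" policy scrutinized above is compact to state in code: the negatives for a category c are the training examples that are positive for a sibling of c but not for c itself. A small self-contained sketch over a hypothetical hierarchy:

```python
def sibling_negatives(c, parent_of, positives):
    """positives maps each category to the set of doc ids positive for it."""
    sibs = [s for s, p in parent_of.items() if p == parent_of[c] and s != c]
    negs = set().union(*(positives[s] for s in sibs)) if sibs else set()
    return negs - positives[c]    # negative for c, positive for some sibling

# Hypothetical two-level hierarchy and training sets:
parent_of = {"science": "root", "sports": "root",
             "physics": "science", "biology": "science"}
positives = {"science": {1, 2, 3}, "sports": {4, 5},
             "physics": {1}, "biology": {2, 3}}
print(sibling_negatives("physics", parent_of, positives))   # -> {2, 3}
```
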
  15. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.00
    0.004066967 = product of:
      0.016267868 = sum of:
        0.016267868 = product of:
          0.032535736 = sum of:
            0.032535736 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
              0.032535736 = score(doc=141,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.2708308 = fieldWeight in 141, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=141)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Pages
    S.1-22
  16. Dubin, D.: Dimensions and discriminability (1998) 0.00
    0.004066967 = product of:
      0.016267868 = sum of:
        0.016267868 = product of:
          0.032535736 = sum of:
            0.032535736 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.032535736 = score(doc=2338,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 9.1997 19:16:05
  17. Automatic classification research at OCLC (2002) 0.00
    0.004066967 = product of:
      0.016267868 = sum of:
        0.016267868 = product of:
          0.032535736 = sum of:
            0.032535736 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.032535736 = score(doc=1563,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 9:22:09
  18. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.00
    0.004066967 = product of:
      0.016267868 = sum of:
        0.016267868 = product of:
          0.032535736 = sum of:
            0.032535736 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.032535736 = score(doc=1673,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    1. 8.1996 22:08:06
  19. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.00
    0.004066967 = product of:
      0.016267868 = sum of:
        0.016267868 = product of:
          0.032535736 = sum of:
            0.032535736 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.032535736 = score(doc=5273,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 16:24:52
  20. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.00
    0.004066967 = product of:
      0.016267868 = sum of:
        0.016267868 = product of:
          0.032535736 = sum of:
            0.032535736 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.032535736 = score(doc=2560,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 9.2008 18:31:54

Languages

  • e 51
  • d 8

Types

  • a 51
  • el 6
  • m 1
  • r 1
  • s 1
  • x 1