Search (68 results, page 1 of 4)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.31

0.31118122 = product of:
  0.49788997 = sum of:
    0.0489538 = product of:
      0.1468614 = sum of:
        0.1468614 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.1468614 = score(doc=562,freq=2.0), product of:
            0.26131085 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.030822188 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.1468614 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.1468614 = score(doc=562,freq=2.0), product of:
        0.26131085 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.030822188 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.1468614 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.1468614 = score(doc=562,freq=2.0), product of:
        0.26131085 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.030822188 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.1468614 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.1468614 = score(doc=562,freq=2.0), product of:
        0.26131085 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.030822188 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.008351962 = product of:
      0.025055885 = sum of:
        0.025055885 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.025055885 = score(doc=562,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
  0.625 = coord(5/8)

Content: Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01
```
0.0061675874 = product of:
  0.02467035 = sum of:
    0.01771038 = product of:
      0.05313114 = sum of:
        0.05313114 = weight(_text_:problem in 1107) [ClassicSimilarity], result of:
          0.05313114 = score(doc=1107,freq=6.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.4061259 = fieldWeight in 1107, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.33333334 = coord(1/3)
    0.0069599687 = product of:
      0.020879906 = sum of:
        0.020879906 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.020879906 = score(doc=1107,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.33333334 = coord(1/3)
  0.25 = coord(2/8)
```
Abstract

Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.

Date

28.10.2013 19:22:57
Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.00
```
0.0043120594 = product of:
  0.017248238 = sum of:
    0.010225092 = product of:
      0.030675275 = sum of:
        0.030675275 = weight(_text_:problem in 1070) [ClassicSimilarity], result of:
          0.030675275 = score(doc=1070,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.23447686 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
      0.33333334 = coord(1/3)
    0.007023146 = product of:
      0.021069437 = sum of:
        0.021069437 = weight(_text_:29 in 1070) [ClassicSimilarity], result of:
          0.021069437 = score(doc=1070,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.19432661 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
      0.33333334 = coord(1/3)
  0.25 = coord(2/8)
```
Abstract

Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes the use of Web pages linked with the home page in a different manner from the sole use of home pages in previous research. To implement our proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection chooses several representative Web pages using connectivity analysis. The k-NN classifier next classifies each of the selected Web pages. Finally, the classified Web pages are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also reform its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the performance of micro-averaging breakeven point by 30.02%, compared with an ordinary classification which uses a home page only.

Date

27.12.2007 17:32:29
Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
```
0.0043120594 = product of:
  0.017248238 = sum of:
    0.010225092 = product of:
      0.030675275 = sum of:
        0.030675275 = weight(_text_:problem in 967) [ClassicSimilarity], result of:
          0.030675275 = score(doc=967,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.23447686 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.33333334 = coord(1/3)
    0.007023146 = product of:
      0.021069437 = sum of:
        0.021069437 = weight(_text_:29 in 967) [ClassicSimilarity], result of:
          0.021069437 = score(doc=967,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.19432661 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.33333334 = coord(1/3)
  0.25 = coord(2/8)
```
Abstract

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.

Date

25. 6.2013 19:05:29
Orwig, R.E.; Chen, H.; Nunamaker, J.F.: ¬A graphical, self-organizing approach to classifying electronic meeting output (1997) 0.00
```
0.003578782 = product of:
  0.028630257 = sum of:
    0.028630257 = product of:
      0.08589077 = sum of:
        0.08589077 = weight(_text_:problem in 6928) [ClassicSimilarity], result of:
          0.08589077 = score(doc=6928,freq=8.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.6565352 = fieldWeight in 6928, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6928)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Describes research in the application of a Kohonen Self-Organizing Map (SOM) to the problem of classification of electronic brainstorming output and an evaluation of the results. Describes an electronic meeting system and describes the classification problem that exists in the group problem solving process. Surveys the literature concerning classification. Describes the application of the Kohonen SOM to the meeting output classification problem. Describes an experiment that evaluated the classification performed by the Kohonen SOM by comparing it with those of a human expert and a Hopfield neural network. Discusses conclusions and directions for future research

Major, R.L.; Ragsdale, C.T.: ¬An aggregation approach to the classification problem using multiple prediction experts (2000) 0.00

0.0030675277 = product of:
  0.024540221 = sum of:
    0.024540221 = product of:
      0.07362066 = sum of:
        0.07362066 = weight(_text_:problem in 3789) [ClassicSimilarity], result of:
          0.07362066 = score(doc=3789,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.5627445 = fieldWeight in 3789, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.09375 = fieldNorm(doc=3789)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.00

0.0028092582 = product of:
  0.022474065 = sum of:
    0.022474065 = product of:
      0.067422196 = sum of:
        0.067422196 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
          0.067422196 = score(doc=5169,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.6218451 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.125 = fieldNorm(doc=5169)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Source: Nachrichten für Dokumentation. 29(1978), S.92-96

Li, T.; Zhu, S.; Ogihara, M.: Text categorization via generalized discriminant analysis (2008) 0.00
```
0.0022137975 = product of:
  0.01771038 = sum of:
    0.01771038 = product of:
      0.05313114 = sum of:
        0.05313114 = weight(_text_:problem in 2119) [ClassicSimilarity], result of:
          0.05313114 = score(doc=2119,freq=6.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.4061259 = fieldWeight in 2119, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2119)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Text categorization is an important research area and has been receiving much attention due to the growth of the on-line information and of Internet. Automated text categorization is generally cast as a multi-class classification problem. Much of previous work focused on binary document classification problems. Support vector machines (SVMs) excel in binary classification, but the elegant theory behind large-margin hyperplane cannot be easily extended to multi-class text classification. In addition, the training time and scaling are also important concerns. On the other hand, other techniques naturally extensible to handle multi-class classification are generally not as accurate as SVM. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflects the inherent similarity from the data. While most of the previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. By using generalized singular value decomposition (GSVD), a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values is identified. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.00

0.0020879905 = product of:
  0.016703924 = sum of:
    0.016703924 = product of:
      0.05011177 = sum of:
        0.05011177 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.05011177 = score(doc=1046,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 5. 5.2003 14:17:22

Krauth, J.: Evaluation von Verfahren der automatischen Klassifikation (1983) 0.00

0.0020450184 = product of:
  0.016360147 = sum of:
    0.016360147 = product of:
      0.04908044 = sum of:
        0.04908044 = weight(_text_:problem in 111) [ClassicSimilarity], result of:
          0.04908044 = score(doc=111,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.375163 = fieldWeight in 111, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0625 = fieldNorm(doc=111)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Abstract: Ein wichtiges Problem der automatischen Klassifikation ist die Frage der Bewertung der Ergebnisse von Klassifikationsverfahren. Hierunter fallen die Aspekte der Beurteilung der Güte von Klassifikationen, des Vergleichs von Klassifikationen, der Validität von Klassifikationen und der Stabilität von Klassifikationsverfahren. Es wird ein Überblick über die verschiedenen Ansätze gegeben

Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.00
```
0.001789391 = product of:
  0.014315128 = sum of:
    0.014315128 = product of:
      0.042945385 = sum of:
        0.042945385 = weight(_text_:problem in 7696) [ClassicSimilarity], result of:
          0.042945385 = score(doc=7696,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.3282676 = fieldWeight in 7696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide an efficient access to reaction information and indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem
Dang, E.K.F.; Luk, R.W.P.; Ho, K.S.; Chan, S.C.F.; Lee, D.L.: ¬A new measure of clustering effectiveness : algorithms and experimental studies (2008) 0.00
```
0.001789391 = product of:
  0.014315128 = sum of:
    0.014315128 = product of:
      0.042945385 = sum of:
        0.042945385 = weight(_text_:problem in 1367) [ClassicSimilarity], result of:
          0.042945385 = score(doc=1367,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.3282676 = fieldWeight in 1367, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1367)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

We propose a new optimal clustering effectiveness measure, called CS1, based on a combination of clusters rather than selecting a single optimal cluster as in the traditional MK1 measure. For hierarchical clustering, we present an algorithm to compute CS1, defined by seeking the optimal combinations of disjoint clusters obtained by cutting the hierarchical structure at a certain similarity level. By reformulating the optimization to a 0-1 linear fractional programming problem, we demonstrate that an exact solution can be obtained by a linear time algorithm. We further discuss how our approach can be generalized to more general problems involving overlapping clusters, and we show how optimal estimates can be obtained by greedy algorithms.

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.00

0.0017399922 = product of:
  0.013919937 = sum of:
    0.013919937 = product of:
      0.04175981 = sum of:
        0.04175981 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.04175981 = score(doc=611,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 22. 8.2009 12:54:24

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00

0.0017399922 = product of:
  0.013919937 = sum of:
    0.013919937 = product of:
      0.04175981 = sum of:
        0.04175981 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.04175981 = score(doc=2748,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 1. 2.2016 18:25:22

Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 2218) [ClassicSimilarity], result of:
          0.03681033 = score(doc=2218,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 2218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=2218)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy an an independent collection of 50,000 LCSH/LCC pairs.
Sebastiani, F.: Machine learning in automated text categorization (2002) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 3389) [ClassicSimilarity], result of:
          0.03681033 = score(doc=3389,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 3389, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=3389)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based an machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Sebastiani, F.: ¬A tutorial an automated text categorisation (1999) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 3390) [ClassicSimilarity], result of:
          0.03681033 = score(doc=3390,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 3390, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=3390)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge an how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based an machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are a very good effectiveness, a considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation, will be touched upon.
Liu, R.-L.: Dynamic category profiling for text filtering and classification (2007) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 900) [ClassicSimilarity], result of:
          0.03681033 = score(doc=900,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 900, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=900)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Information is often represented in text form and classified into categories. Unfortunately, automatic classifiers often conduct misclassifications. One of the reasons is that the documents for training the classifiers are mainly from the categories, leading the classifiers to derive category profiles for distinguishing each category from others, rather than measuring the extent to which a document's content overlaps that of a category. To tackle the problem, we present a technique DP4FC that selects suitable features to construct category profiles to distinguish relevant documents from irrelevant documents. More specially, DP4FC is associated with various classifiers. Upon receiving a document, it helps the classifiers to create dynamic category profiles with respect to the document, and accordingly make proper decisions in filtering and classification. Theoretical analysis and empirical results show that DP4FC may significantly promote different classifiers' performances under various environments.
Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 909) [ClassicSimilarity], result of:
          0.03681033 = score(doc=909,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 909, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=909)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify exactly. Associative classifiers have many favorable characteristics such as rapid training, good classification accuracy, and excellent interpretation. However, associative classifiers also have some obstacles to overcome when they are applied in the area of text classification. The target text collection generally has a very high dimension, thus the training process might take a very long time. We propose a feature selection based on the mutual information between the word and class variables to reduce the space dimension of the associative classifiers. In addition, the training process of the associative classifier produces a huge amount of classification rules, which makes the prediction with a new document ineffective. We resolve this by introducing a new efficient method for storing and pruning classification rules. This method can also be used when predicting a test document. Experimental results using the 20-newsgroups dataset show many benefits of the associative classification in both training and predicting when applied to a real world problem.
Zhou, G.D.; Zhang, M.; Ji, D.H.; Zhu, Q.M.: Hierarchical learning strategy in semantic relation extraction (2008) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 2077) [ClassicSimilarity], result of:
          0.03681033 = score(doc=2077,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 2077, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=2077)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

This paper proposes a novel hierarchical learning strategy to deal with the data sparseness problem in semantic relation extraction by modeling the commonality among related classes. For each class in the hierarchy either manually predefined or automatically clustered, a discriminative function is determined in a top-down way. As the upper-level class normally has much more positive training examples than the lower-level class, the corresponding discriminative function can be determined more reliably and guide the discriminative function learning in the lower-level one more effectively, which otherwise might suffer from limited training data. In this paper, two classifier learning approaches, i.e. the simple perceptron algorithm and the state-of-the-art Support Vector Machines, are applied using the hierarchical learning strategy. Moreover, several kinds of class hierarchies either manually predefined or automatically clustered are explored and compared. Evaluation on the ACE RDC 2003 and 2004 corpora shows that the hierarchical learning strategy much improves the performance on least- and medium-frequent relations.

Search (68 results, page 1 of 4)

Authors

Years

Languages

Types

Themes