Search (50 results, page 3 of 3)

Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification: An experimental comparison (2010) 0.01
```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 4101) [ClassicSimilarity], result of:
          0.032037273 = score(doc=4101,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 4101, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4101)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Hierarchical text classification (HTC) approaches have recently attracted a lot of interest on the part of researchers in human language technology and machine learning, since they have been shown to bring about equal, if not better, classification accuracy with respect to their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: Given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories which are siblings of c in the hierarchy. However, this policy has always been taken for granted and never been subjected to careful scrutiny since first proposed 15 years ago. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BEST LOCAL (k)) is being proposed for the first time in this article. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naïve Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-V2.
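
The score explanation for this entry can be recomputed by hand. The sketch below is a minimal illustration, not the search engine's own code: it assumes the classic TF-IDF definitions implied by the figures in the explanation (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), and the function names `classic_idf` and `classic_score` are hypothetical. The input values are copied from the explanation above.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces the 3.569778 shown in the explanation.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=(0.5, 0.5)):
    tf = math.sqrt(freq)                    # tf(freq=2.0) = 1.4142135
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm         # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm    # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight
    for coord in coords:                    # each coord(1/2) halves the score
        score *= coord
    return score


# Values taken from the explanation for doc 4101 (term "k").
score = classic_score(
    freq=2.0,
    doc_freq=3384,
    max_docs=44218,
    query_norm=0.045509085,
    field_norm=0.0390625,
)
print(f"{score:.6f}")
```

Under these assumptions the result agrees with the listed 0.008009318 to about six decimal places. The lower-ranked entries on this page differ only in their docFreq and fieldNorm values, which is why their scores are smaller.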
Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 967) [ClassicSimilarity], result of:
          0.032037273 = score(doc=967,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.

Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01

```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 2300) [ClassicSimilarity], result of:
          0.032037273 = score(doc=2300,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 2300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: A link-bridged topic model for cross-domain document classification (2013) 0.01

```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 2706) [ClassicSimilarity], result of:
          0.032037273 = score(doc=2706,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 2706, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2706)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01

```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 3311) [ClassicSimilarity], result of:
          0.032037273 = score(doc=3311,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01

```
0.007707316 = product of:
  0.015414632 = sum of:
    0.015414632 = product of:
      0.030829264 = sum of:
        0.030829264 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.030829264 = score(doc=2765,freq=2.0), product of:
            0.15936506 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045509085 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2009 19:14:43

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01

```
0.007707316 = product of:
  0.015414632 = sum of:
    0.015414632 = product of:
      0.030829264 = sum of:
        0.030829264 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.030829264 = score(doc=1107,freq=2.0), product of:
            0.15936506 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045509085 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 28.10.2013 19:22:57

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.01

```
0.006165853 = product of:
  0.012331706 = sum of:
    0.012331706 = product of:
      0.024663411 = sum of:
        0.024663411 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.024663411 = score(doc=3284,freq=2.0), product of:
            0.15936506 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045509085 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 1.2010 14:41:24

Borko, H.: Research in computer based classification systems (1985) 0.01
```
0.0056065232 = product of:
  0.0112130465 = sum of:
    0.0112130465 = product of:
      0.022426093 = sum of:
        0.022426093 = weight(_text_:k in 3647) [ClassicSimilarity], result of:
          0.022426093 = score(doc=3647,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.13804297 = fieldWeight in 3647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3647)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts.
The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major area of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.

Schek, M.: Automatische Klassifizierung und Visualisierung im Archiv der Süddeutschen Zeitung (2005) 0.01

```
0.0056065232 = product of:
  0.0112130465 = sum of:
    0.0112130465 = product of:
      0.022426093 = sum of:
        0.022426093 = weight(_text_:k in 4884) [ClassicSimilarity], result of:
          0.022426093 = score(doc=4884,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.13804297 = fieldWeight in 4884, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4884)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Object: K-Infinity
