Search (1 result, page 1 of 1)

Hung, C.-M.; Chien, L.-F.: Web-based text classification in the absence of manually labeled training documents (2007) 0.02
```
0.015927691 = product of:
  0.031855382 = sum of:
    0.031855382 = product of:
      0.063710764 = sum of:
        0.063710764 = weight(_text_:web in 87) [ClassicSimilarity], result of:
          0.063710764 = score(doc=87,freq=6.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.37471575 = fieldWeight in 87, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=87)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
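The arithmetic in the explain tree above can be re-derived with a few lines of Python. This is a minimal sketch, not the search engine's own code: it assumes the classic Lucene TF-IDF formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which is the idf variant that reproduces the value 3.2635105 shown above (other Lucene versions use a slightly different idf). The values of `queryNorm`, `fieldNorm`, and the two `coord` factors are copied from the tree rather than computed, and the function names are hypothetical.

```python
import math


def classic_tf(freq):
    # Assumed ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Assumed idf variant; chosen because it matches the idf printed in the tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute the nodes of the explain tree for a single-term match."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight node
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight node
    term_score = query_weight * field_weight             # weight(_text_:web in 87)
    score = term_score
    for coord in coords:                                 # each coord(1/2) halves the score
        score *= coord
    return {
        "idf": idf,
        "query_weight": query_weight,
        "field_weight": field_weight,
        "term_score": term_score,
        "score": score,
    }


# Inputs taken from the explain output for doc 87.
result = explain_score(
    freq=6.0,
    doc_freq=4597,
    max_docs=44218,
    query_norm=0.052098576,
    field_norm=0.046875,
    coords=(0.5, 0.5),
)

for name, value in result.items():
    print(f"{name}: {value:.9f}")
```

The recomputed values should agree with the tree to roughly six decimal places; small differences in the last digits are expected because Lucene works with 32-bit floats while Python uses 64-bit floats.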
Abstract

Most text classification techniques assume that manually labeled documents (corpora) can be easily obtained when learning text classifiers. However, labeled training documents are sometimes unavailable, or inadequate even when they are available. The goal of this article is to present a self-learned approach to extract high-quality training documents from the Web when the required manually labeled documents are unavailable or of poor quality. To learn a text classifier automatically, we need only a set of user-defined categories and some highly related keywords. Extensive experiments are conducted to evaluate the performance of the proposed approach using the test set from the Reuters-21578 news data set. The experiments show that very promising results can be achieved using only documents automatically extracted from the Web.