Search (2 results, page 1 of 1)

  • author_ss:"Ko, Y."
  • year_i:[2000 TO 2010}
  1. Ko, Y.; Park, J.; Seo, J.: Improving text categorization using the importance of sentences (2004) 0.01
    0.0065180818 = product of:
      0.016295204 = sum of:
        0.01155891 = weight(_text_:a in 2557) [ClassicSimilarity], result of:
          0.01155891 = score(doc=2557,freq=16.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 2557, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2557)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 2557) [ClassicSimilarity], result of:
              0.009472587 = score(doc=2557,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 2557, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2557)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
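    The explain tree above is standard Lucene ClassicSimilarity (TF-IDF) debug output: each term's contribution is queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = sqrt(termFreq) x idf x fieldNorm, and coord() scales for the fraction of query clauses matched. The arithmetic can be checked with a short sketch (constants copied from the tree for doc 2557; `classic_similarity_score` is an illustrative helper, not a Lucene API):

    ```python
    import math

    def classic_similarity_score(term_freq, idf, query_norm, field_norm):
        """One term's contribution in a ClassicSimilarity explain tree:
        score = queryWeight * fieldWeight, with
        queryWeight = idf * queryNorm and
        fieldWeight = sqrt(termFreq) * idf * fieldNorm."""
        tf = math.sqrt(term_freq)            # ClassicSimilarity tf = sqrt(freq)
        query_weight = idf * query_norm
        field_weight = tf * idf * field_norm
        return query_weight * field_weight

    # Constants copied from the explain tree for doc 2557.
    QUERY_NORM = 0.046368346
    FIELD_NORM = 0.046875

    w_a = classic_similarity_score(16.0, 1.153047, QUERY_NORM, FIELD_NORM)
    w_info = classic_similarity_score(2.0, 1.7554779, QUERY_NORM, FIELD_NORM)

    # coord factors from the tree: 1/2 on the inner clause, 2/5 overall.
    total = (w_a + 0.5 * w_info) * 0.4
    ```

    Running this reproduces the tree's values: w_a ~= 0.01155891, w_info ~= 0.009472587, and total ~= 0.0065180818, the score shown next to the record.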
    
    Abstract
    Automatic text categorization is the problem of assigning text documents to pre-defined categories. In order to classify text documents, we must extract useful features. In previous research, a text document is commonly represented by the term frequency and the inverse document frequency of each feature. Since important and unimportant sentences differ within a document, features drawn from more important sentences should carry more weight than other features. In this paper, we measure the importance of sentences using text summarization techniques, and then represent a document as a vector of features weighted according to the importance of the sentence each feature occurs in. To verify our new method, we conduct experiments on two newsgroup data sets: one written in English and the other in Korean. Four kinds of classifiers are used in our experiments: Naive Bayes, Rocchio, k-NN, and SVM. We observe that our new method yields a significant improvement for all four classifiers on both data sets.
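    The core idea of the abstract above can be sketched in a few lines: scale each term's count by the importance score of the sentence it occurs in, so terms from important sentences dominate the document vector. This is an illustrative simplification, not the authors' exact algorithm; the importance scores are assumed to come from a summarizer, and `weighted_doc_vector` is a hypothetical helper name.

    ```python
    from collections import Counter

    def weighted_doc_vector(sentences, importance):
        """Build a term-frequency vector in which each occurrence is
        weighted by the importance of its containing sentence."""
        vec = Counter()
        for sentence, weight in zip(sentences, importance):
            for term in sentence.lower().split():
                vec[term] += weight          # importance-weighted term frequency
        return dict(vec)

    doc = ["Text categorization assigns documents to categories",
           "We had lunch before the experiment"]
    scores = [1.0, 0.2]                      # e.g. produced by a summarizer
    vec = weighted_doc_vector(doc, scores)
    ```

    Terms from the first (important) sentence end up with weight 1.0, while the off-topic "lunch" contributes only 0.2, so the resulting vector emphasizes the content that matters for classification.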
    Source
    Information processing and management. 40(2004) no.1, S.65-79
    Type
    a
  2. Ko, Y.; Seo, J.: Text classification from unlabeled documents with bootstrapping and feature projection techniques (2009) 0.01
    0.0065180818 = product of:
      0.016295204 = sum of:
        0.01155891 = weight(_text_:a in 2452) [ClassicSimilarity], result of:
          0.01155891 = score(doc=2452,freq=16.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 2452, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2452)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 2452) [ClassicSimilarity], result of:
              0.009472587 = score(doc=2452,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 2452, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2452)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning from labeled examples, an approach known as supervised learning. However, supervised approaches have a notable drawback: they require a large number of labeled training documents for accurate learning. While unlabeled documents are easily collected and plentiful, labeled documents are difficult to produce because the labeling must be done by human annotators. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. Used in a text classification task, the proposed method makes building text classification systems significantly faster and less expensive.
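    The seeding step described above can be illustrated with a minimal sketch: each unlabeled document that unambiguously mentions a category's title word receives that category as a provisional label, and the rest are left for later bootstrapping iterations. This is only the first stage of the method, not the authors' full bootstrapping-plus-feature-projection pipeline, and `bootstrap_labels` is a hypothetical helper name.

    ```python
    def bootstrap_labels(docs, title_words):
        """Seed-label unlabeled documents by category title words.
        Documents matching zero or multiple seeds stay unlabeled,
        to be resolved by later bootstrapping passes."""
        labeled = {}
        for i, doc in enumerate(docs):
            tokens = set(doc.lower().split())
            hits = [cat for cat, word in title_words.items() if word in tokens]
            if len(hits) == 1:               # keep only unambiguous matches
                labeled[i] = hits[0]
        return labeled

    docs = ["the new graphics hardware review",
            "baseball scores from last night",
            "hardware and baseball together"]
    seeds = {"comp": "hardware", "sport": "baseball"}
    labels = bootstrap_labels(docs, seeds)   # {0: 'comp', 1: 'sport'}
    ```

    Note that the third document matches both seeds and is deliberately skipped; in the paper's setting such documents would be classified by the model trained on the confidently seeded ones.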
    Source
    Information processing and management. 45(2009) no.1, S.70-83
    Type
    a
