
Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.00
```
0.0022438213 = product of:
  0.0044876426 = sum of:
    0.0044876426 = product of:
      0.008975285 = sum of:
        0.008975285 = weight(_text_:a in 4095) [ClassicSimilarity], result of:
          0.008975285 = score(doc=4095,freq=22.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.16900843 = fieldWeight in 4095, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=4095)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
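The arithmetic in the score explanation above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function name `classic_similarity_score` is hypothetical, and `queryNorm`, `fieldNorm` and the two `coord` factors are copied from the listing rather than derived.

```python
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute a single-term score from the values in the explanation listing.

    Assumes the classic TF-IDF formulas; query_norm, field_norm and coords
    are taken as given from the listing, not derived here.
    """
    tf = math.sqrt(freq)                              # 4.690416 for freq=22
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 1.153047 for 37942 of 44218
    query_weight = idf * query_norm                   # queryWeight line
    field_weight = tf * idf * field_norm              # fieldWeight line
    score = query_weight * field_weight               # weight(_text_:a in 4095)
    for coord in coords:                              # the two coord(1/2) factors
        score *= coord
    return score

score = classic_similarity_score(
    freq=22.0,
    doc_freq=37942,
    max_docs=44218,
    query_norm=0.046056706,
    field_norm=0.03125,
    coords=(0.5, 0.5),
)
print(f"{score:.10f}")
```

The result should agree with the top-level value in the listing up to single-precision rounding, since the search engine computes in 32-bit floats while Python uses 64-bit floats.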
Abstract

In traditional text classification, classes are mutually exclusive, i.e. a text or text fragment cannot be assigned to more than one class. In multi-label classification, by contrast, an individual text may belong to several classes simultaneously. This type of classification is required by a large number of current applications, such as big data classification and image and video annotation. Supervised learning is the most widely used type of machine learning for classification tasks. It requires large quantities of labeled data and the intervention of a human tagger to create the training sets. When the data sets become very large or heavily noisy, this operation can be tedious, error-prone and time-consuming. In this case, semi-supervised learning, which requires only a few labels, is a better choice. In this paper, we study and evaluate several methods to address the problem of multi-label classification using semi-supervised learning and data from social networks. First, we propose a linguistic pre-processing step involving tokenisation, named entity recognition and hashtag segmentation to reduce the noise in this type of massive and unstructured real data, and then we perform word sense disambiguation using WordNet. Second, several experiments related to multi-label classification and semi-supervised learning are carried out on these data sets and compared to each other. This paper proposes a method for combining semi-supervised methods with a graph method for the extraction of subjects in social networks using a multi-label classification approach. Experiments show that the proposed model increases classification precision by 4 percentage points compared to a baseline.
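To illustrate the hashtag segmentation step mentioned in the abstract, here is a minimal sketch of one common approach: dictionary-based segmentation by dynamic programming. The abstract does not describe the authors' actual technique, so this is an assumption for illustration only; the function `segment_hashtag` and the toy vocabulary `VOCAB` are hypothetical.

```python
def segment_hashtag(tag, vocab):
    """Split a hashtag into known words, preferring the fewest words.

    Illustrative only: a dictionary-based split, not the method of the paper.
    """
    text = tag.lstrip("#").lower()
    n = len(text)
    # best[i] holds the shortest word list covering text[:i], or None if none exists
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    # Fall back to the unsplit text when no full segmentation exists
    return best[n] if best[n] is not None else [text]

# Toy vocabulary (hypothetical); a real system would use a large word list
VOCAB = {"machine", "learning", "big", "data", "text", "mining"}

print(segment_hashtag("#MachineLearning", VOCAB))
print(segment_hashtag("#bigdata", VOCAB))
```

Preferring the segmentation with the fewest words is a simple heuristic; a frequency-weighted language model would usually resolve ambiguous splits more reliably.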

Type

a