Search (8 results, page 1 of 1)

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01

0.014111074 = product of:
  0.07055537 = sum of:
    0.07055537 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
      0.07055537 = score(doc=2748,freq=2.0), product of:
        0.18236019 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052075688 = queryNorm
        0.38690117 = fieldWeight in 2748, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.078125 = fieldNorm(doc=2748)
  0.2 = coord(1/5)

Date: 1. 2.2016 18:25:22
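
The explain tree above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which match the idf and tf values printed in the tree. The function names `classic_idf` and `classic_score` are illustrative and are not part of any Lucene API; the inputs are copied from the explain output for doc 2748.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    tf = math.sqrt(freq)                   # tf(freq=2.0) = 1.4142135
    idf = classic_idf(doc_freq, max_docs)  # idf(docFreq=3622, maxDocs=44218)
    query_weight = idf * query_norm        # "queryWeight" line in the tree
    field_weight = tf * idf * field_norm   # "fieldWeight" line in the tree
    # The term score is scaled by coord (1 of 5 query terms matched).
    return query_weight * field_weight * coord

# Values taken from the explain output for doc 2748, term "22".
score = classic_score(
    freq=2.0,
    doc_freq=3622,
    max_docs=44218,
    query_norm=0.052075688,
    field_norm=0.078125,
    coord=1 / 5,
)
print(round(score, 6))
```

The same function applies to the other results on this page: only freq, docFreq, and fieldNorm change from record to record, while queryNorm and coord stay the same.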

Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
```
0.008929707 = product of:
  0.044648536 = sum of:
    0.044648536 = weight(_text_:7 in 967) [ClassicSimilarity], result of:
      0.044648536 = score(doc=967,freq=4.0), product of:
        0.17251469 = queryWeight, product of:
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.052075688 = queryNorm
        0.25881004 = fieldWeight in 967, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
  0.2 = coord(1/5)
```
Abstract

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future is a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag, and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.

Source: Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1399-1410

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01

0.008466644 = product of:
  0.04233322 = sum of:
    0.04233322 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
      0.04233322 = score(doc=690,freq=2.0), product of:
        0.18236019 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052075688 = queryNorm
        0.23214069 = fieldWeight in 690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=690)
  0.2 = coord(1/5)

Date: 23. 3.2013 13:22:36

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01

0.008466644 = product of:
  0.04233322 = sum of:
    0.04233322 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
      0.04233322 = score(doc=2158,freq=2.0), product of:
        0.18236019 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052075688 = queryNorm
        0.23214069 = fieldWeight in 2158, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=2158)
  0.2 = coord(1/5)

Date: 4. 8.2015 19:22:04

Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01

0.0075771073 = product of:
  0.037885536 = sum of:
    0.037885536 = weight(_text_:7 in 3015) [ClassicSimilarity], result of:
      0.037885536 = score(doc=3015,freq=2.0), product of:
        0.17251469 = queryWeight, product of:
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.052075688 = queryNorm
        0.21960759 = fieldWeight in 3015, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.046875 = fieldNorm(doc=3015)
  0.2 = coord(1/5)

Source: Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1668-1678

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01

0.007055537 = product of:
  0.035277683 = sum of:
    0.035277683 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
      0.035277683 = score(doc=1107,freq=2.0), product of:
        0.18236019 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052075688 = queryNorm
        0.19345059 = fieldWeight in 1107, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1107)
  0.2 = coord(1/5)

Date: 28.10.2013 19:22:57

Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.01
```
0.0063142553 = product of:
  0.031571276 = sum of:
    0.031571276 = weight(_text_:7 in 472) [ClassicSimilarity], result of:
      0.031571276 = score(doc=472,freq=2.0), product of:
        0.17251469 = queryWeight, product of:
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.052075688 = queryNorm
        0.18300632 = fieldWeight in 472, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.0390625 = fieldNorm(doc=472)
  0.2 = coord(1/5)
```
Abstract

The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.

Salles, T.; Rocha, L.; Gonçalves, M.A.; Almeida, J.M.; Mourão, F.; Meira Jr., W.; Viegas, F.: A quantitative analysis of the temporal effects on automatic text classification (2016) 0.01

0.0063142553 = product of:
  0.031571276 = sum of:
    0.031571276 = weight(_text_:7 in 3014) [ClassicSimilarity], result of:
      0.031571276 = score(doc=3014,freq=2.0), product of:
        0.17251469 = queryWeight, product of:
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.052075688 = queryNorm
        0.18300632 = fieldWeight in 3014, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.3127685 = idf(docFreq=4376, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3014)
  0.2 = coord(1/5)

Source: Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1639-1667
