Search (2 results, page 1 of 1)

Montalvo, S.; Martínez, R.; Fresno, V.; Delgado, A.: Exploiting named entities for bilingual news clustering (2015) 0.05
```
0.051874384 = product of:
  0.10374877 = sum of:
    0.10374877 = product of:
      0.20749754 = sum of:
        0.20749754 = weight(_text_:news in 1642) [ClassicSimilarity], result of:
          0.20749754 = score(doc=1642,freq=10.0), product of:
            0.26705483 = queryWeight, product of:
              5.2416887 = idf(docFreq=635, maxDocs=44218)
              0.05094824 = queryNorm
            0.7769848 = fieldWeight in 1642, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.2416887 = idf(docFreq=635, maxDocs=44218)
              0.046875 = fieldNorm(doc=1642)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this article, we present a new algorithm for clustering a bilingual collection of comparable news items in groups of specific topics. Our hypothesis is that named entities (NEs) are more informative than other features in the news when clustering fine-grained topics. The algorithm does not need the number of clusters as input, and carries out the clustering based only on information regarding the shared named entities of the news items. This proposal is evaluated using different data sets and outperforms other state-of-the-art algorithms, thereby proving the plausibility of the approach. In addition, because the applicability of our approach depends on the possibility of identifying equivalent named entities among the news, we propose a heuristic system to identify equivalent named entities in the same and different languages, thereby obtaining good performance.
Zubiaga, A.; Spina, D.; Martínez, R.; Fresno, V.: Real-time classification of Twitter trends (2015) 0.04
```
0.040181726 = product of:
  0.08036345 = sum of:
    0.08036345 = product of:
      0.1607269 = sum of:
        0.1607269 = weight(_text_:news in 1661) [ClassicSimilarity], result of:
          0.1607269 = score(doc=1661,freq=6.0), product of:
            0.26705483 = queryWeight, product of:
              5.2416887 = idf(docFreq=635, maxDocs=44218)
              0.05094824 = queryNorm
            0.60184985 = fieldWeight in 1661, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2416887 = idf(docFreq=635, maxDocs=44218)
              0.046875 = fieldNorm(doc=1661)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following four types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language-independent features based on the social spread of trends and categorize them using the typology. Our method provides an efficient way to accurately categorize trending topics without the need for external data, enabling news organizations to discover breaking news in real time, or to quickly identify viral memes that might inform marketing decisions, among others. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.
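
The score breakdowns listed with each result can be checked by hand. The following is a minimal Python sketch, not the search engine's own code: it assumes the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values shown, and it takes queryNorm, fieldNorm, and the two coord factors directly from the listings. The function name `classic_similarity_score` and its parameters are hypothetical.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm,
                             field_norm, coords=(0.5, 0.5)):
    """Recompute a single-term score from the values in an explain listing.

    Assumes tf = sqrt(freq) and idf = 1 + ln(max_docs / (doc_freq + 1)).
    """
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm          # "queryWeight" in the listing
    field_weight = tf * idf * field_norm     # "fieldWeight" in the listing
    score = query_weight * field_weight      # "weight(_text_:news ...)"
    for coord in coords:                     # each coord(1/2) halves the score
        score *= coord
    return score


# Values copied from the two listings (doc 1642 and doc 1661).
first = classic_similarity_score(10.0, 635, 44218, 0.05094824, 0.046875)
second = classic_similarity_score(6.0, 635, 44218, 0.05094824, 0.046875)
print(f"doc 1642: {first:.6f}")
print(f"doc 1661: {second:.6f}")
```

Under these assumptions the two results should agree with the listed totals (0.051874384 and 0.040181726) up to small floating-point differences, since the listings appear to use single-precision arithmetic. The only input that differs between the two documents is the term frequency (10 versus 6), which accounts for the difference in rank.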
