Search (3 results, page 1 of 1)

  • author_ss:"Can, F."
  • language_ss:"e"
  1. Aksoy, C.; Can, F.; Kocberber, S.: Novelty detection for topic tracking (2012) 0.04
    0.043228656 = product of:
      0.08645731 = sum of:
        0.08645731 = product of:
          0.17291462 = sum of:
            0.17291462 = weight(_text_:news in 51) [ClassicSimilarity], result of:
              0.17291462 = score(doc=51,freq=10.0), product of:
                0.26705483 = queryWeight, product of:
                  5.2416887 = idf(docFreq=635, maxDocs=44218)
                  0.05094824 = queryNorm
                0.64748734 = fieldWeight in 51, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  5.2416887 = idf(docFreq=635, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=51)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
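The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output. The following is a minimal Python sketch that recomputes it. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the printed tf and idf values. queryNorm and fieldNorm are copied from the output rather than derived, and each coord(1/2) line is applied as a plain 0.5 multiplier. The function names are illustrative and are not Lucene API.

```python
import math

def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; reproduces idf(docFreq=635, maxDocs=44218) = 5.2416887.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float,
                  coords: tuple[float, ...] = (0.5, 0.5)) -> float:
    """Recompute the explain tree: queryWeight * fieldWeight, then coord factors."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # "queryWeight" line
    field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight" line
    score = query_weight * field_weight
    for c in coords:                                    # the two coord(1/2) lines
        score *= c
    return score

if __name__ == "__main__":
    # Inputs copied from the explain output; queryNorm and fieldNorm are not derived here.
    for doc, freq in [(51, 10.0), (3963, 10.0), (3442, 2.0)]:
        print(doc, classic_score(freq, 635, 44218, 0.05094824, 0.0390625))
```

Because every other factor is shared, the lower score of result 3 (freq=2.0) compared with results 1 and 2 (freq=10.0) comes entirely from the tf term: sqrt(2) versus sqrt(10).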
    
    Abstract
    Multisource web news portals provide various advantages, such as richness in news content and an opportunity to follow developments from different perspectives. However, in such environments, the variety and volume of news can be overwhelming. New-event detection and topic-tracking studies address this problem. They examine news streams and organize stories according to their events; however, several tracking stories of an event/topic may contain no new information (i.e., no novelty). We study the novelty detection (ND) problem on the tracking news of a particular topic. For this purpose, we build a Turkish ND test collection called BilNov-2005 and propose three ND methods: a cosine-similarity (CS)-based method, a language-model (LM)-based method, and a cover-coefficient (CC)-based method. For the LM-based ND method, we show that a simpler smoothing approach, Dirichlet smoothing, can perform similarly to a more complex one, Shrinkage smoothing. We introduce a baseline that shows the performance of a system with random novelty decisions. In addition, a category-based threshold learning method is used for the first time in the ND literature. The experimental results show that the LM-based ND method significantly outperforms the CS- and CC-based methods, and category-based threshold learning achieves promising results when compared to general threshold learning.
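The abstract names a cosine-similarity ND method and Dirichlet smoothing but does not give their exact formulation. The following is a generic sketch, not the authors' implementation. It assumes a story is novel when its highest cosine similarity to any earlier tracking story falls below a threshold, and it uses the standard Dirichlet-smoothed estimate p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu). Whitespace tokenization, raw term counts, the 0.5 threshold, and mu=1000 are all hypothetical choices. Shrinkage smoothing, the CC method, and threshold learning are not shown.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_novel(story: str, history: list[str], threshold: float = 0.5) -> bool:
    """Hypothetical CS-based rule: novel if no earlier story is too similar."""
    vec = Counter(story.lower().split())
    max_sim = max((cosine(vec, Counter(h.lower().split())) for h in history),
                  default=0.0)
    return max_sim < threshold

def dirichlet_prob(term: str, doc: Counter, collection: Counter,
                   mu: float = 1000.0) -> float:
    # Dirichlet smoothing: p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu).
    doc_len = sum(doc.values())
    p_coll = collection[term] / sum(collection.values())
    return (doc[term] + mu * p_coll) / (doc_len + mu)

if __name__ == "__main__":
    history = ["earthquake hits city", "rescue teams reach city"]
    print(is_novel("earthquake hits city again", history))  # False
    print(is_novel("aid pledged by donors", history))        # True
```

In an LM-based variant, smoothed probabilities such as `dirichlet_prob` keep unseen words from receiving zero probability when the new story is compared against a model of earlier stories.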
  2. Toraman, C.; Can, F.: Discovering story chains : a framework based on zigzagged search and news actors (2017) 0.04
    0.043228656 = product of:
      0.08645731 = sum of:
        0.08645731 = product of:
          0.17291462 = sum of:
            0.17291462 = weight(_text_:news in 3963) [ClassicSimilarity], result of:
              0.17291462 = score(doc=3963,freq=10.0), product of:
                0.26705483 = queryWeight, product of:
                  5.2416887 = idf(docFreq=635, maxDocs=44218)
                  0.05094824 = queryNorm
                0.64748734 = fieldWeight in 3963, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  5.2416887 = idf(docFreq=635, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3963)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A story chain is a set of related news articles that reveal how different events are connected. This study presents a framework for discovering story chains, given an input document, in a text collection. The framework has 3 complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles. We conduct 2 user studies in terms of 4 effectiveness measures: relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to derive usage guidelines. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method.
  3. Can, F.; Kocberber, S.; Baglioglu, O.; Kardas, S.; Ocalan, H.C.; Uyar, E.: New event detection and topic tracking in Turkish (2010) 0.02
    0.01933244 = product of:
      0.03866488 = sum of:
        0.03866488 = product of:
          0.07732976 = sum of:
            0.07732976 = weight(_text_:news in 3442) [ClassicSimilarity], result of:
              0.07732976 = score(doc=3442,freq=2.0), product of:
                0.26705483 = queryWeight, product of:
                  5.2416887 = idf(docFreq=635, maxDocs=44218)
                  0.05094824 = queryNorm
                0.28956512 = fieldWeight in 3442, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2416887 = idf(docFreq=635, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3442)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Topic detection and tracking (TDT) applications aim to organize the temporally ordered stories of a news stream according to their events. Two major problems in TDT are new event detection (NED) and topic tracking (TT). These problems focus on finding the first stories of new events and identifying all subsequent stories on a certain topic defined by a small number of sample stories. In this work, we introduce the first large-scale TDT test collection for Turkish and investigate the NED and TT problems in this language. We present our test-collection-construction approach, which is inspired by the TDT research initiative. We show that, for Turkish TDT with some similarity measures, a simple word-truncation stemming method can compete with a lemmatizer-based stemming approach. Contrary to our earlier observations on Turkish information retrieval, our findings show that word stopping affects NED effectiveness. We demonstrate that the confidence scores of two different similarity measures can be combined in a straightforward manner for higher effectiveness. We also investigate the influence of several similarity measures on effectiveness. We show that it is possible to deploy TT applications in Turkish that can be used in operational settings.
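The abstract does not spell out the NED procedure or the truncation length. This sketch assumes a common formulation of NED: a story is flagged as the first story of a new event when its maximum similarity to all earlier stories falls below a threshold. The sketch combines that rule with fixed-prefix truncation stemming, which keeps only the first n characters of each word. The prefix length n=5, the 0.3 threshold, whitespace tokenization, and raw-count cosine vectors are hypothetical choices, not the paper's settings. Python's `str.lower()` also does not apply Turkish dotted/dotless I rules.

```python
import math
from collections import Counter

def truncate_stem(word: str, n: int = 5) -> str:
    # Fixed-prefix stemming: keep the first n characters (n=5 is an assumption).
    return word[:n]

def to_vector(text: str, n: int = 5) -> Counter:
    # str.lower() is not Turkish-locale aware (dotted/dotless I).
    return Counter(truncate_stem(w, n) for w in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect_first_stories(stream: list[str], threshold: float = 0.3,
                         n: int = 5) -> list[bool]:
    """Flag each story as a first story if it is dissimilar to every earlier one."""
    seen: list[Counter] = []
    flags: list[bool] = []
    for story in stream:
        vec = to_vector(story, n)
        max_sim = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(max_sim < threshold)
        seen.append(vec)
    return flags

if __name__ == "__main__":
    stream = [
        "depremde binalar yıkıldı",            # earthquake: buildings collapsed
        "depremden sonra binalar boşaltıldı",  # after the earthquake, buildings evacuated
        "seçim sonuçları açıklandı",           # election results announced
    ]
    print(detect_first_stories(stream))         # [True, False, True]
    print(detect_first_stories(stream, n=100))  # [True, True, True]
```

With five-character prefixes, "depremde" and "depremden" (inflections of "deprem", earthquake) collapse to the same stem, so the second story links to the first. With truncation effectively disabled (n=100), its cosine similarity drops to about 0.29, and it is mistakenly flagged as a new event.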