Search (9 results, page 1 of 1)

  • author_ss:"Can, F."
  1. Toraman, C.; Can, F.: Discovering story chains : a framework based on zigzagged search and news actors (2017) 0.02
    0.020615976 = product of:
      0.061847925 = sum of:
        0.007487943 = weight(_text_:information in 3963) [ClassicSimilarity], result of:
          0.007487943 = score(doc=3963,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.09697737 = fieldWeight in 3963, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3963)
        0.054359984 = weight(_text_:networks in 3963) [ClassicSimilarity], result of:
          0.054359984 = score(doc=3963,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.26129362 = fieldWeight in 3963, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3963)
      0.33333334 = coord(2/6)
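
    Note: the indented breakdown above (and the analogous ones under the other hits) is Lucene's "explain" output for ClassicSimilarity, i.e. classic TF-IDF scoring. As a rough sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the score of this hit can be reproduced as follows:

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          i = idf(doc_freq, max_docs)
          query_weight = i * query_norm                    # idf * queryNorm
          field_weight = math.sqrt(freq) * i * field_norm  # tf * idf * fieldNorm
          return query_weight * field_weight

      QUERY_NORM, FIELD_NORM, MAX_DOCS = 0.043984205, 0.0390625, 44218
      s_information = term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # ~0.00749
      s_networks = term_score(2.0, 1060, MAX_DOCS, QUERY_NORM, FIELD_NORM)      # ~0.05436
      total = (s_information + s_networks) * (2 / 6)  # coord(2/6): 2 of 6 query clauses matched
      print(round(total, 6))  # 0.020616, matching the value listed for this hit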
    
    Abstract
    A story chain is a set of related news articles that reveal how different events are connected. This study presents a framework for discovering story chains in a text collection, given an input document. The framework has 3 complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles. We conduct 2 user studies in terms of 4 effectiveness measures: relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to set a guideline for use. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method.
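
    The "zigzagged search" mentioned above can be pictured roughly as follows. This is only an illustrative sketch of the idea of re-scanning past documents whenever the chain grows, not the paper's exact algorithm; sim() (the candidate-versus-chain similarity of part ii) and the threshold are assumed to be given:

      def zigzag_chain(docs, seed, sim, threshold=0.5):
          # docs: news articles in temporal order; seed: the input document.
          chain = [seed]
          for i, doc in enumerate(docs):
              if doc in chain or sim(doc, chain) < threshold:
                  continue
              chain.append(doc)
              # "Zigzag": swing back and reinvestigate earlier articles against
              # the updated chain; some of them may only fit the chain now.
              for past in docs[:i]:
                  if past not in chain and sim(past, chain) >= threshold:
                      chain.append(past)
          return chain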
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.12, S.2795-2808
  2. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.00
    0.0030262163 = product of:
      0.018157298 = sum of:
        0.018157298 = weight(_text_:information in 1373) [ClassicSimilarity], result of:
          0.018157298 = score(doc=1373,freq=6.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.23515764 = fieldWeight in 1373, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1373)
      0.16666667 = coord(1/6)
    
    Abstract
    In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We investigate the effects of a range of search conditions on retrieval performance; these include scalability issues, query and document length effects, and the use of a stopword list in indexing.
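
    The "simple word truncation approach" mentioned above keeps only a fixed-length prefix of each token. A minimal sketch, with the prefix length of 5 chosen here purely as an illustrative value (the lengths actually evaluated are reported in the paper, not here):

      def truncate_stem(token, prefix_len=5):
          # Fixed-prefix truncation "stemmer": keep at most the first
          # prefix_len characters of the lowercased token.
          return token.lower()[:prefix_len]

      print([truncate_stem(t) for t in ["kitap", "kitaplarımızdan", "bilgisayarlar"]])
      # ['kitap', 'kitap', 'bilgi']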
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.3, S.407-421
  3. Can, F.: Incremental clustering for dynamic information processing (1993) 0.00
    0.0028238804 = product of:
      0.016943282 = sum of:
        0.016943282 = weight(_text_:information in 6627) [ClassicSimilarity], result of:
          0.016943282 = score(doc=6627,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.21943474 = fieldWeight in 6627, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=6627)
      0.16666667 = coord(1/6)
    
    Source
    ACM transactions on information systems. 11(1993) no.2, S.143-164
  4. Can, F.; Nuray, R.; Sevdik, A.B.: Automatic performance evaluation of Web search engines (2004) 0.00
    0.0025938996 = product of:
      0.015563398 = sum of:
        0.015563398 = weight(_text_:information in 2570) [ClassicSimilarity], result of:
          0.015563398 = score(doc=2570,freq=6.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.20156369 = fieldWeight in 2570, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2570)
      0.16666667 = coord(1/6)
    
    Abstract
    Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, it is important both for business enterprises and for individual users to know which Web search engines are most effective, since such engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used for several practical purposes. In this study we introduce an automatic Web search engine evaluation method as an efficient and effective assessment tool for such systems. The experiments, based on eight Web search engines, 25 queries, and binary user relevance judgments, show that our method provides results consistent with human-based evaluations. It is shown that the observed consistencies are statistically significant. This indicates that the new method can be successfully used in the evaluation of Web search engines.
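
    The consistency claim above compares engine rankings produced by the automatic method with rankings derived from binary human judgments. The snippet below sketches only the judgment-based side, scoring each engine by mean precision at k; the engine names, result lists, judgments, and the cutoff k are hypothetical, and the paper's automatic judging procedure is not reproduced here:

      def precision_at_k(results, relevant, k):
          # Fraction of the top-k results that were judged relevant (binary judgments).
          return sum(1 for doc in results[:k] if doc in relevant) / k

      judgments = {"q1": {"u1", "u3", "u7"}}                      # hypothetical relevant URLs
      runs = {"engineA": {"q1": ["u1", "u2", "u3", "u4", "u5"]},  # hypothetical result lists
              "engineB": {"q1": ["u9", "u2", "u8", "u3", "u6"]}}

      for engine, run in runs.items():
          scores = [precision_at_k(run[q], judgments[q], k=5) for q in judgments]
          print(engine, sum(scores) / len(scores))  # engineA 0.4, engineB 0.2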
    Source
    Information processing and management. 40(2004) no.3, S.495-514
  5. Aksoy, C.; Can, F.; Kocberber, S.: Novelty detection for topic tracking (2012) 0.00
    0.0021615832 = product of:
      0.0129694985 = sum of:
        0.0129694985 = weight(_text_:information in 51) [ClassicSimilarity], result of:
          0.0129694985 = score(doc=51,freq=6.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.16796975 = fieldWeight in 51, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=51)
      0.16666667 = coord(1/6)
    
    Abstract
    Multisource Web news portals provide various advantages such as richness in news content and an opportunity to follow developments from different perspectives. However, in such environments, news variety and quantity can have an overwhelming effect. New-event detection and topic-tracking studies address this problem. They examine news streams and organize stories according to their events; however, several tracking stories of an event/topic may contain no new information (i.e., no novelty). We study the novelty detection (ND) problem on the tracking news of a particular topic. For this purpose, we build a Turkish ND test collection called BilNov-2005 and propose the use of three ND methods: a cosine-similarity (CS)-based method, a language-model (LM)-based method, and a cover-coefficient (CC)-based method. For the LM-based ND method, we show that a simpler smoothing approach, Dirichlet smoothing, can have similar performance to a more complex smoothing approach, Shrinkage smoothing. We introduce a baseline that shows the performance of a system making random novelty decisions. In addition, a category-based threshold learning method is used for the first time in the ND literature. The experimental results show that the LM-based ND method significantly outperforms the CS- and CC-based methods, and that category-based threshold learning achieves promising results when compared to general threshold learning.
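
    Of the three ND methods named above, the cosine-similarity-based one is the simplest to sketch: a tracking story is flagged as novel when its similarity to every earlier story of the topic stays below a decision threshold. The sparse term-frequency vectors and the threshold value below are illustrative assumptions, not the paper's tuned settings:

      import math

      def cosine(u, v):
          # Cosine similarity between two sparse term-frequency vectors (dicts).
          dot = sum(w * v[t] for t, w in u.items() if t in v)
          norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
          return dot / norm if norm else 0.0

      def is_novel(story_vec, earlier_vecs, threshold=0.3):
          # Novel if the story is not sufficiently similar to any earlier story on the topic.
          return all(cosine(story_vec, past) < threshold for past in earlier_vecs)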
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.4, S.777-795
    Theme
    Information Gateway
  6. Kucukyilmaz, T.; Cambazoglu, B.B.; Aykanat, C.; Can, F.: Chat mining : Predicting user and message attributes in computer-mediated communication (2008) 0.00
    0.0021179102 = product of:
      0.012707461 = sum of:
        0.012707461 = weight(_text_:information in 2099) [ClassicSimilarity], result of:
          0.012707461 = score(doc=2099,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.16457605 = fieldWeight in 2099, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2099)
      0.16666667 = coord(1/6)
    
    Abstract
    The focus of this paper is to investigate the possibility of predicting several user and message attributes in text-based, real-time, online messaging services. For this purpose, a large collection of chat messages is examined. The applicability of various supervised classification techniques for extracting information from the chat messages is evaluated. Two competing models are used for defining the chat mining problem: a term-based approach is used to investigate the user and message attributes in the context of vocabulary use, while a style-based approach is used to examine the chat messages according to variations in the authors' writing styles. Among 100 authors, the identity of an author is correctly predicted with 99.7% accuracy. Moreover, the reverse problem is exploited, and the effect of author attributes on computer-mediated communication is discussed.
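
    The term-based formulation above treats author prediction as text classification over vocabulary use. A minimal sketch using a bag-of-words naive Bayes pipeline; the classifier choice and the toy messages are assumptions for illustration, not the techniques or data evaluated in the paper:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      # Toy chat lines labelled with hypothetical authors.
      messages = ["selam naber", "bugün maç var mı", "selam, görüşürüz", "maç kaçta başlıyor"]
      authors = ["user1", "user2", "user1", "user2"]

      model = make_pipeline(CountVectorizer(), MultinomialNB())
      model.fit(messages, authors)
      print(model.predict(["selam"]))  # ['user1'] on this toy data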
    Source
    Information processing and management. 44(2008) no.4, S.1448-1466
  7. Can, F.; Kocberber, S.; Baglioglu, O.; Kardas, S.; Ocalan, H.C.; Uyar, E.: New event detection and topic tracking in Turkish (2010) 0.00
    0.0017649251 = product of:
      0.01058955 = sum of:
        0.01058955 = weight(_text_:information in 3442) [ClassicSimilarity], result of:
          0.01058955 = score(doc=3442,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.13714671 = fieldWeight in 3442, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3442)
      0.16666667 = coord(1/6)
    
    Abstract
    Topic detection and tracking (TDT) applications aim to organize the temporally ordered stories of a news stream according to the events. Two major problems in TDT are new event detection (NED) and topic tracking (TT). These problems focus on finding the first stories of new events and on identifying all subsequent stories on a certain topic defined by a small number of sample stories. In this work, we introduce the first large-scale TDT test collection for Turkish and investigate the NED and TT problems in this language. We present our test-collection-construction approach, which is inspired by the TDT research initiative. We show that in TDT for Turkish, with some similarity measures, a simple word truncation stemming method can compete with a lemmatizer-based stemming approach. Our findings show that, contrary to our earlier observations on Turkish information retrieval, word stopping has an impact on effectiveness in NED. We demonstrate that the confidence scores of two different similarity measures can be combined in a straightforward manner for higher effectiveness. The influence of several similarity measures on effectiveness is also investigated. We show that it is possible to deploy TT applications in Turkish that can be used in operational settings.
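
    The abstract notes that the confidence scores of two different similarity measures can be combined "in a straightforward manner" but does not spell out the rule. One straightforward option, shown purely as an illustration, is to min-max normalize each measure's scores over the candidate stories and average them:

      def minmax(scores):
          lo, hi = min(scores), max(scores)
          return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

      def combine(scores_a, scores_b):
          # Average the two normalized confidence scores, story by story.
          return [(a + b) / 2 for a, b in zip(minmax(scores_a), minmax(scores_b))]

      print(combine([0.2, 0.8, 0.5], [10.0, 30.0, 20.0]))  # [0.0, 1.0, 0.5]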
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.802-819
  8. Can, F.: On the efficiency of best-match cluster searches (1994) 0.00
    0.0017471868 = product of:
      0.010483121 = sum of:
        0.010483121 = weight(_text_:information in 7294) [ClassicSimilarity], result of:
          0.010483121 = score(doc=7294,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.13576832 = fieldWeight in 7294, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7294)
      0.16666667 = coord(1/6)
    
    Source
    Information processing and management. 30(1994) no.3, S.343-361
  9. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.00
    0.0014975886 = product of:
      0.0089855315 = sum of:
        0.0089855315 = weight(_text_:information in 1029) [ClassicSimilarity], result of:
          0.0089855315 = score(doc=1029,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.116372846 = fieldWeight in 1029, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1029)
      0.16666667 = coord(1/6)
    
    Source
    Information processing and management. 41(2005) no.3, S.613-634