Search (2 results, page 1 of 1)

Ozmutlu, S.; Cosar, G.C.: Analyzing the results of automatic new topic identification (2008) 0.01
```
0.0073941024 = product of:
  0.05915282 = sum of:
    0.05915282 = weight(_text_:studies in 2604) [ClassicSimilarity], result of:
      0.05915282 = score(doc=2604,freq=4.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.37408823 = fieldWeight in 2604, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.046875 = fieldNorm(doc=2604)
  0.125 = coord(1/8)
```
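The explain tree above can be checked by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity relations that the tree itself displays: `tf = sqrt(freq)`, `queryWeight = idf * queryNorm`, `fieldWeight = tf * idf * fieldNorm`, and a final multiplication by `coord`. The form `idf = 1 + ln(maxDocs / (docFreq + 1))` is an assumption, chosen because it reproduces the displayed value 3.9902744. The function names are illustrative and not part of any Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it reproduces the 3.9902744 shown in the tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # 2.0 for freq=4.0
    query_weight = idf * query_norm       # "queryWeight" line of the tree
    field_weight = tf * idf * field_norm  # "fieldWeight" line of the tree
    return coord * query_weight * field_weight


# Values copied from the explain output for doc 2604.
score_2604 = classic_score(
    freq=4.0,
    doc_freq=2222,
    max_docs=44218,
    query_norm=0.03962768,
    field_norm=0.046875,
    coord=1 / 8,
)
print(f"{score_2604:.7f}")
```

Because Lucene computes in 32-bit floats and this sketch uses Python's 64-bit floats, the result should agree with the displayed 0.0073941024 only to about six or seven significant digits.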
Abstract

Purpose - Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification. Design/methodology/approach - Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations. Findings - The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification. Originality/value - Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom-tailored graphical user interfaces for search engine users.
Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.01
```
0.0061617517 = product of:
  0.049294014 = sum of:
    0.049294014 = weight(_text_:studies in 2688) [ClassicSimilarity], result of:
      0.049294014 = score(doc=2688,freq=4.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.3117402 = fieldWeight in 2688, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2688)
  0.125 = coord(1/8)
```
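In this listing, the explain trees for docs 2604 and 2688 share every factor (term frequency, idf, queryNorm, coord) except `fieldNorm`, so the gap between the two scores should come down to that one value. The sketch below checks this using only numbers copied from the explain output; the variable names are illustrative. As background, and stated here as an assumption about ClassicSimilarity rather than something the listing shows, `fieldNorm` is a length normalization that shrinks as the field grows longer, so the lower value for doc 2688 would correspond to a longer indexed text.

```python
import math

# Values copied from the two explain trees.
field_norm_2604 = 0.046875
field_norm_2688 = 0.0390625
score_2604 = 0.0073941024
score_2688 = 0.0061617517

# If fieldNorm is the only differing factor, the two ratios should match.
norm_ratio = field_norm_2688 / field_norm_2604
score_ratio = score_2688 / score_2604

print(f"fieldNorm ratio: {norm_ratio:.6f}")
print(f"score ratio:     {score_ratio:.6f}")
print("ratios agree:", math.isclose(norm_ratio, score_ratio, rel_tol=1e-5))
```

Both ratios come out close to 5/6, which is consistent with the two documents matching the query term equally often and differing only in length normalization.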
Abstract

The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates that were based on spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method from several aspects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
