Search (3 results, page 1 of 1)

  • × author_ss:"Fang, X."
  • × author_ss:"Chau, M."
  1. Chau, M.; Lu, Y.; Fang, X.; Yang, C.C.: Characteristics of character usage in Chinese Web searching (2009) 0.05
    0.052752987 = product of:
      0.10550597 = sum of:
        0.09033441 = weight(_text_:engines in 2456) [ClassicSimilarity], result of:
          0.09033441 = score(doc=2456,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.39693922 = fieldWeight in 2456, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2456)
        0.015171562 = product of:
          0.030343125 = sum of:
            0.030343125 = weight(_text_:22 in 2456) [ClassicSimilarity], result of:
              0.030343125 = score(doc=2456,freq=2.0), product of:
                0.15685207 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04479146 = queryNorm
                0.19345059 = fieldWeight in 2456, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2456)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The use of non-English Web search engines has been prevalent. Given the popularity of Chinese Web searching and the unique characteristics of Chinese language, it is imperative to conduct studies with focuses on the analysis of Chinese Web search queries. In this paper, we report our research on the character usage of Chinese search logs from a Web search engine in Hong Kong. By examining the distribution of search query terms, we found that users tended to use more diversified terms and that the usage of characters in search queries was quite different from the character usage of general online information in Chinese. After studying the Zipf distribution of n-grams with different values of n, we found that the curve of unigram is the most curved one of all while the bigram curve follows the Zipf distribution best, and that the curves of n-grams with larger n (n = 3-6) had similar structures with ?-values in the range of 0.66-0.86. The distribution of combined n-grams was also studied. All the analyses are performed on the data both before and after the removal of function terms and incomplete terms and similar findings are revealed. We believe the findings from this study have provided some insights into further research in non-English Web searching and will assist in the design of more effective Chinese Web search engines.
    Date
    22.11.2008 17:57:22
  2. Chau, M.; Fang, X.; Sheng, O.R.U.: Analysis of the query logs of a Web site search engine (2005) 0.04
    0.039115943 = product of:
      0.15646377 = sum of:
        0.15646377 = weight(_text_:engines in 4573) [ClassicSimilarity], result of:
          0.15646377 = score(doc=4573,freq=12.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.68751884 = fieldWeight in 4573, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4573)
      0.25 = coord(1/4)
    
    Abstract
    A large number of studies have investigated the transaction log of general-purpose search engines such as Excite and AItaVista, but few studies have reported an the analysis of search logs for search engines that are limited to particular Web sites, namely, Web site search engines. In this article, we report our research an analyzing the search logs of the search engine of the Utah state government Web site. Our results show that some statistics, such as the number of search terms per query, of Web users are the same for general-purpose search engines and Web site search engines, but others, such as the search topics and the terms used, are considerably different. Possible reasons for the differences include the focused domain of Web site search engines and users' different information needs. The findings are useful for Web site developers to improve the performance of their services provided an the Web and for researchers to conduct further research in this area. The analysis also can be applied in e-government research by investigating how information should be delivered to users in government Web sites.
  3. Chau, M.; Fang, X.; Rittman, C.C.: Web searching in Chinese : a study of a search engine in Hong Kong (2007) 0.02
    0.022583602 = product of:
      0.09033441 = sum of:
        0.09033441 = weight(_text_:engines in 336) [ClassicSimilarity], result of:
          0.09033441 = score(doc=336,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.39693922 = fieldWeight in 336, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=336)
      0.25 = coord(1/4)
    
    Abstract
    The number of non-English resources has been increasing rapidly on the Web. Although many studies have been conducted on the query logs in search engines that are primarily English-based (e.g., Excite and AltaVista), only a few of them have studied the information-seeking behavior on the Web in non-English languages. In this article, we report the analysis of the search-query logs of a search engine that focused on Chinese. Three months of search-query logs of Timway, a search engine based in Hong Kong, were collected and analyzed. Metrics on sessions, queries, search topics, and character usage are reported. N-gram analysis also has been applied to perform character-based analysis. Our analysis suggests that some characteristics identified in the search log, such as search topics and the mean number of queries per sessions, are similar to those in English search engines; however, other characteristics, such as the use of operators in query formulation, are significantly different. The analysis also shows that only a very small number of unique Chinese characters are used in search queries. We believe the findings from this study have provided some insights into further research in non-English Web searching.