Search (8 results, page 1 of 1)

  • × author_ss:"Chau, M."
  1. Chau, M.; Lu, Y.; Fang, X.; Yang, C.C.: Characteristics of character usage in Chinese Web searching (2009) 0.15
    0.14947316 = product of:
      0.22420973 = sum of:
        0.08876401 = weight(_text_:search in 2456) [ClassicSimilarity], result of:
          0.08876401 = score(doc=2456,freq=14.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5079997 = fieldWeight in 2456, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2456)
        0.13544571 = sum of:
          0.10138928 = weight(_text_:engines in 2456) [ClassicSimilarity], result of:
            0.10138928 = score(doc=2456,freq=4.0), product of:
              0.25542772 = queryWeight, product of:
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.05027291 = queryNorm
              0.39693922 = fieldWeight in 2456, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2456)
          0.03405643 = weight(_text_:22 in 2456) [ClassicSimilarity], result of:
            0.03405643 = score(doc=2456,freq=2.0), product of:
              0.17604718 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05027291 = queryNorm
              0.19345059 = fieldWeight in 2456, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2456)
      0.6666667 = coord(2/3)
    
    Abstract
    The use of non-English Web search engines has been prevalent. Given the popularity of Chinese Web searching and the unique characteristics of Chinese language, it is imperative to conduct studies with focuses on the analysis of Chinese Web search queries. In this paper, we report our research on the character usage of Chinese search logs from a Web search engine in Hong Kong. By examining the distribution of search query terms, we found that users tended to use more diversified terms and that the usage of characters in search queries was quite different from the character usage of general online information in Chinese. After studying the Zipf distribution of n-grams with different values of n, we found that the curve of unigram is the most curved one of all while the bigram curve follows the Zipf distribution best, and that the curves of n-grams with larger n (n = 3-6) had similar structures with ?-values in the range of 0.66-0.86. The distribution of combined n-grams was also studied. All the analyses are performed on the data both before and after the removal of function terms and incomplete terms and similar findings are revealed. We believe the findings from this study have provided some insights into further research in non-English Web searching and will assist in the design of more effective Chinese Web search engines.
    Date
    22.11.2008 17:57:22
  2. Chau, M.; Fang, X.; Sheng, O.R.U.: Analysis of the query logs of a Web site search engine (2005) 0.14
    0.13601671 = product of:
      0.20402506 = sum of:
        0.11621937 = weight(_text_:search in 4573) [ClassicSimilarity], result of:
          0.11621937 = score(doc=4573,freq=24.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.66512775 = fieldWeight in 4573, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4573)
        0.08780568 = product of:
          0.17561136 = sum of:
            0.17561136 = weight(_text_:engines in 4573) [ClassicSimilarity], result of:
              0.17561136 = score(doc=4573,freq=12.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.68751884 = fieldWeight in 4573, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4573)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A large number of studies have investigated the transaction log of general-purpose search engines such as Excite and AItaVista, but few studies have reported an the analysis of search logs for search engines that are limited to particular Web sites, namely, Web site search engines. In this article, we report our research an analyzing the search logs of the search engine of the Utah state government Web site. Our results show that some statistics, such as the number of search terms per query, of Web users are the same for general-purpose search engines and Web site search engines, but others, such as the search topics and the terms used, are considerably different. Possible reasons for the differences include the focused domain of Web site search engines and users' different information needs. The findings are useful for Web site developers to improve the performance of their services provided an the Web and for researchers to conduct further research in this area. The analysis also can be applied in e-government research by investigating how information should be delivered to users in government Web sites.
  3. Chau, M.; Fang, X.; Rittman, C.C.: Web searching in Chinese : a study of a search engine in Hong Kong (2007) 0.11
    0.10797747 = product of:
      0.1619662 = sum of:
        0.11127157 = weight(_text_:search in 336) [ClassicSimilarity], result of:
          0.11127157 = score(doc=336,freq=22.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.6368113 = fieldWeight in 336, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=336)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 336) [ClassicSimilarity], result of:
              0.10138928 = score(doc=336,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 336, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=336)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The number of non-English resources has been increasing rapidly on the Web. Although many studies have been conducted on the query logs in search engines that are primarily English-based (e.g., Excite and AltaVista), only a few of them have studied the information-seeking behavior on the Web in non-English languages. In this article, we report the analysis of the search-query logs of a search engine that focused on Chinese. Three months of search-query logs of Timway, a search engine based in Hong Kong, were collected and analyzed. Metrics on sessions, queries, search topics, and character usage are reported. N-gram analysis also has been applied to perform character-based analysis. Our analysis suggests that some characteristics identified in the search log, such as search topics and the mean number of queries per sessions, are similar to those in English search engines; however, other characteristics, such as the use of operators in query formulation, are significantly different. The analysis also shows that only a very small number of unique Chinese characters are used in search queries. We believe the findings from this study have provided some insights into further research in non-English Web searching.
  4. Chen, H.; Fan, H.; Chau, M.; Zeng, D.: MetaSpider : meta-searching and categorization on the Web (2001) 0.10
    0.10465382 = product of:
      0.15698072 = sum of:
        0.09489272 = weight(_text_:search in 6849) [ClassicSimilarity], result of:
          0.09489272 = score(doc=6849,freq=16.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.54307455 = fieldWeight in 6849, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6849)
        0.062088005 = product of:
          0.12417601 = sum of:
            0.12417601 = weight(_text_:engines in 6849) [ClassicSimilarity], result of:
              0.12417601 = score(doc=6849,freq=6.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.4861493 = fieldWeight in 6849, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6849)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta-search and document categorization. Meta-search engines improve precision by selecting and integrating search results from generic or domain-specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta-search engine that has real-time indexing and categorizing functions. We report in this paper the major components of MetaSpider and discuss related technical approaches. Initial results of a user evaluation study comparing Meta-Spider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider performed best in precision rate, but disclose no statistically significant differences in recall rate and time requirements. Our experimental study also reveals that MetaSpider exhibited a higher level of automation than the other two systems and facilitated efficient searching by providing the user with an organized, comprehensive view of the retrieved documents.
  5. Chau, M.; Wong, C.H.; Zhou, Y.; Qin, J.; Chen, H.: Evaluating the use of search engine development tools in IT education (2010) 0.08
    0.08380928 = product of:
      0.12571391 = sum of:
        0.07501928 = weight(_text_:search in 3325) [ClassicSimilarity], result of:
          0.07501928 = score(doc=3325,freq=10.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.4293381 = fieldWeight in 3325, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3325)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 3325) [ClassicSimilarity], result of:
              0.10138928 = score(doc=3325,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 3325, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3325)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    It is important for education in computer science and information systems to keep up to date with the latest development in technology. With the rapid development of the Internet and the Web, many schools have included Internet-related technologies, such as Web search engines and e-commerce, as part of their curricula. Previous research has shown that it is effective to use search engine development tools to facilitate students' learning. However, the effectiveness of these tools in the classroom has not been evaluated. In this article, we review the design of three search engine development tools, SpidersRUs, Greenstone, and Alkaline, followed by an evaluation study that compared the three tools in the classroom. In the study, 33 students were divided into 13 groups and each group used the three tools to develop three independent search engines in a class project. Our evaluation results showed that SpidersRUs performed better than the two other tools in overall satisfaction and the level of knowledge gained in their learning experience when using the tools for a class project on Internet applications development.
  6. Chau, M.; Shiu, B.; Chan, M.; Chen, H.: Redips: backlink search and analysis on the Web for business intelligence analysis (2007) 0.08
    0.07852928 = product of:
      0.11779392 = sum of:
        0.06709928 = weight(_text_:search in 142) [ClassicSimilarity], result of:
          0.06709928 = score(doc=142,freq=8.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3840117 = fieldWeight in 142, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=142)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 142) [ClassicSimilarity], result of:
              0.10138928 = score(doc=142,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 142, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=142)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The World Wide Web presents significant opportunities for business intelligence analysis as it can provide information about a company's external environment and its stakeholders. Traditional business intelligence analysis on the Web has focused on simple keyword searching. Recently, it has been suggested that the incoming links, or backlinks, of a company's Web site (i.e., other Web pages that have a hyperlink pointing to the company of Interest) can provide important insights about the company's "online communities." Although analysis of these communities can provide useful signals for a company and information about its stakeholder groups, the manual analysis process can be very time-consuming for business analysts and consultants. In this article, we present a tool called Redips that automatically integrates backlink meta-searching and text-mining techniques to facilitate users in performing such business intelligence analysis on the Web. The architectural design and implementation of the tool are presented in the article. To evaluate the effectiveness, efficiency, and user satisfaction of Redips, an experiment was conducted to compare the tool with two popular business Intelligence analysis methods-using backlink search engines and manual browsing. The experiment results showed that Redips was statistically more effective than both benchmark methods (in terms of Recall and F-measure) but required more time in search tasks. In terms of user satisfaction, Redips scored statistically higher than backlink search engines in all five measures used, and also statistically higher than manual browsing in three measures.
  7. Schroeder, J.; Xu, J.; Chen, H.; Chau, M.: Automated criminal link analysis based on domain knowledge (2007) 0.02
    0.018978544 = product of:
      0.056935627 = sum of:
        0.056935627 = weight(_text_:search in 275) [ClassicSimilarity], result of:
          0.056935627 = score(doc=275,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3258447 = fieldWeight in 275, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=275)
      0.33333334 = coord(1/3)
    
    Abstract
    Link (association) analysis has been used in the criminal justice domain to search large datasets for associations between crime entities in order to facilitate crime investigations. However, link analysis still faces many challenging problems, such as information overload, high search complexity, and heavy reliance on domain knowledge. To address these challenges, this article proposes several techniques for automated, effective, and efficient link analysis. These techniques include the co-occurrence analysis, the shortest path algorithm, and a heuristic approach to identifying associations and determining their importance. We developed a prototype system called CrimeLink Explorer based on the proposed techniques. Results of a user study with 10 crime investigators from the Tucson Police Department showed that our system could help subjects conduct link analysis more efficiently than traditional single-level link analysis tools. Moreover, subjects believed that association paths found based on the heuristic approach were more accurate than those found based solely on the co-occurrence analysis and that the automated link analysis system would be of great help in crime investigations.
  8. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.02
    0.015815454 = product of:
      0.04744636 = sum of:
        0.04744636 = weight(_text_:search in 1615) [ClassicSimilarity], result of:
          0.04744636 = score(doc=1615,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.27153727 = fieldWeight in 1615, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1615)
      0.33333334 = coord(1/3)
    
    Abstract
    The Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with finegrained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders an a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS-systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall.