Search (111 results, page 1 of 6)

  • language_ss:"e"
  • theme_ss:"Suchmaschinen"
  • year_i:[2000 TO 2010}
  1. Jansen, B.J.; Spink, A.; Pedersen, J.: ¬A temporal comparison of AltaVista Web searching (2005) 0.11
    0.11018313 = product of:
      0.22036625 = sum of:
        0.06775281 = weight(_text_:term in 3454) [ClassicSimilarity], result of:
          0.06775281 = score(doc=3454,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.309317 = fieldWeight in 3454, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=3454)
        0.15261345 = weight(_text_:frequency in 3454) [ClassicSimilarity], result of:
          0.15261345 = score(doc=3454,freq=4.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.55206984 = fieldWeight in 3454, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=3454)
      0.5 = coord(2/4)
    
    Abstract
    Major Web search engines, such as AltaVista, are essential tools in the quest to locate online information. This article reports research that used transaction log analysis to examine the characteristics and changes in AltaVista Web searching that occurred from 1998 to 2002. The research questions we examined are (1) What are the changes in AltaVista Web searching from 1998 to 2002? (2) What are the current characteristics of AltaVista searching, including the duration and frequency of search sessions? (3) What changes in the information needs of AltaVista users occurred between 1998 and 2002? The results of our research show (1) a move toward more interactivity with increases in session and query length, (2) with 70% of session durations at 5 minutes or less, the frequency of interaction is increasing, but it is happening very quickly, and (3) a broadening range of Web searchers' information needs, with the most frequent terms accounting for less than 1% of total term usage. We discuss the implications of these findings for the development of Web search engines.
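    The explain tree above is ordinary Lucene ClassicSimilarity arithmetic, and it can be reproduced exactly from the constants it reports. A minimal Python sketch (the helper name is ours; all numbers are copied from the tree for doc 3454):
```python
import math

# All constants below are copied from the explain tree for doc 3454.
QUERY_NORM = 0.04694356

def clause_score(freq, idf, field_norm):
    """One weight(...) clause: queryWeight * fieldWeight, with tf = sqrt(freq)."""
    tf = math.sqrt(freq)                  # 1.4142135 for freq=2.0
    query_weight = idf * QUERY_NORM       # 4.66603 * queryNorm = 0.21904005
    field_weight = tf * idf * field_norm  # 0.309317 for the _text_:term clause
    return query_weight * field_weight

term_clause = clause_score(freq=2.0, idf=4.66603, field_norm=0.046875)
freq_clause = clause_score(freq=4.0, idf=5.888745, field_norm=0.046875)

coord = 2 / 4  # only 2 of the 4 query clauses matched this document
print(round((term_clause + freq_clause) * coord, 8))  # 0.11018313, as reported
```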
  2. Chen, Z.; Meng, X.; Fowler, R.H.; Zhu, B.: Real-time adaptive feature and document learning for Web search (2001) 0.05
    0.05183916 = product of:
      0.10367832 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 5209) [ClassicSimilarity], result of:
              0.023542227 = score(doc=5209,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 5209, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5209)
          0.25 = coord(1/4)
        0.09779277 = weight(_text_:term in 5209) [ClassicSimilarity], result of:
          0.09779277 = score(doc=5209,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.44646066 = fieldWeight in 5209, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
      0.5 = coord(2/4)
    
    Abstract
    Chen et al. report on the design of FEATURES, a Web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous searcher activity either at the server or client location, or updating indexes after search completion, FEATURES allows for index and user characterization files to be updated during query modification on retrieval from a general purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The top ten weighted terms are presented to the user for a relevant/non-relevant choice, which is used to modify the term weights. Documents are chosen if their summed term weights are greater than some threshold. A user evaluation of the top ten ranked documents as non-relevant will decrease these term weights, and a positive judgement will increase them. A new ordering of the retrieved set will generate new display lists of terms and documents. Precision is improved in a test on AltaVista searches.
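    The feedback loop described here is easy to sketch. A hedged illustration, assuming a fixed +/- step for judged terms and an arbitrary acceptance threshold (neither value is specified in the abstract):
```python
from collections import defaultdict

# Hedged sketch of the feedback loop described above; the update rule
# (+/- a fixed STEP) and THRESHOLD are illustrative assumptions, not
# the paper's actual formulas.
STEP, THRESHOLD = 0.1, 1.0

def update_weights(weights, judgments):
    """Raise weights of terms judged relevant, lower those judged non-relevant."""
    for term, relevant in judgments.items():
        weights[term] += STEP if relevant else -STEP
    return weights

def rescore(docs, weights):
    """Keep documents whose summed term weights clear the threshold."""
    scored = {d: sum(weights[t] for t in terms) for d, terms in docs.items()}
    return sorted((d for d, s in scored.items() if s > THRESHOLD),
                  key=lambda d: scored[d], reverse=True)

weights = defaultdict(float, {"search": 0.9, "engine": 0.8, "web": 0.4})
docs = {"d1": ["search", "engine"], "d2": ["web"], "d3": ["search", "web"]}
weights = update_weights(weights, {"web": False, "engine": True})
print(rescore(docs, weights))  # ['d1', 'd3']: d2 falls below the threshold
```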
  3. Ross, N.C.M.; Wolfram, D.: End user searching on the Internet : an analysis of term pair topics submitted to the Excite search engine (2000) 0.05
    0.051439807 = product of:
      0.10287961 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 4998) [ClassicSimilarity], result of:
              0.028250674 = score(doc=4998,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 4998, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4998)
          0.25 = coord(1/4)
        0.09581695 = weight(_text_:term in 4998) [ClassicSimilarity], result of:
          0.09581695 = score(doc=4998,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.4374403 = fieldWeight in 4998, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=4998)
      0.5 = coord(2/4)
    
    Abstract
    Queries submitted to the Excite search engine were analyzed for subject content based on the co-occurrence of terms within multi-term queries. More than 1,000 of the most frequently co-occurring term pairs were categorized into one or more of 30 developed subject areas. Subject area frequencies and their co-occurrences with one another were tallied and analyzed using hierarchical cluster analysis and multidimensional scaling. The cluster analyses revealed several anticipated and a few unanticipated groupings of subjects, resulting in several well-defined high-level clusters of broad subject areas. Multidimensional scaling of subject co-occurrences revealed similar relationships among the different subject categories. Applications that arise from a better understanding of the topics users search and their relationships are discussed.
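    The first step of such an analysis, counting co-occurring term pairs in multi-term queries, can be sketched as follows (the query sample is invented; the paper's categorization into subject areas and the clustering come afterwards):
```python
from collections import Counter
from itertools import combinations

# Count unordered term pairs across multi-term queries (invented sample).
queries = ["britney spears", "web search engine", "search engine",
           "web search", "britney spears pictures"]

pair_counts = Counter()
for q in queries:
    terms = sorted(set(q.split()))
    pair_counts.update(combinations(terms, 2))  # each unordered pair once per query

for pair, n in pair_counts.most_common(3):
    print(pair, n)
# three pairs occur twice each: ('engine', 'search'), ('search', 'web'),
# ('britney', 'spears'); all remaining pairs occur once
```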
  4. Williamson, N.J.: Knowledge structures and the Internet : progress and prospects (2006) 0.05
    0.05065283 = product of:
      0.10130566 = sum of:
        0.079044946 = weight(_text_:term in 238) [ClassicSimilarity], result of:
          0.079044946 = score(doc=238,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=238)
        0.022260714 = product of:
          0.04452143 = sum of:
            0.04452143 = weight(_text_:22 in 238) [ClassicSimilarity], result of:
              0.04452143 = score(doc=238,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.2708308 = fieldWeight in 238, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=238)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper analyses the development of the knowledge structures provided as aids to users in searching the Internet. Specific focus is given to web directories, thesauri, and gateways and portals. The paper assumes that users need to be able to access information in two ways: to locate information on a subject directly in response to a search term, and to browse so as to familiarize themselves with a domain or to refine a request. Emphasis is given to the browsing aspect. Background and development are addressed, structures are analyzed, problems are identified, and future directions are discussed.
    Date
    27.12.2008 15:56:22
  5. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.04
    0.043642364 = product of:
      0.08728473 = sum of:
        0.00823978 = product of:
          0.03295912 = sum of:
            0.03295912 = weight(_text_:based in 601) [ClassicSimilarity], result of:
              0.03295912 = score(doc=601,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23302436 = fieldWeight in 601, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=601)
          0.25 = coord(1/4)
        0.079044946 = weight(_text_:term in 601) [ClassicSimilarity], result of:
          0.079044946 = score(doc=601,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=601)
      0.5 = coord(2/4)
    
    Abstract
    In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate with experiments the effectiveness of our approach.
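    A hedged sketch of the underlying idea: represent each query by the terms of the documents users clicked for it, then cluster the resulting vectors. The vocabulary, click data, and the single k-means-style assignment step are illustrative assumptions, not the authors' exact model:
```python
import numpy as np

# Query vectors built from the terms of clicked documents (invented data).
vocab = ["engine", "ranking", "recipe", "cooking", "search"]
clicked_doc_terms = {                     # query -> terms of its clicked docs
    "web search":  ["search", "engine", "ranking"],
    "google":      ["search", "engine"],
    "pasta":       ["recipe", "cooking"],
    "easy dinner": ["recipe", "cooking", "cooking"],
}

def tf_vector(terms):
    v = np.array([terms.count(w) for w in vocab], dtype=float)
    return v / np.linalg.norm(v)          # unit length, so dot = cosine

X = np.stack([tf_vector(t) for t in clicked_doc_terms.values()])

# One k-means-style assignment from fixed seeds is enough to show the grouping.
centroids = X[[0, 2]]
labels = np.argmax(X @ centroids.T, axis=1).tolist()
print(dict(zip(clicked_doc_terms, labels)))
# {'web search': 0, 'google': 0, 'pasta': 1, 'easy dinner': 1}
```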
  6. Stacey, Alison; Stacey, Adrian: Effective information retrieval from the Internet : an advanced user's guide (2004) 0.04
    0.04147133 = product of:
      0.08294266 = sum of:
        0.0047084456 = product of:
          0.018833783 = sum of:
            0.018833783 = weight(_text_:based in 4497) [ClassicSimilarity], result of:
              0.018833783 = score(doc=4497,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.13315678 = fieldWeight in 4497, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4497)
          0.25 = coord(1/4)
        0.07823421 = weight(_text_:term in 4497) [ClassicSimilarity], result of:
          0.07823421 = score(doc=4497,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.35716853 = fieldWeight in 4497, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=4497)
      0.5 = coord(2/4)
    
    Content
    Key Features - Importantly, the book enables readers to develop strategies which will continue to be useful despite the rapidly evolving state of the Internet and Internet technologies - it is not about technological 'tricks'. - Enables readers to be aware of and compensate for bias and errors which are ubiquitous on the Internet. - Provides contemporary information on the deficiencies in the web skills of novice users, as well as practical techniques for teaching such users.
    The Authors: Dr Alison Stacey works at the Learning Resource Centre, Cambridge Regional College. Dr Adrian Stacey, formerly based at Cambridge University, is a software programmer.
    Readership: The book is aimed at a wide range of librarians and other information professionals who need to retrieve information from the Internet efficiently, to evaluate their confidence in the information they retrieve, and/or to train others to use the Internet. It is primarily aimed at intermediate to advanced users of the Internet.
    Contents: Fundamentals of information retrieval from the Internet - why learn web searching technique; types of information requests; patterns for information retrieval; leveraging the technology. Search term choice: pinpointing information on the web - why choose queries carefully; making search terms work together; how to pick search terms; finding the 'unfindable'. Bias on the Internet - importance of bias; sources of bias; user-generated bias: selecting information with which you already agree; assessing and compensating for bias; case studies. Query reformulation and longer-term strategies - how to interact with your search engine; foraging for information; long-term information retrieval: using the Internet to find trends; automating searches: how to make your machine do your work. Assessing the quality of results - how to assess and ensure quality. The novice user and teaching internet skills - novice users and their problems with the web; case study: research in a college library; interpreting 'second-hand' web information.
  7. Gorbunov, A.L.: Relevance of Web documents : ghosts consensus method (2002) 0.04
    0.037407737 = product of:
      0.074815474 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 1005) [ClassicSimilarity], result of:
              0.028250674 = score(doc=1005,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 1005, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1005)
          0.25 = coord(1/4)
        0.06775281 = weight(_text_:term in 1005) [ClassicSimilarity], result of:
          0.06775281 = score(doc=1005,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.309317 = fieldWeight in 1005, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=1005)
      0.5 = coord(2/4)
    
    Abstract
    The dominant method currently used to improve the quality of Internet search systems is often called "digital democracy." Such an approach implies the utilization of the majority opinion of Internet users to determine the most relevant documents: for example, citation index usage for sorting of search results (google.com), or an enrichment of a query with terms that are asked frequently in relation to the query's theme. "Digital democracy" is an effective instrument in many cases, but it has an unavoidable shortcoming, which is a matter of principle: the average intellectual and cultural level of Internet users is very low; everyone knows what kind of information is dominant in Internet query statistics. Therefore, when one searches the Internet by means of "digital democracy" systems, one gets answers that reflect an underlying assumption that the user's mind potential is very low, and that his cultural interests are not demanding. Thus, it is more correct to use the term "digital ochlocracy" to refer to Internet search systems with "digital democracy." Based on the well-known mathematical mechanism of linear programming, we propose a method to solve the indicated problem.
  8. Zhang, J.; Dimitroff, A.: ¬The impact of webpage content characteristics on webpage visibility in search engine results : part I (2005) 0.04
    0.035971332 = product of:
      0.14388533 = sum of:
        0.14388533 = weight(_text_:frequency in 1032) [ClassicSimilarity], result of:
          0.14388533 = score(doc=1032,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.5204964 = fieldWeight in 1032, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0625 = fieldNorm(doc=1032)
      0.25 = coord(1/4)
    
    Abstract
    Content characteristics of a webpage include factors such as keyword position in a webpage, keyword duplication, layout, and their combination. These factors may impact webpage visibility in a search engine. Four hypotheses are presented relating to the impact of selected content characteristics on webpage visibility in search engine results lists. Webpage visibility can be improved by increasing the frequency of keywords in the title, in the full-text and in both the title and full-text.
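    The content characteristics named here are straightforward to measure. A minimal sketch counting keyword occurrences in the title and the full text (the example page is invented, not one of the paper's measurements):
```python
# Count keyword occurrences in the two fields the abstract names as
# visibility factors: title and full text (invented example page).
def keyword_profile(title: str, body: str, keyword: str) -> dict:
    kw = keyword.lower()
    return {
        "in_title": title.lower().split().count(kw),
        "in_body": body.lower().split().count(kw),
    }

page = {
    "title": "Search engine visibility for search marketers",
    "body": "Search engine results reward pages that use the search term prominently.",
}
print(keyword_profile(page["title"], page["body"], "search"))
# {'in_title': 2, 'in_body': 2}
```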
  9. Hoeber, O.; Yang, X.D.: Evaluating WordBars in exploratory Web search scenarios (2008) 0.03
    0.032392055 = product of:
      0.06478411 = sum of:
        0.008323434 = product of:
          0.033293735 = sum of:
            0.033293735 = weight(_text_:based in 2046) [ClassicSimilarity], result of:
              0.033293735 = score(doc=2046,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23539014 = fieldWeight in 2046, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2046)
          0.25 = coord(1/4)
        0.056460675 = weight(_text_:term in 2046) [ClassicSimilarity], result of:
          0.056460675 = score(doc=2046,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.25776416 = fieldWeight in 2046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2046)
      0.5 = coord(2/4)
    
    Abstract
    Web searchers commonly have difficulties crafting queries to fulfill their information needs; even after they are able to craft a query, they often find it challenging to evaluate the results of their Web searches. Sources of these problems include the lack of support for constructing and refining queries, and the static nature of the list-based representations of Web search results. WordBars has been developed to assist users in their Web search and exploration tasks. This system provides a visual representation of the frequencies of the terms found in the first 100 document surrogates returned from an initial query, in the form of a histogram. Exploration of the search results is supported through term selection in the histogram, resulting in a re-sorting of the search results based on the use of the selected terms in the document surrogates. Terms from the histogram can be easily added or removed from the query, generating a new set of search results. Examples illustrate how WordBars can provide valuable support for query refinement and search results exploration, both when vague and specific initial queries are provided. User evaluations with both expert and intermediate Web searchers illustrate the benefits of the interactive exploration features of WordBars in terms of effectiveness as well as subjective measures. Although differences were found in the demographics of these two user groups, both were able to benefit from the features of WordBars.
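    The two WordBars operations described above, building a term histogram over the top document surrogates and re-sorting results by the user's selected terms, can be sketched as follows (surrogates invented for illustration):
```python
from collections import Counter

# Term-frequency histogram over the top surrogates, then a re-sort of the
# result list by occurrences of the user-selected terms (invented data).
surrogates = [
    "java tutorial for beginners",
    "java island travel guide",
    "learn java programming tutorial",
]

histogram = Counter(w for s in surrogates for w in s.split())
print(histogram.most_common(2))         # [('java', 3), ('tutorial', 2)]

selected = {"tutorial", "programming"}  # terms the user clicked in the histogram
resorted = sorted(surrogates,
                  key=lambda s: sum(s.split().count(t) for t in selected),
                  reverse=True)
print(resorted[0])                      # 'learn java programming tutorial'
```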
  10. Kwok, S.H.; Yang, C.S.: Searching the Peer-to-Peer Networks : the community and their queries (2004) 0.03
    0.031794466 = product of:
      0.12717786 = sum of:
        0.12717786 = weight(_text_:frequency in 2390) [ClassicSimilarity], result of:
          0.12717786 = score(doc=2390,freq=4.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.46005818 = fieldWeight in 2390, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2390)
      0.25 = coord(1/4)
    
    Abstract
    Peer-to-Peer (P2P) networks provide a new distributed computing paradigm on the Internet for file sharing. The decentralized nature of P2P networks fosters cooperative and non-cooperative behaviors in sharing resources. Searching is a major component of P2P file sharing. Several studies have been reported on the nature of queries of World Wide Web (WWW) search engines, but studies on queries of P2P networks have not yet been reported. In this report, we present our study of the Gnutella network, a decentralized and unstructured P2P network. We found that the majority of Gnutella users are located in the United States. Most queries are repeated. This may be because the hosts of the target files connect to or disconnect from the network at any time, so clients resubmit their queries. Queries are also forwarded from peer to peer. Findings are compared with the data from two other studies of Web queries. The length of queries in the Gnutella network is longer than those reported in the studies of WWW search engines. Queries with the highest frequency are mostly related to the names of movies, songs, artists, singers, and directors. Terms with the highest frequency are related to file formats, entertainment, and sexuality. This study is important for the future design of applications, architecture, and services of P2P networks.
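    The frequency analysis reported here amounts to counting repeated queries in the log. A minimal sketch with invented log lines:
```python
from collections import Counter

# Count repeated queries in a P2P query log (invented lines; real Gnutella
# traces carry much more metadata).
log = ["madonna music", "matrix", "madonna music", "matrix", "matrix",
       "star wars", "madonna music"]

counts = Counter(log)
total = sum(counts.values())
for query, n in counts.most_common(2):
    print(f"{query!r}: {n} ({n / total:.0%} of all queries)")
# 'madonna music': 3 (43% of all queries)
# 'matrix': 3 (43% of all queries)
```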
  11. Hupfer, M.E.; Detlor, B.: Gender and Web information seeking : a self-concept orientation model (2006) 0.03
    0.031794466 = product of:
      0.12717786 = sum of:
        0.12717786 = weight(_text_:frequency in 5119) [ClassicSimilarity], result of:
          0.12717786 = score(doc=5119,freq=4.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.46005818 = fieldWeight in 5119, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5119)
      0.25 = coord(1/4)
    
    Abstract
    Adapting the consumer behavior selectivity model to the Web environment, this paper's key contribution is the introduction of a self-concept orientation model of Web information seeking. This model, which addresses gender, effort, and information content factors, questions the commonly assumed equivalence of sex and gender by specifying the measurement of gender-related self-concept traits known as self- and other-orientation. Regression analyses identified associations between self-orientation, other-orientation, and self-reported search frequencies for content with identical subject domain (e.g., medical information, government information) and differing relevance (i.e., important to the individual personally versus important to someone close to him or her). Self- and other-orientation interacted such that when individuals were highly self-oriented, their frequency of search for both self- and other-relevant information depended on their level of other-orientation. Specifically, high-self/high-other individuals, with a comprehensive processing strategy, searched most often, whereas high-self/low-other respondents, with an effort minimization strategy, reported the lowest search frequencies. This interaction pattern was even more pronounced for other-relevant information seeking. We found no sex differences in search frequency for either self-relevant or other-relevant information.
  12. Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.03
    0.031786613 = product of:
      0.06357323 = sum of:
        0.00823978 = product of:
          0.03295912 = sum of:
            0.03295912 = weight(_text_:based in 6098) [ClassicSimilarity], result of:
              0.03295912 = score(doc=6098,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23302436 = fieldWeight in 6098, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6098)
          0.25 = coord(1/4)
        0.055333447 = product of:
          0.11066689 = sum of:
            0.11066689 = weight(_text_:assessment in 6098) [ClassicSimilarity], result of:
              0.11066689 = score(doc=6098,freq=2.0), product of:
                0.25917634 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.04694356 = queryNorm
                0.4269946 = fieldWeight in 6098, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6098)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.
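    The framework itself is a human cost-benefit judgment rather than code, but the technological baseline it presupposes can be illustrated: a crawler that checks robots.txt and rate-limits itself to keep the cost to site owners low. The delay value and URLs are illustrative assumptions:
```python
import time
import urllib.robotparser
from urllib.request import urlopen

# A polite crawler sketch: honor robots.txt and rate-limit requests.
# DELAY_SECONDS and the example site are illustrative assumptions.
DELAY_SECONDS = 5.0

def polite_fetch(urls, robots_url, user_agent="research-crawler"):
    rp = urllib.robotparser.RobotFileParser(robots_url)
    rp.read()                                  # fetch and parse robots.txt
    for url in urls:
        if not rp.can_fetch(user_agent, url):  # respect the site's wishes
            print("skipping (disallowed):", url)
            continue
        with urlopen(url) as resp:
            print(url, len(resp.read()), "bytes")
        time.sleep(DELAY_SECONDS)              # keep the cost to the host low

# polite_fetch(["https://example.org/"], "https://example.org/robots.txt")
```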
  13. Lempel, R.; Moran, S.: SALSA: the stochastic approach for link-structure analysis (2001) 0.03
    0.031173116 = product of:
      0.06234623 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 10) [ClassicSimilarity], result of:
              0.023542227 = score(doc=10,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 10, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=10)
          0.25 = coord(1/4)
        0.056460675 = weight(_text_:term in 10) [ClassicSimilarity], result of:
          0.056460675 = score(doc=10,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.25776416 = fieldWeight in 10, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=10)
      0.5 = coord(2/4)
    
    Abstract
    Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents match the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he devised an algorithm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same meta-algorithm. We then prove that SALSA is equivalent to a weighted in-degree analysis of the link structure of WWW subgraphs, making it computationally more efficient than the Mutual Reinforcement approach. We compare the results of applying SALSA to the results derived through Kleinberg's approach. These comparisons reveal a topological phenomenon called the TKC effect which, in certain cases, prevents the Mutual Reinforcement approach from identifying meaningful authorities.
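    A compact sketch of the SALSA authority chain on a toy link graph (invented here): row- and column-normalize the adjacency matrix and take the stationary distribution of the authority Markov chain A = Wc^T Wr. Consistent with the equivalence stated above, the result is proportional to in-degree:
```python
import numpy as np

# SALSA authority chain on a toy graph; per the equivalence stated in the
# abstract, the stationary distribution is proportional to in-degree.
W = np.array([[0, 0, 1, 1],    # page 0 links to pages 2 and 3
              [0, 0, 1, 1],    # page 1 links to pages 2 and 3
              [0, 0, 0, 1],    # page 2 links to page 3
              [0, 0, 0, 0]], dtype=float)

def normalize(M, axis):
    s = M.sum(axis=axis, keepdims=True)
    return np.divide(M, s, out=np.zeros_like(M), where=s > 0)

Wr, Wc = normalize(W, axis=1), normalize(W, axis=0)
A = Wc.T @ Wr                     # authority-to-authority transition matrix

pi = W.sum(axis=0) / W.sum()      # start from normalized in-degree
for _ in range(50):               # power iteration to the fixed point
    pi = pi @ A

print(np.round(pi, 3))            # [0. 0. 0.4 0.6], i.e. in-degrees 0:0:2:3
```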
  14. Brophy, J.; Bawden, D.: Is Google enough? : Comparison of an internet search engine with academic library resources (2005) 0.03
    0.03089039 = product of:
      0.06178078 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 648) [ClassicSimilarity], result of:
              0.023542227 = score(doc=648,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 648, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=648)
          0.25 = coord(1/4)
        0.055895224 = product of:
          0.11179045 = sum of:
            0.11179045 = weight(_text_:assessment in 648) [ClassicSimilarity], result of:
              0.11179045 = score(doc=648,freq=4.0), product of:
                0.25917634 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.04694356 = queryNorm
                0.43132967 = fieldWeight in 648, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=648)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - The purpose of the study was to compare an internet search engine, Google, with appropriate library databases and systems, in order to assess the relative value, strengths and weaknesses of the two sorts of system. Design/methodology/approach - A case study approach was used, with detailed analysis and failure checking of results. The performance of the two systems was assessed in terms of coverage, unique records, precision, and quality and accessibility of results. A novel form of relevance assessment, based on the work of Saracevic and others, was devised. Findings - Google is superior for coverage and accessibility. Library systems are superior for quality of results. Precision is similar for both systems. Good coverage requires use of both, as both have many unique items. Improving the skills of the searcher is likely to give better results from the library systems, but not from Google. Research limitations/implications - Only four case studies were included. These were limited to the kind of queries likely to be searched by university students. Library resources were limited to those in two UK academic libraries. Only the basic Google web search functionality was used, and only the top ten records examined. Practical implications - The results offer guidance for those providing support and training for use of these retrieval systems, and also provide evidence for debates on the "Google phenomenon". Originality/value - This is one of the few studies which provide evidence on the relative performance of internet search engines and library databases, and the only one to conduct such in-depth case studies. The method for the assessment of relevance is novel.
  15. Can, F.; Nuray, R.; Sevdik, A.B.: Automatic performance evaluation of Web search engines (2004) 0.03
    0.028708395 = product of:
      0.05741679 = sum of:
        0.009988121 = product of:
          0.039952483 = sum of:
            0.039952483 = weight(_text_:based in 2570) [ClassicSimilarity], result of:
              0.039952483 = score(doc=2570,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28246817 = fieldWeight in 2570, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2570)
          0.25 = coord(1/4)
        0.047428668 = product of:
          0.094857335 = sum of:
            0.094857335 = weight(_text_:assessment in 2570) [ClassicSimilarity], result of:
              0.094857335 = score(doc=2570,freq=2.0), product of:
                0.25917634 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.04694356 = queryNorm
                0.36599535 = fieldWeight in 2570, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2570)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, both for business enterprises and people it is important to know the most effective Web search engines, since such search engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used for several practical purposes. In this study we introduce an automatic Web search engine evaluation method as an efficient and effective assessment tool of such systems. The experiments based on eight Web search engines, 25 queries, and binary user relevance judgments show that our method provides results consistent with human-based evaluations. It is shown that the observed consistencies are statistically significant. This indicates that the new method can be successfully used in the evaluation of Web search engines.
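    The abstract does not spell out the mechanics, so the following is only a hedged sketch of one common automatic-evaluation idea: treat documents returned by a majority of the pooled engines as pseudo-relevant, then score each engine against that pool (engine results invented):
```python
from collections import Counter

# Pseudo-relevance pooling across engines (an illustrative stand-in for the
# paper's method, which the abstract does not detail; results are invented).
results = {
    "engineA": ["d1", "d2", "d3"],
    "engineB": ["d1", "d3", "d4"],
    "engineC": ["d1", "d5", "d2"],
}

votes = Counter(d for docs in results.values() for d in set(docs))
pseudo_relevant = {d for d, v in votes.items() if v >= 2}  # majority pool

for engine, docs in results.items():
    p = sum(d in pseudo_relevant for d in docs) / len(docs)
    print(engine, f"pseudo-precision@{len(docs)} = {p:.2f}")
# engineA 1.00, engineB 0.67, engineC 0.67
```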
  16. Noruzi, A.: Google Scholar : the new generation of citation indexes (2005) 0.03
    0.0269785 = product of:
      0.107914 = sum of:
        0.107914 = weight(_text_:frequency in 5061) [ClassicSimilarity], result of:
          0.107914 = score(doc=5061,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.39037234 = fieldWeight in 5061, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=5061)
      0.25 = coord(1/4)
    
    Abstract
    Google Scholar (http://scholar.google.com) provides a new method of locating potentially relevant articles on a given subject by identifying subsequent articles that cite a previously published article. An important feature of Google Scholar is that researchers can use it to trace interconnections among authors citing articles on the same topic and to determine the frequency with which others cite a specific article, as it has a "cited by" feature. This study begins with an overview of how to use Google Scholar for citation analysis and identifies advanced search techniques not well documented by Google Scholar. This study also compares the citation counts provided by Web of Science and Google Scholar for articles in the field of "Webometrics." It makes several suggestions for improving Google Scholar. Finally, it concludes that Google Scholar provides a free alternative or complement to other citation indexes.
  17. Summann, F.; Lossau, N.: Search engine technology and digital libraries : moving from theory to practice (2004) 0.03
    0.026661905 = product of:
      0.05332381 = sum of:
        0.008155267 = product of:
          0.032621067 = sum of:
            0.032621067 = weight(_text_:based in 1196) [ClassicSimilarity], result of:
              0.032621067 = score(doc=1196,freq=6.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.2306343 = fieldWeight in 1196, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1196)
          0.25 = coord(1/4)
        0.04516854 = weight(_text_:term in 1196) [ClassicSimilarity], result of:
          0.04516854 = score(doc=1196,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.20621133 = fieldWeight in 1196, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=1196)
      0.5 = coord(2/4)
    
    Abstract
    This article describes the journey from the conception of and vision for a modern search-engine-based search environment to its technological realisation. In doing so, it takes up the thread of an earlier article on this subject, this time from a technical viewpoint. As well as presenting the conceptual considerations of the initial stages, this article will principally elucidate the technological aspects of this journey. The starting point for the deliberations about development of an academic search engine was the experience we gained through the generally successful project "Digital Library NRW", in which, from 1998 to 2000, with Bielefeld University Library in overall charge, we designed a system model for an Internet-based library portal with an improved academic search environment at its core. At the heart of this system was a metasearch with an availability function, to which we added a user interface integrating all relevant source material for study and research. The deficiencies of this approach were felt soon after the system was launched in June 2001. There were problems with the stability and performance of the database retrieval system, with the integration of full-text documents and Internet pages, and with acceptance by users, because users are increasingly performing searches themselves using search engines rather than going to the library for help. Since a long list of problems is also encountered when using commercial search engines for academic purposes (in particular the retrieval of academic information and long-term availability), the idea was born for a search engine configured specifically for academic use. We also hoped that with one single access point founded on improved search engine technology, we could access the heterogeneous academic resources of subject-based bibliographic databases, catalogues, electronic newspapers, document servers, and academic web pages.
  18. Bladow, N.; Dorey, C.; Frederickson, L.; Grover, P.; Knudtson, Y.; Krishnamurthy, S.; Lazarou, V.: What's the Buzz about? : An empirical examination of Search on Yahoo! (2005) 0.02
    0.023954237 = product of:
      0.09581695 = sum of:
        0.09581695 = weight(_text_:term in 3072) [ClassicSimilarity], result of:
          0.09581695 = score(doc=3072,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.4374403 = fieldWeight in 3072, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=3072)
      0.25 = coord(1/4)
    
    Abstract
    We present an analysis of the Yahoo Buzz Index over a period of 45 weeks. Our key findings are that: (1) It is most common for a search term to show up on the index for one week, followed by two weeks, three weeks, etc. Only two terms persist for all 45 weeks studied: Britney Spears and Jennifer Lopez. Search term longevity follows a power-law distribution or a winner-take-all structure; (2) Most search terms focus on entertainment. Search terms related to serious topics are found less often. The Buzz Index does not necessarily follow the "news cycle"; and (3) We provide two ways to determine the "star power" of various search terms - one that emphasizes staying power on the Index and another that emphasizes rank. In general, the two methods lead to dramatically different results. Britney Spears performs well in both methods. We conclude that the data available on the Index is symptomatic of a celebrity-crazed, entertainment-centered culture.
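    The longevity analysis can be sketched by counting weeks-on-index per term and tabulating how many terms share each longevity; on the real data this distribution is heavy-tailed, with one-week terms dominating. The weekly snapshots below are invented:
```python
from collections import Counter

# Weeks-on-index per term, then the longevity distribution (invented
# snapshots; the real study covers 45 weekly snapshots of the Buzz Index).
weeks = [
    {"britney spears", "jennifer lopez", "madonna"},
    {"britney spears", "jennifer lopez", "matrix"},
    {"britney spears", "jennifer lopez"},
]

longevity = Counter(term for week in weeks for term in week)
distribution = Counter(longevity.values())   # weeks-on-index -> number of terms

print(longevity.most_common(2))   # the two persistent terms, each with count 3
print(sorted(distribution.items()))  # [(1, 2), (3, 2)]: most terms are short-lived
```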
  19. Jansen, B.J.; Pooch , U.: ¬A review of Web searching studies and a framework for future research (2001) 0.02
    0.019761236 = product of:
      0.079044946 = sum of:
        0.079044946 = weight(_text_:term in 5186) [ClassicSimilarity], result of:
          0.079044946 = score(doc=5186,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 5186, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5186)
      0.25 = coord(1/4)
    
    Abstract
    Jansen and Pooch review three major search engine studies and compare them to three traditional search system studies and three OPAC search studies, to determine whether user search characteristics differ. The Web search engine studies indicate that most searchers submit about two queries of two search terms per session, use no Boolean operators, and look only at the top ten items returned, while reporting the location of relevant information. In traditional search systems we find seven to 16 queries of six to nine terms, with about ten documents viewed per session. The OPAC studies indicate two to five queries per session of two or fewer terms, with Boolean searching used in about 1% of queries and fewer than 50 documents viewed.
  20. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.02
    0.019558553 = product of:
      0.07823421 = sum of:
        0.07823421 = weight(_text_:term in 7) [ClassicSimilarity], result of:
          0.07823421 = score(doc=7,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.35716853 = fieldWeight in 7, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
      0.25 = coord(1/4)
    
    Content
    Contents: Introduction (Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format)
    Document File Preparation (Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures)
    Vector Space Models (Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations)
    Matrix Decompositions (QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques)
    Query Management (Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries)
    Ranking and Relevance Feedback (Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback)
    Searching by Link Structure (HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary)
    User Interface Considerations (General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations)
    Further Reading
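    The book's core pipeline, a term-by-document matrix, a low-rank SVD approximation, and query matching in the reduced space, can be sketched in a few lines (the tiny corpus and k=2 are chosen for illustration):
```python
import numpy as np

# Term-by-document matrix, rank-k SVD, and cosine query matching in the
# reduced space (tiny invented corpus; k=2 for illustration).
terms = ["search", "engine", "ranking", "recipe", "cooking"]
#              d0 d1 d2 d3
A = np.array([[1, 1, 0, 0],   # search
              [1, 1, 0, 0],   # engine
              [0, 1, 0, 0],   # ranking
              [0, 0, 1, 1],   # recipe
              [0, 0, 1, 1]],  # cooking
             dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_k = Vt[:k].T                     # documents as rows in the k-dim space

q = np.array([1, 1, 0, 0, 0], float)  # query "search engine" in term space
q_k = (q @ U[:, :k]) / s[:k]          # fold the query into the same space

cos = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k))
print(np.round(cos, 2))               # ~[1. 1. 0. 0.]: d0/d1 match, d2/d3 do not
```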

Types

  • a 101
  • el 11
  • m 6
  • x 1