Search (154 results, page 1 of 8)

Ozmutlu, H.C.; Cavdur, F.: Application of automatic topic identification on Excite Web search engine data logs (2005) 0.03

```
0.032942846 = product of:
  0.06588569 = sum of:
    0.036211025 = weight(_text_:data in 1047) [ClassicSimilarity], result of:
      0.036211025 = score(doc=1047,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 1047, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1047)
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 1047) [ClassicSimilarity], result of:
          0.05934933 = score(doc=1047,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 1047, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1047)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```

Source: Information processing and management. 41(2005) no.5, S.1243-1262
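
The score breakdown for this record can be cross-checked by hand. The sketch below assumes the usual ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values printed in the explain tree. The function names are illustrative only, and `queryNorm` and `fieldNorm` are copied from the listing rather than derived. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as printed in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the explain tree for doc 1047.
MAX_DOCS = 44218
QUERY_NORM = 0.046827413
FIELD_NORM = 0.0546875

data_score = term_score(2.0, 5088, MAX_DOCS, QUERY_NORM, FIELD_NORM)
processing_score = term_score(2.0, 2097, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "processing" clause sits in a nested query with coord(1/2);
# the outer query matched 2 of 4 clauses, hence coord(2/4).
total = (data_score + processing_score * 0.5) * 0.5
print(f"data={data_score:.6f} processing={processing_score:.6f} total={total:.6f}")
```

The same function reproduces the other records by swapping in their `freq` and `fieldNorm` values; only the coord factors depend on how many query clauses matched.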

Loia, V.; Pedrycz, W.; Senatore, S.; Sessa, M.I.: Web navigation support by means of proximity-driven assistant agents (2006) 0.03
```
0.026219916 = product of:
  0.05243983 = sum of:
    0.03657866 = weight(_text_:data in 5283) [ClassicSimilarity], result of:
      0.03657866 = score(doc=5283,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 5283, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5283)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 5283) [ClassicSimilarity], result of:
          0.03172234 = score(doc=5283,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 5283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5283)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The explosive growth of the Web and the consequent exigency of the Web personalization domain have gained a key position in the direction of customization of the Web information to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. This work presents an agent-based framework designed to help a user in achieving personalized navigation, by recommending related documents according to the user's responses in similar-pages searching mode. Our agent-based approach is grounded in the integration of different techniques and methodologies into a unique platform featuring user profiling, fuzzy multisets, proximity-oriented fuzzy clustering, and knowledge-based discovery technologies. Each of these methodologies serves to solve one facet of the general problem (discovering documents relevant to the user by searching the Web) and is treated by specialized agents that ultimately achieve the final functionality through cooperation and task distribution.

Date

22. 7.2006 16:59:13

Fischer, T.; Neuroth, H.: SSG-FI - special subject gateways to high quality Internet resources for scientific users (2000) 0.03

```
0.025035713 = product of:
  0.050071426 = sum of:
    0.031038022 = weight(_text_:data in 4873) [ClassicSimilarity], result of:
      0.031038022 = score(doc=4873,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 4873, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=4873)
    0.019033402 = product of:
      0.038066804 = sum of:
        0.038066804 = weight(_text_:22 in 4873) [ClassicSimilarity], result of:
          0.038066804 = score(doc=4873,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.23214069 = fieldWeight in 4873, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4873)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```

Abstract: Project SSG-FI at SUB Göttingen provides special subject gateways to international high quality Internet resources for scientific users. Internet sites are selected by subject specialists and described using an extension of qualified Dublin Core metadata. A basic evaluation is added. These descriptions are freely available and can be searched and browsed. There are now subject gateways for 3 subject areas: earth sciences (GeoGuide); mathematics (MathGuide); and Anglo-American culture (split into HistoryGuide and AnglistikGuide). Together they receive about 3,300 'hard' requests per day, thus reaching over 1 million requests per year. The project SSG-FI behind these guides is open to collaboration. Institutions and private persons wishing to contribute can notify the SSG-FI team or send full data sets. Regular contributors can request registration with the project to access the database via the Internet and create and edit records.
Date: 22. 6.2002 19:40:42

Spink, A.; Ozmutlu, H.C.: Characteristics of question format web queries : an exploratory study (2002) 0.02
```
0.023530604 = product of:
  0.04706121 = sum of:
    0.02586502 = weight(_text_:data in 3910) [ClassicSimilarity], result of:
      0.02586502 = score(doc=3910,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.17468026 = fieldWeight in 3910, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3910)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 3910) [ClassicSimilarity], result of:
          0.042392377 = score(doc=3910,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 3910, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3910)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Web queries in question format are becoming a common element of a user's interaction with Web search engines. Web search services such as Ask Jeeves - a publicly accessible question and answer (Q&A) search engine - request users to enter question format queries. This paper provides results from a study examining queries in question format submitted to two different Web search engines - Ask Jeeves that explicitly encourages queries in question format and the Excite search service that does not explicitly encourage queries in question format. We identify the characteristics of queries in question format in two different data sets: (1) 30,000 Ask Jeeves queries and (2) 15,575 Excite queries, including the nature, length, and structure of queries in question format. Findings include: (1) 50% of Ask Jeeves queries and less than 1% of Excite queries were in question format, (2) most users entered only one query in question format with little query reformulation, (3) limited range of formats for queries in question format - mainly "where", "what", or "how" questions, (4) most common question query format was "Where can I find ..." for general information on a topic, and (5) non-question queries may be in request format. Overall, four types of user Web queries were identified: keyword, Boolean, question, and request. These findings provide an initial mapping of the structure and content of queries in question and request format. Implications for Web search services are discussed.

Source

Information processing and management. 38(2002) no.4, S.453-471

Bar-Ilan, J.: Comparing rankings of search results on the Web (2005) 0.02
```
0.023530604 = product of:
  0.04706121 = sum of:
    0.02586502 = weight(_text_:data in 1068) [ClassicSimilarity], result of:
      0.02586502 = score(doc=1068,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.17468026 = fieldWeight in 1068, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1068)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 1068) [ClassicSimilarity], result of:
          0.042392377 = score(doc=1068,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 1068, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1068)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The Web has become an information source for professional data gathering. Because of the vast amounts of information on almost all topics, one cannot systematically go over the whole set of results, and therefore must rely on the ordering of the results by the search engine. It is well known that search engines on the Web have low overlap in terms of coverage. In this study we measure how similar are the rankings of search engines on the overlapping results. We compare rankings of results for identical queries retrieved from several search engines. The method is based only on the set of URLs that appear in the answer sets of the engines being compared. For comparing the similarity of rankings of two search engines, the Spearman correlation coefficient is computed. When comparing more than two sets Kendall's W is used. These are well-known measures and the statistical significance of the results can be computed. The methods are demonstrated on a set of 15 queries that were submitted to four large Web search engines. The findings indicate that the large public search engines on the Web employ considerably different ranking algorithms.
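
The two agreement measures named in this abstract can be sketched in a few lines. The code assumes each engine assigns ranks 1..n to the same n overlapping URLs with no ties, and it uses the textbook formulas rather than the study's own implementation. The function names and the three example rankings are made up for illustration.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_w(rankings):
    """Kendall's coefficient of concordance W for m tie-free rankings of n items."""
    m = len(rankings)
    n = len(rankings[0])
    # Total rank each item received across all rankings.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2.0
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * s / (m * m * (n ** 3 - n))


# Hypothetical ranks given by three engines to the same four URLs.
engine_a = [1, 2, 3, 4]
engine_b = [2, 1, 3, 4]
engine_c = [1, 3, 2, 4]

print(spearman_rho(engine_a, engine_b))
print(kendall_w([engine_a, engine_b, engine_c]))
```

Both measures equal 1 for identical rankings. Spearman's coefficient drops to -1 for fully reversed rankings, while W ranges from 0 (no agreement) to 1.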

Source

Information processing and management. 41(2005) no.6, S.1511-1519

Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.02

```
0.022336382 = product of:
  0.044672765 = sum of:
    0.020692015 = weight(_text_:data in 7) [ClassicSimilarity], result of:
      0.020692015 = score(doc=7,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.1397442 = fieldWeight in 7, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.03125 = fieldNorm(doc=7)
    0.02398075 = product of:
      0.0479615 = sum of:
        0.0479615 = weight(_text_:processing in 7) [ClassicSimilarity], result of:
          0.0479615 = score(doc=7,freq=4.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.2530092 = fieldWeight in 7, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```

Abstract: The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.
LCSH: Text processing (Computer science)
Subject: Text processing (Computer science)

Su, L.T.: A comprehensive and systematic model of user evaluation of Web search engines : II. An evaluation by undergraduates (2003) 0.02
```
0.020863095 = product of:
  0.04172619 = sum of:
    0.02586502 = weight(_text_:data in 2117) [ClassicSimilarity], result of:
      0.02586502 = score(doc=2117,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.17468026 = fieldWeight in 2117, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2117)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 2117) [ClassicSimilarity], result of:
          0.03172234 = score(doc=2117,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 2117, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2117)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

This paper presents an application of the model described in Part I to the evaluation of Web search engines by undergraduates. The study observed how 36 undergraduates used four major search engines to find information for their own individual problems and how they evaluated these engines based on actual interaction with the search engines. User evaluation was based on 16 performance measures representing five evaluation criteria: relevance, efficiency, utility, user satisfaction, and connectivity. Non-performance (user-related) measures were also applied. Each participant searched his/her own topic on all four engines and provided satisfaction ratings for system features and interaction and reasons for satisfaction. Each also made relevance judgements of retrieved items in relation to his/her own information need and participated in post-search interviews to provide reactions to the search results and overall performance. The study found significant differences in precision PR1, relative recall, user satisfaction with output display, time saving, value of search results, and overall performance among the four engines and also significant engine by discipline interactions on all these measures. In addition, the study found significant differences in user satisfaction with response time among four engines, and significant engine by discipline interaction in user satisfaction with search interface. None of the four search engines dominated in every aspect of the multidimensional evaluation. Content analysis of verbal data identified a number of user criteria and users' evaluative comments based on these criteria. Results from both quantitative analysis and content analysis provide insight for system design and development, and useful feedback on strengths and weaknesses of search engines for system improvement.

Date

24. 1.2004 18:27:22

Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for link-based ranking algorithms (2006) 0.02
```
0.020863095 = product of:
  0.04172619 = sum of:
    0.02586502 = weight(_text_:data in 2565) [ClassicSimilarity], result of:
      0.02586502 = score(doc=2565,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.17468026 = fieldWeight in 2565, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2565)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
          0.03172234 = score(doc=2565,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
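
The idea in this abstract, importance as a damped sum over paths of increasing length, can be sketched as below. This is not the paper's code: the three-page graph, the function names, and the particular normalised linear decay are assumptions made for illustration, and the sketch requires every page to have at least one out-link. With exponential damping (1 - alpha) * alpha^t, the sum should agree with PageRank computed by ordinary power iteration with uniform teleportation.

```python
def step(vec, links):
    """Spread each page's mass evenly over its out-links (one walk step)."""
    out = [0.0] * len(vec)
    for src, targets in links.items():
        share = vec[src] / len(targets)
        for dst in targets:
            out[dst] += share
    return out


def damped_rank(links, damping, max_len):
    """Sum of damping(t) times the t-step walk distribution from a uniform start."""
    n = len(links)
    walk = [1.0 / n] * n
    rank = [0.0] * n
    for t in range(max_len + 1):
        weight = damping(t)
        rank = [r + weight * w for r, w in zip(rank, walk)]
        walk = step(walk, links)
    return rank


def pagerank(links, alpha, iterations=200):
    """Plain power iteration with uniform teleportation, for comparison."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [alpha * x + (1.0 - alpha) / n for x in step(rank, links)]
    return rank


# Hypothetical graph: pages 0..2, each with at least one out-link.
links = {0: [1, 2], 1: [2], 2: [0]}
ALPHA = 0.85
PATH_LIMIT = 10  # assumed cut-off for the linear decay


def exponential(t):
    return (1.0 - ALPHA) * ALPHA ** t


def linear(t):
    # Assumed normalised linear decay: weights fall to zero at PATH_LIMIT and sum to 1.
    if t >= PATH_LIMIT:
        return 0.0
    return 2.0 * (PATH_LIMIT - t) / (PATH_LIMIT * (PATH_LIMIT + 1))


exp_rank = damped_rank(links, exponential, max_len=200)
lin_rank = damped_rank(links, linear, max_len=PATH_LIMIT)
print(exp_rank)
print(pagerank(links, ALPHA))
print(lin_rank)
```

Swapping the `damping` argument is the only change needed to try another decay, which mirrors the separation the abstract describes between importance propagation and how that propagation decays over path length.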

Date

16. 1.2016 10:22:28

Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.02
```
0.01996638 = product of:
  0.03993276 = sum of:
    0.021947198 = weight(_text_:data in 6) [ClassicSimilarity], result of:
      0.021947198 = score(doc=6,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.14822112 = fieldWeight in 6, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0234375 = fieldNorm(doc=6)
    0.017985564 = product of:
      0.035971127 = sum of:
        0.035971127 = weight(_text_:processing in 6) [ClassicSimilarity], result of:
          0.035971127 = score(doc=6,freq=4.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.1897569 = fieldWeight in 6, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; accessible and informal style; and complete and self-contained section for mathematics review.

Content

Contents:
Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval
Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing
Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence
Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix
Chapter 5. Parameters in the PageRank Model: 5.1 The alpha Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E
Chapter 6. The Sensitivity of PageRank: 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs
Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System
Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods
Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions
Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity
Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow
Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion
Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study
Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation

Drabenstott, K.M.: Web search strategies (2000) 0.02
```
0.016690476 = product of:
  0.03338095 = sum of:
    0.020692015 = weight(_text_:data in 1188) [ClassicSimilarity], result of:
      0.020692015 = score(doc=1188,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.1397442 = fieldWeight in 1188, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.03125 = fieldNorm(doc=1188)
    0.012688936 = product of:
      0.025377871 = sum of:
        0.025377871 = weight(_text_:22 in 1188) [ClassicSimilarity], result of:
          0.025377871 = score(doc=1188,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.15476047 = fieldWeight in 1188, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1188)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Surfing the World Wide Web used to be cool, dude, real cool. But things have gotten hot - so hot that finding something useful on the Web is no longer cool. It is suffocating Web searchers in the smoke and debris of mountain-sized lists of hits, decisions about which search engines they should use, whether they will get lost in the dizzying maze of a subject directory, use the right syntax for the search engine at hand, enter keywords that are likely to retrieve hits on the topics they have in mind, or enlist a browser that has sufficient functionality to display the most promising hits. When it comes to Web searching, in a few short years we have gone from the cool image of surfing the Web into the frying pan of searching the Web. We can turn down the heat by rethinking what Web searchers are doing and introduce some order into the chaos. Web search strategies that are tool-based - oriented to specific Web searching tools such as search engines, subject directories, and meta search engines - have been widely promoted, and these strategies are just not working. It is time to dissect what Web searching tools expect from searchers and adjust our search strategies to these new tools. This discussion offers Web searchers help in the form of search strategies that are based on strategies that librarians have been using for a long time to search commercial information retrieval systems like Dialog, NEXIS, Wilsonline, FirstSearch, and Data-Star.

Date

22. 9.1997 19:16:05

Hölzig, C.: Google spürt Grippewellen auf : Die neue Anwendung ist bisher auf die USA beschränkt (2008) 0.02

```
0.016690476 = product of:
  0.03338095 = sum of:
    0.020692015 = weight(_text_:data in 2403) [ClassicSimilarity], result of:
      0.020692015 = score(doc=2403,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.1397442 = fieldWeight in 2403, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.03125 = fieldNorm(doc=2403)
    0.012688936 = product of:
      0.025377871 = sum of:
        0.025377871 = weight(_text_:22 in 2403) [ClassicSimilarity], result of:
          0.025377871 = score(doc=2403,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.15476047 = fieldWeight in 2403, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2403)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```

Date: 3. 5.1997 8:44:22
Theme: Data Mining

Herrera-Viedma, E.; Pasi, G.: Soft approaches to information retrieval and information access on the Web : an introduction to the special topic section (2006) 0.01
```
0.014822943 = product of:
  0.059291773 = sum of:
    0.059291773 = sum of:
      0.033913903 = weight(_text_:processing in 5285) [ClassicSimilarity], result of:
        0.033913903 = score(doc=5285,freq=2.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.17890452 = fieldWeight in 5285, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.03125 = fieldNorm(doc=5285)
      0.025377871 = weight(_text_:22 in 5285) [ClassicSimilarity], result of:
        0.025377871 = score(doc=5285,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.15476047 = fieldWeight in 5285, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=5285)
  0.25 = coord(1/4)
```
Abstract

The World Wide Web is a popular and interactive medium used to collect, disseminate, and access an increasingly huge amount of information, which constitutes the mainstay of the so-called information and knowledge society. Because of its spectacular growth, related to both Web resources (pages, sites, and services) and number of users, the Web is nowadays the main information repository and provides some automatic systems for locating, accessing, and retrieving information. However, an open and crucial question remains: how to provide fast and effective retrieval of the information relevant to specific users' needs. This is a very hard and complex task, since it is pervaded with subjectivity, vagueness, and uncertainty. The expression soft computing refers to techniques and methodologies that work synergistically with the aim of providing flexible information processing tolerant of imprecision, vagueness, partial truth, and approximation. So, soft computing represents a good candidate to design effective systems for information access and retrieval on the Web. One of the most representative tools of soft computing is fuzzy set theory. This special topic section collects research articles witnessing some recent advances in improving the processes of information access and retrieval on the Web by using soft computing tools, and in particular, by using fuzzy sets and/or integrating them with other soft computing tools. In this introductory article, we first review the problem of Web retrieval and the concept of soft computing technology. We then briefly introduce the articles in this section and conclude by highlighting some future research directions that could benefit from the use of soft computing technologies.

Date

22. 7.2006 16:59:33
Jepsen, E.T.; Seiden, P.; Ingwersen, P.; Björneborn, L.; Borlund, P.: Characteristics of scientific Web publications : preliminary data gathering and analysis (2004) 0.01
```
0.01293251 = product of:
  0.05173004 = sum of:
    0.05173004 = weight(_text_:data in 3091) [ClassicSimilarity], result of:
      0.05173004 = score(doc=3091,freq=8.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.34936053 = fieldWeight in 3091, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3091)
  0.25 = coord(1/4)
```
Abstract

Because of the increasing presence of scientific publications on the Web, combined with the existing difficulties in easily verifying and retrieving these publications, research on techniques and methods for retrieval of scientific Web publications is called for. In this article, we report on the initial steps taken toward the construction of a test collection of scientific Web publications within the subject domain of plant biology. The steps reported are those of data gathering and data analysis aiming at identifying characteristics of scientific Web publications. The data used in this article were generated based on specifically selected domain topics that are searched for in three publicly accessible search engines (Google, AllTheWeb, and AltaVista). A sample of the retrieved hits was analyzed with regard to how various publication attributes correlated with the scientific quality of the content and whether this information could be employed to harvest, filter, and rank Web publications. The attributes analyzed were inlinks, outlinks, bibliographic references, file format, language, search engine overlap, structural position (according to site structure), and the occurrence of various types of metadata. As could be expected, the ranked output differs between the three search engines. Apparently, this is caused by differences in ranking algorithms rather than the databases themselves. In fact, because scientific Web content in this subject domain receives few inlinks, both AltaVista and AllTheWeb retrieved a higher degree of accessible scientific content than Google. Because of the search engine cutoffs of accessible URLs, the feasibility of using search engine output for Web content analysis is also discussed.
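
The score trees in this listing can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical, and the `queryNorm`, `fieldNorm`, and `coord` values are copied from the listing instead of being derived.

```python
import math

# Values copied from the listing (not derived here).
QUERY_NORM = 0.046827413
MAX_DOCS = 44218


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_score(freq, doc_freq, field_norm, coords, query_norm=QUERY_NORM):
    """Recompute one single-term score tree.

    coords is the list of coord(...) factors that appear in the tree,
    for example [1 / 4] or [1 / 2, 1 / 4].
    """
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq)
    query_weight = idf * query_norm        # "queryWeight" line
    field_weight = tf * idf * field_norm   # "fieldWeight" line
    score = query_weight * field_weight    # "weight(_text_:...)" line
    for factor in coords:
        score *= factor
    return score


# Record doc=3091: term "data", freq=8.0, docFreq=5088, coord(1/4).
score_3091 = explain_score(8.0, 5088, 0.0390625, [1 / 4])

# Record doc=6274: term "processing", freq=2.0, docFreq=2097,
# coord(1/2) inside coord(1/4).
score_6274 = explain_score(2.0, 2097, 0.09375, [1 / 2, 1 / 4])

print(round(score_3091, 6), round(score_6274, 6))
```

Up to single-precision rounding, the two results should agree with the 0.01293251 and 0.012717713 shown for those records. Lucene computes these values in 32-bit floats, so small differences in the last digits are expected.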
Rotenberg, B.: Towards personalised search : EU Data Protection Law and its implications for media pluralism (2007) 0.01
```
0.01293251 = product of:
  0.05173004 = sum of:
    0.05173004 = weight(_text_:data in 373) [ClassicSimilarity], result of:
      0.05173004 = score(doc=373,freq=8.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.34936053 = fieldWeight in 373, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=373)
  0.25 = coord(1/4)
```
Abstract

On 17 March 2006, Google, the major web search engine, won a partial victory in its legal battle against the United States government. In an attempt to enforce the 1998 Child Online Protection Act, the US government had asked it to provide one million web addresses or URLs that are accessible through Google, as well as 5,000 users' search queries. In Gonzales v. Google, a California District Court ruled that Google did not have to comply fully with the US government's request: Google did not need to disclose a single search query, and was not required to provide more than 50,000 web addresses. However, it soon appeared that Microsoft, AOL and Yahoo! had handed over the information requested by the government in that instance, and in the course of this case all search engines publicly admitted massive user data collection. It turns out that all major search engines are able to provide a list of IP addresses with the actual search queries made, and vice versa. Scarcely five months later, AOL's search engine logs were the subject of yet another round of data protection concerns. There was a public outcry when it became known that it had published 21 million search queries, that is, the search histories of more than 650,000 of its users. While AOL's intentions were laudable (namely supporting research in user behaviour), it emerged that making the link between the unique ID supplied for a given user and the real-world identity was not all that difficult. Both these cases are milestones in raising awareness of the importance of data protection in relation to web search.
Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.01
```
0.01293251 = product of:
  0.05173004 = sum of:
    0.05173004 = weight(_text_:data in 607) [ClassicSimilarity], result of:
      0.05173004 = score(doc=607,freq=8.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.34936053 = fieldWeight in 607, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
  0.25 = coord(1/4)
```
Abstract

Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.

Theme

Data Mining
Wang, P.; Berry, M.W.; Yang, Y.: Mining longitudinal Web queries : trends and patterns (2003) 0.01
```
0.012802532 = product of:
  0.051210128 = sum of:
    0.051210128 = weight(_text_:data in 6561) [ClassicSimilarity], result of:
      0.051210128 = score(doc=6561,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.34584928 = fieldWeight in 6561, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6561)
  0.25 = coord(1/4)
```
Abstract

This project analyzed 541,920 user queries submitted to and executed in an academic Website during a four-year period (May 1997 to May 2001) using a relational database. The purpose of the study is three-fold: (1) to understand Web users' query behavior; (2) to identify problems encountered by these Web users; (3) to develop appropriate techniques for optimization of query analysis and mining. The linguistic analyses focus on query structures, lexicon, and word associations using statistical measures such as Zipf distribution and mutual information. A data model with finest granularity is used for data storage and iterative analyses. Patterns and trends of querying behavior are identified and compared with previous studies.

Dempsey, B.J.: Design and empirical evaluation of search software for legal professionals on the WWW (2000) 0.01

0.012717713 = product of:
  0.05087085 = sum of:
    0.05087085 = product of:
      0.1017417 = sum of:
        0.1017417 = weight(_text_:processing in 6274) [ClassicSimilarity], result of:
          0.1017417 = score(doc=6274,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.53671354 = fieldWeight in 6274, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.09375 = fieldNorm(doc=6274)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Information processing and management. 36(2000) no.2, S.253-273

Stock, M.; Stock, W.G.: Recherchieren im Internet (2004) 0.01

0.012688936 = product of:
  0.050755743 = sum of:
    0.050755743 = product of:
      0.101511486 = sum of:
        0.101511486 = weight(_text_:22 in 4686) [ClassicSimilarity], result of:
          0.101511486 = score(doc=4686,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.61904186 = fieldWeight in 4686, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=4686)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 27.11.2005 18:04:22

MacLeod, R.: Promoting a subject gateway : a case study from EEVL (Edinburgh Engineering Virtual Library) (2000) 0.01

0.011215541 = product of:
  0.044862162 = sum of:
    0.044862162 = product of:
      0.089724325 = sum of:
        0.089724325 = weight(_text_:22 in 4872) [ClassicSimilarity], result of:
          0.089724325 = score(doc=4872,freq=4.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.54716086 = fieldWeight in 4872, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4872)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 6.2002 19:40:22

Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.01

0.011102819 = product of:
  0.044411276 = sum of:
    0.044411276 = product of:
      0.08882255 = sum of:
        0.08882255 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
          0.08882255 = score(doc=3445,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.5416616 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 25. 8.2005 17:42:22
