Search (305 results, page 1 of 16)

Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.04
```
0.03760769 = product of:
  0.06267948 = sum of:
    0.004989027 = product of:
      0.024945134 = sum of:
        0.024945134 = weight(_text_:problem in 6) [ClassicSimilarity], result of:
          0.024945134 = score(doc=6,freq=2.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.14068612 = fieldWeight in 6, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
      0.2 = coord(1/5)
    0.042174287 = weight(_text_:philosophy in 6) [ClassicSimilarity], result of:
      0.042174287 = score(doc=6,freq=2.0), product of:
        0.23055021 = queryWeight, product of:
          5.5189433 = idf(docFreq=481, maxDocs=44218)
          0.04177434 = queryNorm
        0.18292886 = fieldWeight in 6, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.5189433 = idf(docFreq=481, maxDocs=44218)
          0.0234375 = fieldNorm(doc=6)
    0.01551616 = weight(_text_:of in 6) [ClassicSimilarity], result of:
      0.01551616 = score(doc=6,freq=42.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.23752278 = fieldWeight in 6, product of:
          6.4807405 = tf(freq=42.0), with freq of:
            42.0 = termFreq=42.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0234375 = fieldNorm(doc=6)
  0.6 = coord(3/5)
```
Abstract

Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; accessible and informal style; and complete and self-contained section for mathematics review.

Content

Inhalt: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The a Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank; 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
Wills, R.S.: Google's PageRank : the math behind the search engine (2006) 0.04
```
0.03605866 = product of:
  0.060097765 = sum of:
    0.0066520358 = product of:
      0.033260178 = sum of:
        0.033260178 = weight(_text_:problem in 5954) [ClassicSimilarity], result of:
          0.033260178 = score(doc=5954,freq=2.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.1875815 = fieldWeight in 5954, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
      0.2 = coord(1/5)
    0.017484732 = weight(_text_:of in 5954) [ClassicSimilarity], result of:
      0.017484732 = score(doc=5954,freq=30.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.26765788 = fieldWeight in 5954, product of:
          5.477226 = tf(freq=30.0), with freq of:
            30.0 = termFreq=30.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.03125 = fieldNorm(doc=5954)
    0.035961 = product of:
      0.071922 = sum of:
        0.071922 = weight(_text_:mind in 5954) [ClassicSimilarity], result of:
          0.071922 = score(doc=5954,freq=2.0), product of:
            0.2607373 = queryWeight, product of:
              6.241566 = idf(docFreq=233, maxDocs=44218)
              0.04177434 = queryNorm
            0.27584085 = fieldWeight in 5954, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.241566 = idf(docFreq=233, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
      0.5 = coord(1/2)
  0.6 = coord(3/5)
```
Abstract

Approximately 91 million American adults use the Internet on a typical day The number-one Internet activity is reading and writing e-mail. Search engine use is next in line and continues to increase in popularity. In fact, survey findings indicate that nearly 60 million American adults use search engines on a given day. Even though there are many Internet search engines, Google, Yahoo!, and MSN receive over 81% of all search requests. Despite claims that the quality of search provided by Yahoo! and MSN now equals that of Google, Google continues to thrive as the search engine of choice, receiving over 46% of all search requests, nearly double the volume of Yahoo! and over four times that of MSN. I use Google's search engine on a daily basis and rarely request information from other search engines. One day, I decided to visit the homepages of Google. Yahoo!, and MSN to compare the quality of search results. Coffee was on my mind that day, so I entered the simple query "coffee" in the search box at each homepage. Table 1 shows the top ten (unsponsored) results returned by each search engine. Although ordered differently, two webpages, www.peets.com and www.coffeegeek.com, appear in all three top ten lists. In addition, each pairing of top ten lists has two additional results in common. Depending on the information I hoped to obtain about coffee by using the search engines, I could argue that any one of the three returned better results: however, I was not looking for a particular webpage, so all three listings of search results seemed of equal quality. Thus, I plan to continue using Google. My decision is indicative of the problem Yahoo!, MSN, and other search engine companies face in the quest to obtain a larger percentage of Internet search volume. Search engine users are loyal to one or a few search engines and are generally happy with search results. Thus, as long as Google continues to provide results deemed high in quality, Google likely will remain the top search engine. But what set Google apart from its competitors in the first place? The answer is PageRank. In this article I explain this simple mathematical algorithm that revolutionized Web search.

Ponte, J.M.: Language models for relevance feedback (2000) 0.03

0.026994044 = product of:
  0.06748511 = sum of:
    0.013543615 = weight(_text_:of in 35) [ClassicSimilarity], result of:
      0.013543615 = score(doc=35,freq=8.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.20732689 = fieldWeight in 35, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=35)
    0.053941496 = product of:
      0.10788299 = sum of:
        0.10788299 = weight(_text_:mind in 35) [ClassicSimilarity], result of:
          0.10788299 = score(doc=35,freq=2.0), product of:
            0.2607373 = queryWeight, product of:
              6.241566 = idf(docFreq=233, maxDocs=44218)
              0.04177434 = queryNorm
            0.41376126 = fieldWeight in 35, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.241566 = idf(docFreq=233, maxDocs=44218)
              0.046875 = fieldNorm(doc=35)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: The language modeling approach to Information Retrieval (IR) is a conceptually simple model of IR originally developed by Ponte and Croft (1998). In this approach, the query is treated as a random event and documents are ranked according to the likelihood that the query would be generated via a language model estimated for each document. The intuition behind this approach is that users have a prototypical document in mind and will choose query terms accordingly. The intuitive appeal of this method is that inferences about the semantic content of documents do not need to be made resulting in a conceptually simple model. In this paper, techniques for relevance feedback and routing are derived from the language modeling approach in a straightforward manner and their effectiveness is demonstrated empirically. These experiments demonstrate further proof of concept for the language modeling approach to retrieval

Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.02

0.02305558 = product of:
  0.038425963 = sum of:
    0.008315044 = product of:
      0.041575223 = sum of:
        0.041575223 = weight(_text_:problem in 664) [ClassicSimilarity], result of:
          0.041575223 = score(doc=664,freq=2.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.23447686 = fieldWeight in 664, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=664)
      0.2 = coord(1/5)
    0.015961302 = weight(_text_:of in 664) [ClassicSimilarity], result of:
      0.015961302 = score(doc=664,freq=16.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.24433708 = fieldWeight in 664, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=664)
    0.0141496165 = product of:
      0.028299233 = sum of:
        0.028299233 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
          0.028299233 = score(doc=664,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.19345059 = fieldWeight in 664, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=664)
      0.5 = coord(1/2)
  0.6 = coord(3/5)

Abstract: A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and on the other hand, non topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the Bibrank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.
Date: 22. 3.2013 19:34:49
Source: Journal of the American Society for Information Science and Technology. 64(2013) no.3, S.500-515

Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02

0.022167925 = product of:
  0.05541981 = sum of:
    0.015800884 = weight(_text_:of in 2134) [ClassicSimilarity], result of:
      0.015800884 = score(doc=2134,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.24188137 = fieldWeight in 2134, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.109375 = fieldNorm(doc=2134)
    0.039618924 = product of:
      0.07923785 = sum of:
        0.07923785 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
          0.07923785 = score(doc=2134,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.5416616 = fieldWeight in 2134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=2134)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Date: 30. 3.2001 13:32:22

Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02

0.022167925 = product of:
  0.05541981 = sum of:
    0.015800884 = weight(_text_:of in 3445) [ClassicSimilarity], result of:
      0.015800884 = score(doc=3445,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.24188137 = fieldWeight in 3445, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.109375 = fieldNorm(doc=3445)
    0.039618924 = product of:
      0.07923785 = sum of:
        0.07923785 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
          0.07923785 = score(doc=3445,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.5416616 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Date: 25. 8.2005 17:42:22

Sachs, W.M.: ¬An approach to associative retrieval through the theory of fuzzy sets (1976) 0.02

0.017710352 = product of:
  0.04427588 = sum of:
    0.016630089 = product of:
      0.08315045 = sum of:
        0.08315045 = weight(_text_:problem in 7) [ClassicSimilarity], result of:
          0.08315045 = score(doc=7,freq=2.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.46895373 = fieldWeight in 7, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.078125 = fieldNorm(doc=7)
      0.2 = coord(1/5)
    0.02764579 = weight(_text_:of in 7) [ClassicSimilarity], result of:
      0.02764579 = score(doc=7,freq=12.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.42320424 = fieldWeight in 7, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.078125 = fieldNorm(doc=7)
  0.4 = coord(2/5)

Abstract: The theory of fuzzy sets is used to provide a rogorous formulation of the problem of associative retrieval. This formulation suggests the idea of using fuzzy clustering to organize data for retrieval
Source: Journal of the American Society for information science. 27(1976), S.85-87

Losada, D.E.; Barreiro, A.: Emebedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.02

0.017131606 = product of:
  0.042829014 = sum of:
    0.02018963 = weight(_text_:of in 1422) [ClassicSimilarity], result of:
      0.02018963 = score(doc=1422,freq=10.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.3090647 = fieldWeight in 1422, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=1422)
    0.022639386 = product of:
      0.045278773 = sum of:
        0.045278773 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
          0.045278773 = score(doc=1422,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.30952093 = fieldWeight in 1422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations along with the use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented and experiments against test collections are presented.
Date: 22. 3.2003 19:27:23
Source: Journal of the American Society for Information Science and technology. 54(2003) no.4, S.285-301

Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.02
```
0.016558254 = product of:
  0.041395634 = sum of:
    0.024416098 = weight(_text_:of in 6973) [ClassicSimilarity], result of:
      0.024416098 = score(doc=6973,freq=26.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.37376386 = fieldWeight in 6973, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=6973)
    0.016979538 = product of:
      0.033959076 = sum of:
        0.033959076 = weight(_text_:22 in 6973) [ClassicSimilarity], result of:
          0.033959076 = score(doc=6973,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.23214069 = fieldWeight in 6973, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=6973)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

Proposes that signature files be used as a viable alternative to other indexing strategies such as inverted files for searching through large volumes of text. Demonstrates through simulation, that search times can be further reduced by enhancing the basic signature file concept using deterministic partitioning algorithms which eliminate the need for an exhaustive search of the entire signature file. Reports research to evaluate the performance of some deterministic partitioning algorithms in a non simulated environment using 276 MB of raw newspaper text (taken from the Wall Street Journal) and real user queries. Presents a selection of results to illustrate trends and highlight important aspects of the performance of these methods under realistic rather than simulated operating conditions. As a result of the research reported here certain aspects of this approach to signature files are shown to be found wanting and require improvement. Suggests lines of future research on the partitioning of signature files

Source

Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.02
```
0.015775634 = product of:
  0.039439082 = sum of:
    0.022459546 = weight(_text_:of in 2717) [ClassicSimilarity], result of:
      0.022459546 = score(doc=2717,freq=22.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.34381276 = fieldWeight in 2717, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=2717)
    0.016979538 = product of:
      0.033959076 = sum of:
        0.033959076 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
          0.033959076 = score(doc=2717,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.23214069 = fieldWeight in 2717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

Previous work an search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking are clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.

Date

11. 9.2004 17:32:22

Source

Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.02
```
0.015490747 = product of:
  0.038726866 = sum of:
    0.018716287 = weight(_text_:of in 2591) [ClassicSimilarity], result of:
      0.018716287 = score(doc=2591,freq=22.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.28651062 = fieldWeight in 2591, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2591)
    0.02001058 = product of:
      0.04002116 = sum of:
        0.04002116 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
          0.04002116 = score(doc=2591,freq=4.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.27358043 = fieldWeight in 2591, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2591)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

Purpose In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgment through human assessors is infeasible. Due to the large amount of documents that requires judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues. Design/methodology/approach This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human efforts. These methods overcome problems with large amounts of documents for judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: number of occurrences of each document per topic from all the system runs; and document rankings to generate the alternate methods. Findings The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgment without human assessors while contributing to information retrieval evaluation.

Date

20. 1.2015 18:30:22
18. 9.2018 18:22:56

Source

Aslib journal of information management. 67(2015) no.6, S.700-714

Faloutsos, C.: Signature files (1992) 0.02

0.015311283 = product of:
  0.038278207 = sum of:
    0.01563882 = weight(_text_:of in 3499) [ClassicSimilarity], result of:
      0.01563882 = score(doc=3499,freq=6.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.23940048 = fieldWeight in 3499, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=3499)
    0.022639386 = product of:
      0.045278773 = sum of:
        0.045278773 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
          0.045278773 = score(doc=3499,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.30952093 = fieldWeight in 3499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Presents a survey and discussion on signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods, it provides a classification of the signature methods that have appeared in the literature, it describes the main representatives of each class, together with the relative advantages and drawbacks, and it gives a list of applications as well as commercial or university prototypes that use the signature approach
Date: 7. 5.1999 15:22:48

Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.02

0.015311283 = product of:
  0.038278207 = sum of:
    0.01563882 = weight(_text_:of in 1431) [ClassicSimilarity], result of:
      0.01563882 = score(doc=1431,freq=6.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.23940048 = fieldWeight in 1431, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=1431)
    0.022639386 = product of:
      0.045278773 = sum of:
        0.045278773 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
          0.045278773 = score(doc=1431,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.30952093 = fieldWeight in 1431, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
Date: 22. 8.2014 17:05:18
Source: Journal of the Association for Information Science and Technology. 65(2014) no.9, S.1939-1943

Burgin, R.: ¬The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.01
```
0.014966805 = product of:
  0.037417013 = sum of:
    0.023267398 = weight(_text_:of in 3365) [ClassicSimilarity], result of:
      0.023267398 = score(doc=3365,freq=34.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.35617945 = fieldWeight in 3365, product of:
          5.8309517 = tf(freq=34.0), with freq of:
            34.0 = termFreq=34.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3365)
    0.0141496165 = product of:
      0.028299233 = sum of:
        0.028299233 = weight(_text_:22 in 3365) [ClassicSimilarity], result of:
          0.028299233 = score(doc=3365,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.19345059 = fieldWeight in 3365, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3365)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

The retrieval effectiveness of 5 hierarchical clustering methods (single link, complete link, group average, Ward's method, and weighted average) is examined as a function of indexing exhaustivity with 4 test collections (CR, Cranfield, Medlars, and Time). Evaluations of retrieval effectiveness, based on 3 measures of optimal retrieval performance, confirm earlier findings that the performance of a retrieval system based on single link clustering varies as a function of indexing exhaustivity but fail ti find similar patterns for other clustering methods. The data also confirm earlier findings regarding the poor performance of single link clustering is a retrieval environment. The poor performance of single link clustering appears to derive from that method's tendency to produce a small number of large, ill defined document clusters. By contrast, the data examined here found the retrieval performance of the other clustering methods to be general comparable. The data presented also provides an opportunity to examine the theoretical limits of cluster based retrieval and to compare these theoretical limits to the effectiveness of operational implementations. Performance standards of the 4 document collections examined were found to vary widely, and the effectiveness of operational implementations were found to be in the range defined as unacceptable. Further improvements in search strategies and document representations warrant investigations

Date

22. 2.1996 11:20:06

Source

Journal of the American Society for Information Science. 46(1995) no.8, S.562-572

Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.01

0.014453242 = product of:
  0.036133103 = sum of:
    0.019153563 = weight(_text_:of in 1451) [ClassicSimilarity], result of:
      0.019153563 = score(doc=1451,freq=16.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.2932045 = fieldWeight in 1451, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=1451)
    0.016979538 = product of:
      0.033959076 = sum of:
        0.033959076 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
          0.033959076 = score(doc=1451,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.23214069 = fieldWeight in 1451, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Research an the use of mathematical, logical, and formal methods, has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhancing retrieval effectiveness, but also because it helps clarifying the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
Date: 22. 3.2003 19:27:36
Source: Journal of the American Society for Information Science and technology. 54(2003) no.4, S.281-284

Na, S.-H.; Kang, I.-S.; Roh, J.-E.; Lee, J.-H.: ¬An empirical study of query expansion and cluster-based retrieval in language modeling approach (2007) 0.01

0.014326001 = product of:
  0.035815 = sum of:
    0.01646295 = product of:
      0.08231475 = sum of:
        0.08231475 = weight(_text_:problem in 906) [ClassicSimilarity], result of:
          0.08231475 = score(doc=906,freq=4.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.46424055 = fieldWeight in 906, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0546875 = fieldNorm(doc=906)
      0.2 = coord(1/5)
    0.01935205 = weight(_text_:of in 906) [ClassicSimilarity], result of:
      0.01935205 = score(doc=906,freq=12.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.29624295 = fieldWeight in 906, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=906)
  0.4 = coord(2/5)

Abstract: The term mismatch problem in information retrieval is a critical problem, and several techniques have been developed, such as query expansion, cluster-based retrieval and dimensionality reduction to resolve this issue. Of these techniques, this paper performs an empirical study on query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance by performing experimentations on seven test collections of NTCIR and TREC.

Frants, V.I.; Shapiro, J.: Control and feedback in a documentary information retrieval system (1991) 0.01

0.01416828 = product of:
  0.0354207 = sum of:
    0.0133040715 = product of:
      0.066520356 = sum of:
        0.066520356 = weight(_text_:problem in 416) [ClassicSimilarity], result of:
          0.066520356 = score(doc=416,freq=2.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.375163 = fieldWeight in 416, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0625 = fieldNorm(doc=416)
      0.2 = coord(1/5)
    0.02211663 = weight(_text_:of in 416) [ClassicSimilarity], result of:
      0.02211663 = score(doc=416,freq=12.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.33856338 = fieldWeight in 416, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=416)
  0.4 = coord(2/5)

Abstract: Addresses the problem of control in documentary information retrieval systems is analysed and it is shown why an IR system has to be looked at as an adaptive system. The algorithms of feedback are proposed and it is shown how they depend on the type of the collection of documents: static (no change in the collection between searches) and dynamic (when the change occurs between searches). The proposed algorithms are the basis for the development of the fully automated information retrieval systems
Source: Journal of the American Society for Information Science. 42(1991) no.9, S.623-634

Witschel, H.F.: Global term weights in distributed environments (2008) 0.01

0.013958423 = product of:
  0.034896057 = sum of:
    0.01791652 = weight(_text_:of in 2096) [ClassicSimilarity], result of:
      0.01791652 = score(doc=2096,freq=14.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.2742677 = fieldWeight in 2096, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=2096)
    0.016979538 = product of:
      0.033959076 = sum of:
        0.033959076 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
          0.033959076 = score(doc=2096,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.23214069 = fieldWeight in 2096, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
Date: 1. 8.2008 9:44:22

Maron, M.E.: ¬An historical note on the origins of probabilistic indexing (2008) 0.01

0.01378145 = product of:
  0.034453623 = sum of:
    0.0188148 = product of:
      0.094073996 = sum of:
        0.094073996 = weight(_text_:problem in 2047) [ClassicSimilarity], result of:
          0.094073996 = score(doc=2047,freq=4.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.5305606 = fieldWeight in 2047, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0625 = fieldNorm(doc=2047)
      0.2 = coord(1/5)
    0.01563882 = weight(_text_:of in 2047) [ClassicSimilarity], result of:
      0.01563882 = score(doc=2047,freq=6.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.23940048 = fieldWeight in 2047, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=2047)
  0.4 = coord(2/5)

Abstract: The motivation behind "Probabilistic Indexing" was to replace two-valued thinking about information retrieval with probabilistic notions. This involved a new view of the information retrieval problem - viewing it as problem of inference and prediction, and introducing probabilistically weighted indexes and probabilistically ranked output. These ideas were first formulated and written up in August 1958.

Efthimiadis, E.N.: User choices : a new yardstick for the evaluation of ranking algorithms for interactive query expansion (1995) 0.01
```
0.013479257 = product of:
  0.03369814 = sum of:
    0.019548526 = weight(_text_:of in 5697) [ClassicSimilarity], result of:
      0.019548526 = score(doc=5697,freq=24.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.2992506 = fieldWeight in 5697, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5697)
    0.0141496165 = product of:
      0.028299233 = sum of:
        0.028299233 = weight(_text_:22 in 5697) [ClassicSimilarity], result of:
          0.028299233 = score(doc=5697,freq=2.0), product of:
            0.14628662 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04177434 = queryNorm
            0.19345059 = fieldWeight in 5697, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5697)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

The performance of 8 ranking algorithms was evaluated with respect to their effectiveness in ranking terms for query expansion. The evaluation was conducted within an investigation of interactive query expansion and relevance feedback in a real operational environment. Focuses on the identification of algorithms that most effectively take cognizance of user preferences. user choices (i.e. the terms selected by the searchers for the query expansion search) provided the yardstick for the evaluation of the 8 ranking algorithms. This methodology introduces a user oriented approach in evaluating ranking algorithms for query expansion in contrast to the standard, system oriented approaches. Similarities in the performance of the 8 algorithms and the ways these algorithms rank terms were the main focus of this evaluation. The findings demonstrate that the r-lohi, wpq, enim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion. However, further evaluation of the algorithms in different (e.g. full text) environments is needed before these results can be generalized beyond the context of the present study

Date

22. 2.1996 13:14:10

Search (305 results, page 1 of 16)

Authors

Years

Languages

Types

Themes

Subjects

Classifications