Search (7 results, page 1 of 1)

Dominich, S.: Mathematical foundations of information retrieval (2001) 0.02

0.020160122 = product of:
  0.030240182 = sum of:
    0.013336393 = weight(_text_:on in 1753) [ClassicSimilarity], result of:
      0.013336393 = score(doc=1753,freq=2.0), product of:
        0.109763056 = queryWeight, product of:
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.04990557 = queryNorm
        0.121501654 = fieldWeight in 1753, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1753)
    0.01690379 = product of:
      0.03380758 = sum of:
        0.03380758 = weight(_text_:22 in 1753) [ClassicSimilarity], result of:
          0.03380758 = score(doc=1753,freq=2.0), product of:
            0.1747608 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04990557 = queryNorm
            0.19345059 = fieldWeight in 1753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1753)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: This book offers a comprehensive and consistent mathematical approach to information retrieval (IR) without which no implementation is possible, and sheds an entirely new light upon the structure of IR models. It contains the descriptions of all IR models in a unified formal style and language, along with examples for each, thus offering a comprehensive overview of them. The book also creates mathematical foundations and a consistent mathematical theory (including all mathematical results achieved so far) of IR as a stand-alone mathematical discipline, which thus can be read and taught independently. Also, the book contains all necessary mathematical knowledge on which IR relies, to help the reader avoid searching different sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students and researchers whose work involves information retrieval.
Date: 22. 3.2008 12:26:32

Computational information retrieval (2001) 0.01
```
0.009239726 = product of:
  0.027719175 = sum of:
    0.027719175 = weight(_text_:on in 4167) [ClassicSimilarity], result of:
      0.027719175 = score(doc=4167,freq=6.0), product of:
        0.109763056 = queryWeight, product of:
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.04990557 = queryNorm
        0.25253648 = fieldWeight in 4167, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.046875 = fieldNorm(doc=4167)
  0.33333334 = coord(1/3)
```
Abstract

This volume contains selected papers that focus on the use of linear algebra, computational statistics, and computer science in the development of algorithms and software systems for text retrieval. Experts in information modeling and retrieval share their perspectives on the design of scalable but precise text retrieval systems, revealing many of the challenges and obstacles that mathematical and statistical models must overcome to be viable for automated text processing. This very useful proceedings is an excellent companion for courses in information retrieval, applied linear algebra, and applied statistics. Computational Information Retrieval provides background material on vector space models for text retrieval that applied mathematicians, statisticians, and computer scientists may not be familiar with. For graduate students in these areas, several research questions in information modeling are exposed. In addition, several case studies concerning the efficacy of the popular Latent Semantic Analysis (or Indexing) approach are provided.
Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.01
```
0.007112743 = product of:
  0.021338228 = sum of:
    0.021338228 = weight(_text_:on in 7) [ClassicSimilarity], result of:
      0.021338228 = score(doc=7,freq=8.0), product of:
        0.109763056 = queryWeight, product of:
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.04990557 = queryNorm
        0.19440265 = fieldWeight in 7, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.03125 = fieldNorm(doc=7)
  0.33333334 = coord(1/3)
```
Abstract

The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.

Brenner, E.H.: Beyond Boolean : new approaches in information retrieval; the quest for intuitive online search systems past, present & future (1995) 0.01

0.00622365 = product of:
  0.01867095 = sum of:
    0.01867095 = weight(_text_:on in 2547) [ClassicSimilarity], result of:
      0.01867095 = score(doc=2547,freq=2.0), product of:
        0.109763056 = queryWeight, product of:
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.04990557 = queryNorm
        0.17010231 = fieldWeight in 2547, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2547)
  0.33333334 = coord(1/3)

Abstract: The challenge of effectively bringing specific, relevant information from the global sea of data to our fingertips, has become an increasingly difficult one. Discusses how the online information industry, founded on Boolean search systems, may be evolving to take advantage of other methods, such as 'term weighting', 'relevance ranking' and 'query by example'

Cross-language information retrieval (1998) 0.01
```
0.0054445597 = product of:
  0.016333679 = sum of:
    0.016333679 = weight(_text_:on in 6299) [ClassicSimilarity], result of:
      0.016333679 = score(doc=6299,freq=12.0), product of:
        0.109763056 = queryWeight, product of:
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.04990557 = queryNorm
        0.14880852 = fieldWeight in 6299, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.01953125 = fieldNorm(doc=6299)
  0.33333334 = coord(1/3)
```
Content

Enthält die Beiträge: GREFENSTETTE, G.: The Problem of Cross-Language Information Retrieval; DAVIS, M.W.: On the Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval; BALLESTEROS, L. u. W.B. CROFT: Statistical Methods for Cross-Language Information Retrieval; Distributed Cross-Lingual Information Retrieval; Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing; EVANS, D.A. u.a.: Mapping Vocabularies Using Latent Semantics; PICCHI, E. u. C. PETERS: Cross-Language Information Retrieval: A System for Comparable Corpus Querying; YAMABANA, K. u.a.: A Language Conversion Front-End for Cross-Language Information Retrieval; GACHOT, D.A. u.a.: The Systran NLP Browser: An Application of Machine Translation Technology in Cross-Language Information Retrieval; HULL, D.: A Weighted Boolean Model for Cross-Language Text Retrieval; SHERIDAN, P. u.a. Building a Large Multilingual Test Collection from Comparable News Documents; OARD; D.W. u. B.J. DORR: Evaluating Cross-Language Text Filtering Effectiveness

Footnote

Rez. in: Machine translation review: 1999, no.10, S.26-27 (D. Lewis): "Cross Language Information Retrieval (CLIR) addresses the growing need to access large volumes of data across language boundaries. The typical requirement is for the user to input a free form query, usually a brief description of a topic, into a search or retrieval engine which returns a list, in ranked order, of documents or web pages that are relevant to the topic. The search engine matches the terms in the query to indexed terms, usually keywords previously derived from the target documents. Unlike monolingual information retrieval, CLIR requires query terms in one language to be matched to indexed terms in another. Matching can be done by bilingual dictionary lookup, full machine translation, or by applying statistical methods. A query's success is measured in terms of recall (how many potentially relevant target documents are found) and precision (what proportion of documents found are relevant). Issues in CLIR are how to translate query terms into index terms, how to eliminate alternative translations (e.g. to decide that French 'traitement' in a query means 'treatment' and not 'salary'), and how to rank or weight translation alternatives that are retained (e.g. how to order the French terms 'aventure', 'business', 'affaire', and 'liaison' as relevant translations of English 'affair'). Grefenstette provides a lucid and useful overview of the field and the problems. The volume brings together a number of experiments and projects in CLIR. Mark Davies (New Mexico State University) describes Recuerdo, a Spanish retrieval engine which reduces translation ambiguities by scanning indexes for parallel texts; it also uses either a bilingual dictionary or direct equivalents from a parallel corpus in order to compare results for queries on parallel texts. Lisa Ballesteros and Bruce Croft (University of Massachusetts) use a 'local feedback' technique which automatically enhances a query by adding extra terms to it both before and after translation; such terms can be derived from documents known to be relevant to the query.
Christian Fluhr at al (DIST/SMTI, France) outline the EMIR (European Multilingual Information Retrieval) and ESPRIT projects. They found that using SYSTRAN to machine translate queries and to access material from various multilingual databases produced less relevant results than a method referred to as 'multilingual reformulation' (the mechanics of which are only hinted at). An interesting technique is Latent Semantic Indexing (LSI), described by Michael Littman et al (Brown University) and, most clearly, by David Evans et al (Carnegie Mellon University). LSI involves creating matrices of documents and the terms they contain and 'fitting' related documents into a reduced matrix space. This effectively allows queries to be mapped onto a common semantic representation of the documents. Eugenio Picchi and Carol Peters (Pisa) report on a procedure to create links between translation equivalents in an Italian-English parallel corpus. The links are used to construct parallel linguistic contexts in real-time for any term or combination of terms that is being searched for in either language. Their interest is primarily lexicographic but they plan to apply the same procedure to comparable corpora, i.e. to texts which are not translations of each other but which share the same domain. Kiyoshi Yamabana et al (NEC, Japan) address the issue of how to disambiguate between alternative translations of query terms. Their DMAX (double maximise) method looks at co-occurrence frequencies between both source language words and target language words in order to arrive at the most probable translation. The statistical data for the decision are derived, not from the translation texts but independently from monolingual corpora in each language. An interactive user interface allows the user to influence the selection of terms during the matching process. Denis Gachot et al (SYSTRAN) describe the SYSTRAN NLP browser, a prototype tool which collects parsing information derived from a text or corpus previously translated with SYSTRAN. The user enters queries into the browser in either a structured or free form and receives grammatical and lexical information about the source text and/or its translation.
The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket' which is semantically ranked above 'military ocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to by-pass the time-consuming and expensive process of assembling a standard collection of documents and of user queries against which the performance of an CLIR system is manually assessed, Páriac Sheridan et al (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g. for topic, country and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Ouard and Bonne Dorr (University of Maryland) describe an evaluation methodology which appears to apply LSI techniques in order to filter and rank incoming documents designed for testing CLIR systems. The volume provides the reader an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field."

Series

The Kluwer International series on information retrieval

Lalmas, M.: XML retrieval (2009) 0.00

0.0044454644 = product of:
  0.013336393 = sum of:
    0.013336393 = weight(_text_:on in 4998) [ClassicSimilarity], result of:
      0.013336393 = score(doc=4998,freq=2.0), product of:
        0.109763056 = queryWeight, product of:
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.04990557 = queryNorm
        0.121501654 = fieldWeight in 4998, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4998)
  0.33333334 = coord(1/3)

Series: Synthesis lectures on information concepts, retrieval & services; 7

Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.00
```
0.0037721018 = product of:
  0.011316305 = sum of:
    0.011316305 = weight(_text_:on in 6) [ClassicSimilarity], result of:
      0.011316305 = score(doc=6,freq=4.0), product of:
        0.109763056 = queryWeight, product of:
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.04990557 = queryNorm
        0.10309757 = fieldWeight in 6, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.199415 = idf(docFreq=13325, maxDocs=44218)
          0.0234375 = fieldNorm(doc=6)
  0.33333334 = coord(1/3)
```
Abstract

Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; accessible and informal style; and complete and self-contained section for mathematics review.

Content

Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation

Search (7 results, page 1 of 1)

Authors

Years

Themes

Subjects

Classifications