Search (169 results, page 1 of 9)

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.06

0.059291773 = product of:
  0.23716709 = sum of:
    0.23716709 = sum of:
      0.13565561 = weight(_text_:processing in 402) [ClassicSimilarity], result of:
        0.13565561 = score(doc=402,freq=2.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.7156181 = fieldWeight in 402, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.125 = fieldNorm(doc=402)
      0.101511486 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
        0.101511486 = score(doc=402,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.61904186 = fieldWeight in 402, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.125 = fieldNorm(doc=402)
  0.25 = coord(1/4)

Source: Information processing and management. 22(1986) no.6, S.465-476

Pfeifer, U.; Pennekamp, S.: Incremental processing of vague queries in interactive retrieval systems (1997) 0.04

0.044672765 = product of:
  0.08934553 = sum of:
    0.04138403 = weight(_text_:data in 735) [ClassicSimilarity], result of:
      0.04138403 = score(doc=735,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2794884 = fieldWeight in 735, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=735)
    0.0479615 = product of:
      0.095923 = sum of:
        0.095923 = weight(_text_:processing in 735) [ClassicSimilarity], result of:
          0.095923 = score(doc=735,freq=4.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.5060184 = fieldWeight in 735, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0625 = fieldNorm(doc=735)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The application of information retrieval techniques in interactive environments requires systems capable of effeciently processing vague queries. To reach reasonable response times, new data structures and algorithms have to be developed. In this paper we describe an approach taking advantage of the conditions of interactive usage and special access paths. To have a reference we investigate text queries and compared our algorithms to the well known 'Buckley/Lewit' algorithm. We achieved significant improvements for the response times

Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.04
```
0.04227614 = product of:
  0.08455228 = sum of:
    0.063356094 = weight(_text_:data in 4218) [ClassicSimilarity], result of:
      0.063356094 = score(doc=4218,freq=12.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.4278775 = fieldWeight in 4218, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4218)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 4218) [ClassicSimilarity], result of:
          0.042392377 = score(doc=4218,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 4218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.

Source

Information processing and management. 45(2009) no.3, S.341-355
Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.04
```
0.03959743 = product of:
  0.07919486 = sum of:
    0.053759433 = weight(_text_:data in 2502) [ClassicSimilarity], result of:
      0.053759433 = score(doc=2502,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.3630661 = fieldWeight in 2502, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=2502)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 2502) [ClassicSimilarity], result of:
          0.05087085 = score(doc=2502,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 2502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform weIl when applied to this problem. Detailed results and analyses are included to support our conclusions.

MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.04

0.036669686 = product of:
  0.14667875 = sum of:
    0.14667875 = sum of:
      0.095923 = weight(_text_:processing in 5108) [ClassicSimilarity], result of:
        0.095923 = score(doc=5108,freq=4.0), product of:
          0.18956426 = queryWeight, product of:
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.046827413 = queryNorm
          0.5060184 = fieldWeight in 5108, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.048147 = idf(docFreq=2097, maxDocs=44218)
            0.0625 = fieldNorm(doc=5108)
      0.050755743 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
        0.050755743 = score(doc=5108,freq=2.0), product of:
          0.16398162 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046827413 = queryNorm
          0.30952093 = fieldWeight in 5108, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=5108)
  0.25 = coord(1/4)

Abstract: In this paper methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed are varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
Date: 20. 1.2007 18:30:22

Information retrieval : data structures and algorithms (1992) 0.04
```
0.036463115 = product of:
  0.07292623 = sum of:
    0.05173004 = weight(_text_:data in 3495) [ClassicSimilarity], result of:
      0.05173004 = score(doc=3495,freq=8.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.34936053 = fieldWeight in 3495, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3495)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 3495) [ClassicSimilarity], result of:
          0.042392377 = score(doc=3495,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 3495, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3495)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The book consists of separate chapters by some 20 different authors. It covers many of the information retrieval algorithms, including methods of file organization, file search and access, and query processing

Content

An edited volume containing data structures and algorithms for information retrieval including a disk with examples written in C. for prgrammers and students interested in parsing text, automated indexing, its the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents. ------------------Enthält die Kapitel: FRAKES, W.B.: Introduction to information storage and retrieval systems; BAEZA-YATES, R.S.: Introduction to data structures and algorithms related to information retrieval; HARMAN, D. u.a.: Inverted files; FALOUTSOS, C.: Signature files; GONNET, G.H. u.a.: New indices for text: PAT trees and PAT arrays; FORD, D.A. u. S. CHRISTODOULAKIS: File organizations for optical disks; FOX, C.: Lexical analysis and stoplists; FRAKES, W.B.: Stemming algorithms; SRINIVASAN, P.: Thesaurus construction; BAEZA-YATES, R.A.: String searching algorithms; HARMAN, D.: Relevance feedback and other query modification techniques; WARTIK, S.: Boolean operators; WARTIK, S. u.a.: Hashing algorithms; HARMAN, D.: Ranking algorithms; FOX, E.: u.a.: Extended Boolean models; RASMUSSEN, E.: Clustering algorithms; HOLLAAR, L.: Special-purpose hardware for information retrieval; STANFILL, C.: Parallel information retrieval algorithms

White, K.J.; Sutcliffe, R.F.E.: Applying incremental tree induction to retrieval : from manuals and medical texts (2006) 0.03

0.03466491 = product of:
  0.06932982 = sum of:
    0.043894395 = weight(_text_:data in 5044) [ClassicSimilarity], result of:
      0.043894395 = score(doc=5044,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.29644224 = fieldWeight in 5044, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=5044)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 5044) [ClassicSimilarity], result of:
          0.05087085 = score(doc=5044,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 5044, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=5044)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available, by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67 respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.

Kekäläinen, J.: Binary and graded relevance in IR evaluations : comparison of the effects on ranking of IR systems (2005) 0.03

0.03466491 = product of:
  0.06932982 = sum of:
    0.043894395 = weight(_text_:data in 1036) [ClassicSimilarity], result of:
      0.043894395 = score(doc=1036,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.29644224 = fieldWeight in 1036, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=1036)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 1036) [ClassicSimilarity], result of:
          0.05087085 = score(doc=1036,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 1036, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=1036)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. Relevance of a sample TREC results is reassessed using a relevance scale with four levels: non-relevant, marginally relevant, fairly relevant, highly relevant. Twenty-one topics and 90 systems from TREC 7 and 20 topics and 121 systems from TREC 8 form the data. Binary precision, and cumulated gain, discounted cumulated gain and normalised discounted cumulated gain are the measures compared. Different weighting schemes for relevance levels are tested with cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced by different measures are similar. Weighting schemes from binary to emphasising highly relevant documents form a continuum, where the measures correlate strongly in the binary end, and less in the heavily weighted end. The results show the different character of the measures.
Source: Information processing and management. 41(2005) no.5, S.1019-1034

Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 0.03

0.033504575 = product of:
  0.06700915 = sum of:
    0.031038022 = weight(_text_:data in 5777) [ClassicSimilarity], result of:
      0.031038022 = score(doc=5777,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 5777, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.035971127 = product of:
      0.071942255 = sum of:
        0.071942255 = weight(_text_:processing in 5777) [ClassicSimilarity], result of:
          0.071942255 = score(doc=5777,freq=4.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3795138 = fieldWeight in 5777, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=5777)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This book discusses many of the key design issues for building search engines and emphazises the important role that applied mathematics can play in improving information retrieval. The authors discuss not only important data structures, algorithms, and software but also user-centered issues such as interfaces, manual indexing, and document preparation. They also present some of the current problems in information retrieval that many not be familiar to applied mathematicians and computer scientists and some of the driving computational methods (SVD, SDD) for automated conceptual indexing
LCSH: Text processing (Computer science)
Subject: Text processing (Computer science)

Faloutsos, C.: Signature files (1992) 0.03

0.03338095 = product of:
  0.0667619 = sum of:
    0.04138403 = weight(_text_:data in 3499) [ClassicSimilarity], result of:
      0.04138403 = score(doc=3499,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2794884 = fieldWeight in 3499, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0625 = fieldNorm(doc=3499)
    0.025377871 = product of:
      0.050755743 = sum of:
        0.050755743 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
          0.050755743 = score(doc=3499,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.30952093 = fieldWeight in 3499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Date: 7. 5.1999 15:22:48
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates

Abdelali, A.; Cowie, J.; Soliman, H.S.: Improving query precision using semantic expansion (2007) 0.03

0.032942846 = product of:
  0.06588569 = sum of:
    0.036211025 = weight(_text_:data in 917) [ClassicSimilarity], result of:
      0.036211025 = score(doc=917,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 917, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=917)
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 917) [ClassicSimilarity], result of:
          0.05934933 = score(doc=917,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 917, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=917)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Query Expansion (QE) is one of the most important mechanisms in the information retrieval field. A typical short Internet query will go through a process of refinement to improve its retrieval power. Most of the existing QE techniques suffer from retrieval performance degradation due to imprecise choice of query's additive terms in the QE process. In this paper, we introduce a novel automated QE mechanism. The new expansion process is guided by the semantics relations between the original query and the expanding words, in the context of the utilized corpus. Experimental results of our "controlled" query expansion, using the Arabic TREC-10 data, show a significant enhancement of recall and precision over current existing mechanisms in the field.
Source: Information processing and management. 43(2007) no.3, S.705-716

Ning, X.; Jin, H.; Jia, W.; Yuan, P.: Practical and effective IR-style keyword search over semantic web (2009) 0.03

0.032942846 = product of:
  0.06588569 = sum of:
    0.036211025 = weight(_text_:data in 4213) [ClassicSimilarity], result of:
      0.036211025 = score(doc=4213,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 4213, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4213)
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 4213) [ClassicSimilarity], result of:
          0.05934933 = score(doc=4213,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 4213, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4213)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This paper presents a novel IR-style keyword search model for semantic web data retrieval, distinguished from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. In addition, the answer is minimal because any proper subgraph can not be an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently. A special ranking strategy is also proposed so that answers can be appropriately ordered. The experimental results over real datasets show that our model outperforms existing possible solutions with respect to effectiveness and efficiency.
Source: Information processing and management. 45(2009) no.2, S.263-271

Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.03
```
0.0314639 = product of:
  0.0629278 = sum of:
    0.043894395 = weight(_text_:data in 5123) [ClassicSimilarity], result of:
      0.043894395 = score(doc=5123,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.29644224 = fieldWeight in 5123, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=5123)
    0.019033402 = product of:
      0.038066804 = sum of:
        0.038066804 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
          0.038066804 = score(doc=5123,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.23214069 = fieldWeight in 5123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=5123)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high data storage media, from CD-ROM to multi gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively lower random seek times than hard discs and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching

Date

12. 9.1996 13:56:22
Burgin, R.: ¬The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.03
```
0.030330349 = product of:
  0.060660698 = sum of:
    0.04479953 = weight(_text_:data in 3365) [ClassicSimilarity], result of:
      0.04479953 = score(doc=3365,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.30255508 = fieldWeight in 3365, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3365)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 3365) [ClassicSimilarity], result of:
          0.03172234 = score(doc=3365,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 3365, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3365)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The retrieval effectiveness of 5 hierarchical clustering methods (single link, complete link, group average, Ward's method, and weighted average) is examined as a function of indexing exhaustivity with 4 test collections (CR, Cranfield, Medlars, and Time). Evaluations of retrieval effectiveness, based on 3 measures of optimal retrieval performance, confirm earlier findings that the performance of a retrieval system based on single link clustering varies as a function of indexing exhaustivity but fail ti find similar patterns for other clustering methods. The data also confirm earlier findings regarding the poor performance of single link clustering is a retrieval environment. The poor performance of single link clustering appears to derive from that method's tendency to produce a small number of large, ill defined document clusters. By contrast, the data examined here found the retrieval performance of the other clustering methods to be general comparable. The data presented also provides an opportunity to examine the theoretical limits of cluster based retrieval and to compare these theoretical limits to the effectiveness of operational implementations. Performance standards of the 4 document collections examined were found to vary widely, and the effectiveness of operational implementations were found to be in the range defined as unacceptable. Further improvements in search strategies and document representations warrant investigations

Date

22. 2.1996 11:20:06
Quiroga, L.M.; Mostafa, J.: ¬An experiment in building profiles in information filtering : the role of context of user relevance feedback (2002) 0.03
```
0.028887425 = product of:
  0.05777485 = sum of:
    0.03657866 = weight(_text_:data in 2579) [ClassicSimilarity], result of:
      0.03657866 = score(doc=2579,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 2579, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2579)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 2579) [ClassicSimilarity], result of:
          0.042392377 = score(doc=2579,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 2579, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2579)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

An experiment was conducted to see how relevance feedback could be used to build and adjust profiles to improve the performance of filtering systems. Data was collected during the system interaction of 18 graduate students with SIFTER (Smart Information Filtering Technology for Electronic Resources), a filtering system that ranks incoming information based on users' profiles. The data set came from a collection of 6000 records concerning consumer health. In the first phase of the study, three different modes of profile acquisition were compared. The explicit mode allowed users to directly specify the profile; the implicit mode utilized relevance feedback to create and refine the profile; and the combined mode allowed users to initialize the profile and to continuously refine it using relevance feedback. Filtering performance, measured in terms of Normalized Precision, showed that the three approaches were significantly different ( [small alpha, Greek] =0.05 and p =0.012). The explicit mode of profile acquisition consistently produced superior results. Exclusive reliance on relevance feedback in the implicit mode resulted in inferior performance. The low performance obtained by the implicit acquisition mode motivated the second phase of the study, which aimed to clarify the role of context in relevance feedback judgments. An inductive content analysis of thinking aloud protocols showed dimensions that were highly situational, establishing the importance context plays in feedback relevance assessments. Results suggest the need for better representation of documents, profiles, and relevance feedback mechanisms that incorporate dimensions identified in this research.

Source

Information processing and management. 38(2002) no.5, S.671-694
Wan, X.; Yang, J.; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks (2008) 0.03
```
0.028887425 = product of:
  0.05777485 = sum of:
    0.03657866 = weight(_text_:data in 2081) [ClassicSimilarity], result of:
      0.03657866 = score(doc=2081,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 2081, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2081)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 2081) [ClassicSimilarity], result of:
          0.042392377 = score(doc=2081,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 2081, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2081)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. Cosine), and then rank the documents by the similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. a block of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are respectively employed to segment text documents and web pages into blocks. Then, each block is assigned with a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve the retrieval performances over baseline approaches. Document block is validated to be a better unit than the whole document in the manifold-ranking process.

Source

Information processing and management. 44(2008) no.3, S.1032-1048

Sakai, T.: On the reliability of information retrieval metrics based on graded relevance (2007) 0.03

0.028236724 = product of:
  0.05647345 = sum of:
    0.031038022 = weight(_text_:data in 910) [ClassicSimilarity], result of:
      0.031038022 = score(doc=910,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 910, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=910)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 910) [ClassicSimilarity], result of:
          0.05087085 = score(doc=910,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 910, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=910)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall's rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values.
Source: Information processing and management. 43(2007) no.2, S.531-548

Rokaya, M.; Atlam, E.; Fuketa, M.; Dorji, T.C.; Aoe, J.-i.: Ranking of field association terms using Co-word analysis (2008) 0.03

0.028236724 = product of:
  0.05647345 = sum of:
    0.031038022 = weight(_text_:data in 2060) [ClassicSimilarity], result of:
      0.031038022 = score(doc=2060,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 2060, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=2060)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 2060) [ClassicSimilarity], result of:
          0.05087085 = score(doc=2060,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 2060, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=2060)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Information retrieval involves finding some desired information in a store of information or a database. In this paper, Co-word analysis will be used to achieve a ranking of a selected sample of FA terms. Based on this ranking a better arranging of search results can be achieved. Experimental results achieved using 41 MB of data (7660 documents) in the field of sports. The corpus was collected from CNN newspaper, sports field. This corpus was chosen to be distributed over 11 sub-fields of the field sports from the experimental results, the average precision increased by 18.3% after applying the proposed arranging scheme depending on the absolute frequency to count the terms weights, and the average precision increased by 17.2% after applying the proposed arranging scheme depending on a formula based on "TF*IDF" to count the terms weights.
Source: Information processing and management. 44(2008) no.2, S.738-755

Zhao, L.; Wu, L.; Huang, X.: Using query expansion in graph-based approach for query-focused multi-document summarization (2009) 0.03

0.028236724 = product of:
  0.05647345 = sum of:
    0.031038022 = weight(_text_:data in 2449) [ClassicSimilarity], result of:
      0.031038022 = score(doc=2449,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 2449, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=2449)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 2449) [ClassicSimilarity], result of:
          0.05087085 = score(doc=2449,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 2449, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=2449)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This paper presents a novel query expansion method, which is combined in the graph-based algorithm for query-focused multi-document summarization, so as to resolve the problem of information limit in the original query. Our approach makes use of both the sentence-to-sentence relations and the sentence-to-word relations to select the query biased informative words from the document set and use them as query expansions to improve the sentence ranking result. Compared to previous query expansion approaches, our approach can capture more relevant information with less noise. We performed experiments on the data of document understanding conference (DUC) 2005 and DUC 2006, and the evaluation results show that the proposed query expansion method can significantly improve the system performance and make our system comparable to the state-of-the-art systems.
Source: Information processing and management. 45(2009) no.1, S.35-41

Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.03
```
0.026219916 = product of:
  0.05243983 = sum of:
    0.03657866 = weight(_text_:data in 56) [ClassicSimilarity], result of:
      0.03657866 = score(doc=56,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 56, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=56)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
          0.03172234 = score(doc=56,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 56, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.

Date

22. 7.2006 16:32:43

Search (169 results, page 1 of 9)

Authors

Years

Languages

Types

Themes

Subjects

Classifications