Search (258 results, page 1 of 13)

  • theme_ss:"Retrievalalgorithmen"
  1. Salton, G.; Buckley, C.: Parallel text search methods (1988) 0.39
    0.38776773 = product of:
      0.5170236 = sum of:
        0.02946245 = weight(_text_:for in 404) [ClassicSimilarity], result of:
          0.02946245 = score(doc=404,freq=2.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.33190575 = fieldWeight in 404, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.125 = fieldNorm(doc=404)
        0.25572333 = weight(_text_:computing in 404) [ClassicSimilarity], result of:
          0.25572333 = score(doc=404,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.9778349 = fieldWeight in 404, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.125 = fieldNorm(doc=404)
        0.23183785 = product of:
          0.4636757 = sum of:
            0.4636757 = weight(_text_:machinery in 404) [ClassicSimilarity], result of:
              0.4636757 = score(doc=404,freq=2.0), product of:
                0.35214928 = queryWeight, product of:
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.047278564 = queryNorm
                1.3167021 = fieldWeight in 404, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.125 = fieldNorm(doc=404)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Source
    Communications of the Association for Computing Machinery. 31(1988), S.205-215
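
The indented breakdown under each hit is Lucene's ClassicSimilarity (TF-IDF) explain output: per matching term, queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm, the term score is their product, and the document score is the sum over matching terms scaled by a coordination factor (here 3 of 4 query clauses match, hence 0.75). A minimal Python sketch, with variable names of our own choosing, reproduces the "computing" contribution and the final score of entry 1:

import math

# Values read off the explain tree for doc 404 above.
query_norm = 0.047278564
idf        = 5.5314693        # 1 + ln(maxDocs / (docFreq + 1)) = 1 + ln(44218 / 476)
tf         = math.sqrt(2.0)   # tf(freq=2.0) = sqrt(freq) = 1.4142135
field_norm = 0.125            # stored length norm for this field of doc 404

query_weight = idf * query_norm             # 0.26151994
field_weight = tf * idf * field_norm        # 0.9778349
term_score   = query_weight * field_weight  # 0.25572333 (the "computing" leg)

# Document score: sum of the three matching term scores, scaled by coord(3/4).
doc_score = (0.02946245 + term_score + 0.23183785) * 0.75   # ~0.38776773
print(round(term_score, 8), round(doc_score, 8))
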
  2. Aho, A.; Corasick, M.: Efficient string matching : an aid to bibliographic search (1975) 0.34
    0.33929676 = product of:
      0.45239568 = sum of:
        0.025779642 = weight(_text_:for in 3506) [ClassicSimilarity], result of:
          0.025779642 = score(doc=3506,freq=2.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.29041752 = fieldWeight in 3506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.109375 = fieldNorm(doc=3506)
        0.22375791 = weight(_text_:computing in 3506) [ClassicSimilarity], result of:
          0.22375791 = score(doc=3506,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.85560554 = fieldWeight in 3506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.109375 = fieldNorm(doc=3506)
        0.20285812 = product of:
          0.40571624 = sum of:
            0.40571624 = weight(_text_:machinery in 3506) [ClassicSimilarity], result of:
              0.40571624 = score(doc=3506,freq=2.0), product of:
                0.35214928 = queryWeight, product of:
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.047278564 = queryNorm
                1.1521144 = fieldWeight in 3506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3506)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Source
    Communications of the Association for Computing Machinery. 18(1975), S.333-340
  3. Boyer, R.; Moore, S.: A fast string searching algorithm (1977) 0.34
    0.33929676 = product of:
      0.45239568 = sum of:
        0.025779642 = weight(_text_:for in 3507) [ClassicSimilarity], result of:
          0.025779642 = score(doc=3507,freq=2.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.29041752 = fieldWeight in 3507, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.109375 = fieldNorm(doc=3507)
        0.22375791 = weight(_text_:computing in 3507) [ClassicSimilarity], result of:
          0.22375791 = score(doc=3507,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.85560554 = fieldWeight in 3507, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.109375 = fieldNorm(doc=3507)
        0.20285812 = product of:
          0.40571624 = sum of:
            0.40571624 = weight(_text_:machinery in 3507) [ClassicSimilarity], result of:
              0.40571624 = score(doc=3507,freq=2.0), product of:
                0.35214928 = queryWeight, product of:
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.047278564 = queryNorm
                1.1521144 = fieldWeight in 3507, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3507)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Source
    Communications of the Association for Computing Machinery. 20(1977), S.762-772
  4. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.17
    0.17397097 = product of:
      0.23196128 = sum of:
        0.02551523 = weight(_text_:for in 5108) [ClassicSimilarity], result of:
          0.02551523 = score(doc=5108,freq=6.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.28743884 = fieldWeight in 5108, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.18082368 = weight(_text_:computing in 5108) [ClassicSimilarity], result of:
          0.18082368 = score(doc=5108,freq=4.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.69143367 = fieldWeight in 5108, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.025622372 = product of:
          0.051244743 = sum of:
            0.051244743 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.051244743 = score(doc=5108,freq=2.0), product of:
                0.16556148 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047278564 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
     In this paper, methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed is varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
    Date
    20. 1.2007 18:30:22
  5. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment (1998) 0.15
    0.1536992 = product of:
      0.20493227 = sum of:
        0.022096837 = weight(_text_:for in 5) [ClassicSimilarity], result of:
          0.022096837 = score(doc=5,freq=8.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.24892932 = fieldWeight in 5, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.046875 = fieldNorm(doc=5)
        0.095896244 = weight(_text_:computing in 5) [ClassicSimilarity], result of:
          0.095896244 = score(doc=5,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.36668807 = fieldWeight in 5, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.046875 = fieldNorm(doc=5)
        0.08693919 = product of:
          0.17387839 = sum of:
            0.17387839 = weight(_text_:machinery in 5) [ClassicSimilarity], result of:
              0.17387839 = score(doc=5,freq=2.0), product of:
                0.35214928 = queryWeight, product of:
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.047278564 = queryNorm
                0.4937633 = fieldWeight in 5, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
    Source
    Journal of the Association for Computing Machinery. 46(1998) no.5, S.604-632
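
The hub/authority scheme the abstract above describes (later known as HITS) reduces to a simple mutual-reinforcement iteration. A minimal sketch; the toy link graph and the fixed iteration count are illustrative assumptions, not taken from the paper.

# Hub/authority iteration over a directed link graph (page -> pages it links to).
def hits(links, iterations=50):
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a page is a good authority if good hubs point to it
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # a page is a good hub if it points to good authorities
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

hub, auth = hits({"a": ["c"], "b": ["c"], "c": ["d"]})
print(max(auth, key=auth.get))   # "c": it is pointed to by the two hubs "a" and "b"
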
  6. Maron, M.E.; Kuhns, J.L.: On relevance, probabilistic indexing and information retrieval (1960) 0.15
    0.15290862 = product of:
      0.20387816 = sum of:
        0.01841403 = weight(_text_:for in 1928) [ClassicSimilarity], result of:
          0.01841403 = score(doc=1928,freq=8.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.20744109 = fieldWeight in 1928, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1928)
        0.11301481 = weight(_text_:computing in 1928) [ClassicSimilarity], result of:
          0.11301481 = score(doc=1928,freq=4.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.43214604 = fieldWeight in 1928, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1928)
        0.07244933 = product of:
          0.14489865 = sum of:
            0.14489865 = weight(_text_:machinery in 1928) [ClassicSimilarity], result of:
              0.14489865 = score(doc=1928,freq=2.0), product of:
                0.35214928 = queryWeight, product of:
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.047278564 = queryNorm
                0.4114694 = fieldWeight in 1928, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.448392 = idf(docFreq=69, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1928)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
     Reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval, and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique, called 'Probabilistic indexing', allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the 'relevance number') for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request, ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing ('see' and 'see also') is based solely on the 'semantic closeness' between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.
    Source
    Journal of the Association for Computing Machinery. 7(1960) no.3, S.216-244
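
A rough sketch of the ranking idea in the abstract above: each document carries probabilistic index weights per term, and a request is answered by ranking documents by a derived 'relevance number'. The combination rule used here (product of the requested terms' weights times a uniform prior) is a deliberate simplification for illustration, not the paper's exact derivation.

# Probabilistic indexing, heavily simplified; weights and combination rule are invented.
index_weights = {
    "d1": {"retrieval": 0.9, "indexing": 0.7},
    "d2": {"retrieval": 0.4, "clustering": 0.8},
}

def relevance_number(doc_weights, request, prior=1.0):
    score = prior
    for term in request:
        score *= doc_weights.get(term, 0.0)
    return score

request = ["retrieval", "indexing"]
ranked = sorted(index_weights, key=lambda d: relevance_number(index_weights[d], request), reverse=True)
print([(d, relevance_number(index_weights[d], request)) for d in ranked])   # d1 (0.63) before d2 (0.0)
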
  7. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.09
    0.092616804 = product of:
      0.18523361 = sum of:
        0.019136423 = weight(_text_:for in 651) [ClassicSimilarity], result of:
          0.019136423 = score(doc=651,freq=6.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.21557912 = fieldWeight in 651, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.046875 = fieldNorm(doc=651)
        0.16609718 = weight(_text_:computing in 651) [ClassicSimilarity], result of:
          0.16609718 = score(doc=651,freq=6.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.6351224 = fieldWeight in 651, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.046875 = fieldNorm(doc=651)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi-gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach - We use standard measures used in parallel computing such as speedup and efficiency to examine the computing results and also the space costs of our trial indexing experiments. Findings - The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications - The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value - The paper is of value to database administrators who manage large-scale text collections, and who need to use parallel computing to implement their text retrieval services.
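
The two partition types compared in the abstract above can be pictured with a toy inverted file; the postings and the modulo/round-robin assignments below are illustrative assumptions only.

# Toy inverted file: term -> list of (doc_id, term_frequency) postings.
postings = {
    "parallel": [(1, 3), (4, 1), (7, 2)],
    "inverted": [(2, 1), (4, 2)],
    "file":     [(1, 1), (2, 2), (7, 1)],
}

def docid_partition(postings, n):
    """DocId: every node indexes all terms, but only the documents assigned to it."""
    parts = [dict() for _ in range(n)]
    for term, plist in postings.items():
        for doc_id, tf in plist:
            parts[doc_id % n].setdefault(term, []).append((doc_id, tf))
    return parts

def termid_partition(postings, n):
    """TermId: every node holds complete posting lists, but only for its share of the terms."""
    parts = [dict() for _ in range(n)]
    for i, (term, plist) in enumerate(sorted(postings.items())):
        parts[i % n][term] = plist
    return parts

print(docid_partition(postings, 2))
print(termid_partition(postings, 2))
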
  8. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.08
    0.07950091 = product of:
      0.15900183 = sum of:
        0.020587513 = weight(_text_:for in 819) [ClassicSimilarity], result of:
          0.020587513 = score(doc=819,freq=10.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.2319262 = fieldWeight in 819, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0390625 = fieldNorm(doc=819)
        0.13841431 = weight(_text_:computing in 819) [ClassicSimilarity], result of:
          0.13841431 = score(doc=819,freq=6.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.5292687 = fieldWeight in 819, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0390625 = fieldNorm(doc=819)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - An issue that tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier. Design/methodology/approach - Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. The paper uses standard measures used in parallel computing such as speedup to examine the computing results and also the costs of reorganising indexes while servicing transactions. Findings - Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context. Practical implications - There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the web), demonstrating that a shift in requirements of inverted file maintenance is needed from the past. Originality/value - The paper is of value to database administrators who manage large-scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.
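
Both this paper and the 2005 paper above report results with the standard parallel-computing measures speedup and efficiency; the definitions are simple, and the timings below are invented solely to illustrate them.

# speedup    = T(1 processor) / T(p processors)
# efficiency = speedup / p        (1.0 would be perfect linear scaling)
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    return speedup(t_serial, t_parallel) / processors

# Invented example: an index build taking 8 hours on 1 node and 2.5 hours on 4 nodes.
print(speedup(8.0, 2.5), efficiency(8.0, 2.5, 4))   # 3.2 and 0.8
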
  9. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.05
    0.051233012 = product of:
      0.102466024 = sum of:
        0.022552488 = weight(_text_:for in 278) [ClassicSimilarity], result of:
          0.022552488 = score(doc=278,freq=12.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.2540624 = fieldWeight in 278, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
        0.079913534 = weight(_text_:computing in 278) [ClassicSimilarity], result of:
          0.079913534 = score(doc=278,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.3055734 = fieldWeight in 278, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
      0.5 = coord(2/4)
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence at any text collection, whereas previous research articles do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.7, S.1383-1397
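
The efficiency claim in the abstract above - query processing reduced to simple additions of precomputed per-term impacts - is easy to picture. The impact values below are invented; learning them offline with genetic programming from multiple sources of evidence is exactly the part this sketch omits.

# Query processing with precomputed impacts: no per-query evidence fusion,
# just look up each query term's impact in each document and add.
impacts = {
    "d1": {"query": 2.1, "evaluation": 0.4},
    "d2": {"query": 0.9, "evaluation": 1.7},
}

def score(doc, query_terms):
    return sum(impacts[doc].get(t, 0.0) for t in query_terms)

query = ["query", "evaluation"]
ranking = sorted(impacts, key=lambda d: score(d, query), reverse=True)
print([(d, score(d, query)) for d in ranking])
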
  10. Information retrieval : data structures and algorithms (1992) 0.05
    0.050250523 = product of:
      0.100501046 = sum of:
        0.020587513 = weight(_text_:for in 3495) [ClassicSimilarity], result of:
          0.020587513 = score(doc=3495,freq=10.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.2319262 = fieldWeight in 3495, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3495)
        0.079913534 = weight(_text_:computing in 3495) [ClassicSimilarity], result of:
          0.079913534 = score(doc=3495,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.3055734 = fieldWeight in 3495, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3495)
      0.5 = coord(2/4)
    
    Content
     An edited volume containing data structures and algorithms for information retrieval, including a disk with examples written in C. For programmers and students interested in parsing text and automated indexing, it is the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents. Contains the chapters: FRAKES, W.B.: Introduction to information storage and retrieval systems; BAEZA-YATES, R.A.: Introduction to data structures and algorithms related to information retrieval; HARMAN, D. et al.: Inverted files; FALOUTSOS, C.: Signature files; GONNET, G.H. et al.: New indices for text: PAT trees and PAT arrays; FORD, D.A. and S. CHRISTODOULAKIS: File organizations for optical disks; FOX, C.: Lexical analysis and stoplists; FRAKES, W.B.: Stemming algorithms; SRINIVASAN, P.: Thesaurus construction; BAEZA-YATES, R.A.: String searching algorithms; HARMAN, D.: Relevance feedback and other query modification techniques; WARTIK, S.: Boolean operators; WARTIK, S. et al.: Hashing algorithms; HARMAN, D.: Ranking algorithms; FOX, E. et al.: Extended Boolean models; RASMUSSEN, E.: Clustering algorithms; HOLLAAR, L.: Special-purpose hardware for information retrieval; STANFILL, C.: Parallel information retrieval algorithms
    Footnote
     Reviewed in: Computing reviews. July 1993, S.341-342 (G. Salton)
  11. Jacucci, G.; Barral, O.; Daee, P.; Wenzel, M.; Serim, B.; Ruotsalo, T.; Pluchino, P.; Freeman, J.; Gamberini, L.; Kaski, S.; Blankertz, B.: Integrating neurophysiologic relevance feedback in intent modeling for information retrieval (2019) 0.05
    0.047930278 = product of:
      0.095860556 = sum of:
        0.01594702 = weight(_text_:for in 5356) [ClassicSimilarity], result of:
          0.01594702 = score(doc=5356,freq=6.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.17964928 = fieldWeight in 5356, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5356)
        0.079913534 = weight(_text_:computing in 5356) [ClassicSimilarity], result of:
          0.079913534 = score(doc=5356,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.3055734 = fieldWeight in 5356, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5356)
      0.5 = coord(2/4)
    
    Abstract
    The use of implicit relevance feedback from neurophysiology could deliver effortless information retrieval. However, both computing neurophysiologic responses and retrieving documents are characterized by uncertainty because of noisy signals and incomplete or inconsistent representations of the data. We present the first-of-its-kind, fully integrated information retrieval system that makes use of online implicit relevance feedback generated from brain activity as measured through electroencephalography (EEG), and eye movements. The findings of the evaluation experiment (N = 16) show that we are able to compute online neurophysiology-based relevance feedback with performance significantly better than chance in complex data domains and realistic search tasks. We contribute by demonstrating how to integrate in interactive intent modeling this inherently noisy implicit relevance feedback combined with scarce explicit feedback. Although experimental measures of task performance did not allow us to demonstrate how the classification outcomes translated into search task performance, the experiment proved that our approach is able to generate relevance feedback from brain signals and eye movements in a realistic scenario, thus providing promising implications for future work in neuroadaptive information retrieval (IR).
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.9, S.917-930
  12. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.05
    0.04646711 = product of:
      0.09293422 = sum of:
        0.013020686 = weight(_text_:for in 1283) [ClassicSimilarity], result of:
          0.013020686 = score(doc=1283,freq=4.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.14668301 = fieldWeight in 1283, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1283)
        0.079913534 = weight(_text_:computing in 1283) [ClassicSimilarity], result of:
          0.079913534 = score(doc=1283,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.3055734 = fieldWeight in 1283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1283)
      0.5 = coord(2/4)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.6, S.1134-1148
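
Two practical points from the abstract above - IDF-weighting the bigrams and pruning bigrams with large document frequency values - can be shown without the authors' full context-dependent weighting model. The collection size, pruning cutoff and linear interpolation below are illustrative assumptions, not their method.

import math

N = 100_000          # assumed collection size
DF_PRUNE = 20_000    # assumed cutoff: bigrams occurring in too many documents are dropped

def idf(df, n=N):
    return math.log(n / df)

def mixed_weight(unigram_tf, unigram_df, bigram_tf, bigram_df, lam=0.7):
    """Interpolate unigram and bigram evidence for a query term occurrence (illustrative)."""
    uni = unigram_tf * idf(unigram_df)
    if bigram_df > DF_PRUNE:          # noise reduction: drop high-DF bigrams
        return uni
    return lam * uni + (1.0 - lam) * bigram_tf * idf(bigram_df)

print(mixed_weight(unigram_tf=3, unigram_df=5_000, bigram_tf=1, bigram_df=300))
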
  13. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.04
    0.040353596 = product of:
      0.08070719 = sum of:
        0.02946245 = weight(_text_:for in 402) [ClassicSimilarity], result of:
          0.02946245 = score(doc=402,freq=2.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.33190575 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
        0.051244743 = product of:
          0.10248949 = sum of:
            0.10248949 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.10248949 = score(doc=402,freq=2.0), product of:
                0.16556148 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047278564 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  14. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.04
    0.04020042 = product of:
      0.08040084 = sum of:
        0.01647001 = weight(_text_:for in 8) [ClassicSimilarity], result of:
          0.01647001 = score(doc=8,freq=10.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.18554096 = fieldWeight in 8, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
        0.06393083 = weight(_text_:computing in 8) [ClassicSimilarity], result of:
          0.06393083 = score(doc=8,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.24445872 = fieldWeight in 8, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
      0.5 = coord(2/4)
    
    Content
     Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithm for these systems was usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags. How are hyperlinks useful? The hyperlink functionality alone - that is, the hyperlink to Web page B that is contained in Web page A - is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents. Hyperlink analysis significantly improves the relevance of the search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform - mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.
    Source
    IEEE Internet computing. 5(2001) no.1, S.45-50
  15. Bar-Ilan, J.; Levene, M.; Mat-Hassan, M.: Methods for evaluating dynamic changes in search engine rankings : a case study (2006) 0.04
    0.038344223 = product of:
      0.076688446 = sum of:
        0.012757615 = weight(_text_:for in 616) [ClassicSimilarity], result of:
          0.012757615 = score(doc=616,freq=6.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.14371942 = fieldWeight in 616, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
        0.06393083 = weight(_text_:computing in 616) [ClassicSimilarity], result of:
          0.06393083 = score(doc=616,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.24445872 = fieldWeight in 616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
      0.5 = coord(2/4)
    
    Abstract
     Purpose - The objective of this paper is to characterize the changes in the rankings of the top ten results of major search engines over time and to compare the rankings between these engines. Design/methodology/approach - The paper compares rankings of the top-ten results of the search engines Google and AlltheWeb on ten identical queries over a period of three weeks. Only the top-ten results were considered, since users do not normally inspect more than the first results page returned by a search engine. The experiment was repeated twice, in October 2003 and in January 2004, in order to assess changes to the top-ten results of some of the queries during the three-month interval. In order to assess the changes in the rankings, three measures were computed for each data collection point and each search engine. Findings - The findings in this paper show that the rankings of AlltheWeb were highly stable over each period, while the rankings of Google underwent constant yet minor changes, with occasional major ones. Changes over time can be explained by the dynamic nature of the web or by fluctuations in the search engines' indexes. The top-ten results of the two search engines had surprisingly low overlap. With such small overlap, the task of comparing the rankings of the two engines becomes extremely challenging. Originality/value - The paper shows that because of the abundance of information on the web, ranking search results is of extreme importance. The paper compares several measures for computing the similarity between rankings of search tools, and shows that none of the measures is fully satisfactory as a standalone measure. It also demonstrates the apparent differences in the ranking algorithms of two widely used search engines.
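
The abstract above says three measures were computed for each pair of top-ten rankings but does not name them here; as stand-ins, the sketch below computes two common choices, plain overlap and the share of shared results ranked in the same relative order.

def overlap(a, b):
    """Fraction of results the two top-k lists share."""
    return len(set(a) & set(b)) / max(len(a), len(b))

def rank_agreement(a, b):
    """Among results present in both lists, the fraction of pairs kept in the same order."""
    common = [u for u in a if u in b]
    if len(common) < 2:
        return 1.0
    pairs = [(x, y) for i, x in enumerate(common) for y in common[i + 1:]]
    same = sum(1 for x, y in pairs
               if (a.index(x) < a.index(y)) == (b.index(x) < b.index(y)))
    return same / len(pairs)

# Invented top-4 lists standing in for two engines' top-10 results.
google = ["u1", "u2", "u3", "u4"]
alltheweb = ["u3", "u1", "u9", "u8"]
print(overlap(google, alltheweb), rank_agreement(google, alltheweb))   # 0.5 and 0.0
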
  16. Stanfill, C.: Parallel information retrieval algorithms (1992) 0.03
    0.031965416 = product of:
      0.12786166 = sum of:
        0.12786166 = weight(_text_:computing in 3515) [ClassicSimilarity], result of:
          0.12786166 = score(doc=3515,freq=2.0), product of:
            0.26151994 = queryWeight, product of:
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.047278564 = queryNorm
            0.48891744 = fieldWeight in 3515, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5314693 = idf(docFreq=475, maxDocs=44218)
              0.0625 = fieldNorm(doc=3515)
      0.25 = coord(1/4)
    
    Abstract
     Data parallel computers, such as the Connection Machine CM-2, can provide interactive access to text databases containing tens, hundreds or even thousands of gigabytes of data. Starts by presenting a brief overview of data parallel computing, a performance model of the CM-2, and a model of the workload involved in searching text databases. Discusses various algorithms used in information retrieval and gives performance estimates based on the data and processing models presented.
  17. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.02
    0.024224073 = product of:
      0.048448145 = sum of:
        0.029231368 = weight(_text_:for in 2239) [ClassicSimilarity], result of:
          0.029231368 = score(doc=2239,freq=14.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.32930255 = fieldWeight in 2239, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.019216778 = product of:
          0.038433556 = sum of:
            0.038433556 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.038433556 = score(doc=2239,freq=2.0), product of:
                0.16556148 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047278564 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
     Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task - discovery of ranking functions for Web search - and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs in GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
    Source
     Journal of the American Society for Information Science and Technology. 55(2004) no.7, S.628-636
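
In this setting a 'fitness function' scores how well a candidate, GP-evolved ranking function ranks training data. The feature set, the two hand-written candidate functions and the use of average precision as the fitness measure below are assumptions for illustration, not the paper's experimental setup.

import math

def average_precision(ranked_docs, relevant):
    """Fitness of a ranking for one training query: average precision over the relevant docs."""
    hits, precision_sum = 0, 0.0
    for i, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / max(len(relevant), 1)

features = {"d1": (4, 2.0, 1000), "d2": (2, 3.0, 50), "d3": (1, 1.0, 100)}  # (tf, idf, doclen)
relevant = {"d2"}

candidates = {  # hand-written stand-ins for GP-evolved ranking functions
    "tf*idf":          lambda tf, idf, dl: tf * idf,
    "tf*idf/log(len)": lambda tf, idf, dl: tf * idf / math.log(dl),
}

for name, fn in candidates.items():
    ranking = sorted(features, key=lambda d: fn(*features[d]), reverse=True)
    print(name, average_precision(ranking, relevant))   # the fitter candidate ranks d2 first
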
  18. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.02
    0.023227734 = product of:
      0.04645547 = sum of:
        0.020833097 = weight(_text_:for in 1422) [ClassicSimilarity], result of:
          0.020833097 = score(doc=1422,freq=4.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.23469281 = fieldWeight in 1422, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.025622372 = product of:
          0.051244743 = sum of:
            0.051244743 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.051244743 = score(doc=1422,freq=2.0), product of:
                0.16556148 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047278564 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations along with the use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Source
     Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.285-301
  19. Joss, M.W.; Wszola, S.: The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.02
    0.020656807 = product of:
      0.041313615 = sum of:
        0.022096837 = weight(_text_:for in 5123) [ClassicSimilarity], result of:
          0.022096837 = score(doc=5123,freq=8.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.24892932 = fieldWeight in 5123, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.046875 = fieldNorm(doc=5123)
        0.019216778 = product of:
          0.038433556 = sum of:
            0.038433556 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
              0.038433556 = score(doc=5123,freq=2.0), product of:
                0.16556148 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047278564 = queryNorm
                0.23214069 = fieldWeight in 5123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5123)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
     Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity data storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slow random seek times compared with hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.
    Date
    12. 9.1996 13:56:22
  20. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.02
    0.020656807 = product of:
      0.041313615 = sum of:
        0.022096837 = weight(_text_:for in 2419) [ClassicSimilarity], result of:
          0.022096837 = score(doc=2419,freq=8.0), product of:
            0.08876751 = queryWeight, product of:
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.047278564 = queryNorm
            0.24892932 = fieldWeight in 2419, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.8775425 = idf(docFreq=18385, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.019216778 = product of:
          0.038433556 = sum of:
            0.038433556 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.038433556 = score(doc=2419,freq=2.0), product of:
                0.16556148 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047278564 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection over information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
    Source
     Research and advanced technology for digital libraries : 8th European conference, ECDL 2004, Bath, UK, September 12-17, 2004 : proceedings. Eds.: Heery, R. and E. Lyon

Languages

  • e 249
  • d 6
  • chi 1
  • pt 1

Types

  • a 238
  • m 10
  • el 7
  • s 4
  • r 2
  • p 1
  • x 1