Search (105 results, page 1 of 6)

Faloutsos, C.: Signature files (1992) 0.05

0.05130119 = product of:
  0.10260238 = sum of:
    0.10260238 = sum of:
      0.04608324 = weight(_text_:data in 3499) [ClassicSimilarity], result of:
        0.04608324 = score(doc=3499,freq=2.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.2794884 = fieldWeight in 3499, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.0625 = fieldNorm(doc=3499)
      0.056519132 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
        0.056519132 = score(doc=3499,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.30952093 = fieldWeight in 3499, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=3499)
  0.5 = coord(1/2)

Date: 7. 5.1999 15:22:48
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
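
Note on the score breakdown: the explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values listed. The function and variable names are illustrative only and are not part of the search system; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the explain output
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    # weight = queryWeight * fieldWeight
    tf = math.sqrt(freq)                        # 1.4142135 for freq=2.0
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # idf * queryNorm
    field_weight = tf * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

QUERY_NORM = 0.052144732   # taken from the listing
MAX_DOCS = 44218           # taken from the listing

# Document 3499: both query terms match, each with freq=2.0
data_weight = term_weight(2.0, 5088, MAX_DOCS, QUERY_NORM, 0.0625)
term22_weight = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0625)

# Sum of the term weights, scaled by coord(1/2)
score_3499 = (data_weight + term22_weight) * 0.5
print(round(score_3499, 4))  # 0.0513
```

The result differs from the listed 0.05130119 only in the last digits, since the listing was computed in single precision.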

Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.05
```
0.045634005 = product of:
  0.09126801 = sum of:
    0.09126801 = sum of:
      0.048878662 = weight(_text_:data in 5123) [ClassicSimilarity], result of:
        0.048878662 = score(doc=5123,freq=4.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.29644224 = fieldWeight in 5123, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.046875 = fieldNorm(doc=5123)
      0.04238935 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
        0.04238935 = score(doc=5123,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.23214069 = fieldWeight in 5123, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=5123)
  0.5 = coord(1/2)
```
Abstract

Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity data storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching; fuzzy searching and matching; relevance ranking; proximity searching; and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slower random seek times than hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.

Date

12. 9.1996 13:56:22
Burgin, R.: ¬The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.04
```
0.042605516 = product of:
  0.08521103 = sum of:
    0.08521103 = sum of:
      0.049886573 = weight(_text_:data in 3365) [ClassicSimilarity], result of:
        0.049886573 = score(doc=3365,freq=6.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.30255508 = fieldWeight in 3365, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.0390625 = fieldNorm(doc=3365)
      0.035324458 = weight(_text_:22 in 3365) [ClassicSimilarity], result of:
        0.035324458 = score(doc=3365,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.19345059 = fieldWeight in 3365, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=3365)
  0.5 = coord(1/2)
```
Abstract

The retrieval effectiveness of 5 hierarchical clustering methods (single link, complete link, group average, Ward's method, and weighted average) is examined as a function of indexing exhaustivity with 4 test collections (CR, Cranfield, Medlars, and Time). Evaluations of retrieval effectiveness, based on 3 measures of optimal retrieval performance, confirm earlier findings that the performance of a retrieval system based on single link clustering varies as a function of indexing exhaustivity but fail to find similar patterns for other clustering methods. The data also confirm earlier findings regarding the poor performance of single link clustering in a retrieval environment. The poor performance of single link clustering appears to derive from that method's tendency to produce a small number of large, ill-defined document clusters. By contrast, the data examined here found the retrieval performance of the other clustering methods to be generally comparable. The data presented also provide an opportunity to examine the theoretical limits of cluster-based retrieval and to compare these theoretical limits to the effectiveness of operational implementations. Performance standards of the 4 document collections examined were found to vary widely, and the effectiveness of operational implementations was found to be in the range defined as unacceptable. Further improvements in search strategies and document representations warrant investigation.

Date

22. 2.1996 11:20:06
Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.04
```
0.03847589 = product of:
  0.07695178 = sum of:
    0.07695178 = sum of:
      0.03456243 = weight(_text_:data in 2419) [ClassicSimilarity], result of:
        0.03456243 = score(doc=2419,freq=2.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.2096163 = fieldWeight in 2419, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.046875 = fieldNorm(doc=2419)
      0.04238935 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
        0.04238935 = score(doc=2419,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.23214069 = fieldWeight in 2419, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2419)
  0.5 = coord(1/2)
```
Abstract

The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper, evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil, followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.

Date

16.11.2008 16:22:48
Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.04
```
0.038028337 = product of:
  0.076056674 = sum of:
    0.076056674 = sum of:
      0.040732216 = weight(_text_:data in 56) [ClassicSimilarity], result of:
        0.040732216 = score(doc=56,freq=4.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.24703519 = fieldWeight in 56, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.0390625 = fieldNorm(doc=56)
      0.035324458 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
        0.035324458 = score(doc=56,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.19345059 = fieldWeight in 56, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=56)
  0.5 = coord(1/2)
```
Abstract

The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.

Date

22. 7.2006 16:32:43
Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.04
```
0.038028337 = product of:
  0.076056674 = sum of:
    0.076056674 = sum of:
      0.040732216 = weight(_text_:data in 664) [ClassicSimilarity], result of:
        0.040732216 = score(doc=664,freq=4.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.24703519 = fieldWeight in 664, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.0390625 = fieldNorm(doc=664)
      0.035324458 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
        0.035324458 = score(doc=664,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.19345059 = fieldWeight in 664, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=664)
  0.5 = coord(1/2)
```
Abstract

A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and, on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.

Date

22. 3.2013 19:34:49
Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective (2012) 0.03
```
0.032063242 = product of:
  0.064126484 = sum of:
    0.064126484 = sum of:
      0.028802028 = weight(_text_:data in 241) [ClassicSimilarity], result of:
        0.028802028 = score(doc=241,freq=2.0), product of:
          0.16488427 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.052144732 = queryNorm
          0.17468026 = fieldWeight in 241, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.0390625 = fieldNorm(doc=241)
      0.035324458 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
        0.035324458 = score(doc=241,freq=2.0), product of:
          0.18260197 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052144732 = queryNorm
          0.19345059 = fieldWeight in 241, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=241)
  0.5 = coord(1/2)
```
Abstract

We address how individuals' (workers) knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative for successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent of people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.

Date

11. 6.2012 14:22:34

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03

0.028259566 = product of:
  0.056519132 = sum of:
    0.056519132 = product of:
      0.113038264 = sum of:
        0.113038264 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.113038264 = score(doc=402,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Information processing and management. 22(1986) no.6, S.465-476
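
Note on the score breakdown: when only one of the two query terms matches, the term weight is scaled down twice. The sketch below redoes the arithmetic for this record under the same assumed ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). Reading the two coord(1/2) factors as one per nesting level of the query is an assumption, since the query itself is not shown; the variable names are illustrative only.

```python
import math

# Values copied from the explain tree for doc 402 (term "22" only)
freq, doc_freq, max_docs = 2.0, 3622, 44218
query_norm, field_norm = 0.052144732, 0.125

term_idf = 1.0 + math.log(max_docs / (doc_freq + 1))
query_weight = term_idf * query_norm                     # about 0.1826
field_weight = math.sqrt(freq) * term_idf * field_norm   # about 0.6190
weight_402 = query_weight * field_weight                 # about 0.1130

# One of two clauses matched at each of two levels (assumed reading),
# so the weight is multiplied by coord(1/2) twice.
score_402 = weight_402 * 0.5 * 0.5
print(round(score_402, 4))  # 0.0283
```

This is why a record with a high single-term weight (0.113) still ranks below records that match both terms with lower individual weights.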

Stanfill, C.: Parallel information retrieval algorithms (1992) 0.03

0.025761317 = product of:
  0.051522635 = sum of:
    0.051522635 = product of:
      0.10304527 = sum of:
        0.10304527 = weight(_text_:data in 3515) [ClassicSimilarity], result of:
          0.10304527 = score(doc=3515,freq=10.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.6249551 = fieldWeight in 3515, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3515)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Data parallel computers, such as the Connection Machine CM-2, can provide interactive access to text databases containing tens, hundreds or even thousands of gigabytes of data. Starts by presenting a brief overview of data parallel computing, a performance model of the CM-2, and a model of the workload involved in searching text databases. Discusses various algorithms used in information retrieval and gives performance estimates based on the data and processing models presented.
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates

Baeza-Yates, R.A.: Introduction to data structures and algorithms related to information retrieval (1992) 0.02

0.024943287 = product of:
  0.049886573 = sum of:
    0.049886573 = product of:
      0.099773146 = sum of:
        0.099773146 = weight(_text_:data in 3082) [ClassicSimilarity], result of:
          0.099773146 = score(doc=3082,freq=6.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.60511017 = fieldWeight in 3082, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=3082)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: In this chapter we review the main concepts and data structures used in information retrieval, and we classify information retrieval related algorithms
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates

Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02

0.024727121 = product of:
  0.049454242 = sum of:
    0.049454242 = product of:
      0.098908484 = sum of:
        0.098908484 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
          0.098908484 = score(doc=2134,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.5416616 = fieldWeight in 2134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=2134)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 30. 3.2001 13:32:22

Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02

0.024727121 = product of:
  0.049454242 = sum of:
    0.049454242 = product of:
      0.098908484 = sum of:
        0.098908484 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
          0.098908484 = score(doc=3445,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.5416616 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 25. 8.2005 17:42:22

Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02

0.021194674 = product of:
  0.04238935 = sum of:
    0.04238935 = product of:
      0.0847787 = sum of:
        0.0847787 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
          0.0847787 = score(doc=58,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.46428138 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 14. 6.2015 22:12:44

Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02

0.021194674 = product of:
  0.04238935 = sum of:
    0.04238935 = product of:
      0.0847787 = sum of:
        0.0847787 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
          0.0847787 = score(doc=2051,freq=2.0), product of:
            0.18260197 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052144732 = queryNorm
            0.46428138 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 14. 6.2015 22:12:56

Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: ¬A unified maximum likelihood approach to document retrieval (2001) 0.02
```
0.019320987 = product of:
  0.038641974 = sum of:
    0.038641974 = product of:
      0.07728395 = sum of:
        0.07728395 = weight(_text_:data in 174) [ClassicSimilarity], result of:
          0.07728395 = score(doc=174,freq=10.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.46871632 = fieldWeight in 174, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=174)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR.
Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.02
```
0.017637566 = product of:
  0.03527513 = sum of:
    0.03527513 = product of:
      0.07055026 = sum of:
        0.07055026 = weight(_text_:data in 4218) [ClassicSimilarity], result of:
          0.07055026 = score(doc=4218,freq=12.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.4278775 = fieldWeight in 4218, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.

Harman, D.; Fox, E.; Baeza-Yates, R.; Lee, W.: Inverted files (1992) 0.02

0.016292887 = product of:
  0.032585774 = sum of:
    0.032585774 = product of:
      0.06517155 = sum of:
        0.06517155 = weight(_text_:data in 3497) [ClassicSimilarity], result of:
          0.06517155 = score(doc=3497,freq=4.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.3952563 = fieldWeight in 3497, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3497)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: This chapter presents a survey of the various structures (techniques) that can be used in building inverted files, and gives the details for producing an inverted file using sorted arrays. The chapter ends with 2 modifications to this basic method that are effective for large data collections.
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates

French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.01
```
0.014965973 = product of:
  0.029931946 = sum of:
    0.029931946 = product of:
      0.05986389 = sum of:
        0.05986389 = weight(_text_:data in 4811) [ClassicSimilarity], result of:
          0.05986389 = score(doc=4811,freq=6.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.3630661 = fieldWeight in 4811, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4811)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files
Bodoff, D.; Robertson, S.: ¬A new unified probabilistic model (2004) 0.01
```
0.014965973 = product of:
  0.029931946 = sum of:
    0.029931946 = product of:
      0.05986389 = sum of:
        0.05986389 = weight(_text_:data in 2129) [ClassicSimilarity], result of:
          0.05986389 = score(doc=2129,freq=6.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.3630661 = fieldWeight in 2129, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2129)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0, not found in Model 3, is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3, not found in Model 0, is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval.
Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
```
0.014965973 = product of:
  0.029931946 = sum of:
    0.029931946 = product of:
      0.05986389 = sum of:
        0.05986389 = weight(_text_:data in 2502) [ClassicSimilarity], result of:
          0.05986389 = score(doc=2502,freq=6.0), product of:
            0.16488427 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.052144732 = queryNorm
            0.3630661 = fieldWeight in 2502, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
