Search (111 results, page 1 of 6)

  • theme_ss:"Retrievalalgorithmen"
  1. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.03
    0.031540915 = product of:
      0.094622746 = sum of:
        0.053391904 = weight(_text_:bibliographic in 664) [ClassicSimilarity], result of:
          0.053391904 = score(doc=664,freq=6.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.3724989 = fieldWeight in 664, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=664)
        0.028759988 = weight(_text_:data in 664) [ClassicSimilarity], result of:
          0.028759988 = score(doc=664,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.24703519 = fieldWeight in 664, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=664)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
              0.024941705 = score(doc=664,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates the joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Two evidence sources, content-based and network-based scores, are used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand; on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the automatic generation and evaluation of topical queries. We show that a statistically significant improvement over closely related ranking models is achieved.
    Date
    22. 3.2013 19:34:49
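    The score breakdowns shown for each entry are Lucene ClassicSimilarity explanations (TF-IDF with coordination factors). As a worked check, the Python sketch below recombines the factors displayed for entry 1; the numbers are copied from the breakdown above, the helper function is ours and not a Lucene API, and small rounding differences against the displayed values are expected.

      import math

      def term_weight(freq, idf, query_norm, field_norm):
          """One weight(_text_:term) node: queryWeight * fieldWeight."""
          query_weight = idf * query_norm                     # idf * queryNorm
          field_weight = math.sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm
          return query_weight * field_weight

      QUERY_NORM, FIELD_NORM = 0.036818076, 0.0390625

      w_bibliographic = term_weight(6.0, 3.893044, QUERY_NORM, FIELD_NORM)        # ~0.0534
      w_data          = term_weight(4.0, 3.1620505, QUERY_NORM, FIELD_NORM)       # ~0.0288
      w_22            = term_weight(2.0, 3.5018296, QUERY_NORM, FIELD_NORM) * 0.5 # coord(1/2)

      score = (w_bibliographic + w_data + w_22) * (3.0 / 9.0)  # coord(3/9)
      print(round(score, 6))  # ~0.0315, the 0.03 shown for entry 1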
  2. Carpineto, C.; Romano, G.: Information retrieval through hybrid navigation of lattice representations (1996) 0.02
    0.015917132 = product of:
      0.071627095 = sum of:
        0.04315616 = weight(_text_:bibliographic in 7434) [ClassicSimilarity], result of:
          0.04315616 = score(doc=7434,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.30108726 = fieldWeight in 7434, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7434)
        0.028470935 = weight(_text_:data in 7434) [ClassicSimilarity], result of:
          0.028470935 = score(doc=7434,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.24455236 = fieldWeight in 7434, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7434)
      0.22222222 = coord(2/9)
    
    Abstract
    Presents a comprehensive approach to automatic organization and hybrid navigation of text databases. An organizing stage builds a particular lattice representation of the data, through text indexing followed by lattice clustering of the indexed texts. The lattice representation supports the navigation stage of the system, a visual retrieval interface that combines 3 main retrieval strategies: browsing, querying, and bounding. Such a hybrid paradigm permits high flexibility in trading off information exploration and retrieval, and has good retrieval performance. Compares information retrieval using lattice-based hybrid navigation with conventional Boolean querying. Experiments conducted on 2 medium-sized bibliographic databases showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.
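    A toy illustration of the organizing stage described in entry 2: build a term-document incidence relation and enumerate its formal concepts, the nodes of the lattice that supports browsing, querying and bounding. The sample documents and helper names are invented; the paper's indexing and lattice construction are considerably more involved.

      from itertools import combinations

      # Toy term-document incidence relation (invented data).
      docs = {
          "d1": {"retrieval", "lattice", "browsing"},
          "d2": {"retrieval", "querying"},
          "d3": {"lattice", "clustering"},
          "d4": {"retrieval", "lattice", "querying"},
      }

      def intent(doc_set):
          """Terms shared by every document in doc_set (all terms for the empty set)."""
          sets = [docs[d] for d in doc_set]
          return set.intersection(*sets) if sets else set().union(*docs.values())

      def extent(term_set):
          """Documents indexed by every term in term_set."""
          return {d for d, terms in docs.items() if term_set <= terms}

      # A formal concept is a pair (extent, intent) closed under the two maps above;
      # brute force over document subsets is fine at this toy scale.
      concepts = set()
      for r in range(len(docs) + 1):
          for subset in combinations(docs, r):
              i = intent(set(subset))
              concepts.add((frozenset(extent(i)), frozenset(i)))

      for e, i in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
          print(sorted(e), "<->", sorted(i))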
  3. Faloutsos, C.: Signature files (1992) 0.01
    0.011664795 = product of:
      0.052491575 = sum of:
        0.032538213 = weight(_text_:data in 3499) [ClassicSimilarity], result of:
          0.032538213 = score(doc=3499,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.2794884 = fieldWeight in 3499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
        0.019953365 = product of:
          0.03990673 = sum of:
            0.03990673 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
              0.03990673 = score(doc=3499,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.30952093 = fieldWeight in 3499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Date
    7. 5.1999 15:22:48
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
  4. Lewandowski, D.: How can library materials be ranked in the OPAC? (2009) 0.01
    0.011369381 = product of:
      0.051162213 = sum of:
        0.03082583 = weight(_text_:bibliographic in 2810) [ClassicSimilarity], result of:
          0.03082583 = score(doc=2810,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.21506234 = fieldWeight in 2810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2810)
        0.020336384 = weight(_text_:data in 2810) [ClassicSimilarity], result of:
          0.020336384 = score(doc=2810,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 2810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2810)
      0.22222222 = coord(2/9)
    
    Abstract
    Some Online Public Access Catalogues offer a ranking component. However, ranking there is merely text-based and is doomed to fail due to the limited text in bibliographic data. The main assumption for the talk is that we are in a situation where the appropriate ranking factors for OPACs should be defined, while the implementation itself is no major problem. We must define what we want, and not so much focus on the technical work. Some deep thinking is necessary on the "perfect results set" and how we can achieve it through ranking. The talk presents a set of potential ranking factors and clustering possibilities for further discussion. A look at commercial Web search engines could provide us with ideas on how ranking can be improved with additional factors. Search engines are way beyond pure text-based ranking and apply ranking factors in groups such as popularity, freshness, and personalisation. The talk describes the main factors used in search engines and how derivatives of these could be used for libraries' purposes. The goal of ranking is to provide the user with the most suitable results at the top of the results list. How can this goal be achieved with the library catalogue, and across the library's different collections and databases? The assumption is that ranking of such materials is a complex problem and is as yet nowhere near solved. Libraries should focus on ranking to improve user experience.
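    A minimal sketch of the kind of multi-factor OPAC ranking entry 4 argues for: a text-relevance score combined with popularity and freshness signals through fixed weights. The factor names, weights and sample records are illustrative assumptions, not a proposal from the talk.

      from dataclasses import dataclass
      from datetime import date

      @dataclass
      class Record:
          title: str
          text_score: float       # e.g. a BM25/TF-IDF score from the catalogue index
          loans_last_year: int
          pub_year: int

      # Illustrative weights; the talk's point is that such factors need to be
      # chosen deliberately, not that these particular values are right.
      WEIGHTS = {"text": 0.6, "popularity": 0.25, "freshness": 0.15}

      def rank_score(r: Record, max_loans: int, this_year: int = date.today().year) -> float:
          popularity = r.loans_last_year / max_loans if max_loans else 0.0
          freshness = max(0.0, 1.0 - (this_year - r.pub_year) / 50.0)  # decay over ~50 years
          return (WEIGHTS["text"] * r.text_score
                  + WEIGHTS["popularity"] * popularity
                  + WEIGHTS["freshness"] * freshness)

      records = [
          Record("Old classic", 0.9, 120, 1975),
          Record("Recent textbook", 0.7, 300, 2008),
      ]
      max_loans = max(r.loans_last_year for r in records)
      for r in sorted(records, key=lambda r: rank_score(r, max_loans), reverse=True):
          print(f"{rank_score(r, max_loans):.3f}  {r.title}")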
  5. Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.01
    0.010994892 = product of:
      0.04947701 = sum of:
        0.034511987 = weight(_text_:data in 5123) [ClassicSimilarity], result of:
          0.034511987 = score(doc=5123,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.29644224 = fieldWeight in 5123, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=5123)
        0.014965023 = product of:
          0.029930046 = sum of:
            0.029930046 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
              0.029930046 = score(doc=5123,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.23214069 = fieldWeight in 5123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5123)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data on high-capacity storage media, from CD-ROM to multi-gigabyte media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have considerably slower random seek times than hard discs, so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random-access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.
    Date
    12. 9.1996 13:56:22
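    One of the strategies listed in entry 5, proximity searching, can be sketched with a positional index: each term maps to (document, position) pairs, and two terms match a document when some pair of their occurrences falls within a window. The index layout and sample data are simplified assumptions.

      from collections import defaultdict

      def build_positional_index(docs):
          """term -> {doc_id: [positions]} built from whitespace-tokenized text."""
          index = defaultdict(lambda: defaultdict(list))
          for doc_id, text in docs.items():
              for pos, term in enumerate(text.lower().split()):
                  index[term][doc_id].append(pos)
          return index

      def proximity_search(index, term_a, term_b, window):
          """Documents in which term_a and term_b occur within `window` words of each other."""
          hits = set()
          for doc_id in set(index.get(term_a, {})) & set(index.get(term_b, {})):
              pos_a, pos_b = index[term_a][doc_id], index[term_b][doc_id]
              if any(abs(a - b) <= window for a in pos_a for b in pos_b):
                  hits.add(doc_id)
          return hits

      docs = {
          "d1": "fuzzy searching and relevance ranking on CD-ROM",
          "d2": "relevance feedback improves ranking quality",
      }
      index = build_positional_index(docs)
      print(proximity_search(index, "relevance", "ranking", window=2))   # {'d1'}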
  6. Burgin, R.: ¬The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.01
    0.010598778 = product of:
      0.0476945 = sum of:
        0.035223648 = weight(_text_:data in 3365) [ClassicSimilarity], result of:
          0.035223648 = score(doc=3365,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.30255508 = fieldWeight in 3365, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3365)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 3365) [ClassicSimilarity], result of:
              0.024941705 = score(doc=3365,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 3365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3365)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    The retrieval effectiveness of 5 hierarchical clustering methods (single link, complete link, group average, Ward's method, and weighted average) is examined as a function of indexing exhaustivity with 4 test collections (CR, Cranfield, Medlars, and Time). Evaluations of retrieval effectiveness, based on 3 measures of optimal retrieval performance, confirm earlier findings that the performance of a retrieval system based on single link clustering varies as a function of indexing exhaustivity but fail to find similar patterns for the other clustering methods. The data also confirm earlier findings regarding the poor performance of single link clustering in a retrieval environment. The poor performance of single link clustering appears to derive from that method's tendency to produce a small number of large, ill-defined document clusters. By contrast, the data examined here found the retrieval performance of the other clustering methods to be generally comparable. The data presented also provide an opportunity to examine the theoretical limits of cluster-based retrieval and to compare these theoretical limits to the effectiveness of operational implementations. Performance standards of the 4 document collections examined were found to vary widely, and the effectiveness of operational implementations was found to be in the range defined as unacceptable. Further improvements in search strategies and document representations warrant investigation.
    Date
    22. 2.1996 11:20:06
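    Two of the five clustering methods compared in entry 6, single link and complete link, differ only in how the distance between clusters is defined. A naive agglomerative sketch (illustrative data, target cluster count as the stopping rule):

      def agglomerate(points, k, linkage="single"):
          """Naive agglomerative clustering down to k clusters.

          linkage='single' merges on the minimum pairwise distance between clusters,
          linkage='complete' on the maximum.
          """
          def dist(a, b):
              return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

          clusters = [[p] for p in points]
          while len(clusters) > k:
              best = None
              for i in range(len(clusters)):
                  for j in range(i + 1, len(clusters)):
                      d = [dist(a, b) for a in clusters[i] for b in clusters[j]]
                      score = min(d) if linkage == "single" else max(d)
                      if best is None or score < best[0]:
                          best = (score, i, j)
              _, i, j = best
              clusters[i] += clusters.pop(j)   # merge the closest pair of clusters
          return clusters

      docs = [(0.1, 0.2), (0.15, 0.22), (0.8, 0.9), (0.82, 0.88), (0.5, 0.5)]
      print(agglomerate(docs, k=2, linkage="single"))
      print(agglomerate(docs, k=2, linkage="complete"))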
  7. Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.01
    0.009590258 = product of:
      0.08631232 = sum of:
        0.08631232 = weight(_text_:bibliographic in 4846) [ClassicSimilarity], result of:
          0.08631232 = score(doc=4846,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.6021745 = fieldWeight in 4846, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.109375 = fieldNorm(doc=4846)
      0.11111111 = coord(1/9)
    
  8. Aho, A.; Corasick, M.: Efficient string matching : an aid to bibliographic search (1975) 0.01
    0.009590258 = product of:
      0.08631232 = sum of:
        0.08631232 = weight(_text_:bibliographic in 3506) [ClassicSimilarity], result of:
          0.08631232 = score(doc=3506,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.6021745 = fieldWeight in 3506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.109375 = fieldNorm(doc=3506)
      0.11111111 = coord(1/9)
    
  9. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.01
    0.009162409 = product of:
      0.041230842 = sum of:
        0.028759988 = weight(_text_:data in 56) [ClassicSimilarity], result of:
          0.028759988 = score(doc=56,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.24703519 = fieldWeight in 56, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.024941705 = score(doc=56,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
  10. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.01
    0.008748596 = product of:
      0.03936868 = sum of:
        0.024403658 = weight(_text_:data in 2419) [ClassicSimilarity], result of:
          0.024403658 = score(doc=2419,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.2096163 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.014965023 = product of:
          0.029930046 = sum of:
            0.029930046 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.029930046 = score(doc=2419,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil and followed by a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information-seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  11. Stanfill, C.: Parallel information retrieval algorithms (1992) 0.01
    0.008084184 = product of:
      0.072757654 = sum of:
        0.072757654 = weight(_text_:data in 3515) [ClassicSimilarity], result of:
          0.072757654 = score(doc=3515,freq=10.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.6249551 = fieldWeight in 3515, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3515)
      0.11111111 = coord(1/9)
    
    Abstract
    Data-parallel computers, such as the Connection Machine CM-2, can provide interactive access to text databases containing tens, hundreds or even thousands of gigabytes of data. Starts by presenting a brief overview of data-parallel computing, a performance model of the CM-2, and a model of the workload involved in searching text databases. Discusses various algorithms used in information retrieval and gives performance estimates based on the data and processing models presented.
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
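    A rough sketch of the data-parallel pattern entry 11 describes: partition the collection, score each partition independently, then merge the partial result lists into a global top-k. A thread pool stands in for the CM-2-style hardware, and the term-overlap scoring and data are placeholders.

      from concurrent.futures import ThreadPoolExecutor
      from heapq import nlargest

      # Placeholder collection: doc_id -> bag of terms (invented data).
      COLLECTION = {f"d{i}": {"data", "parallel", "retrieval"} - ({"parallel"} if i % 2 else set())
                    for i in range(1, 9)}

      def score_partition(partition, query):
          """Score one partition of the collection; simple term overlap as a stand-in."""
          return [(len(query & terms), doc_id) for doc_id, terms in partition]

      def parallel_search(query, workers=4, k=3):
          items = list(COLLECTION.items())
          chunk = -(-len(items) // workers)                        # ceiling division
          partitions = [items[i:i + chunk] for i in range(0, len(items), chunk)]
          with ThreadPoolExecutor(max_workers=workers) as pool:
              partials = pool.map(score_partition, partitions, [query] * len(partitions))
          merged = [hit for part in partials for hit in part]      # gather partial result lists
          return nlargest(k, merged)                               # merge step: global top-k

      print(parallel_search({"data", "parallel"}))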
  12. Baeza-Yates, R.A.: Introduction to data structures and algorithms related to information retrieval (1992) 0.01
    0.0078274775 = product of:
      0.070447296 = sum of:
        0.070447296 = weight(_text_:data in 3082) [ClassicSimilarity], result of:
          0.070447296 = score(doc=3082,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.60511017 = fieldWeight in 3082, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=3082)
      0.11111111 = coord(1/9)
    
    Abstract
    In this chapter we review the main concepts and data structures used in information retrieval, and we classify information retrieval related algorithms
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
  13. Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective (2012) 0.01
    0.0072904974 = product of:
      0.03280724 = sum of:
        0.020336384 = weight(_text_:data in 241) [ClassicSimilarity], result of:
          0.020336384 = score(doc=241,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 241, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
              0.024941705 = score(doc=241,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 241, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=241)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    We address how individuals' (workers) knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative of successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent with people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.
    Date
    11. 6.2012 14:22:34
  14. Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: ¬A unified maximum likelihood approach to document retrieval (2001) 0.01
    0.006063138 = product of:
      0.054568242 = sum of:
        0.054568242 = weight(_text_:data in 174) [ClassicSimilarity], result of:
          0.054568242 = score(doc=174,freq=10.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.46871632 = fieldWeight in 174, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=174)
      0.11111111 = coord(1/9)
    
    Abstract
    Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR.
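    The joint-estimation idea of entry 14 can be illustrated with a deliberately simple stand-in model: treat relevance judgments as Bernoulli outcomes of a logistic function of the query-document dot product and fit both sets of vectors by gradient ascent on the log-likelihood. This is only a sketch of the "estimate both sides from all the feedback data" principle, not the paper's estimator.

      import numpy as np

      rng = np.random.default_rng(0)
      D, Q, dim = 6, 3, 4                       # documents, queries, vector dimension
      docs = rng.normal(size=(D, dim))          # initial document vectors (e.g. from TF-IDF)
      queries = rng.normal(size=(Q, dim))       # initial query vectors

      # Relevance feedback: (query index, doc index, judged relevant?) - invented data.
      feedback = [(0, 0, 1), (0, 1, 0), (1, 2, 1), (1, 3, 1), (2, 4, 0), (2, 5, 1)]

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))

      lr = 0.1
      for _ in range(200):                      # stochastic gradient ascent on the log-likelihood
          for q, d, rel in feedback:
              q_vec, d_vec = queries[q].copy(), docs[d].copy()
              grad = rel - sigmoid(q_vec @ d_vec)     # d log-likelihood / d (q.d)
              queries[q] += lr * grad * d_vec         # update the query side ...
              docs[d] += lr * grad * q_vec            # ... and the document side jointly

      for q, d, rel in feedback:
          print(q, d, rel, round(float(sigmoid(queries[q] @ docs[d])), 2))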
  15. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.01
    0.005534862 = product of:
      0.04981376 = sum of:
        0.04981376 = weight(_text_:data in 4218) [ClassicSimilarity], result of:
          0.04981376 = score(doc=4218,freq=12.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.4278775 = fieldWeight in 4218, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
      0.11111111 = coord(1/9)
    
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to combine the advantages of both traditional Information Retrieval (IR) methods and the recently proposed supervised learning methods for IR. These advantages include the use of only a limited amount of labeled data and a rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.
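    A hedged sketch of the semi-supervised pipeline entry 15 describes: score unlabeled documents with a traditional IR model (a simplified BM25 here), turn confident scores into pseudo-labels, then train a learned ranker on labeled plus pseudo-labeled data. The BM25 constants, the confidence threshold and the logistic-regression stand-in for the neural network are assumptions for illustration.

      import math
      import numpy as np

      docs = ["semi supervised learning for ranking",
              "neural network ranking model",
              "bm25 is a classic retrieval model",
              "unrelated cooking recipe text"]
      query = "ranking model".split()

      def bm25(query, docs, k1=1.2, b=0.75):
          """Simplified BM25 scores for each document (the traditional IR model)."""
          tokenized = [d.split() for d in docs]
          avgdl = sum(len(t) for t in tokenized) / len(tokenized)
          N, scores = len(docs), []
          for toks in tokenized:
              s = 0.0
              for term in query:
                  df = sum(term in t for t in tokenized)
                  if df == 0:
                      continue
                  idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
                  tf = toks.count(term)
                  s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avgdl))
              scores.append(s)
          return scores

      # Self-training step: a little labeled data plus BM25-derived pseudo-labels.
      labeled = {0: 1, 3: 0}                        # doc index -> relevance judgment
      scores = bm25(query, docs)
      threshold = 0.5                               # assumed confidence cut-off
      pseudo = {i: int(s > threshold) for i, s in enumerate(scores) if i not in labeled}

      # Train a simple ranker (logistic regression as a stand-in for the neural network).
      all_labels = {**labeled, **pseudo}
      X = np.array([[scores[i], len(docs[i].split())] for i in all_labels], float)
      y = np.array(list(all_labels.values()), float)
      w = np.zeros(X.shape[1])
      for _ in range(500):
          p = 1 / (1 + np.exp(-X @ w))
          w += 0.1 * X.T @ (y - p) / len(y)

      print("pseudo-labels:", pseudo)
      print("ranker scores:", (X @ w).round(2))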
  16. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
    0.005480147 = product of:
      0.049321324 = sum of:
        0.049321324 = weight(_text_:bibliographic in 2564) [ClassicSimilarity], result of:
          0.049321324 = score(doc=2564,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.34409973 = fieldWeight in 2564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
      0.11111111 = coord(1/9)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
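    A minimal self-organizing map in the spirit of entry 16: document vectors are mapped onto a small two-dimensional grid by repeatedly pulling the best-matching unit and its neighbourhood towards each input, so that nearby cells end up holding similar documents. Grid size, learning schedule and the random vectors are illustrative; the paper works with vectorized LISA records.

      import numpy as np

      rng = np.random.default_rng(1)
      n_docs, dim = 40, 10
      docs = rng.random((n_docs, dim))            # stand-in for vectorized documents

      rows, cols = 4, 4                           # small 2-D Kohonen grid
      weights = rng.random((rows, cols, dim))
      grid = np.array([[(r, c) for c in range(cols)] for r in range(rows)], float)

      epochs, lr0, radius0 = 60, 0.5, 2.0
      for epoch in range(epochs):
          lr = lr0 * (1 - epoch / epochs)                      # decaying learning rate
          radius = max(radius0 * (1 - epoch / epochs), 0.5)    # shrinking neighbourhood
          for x in docs[rng.permutation(n_docs)]:
              # Best-matching unit: grid cell whose weight vector is closest to x.
              d = np.linalg.norm(weights - x, axis=2)
              bmu = np.unravel_index(np.argmin(d), d.shape)
              # Move the BMU and its neighbourhood towards the document vector.
              dist_to_bmu = np.linalg.norm(grid - np.array(bmu, float), axis=2)
              influence = np.exp(-(dist_to_bmu ** 2) / (2 * radius ** 2))
              weights += lr * influence[..., None] * (x - weights)

      # Each document lands on one grid cell; neighbouring cells hold similar documents.
      cells = [tuple(np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)),
                                      (rows, cols))) for x in docs]
      print(cells[:10])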
  17. Harman, D.; Fox, E.; Baeza-Yates, R.; Lee, W.: Inverted files (1992) 0.01
    0.005112887 = product of:
      0.04601598 = sum of:
        0.04601598 = weight(_text_:data in 3497) [ClassicSimilarity], result of:
          0.04601598 = score(doc=3497,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.3952563 = fieldWeight in 3497, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3497)
      0.11111111 = coord(1/9)
    
    Abstract
    This chapter presents a survey of the various structures (techniques) that can be used in building inverted files, and gives the details for producing an inverted file using sorted arrays. The chapter ends with 2 modifications to this basic method that are effective for large data collections.
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
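    The sorted-array construction surveyed in entry 17 can be sketched in a few lines: collect (term, document) pairs, sort them, and collapse the sorted run into parallel arrays of terms and postings that support binary-search lookup. The tokenization and sample data are simplified assumptions.

      from bisect import bisect_left

      docs = {
          1: "information retrieval data structures",
          2: "sorted arrays for inverted files",
          3: "inverted files support retrieval",
      }

      # 1. Collect (term, doc_id) pairs and sort them - the core of the sorted-array method.
      pairs = sorted({(term, doc_id) for doc_id, text in docs.items()
                      for term in text.lower().split()})

      # 2. Collapse the sorted run into two parallel arrays: terms and their postings lists.
      terms, postings = [], []
      for term, doc_id in pairs:
          if not terms or terms[-1] != term:
              terms.append(term)
              postings.append([])
          postings[-1].append(doc_id)

      def lookup(term):
          """Binary search the sorted term array, then return its postings list."""
          i = bisect_left(terms, term)
          return postings[i] if i < len(terms) and terms[i] == term else []

      print(lookup("retrieval"))   # [1, 3]
      print(lookup("inverted"))    # [2, 3]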
  18. Couvreur, T.R.; Benzel, R.N.; Miller, S.F.; Zeitler, D.N.; Lee, D.L.; Singhal, M.; Shivaratri, N.; Wong, W.Y.P.: ¬An analysis of performance and cost factors in searching large text databases using parallel search systems (1994) 0.00
    0.004795129 = product of:
      0.04315616 = sum of:
        0.04315616 = weight(_text_:bibliographic in 7657) [ClassicSimilarity], result of:
          0.04315616 = score(doc=7657,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.30108726 = fieldWeight in 7657, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7657)
      0.11111111 = coord(1/9)
    
    Abstract
    The results of modelling the performance of searching large text databases (>10 GBytes) via various parallel hardware architectures and search algorithms are discussed. The performance under load and the cost of each configuration are compared. Strengths, weaknesses, performance sensitivities, and search features supported for each configuration are also addressed. In addition, a common search workload used in the modelling is described. The search workload is derived from a set of searches run against the Chemical Abstracts file of bibliographic and abstract text available on STN International. This common workload is applied to all configurations modelled to provide a common basis of comparison
  19. French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.00
    0.0046964865 = product of:
      0.042268377 = sum of:
        0.042268377 = weight(_text_:data in 4811) [ClassicSimilarity], result of:
          0.042268377 = score(doc=4811,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.3630661 = fieldWeight in 4811, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4811)
      0.11111111 = coord(1/9)
    
    Abstract
    As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files
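    A small sketch of the approximate word matching idea in entry 19: group variant name strings whose word-level similarity exceeds a threshold, yielding candidate authority clusters. difflib's ratio is used as a stand-in for the paper's matching techniques, and the threshold and sample variants are assumptions.

      from difflib import SequenceMatcher

      variants = [
          "Astrophysics Data System",
          "Astrophysics Data Sytem",      # misspelling
          "Astrophys. Data System",       # abbreviation
          "Harvard University",
          "Harvard Univ.",
      ]

      def similarity(a, b):
          """Word-aware similarity: average best-match ratio over words of the shorter string."""
          wa, wb = a.lower().split(), b.lower().split()
          if len(wa) > len(wb):
              wa, wb = wb, wa
          best = [max(SequenceMatcher(None, x, y).ratio() for y in wb) for x in wa]
          return sum(best) / len(best)

      def cluster(strings, threshold=0.75):
          """Greedy single-pass clustering: attach each string to the first similar cluster."""
          clusters = []
          for s in strings:
              for c in clusters:
                  if similarity(s, c[0]) >= threshold:
                      c.append(s)
                      break
              else:
                  clusters.append([s])
          return clusters

      for c in cluster(variants):
          print(c)   # the ADS variants group together; the two Harvard forms make a second group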
  20. Bodoff, D.; Robertson, S.: ¬A new unified probabilistic model (2004) 0.00
    0.0046964865 = product of:
      0.042268377 = sum of:
        0.042268377 = weight(_text_:data in 2129) [ClassicSimilarity], result of:
          0.042268377 = score(doc=2129,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.3630661 = fieldWeight in 2129, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2129)
      0.11111111 = coord(1/9)
    
    Abstract
    This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0, not found in Model 3, is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3, not found in Model 0, is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval.

Languages

  • e 106
  • d 5

Types

  • a 99
  • m 8
  • el 3
  • s 3
  • r 1