Search (3 results, page 1 of 1)

  • × theme_ss:"Retrievalalgorithmen"
  • × type_ss:"el"
  1. Lewandowski, D.: How can library materials be ranked in the OPAC? (2009) 0.02
    0.020067489 = product of:
      0.050168723 = sum of:
        0.03772483 = weight(_text_:bibliographic in 2810) [ClassicSimilarity], result of:
          0.03772483 = score(doc=2810,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.21506234 = fieldWeight in 2810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2810)
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 2810) [ClassicSimilarity], result of:
              0.024887787 = score(doc=2810,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 2810, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2810)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Some Online Public Access Catalogues offer a ranking component. However, ranking there is merely text-based and is doomed to fail due to limited text in bibliographic data. The main assumption for the talk is that we are in a situation where the appropriate ranking factors for OPACs should be defined, while the implementation is no major problem. We must define what we want, and not so much focus on the technical work. Some deep thinking is necessary on the "perfect results set" and how we can achieve it through ranking. The talk presents a set of potential ranking factors and clustering possibilities for further discussion. A look at commercial Web search engines could provide us with ideas how ranking can be improved with additional factors. Search engines are way beyond pure text-based ranking and apply ranking factors in the groups like popularity, freshness, personalisation, etc. The talk describes the main factors used in search engines and how derivatives of these could be used for libraries' purposes. The goal of ranking is to provide the user with the best-suitable results on top of the results list. How can this goal be achieved with the library catalogue and also concerning the library's different collections and databases? The assumption is that ranking of such materials is a complex problem and is yet nowhere near solved. Libraries should focus on ranking to improve user experience.
  2. Qi, Q.; Hessen, D.J.; Heijden, P.G.M. van der: Improving information retrieval through correspondenceanalysis instead of latent semantic analysis (2023) 0.00
    0.0042235977 = product of:
      0.021117989 = sum of:
        0.021117989 = product of:
          0.042235978 = sum of:
            0.042235978 = weight(_text_:data in 1045) [ClassicSimilarity], result of:
              0.042235978 = score(doc=1045,freq=4.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.29644224 = fieldWeight in 1045, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1045)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrixhave been shown to mainly display marginal effects, which are irrelevant for informationretrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted.An alternative information retrieval technique that ignores the marginal effects is correspon-dence analysis (CA). In this paper, the information retrieval performance of LSA and CA isempirically compared. Moreover, it is explored whether the two weightings also improve theperformance of CA. The results for four empirical datasets show that CA always performsbetter than LSA. Weighting the elements of the raw data matrix can improve CA; however,it is data dependent and the improvement is small. Adjusting the singular value weightingexponent often improves the performance of CA; however, the extent of the improvementdepends on the dataset and the number of dimensions. (PDF) Improving information retrieval through correspondence analysis instead of latent semantic analysis.
  3. Robertson, S.E.; Sparck Jones, K.: Simple, proven approaches to text retrieval (1997) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 4532) [ClassicSimilarity], result of:
              0.024887787 = score(doc=4532,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 4532, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4532)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only title and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC Tests (see Harman 1993 - 1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. These techniques depend an the use of simple terms for indexing both request and document texts; an term weighting exploiting statistical information about term occurrences; an scoring for request-document matching, using these weights, to obtain a ranked search output; and an relevance feedback to modify request weights or term sets in iterative searching. The normal implementation is via an inverted file organisation using a term list with linked document identifiers, plus counting data, and pointers to the actual texts. The user's request can be a word list, phrases, sentences or extended text.