Search (70 results, page 1 of 4)

  • × theme_ss:"Retrievalalgorithmen"
  1. Koumenides, C.L.; Shadbolt, N.R.: Ranking methods for entity-oriented semantic web search (2014) 0.09
    0.08781542 = product of:
      0.13172312 = sum of:
        0.10759281 = weight(_text_:systematic in 1280) [ClassicSimilarity], result of:
          0.10759281 = score(doc=1280,freq=2.0), product of:
            0.28397155 = queryWeight, product of:
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.049684696 = queryNorm
            0.3788859 = fieldWeight in 1280, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.046875 = fieldNorm(doc=1280)
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 1280) [ClassicSimilarity], result of:
              0.048260607 = score(doc=1280,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 1280, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1280)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This article provides a technical review of semantic search methods used to support text-based search over formal Semantic Web knowledge bases. Our focus is on ranking methods and auxiliary processes explored by existing semantic search systems, outlined within broad areas of classification. We present reflective examples from the literature in some detail, which should appeal to readers interested in a deeper perspective on the various methods and systems implemented in the outlined literature. The presentation covers graph exploration and propagation methods, adaptations of classic probabilistic retrieval models, and query-independent link analysis via flexible extensions to the PageRank algorithm. Future research directions are discussed, including development of more cohesive retrieval models to unlock further potentials and uses, data indexing schemes, integration with user interfaces, and building community consensus for more systematic evaluation and gradual development.
  2. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.09
    0.08519173 = product of:
      0.12778759 = sum of:
        0.10759281 = weight(_text_:systematic in 2419) [ClassicSimilarity], result of:
          0.10759281 = score(doc=2419,freq=2.0), product of:
            0.28397155 = queryWeight, product of:
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.049684696 = queryNorm
            0.3788859 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.04038954 = score(doc=2419,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection over information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  3. Davis, C.H.; McKim, G.W.: Systematic weighting and ranking : cutting the Gordian knot (1999) 0.04
    0.041841652 = product of:
      0.12552495 = sum of:
        0.12552495 = weight(_text_:systematic in 3548) [ClassicSimilarity], result of:
          0.12552495 = score(doc=3548,freq=2.0), product of:
            0.28397155 = queryWeight, product of:
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.049684696 = queryNorm
            0.44203353 = fieldWeight in 3548, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3548)
      0.33333334 = coord(1/3)
    
  4. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.03
    0.03447506 = product of:
      0.103425175 = sum of:
        0.103425175 = sum of:
          0.05630404 = weight(_text_:indexing in 3276) [ClassicSimilarity], result of:
            0.05630404 = score(doc=3276,freq=2.0), product of:
              0.19018644 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.049684696 = queryNorm
              0.29604656 = fieldWeight in 3276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3276)
          0.047121134 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
            0.047121134 = score(doc=3276,freq=2.0), product of:
              0.17398734 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049684696 = queryNorm
              0.2708308 = fieldWeight in 3276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3276)
      0.33333334 = coord(1/3)
    
    Abstract
    Im Rahmen des klassischen Information Retrieval wurden verschiedene Verfahren für das Ranking sowie die Suche in einer homogenen strukturlosen Dokumentenmenge entwickelt. Die Erfolge der Suchmaschine Google haben gezeigt dass die Suche in einer zwar inhomogenen aber zusammenhängenden Dokumentenmenge wie dem Internet unter Berücksichtigung der Dokumentenverbindungen (Links) sehr effektiv sein kann. Unter den von der Suchmaschine Google realisierten Konzepten ist ein Verfahren zum Ranking von Suchergebnissen (PageRank), das in diesem Artikel kurz erklärt wird. Darüber hinaus wird auf die Konzepte eines Systems namens CiteSeer eingegangen, welches automatisch bibliographische Angaben indexiert (engl. Autonomous Citation Indexing, ACI). Letzteres erzeugt aus einer Menge von nicht vernetzten wissenschaftlichen Dokumenten eine zusammenhängende Dokumentenmenge und ermöglicht den Einsatz von Banking-Verfahren, die auf den von Google genutzten Verfahren basieren.
    Date
    20. 3.2005 16:23:22
  5. Burgin, R.: ¬The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.03
    0.034438714 = product of:
      0.103316136 = sum of:
        0.103316136 = sum of:
          0.06965818 = weight(_text_:indexing in 3365) [ClassicSimilarity], result of:
            0.06965818 = score(doc=3365,freq=6.0), product of:
              0.19018644 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.049684696 = queryNorm
              0.3662626 = fieldWeight in 3365, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3365)
          0.033657953 = weight(_text_:22 in 3365) [ClassicSimilarity], result of:
            0.033657953 = score(doc=3365,freq=2.0), product of:
              0.17398734 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049684696 = queryNorm
              0.19345059 = fieldWeight in 3365, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3365)
      0.33333334 = coord(1/3)
    
    Abstract
    The retrieval effectiveness of 5 hierarchical clustering methods (single link, complete link, group average, Ward's method, and weighted average) is examined as a function of indexing exhaustivity with 4 test collections (CR, Cranfield, Medlars, and Time). Evaluations of retrieval effectiveness, based on 3 measures of optimal retrieval performance, confirm earlier findings that the performance of a retrieval system based on single link clustering varies as a function of indexing exhaustivity but fail ti find similar patterns for other clustering methods. The data also confirm earlier findings regarding the poor performance of single link clustering is a retrieval environment. The poor performance of single link clustering appears to derive from that method's tendency to produce a small number of large, ill defined document clusters. By contrast, the data examined here found the retrieval performance of the other clustering methods to be general comparable. The data presented also provides an opportunity to examine the theoretical limits of cluster based retrieval and to compare these theoretical limits to the effectiveness of operational implementations. Performance standards of the 4 document collections examined were found to vary widely, and the effectiveness of operational implementations were found to be in the range defined as unacceptable. Further improvements in search strategies and document representations warrant investigations
    Date
    22. 2.1996 11:20:06
  6. Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.03
    0.029886894 = product of:
      0.08966068 = sum of:
        0.08966068 = weight(_text_:systematic in 1427) [ClassicSimilarity], result of:
          0.08966068 = score(doc=1427,freq=2.0), product of:
            0.28397155 = queryWeight, product of:
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.049684696 = queryNorm
            0.31573826 = fieldWeight in 1427, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1427)
      0.33333334 = coord(1/3)
    
    Abstract
    Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information an the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document score, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web, and as such, should not be ignored. The development of abstract models of the IR task was a key factor to the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason an the existing techniques. This article proposes a general model for using hyperlinks based an Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model will allow to discover some inconsistencies in the mentioned techniques, and to take a higher level and systematic approach for using hyperlinks for retrieval.
  7. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.03
    0.029550051 = product of:
      0.08865015 = sum of:
        0.08865015 = sum of:
          0.048260607 = weight(_text_:indexing in 6973) [ClassicSimilarity], result of:
            0.048260607 = score(doc=6973,freq=2.0), product of:
              0.19018644 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.049684696 = queryNorm
              0.2537542 = fieldWeight in 6973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.046875 = fieldNorm(doc=6973)
          0.04038954 = weight(_text_:22 in 6973) [ClassicSimilarity], result of:
            0.04038954 = score(doc=6973,freq=2.0), product of:
              0.17398734 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049684696 = queryNorm
              0.23214069 = fieldWeight in 6973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=6973)
      0.33333334 = coord(1/3)
    
    Abstract
    Proposes that signature files be used as a viable alternative to other indexing strategies such as inverted files for searching through large volumes of text. Demonstrates through simulation, that search times can be further reduced by enhancing the basic signature file concept using deterministic partitioning algorithms which eliminate the need for an exhaustive search of the entire signature file. Reports research to evaluate the performance of some deterministic partitioning algorithms in a non simulated environment using 276 MB of raw newspaper text (taken from the Wall Street Journal) and real user queries. Presents a selection of results to illustrate trends and highlight important aspects of the performance of these methods under realistic rather than simulated operating conditions. As a result of the research reported here certain aspects of this approach to signature files are shown to be found wanting and require improvement. Suggests lines of future research on the partitioning of signature files
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  8. Chang, R.: Keyword searching and indexing (1993) 0.02
    0.02398089 = product of:
      0.071942665 = sum of:
        0.071942665 = product of:
          0.14388533 = sum of:
            0.14388533 = weight(_text_:indexing in 7223) [ClassicSimilarity], result of:
              0.14388533 = score(doc=7223,freq=10.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.7565488 = fieldWeight in 7223, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7223)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Explains how a computer indexing system works. Reviews fundamentals of how data are stored and retrieved by computers. Describes B-Tree and B+-Tree indexing structures. Gives basic keyword searching techniques that the user must apply to make use of the indexing programs. The demand for keyword retrieval is increasing and librarians should expect to see the keyword-indexing feature become commonly available
  9. Abdelkareem, M.A.A.: In terms of publication index, what indicator is the best for researchers indexing, Google Scholar, Scopus, Clarivate or others? (2018) 0.02
    0.020983277 = product of:
      0.06294983 = sum of:
        0.06294983 = product of:
          0.12589966 = sum of:
            0.12589966 = weight(_text_:indexing in 4548) [ClassicSimilarity], result of:
              0.12589966 = score(doc=4548,freq=10.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.6619802 = fieldWeight in 4548, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4548)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    I believe that Google Scholar is the most popular academic indexing way for researchers and citations. However, some other indexing institutions may be more professional than Google Scholar but not as popular as Google Scholar. Other indexing websites like Scopus and Clarivate are providing more statistical figures for scholars, institutions or even journals. On account of publication citations, always Google Scholar shows higher citations for a paper than other indexing websites since Google Scholar consider most of the publication platforms so he can easily count the citations. While other databases just consider the citations come from those journals that are already indexed in their database
  10. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.01795091 = product of:
      0.053852726 = sum of:
        0.053852726 = product of:
          0.10770545 = sum of:
            0.10770545 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.10770545 = score(doc=402,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  11. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.02
    0.016086869 = product of:
      0.048260607 = sum of:
        0.048260607 = product of:
          0.09652121 = sum of:
            0.09652121 = weight(_text_:indexing in 651) [ClassicSimilarity], result of:
              0.09652121 = score(doc=651,freq=8.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.5075084 = fieldWeight in 651, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=651)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi-gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach - We use standard measures used in parallel computing such as speedup and efficiency to examine the computing results and also the space costs of our trial indexing experiments. Findings - The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications - The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value - The paper is of value to database administrators who manage large-scale text collections, and who need to use parallel computing to implement their text retrieval services.
  12. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.015707046 = product of:
      0.047121134 = sum of:
        0.047121134 = product of:
          0.09424227 = sum of:
            0.09424227 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.09424227 = score(doc=2134,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    30. 3.2001 13:32:22
  13. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.015707046 = product of:
      0.047121134 = sum of:
        0.047121134 = product of:
          0.09424227 = sum of:
            0.09424227 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.09424227 = score(doc=3445,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    25. 8.2005 17:42:22
  14. Chang, R.: ¬The development of indexing technology (1993) 0.02
    0.015166845 = product of:
      0.045500536 = sum of:
        0.045500536 = product of:
          0.09100107 = sum of:
            0.09100107 = weight(_text_:indexing in 7024) [ClassicSimilarity], result of:
              0.09100107 = score(doc=7024,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.47848347 = fieldWeight in 7024, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7024)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Reviews the basic techniques of computerized indexing, including various file accessing methods such as: Sequential Access Method (SAM); Direct Access Method (DAM); Indexed Sequential Access Method (ISAM), and Virtual Indexed Sequential Access Method (VSAM); and various B-tree (balanced tree)structures. Illustrates how records are stored and accessed, and how B-trees are used to for improving the operations of information retrieval and maintenance
  15. Frakes, W.B.: Stemming algorithms (1992) 0.02
    0.015166845 = product of:
      0.045500536 = sum of:
        0.045500536 = product of:
          0.09100107 = sum of:
            0.09100107 = weight(_text_:indexing in 3503) [ClassicSimilarity], result of:
              0.09100107 = score(doc=3503,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.47848347 = fieldWeight in 3503, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3503)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Desribes stemming algorithms - programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are describes - table lookup, affix removal, successor variety, and n-gram. empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented
  16. Maron, M.E.: ¬An historical note on the origins of probabilistic indexing (2008) 0.02
    0.015166845 = product of:
      0.045500536 = sum of:
        0.045500536 = product of:
          0.09100107 = sum of:
            0.09100107 = weight(_text_:indexing in 2047) [ClassicSimilarity], result of:
              0.09100107 = score(doc=2047,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.47848347 = fieldWeight in 2047, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2047)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The motivation behind "Probabilistic Indexing" was to replace two-valued thinking about information retrieval with probabilistic notions. This involved a new view of the information retrieval problem - viewing it as problem of inference and prediction, and introducing probabilistically weighted indexes and probabilistically ranked output. These ideas were first formulated and written up in August 1958.
  17. Thompson, P.: Looking back: on relevance, probabilistic indexing and information retrieval (2008) 0.02
    0.015166845 = product of:
      0.045500536 = sum of:
        0.045500536 = product of:
          0.09100107 = sum of:
            0.09100107 = weight(_text_:indexing in 2074) [ClassicSimilarity], result of:
              0.09100107 = score(doc=2074,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.47848347 = fieldWeight in 2074, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2074)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Forty-eight years ago Maron and Kuhns published their paper, "On Relevance, Probabilistic Indexing and Information Retrieval" (1960). This was the first paper to present a probabilistic approach to information retrieval, and perhaps the first paper on ranked retrieval. Although it is one of the most widely cited papers in the field of information retrieval, many researchers today may not be familiar with its influence. This paper describes the Maron and Kuhns article and the influence that it has had on the field of information retrieval.
  18. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 0.01
    0.013931636 = product of:
      0.041794907 = sum of:
        0.041794907 = product of:
          0.083589815 = sum of:
            0.083589815 = weight(_text_:indexing in 2020) [ClassicSimilarity], result of:
              0.083589815 = score(doc=2020,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.4395151 = fieldWeight in 2020, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2020)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
    Object
    Latent semantic indexing
  19. Deerwester, S.; Dumais, S.; Landauer, T.; Furnass, G.; Beck, L.: Improving information retrieval with latent semantic indexing (1988) 0.01
    0.013931636 = product of:
      0.041794907 = sum of:
        0.041794907 = product of:
          0.083589815 = sum of:
            0.083589815 = weight(_text_:indexing in 2396) [ClassicSimilarity], result of:
              0.083589815 = score(doc=2396,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.4395151 = fieldWeight in 2396, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2396)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Describes a latent semantic indexing (LSI) approach for improving information retrieval. Most document retrieval systems depend on matching keywords in queries against those in documents. The LSI approach tries to overcome the incompleteness and imprecision of latent relations among terms and documents. Tested performance of the LSI method ranged from considerably better than to roughly comparable to performance based on weighted keyword matching, apparently depending on the quality of the queries. Best LSI performance was found using a global entropy weighting for terms and about 100 dimensions for representing terms, documents and queries.
    Object
    Latent Semantic Indexing
  20. Deerwester, S.C.; Dumais, S.T.; Landauer, T.K.; Furnas, G.W.; Harshman, R.A.: Indexing by latent semantic analysis (1990) 0.01
    0.013931636 = product of:
      0.041794907 = sum of:
        0.041794907 = product of:
          0.083589815 = sum of:
            0.083589815 = weight(_text_:indexing in 2399) [ClassicSimilarity], result of:
              0.083589815 = score(doc=2399,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.4395151 = fieldWeight in 2399, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2399)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
    Object
    Latent Semantic Indexing

Years

Languages

  • e 64
  • d 5
  • chi 1
  • More… Less…

Types

  • a 58
  • m 8
  • s 3
  • el 2
  • r 1
  • More… Less…