Search (129 results, page 1 of 7)

  • theme_ss:"Retrievalalgorithmen"
  1. Vechtomova, O.; Karamuftuoglu, M.: Lexical cohesion and term proximity in document ranking (2008) 0.02
    
    Abstract
    We demonstrate effective new methods of document ranking based on lexical cohesive relationships between query terms. The proposed methods rely solely on the lexical relationships between the original query terms, and do not involve query expansion or relevance feedback. Two types of lexical cohesive relationship information between query terms are used in document ranking: a short-distance collocation relationship between query terms, and a long-distance relationship, determined by the collocation of query terms with other words. The methods are evaluated on TREC corpora, and show improvements over baseline systems. (A toy proximity-scoring sketch follows this entry.)
    Date
    1. 8.2008 12:29:05
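
The core idea above, rewarding documents in which query terms occur close together without query expansion or feedback, can be illustrated with a toy scorer. This sketch is not the authors' actual model: the minimum-window heuristic and the 1/window score are invented for illustration.

```python
def min_window(positions_by_term):
    """Smallest token span covering at least one occurrence of every term.

    positions_by_term: one sorted position list per query term.
    Returns infinity if some term is absent from the document.
    """
    if any(not p for p in positions_by_term):
        return float("inf")
    # Merge all (position, term_id) pairs and slide a window over them.
    events = sorted((pos, t) for t, plist in enumerate(positions_by_term)
                    for pos in plist)
    need, have, counts = len(positions_by_term), 0, {}
    best, left = float("inf"), 0
    for right in range(len(events)):
        t = events[right][1]
        counts[t] = counts.get(t, 0) + 1
        if counts[t] == 1:
            have += 1
        while have == need:          # window currently covers all terms
            best = min(best, events[right][0] - events[left][0] + 1)
            t_left = events[left][1]
            counts[t_left] -= 1
            if counts[t_left] == 0:
                have -= 1
            left += 1
    return best

def proximity_score(doc_tokens, query_terms):
    positions = [[i for i, w in enumerate(doc_tokens) if w == t]
                 for t in query_terms]
    w = min_window(positions)
    return 0.0 if w == float("inf") else 1.0 / w

doc = "term proximity helps rank documents by term distance".split()
print(proximity_score(doc, ["term", "rank"]))   # 0.25 (window of 4 tokens)
```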
  2. Faloutsos, C.: Signature files (1992) 0.02
    
    Abstract
    Presents a survey and discussion of signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods; provides a classification of the signature methods that have appeared in the literature; describes the main representatives of each class, together with their relative advantages and drawbacks; and gives a list of applications as well as commercial or university prototypes that use the signature approach. (A toy signature-file sketch follows this entry.)
    Date
    7. 5.1999 15:22:48
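
The signature idea the survey describes fits in a few lines. A minimal superimposed-coding sketch, with invented parameters (64-bit signatures, 3 hash bits per word): each document's word hashes are OR-ed into one bitmask, a query signature selects candidate documents, and a verification pass against the raw text removes false drops.

```python
import hashlib

SIG_BITS = 64       # signature width (illustrative choice)
BITS_PER_WORD = 3   # hash bits set per word (illustrative choice)

def word_mask(word: str) -> int:
    mask = 0
    for i in range(BITS_PER_WORD):
        h = hashlib.md5(f"{word}:{i}".encode()).digest()
        mask |= 1 << (int.from_bytes(h[:4], "big") % SIG_BITS)
    return mask

def signature(text: str) -> int:
    sig = 0
    for w in text.lower().split():
        sig |= word_mask(w)         # superimposed coding: OR word masks
    return sig

docs = ["signature files support fast text retrieval",
        "inverted files are the main alternative"]
sigs = [signature(d) for d in docs]

query = "signature retrieval"
qsig = signature(query)
# A document is a candidate iff its signature contains every query bit.
candidates = [i for i, s in enumerate(sigs) if s & qsig == qsig]
# Verification removes false drops caused by hash collisions.
hits = [i for i in candidates
        if all(w in docs[i].split() for w in query.split())]
print(candidates, hits)
```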
  3. Wong, S.K.M.: On modelling information retrieval with probabilistic inference (1995) 0.02
    
    Abstract
    Examines and extends the logical models of information retrieval in the context of probability theory and applies these fundamental ideas to term weighting and relevance. Develops a unified framework for modelling the retrieval process with probabilistic inference to provide a common conceptual and mathematical basis for many retrieval models, such as Boolean, fuzzy set, vector space, and conventional probabilistic models. Employs this framework to identify the underlying assumptions made by each model and analyzes the inherent relationships between them. Although the treatment is primarily theoretical, practical methods for estimating the required probabilities are provided through simple examples.
  4. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.02
    
    Abstract
    Relevance feedback consists of automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical framework on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probability propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested in a preliminary experimentation with different standard document collections. (A toy evidence-propagation sketch follows this entry.)
    Date
    22. 3.2003 19:30:19
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
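
As a loose illustration of the idea, not the authors' exact model or propagation algorithm, the sketch below instantiates "partial evidence" on the term nodes of a two-layer term-document network and recomputes document posteriors with a single linear propagation step. All weights and probabilities are invented.

```python
# Toy two-layer network: document scores propagated linearly from term
# probabilities, a common simplification in Bayesian-network IR models.
terms = ["bayes", "network", "feedback", "query"]
# w[t][d]: contribution of term t to document d (invented weights).
w = {
    "bayes":    {"d1": 0.6, "d2": 0.1},
    "network":  {"d1": 0.5, "d2": 0.2},
    "feedback": {"d1": 0.1, "d2": 0.7},
    "query":    {"d1": 0.2, "d2": 0.4},
}

def posteriors(term_prob):
    """One linear propagation step: score(d) = sum_t P(t) * w[t][d]."""
    docs = {"d1": 0.0, "d2": 0.0}
    for t, p in term_prob.items():
        for d, wt in w[t].items():
            docs[d] += p * wt
    return docs

# The original query instantiates its own terms as hard evidence.
evidence = {t: 0.0 for t in terms}
evidence.update({"bayes": 1.0, "network": 1.0})
print("before feedback:", posteriors(evidence))

# Partial evidence: the user judged a document about feedback relevant,
# so its salient term is added as soft evidence and inference is re-run.
evidence.update({"feedback": 0.8})
print("after feedback: ", posteriors(evidence))
```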
  5. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.02
    
    Abstract
    Purpose: In a system-based approach, replicating the web would require large test collections, and judging the relevance of all documents per topic when creating relevance judgments through human assessors is infeasible. Because of the large number of documents requiring judgment, human assessors may also introduce errors through disagreement. The paper aims to discuss these issues. Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods overcome the problem of judging large numbers of documents while avoiding errors from human disagreement. The study utilizes two key factors to generate the alternate methods: the number of occurrences of each document per topic across all system runs, and document rankings. Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems, based on mean average precision scores, between the original Text REtrieval Conference (TREC) relevance judgments and the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative for reducing the human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value: The simple methods proposed in this study improve the correlation coefficient when generating alternate relevance judgments without human assessors, contributing to information retrieval evaluation. (A toy occurrence-counting sketch follows this entry.)
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
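
A simplified reading of the occurrence-counting idea, with invented run data: pool the top results of all system runs, count how often each document is retrieved, and treat documents retrieved by a majority of systems as pseudo-relevant.

```python
from collections import Counter

# Ranked runs from three hypothetical systems for one topic.
runs = [
    ["d3", "d1", "d7", "d2", "d9"],
    ["d1", "d3", "d2", "d8", "d7"],
    ["d3", "d2", "d1", "d5", "d7"],
]
POOL_DEPTH = 4  # illustrative; the paper reports results for depth 100

# Count occurrences of each document in the top-k of every run.
votes = Counter(doc for run in runs for doc in run[:POOL_DEPTH])

# Documents retrieved by a majority of systems become pseudo-relevant,
# standing in for human TREC-style judgments.
threshold = len(runs) / 2
pseudo_qrels = {doc for doc, n in votes.items() if n > threshold}
print(sorted(pseudo_qrels))   # ['d1', 'd2', 'd3']
```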
  6. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.02
    
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.
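
Schematically, the labeling loop looks like the sketch below, with BM25 replaced by a bare linear scorer and the neural network by a perceptron; the thresholds, features, and data are all invented, and the paper's theory-derived stopping criterion is only gestured at in a comment.

```python
# Schematic semi-supervised ranking: an unsupervised scorer labels the
# unlabeled (query, doc) pairs it is confident about; all labels then
# train a supervised ranker.
labeled = [((0.9, 0.7), 1), ((0.1, 0.2), 0)]        # (features, relevance)
unlabeled = [(0.8, 0.9), (0.05, 0.1), (0.5, 0.5)]

def unsupervised_score(x):
    return 0.5 * x[0] + 0.5 * x[1]   # toy stand-in for BM25

HI, LO = 0.7, 0.2                    # confidence thresholds (invented)
for x in unlabeled:
    s = unsupervised_score(x)
    if s >= HI:
        labeled.append((x, 1))       # confidently relevant
    elif s <= LO:
        labeled.append((x, 0))       # confidently non-relevant
    # middling scores stay unlabeled; SSRank's stopping criterion
    # bounds how much of this self-labeling is safe

# Train a trivial linear ranker (perceptron) on the enlarged set,
# standing in for the paper's neural network.
w = [0.0, 0.0]
for _ in range(100):
    for x, y in labeled:
        pred = 1 if w[0] * x[0] + w[1] * x[1] > 0.5 else 0
        err = y - pred
        w = [w[0] + 0.1 * err * x[0], w[1] + 0.1 * err * x[1]]
print(w)
```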
  7. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.02
    
    Abstract
    Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
    Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
  8. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.02
    
    Abstract
    In this paper, methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed is varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described. (A toy data-parallel scoring sketch follows this entry.)
    Date
    20. 1.2007 18:30:22
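
The parallelization pattern, distributing passage scoring across workers and merging partial results, is generic enough to sketch; this is a plain data-parallel version with a toy overlap score, not the Okapi-specific algorithm from the paper.

```python
from multiprocessing import Pool

def score_passage(args):
    """Toy passage score: query-term overlap, a stand-in for the real model."""
    passage, query_terms = args
    words = set(passage.lower().split())
    return sum(1 for t in query_terms if t in words)

if __name__ == "__main__":
    passages = [
        "parallel computing speeds up retrieval",
        "passage retrieval splits documents into windows",
        "okapi experiments at trec used this algorithm",
    ]
    query = ["passage", "retrieval"]
    # Scoring is embarrassingly parallel: each worker handles a slice of
    # the passages; the worker count here is illustrative.
    with Pool(processes=2) as pool:
        scores = pool.map(score_passage, [(p, query) for p in passages])
    ranked = sorted(zip(scores, passages), reverse=True)
    print(ranked[0])
```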
  9. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.02
    
    Date
    22. 3.2003 19:27:23
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
  10. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.02
    
    Abstract
    An inverted index stores, for each term that appears in a collection of documents, a list of the document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random-access storage, and traditional disc-based methods require large amounts of temporary file space. Describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way merge sort. The new technique has been used to invert a 2-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disc space and less than 20 megabytes of main memory. (A toy gap-compression sketch follows this entry.)
    Date
    27.11.1995 21:29:58
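
The compression ingredient, "simple compression codes for the positive integers" applied to the gaps between document numbers, can be shown in isolation. A sketch using Elias gamma coding (a standard choice for this purpose, though the abstract does not name a specific code); the in-place external merge sort is omitted.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, as a bit string."""
    b = bin(n)[2:]                 # binary representation, no '0b' prefix
    return "0" * (len(b) - 1) + b  # unary length prefix + binary value

def encode_postings(doc_ids):
    """Gamma-code the gaps between sorted document numbers."""
    bits, prev = [], 0
    for d in sorted(doc_ids):
        bits.append(gamma_encode(d - prev))   # small gaps -> short codes
        prev = d
    return "".join(bits)

def decode_postings(bits):
    out, prev, i = [], 0, 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":      # count the unary length prefix
            zeros += 1
            i += 1
        gap = int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        prev += gap                # gaps accumulate back into doc numbers
        out.append(prev)
    return out

postings = [3, 7, 8, 20]
bits = encode_postings(postings)
assert decode_postings(bits) == postings
print(bits)                        # 0110010010001100
```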
  11. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.02
    
    Abstract
    Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking is clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
  12. Burgin, R.: ¬The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.01
    
    Abstract
    The retrieval effectiveness of 5 hierarchical clustering methods (single link, complete link, group average, Ward's method, and weighted average) is examined as a function of indexing exhaustivity with 4 test collections (CR, Cranfield, Medlars, and Time). Evaluations of retrieval effectiveness, based on 3 measures of optimal retrieval performance, confirm earlier findings that the performance of a retrieval system based on single link clustering varies as a function of indexing exhaustivity, but fail to find similar patterns for other clustering methods. The data also confirm earlier findings regarding the poor performance of single link clustering in a retrieval environment. The poor performance of single link clustering appears to derive from that method's tendency to produce a small number of large, ill-defined document clusters. By contrast, the data examined here found the retrieval performance of the other clustering methods to be generally comparable. The data presented also provide an opportunity to examine the theoretical limits of cluster-based retrieval and to compare these theoretical limits to the effectiveness of operational implementations. Performance standards of the 4 document collections examined were found to vary widely, and the effectiveness of operational implementations was found to be in the range defined as unacceptable. Further improvements in search strategies and document representations warrant investigation. (A toy single-link chaining sketch follows this entry.)
    Date
    22. 2.1996 11:20:06
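
The chaining behaviour blamed here for single link's poor retrieval performance is easy to reproduce. A minimal single-link agglomeration over an invented similarity matrix: three strong pairwise links pull four documents into one large, ill-defined cluster even though the endpoints are barely similar.

```python
# Minimal single-link clustering: repeatedly merge the two clusters
# whose *closest* members are most similar. Similarities are invented.
sim = {
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.7,
    ("a", "c"): 0.2, ("a", "d"): 0.1, ("b", "d"): 0.2,
}

def s(x, y):
    return sim.get((x, y), sim.get((y, x), 0.0))

clusters = [{"a"}, {"b"}, {"c"}, {"d"}]
THRESHOLD = 0.6   # stop merging below this similarity (illustrative)
while True:
    best, pair = 0.0, None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            # single link: similarity of the closest pair of members
            link = max(s(x, y) for x in clusters[i] for y in clusters[j])
            if link > best:
                best, pair = link, (i, j)
    if pair is None or best < THRESHOLD:
        break
    i, j = pair
    clusters[i] |= clusters[j]
    del clusters[j]

# Chaining: a-b (0.9), b-c (0.8), c-d (0.7) pull everything into one
# cluster even though a and d have similarity 0.1.
print(clusters)   # [{'a', 'b', 'c', 'd'}]
```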
  13. Carpineto, C.; Romano, G.: Order-theoretical ranking (2000) 0.01
    
    Abstract
    Current best-match ranking (BMR) systems perform well but cannot handle word mismatch between a query and a document. The best known alternative ranking method, hierarchical clustering-based ranking (HCR), seems to be more robust than BMR with respect to this problem, but it is hampered by theoretical and practical limitations. We present an approach to document ranking that explicitly addresses the word mismatch problem by exploiting interdocument similarity information in a novel way. Document ranking is seen as a query-document transformation driven by a conceptual representation of the whole document collection, into which the query is merged. Our approach is based on the theory of concept (or Galois) lattices, which, we argue, provides a powerful, well-founded, and computationally tractable framework to model the space in which documents and query are represented and to compute such a transformation. We compared information retrieval using concept lattice-based ranking (CLR) to BMR and HCR. The results showed that HCR was outperformed by CLR as well as BMR, and suggested that, of the two best methods, BMR achieved better performance than CLR on the whole document set, whereas CLR compared more favorably when only the first retrieved documents were used for evaluation. We also evaluated the three methods' specific ability to rank documents that did not match the query, in which case the superiority of CLR over BMR and HCR was apparent.
  14. Nie, J.-Y.: Query expansion and query translation as logical inference (2003) 0.01
    
    Abstract
    A number of studies have examined the problems of query expansion in monolingual Information Retrieval (IR) and query translation for cross-language IR. However, no link has been made between them. This article first shows that query translation is a special case of query expansion. There is also another set of studies on inferential IR. Again, no relationship has been established with query translation or query expansion. The second claim of this article is that logical inference is a general form that covers both query expansion and query translation. This analysis provides a unified view of different subareas of IR. We further develop the inferential IR approach in two particular contexts: using fuzzy logic and probability theory. The evaluation formulas obtained are shown to correspond strongly to those used in other IR models. This indicates that inference is indeed the core of advanced IR. (A toy inference-as-expansion sketch follows this entry.)
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
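
The unifying claim can be made concrete with the standard inference decomposition P(q|d) = sum_t P(q|t) * P(t|d): read P(q|t) as monolingual expansion weights and this is query expansion; read it as cross-language translation probabilities and it is query translation. The numbers below are invented.

```python
# Inferential IR in one line: P(q|d) = sum_t P(q|t) * P(t|d).
# P(t|d): how strongly the document supports each intermediate term.
# P(q|t): how strongly each term implies the query (expansion weight
# in the monolingual case, translation probability cross-language).
p_t_given_d = {"car": 0.5, "automobile": 0.3, "engine": 0.2}   # P(t|d)
p_q_given_t = {"car": 1.0, "automobile": 0.9, "engine": 0.3}   # P(q|t)

p_q_given_d = sum(p_q_given_t.get(t, 0.0) * p
                  for t, p in p_t_given_d.items())
print(round(p_q_given_d, 3))   # 0.83
```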
  15. Dominich, S.: Mathematical foundations of information retrieval (2001) 0.01
    
    Abstract
    This book offers a comprehensive and consistent mathematical approach to information retrieval (IR), without which no implementation is possible, and sheds entirely new light on the structure of IR models. It contains descriptions of all IR models in a unified formal style and language, along with examples for each, thus offering a comprehensive overview of them. The book also creates mathematical foundations and a consistent mathematical theory (including all mathematical results achieved so far) of IR as a stand-alone mathematical discipline, which can thus be read and taught independently. In addition, the book contains all the necessary mathematical knowledge on which IR relies, to save the reader searching different sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students, and researchers whose work involves information retrieval.
    Date
    22. 3.2008 12:26:32
    Series
    Mathematical modelling: theory and applications; 12
  16. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.01
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative way to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative for precomputing term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantage of producing high-quality results, but avoids the cost of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impacts using any set of relevance evidence on any text collection, whereas previous research does not. The precomputed impact values are indexed and used later for computing document rankings at query-processing time. By doing so, our method effectively reduces query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also leads to a significant decrease in computational costs during query processing. (A toy index-time fusion sketch follows this entry.)
    Date
    24. 6.2012 14:29:10
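
The efficiency argument is simple to sketch: fuse the evidence sources into a single per-(term, document) impact at indexing time, so query processing reduces to additions. The linear fusion function below is a made-up stand-in for LePrEF's genetic-programming-learned function, and the evidence features are invented.

```python
# Index-time fusion: combine several evidence sources into one impact
# value per (term, doc) before any query arrives.
evidence = {
    # (term, doc): (tf_weight, anchor_text_weight, pagerank) -- invented
    ("ranking", "d1"): (0.7, 0.9, 0.5),
    ("ranking", "d2"): (0.4, 0.1, 0.9),
    ("fusion",  "d1"): (0.6, 0.3, 0.5),
}

def fuse(tf, anchor, pr):
    # Invented linear weights; LePrEF learns this function instead.
    return 0.5 * tf + 0.3 * anchor + 0.2 * pr

# Built once, at indexing time.
impact_index = {}
for (term, doc), feats in evidence.items():
    impact_index.setdefault(term, {})[doc] = fuse(*feats)

# Query time: plain additions, no per-query evidence combination.
def score(query_terms):
    scores = {}
    for t in query_terms:
        for doc, imp in impact_index.get(t, {}).items():
            scores[doc] = scores.get(doc, 0.0) + imp
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(score(["ranking", "fusion"]))
```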
  17. Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: ¬A unified maximum likelihood approach to document retrieval (2001) 0.01
    
    Abstract
    Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR.
    Date
    29. 9.2001 17:52:51
  18. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.01
    
    Abstract
    Proposes that signature files be used as a viable alternative to other indexing strategies, such as inverted files, for searching through large volumes of text. Demonstrates through simulation that search times can be further reduced by enhancing the basic signature file concept with deterministic partitioning algorithms, which eliminate the need for an exhaustive search of the entire signature file. Reports research to evaluate the performance of some deterministic partitioning algorithms in a non-simulated environment using 276 MB of raw newspaper text (taken from the Wall Street Journal) and real user queries. Presents a selection of results to illustrate trends and highlight important aspects of the performance of these methods under realistic rather than simulated operating conditions. As a result of the research reported here, certain aspects of this approach to signature files are found wanting and in need of improvement. Suggests lines of future research on the partitioning of signature files.
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  19. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.01
    
    Content
    Contents: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The alpha Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank: 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
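
The basic computation from Chapter 4, the power method on a damped link matrix, fits in a few lines. A minimal sketch for a toy graph with no dangling nodes (alpha = 0.85 is the customary damping factor; the book's later chapters address scale, updating, and sensitivity):

```python
# Power-method PageRank on a tiny link graph.
links = {1: [2, 3], 2: [3], 3: [1], 4: [3]}   # page -> outlinks
n, alpha = 4, 0.85
pages = sorted(links)

rank = {p: 1.0 / n for p in pages}
for _ in range(50):                       # iterate to (near) convergence
    # Teleportation: every page receives (1 - alpha)/n unconditionally.
    new = {p: (1 - alpha) / n for p in pages}
    # Link-following: each page shares its damped rank among outlinks.
    for p, outs in links.items():
        share = alpha * rank[p] / len(outs)
        for q in outs:
            new[q] += share
    rank = new

print({p: round(r, 3) for p, r in rank.items()})
```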
  20. Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective (2012) 0.01
    
    Abstract
    We address how individuals' (workers') knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative of successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach, and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent with people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs, and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.
    Date
    11. 6.2012 14:22:34

Languages

  • e 115
  • d 12
  • pt 1

Types

  • a 115
  • m 8
  • el 3
  • s 3
  • p 1
  • r 1
  • x 1