Search (75 results, page 1 of 4)

  • theme_ss:"Retrievalalgorithmen"
  1. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 0.01
    0.007346594 = product of:
      0.036732968 = sum of:
        0.012212053 = product of:
          0.03663616 = sum of:
            0.03663616 = weight(_text_:problem in 3469) [ClassicSimilarity], result of:
              0.03663616 = score(doc=3469,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.28137225 = fieldWeight in 3469, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3469)
          0.33333334 = coord(1/3)
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 3469) [ClassicSimilarity], result of:
              0.07356274 = score(doc=3469,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 3469, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3469)
          0.33333334 = coord(1/3)
      0.2 = coord(2/10)
    
    Abstract
     Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations, which we call query aspects. By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments in which using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments. (A schematic sketch of the pooling idea follows this entry.)
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1081-1091
    Year
    2010
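     The indented breakdown above is Lucene ClassicSimilarity (TF-IDF) "explain" output. As a reading aid, the following minimal sketch re-computes entry 1's score from the figures shown; the helper function is illustrative and is not a Lucene API call.

       import math

       def term_score(freq, idf, query_norm, field_norm):
           # queryWeight = idf * queryNorm; fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
           return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

       query_norm = 0.03067635
       w_problem = term_score(freq=2.0, idf=4.244485, query_norm=query_norm, field_norm=0.046875)
       w_2010 = term_score(freq=5.0, idf=4.7831497, query_norm=query_norm, field_norm=0.046875)

       # Each matching sub-query covers 1 of its 3 clauses (coord 1/3); the whole
       # query matches 2 of its 10 clauses (coord 2/10).
       score = (w_problem * (1 / 3) + w_2010 * (1 / 3)) * (2 / 10)
       print(round(score, 9))   # approximately 0.007346594, the value reported for this entry

     The pooling idea from the abstract can be sketched just as briefly. The run format, function names, and the precision-based system score below are assumptions made for illustration, not the authors' code.

       def pseudo_qrels(aspect_runs, k=5):
           # aspect_runs: query aspect -> ranked list of document ids from one system
           pooled = set()
           for ranking in aspect_runs.values():
               pooled.update(ranking[:k])          # skim the top k documents per query aspect
           return pooled                           # documents treated as putatively relevant

       def precision_at(ranking, relevant, n=10):
           return sum(1 for d in ranking[:n] if d in relevant) / n

       def rank_systems(system_runs, aspect_runs, k=5, n=10):
           # order systems by their precision against the pseudorelevance judgments
           qrels = pseudo_qrels(aspect_runs, k)
           scores = {name: precision_at(run, qrels, n) for name, run in system_runs.items()}
           return sorted(scores, key=scores.get, reverse=True)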
  2. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.01
    0.007346594 = product of:
      0.036732968 = sum of:
        0.012212053 = product of:
          0.03663616 = sum of:
            0.03663616 = weight(_text_:problem in 4119) [ClassicSimilarity], result of:
              0.03663616 = score(doc=4119,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.28137225 = fieldWeight in 4119, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4119)
          0.33333334 = coord(1/3)
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 4119) [ClassicSimilarity], result of:
              0.07356274 = score(doc=4119,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 4119, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4119)
          0.33333334 = coord(1/3)
      0.2 = coord(2/10)
    
    Abstract
     In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account the best blocks. Our results suggest that the block-weighting ranking method is superior to both baselines across all collections we used, with average gains in precision of 5 to 20%. (A schematic sketch of block-weighted BM25 follows this entry.)
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.12, S.2503-2513
    Year
    2010
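     A minimal sketch of block-weighted BM25 as described in the abstract, assuming a caller-supplied block-weight function and conventional default parameters (k1 = 1.2, b = 0.75); the paper's nine weight functions are not reproduced here.

       import math

       def bm25_block_score(blocks, query, df, n_docs, avg_len, block_weight, k1=1.2, b=0.75):
           # blocks: list of (block_id, list of terms); block_weight: block_id -> float.
           # Term frequency is accumulated per block and scaled by the block weight,
           # instead of being counted over the whole page.
           doc_len = sum(len(terms) for _, terms in blocks)
           score = 0.0
           for term in set(query):
               tf = sum(block_weight(bid) * terms.count(term) for bid, terms in blocks)
               if tf == 0 or term not in df:
                   continue
               idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
               score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
           return score

     With block_weight returning a constant 1.0 this reduces to plain BM25 over the full page, i.e. baseline (a) in the abstract.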
  3. Wei, F.; Li, W.; Liu, S.: iRANK: a rank-learn-combine framework for unsupervised ensemble ranking (2010) 0.01
    0.0069652274 = product of:
      0.034826137 = sum of:
        0.0143920425 = product of:
          0.043176126 = sum of:
            0.043176126 = weight(_text_:problem in 3472) [ClassicSimilarity], result of:
              0.043176126 = score(doc=3472,freq=4.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.33160037 = fieldWeight in 3472, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3472)
          0.33333334 = coord(1/3)
        0.020434096 = product of:
          0.06130229 = sum of:
            0.06130229 = weight(_text_:2010 in 3472) [ClassicSimilarity], result of:
              0.06130229 = score(doc=3472,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.41779095 = fieldWeight in 3472, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3472)
          0.33333334 = coord(1/3)
      0.2 = coord(2/10)
    
    Abstract
     The authors address the problem of unsupervised ensemble ranking. Traditional approaches either combine multiple ranking criteria into a unified representation to obtain an overall ranking score or utilize rank fusion or aggregation techniques to combine the ranking results. Beyond these combine-then-rank and rank-then-combine approaches, the authors propose a novel rank-learn-combine ranking framework, called Interactive Ranking (iRANK), which allows two base rankers to teach each other before combination: during the ranking process, each provides its own ranking results as feedback to the other to boost ranking performance. This mutual refinement continues until the two base rankers can no longer learn from each other. The overall performance is improved by the enhancement of the base rankers through this mutual learning mechanism. The authors further design two ranking refinement strategies to use the feedback efficiently and effectively, based on reasonable assumptions and rational analysis. Although iRANK is applicable to many applications, as a case study they apply the framework to the sentence ranking problem in query-focused summarization and evaluate its effectiveness on the DUC 2005 and 2006 data sets. The results are encouraging, with consistent and promising improvements. (A schematic sketch of the mutual refinement loop follows this entry.)
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1232-1243
    Year
    2010
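     A minimal sketch of the rank-learn-combine loop, using a deliberately simple feedback rule (each ranker interpolates its scores with the other's previous scores); the two refinement strategies designed in the paper are not reproduced.

       def irank(scores_a, scores_b, alpha=0.7, max_rounds=10):
           # scores_a, scores_b: item -> score assigned by the two base rankers
           items = list(scores_a)
           rank = lambda s: sorted(items, key=s.get, reverse=True)
           for _ in range(max_rounds):
               new_a = {i: alpha * scores_a[i] + (1 - alpha) * scores_b[i] for i in items}
               new_b = {i: alpha * scores_b[i] + (1 - alpha) * scores_a[i] for i in items}
               if rank(new_a) == rank(scores_a) and rank(new_b) == rank(scores_b):
                   break                            # the base rankers no longer learn from each other
               scores_a, scores_b = new_a, new_b
           # final combination of the refined base rankers
           return sorted(items, key=lambda i: scores_a[i] + scores_b[i], reverse=True)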
  4. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.00
    0.0034207494 = product of:
      0.017103747 = sum of:
        0.010176711 = product of:
          0.03053013 = sum of:
            0.03053013 = weight(_text_:problem in 664) [ClassicSimilarity], result of:
              0.03053013 = score(doc=664,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.23447686 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.33333334 = coord(1/3)
        0.0069270367 = product of:
          0.02078111 = sum of:
            0.02078111 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
              0.02078111 = score(doc=664,freq=2.0), product of:
                0.10742335 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03067635 = queryNorm
                0.19345059 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.33333334 = coord(1/3)
      0.2 = coord(2/10)
    
    Abstract
     A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language-model-based score on the one hand, while on the other hand non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for automatic topical query generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved. (A schematic sketch of the score propagation follows this entry.)
    Date
    22. 3.2013 19:34:49
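     A minimal sketch of joint score propagation over a bi-type document-author network, assuming a simple damped averaging scheme; the content-based priors stand in for the language-model scores, and the marginal-citation filtering described in the abstract is omitted.

       from collections import defaultdict

       def propagate(doc_prior, authorship, damping=0.85, iters=20):
           # doc_prior: document -> content-based query score
           # authorship: document -> list of its authors
           author_docs = defaultdict(list)
           for doc, authors in authorship.items():
               for a in authors:
                   author_docs[a].append(doc)

           doc_score = dict(doc_prior)
           author_score = {a: 0.0 for a in author_docs}
           for _ in range(iters):
               # authors inherit relevance from the documents they wrote
               author_score = {a: sum(doc_score.get(d, 0.0) for d in docs) / len(docs)
                               for a, docs in author_docs.items()}
               # documents mix their content prior with their authors' scores
               doc_score = {d: (1 - damping) * doc_prior[d]
                               + damping * sum(author_score[a] for a in authorship.get(d, []))
                               / max(len(authorship.get(d, [])), 1)
                            for d in doc_prior}
           return doc_score, author_score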
  5. Fuhr, N.: Zur Überwindung der Diskrepanz zwischen Retrievalforschung und -praxis (1990) 0.00
    0.0029027916 = product of:
      0.029027916 = sum of:
        0.029027916 = product of:
          0.08708375 = sum of:
            0.08708375 = weight(_text_:1990 in 6625) [ClassicSimilarity], result of:
              0.08708375 = score(doc=6625,freq=5.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.62986755 = fieldWeight in 6625, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6625)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Nachrichten für Dokumentation. 41(1990), S.3-7
    Year
    1990
  6. Nakkouzi, Z.S.; Eastman, C.M.: Query formulation for handling negation in information retrieval systems (1990) 0.00
    0.0029027916 = product of:
      0.029027916 = sum of:
        0.029027916 = product of:
          0.08708375 = sum of:
            0.08708375 = weight(_text_:1990 in 3531) [ClassicSimilarity], result of:
              0.08708375 = score(doc=3531,freq=5.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.62986755 = fieldWeight in 3531, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3531)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science. 41(1990) no.3, S.171-182
    Year
    1990
  7. Wong, S.K.M.; Yao, Y.Y.: Query formulation in linear retrieval models (1990) 0.00
    0.0029027916 = product of:
      0.029027916 = sum of:
        0.029027916 = product of:
          0.08708375 = sum of:
            0.08708375 = weight(_text_:1990 in 3571) [ClassicSimilarity], result of:
              0.08708375 = score(doc=3571,freq=5.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.62986755 = fieldWeight in 3571, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3571)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science. 41(1990) no.5, S.334-341
    Year
    1990
  8. Fu, X.: Towards a model of implicit feedback for Web search (2010) 0.00
    0.0024520915 = product of:
      0.024520915 = sum of:
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 3310) [ClassicSimilarity], result of:
              0.07356274 = score(doc=3310,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 3310, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3310)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.1, S.30-49
    Year
    2010
  9. Cecchini, R.L.; Lorenzetti, C.M.; Maguitman, A.G.; Brignole, N.B.: Multiobjective evolutionary algorithms for context-based search (2010) 0.00
    0.0024520915 = product of:
      0.024520915 = sum of:
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 3482) [ClassicSimilarity], result of:
              0.07356274 = score(doc=3482,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 3482, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3482)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1258-1274
    Year
    2010
  10. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.00
    0.0024520915 = product of:
      0.024520915 = sum of:
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 3688) [ClassicSimilarity], result of:
              0.07356274 = score(doc=3688,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 3688, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3688)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.7, S.1299-1312
    Year
    2010
  11. Mayr, P.: Bradfordizing mit Katalogdaten : Alternative Sicht auf Suchergebnisse und Publikationsquellen durch Re-Ranking (2010) 0.00
    0.0024520915 = product of:
      0.024520915 = sum of:
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 4301) [ClassicSimilarity], result of:
              0.07356274 = score(doc=4301,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 4301, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4301)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    BuB. 62(2010) H.1, S.61-63
    Year
    2010
  12. Oberhauser, O.: Relevance Ranking in den Online-Katalogen der "nächsten Generation" (2010) 0.00
    0.0024520915 = product of:
      0.024520915 = sum of:
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 4308) [ClassicSimilarity], result of:
              0.07356274 = score(doc=4308,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 4308, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4308)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare. 63(2010) H.1/2, S.25-37
    Year
    2010
  13. Maron, M.E.: An historical note on the origins of probabilistic indexing (2008) 0.00
    0.002302727 = product of:
      0.02302727 = sum of:
        0.02302727 = product of:
          0.069081806 = sum of:
            0.069081806 = weight(_text_:problem in 2047) [ClassicSimilarity], result of:
              0.069081806 = score(doc=2047,freq=4.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5305606 = fieldWeight in 2047, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2047)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
     The motivation behind "Probabilistic Indexing" was to replace two-valued thinking about information retrieval with probabilistic notions. This involved a new view of the information retrieval problem, viewing it as a problem of inference and prediction, and introducing probabilistically weighted indexes and probabilistically ranked output. These ideas were first formulated and written up in August 1958.
  14. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.00
    0.0022166518 = product of:
      0.022166518 = sum of:
        0.022166518 = product of:
          0.06649955 = sum of:
            0.06649955 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.06649955 = score(doc=402,freq=2.0), product of:
                0.10742335 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03067635 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  15. Deerwester, S.C.; Dumais, S.T.; Landauer, T.K.; Furnas, G.W.; Harshman, R.A.: Indexing by latent semantic analysis (1990) 0.00
    0.0021770939 = product of:
      0.021770937 = sum of:
        0.021770937 = product of:
          0.06531281 = sum of:
            0.06531281 = weight(_text_:1990 in 2399) [ClassicSimilarity], result of:
              0.06531281 = score(doc=2399,freq=5.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.47240067 = fieldWeight in 2399, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2399)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science. 41(1990) no.6, S.391-407
    Year
    1990
  16. Dang, E.K.F.; Luk, R.W.P.; Allan, J.; Ho, K.S.; Chung, K.F.L.; Lee, D.L.: A new context-dependent term weight computed by boost and discount using relevance information (2010) 0.00
    0.0020434097 = product of:
      0.020434096 = sum of:
        0.020434096 = product of:
          0.06130229 = sum of:
            0.06130229 = weight(_text_:2010 in 4120) [ClassicSimilarity], result of:
              0.06130229 = score(doc=4120,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.41779095 = fieldWeight in 4120, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4120)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.12, S.2514-2530
    Year
    2010
  17. Sachs, W.M.: An approach to associative retrieval through the theory of fuzzy sets (1976) 0.00
    0.0020353422 = product of:
      0.020353422 = sum of:
        0.020353422 = product of:
          0.06106026 = sum of:
            0.06106026 = weight(_text_:problem in 7) [ClassicSimilarity], result of:
              0.06106026 = score(doc=7,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.46895373 = fieldWeight in 7, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.078125 = fieldNorm(doc=7)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
     The theory of fuzzy sets is used to provide a rigorous formulation of the problem of associative retrieval. This formulation suggests the idea of using fuzzy clustering to organize data for retrieval.
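     A minimal sketch of retrieval over fuzzy index-term sets, using min/max as the fuzzy AND/OR connectives (a generic fuzzy-set formulation, not the paper's clustering-based organization).

       def fuzzy_and(membership, terms):
           return min(membership.get(t, 0.0) for t in terms)

       def fuzzy_or(membership, terms):
           return max(membership.get(t, 0.0) for t in terms)

       def retrieve(index, terms, combine=fuzzy_and, threshold=0.3):
           # index: doc id -> {term: degree of membership in [0, 1]}
           scored = [(d, combine(m, terms)) for d, m in index.items()]
           return sorted([(d, s) for d, s in scored if s >= threshold],
                         key=lambda pair: pair[1], reverse=True)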
  18. Cheng, C.-S.; Chung, C.-P.; Shann, J.J.-J.: Fast query evaluation through document identifier assignment for inverted file-based information retrieval systems (2006) 0.00
    0.0020353422 = product of:
      0.020353422 = sum of:
        0.020353422 = product of:
          0.06106026 = sum of:
            0.06106026 = weight(_text_:problem in 979) [ClassicSimilarity], result of:
              0.06106026 = score(doc=979,freq=8.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.46895373 = fieldWeight in 979, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=979)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
     Compressing an inverted file can greatly improve query performance of an information retrieval system (IRS) by reducing disk I/Os. We observe that a good document identifier assignment (DIA) can make the document identifiers in the posting lists more clustered, resulting in better compression as well as shorter query processing time. In this paper, we tackle the NP-complete problem of finding an optimal DIA to minimize the average query processing time in an IRS when the probability distribution of query terms is given. We indicate that the greedy nearest neighbor (Greedy-NN) algorithm can provide excellent performance for this problem. However, the Greedy-NN algorithm is inappropriate for large-scale IRSs due to its high complexity O(N² × n), where N denotes the number of documents and n denotes the number of distinct terms. In real-world IRSs, the distribution of query terms is skewed. Based on this fact, we propose a fast O(N × n) heuristic, called the partition-based document identifier assignment (PBDIA) algorithm, which efficiently assigns consecutive document identifiers to documents containing frequently used query terms, improving the compression efficiency of the posting lists for those terms. This results in reduced query processing time. The experimental results show that the PBDIA algorithm yields performance competitive with Greedy-NN on the DIA problem, and that this optimization offers significant advantages for both long queries and parallel information retrieval (IR).
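     A minimal sketch of why the assignment matters and of a PBDIA-style reassignment: giving consecutive identifiers to documents that contain frequently queried terms shrinks the d-gaps that inverted-file compressors encode. The partitioning below is a simplification of the algorithm in the paper.

       def reassign_ids(doc_terms, frequent_terms):
           # doc_terms: document -> set of terms; frequent_terms: ordered by query frequency
           mapping, next_id = {}, 0
           for term in frequent_terms:                       # most frequently queried terms first
               for doc in sorted(d for d, ts in doc_terms.items() if term in ts):
                   if doc not in mapping:
                       mapping[doc] = next_id
                       next_id += 1
           for doc in doc_terms:                             # remaining documents afterwards
               if doc not in mapping:
                   mapping[doc] = next_id
                   next_id += 1
           return mapping

       def d_gaps(posting, mapping):
           # smaller gaps between consecutive ids compress better (e.g. with gamma codes)
           ids = sorted(mapping[d] for d in posting)
           return ids[:1] + [b - a for a, b in zip(ids, ids[1:])]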
  19. Na, S.-H.; Kang, I.-S.; Roh, J.-E.; Lee, J.-H.: ¬An empirical study of query expansion and cluster-based retrieval in language modeling approach (2007) 0.00
    0.0020148864 = product of:
      0.020148862 = sum of:
        0.020148862 = product of:
          0.060446583 = sum of:
            0.060446583 = weight(_text_:problem in 906) [ClassicSimilarity], result of:
              0.060446583 = score(doc=906,freq=4.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.46424055 = fieldWeight in 906, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=906)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
     Term mismatch is a critical problem in information retrieval, and several techniques, such as query expansion, cluster-based retrieval, and dimensionality reduction, have been developed to resolve it. Of these techniques, this paper performs an empirical study on query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance through experiments on seven test collections from NTCIR and TREC.
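     A minimal sketch of pseudo-relevance-feedback query expansion in the language-modeling setting: the query model is mixed with a feedback model estimated from top-ranked documents. This is a generic mixture formulation; the paper's parsimonious variant and its clustering algorithms are not reproduced.

       from collections import Counter

       def feedback_model(top_docs):
           # top_docs: list of documents, each a list of terms
           counts = Counter(t for doc in top_docs for t in doc)
           total = sum(counts.values())
           return {t: c / total for t, c in counts.items()}

       def expand_query(query_terms, fb_model, lam=0.5, top_terms=20):
           # mix the original query model with the feedback model and keep the top terms
           q_model = {t: query_terms.count(t) / len(query_terms) for t in set(query_terms)}
           vocab = set(q_model) | set(fb_model)
           mixed = {t: lam * q_model.get(t, 0.0) + (1 - lam) * fb_model.get(t, 0.0) for t in vocab}
           return dict(sorted(mixed.items(), key=lambda kv: kv[1], reverse=True)[:top_terms])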
  20. Sánchez-de-Madariaga, R.; Fernández-del-Castillo, J.R.: The bootstrapping of the Yarowsky algorithm in real corpora (2009) 0.00
    0.0020148864 = product of:
      0.020148862 = sum of:
        0.020148862 = product of:
          0.060446583 = sum of:
            0.060446583 = weight(_text_:problem in 2451) [ClassicSimilarity], result of:
              0.060446583 = score(doc=2451,freq=4.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.46424055 = fieldWeight in 2451, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2451)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
     The Yarowsky bootstrapping algorithm resolves the homograph-level word sense disambiguation (WSD) problem, which is the sense granularity level required for real natural language processing (NLP) applications. At the same time, it resolves the knowledge acquisition bottleneck affecting most WSD algorithms and can easily be applied to foreign-language corpora. However, this paper shows that the Yarowsky algorithm is significantly less accurate when applied to domain-fluctuating real corpora. This paper also introduces a new bootstrapping methodology that performs much better when applied to these corpora. The accuracy achieved on non-domain-fluctuating corpora is not reached, however, due to inherent domain-fluctuation ambiguities.
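     A minimal sketch of Yarowsky-style bootstrapping for a two-sense homograph: seed collocations label a few contexts, new collocation rules are learned from the labelled contexts, and the loop repeats until the labelling stabilises. Decision-list scoring, smoothing, and the domain handling studied in the paper are omitted.

       from collections import Counter

       def bootstrap(contexts, seeds, rounds=10, min_count=2):
           # contexts: list of sets of context words; seeds: sense -> set of seed collocations
           rules = {sense: set(words) for sense, words in seeds.items()}
           labels = {}
           for _ in range(rounds):
               new_labels = {}
               for i, ctx in enumerate(contexts):
                   hits = {sense: len(ctx & words) for sense, words in rules.items()}
                   best = max(hits, key=hits.get)
                   if hits[best] > 0:                 # label only contexts matched by some rule
                       new_labels[i] = best
               if new_labels == labels:
                   break                              # labelling has stabilised
               labels = new_labels
               for sense in rules:                    # grow each sense's rule set
                   counts = Counter(w for i, s in labels.items() if s == sense
                                    for w in contexts[i])
                   rules[sense] |= {w for w, c in counts.items() if c >= min_count}
           return labels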

Years

Languages

  • e 66
  • d 9

Types

  • a 68
  • m 4
  • el 2
  • r 1
  • s 1