Search (94 results, page 1 of 5)

  • × theme_ss:"Retrievalalgorithmen"
  1. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.05
    0.04686617 = product of:
      0.1405985 = sum of:
        0.119892076 = weight(_text_:forschung in 2051) [ClassicSimilarity], result of:
          0.119892076 = score(doc=2051,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.64500517 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
        0.020706438 = product of:
          0.062119313 = sum of:
            0.062119313 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.062119313 = score(doc=2051,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    14. 6.2015 22:12:56
    Source
    Automatische Indexierung zwischen Forschung und Anwendung, Hrsg.: G. Lustig
  2. Losada, D.E.; Barreiro, A.: Emebedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.03
    0.034277808 = product of:
      0.10283342 = sum of:
        0.089029126 = weight(_text_:propose in 1422) [ClassicSimilarity], result of:
          0.089029126 = score(doc=1422,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.45382494 = fieldWeight in 1422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.013804292 = product of:
          0.041412875 = sum of:
            0.041412875 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.041412875 = score(doc=1422,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations along with the use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
  3. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.03
    0.034277808 = product of:
      0.10283342 = sum of:
        0.089029126 = weight(_text_:propose in 1431) [ClassicSimilarity], result of:
          0.089029126 = score(doc=1431,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.45382494 = fieldWeight in 1431, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
        0.013804292 = product of:
          0.041412875 = sum of:
            0.041412875 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
              0.041412875 = score(doc=1431,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.30952093 = fieldWeight in 1431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1431)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
  4. Cannane, A.; Williams, H.E.: General-purpose compression for efficient retrieval (2001) 0.03
    0.025739681 = product of:
      0.07721904 = sum of:
        0.06677184 = weight(_text_:propose in 5705) [ClassicSimilarity], result of:
          0.06677184 = score(doc=5705,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 5705, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=5705)
        0.0104471985 = product of:
          0.031341594 = sum of:
            0.031341594 = weight(_text_:29 in 5705) [ClassicSimilarity], result of:
              0.031341594 = score(doc=5705,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.23319192 = fieldWeight in 5705, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5705)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Compression of databases not only reduces space requirements but can also reduce overall retrieval times. In text databases, compression of documents based on semistatic modeling with words has been shown to be both practical and fast. Similarly, for specific applications -such as databases of integers or scientific databases-specially designed semistatic compression schemes work well. We propose a scheme for general-purpose compression that can be applied to all types of data stored in large collections. We describe our approach -which we call RAY-in detail, and show experimentally the compression available, compression and decompression costs, and performance as a stream and random-access technique. We show that, in many cases, RAY achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques, and that it can be used as an efficient general-purpose compression scheme
    Date
    29. 9.2001 13:59:55
  5. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.03
    0.025739681 = product of:
      0.07721904 = sum of:
        0.06677184 = weight(_text_:propose in 6112) [ClassicSimilarity], result of:
          0.06677184 = score(doc=6112,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 6112, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
        0.0104471985 = product of:
          0.031341594 = sum of:
            0.031341594 = weight(_text_:29 in 6112) [ClassicSimilarity], result of:
              0.031341594 = score(doc=6112,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.23319192 = fieldWeight in 6112, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6112)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
  6. Ziegler, B.: ESS: ein schneller Algorithmus zur Mustersuche in Zeichenfolgen (1996) 0.02
    0.023312349 = product of:
      0.13987409 = sum of:
        0.13987409 = weight(_text_:forschung in 7543) [ClassicSimilarity], result of:
          0.13987409 = score(doc=7543,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.752506 = fieldWeight in 7543, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.109375 = fieldNorm(doc=7543)
      0.16666667 = coord(1/6)
    
    Source
    Informatik: Forschung und Entwicklung. 11(1996) no.2, S.69-83
  7. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.02
    0.021449735 = product of:
      0.064349204 = sum of:
        0.055643205 = weight(_text_:propose in 278) [ClassicSimilarity], result of:
          0.055643205 = score(doc=278,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 278, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
        0.008706 = product of:
          0.026117997 = sum of:
            0.026117997 = weight(_text_:29 in 278) [ClassicSimilarity], result of:
              0.026117997 = score(doc=278,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19432661 = fieldWeight in 278, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=278)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence at any text collection, whereas previous research articles do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
    Date
    24. 6.2012 14:29:10
  8. Silva, R.M.; Gonçalves, M.A.; Veloso, A.: ¬A Two-stage active learning method for learning to rank (2014) 0.02
    0.021449735 = product of:
      0.064349204 = sum of:
        0.055643205 = weight(_text_:propose in 1184) [ClassicSimilarity], result of:
          0.055643205 = score(doc=1184,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 1184, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1184)
        0.008706 = product of:
          0.026117997 = sum of:
            0.026117997 = weight(_text_:29 in 1184) [ClassicSimilarity], result of:
              0.026117997 = score(doc=1184,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19432661 = fieldWeight in 1184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1184)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets.
    Date
    26. 1.2014 20:29:57
  9. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.02
    0.021423629 = product of:
      0.064270884 = sum of:
        0.055643205 = weight(_text_:propose in 664) [ClassicSimilarity], result of:
          0.055643205 = score(doc=664,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 664, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=664)
        0.008627683 = product of:
          0.025883049 = sum of:
            0.025883049 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
              0.025883049 = score(doc=664,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19345059 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and on the other hand, non topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the Bibrank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.
    Date
    22. 3.2013 19:34:49
  10. Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Xiangji Huang, J.; Ben Jemaa, M.: MF-Re-Rank : a modality feature-based re-ranking model for medical image retrieval (2018) 0.02
    0.017159788 = product of:
      0.051479362 = sum of:
        0.044514563 = weight(_text_:propose in 4459) [ClassicSimilarity], result of:
          0.044514563 = score(doc=4459,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.22691247 = fieldWeight in 4459, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.03125 = fieldNorm(doc=4459)
        0.006964799 = product of:
          0.020894397 = sum of:
            0.020894397 = weight(_text_:29 in 4459) [ClassicSimilarity], result of:
              0.020894397 = score(doc=4459,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.15546128 = fieldWeight in 4459, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4459)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    One of the main challenges in medical image retrieval is the increasing volume of image data, which render it difficult for domain experts to find relevant information from large data sets. Effective and efficient medical image retrieval systems are required to better manage medical image information. Text-based image retrieval (TBIR) was very successful in retrieving images with textual descriptions. Several TBIR approaches rely on models based on bag-of-words approaches, in which the image retrieval problem turns into one of standard text-based information retrieval; where the meanings and values of specific medical entities in the text and metadata are ignored in the image representation and retrieval process. However, we believe that TBIR should extract specific medical entities and terms and then exploit these elements to achieve better image retrieval results. Therefore, we propose a novel reranking method based on medical-image-dependent features. These features are manually selected by a medical expert from imaging modalities and medical terminology. First, we represent queries and images using only medical-image-dependent features such as image modality and image scale. Second, we exploit the defined features in a new reranking method for medical image retrieval. Our motivation is the large influence of image modality in medical image retrieval and its impact on image-relevance scores. To evaluate our approach, we performed a series of experiments on the medical ImageCLEF data sets from 2009 to 2013. The BM25 model, a language model, and an image-relevance feedback model are used as baselines to evaluate our approach. The experimental results show that compared to the BM25 model, the proposed model significantly enhances image retrieval performance. We also compared our approach with other state-of-the-art approaches and show that our approach performs comparably to those of the top three runs in the official ImageCLEF competition.
    Date
    29. 9.2018 11:43:31
  11. Chang, M.; Poon, C.K.: Efficient phrase querying with common phrase index (2008) 0.02
    0.015738275 = product of:
      0.09442965 = sum of:
        0.09442965 = weight(_text_:propose in 2061) [ClassicSimilarity], result of:
          0.09442965 = score(doc=2061,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.48135406 = fieldWeight in 2061, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=2061)
      0.16666667 = coord(1/6)
    
    Abstract
    In this paper, we propose a common phrase index as an efficient index structure to support phrase queries in a very large text database. Our structure is an extension of previous index structures for phrases and achieves better query efficiency with modest extra storage cost. Further improvement in efficiency can be attained by implementing our index according to our observation of the dynamic nature of common word set. In experimental evaluation, a common phrase index using 255 common words has an improvement of about 11% and 62% in query time for the overall and large queries (queries of long phrases) respectively over an auxiliary nextword index. Moreover, it has only about 19% extra storage cost. Compared with an inverted index, our improvement is about 72% and 87% for the overall and large queries respectively. We also propose to implement a common phrase index with dynamic update feature. Our experiments show that more improvement in time efficiency can be achieved.
  12. Behnert, C.; Borst, T.: Neue Formen der Relevanz-Sortierung in bibliothekarischen Informationssystemen : das DFG-Projekt LibRank (2015) 0.01
    0.013321342 = product of:
      0.07992805 = sum of:
        0.07992805 = weight(_text_:forschung in 5392) [ClassicSimilarity], result of:
          0.07992805 = score(doc=5392,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.43000343 = fieldWeight in 5392, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0625 = fieldNorm(doc=5392)
      0.16666667 = coord(1/6)
    
    Source
    Bibliothek: Forschung und Praxis. 39(2015) H.3, S.384-393
  13. Ozdemiray, A.M.; Altingovde, I.S.: Explicit search result diversification using score and rank aggregation methods (2015) 0.01
    0.01311523 = product of:
      0.07869138 = sum of:
        0.07869138 = weight(_text_:propose in 1856) [ClassicSimilarity], result of:
          0.07869138 = score(doc=1856,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.40112838 = fieldWeight in 1856, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1856)
      0.16666667 = coord(1/6)
    
    Abstract
    Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are two-fold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pin-point its potential weaknesses. We propose basic yet novel optimizations to remedy these weaknesses and boost the performance of this algorithm. As a second contribution, inspired by the success of the current diversification strategies that exploit the relevance of the candidate documents to individual query aspects, we cast the diversification problem into the problem of ranking aggregation. To this end, we propose to materialize the re-rankings of the candidate documents for each query aspect and then merge these rankings by adapting the score(-based) and rank(-based) aggregation methods. Our extensive experimental evaluations show that certain ranking aggregation methods are superior to existing explicit diversification strategies in terms of diversification effectiveness. Furthermore, these ranking aggregation methods have lower computational complexity than the state-of-the-art diversification strategies.
  14. Lee, J.; Min, J.-K.; Oh, A.; Chung, C.-W.: Effective ranking and search techniques for Web resources considering semantic relationships (2014) 0.01
    0.01311523 = product of:
      0.07869138 = sum of:
        0.07869138 = weight(_text_:propose in 2670) [ClassicSimilarity], result of:
          0.07869138 = score(doc=2670,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.40112838 = fieldWeight in 2670, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2670)
      0.16666667 = coord(1/6)
    
    Abstract
    On the Semantic Web, the types of resources and the semantic relationships between resources are defined in an ontology. By using that information, the accuracy of information retrieval can be improved. In this paper, we present effective ranking and search techniques considering the semantic relationships in an ontology. Our technique retrieves top-k resources which are the most relevant to query keywords through the semantic relationships. To do this, we propose a weighting measure for the semantic relationship. Based on this measure, we propose a novel ranking method which considers the number of meaningful semantic relationships between a resource and keywords as well as the coverage and discriminating power of keywords. In order to improve the efficiency of the search, we prune the unnecessary search space using the length and weight thresholds of the semantic relationship path. In addition, we exploit Threshold Algorithm based on an extended inverted index to answer top-k results efficiently. The experimental results using real data sets demonstrate that our retrieval method using the semantic information generates accurate results efficiently compared to the traditional methods.
  15. Jiang, J.-D.; Jiang, J.-Y.; Cheng, P.-J.: Cocluster hypothesis and ranking consistency for relevance ranking in web search (2019) 0.01
    0.01311523 = product of:
      0.07869138 = sum of:
        0.07869138 = weight(_text_:propose in 5247) [ClassicSimilarity], result of:
          0.07869138 = score(doc=5247,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.40112838 = fieldWeight in 5247, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5247)
      0.16666667 = coord(1/6)
    
    Abstract
    Conventional approaches to relevance ranking typically optimize ranking models by each query separately. The traditional cluster hypothesis also does not consider the dependency between related queries. The goal of this paper is to leverage similar search intents to perform ranking consistency so that the search performance can be improved accordingly. Different from the previous supervised approach, which learns relevance by click-through data, we propose a novel cocluster hypothesis to bridge the gap between relevance ranking and ranking consistency. A nearest-neighbors test is also designed to measure the extent to which the cocluster hypothesis holds. Based on the hypothesis, we further propose a two-stage unsupervised approach, in which two ranking heuristics and a cost function are developed to optimize the combination of consistency and uniqueness (or inconsistency). Extensive experiments have been conducted on a real and large-scale search engine log. The experimental results not only verify the applicability of the proposed cocluster hypothesis but also show that our approach is effective in boosting the retrieval performance of the commercial search engine and reaches a comparable performance to the supervised approach.
  16. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 0.01
    0.012983414 = product of:
      0.077900484 = sum of:
        0.077900484 = weight(_text_:propose in 1678) [ClassicSimilarity], result of:
          0.077900484 = score(doc=1678,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3970968 = fieldWeight in 1678, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1678)
      0.16666667 = coord(1/6)
    
    Abstract
    Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
  17. Bidoki, A.M.Z.; Yazdani, N.: an intelligent ranking algorithm for web pages : DistanceRank (2008) 0.01
    0.012983414 = product of:
      0.077900484 = sum of:
        0.077900484 = weight(_text_:propose in 2068) [ClassicSimilarity], result of:
          0.077900484 = score(doc=2068,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3970968 = fieldWeight in 2068, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2068)
      0.16666667 = coord(1/6)
    
    Abstract
    A fast and efficient page ranking mechanism for web crawling and retrieval remains as a challenging issue. Recently, several link based ranking algorithms like PageRank, HITS and OPIC have been proposed. In this paper, we propose a novel recursive method based on reinforcement learning which considers distance between pages as punishment, called "DistanceRank" to compute ranks of web pages. The distance is defined as the number of "average clicks" between two pages. The objective is to minimize punishment or distance so that a page with less distance to have a higher rank. Experimental results indicate that DistanceRank outperforms other ranking algorithms in page ranking and crawling scheduling. Furthermore, the complexity of DistanceRank is low. We have used University of California at Berkeley's web for our experiments.
  18. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment (1998) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 5) [ClassicSimilarity], result of:
          0.06677184 = score(doc=5,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 5, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=5)
      0.16666667 = coord(1/6)
    
    Abstract
    The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
  19. Yang, L.; Ji, D.; Leong, M.: Document reranking by term distribution and maximal marginal relevance for chinese information retrieval (2007) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 907) [ClassicSimilarity], result of:
          0.06677184 = score(doc=907,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 907, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=907)
      0.16666667 = coord(1/6)
    
    Abstract
    In this paper, we propose a document reranking method for Chinese information retrieval. The method is based on a term weighting scheme, which integrates local and global distribution of terms as well as document frequency, document positions and term length. The weight scheme allows randomly setting a larger portion of the retrieved documents as relevance feedback, and lifts off the worry that very fewer relevant documents appear in top retrieved documents. It also helps to improve the performance of maximal marginal relevance (MMR) in document reranking. The method was evaluated by MAP (mean average precision), a recall-oriented measure. Significance tests showed that our method can get significant improvement against standard baselines, and outperform relevant methods consistently.
  20. White, R.W.; Jose, J.M.; Ruthven, I.: ¬An implicit feedback approach for interactive information retrieval (2006) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 964) [ClassicSimilarity], result of:
          0.06677184 = score(doc=964,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 964, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=964)
      0.16666667 = coord(1/6)
    
    Abstract
    Searchers can face problems finding the information they seek. One reason for this is that they may have difficulty devising queries to express their information needs. In this article, we describe an approach that uses unobtrusive monitoring of interaction to proactively support searchers. The approach chooses terms to better represent information needs by monitoring searcher interaction with different representations of top-ranked documents. Information needs are dynamic and can change as a searcher views information. The approach we propose gathers evidence on potential changes in these needs and uses this evidence to choose new retrieval strategies. We present an evaluation of how well our technique estimates information needs, how well it estimates changes in these needs and the appropriateness of the interface support it offers. The results are presented and the avenues for future research identified.

Years

Languages

  • e 79
  • d 13
  • m 1
  • pt 1
  • More… Less…

Types

  • a 88
  • m 4
  • s 2
  • el 1
  • r 1
  • x 1
  • More… Less…