Search (11 results, page 1 of 1)

Hollink, V.; Kamps, J.; Monz, C.; Rijke, M. de: Monolingual document retrieval for European languages (2004) 0.01

```
0.011347376 = product of:
  0.034042127 = sum of:
    0.034042127 = product of:
      0.10212638 = sum of:
        0.10212638 = weight(_text_:retrieval in 3828) [ClassicSimilarity], result of:
          0.10212638 = score(doc=3828,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.6617001 = fieldWeight in 3828, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=3828)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```

Source: Information Retrieval. 7(2004) no.1, pp. 33-52
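
The score explanations in this listing follow Lucene's ClassicSimilarity (TF-IDF) formula. A minimal Python sketch that recomputes the explanation for doc 3828 above, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). queryNorm and fieldNorm are copied from the explanation rather than recomputed, since they depend on the full query and on Lucene's lossy norm encoding. The helper names are illustrative, not Lucene's API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq) = sqrt(freq)
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight     # score = queryWeight * fieldWeight


def apply_coords(score: float, coords: list[tuple[int, int]]) -> float:
    # Each enclosing boolean level multiplies by coord(matched/total).
    for matched, total in coords:
        score *= matched / total
    return score


# Values taken from the explanation for doc 3828.
term = classic_term_score(freq=4.0, doc_freq=5836, max_docs=44218,
                          query_norm=0.051022716, field_norm=0.109375)
final = apply_coords(term, [(1, 3), (1, 3)])
print(f"{term:.8f} {final:.9f}")
```

Each single-clause `sum of` level passes its value through unchanged, so only the two `coord(1/3)` factors change the term score on its way to the final document score.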

Kamps, J.; Rijke, M. de; Sigurbjörnsson, B.: Length normalization in XML retrieval (2004) 0.01

```
0.009726323 = product of:
  0.02917897 = sum of:
    0.02917897 = product of:
      0.08753691 = sum of:
        0.08753691 = weight(_text_:retrieval in 4106) [ClassicSimilarity], result of:
          0.08753691 = score(doc=4106,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.5671716 = fieldWeight in 4106, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=4106)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```

Source: SIGIR'04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Graus, D.; Odijk, D.; Rijke, M. de: The birth of collective memories: analyzing emerging entities in text streams (2018) 0.01
```
0.0076319277 = product of:
  0.022895783 = sum of:
    0.022895783 = product of:
      0.06868735 = sum of:
        0.06868735 = weight(_text_:online in 4252) [ClassicSimilarity], result of:
          0.06868735 = score(doc=4252,freq=14.0), product of:
            0.1548489 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.051022716 = queryNorm
            0.4435766 = fieldWeight in 4252, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4252)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

We study how collective memories are formed online. We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory. By tracking how entities emerge in public discourse, that is, the temporal patterns between their first mention in online text streams and their subsequent incorporation into collective memory, we gain insights into how the collective remembrance process happens online. Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia. The online text streams we use for our analysis consist of social media and news streams and comprise over 579 million documents spanning 18 months. We discover two main emergence patterns. Some entities emerge in a "bursty" fashion: they appear in public discourse without precedent, blast into activity, and transition into collective memory. Other entities display a "delayed" pattern: they appear in public discourse, experience a period of inactivity, and then resurface before transitioning into our cultural collective memory.
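
The two emergence patterns can be made concrete with a toy labelling rule. The sketch below is not the authors' method: it assumes a hypothetical series of daily mention counts for one entity, running from its first mention in the text streams up to its incorporation into Wikipedia, and a hypothetical `min_gap` threshold for what counts as a period of inactivity.

```python
def emergence_pattern(daily_mentions: list[int], min_gap: int = 7) -> str:
    """Label a mention series as 'bursty' or 'delayed' (toy heuristic).

    daily_mentions: hypothetical per-day mention counts, from the first
    mention up to the day before the entity enters Wikipedia.
    min_gap: hypothetical number of consecutive silent days that counts
    as a period of inactivity.
    """
    # Skip any leading days before the first mention.
    start = next((i for i, c in enumerate(daily_mentions) if c > 0), None)
    if start is None:
        raise ValueError("series contains no mentions")
    silent_run = 0
    for count in daily_mentions[start:]:
        if count == 0:
            silent_run += 1
        else:
            if silent_run >= min_gap:
                # A long silence followed by renewed activity: resurfacing.
                return "delayed"
            silent_run = 0
    return "bursty"


print(emergence_pattern([3, 5, 8, 2, 1]))
print(emergence_pattern([1] + [0] * 10 + [4, 6]))
```

A trailing silence with no renewed activity is labelled "bursty" here, because the delayed pattern requires the entity to resurface.
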
Hofmann, K.; Balog, K.; Bogers, T.; Rijke, M. de: Contextual factors for finding similar experts (2010) 0.01
```
0.00701937 = product of:
  0.021058109 = sum of:
    0.021058109 = product of:
      0.06317432 = sum of:
        0.06317432 = weight(_text_:retrieval in 3456) [ClassicSimilarity], result of:
          0.06317432 = score(doc=3456,freq=12.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.40932083 = fieldWeight in 3456, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3456)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. Important outcomes include models that identify factors that influence expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system-centered perspective. The main focus has been on developing content-based algorithms similar to document search. These algorithms identify matching experts primarily on the basis of the textual content of documents with which experts are associated. Other factors, such as the ones identified by expertise-seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task faced by science communicators in a knowledge-intensive environment: the task of finding similar experts, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content-based retrieval models and evaluate them in a retrieval experiment. Our main finding is that while content-based features are the most important, human participants also take contextual factors into account, such as media experience and organizational structure. We develop two principled ways of modeling the identified factors and integrate them with content-based retrieval models. Our experiments show that models combining content-based and contextual factors can significantly outperform existing content-based models.
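
One simple way to combine content-based and contextual evidence is linear interpolation. The sketch below assumes this interpolation, a hypothetical weight `lam`, and hypothetical contextual feature names loosely based on the factors named in the abstract (media experience, organizational structure); the article's own two models may combine the factors differently.

```python
def combined_score(content: float, context: dict[str, float],
                   context_weights: dict[str, float], lam: float = 0.7) -> float:
    """Interpolate a content-based similarity with weighted contextual factors.

    All inputs are assumed to be normalized to [0, 1]; lam and the
    context weights are hypothetical tuning parameters.
    """
    contextual = sum(context_weights.get(name, 0.0) * value
                     for name, value in context.items())
    return lam * content + (1.0 - lam) * contextual


# Hypothetical candidates for "experts similar to a given example expert".
weights = {"media_experience": 0.5, "same_department": 0.5}
candidates = {
    "expert_a": (0.8, {"media_experience": 0.1, "same_department": 0.0}),
    "expert_b": (0.7, {"media_experience": 0.9, "same_department": 1.0}),
}
ranking = sorted(
    candidates,
    key=lambda c: combined_score(candidates[c][0], candidates[c][1], weights),
    reverse=True,
)
print(ranking)
```

Here expert_b overtakes expert_a despite a lower content score, which is the kind of effect contextual factors are meant to capture.
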
Meij, E.; Trieschnigg, D.; Rijke, M. de; Kraaij, W.: Conceptual language models for domain-specific retrieval (2010) 0.00
```
0.0049634436 = product of:
  0.014890331 = sum of:
    0.014890331 = product of:
      0.04467099 = sum of:
        0.04467099 = weight(_text_:retrieval in 4238) [ClassicSimilarity], result of:
          0.04467099 = score(doc=4238,freq=6.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.28943354 = fieldWeight in 4238, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4238)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Over the years, various meta-languages have been used to manually enrich documents with conceptual knowledge of some kind. Examples include keyword assignment to citations or, more recently, tags to websites. In this paper we propose generative concept models as an extension to query modeling within the language modeling framework, which leverages these conceptual annotations to improve retrieval. By means of relevance feedback, the original query is translated into a conceptual representation, which is subsequently used to update the query model. Extensive experimental work on five test collections in two domains shows that our approach gives significant improvements in terms of recall, initial precision and mean average precision with respect to a baseline without relevance feedback. On one test collection, it is also able to outperform a text-based pseudo-relevance feedback approach based on relevance models. On the other test collections it performs similarly to relevance models. Overall, conceptual language models have the added advantage of offering query and browsing suggestions in the form of conceptual annotations. In addition, the internal structure of the meta-language can be exploited to add related terms. Our contributions are threefold. First, an extensive study is conducted on how to effectively translate a textual query into a conceptual representation. Second, we propose a method for updating a textual query model using the concepts in the conceptual representation. Finally, we provide an extensive analysis of when and how this conceptual feedback improves retrieval.
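
Updating a query model within the language modeling framework is commonly done by interpolating the original query model with a feedback model. A minimal sketch, assuming linear interpolation with a hypothetical weight `lam` and a feedback model that has already been estimated (in the paper it is derived from the conceptual representation; here it is just a hand-written distribution).

```python
from collections import Counter


def mle_query_model(query_terms: list[str]) -> dict[str, float]:
    """Maximum-likelihood estimate P(t | theta_Q) from the query terms."""
    counts = Counter(query_terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def interpolate(original: dict[str, float], feedback: dict[str, float],
                lam: float = 0.5) -> dict[str, float]:
    """P'(t) = lam * P(t | theta_Q) + (1 - lam) * P(t | theta_F)."""
    vocab = set(original) | set(feedback)
    return {t: lam * original.get(t, 0.0) + (1.0 - lam) * feedback.get(t, 0.0)
            for t in vocab}


query_model = mle_query_model(["gene", "expression"])
# Hypothetical feedback model, e.g. estimated from terms tied to feedback concepts.
feedback_model = {"gene": 0.2, "transcription": 0.5, "rna": 0.3}
updated = interpolate(query_model, feedback_model, lam=0.5)
for term, p in sorted(updated.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{p:.3f}")
```

Because both inputs are probability distributions, the interpolated model still sums to one; `lam` controls how far the updated query drifts from the user's original terms.
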
Meij, E.; Rijke, M. de: Thesaurus-based feedback to support mixed search and browsing environments (2007) 0.00
```
0.004052635 = product of:
  0.012157904 = sum of:
    0.012157904 = product of:
      0.03647371 = sum of:
        0.03647371 = weight(_text_:retrieval in 2432) [ClassicSimilarity], result of:
          0.03647371 = score(doc=2432,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.23632148 = fieldWeight in 2432, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2432)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

We propose and evaluate a query expansion mechanism that supports searching and browsing in collections of annotated documents. Based on generative language models, our feedback mechanism uses document-level annotations to bias the generation of expansion terms and to generate browsing suggestions in the form of concepts selected from a controlled vocabulary (as typically used in digital library settings). We provide a detailed formalization of our feedback mechanism and evaluate its effectiveness using the TREC 2006 Genomics track test set. As to retrieval effectiveness, we find a 20% improvement in mean average precision over a query-likelihood baseline, whilst increasing precision at 10. When we base the parameter estimation and feedback generation of our algorithm on a large corpus, we also find an improvement over state-of-the-art relevance models. The browsing suggestions are assessed along two dimensions: relevancy and specificity. We present an account of per-topic results, which helps to understand for which types of queries our feedback mechanism is particularly helpful.

Theme

Semantic context in indexing and retrieval
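
The feedback mechanism above is compared against a query-likelihood baseline. A minimal sketch of query-likelihood scoring, assuming Dirichlet smoothing with a hypothetical prior `mu` (the abstract does not say which smoothing the baseline uses) and a toy tokenized collection.

```python
import math
from collections import Counter


def query_likelihood(query: list[str], doc: list[str], coll_counts: Counter,
                     coll_len: int, mu: float = 2000.0) -> float:
    """log P(q | d) under a Dirichlet-smoothed document language model."""
    doc_counts = Counter(doc)
    score = 0.0
    for term, qtf in Counter(query).items():
        p_coll = coll_counts[term] / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skip it to avoid log(0)
        p_doc = (doc_counts[term] + mu * p_coll) / (len(doc) + mu)
        score += qtf * math.log(p_doc)
    return score


# Hypothetical toy collection.
docs = {
    "d1": ["brca1", "gene", "mutation", "breast", "cancer"],
    "d2": ["protein", "folding", "chaperone"],
}
coll_counts = Counter(t for d in docs.values() for t in d)
coll_len = sum(coll_counts.values())
query = ["brca1", "mutation"]
ranked = sorted(
    docs,
    key=lambda d: query_likelihood(query, docs[d], coll_counts, coll_len, mu=10.0),
    reverse=True,
)
print(ranked)
```
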
Berendsen, R.; Rijke, M. de; Balog, K.; Bogers, T.; Bosch, A. van den: On the assessment of expertise profiles (2013) 0.00
```
0.004052635 = product of:
  0.012157904 = sum of:
    0.012157904 = product of:
      0.03647371 = sum of:
        0.03647371 = weight(_text_:retrieval in 1089) [ClassicSimilarity], result of:
          0.03647371 = score(doc=1089,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.23632148 = fieldWeight in 1089, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1089)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Expertise retrieval has attracted significant interest in the field of information retrieval. Expert finding has been studied extensively, with less attention going to the complementary task of expert profiling, that is, automatically identifying topics about which a person is knowledgeable. We describe a test collection for expert profiling in which expert users have self-selected their knowledge areas. Motivated by the sparseness of this set of knowledge areas, we report on an assessment experiment in which academic experts judge a profile that has been automatically generated by state-of-the-art expert-profiling algorithms; optionally, experts can indicate a level of expertise for relevant areas. Experts may also give feedback on the quality of the system-generated knowledge areas. We report on a content analysis of these comments and gain insights into what aspects of profiles matter to experts. We provide an error analysis of the system-generated profiles, identifying factors that help explain why certain experts may be harder to profile than others. We also analyze how using self-selected versus judged system-generated knowledge areas as ground truth affects the evaluation of expert-profiling systems; the two ground truths rank systems somewhat differently but detect about the same number of significant pairwise differences, even though the judged system-generated assessments are sparser.
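
Whether two ground truths rank systems differently is commonly quantified in IR evaluation with Kendall's tau between the two system rankings. The abstract does not say which statistic was used; the sketch below assumes Kendall's tau-a (ties count as neither concordant nor discordant) over hypothetical per-system scores.

```python
from itertools import combinations


def kendall_tau(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Kendall's tau-a between two scorings of the same set of systems."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both scorings must cover the same systems")
    systems = sorted(scores_a)
    n_pairs = len(systems) * (len(systems) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two systems")
    concordant = discordant = 0
    for x, y in combinations(systems, 2):
        sign = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


# Hypothetical scores of three profiling systems under two ground truths.
self_selected = {"sys1": 0.40, "sys2": 0.30, "sys3": 0.20}
judged = {"sys1": 0.35, "sys2": 0.38, "sys3": 0.10}
print(kendall_tau(self_selected, judged))
```

A value of 1.0 means the two ground truths order the systems identically; here one swapped pair out of three lowers tau to about 0.33.
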
Balog, K.; Azzopardi, L.; Rijke, M. de: A language modeling framework for expert finding (2009) 0.00
```
0.0034387745 = product of:
  0.0103163235 = sum of:
    0.0103163235 = product of:
      0.03094897 = sum of:
        0.03094897 = weight(_text_:retrieval in 2447) [ClassicSimilarity], result of:
          0.03094897 = score(doc=2447,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.20052543 = fieldWeight in 2447, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2447)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Statistical language models have been successfully applied to many information retrieval tasks, including expert finding: the process of identifying experts given a particular topic. In this paper, we introduce and detail language modeling approaches that integrate the representation, association and search of experts using various textual data sources into a generative probabilistic framework. This provides a simple, intuitive, and extensible theoretical framework to underpin research into expertise search. To demonstrate the flexibility of the framework, two search strategies to find experts are modeled that incorporate different types of evidence extracted from the data, before being extended to also incorporate co-occurrence information. The models proposed are evaluated in the context of enterprise search systems within an intranet environment, where it is reasonable to assume that the list of experts is known, and that data to be mined is publicly accessible. Our experiments show that excellent performance can be achieved by using these models in such environments, and that this theoretical and empirical work paves the way for future principled extensions.
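
A common document-centric way to score a candidate expert in a generative framework is to aggregate the query likelihood of the documents associated with the candidate, p(q|ca) = Σ_d p(q|d) p(d|ca). A minimal sketch, assuming precomputed document query likelihoods and a uniform p(d|ca) over each candidate's associated documents; both inputs are hypothetical, and the paper's own models and association weights may differ.

```python
def rank_candidates(p_q_given_d: dict[str, float],
                    associations: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank candidates by p(q|ca) = sum_d p(q|d) * p(d|ca)."""
    scores: dict[str, float] = {}
    for candidate, docs in associations.items():
        if not docs:
            continue  # no associated documents, so no evidence for this candidate
        p_d_given_ca = 1.0 / len(docs)  # uniform association (assumption)
        scores[candidate] = sum(p_q_given_d.get(d, 0.0) * p_d_given_ca for d in docs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical query likelihoods for three intranet documents.
p_q_given_d = {"d1": 0.020, "d2": 0.005, "d3": 0.010}
associations = {"alice": ["d1", "d2"], "bob": ["d3"]}
print(rank_candidates(p_q_given_d, associations))
```
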
Kenter, T.; Balog, K.; Rijke, M. de: Evaluating document filtering systems over time (2015) 0.00
```
0.0032421078 = product of:
  0.009726323 = sum of:
    0.009726323 = product of:
      0.029178968 = sum of:
        0.029178968 = weight(_text_:retrieval in 2672) [ClassicSimilarity], result of:
          0.029178968 = score(doc=2672,freq=4.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.18905719 = fieldWeight in 2672, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=2672)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Document filtering is a popular task in information retrieval. A stream of documents arriving over time is filtered for documents relevant to a set of topics. The distinguishing feature of document filtering is the temporal aspect introduced by the stream of documents. Until now, document filtering systems have been evaluated in terms of traditional metrics such as (micro- or macro-averaged) precision, recall, MAP, nDCG, F1 and utility. We argue that these metrics do not capture all relevant aspects of the systems being evaluated. In particular, they lack support for the temporal dimension of the task. We propose a time-sensitive way of measuring the performance of document filtering systems over time by employing trend estimation. In short, performance is calculated per batch, a trend line is fitted to the results, and the estimated performance of each system at the end of the evaluation period is used to compare systems. We detail the application of our proposed trend estimation framework and examine the assumptions that need to hold for valid significance testing. Additionally, we analyze the requirements a document filtering metric has to meet and show that traditional macro-averaged true-positive-based metrics, such as precision, recall and utility, fail to capture essential information when applied in a batch setting. In particular, false positives returned in a batch for topics that are absent from the ground truth in that batch go unnoticed. This is a serious flaw, as over-generation by a system might be overlooked this way. We propose a new metric, aptness, that does capture false positives. We incorporate this metric in an overall score and show that this new score does meet all requirements. To demonstrate the results of our proposed evaluation methodology, we analyze the runs submitted to the two most recent editions of a document filtering evaluation campaign.
We re-evaluate the runs submitted to the Cumulative Citation Recommendation task of the 2012 and 2013 editions of the TREC Knowledge Base Acceleration track, and show that important new insights emerge.

Footnote

Contribution to a special issue on "Time and information retrieval"
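
The trend-estimation idea (score each batch, fit a trend line, compare systems by the estimated performance at the end of the evaluation period) can be sketched as follows. This assumes an ordinary least-squares straight line over batch indices; the paper's exact trend model and its significance testing are not reproduced here, and the per-batch scores are hypothetical.

```python
def trend_estimate(batch_scores: list[float]) -> float:
    """Fit y = a + b*x by least squares over batch indices 0..n-1 and
    return the fitted value at the last batch (end of evaluation period)."""
    n = len(batch_scores)
    if n < 2:
        raise ValueError("need at least two batches to fit a trend")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(batch_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, batch_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1)


# Hypothetical per-batch scores for two filtering systems.
system_a = [0.5, 0.5, 0.5, 0.5]
system_b = [0.3, 0.4, 0.5, 0.6]
print(trend_estimate(system_a), trend_estimate(system_b))
```

System B ends ahead (about 0.6 vs 0.5) even though its mean over all batches (0.45) is lower than A's, which is the kind of difference a plain average over batches hides.
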
He, J.; Meij, E.; Rijke, M. de: Result diversification based on query-specific cluster ranking (2011) 0.00
```
0.0028656456 = product of:
  0.008596936 = sum of:
    0.008596936 = product of:
      0.025790809 = sum of:
        0.025790809 = weight(_text_:retrieval in 4355) [ClassicSimilarity], result of:
          0.025790809 = score(doc=4355,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.16710453 = fieldWeight in 4355, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4355)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round-robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework: ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, and no cluster should be dominantly large. Also, documents from these high-quality clusters should have diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.
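
The simple cluster-based approach selects documents from different clusters in a round-robin fashion. A minimal sketch, assuming the clusters have already been ranked (best first) and the documents within each cluster are ordered by retrieval score; the cluster contents are hypothetical.

```python
from itertools import zip_longest


def round_robin(ranked_clusters: list[list[str]], k: int) -> list[str]:
    """Take the next-best document from each cluster in turn until k are chosen."""
    if k <= 0:
        return []
    selected: list[str] = []
    seen: set[str] = set()
    for tier in zip_longest(*ranked_clusters):  # tier i = i-th document of every cluster
        for doc in tier:
            if doc is None or doc in seen:
                continue  # cluster exhausted, or document already taken via another cluster
            seen.add(doc)
            selected.append(doc)
            if len(selected) == k:
                return selected
    return selected


clusters = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
print(round_robin(clusters, k=5))
```

Since the clusters arrive in ranked order, the top cluster contributes first in every round, consistent with the abstract's finding that cluster ranking matters more than cluster selection.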

Cai, F.; Rijke, M. de: Learning from homologous queries and semantically related terms for query auto completion (2016) 0.00

```
0.0028656456 = product of:
  0.008596936 = sum of:
    0.008596936 = product of:
      0.025790809 = sum of:
        0.025790809 = weight(_text_:retrieval in 2971) [ClassicSimilarity], result of:
          0.025790809 = score(doc=2971,freq=2.0), product of:
            0.15433937 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.051022716 = queryNorm
            0.16710453 = fieldWeight in 2971, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2971)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```

Theme: Semantic context in indexing and retrieval
