Search (10 results, page 1 of 1)

  • author_ss:"Rijke, M. de"
  • year_i:[2010 TO 2020}
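  Both facet filters are Solr field queries: author_ss holds the author name as an exact string, and year_i is an integer field. The range [2010 TO 2020} is half-open, so 2010 is included and 2020 is excluded (square bracket = inclusive, curly brace = exclusive). Below is a minimal sketch of how such a filtered request could be issued; the endpoint URL and core name are assumptions, not taken from this page.

    import requests

    # Hypothetical Solr endpoint; the real host and core of this catalog are not shown here.
    SOLR_SELECT = "http://localhost:8983/solr/catalog/select"

    params = {
        "q": "*:*",                               # main query (not visible on this page)
        "fq": [
            'author_ss:"Rijke, M. de"',           # filter 1: exact author string
            "year_i:[2010 TO 2020}",              # filter 2: 2010 inclusive, 2020 exclusive
        ],
        "rows": 10,
        "debugQuery": "true",                     # requests score explanations like those shown below
        "wt": "json",
    }

    response = requests.get(SOLR_SELECT, params=params)
    docs = response.json()["response"]["docs"]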
  1. Cai, F.; Rijke, M. de: Learning from homologous queries and semantically related terms for query auto completion (2016) 0.01
    0.008026919 = product of:
      0.037458956 = sum of:
        0.017435152 = weight(_text_:web in 2971) [ClassicSimilarity], result of:
          0.017435152 = score(doc=2971,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.18028519 = fieldWeight in 2971, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2971)
        0.0050448296 = weight(_text_:information in 2971) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=2971,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 2971, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2971)
        0.014978974 = weight(_text_:retrieval in 2971) [ClassicSimilarity], result of:
          0.014978974 = score(doc=2971,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 2971, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2971)
      0.21428572 = coord(3/14)
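    The breakdown above is Lucene ClassicSimilarity (TF-IDF) explain output: each matching query term contributes queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = sqrt(termFreq) x idf x fieldNorm; the sum over matching terms is then multiplied by the coordination factor coord(3/14), since 3 of the 14 query terms occur in this document. A minimal re-computation of the score from the figures above (a sketch, not Lucene's actual code):

      from math import sqrt, isclose

      query_norm = 0.029633347
      field_norm = 0.0390625
      coord = 3 / 14                                    # 3 of 14 query terms match document 2971

      # (term, term frequency in the field, inverse document frequency)
      terms = [
          ("web",         2.0, 3.2635105),
          ("information", 2.0, 1.7554779),
          ("retrieval",   2.0, 3.024915),
      ]

      total = 0.0
      for term, freq, idf in terms:
          query_weight = idf * query_norm               # 0.09670874 for "web"
          field_weight = sqrt(freq) * idf * field_norm  # 0.18028519 for "web"
          total += query_weight * field_weight          # 0.017435152 for "web"

      score = total * coord
      assert isclose(score, 0.008026919, rel_tol=1e-5)  # matches the 0.01 shown (rounded)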
    
    Abstract
    Query auto completion (QAC) models recommend possible queries to web search users when they start typing a query prefix. Most of today's QAC models rank candidate queries by popularity (i.e., frequency), and in doing so they tend to follow a strict query matching policy when counting the queries. That is, they ignore the contributions from so-called homologous queries: queries with the same terms in a different order, or queries that expand the original query. Importantly, homologous queries often express a remarkably similar search intent. Moreover, today's QAC approaches often ignore semantically related terms. We argue that users are prone to combine semantically related terms when generating queries. We propose a learning-to-rank-based QAC approach, where, for the first time, features derived from homologous queries and semantically related terms are introduced. In particular, we consider: (i) the observed and predicted popularity of homologous queries for a query candidate; and (ii) the semantic relatedness of pairs of terms inside a query and pairs of queries inside a session. We quantify the improvement of the proposed new features using two large-scale real-world query logs and show that the mean reciprocal rank and the success rate can be improved by up to 9% over state-of-the-art QAC models.
    Source
    Information processing and management. 52(2016) no.4, S.628-643
    Theme
    Semantic context in indexing and retrieval
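    The abstract above reports gains in mean reciprocal rank (MRR) and success rate. For reference, a minimal sketch of both metrics over a QAC log; the list-of-candidates data format is an assumption, not taken from the paper.

      def mrr(ranked_lists, targets):
          """Mean reciprocal rank: average of 1/rank of the first correct completion."""
          total = 0.0
          for candidates, target in zip(ranked_lists, targets):
              rank = next((i + 1 for i, q in enumerate(candidates) if q == target), None)
              total += 1.0 / rank if rank else 0.0
          return total / len(targets)

      def success_rate(ranked_lists, targets, k=10):
          """Fraction of prefixes for which the submitted query appears in the top k."""
          hits = sum(target in candidates[:k] for candidates, target in zip(ranked_lists, targets))
          return hits / len(targets)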
  2. Hofmann, K.; Balog, K.; Bogers, T.; Rijke, M. de: Contextual factors for finding similar experts (2010) 0.01
    0.0059622396 = product of:
      0.041735675 = sum of:
        0.0050448296 = weight(_text_:information in 3456) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=3456,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 3456, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3456)
        0.036690846 = weight(_text_:retrieval in 3456) [ClassicSimilarity], result of:
          0.036690846 = score(doc=3456,freq=12.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.40932083 = fieldWeight in 3456, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3456)
      0.14285715 = coord(2/14)
    
    Abstract
    Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. An important outcome is a set of models that identify factors influencing expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system-centered perspective. The main focus has been on developing content-based algorithms similar to document search. These algorithms identify matching experts primarily on the basis of the textual content of documents with which experts are associated. Other factors, such as the ones identified by expertise-seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content-based retrieval models and evaluate them in a retrieval experiment. Our main finding is that while content-based features are the most important, human participants also take contextual factors into account, such as media experience and organizational structure. We develop two principled ways of modeling the identified factors and integrate them with content-based retrieval models. Our experiments show that models combining content-based and contextual factors can significantly outperform existing content-based models.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.5, S.994-1014
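    The abstract does not spell out how the contextual factors are integrated with the content-based model. A common way to combine such evidence is a weighted linear mixture of normalized scores; the sketch below uses that assumption, and feature names such as media_experience are illustrative only.

      def combined_score(content_score, contextual_features, weights, alpha=0.8):
          """Interpolate a content-based similarity score with contextual evidence.

          content_score       -- normalized content-based similarity to the example expert
          contextual_features -- dict of normalized contextual signals, e.g.
                                 {"media_experience": 0.7, "same_organizational_unit": 1.0}
          weights             -- relative weight of each contextual signal (should sum to 1)
          alpha               -- weight of the content-based component
          """
          contextual = sum(weights[name] * value for name, value in contextual_features.items())
          return alpha * content_score + (1 - alpha) * contextual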
  3. Meij, E.; Trieschnigg, D.; Rijke, M. de; Kraaij, W.: Conceptual language models for domain-specific retrieval (2010) 0.00
    0.004427025 = product of:
      0.030989174 = sum of:
        0.0050448296 = weight(_text_:information in 4238) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=4238,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 4238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4238)
        0.025944345 = weight(_text_:retrieval in 4238) [ClassicSimilarity], result of:
          0.025944345 = score(doc=4238,freq=6.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.28943354 = fieldWeight in 4238, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4238)
      0.14285715 = coord(2/14)
    
    Abstract
    Over the years, various meta-languages have been used to manually enrich documents with conceptual knowledge of some kind. Examples include keyword assignment to citations or, more recently, tags to websites. In this paper we propose generative concept models as an extension to query modeling within the language modeling framework, which leverages these conceptual annotations to improve retrieval. By means of relevance feedback the original query is translated into a conceptual representation, which is subsequently used to update the query model. Extensive experimental work on five test collections in two domains shows that our approach gives significant improvements in terms of recall, initial precision and mean average precision with respect to a baseline without relevance feedback. On one test collection, it is also able to outperform a text-based pseudo-relevance feedback approach based on relevance models. On the other test collections it performs similarly to relevance models. Overall, conceptual language models have the added advantage of offering query and browsing suggestions in the form of conceptual annotations. In addition, the internal structure of the meta-language can be exploited to add related terms. Our contributions are threefold. First, an extensive study is conducted on how to effectively translate a textual query into a conceptual representation. Second, we propose a method for updating a textual query model using the concepts in the conceptual representation. Finally, we provide an extensive analysis of when and how this conceptual feedback improves retrieval.
    Source
    Information processing and management. 46(2010) no.4, S.448-469
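    Within the language modeling framework, conceptual feedback of this kind is typically realized by interpolating the original query model with an expansion model derived from the concepts. A minimal sketch of such an update follows; the simple linear interpolation is a standard choice and is assumed here, not necessarily the paper's exact estimator.

      def update_query_model(query_model, concept_models, concept_weights, lam=0.5):
          """Mix the original query language model with a concept-based expansion model.

          query_model     -- dict term -> P(term | original query)
          concept_models  -- dict concept -> (dict term -> P(term | concept))
          concept_weights -- dict concept -> P(concept | query), estimated via relevance feedback
          lam             -- weight of the original query model
          """
          expansion = {}
          for concept, term_dist in concept_models.items():
              for term, prob in term_dist.items():
                  expansion[term] = expansion.get(term, 0.0) + concept_weights[concept] * prob

          vocab = set(query_model) | set(expansion)
          return {t: lam * query_model.get(t, 0.0) + (1 - lam) * expansion.get(t, 0.0)
                  for t in vocab}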
  4. Berendsen, R.; Rijke, M. de; Balog, K.; Bogers, T.; Bosch, A. van den: On the assessment of expertise profiles (2013) 0.00
    0.0040454194 = product of:
      0.028317936 = sum of:
        0.0071344664 = weight(_text_:information in 1089) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=1089,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 1089, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1089)
        0.021183468 = weight(_text_:retrieval in 1089) [ClassicSimilarity], result of:
          0.021183468 = score(doc=1089,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.23632148 = fieldWeight in 1089, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1089)
      0.14285715 = coord(2/14)
    
    Abstract
    Expertise retrieval has attracted significant interest in the field of information retrieval. Expert finding has been studied extensively, with less attention going to the complementary task of expert profiling, that is, automatically identifying topics about which a person is knowledgeable. We describe a test collection for expert profiling in which expert users have self-selected their knowledge areas. Motivated by the sparseness of this set of knowledge areas, we report on an assessment experiment in which academic experts judge a profile that has been automatically generated by state-of-the-art expert-profiling algorithms; optionally, experts can indicate a level of expertise for relevant areas. Experts may also give feedback on the quality of the system-generated knowledge areas. We report on a content analysis of these comments and gain insights into what aspects of profiles matter to experts. We provide an error analysis of the system-generated profiles, identifying factors that help explain why certain experts may be harder to profile than others. We also analyze the impact on evaluating expert-profiling systems of using self-selected versus judged system-generated knowledge areas as ground truth; they rank systems somewhat differently but detect about the same number of pairwise significant differences, even though the judged system-generated assessments are sparser.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.10, S.2024-2044
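    One way to quantify the observation that the two ground truths "rank systems somewhat differently" is a rank correlation between the two system orderings. A small sketch using Kendall's tau follows; the per-system evaluation scores are placeholders, not values from the paper.

      from scipy.stats import kendalltau

      # Hypothetical MAP scores of four profiling systems under the two ground truths.
      self_selected = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.25, "sysD": 0.19}
      judged        = {"sysA": 0.29, "sysB": 0.30, "sysC": 0.24, "sysD": 0.21}

      systems = sorted(self_selected)
      tau, p_value = kendalltau([self_selected[s] for s in systems],
                                [judged[s] for s in systems])
      print(f"Kendall tau between the two system rankings: {tau:.2f} (p={p_value:.2f})")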
  5. Kenter, T.; Balog, K.; Rijke, M. de: Evaluating document filtering systems over time (2015) 0.00
    0.003574072 = product of:
      0.025018502 = sum of:
        0.008071727 = weight(_text_:information in 2672) [ClassicSimilarity], result of:
          0.008071727 = score(doc=2672,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1551638 = fieldWeight in 2672, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2672)
        0.016946774 = weight(_text_:retrieval in 2672) [ClassicSimilarity], result of:
          0.016946774 = score(doc=2672,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.18905719 = fieldWeight in 2672, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=2672)
      0.14285715 = coord(2/14)
    
    Abstract
    Document filtering is a popular task in information retrieval. A stream of documents arriving over time is filtered for documents relevant to a set of topics. The distinguishing feature of document filtering is the temporal aspect introduced by the stream of documents. Document filtering systems, up to now, have been evaluated in terms of traditional metrics like (micro- or macro-averaged) precision, recall, MAP, nDCG, F1 and utility. We argue that these metrics do not capture all relevant aspects of the systems being evaluated. In particular, they lack support for the temporal dimension of the task. We propose a time-sensitive way of measuring performance of document filtering systems over time by employing trend estimation. In short, the performance is calculated for batches, a trend line is fitted to the results, and the estimated performance of systems at the end of the evaluation period is used to compare systems. We detail the application of our proposed trend estimation framework and examine the assumptions that need to hold for valid significance testing. Additionally, we analyze the requirements a document filtering metric has to meet and show that traditional macro-averaged, true-positive-based metrics such as precision, recall, and utility fail to capture essential information when applied in a batch setting. In particular, false positives returned in a batch for topics that are absent from the ground truth in that batch go unnoticed. This is a serious flaw, as a system's over-generation might be overlooked this way. We propose a new metric, aptness, that does capture false positives. We incorporate this metric in an overall score and show that this new score does meet all requirements. To demonstrate the results of our proposed evaluation methodology, we analyze the runs submitted to the two most recent editions of a document filtering evaluation campaign. We re-evaluate the runs submitted to the Cumulative Citation Recommendation task of the 2012 and 2013 editions of the TREC Knowledge Base Acceleration track, and show that important new insights emerge.
    Footnote
    Contribution to a thematic issue "Time and information retrieval"
    Source
    Information processing and management. 51(2015) no.6, S.791-808
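    The evaluation method described above -- compute performance per batch, fit a trend line, and compare systems by their estimated performance at the end of the period -- can be sketched with an ordinary least-squares fit. The per-batch scores below are placeholders for illustration only.

      import numpy as np

      def end_of_period_estimate(batch_scores):
          """Fit a linear trend to per-batch scores and return the value predicted
          for the last batch; systems are compared on this estimate."""
          x = np.arange(len(batch_scores))
          slope, intercept = np.polyfit(x, batch_scores, deg=1)
          return slope * x[-1] + intercept

      # Hypothetical per-batch scores of two filtering systems over the evaluation period.
      system_a = [0.42, 0.45, 0.43, 0.48, 0.50, 0.49]
      system_b = [0.55, 0.52, 0.50, 0.47, 0.46, 0.44]

      print(end_of_period_estimate(system_a))   # improving over time
      print(end_of_period_estimate(system_b))   # degrading over time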
  6. He, J.; Meij, E.; Rijke, M. de: Result diversification based on query-specific cluster ranking (2011) 0.00
    0.0028605436 = product of:
      0.020023804 = sum of:
        0.0050448296 = weight(_text_:information in 4355) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=4355,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 4355, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4355)
        0.014978974 = weight(_text_:retrieval in 4355) [ClassicSimilarity], result of:
          0.014978974 = score(doc=4355,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 4355, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4355)
      0.14285715 = coord(2/14)
    
    Abstract
    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round-robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework, ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have a diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.3, S.550-571
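    The "simple yet effective cluster-based approach" mentioned in the abstract -- taking documents from different clusters in turn -- can be sketched as a round-robin merge over clusters visited in their ranked order. A minimal sketch, with illustrative document ids:

      def round_robin_diversify(ranked_clusters, k=10):
          """Build a diversified ranked list by cycling over query-specific clusters.

          ranked_clusters -- list of clusters, best cluster first; each cluster is a
                             list of document ids ranked by relevance within the cluster
          k               -- length of the result list to produce
          """
          result, position = [], 0
          while len(result) < k and any(position < len(c) for c in ranked_clusters):
              for cluster in ranked_clusters:          # visit clusters in ranked order
                  if position < len(cluster):
                      result.append(cluster[position])
                      if len(result) == k:
                          break
              position += 1
          return result

      # Example: three clusters for an ambiguous query.
      clusters = [["d1", "d4", "d7"], ["d2", "d5"], ["d3", "d6", "d8"]]
      print(round_robin_diversify(clusters, k=6))   # ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']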
  7. Bron, M.; Gorp, J. Van; Rijke, M. de: Media studies research in the data-driven age : how research questions evolve (2016) 0.00
    8.826613E-4 = product of:
      0.012357258 = sum of:
        0.012357258 = weight(_text_:information in 3008) [ClassicSimilarity], result of:
          0.012357258 = score(doc=3008,freq=12.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.23754507 = fieldWeight in 3008, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3008)
      0.071428575 = coord(1/14)
    
    Abstract
    The introduction of new technologies and access to new information channels continue to change the way media studies researchers work and the questions they seek to answer. We investigate the current practices of media studies researchers and how these practices affect their research questions. Through the analysis of 27 interviews about the research practices of media studies researchers during a research project, we developed a model of the activities in their research cycle. We find that information gathering and analysis activities dominate the research cycle. These activities influence the research outcomes as they determine how research questions asked by media studies researchers evolve. Specifically, we show how research questions are related to the availability and accessibility of data as well as new information sources for contextualization of the research topic. Our contribution is a comprehensive account of the overall research cycle of media studies researchers as well as specific aspects of the research cycle, i.e., information sources, information-seeking challenges, and the development of research questions. This work confirms findings of previous work in this area using a previously unstudied group of researchers, as well as providing new details about how research questions evolve.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1535-1554
  8. Tsagkias, M.; Larson, M.; Rijke, M. de: Predicting podcast preference : an analysis framework and its application (2010) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 3339) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=3339,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 3339, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3339)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.2, S.374-391
  9. Huurnink, B.; Hollink, L.; Heuvel, W. van den; Rijke, M. de: Search behavior of media professionals at an audiovisual archive : a transaction log analysis (2010) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 3468) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=3468,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 3468, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3468)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1180-1197
  10. Graus, D.; Odijk, D.; Rijke, M. de: The birth of collective memories : analyzing emerging entities in text streams (2018) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 4252) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=4252,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 4252, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4252)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.6, S.773-786