Search (6 results, page 1 of 1)

  • Filter: author_ss:"Song, D."
  1. Zhu, J.; Song, D.; Rüger, S.: Integrating multiple windows and document features for expert finding (2009) 0.09
    0.08966473 = product of:
      0.13449709 = sum of:
        0.11778076 = weight(_text_:query in 2755) [ClassicSimilarity], result of:
          0.11778076 = score(doc=2755,freq=8.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5134957 = fieldWeight in 2755, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2755)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 2755) [ClassicSimilarity], result of:
              0.03343265 = score(doc=2755,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 2755, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2755)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
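
    The explain tree above is standard Lucene ClassicSimilarity (tf-idf) output and can be reproduced from its published formula. Below is a minimal Python sketch; queryNorm and fieldNorm are copied from the tree rather than derived, since both depend on information (the full query, the field-length encoding) that this listing does not show.

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def field_weight(freq, doc_freq, max_docs, field_norm):
          # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
          return math.sqrt(freq) * idf(doc_freq, max_docs) * field_norm

      QUERY_NORM = 0.049352113   # copied from the explain output above
      FIELD_NORM = 0.0390625     # copied from the explain output above

      # weight(_text_:query in 2755): freq=8, docFreq=1151, maxDocs=44218
      w_query = idf(1151, 44218) * QUERY_NORM * field_weight(8.0, 1151, 44218, FIELD_NORM)

      # weight(_text_:22 in 2755): freq=2, docFreq=3622, wrapped in coord(1/2)
      # because only one of that sub-query's two clauses matched
      w_22 = idf(3622, 44218) * QUERY_NORM * field_weight(2.0, 3622, 44218, FIELD_NORM) * 0.5

      # top level: sum of the clauses, scaled by coord(2/3) -- 2 of 3 clauses matched
      print(round((w_query + w_22) * 2 / 3, 8))   # ~0.08966473, as listed

    The same factors (tf · idf · fieldNorm per matching term, times queryWeight and the coord overlap penalties) account for every explain tree in this listing; only freq, docFreq, fieldNorm, and the coord fractions change from record to record.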
    
    Abstract
    Expert finding is a key task in enterprise search and has recently attracted considerable attention from both the research and industry communities. Given a search topic, a prominent existing approach is to apply an information retrieval (IR) system to retrieve the top-ranking documents, which are then used to derive associations between experts and the search topic based on co-occurrences. However, we argue that expert finding depends on multiple levels of association and on document features that current expert finding systems address insufficiently, including (a) multiple levels of associations between experts and search topics, (b) document internal structure, and (c) document authority. We propose a novel approach that integrates these three aspects, together with a query expansion technique, in a two-stage model for expert finding. A systematic evaluation is conducted on TREC collections to test the performance of our approach as well as the effects of multiple windows, document features, and query expansion. The experimental results show that query expansion can dramatically improve expert finding performance with statistical significance. For three well-known IR models, with or without query expansion, document internal structure helps improve a single-window approach but without statistical significance, while our novel multiple-window approach significantly improves on a single-window approach both with and without document internal structure.
    Date
    22. 3.2009 18:55:47
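
    To make the window idea concrete: the sketch below scores experts by whether their mentions fall inside text windows of several sizes around topic-term occurrences, a toy rendering of the co-occurrence associations the abstract describes. The tokenizer, window sizes, and 1/w weighting are illustrative assumptions, not the authors' actual model.

      from collections import defaultdict

      def window_associations(tokens, experts, topic_terms, window_sizes=(5, 20, 50)):
          # score each expert mention by co-occurrence with topic terms,
          # weighting smaller (stronger-evidence) windows more heavily
          scores = defaultdict(float)
          topic_positions = [i for i, t in enumerate(tokens) if t in topic_terms]
          for i, tok in enumerate(tokens):
              if tok not in experts:
                  continue
              for w in window_sizes:
                  if any(abs(i - p) <= w for p in topic_positions):
                      scores[tok] += 1.0 / w
          return dict(scores)

      tokens = "smith reports that query expansion helped jones with retrieval".split()
      print(window_associations(tokens, {"smith", "jones"}, {"query", "retrieval"}))
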
  2. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.08
    0.07914498 = product of:
      0.11871746 = sum of:
        0.10200114 = weight(_text_:query in 1428) [ClassicSimilarity], result of:
          0.10200114 = score(doc=1428,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.44470036 = fieldWeight in 1428, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.03343265 = score(doc=1428,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
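
    As a rough illustration of the corpus-derived geometric representation described above, the sketch below builds HAL-style word vectors from weighted co-occurrence counts in a sliding window and compares two of them by cosine. The window size, weighting scheme, and toy corpus are assumptions; the paper's actual construction and its information-flow computation are more involved.

      import numpy as np

      def hal_vectors(tokens, window=4):
          vocab = sorted(set(tokens))
          index = {w: i for i, w in enumerate(vocab)}
          mat = np.zeros((len(vocab), len(vocab)))
          for i, w in enumerate(tokens):
              for d in range(1, window + 1):          # closer words get more weight
                  if i + d < len(tokens):
                      mat[index[w], index[tokens[i + d]]] += window - d + 1
          # a word's vector concatenates its co-occurrences with following
          # words (its row) and with preceding words (its column)
          return vocab, np.hstack([mat, mat.T])

      corpus = "information flow supports inference about what a text is about".split()
      vocab, vecs = hal_vectors(corpus)
      a, b = vocab.index("information"), vocab.index("inference")
      cos = vecs[a] @ vecs[b] / (np.linalg.norm(vecs[a]) * np.linalg.norm(vecs[b]))
      print(f"cosine(information, inference) = {cos:.3f}")
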
  3. Kruschwitz, U.; Lungley, D.; Albakour, M-D.; Song, D.: Deriving query suggestions for site search (2013) 0.07
    0.06510577 = product of:
      0.19531731 = sum of:
        0.19531731 = weight(_text_:query in 1085) [ClassicSimilarity], result of:
          0.19531731 = score(doc=1085,freq=22.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.85153633 = fieldWeight in 1085, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1085)
      0.33333334 = coord(1/3)
    
    Abstract
    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files.
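
    One log-analysis idea from the abstract, sketched under assumptions: treat consecutive queries from the same user within a session timeout as a reformulation pair, and count the newly added terms as candidate refinement suggestions for the earlier query. The timeout, log format, and tokenization are placeholders, not the study's actual methods.

      from collections import defaultdict

      SESSION_TIMEOUT = 600   # seconds; an arbitrary illustrative boundary

      def refinement_candidates(log):
          # log: (user, timestamp, query) triples, sorted by timestamp
          last = {}                                        # user -> (time, terms)
          suggestions = defaultdict(lambda: defaultdict(int))
          for user, ts, query in log:
              terms = set(query.lower().split())
              if user in last and ts - last[user][0] <= SESSION_TIMEOUT:
                  for term in terms - last[user][1]:       # terms the user added
                      suggestions[frozenset(last[user][1])][term] += 1
              last[user] = (ts, terms)
          return suggestions

      log = [("u1", 0, "library opening"), ("u1", 30, "library opening hours"),
             ("u2", 10, "library opening"), ("u2", 95, "library opening weekend")]
      for prev, counts in refinement_candidates(log).items():
          print(sorted(prev), "->", dict(counts))
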
  4. Li, J.; Zhang, P.; Song, D.; Wu, Y.: Understanding an enriched multidimensional user relevance model by analyzing query logs (2017) 0.05
    0.04711231 = product of:
      0.14133692 = sum of:
        0.14133692 = weight(_text_:query in 3961) [ClassicSimilarity], result of:
          0.14133692 = score(doc=3961,freq=8.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 3961, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=3961)
      0.33333334 = coord(1/3)
    
    Abstract
    Modeling multidimensional relevance in information retrieval (IR) has attracted much attention in recent years. However, most existing studies are conducted through relatively small-scale user studies, which may not reflect a real-world, natural search scenario. In this article, we propose to study the multidimensional user relevance model (MURM) on large-scale query logs, which record users' various search behaviors (e.g., query reformulations, clicks, and dwell time) in natural search settings. We extend an existing MURM (comprising five dimensions: topicality, novelty, reliability, understandability, and scope) with two additional dimensions, interest and habit, which represent personalized relevance judgments on retrieved documents. Further, for each dimension in the enriched MURM, a set of computable features is formulated. By conducting extensive document-ranking experiments on Bing's query logs and TREC Session Track data, we systematically investigate the impact of each dimension on retrieval performance and obtain a series of insightful findings that may benefit the design of future IR systems.
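
    A minimal sketch of how the enriched MURM could drive ranking: reduce each of the seven dimensions to one computable feature and combine them linearly. The feature values and uniform weights below are placeholders, and the linear combination is an assumption for illustration; the paper derives its features and their impact from the query-log and TREC Session Track data.

      DIMENSIONS = ["topicality", "novelty", "reliability", "understandability",
                    "scope", "interest", "habit"]   # the last two are the new ones

      def murm_score(features, weights):
          # linear combination over the seven relevance dimensions
          return sum(weights[d] * features.get(d, 0.0) for d in DIMENSIONS)

      doc = {"topicality": 0.82, "novelty": 0.40, "reliability": 0.75,
             "understandability": 0.66, "scope": 0.50, "interest": 0.91, "habit": 0.30}
      uniform = {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}   # placeholder weights
      print(round(murm_score(doc, uniform), 3))
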
  5. Yan, X.; Li, X.; Song, D.: A correlation analysis on LSA and HAL semantic space models (2004) 0.03
    0.03331343 = product of:
      0.09994029 = sum of:
        0.09994029 = weight(_text_:query in 2152) [ClassicSimilarity], result of:
          0.09994029 = score(doc=2152,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.43571556 = fieldWeight in 2152, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=2152)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper, we compare a well-known semantic space model, Latent Semantic Analysis (LSA), with another model, the Hyperspace Analogue to Language (HAL), which is widely used in different areas, especially in automatic query refinement. We conduct this comparative analysis to test our hypothesis that, with respect to the ability to extract lexical information from a text corpus, LSA is quite similar to HAL. We regard HAL and LSA as black boxes. Through a Pearson's correlation analysis of the outputs of these two black boxes, we conclude that LSA correlates highly with HAL, which justifies the expectation that LSA and HAL can play a similar role in facilitating automatic query refinement. This paper evaluates LSA in a new application area and contributes an effective way to compare different semantic space models.
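
    The black-box comparison reduces to computing Pearson's correlation between two lists of similarity scores, one per model, over the same word pairs, as in the sketch below. The word pairs and scores are placeholders standing in for real LSA and HAL outputs.

      import numpy as np

      def pearson(x, y):
          x, y = np.asarray(x, float), np.asarray(y, float)
          xc, yc = x - x.mean(), y - y.mean()
          return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

      pairs = [("car", "truck"), ("car", "road"), ("car", "banana")]
      lsa_sim = [0.81, 0.55, 0.07]   # placeholder LSA black-box outputs
      hal_sim = [0.77, 0.60, 0.12]   # placeholder HAL black-box outputs
      print(f"Pearson r = {pearson(lsa_sim, hal_sim):.3f}")
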
  6. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: An effective approach to verbose queries using a limited dependencies language model (2009) 0.03
    0.031408206 = product of:
      0.09422461 = sum of:
        0.09422461 = weight(_text_:query in 2122) [ClassicSimilarity], result of:
          0.09422461 = score(doc=2122,freq=8.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.41079655 = fieldWeight in 2122, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
      0.33333334 = coord(1/3)
    
    Abstract
    Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural-language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation as queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate), the default query model was replaced by the stable distribution of the query. Modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with, or better than, more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
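
    The paper's central move, sketched under assumptions: turn term co-occurrence counts into a Markov chain and use its stationary distribution, rather than the raw term distribution, as the language model. Ergodicity is forced here with uniform smoothing purely for illustration; the paper instead argues it from the natural-language properties of the documents.

      import numpy as np

      def stationary_distribution(counts, smoothing=0.01, iters=200):
          # counts: square term-to-term co-occurrence matrix
          counts = np.asarray(counts, float) + smoothing   # strictly positive => ergodic
          transition = counts / counts.sum(axis=1, keepdims=True)
          dist = np.full(len(counts), 1.0 / len(counts))   # any start state works:
          for _ in range(iters):                           # the limit is unique
              dist = dist @ transition
          return dist

      # toy counts for the terms [retrieval, query, verbose]
      counts = [[0, 4, 1],
                [4, 0, 2],
                [1, 2, 0]]
      print(stationary_distribution(counts).round(3))
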