Search (3 results, page 1 of 1)

  • author_ss:"Hoenkamp, E."
  • theme_ss:"Retrievalalgorithmen"
  1. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.01
    0.0072039375 = product of:
      0.018009843 = sum of:
        0.008341924 = weight(_text_:a in 2123) [ClassicSimilarity], result of:
          0.008341924 = score(doc=2123,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15602624 = fieldWeight in 2123, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
        0.009667919 = product of:
          0.019335838 = sum of:
            0.019335838 = weight(_text_:information in 2123) [ClassicSimilarity], result of:
              0.019335838 = score(doc=2123,freq=12.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23754507 = fieldWeight in 2123, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2123)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
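    The tree above is Lucene's "explain" output for ClassicSimilarity, i.e. classic TF-IDF scoring. Its arithmetic can be reproduced directly from the quoted factors. The sketch below does so for this first hit; the formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1))) are Lucene's documented ones, while the function name is mine and the constants are simply read off the tree above.

```python
import math

# Reproduce the ClassicSimilarity explain tree for hit no. 1.
# Lucene's classic TF-IDF: tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)),
# queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm,
# term weight = queryWeight * fieldWeight.

QUERY_NORM = 0.046368346
FIELD_NORM = 0.0390625
MAX_DOCS = 44218

def term_weight(freq, doc_freq):
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    return (idf * QUERY_NORM) * (tf * idf * FIELD_NORM)

w_a    = term_weight(freq=12, doc_freq=37942)  # ~0.008342, the _text_:a weight
w_info = term_weight(freq=12, doc_freq=20772)  # ~0.019336, the _text_:information weight

# _text_:information carries coord(1/2) = 0.5; the outer sum carries coord(2/5) = 0.4.
score = (w_a + 0.5 * w_info) * 0.4
print(f"{score:.10f}")  # ~0.0072039375, matching the displayed total
```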
    
    Abstract
    Typing two or three keywords into a browser has become an easy and efficient way to find information. Yet typing even short queries becomes tedious on ever-shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation: one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models; second, that it becomes more effective the more verbose the input; and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language.
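    A toy illustration may help with the Markov machine mentioned above: a single transition graph, estimated here from adjacent-word counts, both generates word sequences (the surface structure) and encodes which terms follow which (a stand-in for meaning). The corpus and names below are invented for illustration; the paper's actual construction is more sophisticated.

```python
import random
from collections import defaultdict

# Build a tiny Markov machine from bigram counts of a toy corpus.
corpus = "users search for information and users find information".split()
graph = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    graph[w1].append(w2)

# Surface structure: the language the machine generates (a random walk).
random.seed(0)
word, walk = "users", ["users"]
for _ in range(5):
    word = random.choice(graph.get(word, corpus))  # fall back on dead ends
    walk.append(word)
print(" ".join(walk))

# Meaning: the transition graph itself.
for w in sorted(graph):
    print(f"{w} -> {sorted(set(graph[w]))}")
```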
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.8, pp. 1546-1558
    Type
    a
  2. Hoenkamp, E.: Unitary operators on the document space (2003) 0.01
    0.00556948 = product of:
      0.0139237 = sum of:
        0.008341924 = weight(_text_:a in 3457) [ClassicSimilarity], result of:
          0.008341924 = score(doc=3457,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15602624 = fieldWeight in 3457, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
        0.0055817757 = product of:
          0.011163551 = sum of:
            0.011163551 = weight(_text_:information in 3457) [ClassicSimilarity], result of:
              0.011163551 = score(doc=3457,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13714671 = fieldWeight in 3457, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3457)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    When people search for documents, they ultimately want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows that the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational cost, but also opens a spectrum of possibilities for new research.
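    As a concrete reading of the unitary-operator claim: any orthonormal transform of the term dimension preserves document-document inner products exactly, so retrieval by cosine similarity is unaffected, while truncating small coefficients yields a compressed, multiresolution representation. The sketch below demonstrates this with an orthonormal Haar matrix on a made-up term-document matrix; the data and the truncation threshold are illustrative only, not the paper's experimental setup.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix, n a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                    # averaging (low-pass) rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])   # differencing (detail) rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(8, 5)).astype(float)  # toy 8-term x 5-document matrix

H = haar_matrix(8)
B = H @ A  # documents expressed in the Haar (multiresolution) basis

# Unitarity: all document-document inner products are preserved exactly.
assert np.allclose(A.T @ A, B.T @ B)

# Compression: zero out small detail coefficients, analogous in spirit to
# LSI's dimension reduction but obtained without computing an SVD.
B_trunc = np.where(np.abs(B) > 0.5, B, 0.0)
print("retained coefficients:", np.count_nonzero(B_trunc), "of", B.size)
```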
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, pp. 314-320
    Type
    a
  3. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: An effective approach to verbose queries using a limited dependencies language model (2009) 0.00
    0.004868544 = product of:
      0.01217136 = sum of:
        0.00770594 = weight(_text_:a in 2122) [ClassicSimilarity], result of:
          0.00770594 = score(doc=2122,freq=16.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14413087 = fieldWeight in 2122, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
        0.0044654203 = product of:
          0.0089308405 = sum of:
            0.0089308405 = weight(_text_:information in 2122) [ClassicSimilarity], result of:
              0.0089308405 = score(doc=2122,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.10971737 = fieldWeight in 2122, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2122)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. First, the term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than the document's initial distribution alone. A secondary contribution is to investigate the practical application of this representation as queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate), the default query model was replaced by the stable distribution of the query. Modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with, or better than, more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
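    The modeling step lends itself to a compact sketch: a term-by-term co-occurrence matrix becomes a row-stochastic transition matrix, a little uniform smoothing makes the chain ergodic (irreducible and aperiodic), and power iteration then yields its unique stationary distribution, which stands in for the query (or document) language model. The smoothing weight and toy counts below are assumptions for illustration, not values from the paper.

```python
import numpy as np

def stationary(cooc, smoothing=0.05):
    """Stationary distribution of an ergodic chain built from co-occurrence counts."""
    n = cooc.shape[0]
    P = cooc / cooc.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    P = (1.0 - smoothing) * P + smoothing / n   # uniform smoothing => ergodic
    pi = np.full(n, 1.0 / n)
    for _ in range(1000):                       # power iteration to the fixed point
        nxt = pi @ P
        if np.abs(nxt - pi).max() < 1e-12:
            break
        pi = nxt
    return pi

# Toy co-occurrence counts for a 4-term query vocabulary (add-one smoothed).
C = np.array([[4., 2., 1., 0.],
              [2., 3., 0., 1.],
              [1., 0., 2., 2.],
              [0., 1., 2., 5.]]) + 1.0

pi_q = stationary(C)     # the stable query model used for ranking
print(pi_q, pi_q.sum())  # a proper distribution, independent of the start state
```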
    Series
    Lecture notes in computer science : advances in information retrieval theory; 5766
    Source
    Second International Conference on the Theory of Information Retrieval, ICTIR 2009 Cambridge, UK, September 10-12, 2009 Proceedings. Ed.: L. Azzopardi
    Type
    a