Search (10 results, page 1 of 1)

  • author_ss:"Crestani, F."
  • year_i:[2000 TO 2010}
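The two active filters above use Solr query syntax: author_ss is a string field, and year_i:[2000 TO 2010} is a range with an inclusive lower bound and an exclusive upper bound (mixed brackets). As a sketch, assuming a hypothetical local Solr core named catalog, the same filtered search could be issued as:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and core name; field names come from the facet
# filters shown above.
params = urlencode({
    "q": "*:*",                                # match everything...
    "fq": ['author_ss:"Crestani, F."',         # ...then filter by author
           "year_i:[2000 TO 2010}"],           # 2000 inclusive, 2010 exclusive
    "rows": 10,
}, doseq=True)                                 # doseq=True repeats the fq key
url = "http://localhost:8983/solr/catalog/select?" + params
```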
  1. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.01
    0.014333045 = product of:
      0.06688754 = sum of:
        0.01482871 = weight(_text_:information in 1451) [ClassicSimilarity], result of:
          0.01482871 = score(doc=1451,freq=12.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.2850541 = fieldWeight in 1451, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.044029012 = weight(_text_:retrieval in 1451) [ClassicSimilarity], result of:
          0.044029012 = score(doc=1451,freq=12.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.49118498 = fieldWeight in 1451, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.008029819 = product of:
          0.024089456 = sum of:
            0.024089456 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
              0.024089456 = score(doc=1451,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.23214069 = fieldWeight in 1451, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.33333334 = coord(1/3)
      0.21428572 = coord(3/14)
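The nested breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output. As a sketch of how the numbers combine, using Lucene's documented formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and a final coord factor for the fraction of query terms matched), the first weight(_text_:information ...) node and the overall document score can be reproduced in a few lines:

```python
import math

def term_score(freq, doc_freq, max_docs, field_norm, query_norm):
    """One weight(...) node of a ClassicSimilarity explain tree."""
    tf = math.sqrt(freq)                               # 3.4641016 for freq=12
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 1.7554779 for docFreq=20772
    query_weight = idf * query_norm                    # 0.052020688
    field_weight = tf * idf * field_norm               # 0.2850541
    return query_weight * field_weight                 # 0.01482871

# weight(_text_:information in 1451) from result 1:
w_info = term_score(12.0, 20772, 44218, 0.046875, 0.029633347)

# Document score = sum of the matching term scores, scaled by coord(3/14),
# the fraction of query terms (3 of 14) present in the document:
doc_score = (0.01482871 + 0.044029012 + 0.008029819) * (3 / 14)  # 0.014333045
```

The same arithmetic applies to every explain tree in this result list; only freq, docFreq, fieldNorm, and the coord fraction change per term and document.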
    
    Abstract
    Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research that are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
    Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.281-284
  2. Crestani, F.; Du, H.: Written versus spoken queries : a qualitative and quantitative comparative analysis (2006) 0.01
    0.011671037 = product of:
      0.05446484 = sum of:
        0.0104854815 = weight(_text_:information in 5047) [ClassicSimilarity], result of:
          0.0104854815 = score(doc=5047,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.20156369 = fieldWeight in 5047, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=5047)
        0.03594954 = weight(_text_:retrieval in 5047) [ClassicSimilarity], result of:
          0.03594954 = score(doc=5047,freq=8.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.40105087 = fieldWeight in 5047, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5047)
        0.008029819 = product of:
          0.024089456 = sum of:
            0.024089456 = weight(_text_:22 in 5047) [ClassicSimilarity], result of:
              0.024089456 = score(doc=5047,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.23214069 = fieldWeight in 5047, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5047)
          0.33333334 = coord(1/3)
      0.21428572 = coord(3/14)
    
    Abstract
    The authors report on an experimental study of the differences between spoken and written queries. A set of written queries and a set of spontaneous spoken queries were generated by users from written topics. These two sets of queries are compared in qualitative terms and in terms of their retrieval effectiveness. Written and spoken queries are compared in terms of length, duration, and part of speech. In addition, assuming perfect transcription of the spoken queries, written and spoken queries are compared in terms of their ability to describe relevant documents. The retrieval effectiveness of spoken and written queries is compared using three different information retrieval models. The results show that using speech to formulate one's information need provides a way to express it more naturally and encourages the formulation of longer queries. Despite that, longer spoken queries do not seem to significantly improve retrieval effectiveness compared with written queries.
    Date
    5. 6.2006 11:22:23
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.7, S.881-890
  3. Crestani, F.; Wu, S.: Testing the cluster hypothesis in distributed information retrieval (2006) 0.01
    0.009135511 = product of:
      0.06394857 = sum of:
        0.014268933 = weight(_text_:information in 984) [ClassicSimilarity], result of:
          0.014268933 = score(doc=984,freq=16.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.27429342 = fieldWeight in 984, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=984)
        0.049679637 = weight(_text_:retrieval in 984) [ClassicSimilarity], result of:
          0.049679637 = score(doc=984,freq=22.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.554223 = fieldWeight in 984, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=984)
      0.14285715 = coord(2/14)
    
    Abstract
    How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based retrieval results presentation is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results are major features of a heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds, and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.
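The cluster hypothesis described above can be illustrated with a minimal, toy-scale check in the spirit of the classic test: pairs of documents relevant to the same query should be more similar to each other than relevant/non-relevant pairs. The documents below are invented purely for illustration:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bow(text):
    return Counter(text.lower().split())

# Toy collection: two documents relevant to a "cluster retrieval" query, one not.
relevant = [bow("cluster based retrieval of documents"),
            bow("document clustering for retrieval results")]
nonrelevant = [bow("spoken query transcription errors")]

# If the cluster hypothesis holds, relevant-relevant similarity should exceed
# the best relevant-nonrelevant similarity.
rr = cosine(relevant[0], relevant[1])
rn = max(cosine(r, nonrelevant[0]) for r in relevant)
```

In a distributed setting, the paper's point is that heterogeneous representations and duplicates can distort exactly these similarity estimates.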
    Source
    Information processing and management. 42(2006) no.5, S.1137-1150
  4. Crestani, F.; Lee, P.L.: Searching the web by constraining spreading activities (2000) 0.01
    0.008991993 = product of:
      0.06294395 = sum of:
        0.048818428 = weight(_text_:web in 1326) [ClassicSimilarity], result of:
          0.048818428 = score(doc=1326,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.50479853 = fieldWeight in 1326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.109375 = fieldNorm(doc=1326)
        0.014125523 = weight(_text_:information in 1326) [ClassicSimilarity], result of:
          0.014125523 = score(doc=1326,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.27153665 = fieldWeight in 1326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=1326)
      0.14285715 = coord(2/14)
    
    Source
    Information processing and management. 36(2000) no.4, S.585-605
  5. Crestani, F.: Combination of similarity measures for effective spoken document retrieval (2003) 0.01
    0.008009522 = product of:
      0.05606665 = sum of:
        0.014125523 = weight(_text_:information in 4690) [ClassicSimilarity], result of:
          0.014125523 = score(doc=4690,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.27153665 = fieldWeight in 4690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=4690)
        0.04194113 = weight(_text_:retrieval in 4690) [ClassicSimilarity], result of:
          0.04194113 = score(doc=4690,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.46789268 = fieldWeight in 4690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=4690)
      0.14285715 = coord(2/14)
    
    Source
    Journal of information science. 29(2003) no.2, S.87-96
  6. Simeoni, F.; Yakici, M.; Neely, S.; Crestani, F.: Metadata harvesting for content-based distributed information retrieval (2008) 0.01
    0.0076678325 = product of:
      0.053674825 = sum of:
        0.008737902 = weight(_text_:information in 1336) [ClassicSimilarity], result of:
          0.008737902 = score(doc=1336,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16796975 = fieldWeight in 1336, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1336)
        0.04493692 = weight(_text_:retrieval in 1336) [ClassicSimilarity], result of:
          0.04493692 = score(doc=1336,freq=18.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.50131357 = fieldWeight in 1336, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1336)
      0.14285715 = coord(2/14)
    
    Abstract
    We propose an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralization of full-content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archives Initiative's (OAI) Protocol for metadata harvesting, the approach occupies middle ground between content crawling and distributed retrieval. As in crawling, some data move toward the retrieval process, but it is statistics about the content rather than the content itself; this allows more efficient use of network resources and a wider scope of application. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval; this reduces the costs of content provision while promoting the simplicity, effectiveness, and responsiveness of retrieval. Overall, we argue that the approach retains the good properties of centralized retrieval without renouncing cost-effective, large-scale resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure. In particular, we define a minimal extension of the OAI protocol which supports the coordinated harvesting of full-content indices and descriptive metadata for content resources. Finally, we report on the implementation of a proof-of-concept prototype service for multimodel content-based retrieval of distributed file collections.
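The harvesting strategy builds on the OAI-PMH protocol. For reference, a standard OAI-PMH 2.0 ListRecords request can be constructed as below (the repository base URL is hypothetical); the paper's proposed extension would coordinate the harvesting of full-content index statistics alongside the usual descriptive metadata records:

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a standard OAI-PMH 2.0 ListRecords request URL."""
    if resumption_token is not None:
        # Per the spec, resumptionToken is an exclusive argument: when present,
        # no other optional arguments may be supplied.
        query = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(query)

# Hypothetical repository endpoint:
url = list_records_url("http://repository.example.org/oai")
```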
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.1, S.12-24
  7. Crestani, F.; Vegas, J.; Fuente, P. de la: A graphical user interface for the retrieval of hierarchically structured documents (2004) 0.01
    0.007239756 = product of:
      0.05067829 = sum of:
        0.0104854815 = weight(_text_:information in 2555) [ClassicSimilarity], result of:
          0.0104854815 = score(doc=2555,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.20156369 = fieldWeight in 2555, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
        0.04019281 = weight(_text_:retrieval in 2555) [ClassicSimilarity], result of:
          0.04019281 = score(doc=2555,freq=10.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.44838852 = fieldWeight in 2555, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
      0.14285715 = coord(2/14)
    
    Abstract
    Past research has shown that graphical user interfaces (GUIs) can significantly improve the effectiveness of the information access task. Our work is based on the consideration that structured document retrieval requires different graphical user interfaces than standard information retrieval. In structured document retrieval, a GUI has to enable a user to query, browse retrieved documents, and provide query refinement and relevance feedback based not only on full documents, but also on specific document parts in relation to the document structure. In this paper, we present a new GUI for structured document retrieval specifically designed for hierarchically structured documents. A user task-oriented evaluation has shown that the proposed interface provides the user with an intuitive and powerful set of tools for structured document searching, retrieved list navigation, and search refinement.
    Source
    Information processing and management. 40(2004) no.2, S.269-289
  8. Bache, R.; Baillie, M.; Crestani, F.: Measuring the likelihood property of scoring functions in general retrieval models (2009) 0.01
    0.007000556 = product of:
      0.04900389 = sum of:
        0.0070627616 = weight(_text_:information in 2860) [ClassicSimilarity], result of:
          0.0070627616 = score(doc=2860,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13576832 = fieldWeight in 2860, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2860)
        0.04194113 = weight(_text_:retrieval in 2860) [ClassicSimilarity], result of:
          0.04194113 = score(doc=2860,freq=8.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.46789268 = fieldWeight in 2860, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2860)
      0.14285715 = coord(2/14)
    
    Abstract
    Although retrieval systems based on probabilistic models will rank the objects (e.g., documents) being retrieved according to the probability of some matching criterion (e.g., relevance), they rarely yield an actual probability, and the scoring function is interpreted to be purely ordinal within a given retrieval task. In this brief communication, it is shown that some scoring functions possess the likelihood property, which means that the scoring function indicates the likelihood of matching when compared to other retrieval tasks. This is potentially more useful than pure ranking, although it cannot be interpreted as an actual probability. The property can be detected by using two modified effectiveness measures: entire precision and entire recall.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.6, S.1294-1297
  9. Tombros, T.; Crestani, F.: Users' perception of relevance of spoken documents (2000) 0.00
    0.0044226884 = product of:
      0.030958816 = sum of:
        0.009988253 = weight(_text_:information in 4996) [ClassicSimilarity], result of:
          0.009988253 = score(doc=4996,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1920054 = fieldWeight in 4996, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4996)
        0.020970564 = weight(_text_:retrieval in 4996) [ClassicSimilarity], result of:
          0.020970564 = score(doc=4996,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.23394634 = fieldWeight in 4996, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4996)
      0.14285715 = coord(2/14)
    
    Abstract
    We present the results of a study of users' perception of the relevance of documents. The aim is to study experimentally how users' perception varies depending on the form in which retrieved documents are presented. Documents retrieved in response to a query are presented to users in a variety of ways, from full text to a machine-spoken, query-biased, automatically generated summary, and the difference in users' perception of relevance is studied. The experimental results suggest that the effectiveness of advanced multimedia Information Retrieval applications may be affected by the low level of users' perception of relevance of retrieved documents.
    Source
    Journal of the American Society for Information Science. 51(2000) no.10, S.929-939
  10. Sweeney, S.; Crestani, F.; Losada, D.E.: 'Show me more' : incremental length summarisation using novelty detection (2008) 0.00
    8.826613E-4 = product of:
      0.012357258 = sum of:
        0.012357258 = weight(_text_:information in 2054) [ClassicSimilarity], result of:
          0.012357258 = score(doc=2054,freq=12.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.23754507 = fieldWeight in 2054, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2054)
      0.071428575 = coord(1/14)
    
    Abstract
    The paper presents a study investigating the effects of incorporating novelty detection in automatic text summarisation. By condensing a textual document, automatic text summarisation can reduce the need to refer to the source document. It also offers a means to deliver device-friendly content when accessing information in non-traditional environments. An effective method of summarisation could be to produce a summary that includes only novel information. However, focusing exclusively on novel parts may result in a loss of context, which may have an impact on the correct interpretation of the summary with respect to the source document. In this study we compare two strategies to produce summaries that incorporate novelty in different ways: a constant-length summary, which contains only novel sentences, and an incremental summary, containing additional sentences that provide context. The aim is to establish whether a summary that contains only novel sentences provides a sufficient basis to determine the relevance of a document, or whether we need to include additional sentences to provide context. Findings from the study suggest that there is only a minimal difference in performance for the tasks we set our users, and that the presence of contextual information is not so important. However, for the case of mobile information access, a summary that contains only novel information does offer benefits, given bandwidth constraints.
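The novel-sentences-only strategy can be sketched as a greedy filter: a sentence enters the summary only if it is sufficiently dissimilar from everything already selected. This is a simplified stand-in for the paper's novelty detection, with an arbitrary cosine threshold and toy sentences:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def novel_summary(sentences, threshold=0.5):
    """Keep a sentence only if it is not too similar to any sentence
    already selected (greedy, order-dependent novelty filter)."""
    selected = []
    for s in sentences:
        vec = Counter(s.lower().split())
        if all(cosine(vec, Counter(t.lower().split())) < threshold
               for t in selected):
            selected.append(s)
    return selected

summary = novel_summary([
    "the summary keeps novel sentences",
    "the summary keeps novel sentences again",   # near-duplicate, dropped
    "context sentences aid interpretation",
])
```

The paper's incremental variant would then append context sentences around the surviving novel ones rather than returning them bare.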
    Source
    Information processing and management. 44(2008) no.2, S.663-686