Search (10 results, page 1 of 1)

  • × year_i:[2000 TO 2010}
  • × author_ss:"Lalmas, M."
  1. Lalmas, M.: XML retrieval (2009) 0.02
    0.018127304 = product of:
      0.08459408 = sum of:
        0.032137483 = weight(_text_:wide in 4998) [ClassicSimilarity], result of:
          0.032137483 = score(doc=4998,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.24476713 = fieldWeight in 4998, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4998)
        0.010089659 = weight(_text_:information in 4998) [ClassicSimilarity], result of:
          0.010089659 = score(doc=4998,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 4998, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4998)
        0.042366937 = weight(_text_:retrieval in 4998) [ClassicSimilarity], result of:
          0.042366937 = score(doc=4998,freq=16.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.47264296 = fieldWeight in 4998, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4998)
      0.21428572 = coord(3/14)
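    The breakdown above is Lucene explain output for the ClassicSimilarity (TF-IDF) model: per matching term, queryWeight = idf * queryNorm and fieldWeight = sqrt(termFreq) * idf * fieldNorm, their product is the term's contribution, and the sum of contributions is scaled by the coordination factor coord(3/14). A minimal Python sketch that reproduces the arithmetic shown for result 1; the helper names are illustrative, not Lucene API:
      import math

      def classic_idf(doc_freq, max_docs):
          # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def term_contribution(freq, doc_freq, max_docs, query_norm, field_norm):
          # queryWeight = idf * queryNorm;  fieldWeight = sqrt(freq) * idf * fieldNorm
          idf = classic_idf(doc_freq, max_docs)
          return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

      QUERY_NORM, FIELD_NORM, MAX_DOCS = 0.029633347, 0.0390625, 44218
      terms = [                      # (term, freq in doc 4998, docFreq in collection)
          ("wide", 2.0, 1430),
          ("information", 8.0, 20772),
          ("retrieval", 16.0, 5836),
      ]
      total = sum(term_contribution(f, df, MAX_DOCS, QUERY_NORM, FIELD_NORM)
                  for _, f, df in terms)
      score = total * (3 / 14)       # coord(3/14): 3 of 14 query clauses matched
      print(score)                   # ~0.018127, the score shown for result 1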
    
    Abstract
    Documents usually have content and structure. The content refers to the text of the document, whereas the structure refers to how a document is logically organized. An increasingly common way to encode the structure is through the use of a mark-up language. Nowadays, the most widely used mark-up language for representing structure is the eXtensible Mark-up Language (XML). XML can be used to provide focused access to documents, i.e. returning XML elements, such as sections and paragraphs, instead of whole documents in response to a query. Such focused strategies are of particular benefit for information repositories containing long documents, or documents covering a wide variety of topics, where users are directed to the most relevant content within a document. The increased adoption of XML to represent document structure requires the development of tools to effectively access documents marked up in XML. This book provides a detailed description of the query languages, indexing strategies, ranking algorithms, and presentation scenarios developed to access XML documents. Major advances in XML retrieval were seen from 2002 as a result of INEX, the INitiative for the Evaluation of XML Retrieval. INEX, also described in this book, provided test sets for evaluating XML retrieval effectiveness. Many of the developments and results described in this book were investigated within INEX.
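    As the abstract describes, focused access means scoring and returning individual XML elements (sections, paragraphs) rather than whole documents. A minimal sketch of the idea, assuming a hypothetical article mark-up and a trivial term-overlap ranking; neither is taken from the book:
      import xml.etree.ElementTree as ET

      # Hypothetical article mark-up; element names and text are illustrative only.
      doc = ET.fromstring(
          "<article>"
          "<title>XML retrieval</title>"
          "<section id='s1'><p>Query languages for structured documents</p></section>"
          "<section id='s2'><p>Indexing and ranking strategies for elements</p></section>"
          "</article>")

      # Focused access: rank and return individual elements, not the whole article.
      query = {"ranking", "strategies"}

      def overlap(elem):
          return len(query & set(" ".join(elem.itertext()).lower().split()))

      best = max(doc.iter("section"), key=overlap)
      print(best.get("id"), "->", " ".join(best.itertext()))   # s2 is returned, not the article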
    Content
    Table of Contents: Introduction / Basic XML Concepts / Historical Perspectives / Query Languages / Indexing Strategies / Ranking Strategies / Presentation Strategies / Evaluating XML Retrieval Effectiveness / Conclusions
    LCSH
    Information retrieval
    Series
    Synthesis lectures on information concepts, retrieval & services; 7
    Subject
    Information retrieval
  2. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.01
    0.014333045 = product of:
      0.06688754 = sum of:
        0.01482871 = weight(_text_:information in 1451) [ClassicSimilarity], result of:
          0.01482871 = score(doc=1451,freq=12.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.2850541 = fieldWeight in 1451, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.044029012 = weight(_text_:retrieval in 1451) [ClassicSimilarity], result of:
          0.044029012 = score(doc=1451,freq=12.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.49118498 = fieldWeight in 1451, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.008029819 = product of:
          0.024089456 = sum of:
            0.024089456 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
              0.024089456 = score(doc=1451,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.23214069 = fieldWeight in 1451, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.33333334 = coord(1/3)
      0.21428572 = coord(3/14)
    
    Abstract
    Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
    Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.281-284
  3. Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Special issue on model design, formulation and explanation in information retrieval using mathematics (2006) 0.01
    0.010258756 = product of:
      0.07181129 = sum of:
        0.020970963 = weight(_text_:information in 110) [ClassicSimilarity], result of:
          0.020970963 = score(doc=110,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.40312737 = fieldWeight in 110, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=110)
        0.050840326 = weight(_text_:retrieval in 110) [ClassicSimilarity], result of:
          0.050840326 = score(doc=110,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.5671716 = fieldWeight in 110, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.09375 = fieldNorm(doc=110)
      0.14285715 = coord(2/14)
    
    Footnote
    Introduction to a thematic section "Formal Methods for Information Retrieval"
    Source
    Information processing and management. 42(2006) no.1, S.1-3
  4. Reid, J.; Lalmas, M.; Finesilver, K.; Hertzum, M.: Best entry points for structured document retrieval : part II: types, usage and effectiveness (2006) 0.01
    0.008446382 = product of:
      0.05912467 = sum of:
        0.012233062 = weight(_text_:information in 961) [ClassicSimilarity], result of:
          0.012233062 = score(doc=961,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.23515764 = fieldWeight in 961, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=961)
        0.046891607 = weight(_text_:retrieval in 961) [ClassicSimilarity], result of:
          0.046891607 = score(doc=961,freq=10.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.5231199 = fieldWeight in 961, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=961)
      0.14285715 = coord(2/14)
    
    Abstract
    Structured document retrieval makes use of document components as the basis of the retrieval process, rather than complete documents. The inherent relationships between these components make it vital to support users' natural browsing behaviour in order to offer effective and efficient access to structured documents. This paper examines the concept of best entry points, which are document components from which the user can browse to obtain optimal access to relevant document components. It investigates the types of best entry points in structured document retrieval, and their usage and effectiveness in real information search tasks.
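    One way to make the notion concrete is to read a best entry point as the deepest component whose subtree still contains every relevant component, so browsing downwards from it reaches all relevant content. A minimal sketch under that assumption; the paper itself identifies and evaluates several BEP types empirically, so this is only one illustrative reading, and the document tree and judgements below are invented:
      children = {"article": ["sec1", "sec2"], "sec1": ["p1", "p2"], "sec2": ["p3"],
                  "p1": [], "p2": [], "p3": []}
      relevant = {"p1", "p2"}                       # components judged relevant

      def subtree(node):
          out = {node}
          for child in children[node]:
              out |= subtree(child)
          return out

      def best_entry_point(root):
          node = root
          while True:
              covering = [c for c in children[node] if relevant <= subtree(c)]
              if not covering:
                  return node                       # no single child covers everything
              node = covering[0]

      print(best_entry_point("article"))            # -> sec1: browse down from here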
    Footnote
    Contribution within a thematic section "Formal Methods for Information Retrieval"
    Source
    Information processing and management. 42(2006) no.1, S.89-105
  5. Kazai, G.; Lalmas, M.: ¬The overlap problem in content-oriented XML retrieval evaluation (2004) 0.01
    0.0074937996 = product of:
      0.052456595 = sum of:
        0.010089659 = weight(_text_:information in 4083) [ClassicSimilarity], result of:
          0.010089659 = score(doc=4083,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 4083, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=4083)
        0.042366937 = weight(_text_:retrieval in 4083) [ClassicSimilarity], result of:
          0.042366937 = score(doc=4083,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.47264296 = fieldWeight in 4083, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=4083)
      0.14285715 = coord(2/14)
    
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin, et al.
  6. Reid, J.; Lalmas, M.; Finesilver, K.; Hertzum, M.: Best entry points for structured document retrieval : part I: characteristics (2006) 0.01
    0.0074184835 = product of:
      0.05192938 = sum of:
        0.009988253 = weight(_text_:information in 960) [ClassicSimilarity], result of:
          0.009988253 = score(doc=960,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1920054 = fieldWeight in 960, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=960)
        0.04194113 = weight(_text_:retrieval in 960) [ClassicSimilarity], result of:
          0.04194113 = score(doc=960,freq=8.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.46789268 = fieldWeight in 960, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=960)
      0.14285715 = coord(2/14)
    
    Abstract
    Structured document retrieval makes use of document components as the basis of the retrieval process, rather than complete documents. The inherent relationships between these components make it vital to support users' natural browsing behaviour in order to offer effective and efficient access to structured documents. This paper examines the concept of best entry points, which are document components from which the user can browse to obtain optimal access to relevant document components. In particular this paper investigates the basic characteristics of best entry points.
    Footnote
    Contribution within a thematic section "Formal Methods for Information Retrieval"
    Source
    Information processing and management. 42(2006) no.1, S.74-88
  7. Kazai, G.; Lalmas, M.; Fuhr, N.; Gövert, N.: ¬A report on the first year of the INitiative for the Evaluation of XML Retrieval (INEX'02) (2004) 0.01
    0.0061978353 = product of:
      0.043384846 = sum of:
        0.0070627616 = weight(_text_:information in 2267) [ClassicSimilarity], result of:
          0.0070627616 = score(doc=2267,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13576832 = fieldWeight in 2267, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2267)
        0.036322083 = weight(_text_:retrieval in 2267) [ClassicSimilarity], result of:
          0.036322083 = score(doc=2267,freq=6.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.40520695 = fieldWeight in 2267, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2267)
      0.14285715 = coord(2/14)
    
    Abstract
    The INitiative for the Evaluation of XML retrieval (INEX) aims at providing an infrastructure to evaluate the effectiveness of content-oriented XML retrieval systems. To this end, in the first round of INEX in 2002, a test collection of real world XML documents along with a set of topics and respective relevance assessments have been created with the collaboration of 36 participating organizations. In this article, we provide an overview of the first round of the INEX initiative.
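    A toy illustration of the kind of evaluation such a test collection enables: per-topic relevance assessments over XML elements, and a precision-at-k computation for a ranked element run. The element paths and judgements here are invented, not INEX data, and INEX's actual metrics are more specialised:
      qrels = {"topic-01": {"doc1/sec[1]", "doc3/sec[2]/p[4]"}}
      run   = {"topic-01": ["doc1/sec[1]", "doc2/sec[3]", "doc3/sec[2]/p[4]"]}

      def precision_at_k(topic, k):
          retrieved = run[topic][:k]
          return sum(1 for e in retrieved if e in qrels[topic]) / k

      print(precision_at_k("topic-01", 3))          # -> 0.666...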
    Source
    Journal of the American Society for Information Science and Technology. 55(2004) no.6, S.551-556
  8. Lalmas, M.: XML information retrieval (2009) 0.00
    0.0044226884 = product of:
      0.030958816 = sum of:
        0.009988253 = weight(_text_:information in 3880) [ClassicSimilarity], result of:
          0.009988253 = score(doc=3880,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1920054 = fieldWeight in 3880, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3880)
        0.020970564 = weight(_text_:retrieval in 3880) [ClassicSimilarity], result of:
          0.020970564 = score(doc=3880,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.23394634 = fieldWeight in 3880, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3880)
      0.14285715 = coord(2/14)
    
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  9. Ruthven, I.; Lalmas, M.; Rijsbergen, K. van: Combining and selecting characteristics of information use (2002) 0.00
    0.003574072 = product of:
      0.025018502 = sum of:
        0.008071727 = weight(_text_:information in 5208) [ClassicSimilarity], result of:
          0.008071727 = score(doc=5208,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1551638 = fieldWeight in 5208, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=5208)
        0.016946774 = weight(_text_:retrieval in 5208) [ClassicSimilarity], result of:
          0.016946774 = score(doc=5208,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.18905719 = fieldWeight in 5208, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=5208)
      0.14285715 = coord(2/14)
    
    Abstract
    Ruthven, Lalmas, and van Rijsbergen use traditional term importance measures such as inverse document frequency, noise (based upon in-document frequency), and term frequency, supplemented by theme value, which is calculated from the differences between the expected positions of words in a text and their actual positions, on the assumption that even distribution indicates term association with a main topic, and context, which is based on a query term's distance from the nearest other query term relative to the average expected distribution of all query terms in the document. They then define document characteristics such as specificity, the sum of all idf values in a document over the total terms in the document; document complexity, measured by the document's average idf value; and the information-to-noise ratio (info-noise), tokens after stopping and stemming over tokens before these processes, measuring the ratio of useful to non-useful information in a document. Retrieval tests are then carried out using each characteristic, combinations of the characteristics, and relevance feedback to determine the correct combination of characteristics. A file ranks independently of query terms by both specificity and info-noise, but if the presence of a query term is required, unique rankings are generated. Tested on five standard collections, the traditional characteristics outperformed the new characteristics, which did, however, outperform random retrieval. All possible combinations of characteristics were also tested, both with and without a set of scaling weights applied. All characteristics can benefit from combination with another characteristic or set of characteristics, and performance as a single characteristic is a good indicator of performance in combination. Larger combinations tended to be more effective than smaller ones, and weighting increased precision measures of middle-ranking combinations but decreased the ranking of poorer combinations. The best combinations vary for each collection, and in some collections with the addition of weighting. Finally, with all documents ranked by the all-characteristics combination, they take the top 30 documents and calculate the characteristic scores for each term in both the relevant and the non-relevant sets. Then, taking for each query term the characteristics whose average was higher for relevant than for non-relevant documents, the documents are re-ranked. The relevance feedback method of selecting characteristics can select a good set of characteristics for query terms.
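    A rough sketch of two of the document characteristics described above, under stated assumptions (toy idf values and stop list, stemming omitted); the paper's exact formulations may differ in detail:
      idf = {"xml": 4.4, "retrieval": 3.0, "information": 1.8, "the": 0.1, "of": 0.1}
      stopwords = {"the", "of"}
      tokens = "the retrieval of xml information the xml".split()

      def specificity(toks):
          # sum of idf values over the total number of tokens in the document
          return sum(idf.get(t, 0.0) for t in toks) / len(toks)

      def info_noise(toks):
          # tokens left after stopping (stemming omitted here) over tokens before
          return len([t for t in toks if t not in stopwords]) / len(toks)

      print(round(specificity(tokens), 3), round(info_noise(tokens), 3))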
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.5, S.378-396
  10. Ruthven, I.; Lalmas, M.; Rijsbergen, K. van: Incorporating user search behavior into relevance feedback (2003) 0.00
    0.0033881254 = product of:
      0.023716876 = sum of:
        0.008737902 = weight(_text_:information in 5169) [ClassicSimilarity], result of:
          0.008737902 = score(doc=5169,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16796975 = fieldWeight in 5169, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5169)
        0.014978974 = weight(_text_:retrieval in 5169) [ClassicSimilarity], result of:
          0.014978974 = score(doc=5169,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5169)
      0.14285715 = coord(2/14)
    
    Abstract
    Ruthven, Lalmas, and van Rijsbergen rank and select terms for query expansion using information gathered on searcher evaluation behavior. Using the TREC Financial Times and Los Angeles Times collections and search topics from TREC-6 placed in simulated work situations, six student subjects each performed three searches on an experimental system and three on a control system, with instructions to search by natural language expression in any way they found comfortable. Searching was analyzed for behavior differences between experimental and control situations, and for effectiveness and perceptions. In three experiments, paired t-tests were the analysis tool, with the controls being a no-relevance-feedback system, a standard ranking for automatic expansion, and a standard ranking for interactive expansion, while the experimental systems based ranking upon user information on temporal relevance and partial relevance. Two further experiments compare using user behavior (number assessed relevant and similarity of relevant documents) to choose a query expansion technique against a non-selective technique and, finally, the effect of providing the user with knowledge of the process. When partial relevance data and time-of-assessment data are incorporated in term ranking, more relevant documents were recovered in fewer iterations; however, retrieval effectiveness overall was not improved. The subjects nonetheless rated the suggested terms as more useful and used them more heavily. Explanations of what the feedback techniques were doing led to higher use of the techniques.
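    To make the idea concrete, a purely illustrative term-ranking sketch for query expansion that folds partial-relevance and time-of-assessment evidence into term scores; the data and the weighting formula below are assumptions for illustration, not the scheme used in the paper:
      # (terms of an assessed document, relevance in [0,1], order of assessment)
      assessments = [
          ({"xml", "retrieval", "index"}, 1.0, 1),
          ({"retrieval", "feedback"},     0.5, 2),   # partially relevant
          ({"xml", "evaluation"},         1.0, 3),   # most recently assessed
      ]

      def expansion_scores(assessed):
          latest = max(order for _, _, order in assessed)
          scores = {}
          for terms, relevance, order in assessed:
              for t in terms:
                  # later assessments and fully relevant documents count more
                  scores[t] = scores.get(t, 0.0) + relevance * (order / latest)
          return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

      print(expansion_scores(assessments)[:3])       # top candidate expansion terms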
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.6, S.528-548