Search (493 results, page 1 of 25)

  • Filter: theme_ss:"Computerlinguistik"
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.40
    0.40061224 = product of:
      0.60091835 = sum of:
        0.18458658 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.18458658 = score(doc=563,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.007914125 = weight(_text_:information in 563) [ClassicSimilarity], result of:
          0.007914125 = score(doc=563,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.116372846 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.023498412 = weight(_text_:retrieval in 563) [ClassicSimilarity], result of:
          0.023498412 = score(doc=563,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.20052543 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.18458658 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.18458658 = score(doc=563,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.18458658 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.18458658 = score(doc=563,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.01574607 = product of:
          0.03149214 = sum of:
            0.03149214 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.03149214 = score(doc=563,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.6666667 = coord(6/9)
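    The explain tree above is standard Lucene ClassicSimilarity (tf-idf) arithmetic: each term contributes queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm, and the document score is the sum of the contributions scaled by the coordination factor. A minimal sketch that reproduces the 0.18458658 contribution of the "2f" term from the reported statistics (function names are ours, not Lucene's):

      import math

      def idf(doc_freq: int, max_docs: int) -> float:
          # ClassicSimilarity: idf = ln(maxDocs / (docFreq + 1)) + 1
          return math.log(max_docs / (doc_freq + 1)) + 1  # 8.478011 for docFreq=24

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          tf = math.sqrt(freq)                   # 1.4142135 for freq=2.0
          i = idf(doc_freq, max_docs)
          query_weight = i * query_norm          # 0.32843533
          field_weight = tf * i * field_norm     # 0.56201804
          return query_weight * field_weight

      print(term_score(2.0, 24, 44218, 0.038739666, 0.046875))  # ~0.18458658

    The top-level score follows the same pattern: 0.60091835 (the sum of the six term contributions) x coord(6/9) = 0.40061224.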
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
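    The extraction step the abstract describes can be illustrated with a toy association measure: score every bigram, keep those whose "glue" clears a threshold. The sketch below uses the Dice coefficient and a fixed cutoff as a stand-in for Huo's three measures and the LocalMaxs local-maximum test, so it illustrates the approach rather than the thesis's actual model:

      from collections import Counter

      def bigrams(tokens):
          return list(zip(tokens, tokens[1:]))

      def dice(pair, uni, bi):
          # Association "glue" between the two words of a bigram
          w1, w2 = pair
          return 2 * bi[pair] / (uni[w1] + uni[w2])

      def candidate_terms(tokens, min_glue=0.5):
          uni, bi = Counter(tokens), Counter(bigrams(tokens))
          return sorted((p for p in bi if dice(p, uni, bi) >= min_glue),
                        key=lambda p: -dice(p, uni, bi))

      text = "machine translation improves machine translation quality".split()
      print(candidate_terms(text))  # ('machine', 'translation') scores highest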
    Content
    A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  2. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.35
    0.35057482 = product of:
      0.6310347 = sum of:
        0.06152886 = product of:
          0.18458658 = sum of:
            0.18458658 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.18458658 = score(doc=562,freq=2.0), product of:
                0.32843533 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.038739666 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.18458658 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.18458658 = score(doc=562,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.18458658 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.18458658 = score(doc=562,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.18458658 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.18458658 = score(doc=562,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.01574607 = product of:
          0.03149214 = sum of:
            0.03149214 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.03149214 = score(doc=562,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.5555556 = coord(5/9)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  3. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.27
    0.2734616 = product of:
      0.6152886 = sum of:
        0.06152886 = product of:
          0.18458658 = sum of:
            0.18458658 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.18458658 = score(doc=862,freq=2.0), product of:
                0.32843533 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.038739666 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.18458658 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.18458658 = score(doc=862,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.18458658 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.18458658 = score(doc=862,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.18458658 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.18458658 = score(doc=862,freq=2.0), product of:
            0.32843533 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.038739666 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.44444445 = coord(4/9)
    
    Source
    https://arxiv.org/abs/2212.06721
  4. Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.08
    0.08375433 = product of:
      0.18844724 = sum of:
        0.08076138 = weight(_text_:line in 8524) [ClassicSimilarity], result of:
          0.08076138 = score(doc=8524,freq=2.0), product of:
            0.21724595 = queryWeight, product of:
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.038739666 = queryNorm
            0.37175092 = fieldWeight in 8524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.046875 = fieldNorm(doc=8524)
        0.013707667 = weight(_text_:information in 8524) [ClassicSimilarity], result of:
          0.013707667 = score(doc=8524,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.20156369 = fieldWeight in 8524, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=8524)
        0.023498412 = weight(_text_:retrieval in 8524) [ClassicSimilarity], result of:
          0.023498412 = score(doc=8524,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.20052543 = fieldWeight in 8524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=8524)
        0.07047977 = weight(_text_:techniques in 8524) [ClassicSimilarity], result of:
          0.07047977 = score(doc=8524,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4129904 = fieldWeight in 8524, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=8524)
      0.44444445 = coord(4/9)
    
    Abstract
    Examines how techniques in the field of natural language processing can be applied to the analysis of text in information retrieval. State-of-the-art text searching programs cannot distinguish, for example, between occurrences of the sickness AIDS and aids as a tool, or between library school and school library, nor can they equate terms such as online and on-line, which are variants of the same form. To make these distinctions, systems must incorporate knowledge about the meaning of words in context. Research in natural language processing has concentrated on the automatic 'understanding' of language: how to analyze the grammatical structure and meaning of text. Although many aspects of this research remain experimental, the article describes how these techniques can be used to recognize spelling variants, names, acronyms, and abbreviations.
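    A few lines are enough to show why purely orthographic normalization is both useful and insufficient, as the abstract argues; the rules below are our own illustration, not the authors' system:

      import re

      def normalize(term: str) -> str:
          # Collapse hyphen/space variants: "on-line" and "online" unify
          return re.sub(r"[-\s]+", "", term.lower())

      assert normalize("on-line") == normalize("online")  # desired
      assert normalize("AIDS") == normalize("aids")       # undesired:
      # case folding erases the sickness/tool distinction, which only
      # knowledge of word meaning in context can restore.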
    Imprint
    Medford, NJ : Learned Information
  5. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.05
    0.05461249 = product of:
      0.122878104 = sum of:
        0.067301154 = weight(_text_:line in 2541) [ClassicSimilarity], result of:
          0.067301154 = score(doc=2541,freq=2.0), product of:
            0.21724595 = queryWeight, product of:
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.038739666 = queryNorm
            0.30979243 = fieldWeight in 2541, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.009326885 = weight(_text_:information in 2541) [ClassicSimilarity], result of:
          0.009326885 = score(doc=2541,freq=4.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.13714671 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.027693143 = weight(_text_:retrieval in 2541) [ClassicSimilarity], result of:
          0.027693143 = score(doc=2541,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.23632148 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.01855692 = product of:
          0.03711384 = sum of:
            0.03711384 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
              0.03711384 = score(doc=2541,freq=4.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.27358043 = fieldWeight in 2541, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2541)
          0.5 = coord(1/2)
      0.44444445 = coord(4/9)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET . Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS) . The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
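    A toy version of the dictionary-lookup step behind such a suggestion system, with generic string similarity standing in for AZdict/ChemSpell's more elaborate matching (the vocabulary entries are invented):

      from difflib import get_close_matches

      VOCAB = ["acetaminophen", "benzene", "toxicology", "arsenic"]  # hypothetical

      def suggest(query: str, n: int = 3):
          # Return the dictionary words closest to a possibly misspelled query
          return get_close_matches(query.lower(), VOCAB, n=n, cutoff=0.6)

      print(suggest("acetominophen"))  # -> ['acetaminophen']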
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
  6. Bowker, L.: Information retrieval in translation memory systems : assessment of current limitations and possibilities for future development (2002) 0.05
    0.05264769 = product of:
      0.15794307 = sum of:
        0.018466292 = weight(_text_:information in 1854) [ClassicSimilarity], result of:
          0.018466292 = score(doc=1854,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.27153665 = fieldWeight in 1854, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1854)
        0.0387704 = weight(_text_:retrieval in 1854) [ClassicSimilarity], result of:
          0.0387704 = score(doc=1854,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.33085006 = fieldWeight in 1854, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1854)
        0.100706376 = weight(_text_:techniques in 1854) [ClassicSimilarity], result of:
          0.100706376 = score(doc=1854,freq=6.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.5901092 = fieldWeight in 1854, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1854)
      0.33333334 = coord(3/9)
    
    Abstract
    A translation memory system is a new type of human language technology (HLT) tool that is gaining popularity among translators. Such tools allow translators to store previously translated texts in a type of aligned bilingual database, and to recycle relevant parts of these texts when producing new translations. Currently, these tools retrieve information from the database using superficial character-string matching, which often results in poor precision and recall. This paper explains how translation memory systems work, and it considers some possible ways of introducing more sophisticated information retrieval techniques into such systems by taking syntactic and semantic similarity into account. Some of the suggested techniques are inspired by those used in other areas of HLT, and some by techniques used in information science.
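    The superficial character-string matching Bowker starts from fits in a few lines; her proposals would replace the similarity function with syntactically and semantically informed measures. A sketch with an invented two-segment memory:

      from difflib import SequenceMatcher

      TM = {  # hypothetical translation memory: source -> target
          "Press the start button.": "Appuyez sur le bouton de démarrage.",
          "Close the cover before printing.": "Fermez le capot avant d'imprimer.",
      }

      def best_match(segment: str, threshold: float = 0.7):
          # Character-level fuzzy match only; no syntax or semantics considered
          score, src = max((SequenceMatcher(None, segment, s).ratio(), s)
                           for s in TM)
          return (src, TM[src], score) if score >= threshold else None

      print(best_match("Press the stop button."))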
  7. Kettunen, K.: Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval : an overview (2009) 0.05
    0.052528843 = product of:
      0.15758653 = sum of:
        0.013707667 = weight(_text_:information in 2835) [ClassicSimilarity], result of:
          0.013707667 = score(doc=2835,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.20156369 = fieldWeight in 2835, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2835)
        0.057559118 = weight(_text_:retrieval in 2835) [ClassicSimilarity], result of:
          0.057559118 = score(doc=2835,freq=12.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.49118498 = fieldWeight in 2835, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2835)
        0.086319745 = weight(_text_:techniques in 2835) [ClassicSimilarity], result of:
          0.086319745 = score(doc=2835,freq=6.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.5058079 = fieldWeight in 2835, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=2835)
      0.33333334 = coord(3/9)
    
    Abstract
    Purpose - The purpose of this article is to discuss advantages and disadvantages of various means to manage morphological variation of keywords in monolingual information retrieval. Design/methodology/approach - The authors present a compilation of query results from 11 mostly European languages and a new general classification of the language-dependent techniques for management of morphological variation. Variants of the different techniques are compared in some detail in terms of retrieval effectiveness and other criteria. The paper consists mainly of an overview of different management methods for keyword variation in information retrieval. Typical retrieval results for the 11 languages and a new classification of keyword management methods are also presented. Findings - The main results of the paper are an overall comparison of reductive and generative keyword management methods in terms of retrieval effectiveness and other, broader criteria. Originality/value - The paper is of value to anyone who wants to get an overall picture of the keyword management techniques used in IR.
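    The reductive/generative contrast reduces to where the variant handling happens: a reductive method maps variants to one index key, while a generative method expands the query instead. A caricature with invented toy suffix rules (real systems use proper stemmers or morphological generators):

      SUFFIXES = ["ing", "ed", "es", "s"]  # toy rules only

      def reduce_keyword(word: str) -> str:
          # Reductive: strip a suffix so variants share one index entry
          for suf in SUFFIXES:
              if word.endswith(suf) and len(word) > len(suf) + 2:
                  return word[: -len(suf)]
          return word

      def generate_variants(word: str) -> list[str]:
          # Generative: the index stays untouched; the query is expanded
          return [word] + [word + suf for suf in SUFFIXES]

      print(reduce_keyword("searching"))  # -> 'search'
      print(generate_variants("search"))  # -> ['search', 'searching', ...]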
  8. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.05
    0.048032492 = product of:
      0.14409748 = sum of:
        0.011192262 = weight(_text_:information in 4394) [ClassicSimilarity], result of:
          0.011192262 = score(doc=4394,freq=4.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16457605 = fieldWeight in 4394, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4394)
        0.033231772 = weight(_text_:retrieval in 4394) [ClassicSimilarity], result of:
          0.033231772 = score(doc=4394,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.2835858 = fieldWeight in 4394, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4394)
        0.09967345 = weight(_text_:techniques in 4394) [ClassicSimilarity], result of:
          0.09967345 = score(doc=4394,freq=8.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.5840566 = fieldWeight in 4394, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=4394)
      0.33333334 = coord(3/9)
    
    Abstract
    Purpose - To propose a categorization of the different conflation procedures into the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and the use of syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FST), are emerging methods for the recognition and standardization of terms. Findings - The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
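    Conflation in its simplest form maps each term variant onto one canonical unit before indexing; the equivalence table below is a hand-made stand-in for the finite-state transducers discussed in the paper:

      # Hypothetical equivalence classes for one concept
      CONFLATION = {
          "information retrieval": "information retrieval",
          "retrieval of information": "information retrieval",
          "retrieving information": "information retrieval",
      }

      def conflate(term: str) -> str:
          # Uniterm and multiterm variants become one indexing unit
          return CONFLATION.get(term.lower(), term.lower())

      assert conflate("Retrieving information") == conflate("information retrieval")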
  9. Kreymer, O.: ¬An evaluation of help mechanisms in natural language information retrieval systems (2002) 0.05
    0.047987543 = product of:
      0.14396262 = sum of:
        0.020938806 = weight(_text_:information in 2557) [ClassicSimilarity], result of:
          0.020938806 = score(doc=2557,freq=14.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.3078936 = fieldWeight in 2557, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2557)
        0.05254405 = weight(_text_:retrieval in 2557) [ClassicSimilarity], result of:
          0.05254405 = score(doc=2557,freq=10.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.44838852 = fieldWeight in 2557, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2557)
        0.07047977 = weight(_text_:techniques in 2557) [ClassicSimilarity], result of:
          0.07047977 = score(doc=2557,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4129904 = fieldWeight in 2557, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=2557)
      0.33333334 = coord(3/9)
    
    Abstract
    The field of natural language processing (NLP) demonstrates rapid changes in the design of information retrieval systems and human-computer interaction. While natural language is being looked on as the most effective tool for information retrieval in a contemporary information environment, the systems using it are only beginning to emerge. This study attempts to evaluate the current state of NLP information retrieval systems from the user's point of view: what techniques are used by these systems to guide their users through the search process? The analysis focused on the structure and components of the systems' help mechanisms. Results of the study demonstrated that systems which claimed to be using natural language searching in fact used a wide range of information retrieval techniques from real natural language processing to Boolean searching. As a result, the user assistance mechanisms of these systems also varied. While pseudo-NLP systems would suit a more traditional method of instruction, real NLP systems primarily utilised the methods of explanation and user-system dialogue.
    Source
    Online information review. 26(2002) no.1, S.30-39
  10. Robertson, S.E.; Sparck Jones, K.: Relevance weighting of search terms (1976) 0.05
    0.04786038 = product of:
      0.14358114 = sum of:
        0.018276889 = weight(_text_:information in 71) [ClassicSimilarity], result of:
          0.018276889 = score(doc=71,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.2687516 = fieldWeight in 71, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=71)
        0.031331215 = weight(_text_:retrieval in 71) [ClassicSimilarity], result of:
          0.031331215 = score(doc=71,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.26736724 = fieldWeight in 71, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=71)
        0.09397303 = weight(_text_:techniques in 71) [ClassicSimilarity], result of:
          0.09397303 = score(doc=71,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.5506539 = fieldWeight in 71, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0625 = fieldNorm(doc=71)
      0.33333334 = coord(3/9)
    
    Abstract
    Examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections
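    The weight at the heart of the paper is usually quoted in its 0.5-corrected form: with N documents of which n contain the term, and R known relevant documents of which r contain the term, w = log[(r+0.5)(N-n-R+r+0.5) / ((n-r+0.5)(R-r+0.5))]. A direct transcription (the example counts are invented):

      import math

      def rsj_weight(N: int, n: int, R: int, r: int) -> float:
          # Robertson/Sparck Jones relevance weight, 0.5-corrected
          return math.log((r + 0.5) * (N - n - R + r + 0.5)
                          / ((n - r + 0.5) * (R - r + 0.5)))

      # 1000 docs, term in 50; 10 known relevant, 8 of them contain the term
      print(rsj_weight(N=1000, n=50, R=10, r=8))  # ~4.33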
    Source
    Journal of the American Society for Information Science. 27(1976), S.129-146
  11. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.05
    0.04728058 = product of:
      0.14184174 = sum of:
        0.017696522 = weight(_text_:information in 2502) [ClassicSimilarity], result of:
          0.017696522 = score(doc=2502,freq=10.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.2602176 = fieldWeight in 2502, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
        0.0743085 = weight(_text_:retrieval in 2502) [ClassicSimilarity], result of:
          0.0743085 = score(doc=2502,freq=20.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.63411707 = fieldWeight in 2502, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
        0.049836725 = weight(_text_:techniques in 2502) [ClassicSimilarity], result of:
          0.049836725 = score(doc=2502,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 2502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.33333334 = coord(3/9)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
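    The data fusion approaches analyzed in studies like this one are typically CombSUM/CombMNZ variants; a minimal CombMNZ over min-max-normalized runs looks as follows (the two result lists are invented):

      def comb_mnz(runs):
          # Sum of normalized scores, times the number of runs retrieving the doc
          fused = {}
          for run in runs:
              lo, hi = min(run.values()), max(run.values())
              for doc, s in run.items():
                  norm = (s - lo) / (hi - lo) if hi > lo else 1.0
                  fused[doc] = fused.get(doc, 0.0) + norm
          return {d: v * sum(d in run for run in runs) for d, v in fused.items()}

      run_a = {"d1": 12.0, "d2": 7.5, "d3": 3.1}  # e.g. vector-space scores
      run_b = {"d2": 0.91, "d4": 0.40}            # e.g. language-model scores
      print(sorted(comb_mnz([run_a, run_b]).items(), key=lambda x: -x[1]))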
    Source
    Journal of the American Society for Information Science and Technology. 55(2004) no.10, S.859-868
  12. Salton, G.: Automatic processing of foreign language documents (1985) 0.04
    0.044460833 = product of:
      0.1333825 = sum of:
        0.010552166 = weight(_text_:information in 3650) [ClassicSimilarity], result of:
          0.010552166 = score(doc=3650,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.1551638 = fieldWeight in 3650, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
        0.0414473 = weight(_text_:retrieval in 3650) [ClassicSimilarity], result of:
          0.0414473 = score(doc=3650,freq=14.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.3536936 = fieldWeight in 3650, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
        0.08138303 = weight(_text_:techniques in 3650) [ClassicSimilarity], result of:
          0.08138303 = score(doc=3650,freq=12.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.47688022 = fieldWeight in 3650, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
      0.33333334 = coord(3/9)
    
    Abstract
    The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as well, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various document-request matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevance order. During the 1970s, Salton and his students went on to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is on the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English-language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German-language documents against English-language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as: What is meant by retrieval language compatibility? How is it to be achieved, and how is it to be evaluated?
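    The ranked relevance-order output the selection emphasizes is, in modern vocabulary, cosine similarity over weighted term vectors; a compressed sketch of that matching step (our wording of the idea, not SMART's actual code):

      import math
      from collections import Counter

      def tfidf(docs):
          n, df = len(docs), Counter(t for d in docs for t in set(d))
          return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
                  for d in docs]

      def cosine(a, b):
          dot = sum(w * b.get(t, 0.0) for t, w in a.items())
          na = math.sqrt(sum(w * w for w in a.values()))
          nb = math.sqrt(sum(w * w for w in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      docs = [["thesaurus", "maps", "terms", "to", "concepts"],
              ["stems", "and", "affixes", "improve", "recall"],
              ["concept", "thesaurus", "for", "retrieval"]]
      vecs = tfidf(docs)
      query = {"thesaurus": 1.0, "concepts": 1.0}
      print(sorted(range(len(docs)), key=lambda i: -cosine(query, vecs[i])))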
    Footnote
    Original in: Journal of the American Society for Information Science 21(1970) no.3, S.187-194.
  13. Aizawa, A.; Kohlhase, M.: Mathematical information retrieval (2021) 0.04
    0.043812923 = product of:
      0.13143876 = sum of:
        0.018466292 = weight(_text_:information in 667) [ClassicSimilarity], result of:
          0.018466292 = score(doc=667,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.27153665 = fieldWeight in 667, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=667)
        0.054829627 = weight(_text_:retrieval in 667) [ClassicSimilarity], result of:
          0.054829627 = score(doc=667,freq=8.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.46789268 = fieldWeight in 667, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=667)
        0.05814285 = weight(_text_:techniques in 667) [ClassicSimilarity], result of:
          0.05814285 = score(doc=667,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.3406997 = fieldWeight in 667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=667)
      0.33333334 = coord(3/9)
    
    Abstract
    We present an overview of the NTCIR Math Tasks organized during NTCIR-10, 11, and 12. These tasks are primarily dedicated to techniques for searching mathematical content with formula expressions. In this chapter, we first summarize the task design and introduce test collections generated in the tasks. We also describe the features and main challenges of mathematical information retrieval systems and discuss future perspectives in the field.
    Series
    ¬The Information retrieval series, vol 43
    Source
    Evaluating information retrieval and access tasks. Eds.: Sakai, T., Oard, D., Kando, N. [https://doi.org/10.1007/978-981-15-5554-1_12]
  14. Wright, L.W.; Nardini, H.K.G.; Aronson, A.R.; Rindflesch, T.C.: Hierarchical concept indexing of full-text documents in the Unified Medical Language System Information sources Map (1999) 0.04
    0.04155012 = product of:
      0.12465035 = sum of:
        0.020938806 = weight(_text_:information in 2111) [ClassicSimilarity], result of:
          0.020938806 = score(doc=2111,freq=14.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.3078936 = fieldWeight in 2111, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2111)
        0.033231772 = weight(_text_:retrieval in 2111) [ClassicSimilarity], result of:
          0.033231772 = score(doc=2111,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.2835858 = fieldWeight in 2111, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2111)
        0.07047977 = weight(_text_:techniques in 2111) [ClassicSimilarity], result of:
          0.07047977 = score(doc=2111,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4129904 = fieldWeight in 2111, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=2111)
      0.33333334 = coord(3/9)
    
    Abstract
    Full-text documents are a vital and rapidly growing part of online biomedical information. A single large document can contain as much information as a small database, but normally lacks the tight structure and consistent indexing of a database. Retrieval systems will often miss highly relevant parts of a document if the document as a whole appears irrelevant. Access to full-text information is further complicated by the need to search separately many disparate information resources. This research explores how these problems can be addressed by the combined use of 2 techniques: 1) natural language processing for automatic concept-based indexing of full text, and 2) methods for exploiting the structure and hierarchy of full-text documents. We describe methods for applying these techniques to a large collection of full-text documents drawn from the Health Services / Technology Assessment Text (HSTAT) database at the NLM and examine how this hierarchical concept indexing can assist both document- and source-level retrieval in the context of NLM's Information Source Map project
    Source
    Journal of the American Society for Information Science. 50(1999) no.6, S.514-523
  15. Mustafa el Hadi, W.: Terminology & information retrieval : new tools for new needs. Integration of knowledge across boundaries (2003) 0.04
    0.041175276 = product of:
      0.12352583 = sum of:
        0.013707667 = weight(_text_:information in 2688) [ClassicSimilarity], result of:
          0.013707667 = score(doc=2688,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.20156369 = fieldWeight in 2688, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2688)
        0.023498412 = weight(_text_:retrieval in 2688) [ClassicSimilarity], result of:
          0.023498412 = score(doc=2688,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.20052543 = fieldWeight in 2688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2688)
        0.086319745 = weight(_text_:techniques in 2688) [ClassicSimilarity], result of:
          0.086319745 = score(doc=2688,freq=6.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.5058079 = fieldWeight in 2688, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=2688)
      0.33333334 = coord(3/9)
    
    Abstract
    The radical changes in information and communication techniques at the end of the 20th century have significantly modified the function of terminology and its applications in all forms of communication. The introduction of new media has profoundly changed the possibilities for distributing scientific information. In this situation, what is the role of terminology and its practical applications? What is the place for the multiple functions of terminology in the communication society? What is the impact of natural language processing (NLP) techniques used in its processing and management? In this article we focus on the possibilities NLP techniques offer and how they can be directed towards satisfying these newly expressed needs.
  16. Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.04
    0.040735207 = product of:
      0.12220562 = sum of:
        0.009233146 = weight(_text_:information in 1851) [ClassicSimilarity], result of:
          0.009233146 = score(doc=1851,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.13576832 = fieldWeight in 1851, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1851)
        0.054829627 = weight(_text_:retrieval in 1851) [ClassicSimilarity], result of:
          0.054829627 = score(doc=1851,freq=8.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.46789268 = fieldWeight in 1851, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1851)
        0.05814285 = weight(_text_:techniques in 1851) [ClassicSimilarity], result of:
          0.05814285 = score(doc=1851,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.3406997 = fieldWeight in 1851, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1851)
      0.33333334 = coord(3/9)
    
    Abstract
    This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion methods for Chinese text retrieval are identified. The collaboration of East Asian countries on the construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. In addition, a tool was designed to help assessors judge relevance and to record relevance-judgment events. The log file created by this tool will be used to analyze the behavior of assessors in the future.
  17. Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.04
    0.04072581 = product of:
      0.12217742 = sum of:
        0.011423056 = weight(_text_:information in 897) [ClassicSimilarity], result of:
          0.011423056 = score(doc=897,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16796975 = fieldWeight in 897, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=897)
        0.027693143 = weight(_text_:retrieval in 897) [ClassicSimilarity], result of:
          0.027693143 = score(doc=897,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.23632148 = fieldWeight in 897, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=897)
        0.08306122 = weight(_text_:techniques in 897) [ClassicSimilarity], result of:
          0.08306122 = score(doc=897,freq=8.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4867139 = fieldWeight in 897, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=897)
      0.33333334 = coord(3/9)
    
    Abstract
    Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for resolving this translation ambiguity using only the target document collection. First, we discuss two kinds of disambiguation technique: (1) a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback. Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed using English as a pivot language. The experiments showed that a variation of the term co-occurrence based techniques, in which the best-sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparably high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verify our findings. Again, similar results were observed, except that the Dice coefficient slightly outperforms the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.
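    The term co-occurrence idea reduces to: for each source term, pick the translation candidate that co-occurs most strongly in the target collection with the candidates of the other query terms. A sketch scored with the Dice coefficient (the postings are invented):

      from itertools import product

      POSTINGS = {  # hypothetical candidate -> target-collection doc ids
          "banca": {1, 2, 3, 7}, "riva": {8, 9},        # for "bank"
          "interesse": {1, 2, 7}, "beneficio": {4, 5},  # for "interest"
      }

      def dice(a, b):
          return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

      def best_pair(cands1, cands2):
          # Keep the pairing with maximal co-occurrence in the collection
          return max(product(cands1, cands2),
                     key=lambda p: dice(POSTINGS[p[0]], POSTINGS[p[1]]))

      print(best_pair(["banca", "riva"], ["interesse", "beneficio"]))
      # -> ('banca', 'interesse')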
    Source
    Information processing and management. 43(2007) no.1, S.103-120
  18. Mustafa el Hadi, W.: Dynamics of the linguistic paradigm in information retrieval (2000) 0.04
    0.0398466 = product of:
      0.1195398 = sum of:
        0.01582825 = weight(_text_:information in 151) [ClassicSimilarity], result of:
          0.01582825 = score(doc=151,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.23274569 = fieldWeight in 151, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=151)
        0.033231772 = weight(_text_:retrieval in 151) [ClassicSimilarity], result of:
          0.033231772 = score(doc=151,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.2835858 = fieldWeight in 151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=151)
        0.07047977 = weight(_text_:techniques in 151) [ClassicSimilarity], result of:
          0.07047977 = score(doc=151,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4129904 = fieldWeight in 151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=151)
      0.33333334 = coord(3/9)
    
    Abstract
    In this paper we briefly sketch the dynamics of the linguistic paradigm in Information Retrieval (IR) and its adaptation to the Internet. The emergence of Natural Language Processing (NLP) techniques has been a major factor leading to this adaptation. These techniques and tools aim to meet current needs, i.e., retrieving information from documents written and indexed in a foreign language by using a native-language query to express the information need. This process, known as cross-language IR (CLIR), lies at the crossroads of Machine Translation and IR. The field represents a real challenge to the IR community and will require solid cooperation with the NLP community.
  19. Mustafa el Hadi, W.: Human language technology and its role in information access and management (2003) 0.04
    0.03958424 = product of:
      0.11875272 = sum of:
        0.02085555 = weight(_text_:information in 5524) [ClassicSimilarity], result of:
          0.02085555 = score(doc=5524,freq=20.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.30666938 = fieldWeight in 5524, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
        0.03916402 = weight(_text_:retrieval in 5524) [ClassicSimilarity], result of:
          0.03916402 = score(doc=5524,freq=8.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.33420905 = fieldWeight in 5524, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
        0.058733147 = weight(_text_:techniques in 5524) [ClassicSimilarity], result of:
          0.058733147 = score(doc=5524,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.34415868 = fieldWeight in 5524, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
      0.33333334 = coord(3/9)
    
    Abstract
    The role of linguistics in information access, extraction and dissemination is essential. Radical changes in the techniques of information and communication at the end of the twentieth century have had a significant effect on the function of the linguistic paradigm and its applications in all forms of communication. The introduction of new technical means has profoundly changed the possibilities for the distribution of information. In this situation, what is the role of the linguistic paradigm and its practical applications, i.e., natural language processing (NLP) techniques, when applied to information access? What solutions can linguistics offer for human-computer interaction, extraction and management? Many fields show the relevance of the linguistic paradigm through the various technologies that require NLP, such as document and message understanding, information detection, extraction and retrieval, question answering, cross-language information retrieval (CLIR), text summarization, filtering, and spoken document retrieval. This paper focuses on the central role of human language technologies in the information society, surveys the current situation, describes the benefits of the above-mentioned applications, outlines successes and challenges, and discusses solutions. It reviews the resources and means needed to advance information access and dissemination across language boundaries in the twenty-first century. Multilingualism, a natural result of globalization, requires greater effort in the direction of language technology. The scope of human language technology (HLT) is large, so we limit our review to applications that involve multilinguality.
    Content
    Contribution to a special issue "Knowledge organization and classification in international information retrieval"
  20. Ponte, J.M.: Language models for relevance feedback (2000) 0.04
    0.03940301 = product of:
      0.11820903 = sum of:
        0.01582825 = weight(_text_:information in 35) [ClassicSimilarity], result of:
          0.01582825 = score(doc=35,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.23274569 = fieldWeight in 35, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=35)
        0.05254405 = weight(_text_:retrieval in 35) [ClassicSimilarity], result of:
          0.05254405 = score(doc=35,freq=10.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.44838852 = fieldWeight in 35, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=35)
        0.049836725 = weight(_text_:techniques in 35) [ClassicSimilarity], result of:
          0.049836725 = score(doc=35,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 35, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=35)
      0.33333334 = coord(3/9)
    
    Abstract
    The language modeling approach to Information Retrieval (IR) is a conceptually simple model of IR originally developed by Ponte and Croft (1998). In this approach, the query is treated as a random event, and documents are ranked according to the likelihood that the query would be generated by a language model estimated for each document. The intuition behind this approach is that users have a prototypical document in mind and choose query terms accordingly. Its appeal is that no inferences about the semantic content of documents need to be made, which keeps the model conceptually simple. In this paper, techniques for relevance feedback and routing are derived from the language modeling approach in a straightforward manner, and their effectiveness is demonstrated empirically. These experiments provide further proof of concept for the language modeling approach to retrieval.
    Series
    The Kluwer international series on information retrieval; 7
    Source
    Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft
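    The query-likelihood ranking at the core of this approach can be sketched briefly. The smoothing choice below (Jelinek-Mercer interpolation with the collection model) and the lambda value are illustrative assumptions rather than the paper's exact setup; the function and variable names are ours.

    from collections import Counter
    from math import log

    def query_likelihood(query, doc, collection, coll_len, lam=0.5):
        # log P(query | document model), linearly interpolated with the
        # collection model so unseen query terms do not zero out the score
        tf = Counter(doc)
        dlen = len(doc)
        s = 0.0
        for term in query:
            p_doc = tf[term] / dlen if dlen else 0.0
            p_coll = collection[term] / coll_len if coll_len else 0.0
            p = lam * p_doc + (1.0 - lam) * p_coll
            if p > 0.0:
                s += log(p)  # terms unseen in the whole collection add nothing
        return s

    # toy example: rank two documents for a two-term query
    docs = [["language", "model", "retrieval", "model"],
            ["relevance", "feedback", "retrieval"]]
    collection = Counter(t for d in docs for t in d)
    coll_len = sum(len(d) for d in docs)
    ranked = sorted(range(len(docs)), reverse=True,
                    key=lambda i: query_likelihood(["language", "model"],
                                                   docs[i], collection, coll_len))
    print(ranked)  # -> [0, 1]: document 0 better "generates" the query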

Types

  • a 419
  • m 40
  • el 30
  • s 20
  • x 11
  • p 3
  • d 2
  • b 1
  • r 1
