Search (110 results, page 1 of 6)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10

```
0.104134515 = sum of:
  0.08291535 = product of:
    0.24874605 = sum of:
      0.24874605 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.24874605 = score(doc=562,freq=2.0), product of:
          0.44259444 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.052204985 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.021219164 = product of:
    0.04243833 = sum of:
      0.04243833 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.04243833 = score(doc=562,freq=2.0), product of:
          0.18281296 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052204985 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)
```

Content: See: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
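The explain trees in this listing use Lucene's ClassicSimilarity (TF-IDF) scoring. Each matching term contributes queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq); the printed idf values are reproduced by idf = 1 + ln(maxDocs / (docFreq + 1)). Each enclosing Boolean level then multiplies by coord(matched/total). The Python sketch below recomputes the score of the first result (doc 562) from those factors. It is an illustration, not Lucene code: queryNorm and fieldNorm are copied from the explain output rather than derived, because they depend on the whole query and on the indexed field length.

```python
import math

QUERY_NORM = 0.052204985  # copied from the explain output, not recomputed
MAX_DOCS = 44218          # maxDocs as printed in the idf lines


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)); reproduces the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the term frequency."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, field_norm: float,
               query_norm: float = QUERY_NORM) -> float:
    """weight(term) = queryWeight * fieldWeight, as in each explain subtree."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of a Boolean query's clauses that matched."""
    return matched / total


# Doc 562: "3a" matched 1 of 3 clauses; "22" matched 1 of 2 clauses.
clause_3a = term_score(2.0, 24, 0.046875) * coord(1, 3)
clause_22 = term_score(2.0, 3622, 0.046875) * coord(1, 2)
doc_562 = clause_3a + clause_22
print(f"doc 562: {doc_562:.9f}")
```

The same helpers reproduce single-term results nested two Boolean levels deep, such as doc 3908, where the term weight is multiplied by coord(1/2) twice.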

Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.04
```
0.043666452 = product of:
  0.087332904 = sum of:
    0.087332904 = sum of:
      0.037318856 = weight(_text_:retrieval in 2541) [ClassicSimilarity], result of:
        0.037318856 = score(doc=2541,freq=4.0), product of:
          0.15791564 = queryWeight, product of:
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.052204985 = queryNorm
          0.23632148 = fieldWeight in 2541, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2541)
      0.05001405 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
        0.05001405 = score(doc=2541,freq=4.0), product of:
          0.18281296 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052204985 = queryNorm
          0.27358043 = fieldWeight in 2541, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2541)
  0.5 = coord(1/2)
```
Abstract

The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system to increase recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aids, natural language processing, and information storage and retrieval. The design of the technology allows complex applications to be built in a modular fashion from the components developed in earlier phases of the work, without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have online or Web-based interfaces, the dictionaries and other computer components must respond quickly and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aids, part-of-speech tagging, paraphrasing, and many other natural language processing functions.

Date

14. 8.2004 17:22:56

Source

Online. 28(2004) no.3, pp.22-29
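The abstract does not say how ChemSpell measures similarity between words. As a hedged illustration only, the sketch below ranks vocabulary entries by Levenshtein edit distance to a misspelled query term, one common basis for "did you mean" suggestions. The functions `levenshtein` and `suggest`, the distance cutoff, and the toy vocabulary are hypothetical and are not taken from AZdict.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(term: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Vocabulary words within max_distance edits, closest first (ties alphabetical)."""
    term = term.lower()
    scored = sorted((levenshtein(term, word), word) for word in vocabulary)
    return [word for dist, word in scored if dist <= max_distance]


# Hypothetical toy vocabulary standing in for a TOXNET-style word list
vocab = ["benzene", "toluene", "xylene", "acetone", "benzine"]
print(suggest("benzen", vocab))
```

A production spelling aid would typically avoid comparing against every vocabulary entry, for example by pre-filtering candidates with an n-gram index; the linear scan here only keeps the example short.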

Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.04

```
0.037052214 = product of:
  0.07410443 = sum of:
    0.07410443 = sum of:
      0.0316661 = weight(_text_:retrieval in 4436) [ClassicSimilarity], result of:
        0.0316661 = score(doc=4436,freq=2.0), product of:
          0.15791564 = queryWeight, product of:
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.052204985 = queryNorm
          0.20052543 = fieldWeight in 4436, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.046875 = fieldNorm(doc=4436)
      0.04243833 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
        0.04243833 = score(doc=4436,freq=2.0), product of:
          0.18281296 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052204985 = queryNorm
          0.23214069 = fieldWeight in 4436, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=4436)
  0.5 = coord(1/2)
```

Abstract: The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between speed performance and translation performance, and in what form the translated result is presented. About 100,000 Web pages translated in the last four months of 1997 are used for a quantitative study of online and real-time Web page translation.
Date: 16. 2.2000 14:22:39
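The abstract states that a bilingual dictionary and a monolingual corpus are used together to choose among candidate translations, but not how. One simple scheme, assumed here for illustration and not taken from MTIR, picks for each query term the dictionary candidate that co-occurs most often in a target-language corpus with the candidates of the other query terms. The function `select_translations`, the dictionary entries, and the co-occurrence counts below are hypothetical.

```python
def select_translations(query, dictionary, cooccurrence):
    """Choose one translation per query term using target-corpus co-occurrence.

    query        -- list of source-language terms
    dictionary   -- source term -> list of candidate translations (bilingual dictionary)
    cooccurrence -- frozenset({a, b}) -> co-occurrence count in a monolingual corpus
    """
    chosen = {}
    for term in query:
        candidates = dictionary.get(term, [])
        if not candidates:
            continue  # no dictionary entry, e.g. a proper name needing transliteration
        # Candidate translations of all the other query terms
        context = [c for other in query if other != term
                   for c in dictionary.get(other, [])]

        def support(candidate):
            return sum(cooccurrence.get(frozenset((candidate, c)), 0) for c in context)

        # max() returns the first candidate in dictionary order when scores tie
        chosen[term] = max(candidates, key=support)
    return chosen


# Hypothetical dictionary and co-occurrence counts (invented for illustration)
dictionary = {"bank": ["河岸", "银行"], "interest": ["兴趣", "利息"]}
cooccurrence = {
    frozenset(("银行", "利息")): 120,
    frozenset(("银行", "兴趣")): 10,
    frozenset(("河岸", "兴趣")): 3,
    frozenset(("河岸", "利息")): 1,
}
print(select_translations(["bank", "interest"], dictionary, cooccurrence))
```

With these counts, "bank" resolves to 银行 (financial bank) rather than 河岸 (river bank), because 银行 co-occurs strongly with 利息 (interest on money).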

Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.03
```
0.028374974 = product of:
  0.056749947 = sum of:
    0.056749947 = sum of:
      0.031994257 = weight(_text_:retrieval in 1616) [ClassicSimilarity], result of:
        0.031994257 = score(doc=1616,freq=6.0), product of:
          0.15791564 = queryWeight, product of:
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.052204985 = queryNorm
          0.20260347 = fieldWeight in 1616, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.02734375 = fieldNorm(doc=1616)
      0.02475569 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
        0.02475569 = score(doc=1616,freq=2.0), product of:
          0.18281296 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052204985 = queryNorm
          0.1354154 = fieldWeight in 1616, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.02734375 = fieldNorm(doc=1616)
  0.5 = coord(1/2)
```
Abstract

The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second-largest at-home Internet population in the world in 2002 (the US's Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy needs in the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.).
However, research in crossing language boundaries, especially between European and Oriental languages, is still in its initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult a thesaurus to identify other relevant vocabulary. For searching across language boundaries, a cross-lingual thesaurus generated by co-occurrence analysis and a Hopfield network can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in the courts and the government. In this paper, we develop an automatic thesaurus with a Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language other than that of the input term. The direct translation of the input term can also be retrieved in most cases.

Footnote

Part of a special issue: "Web retrieval and mining: A machine learning perspective"
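The abstract names co-occurrence analysis and a Hopfield network but gives no parameters. The sketch below is a generic spreading-activation routine in that style, under stated assumptions: association weights (which in the paper come from co-occurrence analysis over the parallel corpus) are supplied directly as a hypothetical dict, input terms are clamped at activation 1.0, every other term is updated through a sigmoid until the total change falls below a tolerance, and terms above a threshold are returned as related terms. The function name `hopfield_expand` and the values of `theta`, `slope`, and `threshold` are illustrative, not the authors'.

```python
import math


def hopfield_expand(weights, seeds, theta=0.5, slope=0.1, eps=1e-4,
                    max_iter=100, threshold=0.6):
    """Spread activation from seed terms over a term-association network.

    weights -- dict: term -> dict(neighbour -> association weight in [0, 1])
    seeds   -- set of input terms, clamped at activation 1.0
    Returns non-seed terms whose converged activation exceeds `threshold`,
    strongest first.
    """
    terms = set(weights) | {n for nbrs in weights.values() for n in nbrs}
    act = {t: (1.0 if t in seeds else 0.0) for t in terms}
    for _ in range(max_iter):
        new = {}
        for t in terms:
            if t in seeds:
                new[t] = 1.0
                continue
            net = sum(act[s] * nbrs.get(t, 0.0) for s, nbrs in weights.items())
            # Sigmoid transfer function with bias theta and slope
            new[t] = 1.0 / (1.0 + math.exp(-(net - theta) / slope))
        change = sum(abs(new[t] - act[t]) for t in terms)
        act = new
        if change < eps:
            break
    hits = [(a, t) for t, a in act.items() if t not in seeds and a > threshold]
    return [t for a, t in sorted(hits, reverse=True)]


# Hypothetical English/Chinese association weights
weights = {"court": {"法院": 0.9, "律师": 0.3}, "法院": {"法官": 0.8}}
print(hopfield_expand(weights, {"court"}))
```

With these toy weights, activation flows from the English seed "court" to 法院 (court) and, in a second step, to 法官 (judge), while the weakly linked 律师 (lawyer) stays below the threshold.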

Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 0.03

```
0.0261232 = product of:
  0.0522464 = sum of:
    0.0522464 = product of:
      0.1044928 = sum of:
        0.1044928 = weight(_text_:retrieval in 3908) [ClassicSimilarity], result of:
          0.1044928 = score(doc=3908,freq=4.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.6617001 = fieldWeight in 3908, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=3908)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: Information retrieval. 4(2001), pp.209-230

Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.03
```
0.02503425 = product of:
  0.0500685 = sum of:
    0.0500685 = product of:
      0.100137 = sum of:
        0.100137 = weight(_text_:retrieval in 2502) [ClassicSimilarity], result of:
          0.100137 = score(doc=2502,freq=20.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.63411707 = fieldWeight in 2502, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Prior efforts have shown that under certain conditions retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that fusing highly effective retrieval strategies alone yields little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches and demonstrate why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
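The abstract does not name the fusion methods it analyzes. For orientation only, the sketch below implements CombMNZ, a widely used fusion rule (assumed here, not necessarily one of the paper's): each run's scores are min-max normalized, a document's normalized scores are summed across runs, and the sum is multiplied by the number of runs that retrieved it. The runs and the helper names are hypothetical.

```python
def min_max_normalize(run):
    """Scale one run's scores (doc -> score) to [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_mnz(runs):
    """CombMNZ: sum of normalized scores times the number of runs containing the doc.

    Returns document ids, best first (ties broken by id).
    """
    totals, hits = {}, {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            totals[doc] = totals.get(doc, 0.0) + score
            hits[doc] = hits.get(doc, 0) + 1
    fused = {doc: totals[doc] * hits[doc] for doc in totals}
    return sorted(fused, key=lambda d: (-fused[d], d))


# Two hypothetical runs, e.g. two retrieval strategies in the same system
run_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
run_b = {"d2": 0.8, "d4": 0.7, "d1": 0.2}
print(comb_mnz([run_a, run_b]))
```

Normalizing first matters because the two runs score on different scales; without it, run_a's raw scores would dominate the sum.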

Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02

```
0.021219164 = product of:
  0.04243833 = sum of:
    0.04243833 = product of:
      0.08487666 = sum of:
        0.08487666 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
          0.08487666 = score(doc=4888,freq=2.0), product of:
            0.18281296 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052204985 = queryNorm
            0.46428138 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 1. 3.2013 14:56:22

Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02

```
0.021219164 = product of:
  0.04243833 = sum of:
    0.04243833 = product of:
      0.08487666 = sum of:
        0.08487666 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
          0.08487666 = score(doc=5429,freq=2.0), product of:
            0.18281296 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052204985 = queryNorm
            0.46428138 = fieldWeight in 5429, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5429)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: c't. 2000, no.22, pp.230-231

Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.02
```
0.019391447 = product of:
  0.038782895 = sum of:
    0.038782895 = product of:
      0.07756579 = sum of:
        0.07756579 = weight(_text_:retrieval in 6386) [ClassicSimilarity], result of:
          0.07756579 = score(doc=6386,freq=12.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.49118498 = fieldWeight in 6386, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=6386)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Retrieval tests are the most widely accepted method for justifying new subject indexing methods over traditional ones. As part of a diploma thesis, two fundamentally different systems for automatic subject indexing were tested and evaluated on the press database of the publishing house Gruner + Jahr (G+J). The study compared natural-language retrieval with Boolean retrieval. The two systems are Autonomy, from Autonomy Inc., and DocCat, which IBM adapted to the database structure of the G+J press database. The former is a probabilistic system based on natural-language retrieval. DocCat, by contrast, is based on Boolean retrieval and is a learning system that indexes on the basis of an intellectually (manually) created training template. Methodologically, the evaluation starts from the real application context of G+J's text documentation department. The tests are assessed from both statistical and qualitative points of view. One result is that DocCat shows some shortcomings compared with intellectual subject indexing that still need to be remedied, while Autonomy's natural-language retrieval cannot be used as is in this setting and for the specific requirements of G+J's text documentation.

Kettunen, K.: Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval : an overview (2009) 0.02
```
0.019391447 = product of:
  0.038782895 = sum of:
    0.038782895 = product of:
      0.07756579 = sum of:
        0.07756579 = weight(_text_:retrieval in 2835) [ClassicSimilarity], result of:
          0.07756579 = score(doc=2835,freq=12.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.49118498 = fieldWeight in 2835, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2835)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Purpose - The purpose of this article is to discuss the advantages and disadvantages of various means of managing morphological variation of keywords in monolingual information retrieval. Design/methodology/approach - The authors present a compilation of query results from 11 mostly European languages and a new general classification of the language-dependent techniques for managing morphological variation. Variants of the different techniques are compared in some detail in terms of retrieval effectiveness and other criteria. The paper consists mainly of an overview of different management methods for keyword variation in information retrieval. Typical IR results for 11 languages and a new classification of keyword management methods are also presented. Findings - The main result of the paper is an overall comparison of reductive and generative keyword management methods in terms of retrieval effectiveness and other, broader criteria. Originality/value - The paper is of value to anyone who wants to get an overall picture of keyword management techniques used in IR.

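To make the reductive/generative distinction concrete: a reductive method maps both index and query words to a common reduced form (a stem or lemma), while a generative method leaves the index untouched and expands the query keyword into its inflected variants. The Python sketch below contrasts the two with a deliberately tiny, hypothetical English suffix list; real systems would use a proper stemmer, lemmatizer, or morphological generator for the language at hand, and the function names are illustrative.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # toy suffix list (assumption)


def reduce_form(word: str) -> str:
    """Reductive: strip the first matching suffix, keeping a stem of >= 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def generate_forms(keyword: str) -> set[str]:
    """Generative: expand a base-form keyword into (possibly spurious) variants."""
    return {keyword} | {keyword + suffix for suffix in SUFFIXES}


def match_reductive(keyword: str, doc_tokens: list[str]) -> list[str]:
    """Reduce both the keyword and every document token, then compare."""
    stem = reduce_form(keyword)
    return [t for t in doc_tokens if reduce_form(t) == stem]


def match_generative(keyword: str, doc_tokens: list[str]) -> list[str]:
    """Expand only the keyword; document tokens are matched unchanged."""
    forms = generate_forms(keyword)
    return [t for t in doc_tokens if t in forms]


doc = ["connected", "networks", "connecting", "connection"]
print(match_reductive("connect", doc))
print(match_generative("connect", doc))
```

Here both approaches match "connected" and "connecting" but miss "connection", since "-ion" is not in the toy suffix list; the generator also produces the non-word "connectes", which is harmless for matching but shows that generation can overproduce.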
Blair, D.C.: Information retrieval and the philosophy of language (2002) 0.02
```
0.019028958 = product of:
  0.038057916 = sum of:
    0.038057916 = product of:
      0.07611583 = sum of:
        0.07611583 = weight(_text_:retrieval in 4283) [ClassicSimilarity], result of:
          0.07611583 = score(doc=4283,freq=26.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.48200315 = fieldWeight in 4283, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=4283)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Information retrieval - the retrieval, primarily, of documents or textual material - is fundamentally a linguistic process. At the very least we must describe what we want and match that description with descriptions of the information that is available to us. Furthermore, when we describe what we want, we must mean something by that description. This is a deceptively simple act, but such linguistic events have been the grist for philosophical analysis since Aristotle. Although there are complexities involved in referring to authors, document types, or other categories of information retrieval context, here I wish to focus on one of the most problematic activities in information retrieval: the description of the intellectual content of information items. And even though I take information retrieval to involve the description and retrieval of written text, what I say here is applicable to any information item whose intellectual content can be described for retrieval: books, documents, images, audio clips, video clips, scientific specimens, engineering schematics, and so forth. For convenience, though, I will refer only to the description and retrieval of documents. The description of intellectual content can go wrong in many obvious ways. We may describe what we want incorrectly; we may describe it correctly but in such general terms that its description is useless for retrieval; or we may describe what we want correctly, but misinterpret the descriptions of available information, and thereby match our description of what we want incorrectly. From a linguistic point of view, we can be misunderstood in the process of retrieval in many ways. Because the philosophy of language deals specifically with how we are understood and misunderstood, it should have some use for understanding the process of description in information retrieval. First, however, let us examine more closely the kinds of misunderstandings that can occur in information retrieval.
We use language in searching for information in two principal ways. We use it to describe what we want and to discriminate what we want from other information that is available to us but that we do not want. Description and discrimination together articulate the goals of the information search process; they also delineate the two principal ways in which language can fail us in this process. Van Rijsbergen (1979) was the first to make this distinction, calling them "representation" and "discrimination."

Liu, S.; Liu, F.; Yu, C.; Meng, W.: An effective approach to document retrieval via utilizing WordNet and recognizing phrases (2004) 0.02

```
0.018659428 = product of:
  0.037318856 = sum of:
    0.037318856 = product of:
      0.07463771 = sum of:
        0.07463771 = weight(_text_:retrieval in 4078) [ClassicSimilarity], result of:
          0.07463771 = score(doc=4078,freq=4.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.47264296 = fieldWeight in 4078, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=4078)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Ekmekcioglu, F.C.; Willett, P.: Effectiveness of stemming for Turkish text retrieval (2000) 0.02

```
0.018471893 = product of:
  0.036943786 = sum of:
    0.036943786 = product of:
      0.07388757 = sum of:
        0.07388757 = weight(_text_:retrieval in 5423) [ClassicSimilarity], result of:
          0.07388757 = score(doc=5423,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.46789268 = fieldWeight in 5423, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=5423)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Perez-Carballo, J.; Strzalkowski, T.: Natural language information retrieval : progress report (2000) 0.02

```
0.018471893 = product of:
  0.036943786 = sum of:
    0.036943786 = product of:
      0.07388757 = sum of:
        0.07388757 = weight(_text_:retrieval in 6421) [ClassicSimilarity], result of:
          0.07388757 = score(doc=6421,freq=2.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.46789268 = fieldWeight in 6421, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=6421)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.02
```
0.018471893 = product of:
  0.036943786 = sum of:
    0.036943786 = product of:
      0.07388757 = sum of:
        0.07388757 = weight(_text_:retrieval in 1851) [ClassicSimilarity], result of:
          0.07388757 = score(doc=1851,freq=8.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.46789268 = fieldWeight in 1851, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1851)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries in constructing test collections for cross-language multilingual text retrieval is also discussed. In addition, a tool is designed to help assessors judge relevance and to record the events of relevance judgment. The log file created by this tool will be used to analyze the behavior of assessors in the future.

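The abstract does not list which indexing units were found effective. Overlapping character bigrams are one unit commonly compared against word segmentation in Chinese IR; the sketch below (an illustration, not the paper's method) splits text into bigrams, builds a tiny inverted index, and answers a query by intersecting the postings of its bigrams. The document texts and function names are hypothetical.

```python
from collections import defaultdict


def char_bigrams(text: str) -> list[str]:
    """Overlapping character bigrams; a lone character is kept as a unigram."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each bigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_bigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, query: str) -> set[str]:
    """Conjunctive (AND) match over the query's bigrams."""
    grams = char_bigrams(query)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


docs = {"d1": "中文信息检索", "d2": "信息系统评估"}
idx = build_index(docs)
print(char_bigrams("信息检索"))
print(search(idx, "信息检索"))
```

Bigrams need no segmentation dictionary, which is why they are a popular baseline; the trade-off is a larger index and some spurious bigrams that straddle word boundaries (such as 息检 above).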
Kummer, N.: Indexierungstechniken für das japanische Retrieval (2006) 0.02
```
0.018282432 = product of:
  0.036564864 = sum of:
    0.036564864 = product of:
      0.07312973 = sum of:
        0.07312973 = weight(_text_:retrieval in 5979) [ClassicSimilarity], result of:
          0.07312973 = score(doc=5979,freq=6.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.46309367 = fieldWeight in 5979, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=5979)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This article describes the challenges that the Japanese language, owing to the particular structure of its writing system, poses for information retrieval, and presents strategies and approaches for indexing Japanese documents. In particular, it discusses the effectiveness of pronunciation-based (yomi-based) indexing and the fusion of several individual indexing approaches.

Source

Effektive Information Retrieval Verfahren in Theorie und Praxis: ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005. Ed.: T. Mandl and C. Womser-Hacker

Ponte, J.M.: Language models for relevance feedback (2000) 0.02
```
0.01770189 = product of:
  0.03540378 = sum of:
    0.03540378 = product of:
      0.07080756 = sum of:
        0.07080756 = weight(_text_:retrieval in 35) [ClassicSimilarity], result of:
          0.07080756 = score(doc=35,freq=10.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.44838852 = fieldWeight in 35, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=35)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The language modeling approach to Information Retrieval (IR) is a conceptually simple model of IR originally developed by Ponte and Croft (1998). In this approach, the query is treated as a random event and documents are ranked according to the likelihood that the query would be generated via a language model estimated for each document. The intuition behind this approach is that users have a prototypical document in mind and will choose query terms accordingly. The intuitive appeal of this method is that no inferences about the semantic content of documents need to be made, resulting in a conceptually simple model. In this paper, techniques for relevance feedback and routing are derived from the language modeling approach in a straightforward manner and their effectiveness is demonstrated empirically. These experiments provide further proof of concept for the language modeling approach to retrieval.

Series

The Kluwer international series on information retrieval; 7

Source

Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft
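The abstract describes ranking documents by the probability that each document's language model generates the query. The sketch below scores log P(q | d) with Dirichlet smoothing against collection statistics; this smoothing method is a common later choice substituted here for simplicity and is not the estimator of Ponte and Croft (1998). The parameter `mu`, the function names, and the toy documents are assumptions.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc_tokens, coll_counts, coll_len, mu=2000.0):
    """log P(query | document) with Dirichlet smoothing:
    P(w | d) = (tf(w, d) + mu * P(w | C)) / (|d| + mu)."""
    tf = Counter(doc_tokens)
    score = 0.0
    for w in query:
        p_coll = coll_counts.get(w, 0) / coll_len
        if p_coll == 0.0:
            continue  # unseen in the collection: P(w|d) would be 0 for every document
        score += math.log((tf[w] + mu * p_coll) / (len(doc_tokens) + mu))
    return score


def rank(query, docs, mu=2000.0):
    """Return document ids ordered by query likelihood, best first."""
    coll_counts = Counter(t for tokens in docs.values() for t in tokens)
    coll_len = sum(coll_counts.values())
    scores = {d: query_log_likelihood(query, toks, coll_counts, coll_len, mu)
              for d, toks in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy collection (hypothetical); mu is kept small because the collection is tiny
docs = {
    "d1": "language model retrieval model".split(),
    "d2": "boolean retrieval system".split(),
    "d3": "relevance feedback language".split(),
}
print(rank(["language", "model"], docs, mu=10.0))
```

With mu = 10 on this three-document toy collection, the document containing both query terms ranks first, and the document containing neither still receives a nonzero probability from the collection model.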
Kreymer, O.: An evaluation of help mechanisms in natural language information retrieval systems (2002) 0.02
```
0.01770189 = product of:
  0.03540378 = sum of:
    0.03540378 = product of:
      0.07080756 = sum of:
        0.07080756 = weight(_text_:retrieval in 2557) [ClassicSimilarity], result of:
          0.07080756 = score(doc=2557,freq=10.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.44838852 = fieldWeight in 2557, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2557)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The field of natural language processing (NLP) is bringing rapid changes to the design of information retrieval systems and human-computer interaction. While natural language is regarded as the most effective tool for information retrieval in the contemporary information environment, systems that use it are only beginning to emerge. This study evaluates the current state of NLP information retrieval systems from the user's point of view: what techniques do these systems use to guide their users through the search process? The analysis focused on the structure and components of the systems' help mechanisms. The results showed that systems claiming to use natural language searching in fact used a wide range of information retrieval techniques, from genuine natural language processing to Boolean searching. As a result, the user assistance mechanisms of these systems also varied: while pseudo-NLP systems would suit a more traditional method of instruction, real NLP systems primarily used explanation and user-system dialogue.
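
The explain trees in these results come from Lucene's ClassicSimilarity (TF-IDF). A minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and taking queryNorm and fieldNorm as given from the explain output rather than recomputing them, reproduces a single-term score as queryWeight * fieldWeight and then applies each coord(matched/total) factor. The helper names are hypothetical, not Lucene API.

```python
import math

# Hypothetical helpers (not Lucene API) mirroring the ClassicSimilarity
# formulas visible in the explain output. queryNorm and fieldNorm are
# passed in as reported, not recomputed.

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq)."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight = idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


def apply_coords(score: float, coords: list[tuple[int, int]]) -> float:
    """Multiply by each coord(matched/total) factor, innermost first."""
    for matched, total in coords:
        score *= matched / total
    return score


if __name__ == "__main__":
    # Kreymer entry (doc 2557): freq=10, docFreq=5836, maxDocs=44218
    raw = term_score(10, 5836, 44218, 0.052204985, 0.046875)
    print(raw, apply_coords(raw, [(1, 2), (1, 2)]))
```

For the Kreymer explanation above this gives about 0.0708076 for the term weight and about 0.0177019 after both coord(1/2) factors. Lucene computes in 32-bit floats, so only the first six or so significant digits should be expected to agree. The same functions reproduce the Kuhlmann and Ballesteros trees; only freq, docFreq and fieldNorm change.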

Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test [Language at the push of a button: seven automatic translation programs tested] (2000) 0.02
```
0.017682636 = product of:
  0.035365272 = sum of:
    0.035365272 = product of:
      0.070730545 = sum of:
        0.070730545 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
          0.070730545 = score(doc=5428,freq=2.0), product of:
            0.18281296 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052204985 = queryNorm
            0.38690117 = fieldWeight in 5428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=5428)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: c't. 2000, no. 22, pp. 220-229

Ballesteros, L.A.: Cross-language retrieval via transitive relation (2000) 0.02
```
0.01615954 = product of:
  0.03231908 = sum of:
    0.03231908 = product of:
      0.06463816 = sum of:
        0.06463816 = weight(_text_:retrieval in 30) [ClassicSimilarity], result of:
          0.06463816 = score(doc=30,freq=12.0), product of:
            0.15791564 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.052204985 = queryNorm
            0.40932083 = fieldWeight in 30, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=30)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The growth in availability of multilingual data in all areas of the public and private sector is driving an increasing need for systems that facilitate access to multilingual resources. Cross-language retrieval (CLR) technology is a means of addressing this need. A CLR system must overcome two main hurdles to effective cross-language retrieval. First, it must address the ambiguity that arises when trying to map the meaning of text across languages; that is, it must handle both within-language ambiguity and cross-language ambiguity. Second, it has to incorporate multilingual resources that enable it to perform the mapping across languages. The difficulty here is that only a limited number of lexical resources exist, and virtually none for some language pairs. This work focuses on a dictionary approach to addressing the problem of limited lexical resources. A dictionary approach is taken because bilingual dictionaries are more prevalent and simpler to apply than other resources. We show that a transitive translation approach, where a third language is employed as an interlingua between the source and target languages, is a viable means of performing CLR between languages for which no bilingual dictionary is available.
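
The transitive step can be illustrated by composing two bilingual dictionaries, source to pivot and pivot to target, and keeping every target term reachable through any pivot translation. This is a minimal sketch, not the method's full implementation: the tiny Spanish-English-German dictionaries and the function names are hypothetical, and no disambiguation is attempted.

```python
# Transitive (pivot-language) dictionary translation: source -> pivot -> target.
# The tiny Spanish->English->German dictionaries below are hypothetical
# examples; a real system would load full bilingual dictionaries.

EXAMPLE_ES_EN = {"banco": ["bank", "bench"], "datos": ["data"]}
EXAMPLE_EN_DE = {"bank": ["Bank", "Ufer"], "bench": ["Bank"], "data": ["Daten"]}


def transitive_translate(term, src_to_pivot, pivot_to_tgt):
    """Return every target term reachable via any pivot translation (order kept, no duplicates)."""
    targets = []
    for pivot in src_to_pivot.get(term, []):
        for tgt in pivot_to_tgt.get(pivot, []):
            if tgt not in targets:
                targets.append(tgt)
    return targets


def translate_query(query, src_to_pivot, pivot_to_tgt):
    """Map each source query term to its transitive translations ([] if none)."""
    return {term: transitive_translate(term, src_to_pivot, pivot_to_tgt) for term in query}


if __name__ == "__main__":
    print(translate_query(["banco", "datos"], EXAMPLE_ES_EN, EXAMPLE_EN_DE))
    # prints {'banco': ['Bank', 'Ufer'], 'datos': ['Daten']}
```

The example shows why ambiguity matters for this approach: "Ufer" (riverbank) reaches the German side only through the ambiguity of the English pivot word "bank", so each extra hop can add spurious translations that a practical system would need to filter.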

Series

The Kluwer international series on information retrieval; 7

Source

Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft
