Search (225 results, page 1 of 12)

Rosemblat, G.; Graham, L.: Cross-language search in a monolingual health information system : flexible designs and lexical processes (2006) 0.08

0.078905575 = product of:
  0.17753755 = sum of:
    0.08076138 = weight(_text_:line in 241) [ClassicSimilarity], result of:
      0.08076138 = score(doc=241,freq=2.0), product of:
        0.21724595 = queryWeight, product of:
          5.6078424 = idf(docFreq=440, maxDocs=44218)
          0.038739666 = queryNorm
        0.37175092 = fieldWeight in 241, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.6078424 = idf(docFreq=440, maxDocs=44218)
          0.046875 = fieldNorm(doc=241)
    0.013707667 = weight(_text_:information in 241) [ClassicSimilarity], result of:
      0.013707667 = score(doc=241,freq=6.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.20156369 = fieldWeight in 241, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=241)
    0.033231772 = weight(_text_:retrieval in 241) [ClassicSimilarity], result of:
      0.033231772 = score(doc=241,freq=4.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.2835858 = fieldWeight in 241, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=241)
    0.049836725 = weight(_text_:techniques in 241) [ClassicSimilarity], result of:
      0.049836725 = score(doc=241,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.2920283 = fieldWeight in 241, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=241)
  0.44444445 = coord(4/9)
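
The explanation tree above can be re-derived by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the constants are copied from the output for doc 241, and the function names are illustrative only, not part of any library.

```python
import math

# Values copied from the explain output for doc 241.
MAX_DOCS = 44218
QUERY_NORM = 0.038739666
FIELD_NORM = 0.046875
COORD = 4 / 9  # 4 of the 9 query clauses matched this document

# term -> (termFreq in doc 241, docFreq in the index)
MATCHED_TERMS = {
    "line": (2.0, 440),
    "information": (6.0, 20772),
    "retrieval": (4.0, 5836),
    "techniques": (2.0, 1467),
}


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq):
    """queryWeight * fieldWeight for one matched term."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms):
    """Sum of the per-term weights, scaled by the coordination factor."""
    return COORD * sum(term_weight(f, df) for f, df in terms.values())


score = document_score(MATCHED_TERMS)
print(f"{score:.4f}")
```

Because Lucene computes in single precision, the result agrees with the listed 0.078905575 only to within rounding. The same arithmetic applies to the other records, with their own fieldNorm and coord values.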

Abstract: The predominance of English-only online health information poses a serious challenge to non-English speakers. To overcome this barrier, we incorporated cross-language information retrieval (CLIR) techniques into a fully functional prototype. It supports Spanish-language searches over an English data set using a Spanish-English bilingual term list (BTL). The modular design allows for system and BTL growth and takes advantage of English-system enhancements. Language-based design decisions and implications for integrating non-English components with the existing monolingual architecture are presented. Algorithmic and BTL improvements are used to bring CLIR retrieval scores in line with the monolingual values. After validating these changes, we conducted a failure analysis and error categorization for the worst-performing queries. We conclude with a comprehensive discussion and directions for future work.

Freitas-Junior, H.R.; Ribeiro-Neto, B.A.; Freitas-Vale, R. de; Laender, A.H.F.; Lima, L.R.S. de: Categorization-driven cross-language retrieval of medical information (2006) 0.06

0.060433533 = product of:
  0.13597545 = sum of:
    0.01615464 = weight(_text_:information in 5282) [ClassicSimilarity], result of:
      0.01615464 = score(doc=5282,freq=12.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.23754507 = fieldWeight in 5282, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5282)
    0.047965933 = weight(_text_:retrieval in 5282) [ClassicSimilarity], result of:
      0.047965933 = score(doc=5282,freq=12.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.40932083 = fieldWeight in 5282, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5282)
    0.058733147 = weight(_text_:techniques in 5282) [ClassicSimilarity], result of:
      0.058733147 = score(doc=5282,freq=4.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.34415868 = fieldWeight in 5282, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5282)
    0.013121725 = product of:
      0.02624345 = sum of:
        0.02624345 = weight(_text_:22 in 5282) [ClassicSimilarity], result of:
          0.02624345 = score(doc=5282,freq=2.0), product of:
            0.13565971 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038739666 = queryNorm
            0.19345059 = fieldWeight in 5282, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5282)
      0.5 = coord(1/2)
  0.44444445 = coord(4/9)

Abstract: The Web has become a large repository of documents (or pages) written in many different languages. In this context, traditional information retrieval (IR) techniques cannot be used whenever the user query and the documents being retrieved are in different languages. To address this problem, new cross-language information retrieval (CLIR) techniques have been proposed. In this work, we describe a method for cross-language retrieval of medical information. This method combines query terms and related medical concepts obtained automatically through a categorization procedure. The medical concepts are used to create a linguistic abstraction that allows retrieval of information in a language-independent way, minimizing linguistic problems such as polysemy. To evaluate our method, we carried out experiments using the OHSUMED test collection, whose documents are written in English, with queries expressed in Portuguese, Spanish, and French. The results indicate that our cross-language retrieval method is as effective as a standard vector space model algorithm operating on queries and documents in the same language. Further, our results are better than previous results in the literature.
Date: 22. 7.2006 16:46:36
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.501-510

Kishida, K.: Technical issues of cross-language information retrieval : a review (2005) 0.06

0.059226304 = product of:
  0.17767891 = sum of:
    0.018276889 = weight(_text_:information in 1019) [ClassicSimilarity], result of:
      0.018276889 = score(doc=1019,freq=6.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.2687516 = fieldWeight in 1019, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=1019)
    0.04430903 = weight(_text_:retrieval in 1019) [ClassicSimilarity], result of:
      0.04430903 = score(doc=1019,freq=4.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.37811437 = fieldWeight in 1019, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=1019)
    0.11509299 = weight(_text_:techniques in 1019) [ClassicSimilarity], result of:
      0.11509299 = score(doc=1019,freq=6.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.6744105 = fieldWeight in 1019, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0625 = fieldNorm(doc=1019)
  0.33333334 = coord(3/9)

Abstract: This paper reviews state-of-the-art techniques and methods for enhancing the effectiveness of cross-language information retrieval (CLIR). The following research issues are covered: (1) matching strategies and translation techniques, (2) methods for solving the problem of translation ambiguity, (3) formal models for CLIR such as application of the language model, (4) the pivot language approach, (5) methods for searching multilingual document collections, (6) techniques for combining multiple language resources, etc.
Source: Information processing and management. 41(2005) no.3, S.433-456

Levow, G.-A.; Oard, D.W.; Resnik, P.: Dictionary-based techniques for cross-language information retrieval (2005) 0.05

0.052792586 = product of:
  0.15837775 = sum of:
    0.013707667 = weight(_text_:information in 1025) [ClassicSimilarity], result of:
      0.013707667 = score(doc=1025,freq=6.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.20156369 = fieldWeight in 1025, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1025)
    0.033231772 = weight(_text_:retrieval in 1025) [ClassicSimilarity], result of:
      0.033231772 = score(doc=1025,freq=4.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.2835858 = fieldWeight in 1025, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1025)
    0.11143831 = weight(_text_:techniques in 1025) [ClassicSimilarity], result of:
      0.11143831 = score(doc=1025,freq=10.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.65299517 = fieldWeight in 1025, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=1025)
  0.33333334 = coord(3/9)

Abstract: Cross-language information retrieval (CLIR) systems allow users to find documents written in different languages from that of their query. Simple knowledge structures such as bilingual term lists have proven to be a remarkably useful basis for bridging that language gap. A broad array of dictionary-based techniques have demonstrated utility, but comparison across techniques has been difficult because evaluation results often span only a limited range of conditions. This article identifies the key issues in dictionary-based CLIR, develops unified frameworks for term selection and term translation that help to explain the relationships among existing techniques, and illustrates the effect of those techniques using four contrasting languages for systematic experiments with a uniform query translation architecture. Key results include identification of a previously unseen dependence of pre- and post-translation expansion on orthographic cognates and development of a query-specific measure for translation fanout that helps to explain the utility of structured query methods.
Source: Information processing and management. 41(2005) no.3, S.523-548

Cross-language information retrieval (1998) 0.05
```
0.04707476 = product of:
  0.10591821 = sum of:
    0.033650577 = weight(_text_:line in 6299) [ClassicSimilarity], result of:
      0.033650577 = score(doc=6299,freq=2.0), product of:
        0.21724595 = queryWeight, product of:
          5.6078424 = idf(docFreq=440, maxDocs=44218)
          0.038739666 = queryNorm
        0.15489621 = fieldWeight in 6299, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.6078424 = idf(docFreq=440, maxDocs=44218)
          0.01953125 = fieldNorm(doc=6299)
    0.01233831 = weight(_text_:information in 6299) [ClassicSimilarity], result of:
      0.01233831 = score(doc=6299,freq=28.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.18142805 = fieldWeight in 6299, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.01953125 = fieldNorm(doc=6299)
    0.03916402 = weight(_text_:retrieval in 6299) [ClassicSimilarity], result of:
      0.03916402 = score(doc=6299,freq=32.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.33420905 = fieldWeight in 6299, product of:
          5.656854 = tf(freq=32.0), with freq of:
            32.0 = termFreq=32.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.01953125 = fieldNorm(doc=6299)
    0.020765305 = weight(_text_:techniques in 6299) [ClassicSimilarity], result of:
      0.020765305 = score(doc=6299,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.12167847 = fieldWeight in 6299, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.01953125 = fieldNorm(doc=6299)
  0.44444445 = coord(4/9)
```
Content

Contains the contributions: GREFENSTETTE, G.: The Problem of Cross-Language Information Retrieval; DAVIS, M.W.: On the Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval; BALLESTEROS, L. and W.B. CROFT: Statistical Methods for Cross-Language Information Retrieval; Distributed Cross-Lingual Information Retrieval; Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing; EVANS, D.A. et al.: Mapping Vocabularies Using Latent Semantics; PICCHI, E. and C. PETERS: Cross-Language Information Retrieval: A System for Comparable Corpus Querying; YAMABANA, K. et al.: A Language Conversion Front-End for Cross-Language Information Retrieval; GACHOT, D.A. et al.: The Systran NLP Browser: An Application of Machine Translation Technology in Cross-Language Information Retrieval; HULL, D.: A Weighted Boolean Model for Cross-Language Text Retrieval; SHERIDAN, P. et al.: Building a Large Multilingual Test Collection from Comparable News Documents; OARD, D.W. and B.J. DORR: Evaluating Cross-Language Text Filtering Effectiveness

Footnote

Review in: Machine translation review: 1999, no.10, S.26-27 (D. Lewis): "Cross Language Information Retrieval (CLIR) addresses the growing need to access large volumes of data across language boundaries. The typical requirement is for the user to input a free form query, usually a brief description of a topic, into a search or retrieval engine which returns a list, in ranked order, of documents or web pages that are relevant to the topic. The search engine matches the terms in the query to indexed terms, usually keywords previously derived from the target documents. Unlike monolingual information retrieval, CLIR requires query terms in one language to be matched to indexed terms in another. Matching can be done by bilingual dictionary lookup, full machine translation, or by applying statistical methods. A query's success is measured in terms of recall (how many potentially relevant target documents are found) and precision (what proportion of documents found are relevant). Issues in CLIR are how to translate query terms into index terms, how to eliminate alternative translations (e.g. to decide that French 'traitement' in a query means 'treatment' and not 'salary'), and how to rank or weight translation alternatives that are retained (e.g. how to order the French terms 'aventure', 'business', 'affaire', and 'liaison' as relevant translations of English 'affair'). Grefenstette provides a lucid and useful overview of the field and the problems. The volume brings together a number of experiments and projects in CLIR. Mark Davis (New Mexico State University) describes Recuerdo, a Spanish retrieval engine which reduces translation ambiguities by scanning indexes for parallel texts; it also uses either a bilingual dictionary or direct equivalents from a parallel corpus in order to compare results for queries on parallel texts.
Lisa Ballesteros and Bruce Croft (University of Massachusetts) use a 'local feedback' technique which automatically enhances a query by adding extra terms to it both before and after translation; such terms can be derived from documents known to be relevant to the query.
Christian Fluhr et al (DIST/SMTI, France) outline the EMIR (European Multilingual Information Retrieval) and ESPRIT projects. They found that using SYSTRAN to machine translate queries and to access material from various multilingual databases produced less relevant results than a method referred to as 'multilingual reformulation' (the mechanics of which are only hinted at). An interesting technique is Latent Semantic Indexing (LSI), described by Michael Littman et al (Brown University) and, most clearly, by David Evans et al (Carnegie Mellon University). LSI involves creating matrices of documents and the terms they contain and 'fitting' related documents into a reduced matrix space. This effectively allows queries to be mapped onto a common semantic representation of the documents. Eugenio Picchi and Carol Peters (Pisa) report on a procedure to create links between translation equivalents in an Italian-English parallel corpus. The links are used to construct parallel linguistic contexts in real-time for any term or combination of terms that is being searched for in either language. Their interest is primarily lexicographic but they plan to apply the same procedure to comparable corpora, i.e. to texts which are not translations of each other but which share the same domain. Kiyoshi Yamabana et al (NEC, Japan) address the issue of how to disambiguate between alternative translations of query terms. Their DMAX (double maximise) method looks at co-occurrence frequencies between both source language words and target language words in order to arrive at the most probable translation. The statistical data for the decision are derived, not from the translation texts but independently from monolingual corpora in each language. An interactive user interface allows the user to influence the selection of terms during the matching process.
Denis Gachot et al (SYSTRAN) describe the SYSTRAN NLP browser, a prototype tool which collects parsing information derived from a text or corpus previously translated with SYSTRAN. The user enters queries into the browser in either a structured or free form and receives grammatical and lexical information about the source text and/or its translation.
The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket' which is semantically ranked above 'military rocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to bypass the time-consuming and expensive process of assembling a standard collection of documents and of user queries against which the performance of a CLIR system is manually assessed, Páraic Sheridan et al (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g. for topic, country and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Oard and Bonnie Dorr (University of Maryland) describe an evaluation methodology which appears to apply LSI techniques in order to filter and rank incoming documents designed for testing CLIR systems. The volume provides the reader with an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field."
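
The recall and precision measures mentioned in the review quoted above can be illustrated with a small sketch. It assumes set-based (unranked) evaluation against a known list of relevant documents; the document identifiers and the function name are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, ignoring rank order."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    # Precision: share of retrieved documents that are relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query result: four documents returned, three relevant overall.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.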

Series

The Kluwer International series on information retrieval

Larkey, L.S.; Connell, M.E.: Structured queries, language modelling, and relevance modelling in cross-language information retrieval (2005) 0.05

0.04677307 = product of:
  0.105239406 = sum of:
    0.011423056 = weight(_text_:information in 1022) [ClassicSimilarity], result of:
      0.011423056 = score(doc=1022,freq=6.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.16796975 = fieldWeight in 1022, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1022)
    0.03916402 = weight(_text_:retrieval in 1022) [ClassicSimilarity], result of:
      0.03916402 = score(doc=1022,freq=8.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.33420905 = fieldWeight in 1022, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1022)
    0.04153061 = weight(_text_:techniques in 1022) [ClassicSimilarity], result of:
      0.04153061 = score(doc=1022,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.24335694 = fieldWeight in 1022, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1022)
    0.013121725 = product of:
      0.02624345 = sum of:
        0.02624345 = weight(_text_:22 in 1022) [ClassicSimilarity], result of:
          0.02624345 = score(doc=1022,freq=2.0), product of:
            0.13565971 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038739666 = queryNorm
            0.19345059 = fieldWeight in 1022, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1022)
      0.5 = coord(1/2)
  0.44444445 = coord(4/9)

Abstract: Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries--one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
Date: 26.12.2007 20:22:11
Source: Information processing and management. 41(2005) no.3, S.457-474
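
The synonym-operator idea behind structured query translation, as mentioned in the abstract above, can be sketched roughly as follows. This is an assumption about the general technique, not the authors' implementation: all dictionary translations of a source-language term are treated as a single term, so their term frequencies are summed and the document frequency is taken over the union of documents containing any translation. The toy documents, the term list, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical toy collection of tokenized English documents.
DOCS = {
    "d1": ["heart", "attack", "treatment"],
    "d2": ["cardiac", "arrest", "heart"],
    "d3": ["seizure", "medication"],
}

# Hypothetical Spanish-English bilingual term list.
TRANSLATIONS = {
    "corazón": ["heart", "cardiac"],
    "ataque": ["attack", "seizure"],
}


def synonym_stats(source_term, doc_id):
    """Return (tf, df) for a source term, pooling all of its translations."""
    alternatives = set(TRANSLATIONS.get(source_term, []))
    counts = Counter(DOCS[doc_id])
    # tf: occurrences of any translation alternative in this document.
    tf = sum(counts[t] for t in alternatives)
    # df: number of documents containing at least one alternative.
    df = sum(1 for tokens in DOCS.values() if alternatives & set(tokens))
    return tf, df


print(synonym_stats("corazón", "d2"))
print(synonym_stats("ataque", "d1"))
```

The pooled tf and df values would then feed into whatever term-weighting formula the retrieval engine uses, in place of per-translation statistics.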

Levergood, B.; Farrenkopf, S.; Frasnelli, E.: The specification of the language of the field and interoperability : cross-language access to catalogues and online libraries (CACAO) (2008) 0.04

0.04456599 = product of:
  0.100273475 = sum of:
    0.011192262 = weight(_text_:information in 2646) [ClassicSimilarity], result of:
      0.011192262 = score(doc=2646,freq=4.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.16457605 = fieldWeight in 2646, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2646)
    0.023498412 = weight(_text_:retrieval in 2646) [ClassicSimilarity], result of:
      0.023498412 = score(doc=2646,freq=2.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.20052543 = fieldWeight in 2646, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=2646)
    0.049836725 = weight(_text_:techniques in 2646) [ClassicSimilarity], result of:
      0.049836725 = score(doc=2646,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.2920283 = fieldWeight in 2646, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=2646)
    0.01574607 = product of:
      0.03149214 = sum of:
        0.03149214 = weight(_text_:22 in 2646) [ClassicSimilarity], result of:
          0.03149214 = score(doc=2646,freq=2.0), product of:
            0.13565971 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038739666 = queryNorm
            0.23214069 = fieldWeight in 2646, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2646)
      0.5 = coord(1/2)
  0.44444445 = coord(4/9)

Abstract: The CACAO Project (Cross-language Access to Catalogues and Online Libraries) has been designed to implement natural language processing and cross-language information retrieval techniques to provide cross-language access to information in libraries, a critical issue in the linguistically diverse European Union. This project report addresses two metadata-related challenges for the library community in this context: "false friends" (identical words having different meanings in different languages) and term ambiguity. The possible solutions involve enriching the metadata with attributes specifying language or the source authority file, or associating potential search terms to classes in a classification system. The European Library will evaluate an early implementation of this work in late 2008.
Source: Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas

Multilingual information management : current levels and future abilities. A report commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency, April 1999 (1999) 0.04
```
0.044080213 = product of:
  0.13224064 = sum of:
    0.015828248 = weight(_text_:information in 6068) [ClassicSimilarity], result of:
      0.015828248 = score(doc=6068,freq=18.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.23274568 = fieldWeight in 6068, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=6068)
    0.035029367 = weight(_text_:retrieval in 6068) [ClassicSimilarity], result of:
      0.035029367 = score(doc=6068,freq=10.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.29892567 = fieldWeight in 6068, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=6068)
    0.08138303 = weight(_text_:techniques in 6068) [ClassicSimilarity], result of:
      0.08138303 = score(doc=6068,freq=12.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.47688022 = fieldWeight in 6068, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.03125 = fieldNorm(doc=6068)
  0.33333334 = coord(3/9)
```
Abstract

Over the past 50 years, a variety of language-related capabilities has been developed in machine translation, information retrieval, speech recognition, text summarization, and so on. These applications rest upon a set of core techniques such as language modeling, information extraction, parsing, generation, and multimedia planning and integration; and they involve methods using statistics, rules, grammars, lexicons, ontologies, training techniques, and so on. It is a puzzling fact that although all of this work deals with language in some form or other, the major applications have each developed a separate research field. For example, there is no reason why speech recognition techniques involving n-grams and hidden Markov models could not have been used in machine translation 15 years earlier than they were, or why some of the lexical and semantic insights from the subarea called Computational Linguistics are still not used in information retrieval.
This picture will rapidly change. The twin challenges of massive information overload via the web and ubiquitous computers present us with an unavoidable task: developing techniques to handle multilingual and multi-modal information robustly and efficiently, with as high quality performance as possible. The most effective way for us to address such a mammoth task, and to ensure that our various techniques and applications fit together, is to start talking across the artificial research boundaries. Extending the current technologies will require integrating the various capabilities into multi-functional and multi-lingual natural language systems. However, at this time there is no clear vision of how these technologies could or should be assembled into a coherent framework. What would be involved in connecting a speech recognition system to an information retrieval engine, and then using machine translation and summarization software to process the retrieved text? How can traditional parsing and generation be enhanced with statistical techniques? What would be the effect of carefully crafted lexicons on traditional information retrieval? At which points should machine translation be interleaved within information retrieval systems to enable multilingual processing?

Wang, J.; Oard, D.W.: Matching meaning for cross-language information retrieval (2012) 0.04

0.043812923 = product of:
  0.13143876 = sum of:
    0.018466292 = weight(_text_:information in 7430) [ClassicSimilarity], result of:
      0.018466292 = score(doc=7430,freq=8.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.27153665 = fieldWeight in 7430, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7430)
    0.054829627 = weight(_text_:retrieval in 7430) [ClassicSimilarity], result of:
      0.054829627 = score(doc=7430,freq=8.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.46789268 = fieldWeight in 7430, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7430)
    0.05814285 = weight(_text_:techniques in 7430) [ClassicSimilarity], result of:
      0.05814285 = score(doc=7430,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.3406997 = fieldWeight in 7430, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7430)
  0.33333334 = coord(3/9)
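
The arithmetic in the score tree above can be retraced with a short script. This is only a sketch, assuming the classic Lucene TF-IDF conventions that the figures in the listing appear to follow: tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), and a final coord factor of matched over total query clauses. The function and variable names are illustrative and not part of any library; queryNorm, fieldNorm, and the frequencies are copied from the listing.

```python
import math

# Values copied from the explain output for doc 7430.
QUERY_NORM = 0.038739666
MAX_DOCS = 44218
FIELD_NORM = 0.0546875

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed classic formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# term -> (termFreq in the document, docFreq in the index)
terms = {
    "information": (8.0, 20772),
    "retrieval": (8.0, 5836),
    "techniques": (2.0, 1467),
}

per_term = {t: term_score(f, df, FIELD_NORM) for t, (f, df) in terms.items()}
coord = 3 / 9  # three of the nine query clauses matched
total = sum(per_term.values()) * coord

for term, score in per_term.items():
    print(f"{term}: {score:.9f}")
print(f"total: {total:.9f}")
```

Because the index stores these values in single precision, the script's double-precision results should agree with the listing only to about six decimal places.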

Abstract: This article describes a framework for cross-language information retrieval that efficiently leverages statistical estimation of translation probabilities. The framework provides a unified perspective into which some earlier work on techniques for cross-language information retrieval based on translation probabilities can be cast. Modeling synonymy and filtering translation probabilities using bidirectional evidence are shown to yield a balance between retrieval effectiveness and query-time (or indexing-time) efficiency that seems well suited to large-scale applications. Evaluations with six test collections show consistent improvements over strong baselines.
Source: Information processing and management. 48(2012) no.4, S.631-653

Fluhr, C.: Crosslingual access to photo databases (2012) 0.04

0.04310904 = product of:
  0.09699534 = sum of:
    0.007914125 = weight(_text_:information in 93) [ClassicSimilarity], result of:
      0.007914125 = score(doc=93,freq=2.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.116372846 = fieldWeight in 93, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=93)
    0.023498412 = weight(_text_:retrieval in 93) [ClassicSimilarity], result of:
      0.023498412 = score(doc=93,freq=2.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.20052543 = fieldWeight in 93, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=93)
    0.049836725 = weight(_text_:techniques in 93) [ClassicSimilarity], result of:
      0.049836725 = score(doc=93,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.2920283 = fieldWeight in 93, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=93)
    0.01574607 = product of:
      0.03149214 = sum of:
        0.03149214 = weight(_text_:22 in 93) [ClassicSimilarity], result of:
          0.03149214 = score(doc=93,freq=2.0), product of:
            0.13565971 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038739666 = queryNorm
            0.23214069 = fieldWeight in 93, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=93)
      0.5 = coord(1/2)
  0.44444445 = coord(4/9)

Abstract: This paper is about searching for photos in the photo databases of agencies that sell photos over the Internet. The problem is far from the behavior of photo databases managed by librarians and also far from the corpora generally used for research purposes. The descriptions mainly use single words, which is well known not to be the best basis for a good search and which increases the problem of semantic ambiguity. This problem of semantic ambiguity is crucial for cross-language querying. On the other hand, users are not aware of documentation techniques and generally use very simple queries but want to get precise answers. This paper reports the experience gained from three years of use (2006-2008) of cross-language access to several of the main international commercial photo databases. The languages used were French, English, and German.
Date: 17. 4.2012 14:25:22
Source: Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.

Qin, J.; Zhou, Y.; Chau, M.; Chen, H.: Multilingual Web retrieval : an experiment in English-Chinese business intelligence (2006) 0.04
```
0.04194808 = product of:
  0.12584424 = sum of:
    0.0147471 = weight(_text_:information in 5054) [ClassicSimilarity], result of:
      0.0147471 = score(doc=5054,freq=10.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.21684799 = fieldWeight in 5054, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5054)
    0.03916402 = weight(_text_:retrieval in 5054) [ClassicSimilarity], result of:
      0.03916402 = score(doc=5054,freq=8.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.33420905 = fieldWeight in 5054, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5054)
    0.07193312 = weight(_text_:techniques in 5054) [ClassicSimilarity], result of:
      0.07193312 = score(doc=5054,freq=6.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.42150658 = fieldWeight in 5054, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5054)
  0.33333334 = coord(3/9)
```
Abstract

As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted that combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.

Footnote

Contribution to a special topic section on multilingual information systems

Source

Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.671-683

Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.04

0.040735207 = product of:
  0.12220562 = sum of:
    0.009233146 = weight(_text_:information in 1851) [ClassicSimilarity], result of:
      0.009233146 = score(doc=1851,freq=2.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.13576832 = fieldWeight in 1851, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1851)
    0.054829627 = weight(_text_:retrieval in 1851) [ClassicSimilarity], result of:
      0.054829627 = score(doc=1851,freq=8.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.46789268 = fieldWeight in 1851, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1851)
    0.05814285 = weight(_text_:techniques in 1851) [ClassicSimilarity], result of:
      0.05814285 = score(doc=1851,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.3406997 = fieldWeight in 1851, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1851)
  0.33333334 = coord(3/9)

Abstract: This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries for construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. In addition, a tool is designed to help assessors judge relevance and to log the events of relevance judgment. The log file created by this tool will be used to analyze the behaviors of assessors in the future.

Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.04
```
0.04072581 = product of:
  0.12217742 = sum of:
    0.011423056 = weight(_text_:information in 897) [ClassicSimilarity], result of:
      0.011423056 = score(doc=897,freq=6.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.16796975 = fieldWeight in 897, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=897)
    0.027693143 = weight(_text_:retrieval in 897) [ClassicSimilarity], result of:
      0.027693143 = score(doc=897,freq=4.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.23632148 = fieldWeight in 897, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=897)
    0.08306122 = weight(_text_:techniques in 897) [ClassicSimilarity], result of:
      0.08306122 = score(doc=897,freq=8.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.4867139 = fieldWeight in 897, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=897)
  0.33333334 = coord(3/9)
```
Abstract

Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for resolving the ambiguity of translations based only on the target document collections. First, we discuss two kinds of disambiguation technique: (1) a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback (PRF). Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English as a pivot language. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparably high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verify our findings. Again, similar results were observed, except that the Dice coefficient slightly outperforms the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.

Source

Information processing and management. 43(2007) no.1, S.103-120

Mustafa el Hadi, W.: Dynamics of the linguistic paradigm in information retrieval (2000) 0.04

0.0398466 = product of:
  0.1195398 = sum of:
    0.01582825 = weight(_text_:information in 151) [ClassicSimilarity], result of:
      0.01582825 = score(doc=151,freq=8.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.23274569 = fieldWeight in 151, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=151)
    0.033231772 = weight(_text_:retrieval in 151) [ClassicSimilarity], result of:
      0.033231772 = score(doc=151,freq=4.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.2835858 = fieldWeight in 151, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=151)
    0.07047977 = weight(_text_:techniques in 151) [ClassicSimilarity], result of:
      0.07047977 = score(doc=151,freq=4.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.4129904 = fieldWeight in 151, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=151)
  0.33333334 = coord(3/9)

Abstract: In this paper we briefly sketch the dynamics of the linguistic paradigm in Information Retrieval (IR) and its adaptation to the Internet. The emergence of Natural Language Processing (NLP) techniques has been a major factor leading to this adaptation. These techniques and tools try to adapt to the current needs, i.e. retrieving information from documents written and indexed in a foreign language by using a native language query to express the information need. This process, known as cross-language IR (CLIR), is a field at the cross roads of both Machine Translation and IR. This field represents a real challenge to the IR community and will require a solid cooperation with the NLP community.

Mustafa el Hadi, W.: Human language technology and its role in information access and management (2003) 0.04
```
0.03958424 = product of:
  0.11875272 = sum of:
    0.02085555 = weight(_text_:information in 5524) [ClassicSimilarity], result of:
      0.02085555 = score(doc=5524,freq=20.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.30666938 = fieldWeight in 5524, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5524)
    0.03916402 = weight(_text_:retrieval in 5524) [ClassicSimilarity], result of:
      0.03916402 = score(doc=5524,freq=8.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.33420905 = fieldWeight in 5524, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5524)
    0.058733147 = weight(_text_:techniques in 5524) [ClassicSimilarity], result of:
      0.058733147 = score(doc=5524,freq=4.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.34415868 = fieldWeight in 5524, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5524)
  0.33333334 = coord(3/9)
```
Abstract

The role of linguistics in information access, extraction and dissemination is essential. Radical changes in the techniques of information and communication at the end of the twentieth century have had a significant effect on the function of the linguistic paradigm and its applications in all forms of communication. The introduction of new technical means has deeply changed the possibilities for the distribution of information. In this situation, what is the role of the linguistic paradigm and its practical applications, i.e., natural language processing (NLP) techniques when applied to information access? What solutions can linguistics offer in human-computer interaction, extraction and management? Many fields show the relevance of the linguistic paradigm through the various technologies that require NLP, such as document and message understanding, information detection, extraction, and retrieval, question and answer, cross-language information retrieval (CLIR), text summarization, filtering, and spoken document retrieval. This paper focuses on the central role of human language technologies in the information society, surveys the current situation, describes the benefits of the above-mentioned applications, outlines successes and challenges, and discusses solutions. It reviews the resources and means needed to advance information access and dissemination across language boundaries in the twenty-first century. Multilingualism, which is a natural result of globalization, requires more effort in the direction of language technology. The scope of human language technology (HLT) is large, so we limit our review to applications that involve multilinguality.

Content

Contribution to a special issue on "Knowledge organization and classification in international information retrieval"

Pearce, C.; Nicholas, C.: TELLTALE: Experiments in a dynamic hypertext environment for degraded and multilingual data (1996) 0.04

0.03830127 = product of:
  0.11490381 = sum of:
    0.011192262 = weight(_text_:information in 4071) [ClassicSimilarity], result of:
      0.011192262 = score(doc=4071,freq=4.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.16457605 = fieldWeight in 4071, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4071)
    0.033231772 = weight(_text_:retrieval in 4071) [ClassicSimilarity], result of:
      0.033231772 = score(doc=4071,freq=4.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.2835858 = fieldWeight in 4071, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=4071)
    0.07047977 = weight(_text_:techniques in 4071) [ClassicSimilarity], result of:
      0.07047977 = score(doc=4071,freq=4.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.4129904 = fieldWeight in 4071, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=4071)
  0.33333334 = coord(3/9)

Abstract: Methods and tools for finding documents relevant to a user's needs in a document corpus can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora, their algorithms are dependent on the language for which they are written, e.g. English, and they do not perform well when presented with misspelled words or text that has been degraded by OCR techniques. In this article, we present experimentation results for the TELLTALE system. TELLTALE is a dynamic hypertext environment that provides full-text search from a hypertext-style user interface for text corpora that may be garbled by OCR or transmission errors, and that may contain languages other than English. TELLTALE uses several techniques based on n-grams (n-character sequences of text). With these results we show that the dynamic linkage mechanisms in TELLTALE are tolerant of garbles in up to 30% of the characters in the body of the texts.
Source: Journal of the American Society for Information Science. 47(1996) no.4, S.263-275

Pollitt, A.S.; Ellis, G.: Multilingual access to document databases (1993) 0.04

0.03545514 = product of:
  0.10636542 = sum of:
    0.01582825 = weight(_text_:information in 1302) [ClassicSimilarity], result of:
      0.01582825 = score(doc=1302,freq=8.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.23274569 = fieldWeight in 1302, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1302)
    0.040700447 = weight(_text_:retrieval in 1302) [ClassicSimilarity], result of:
      0.040700447 = score(doc=1302,freq=6.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.34732026 = fieldWeight in 1302, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1302)
    0.049836725 = weight(_text_:techniques in 1302) [ClassicSimilarity], result of:
      0.049836725 = score(doc=1302,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.2920283 = fieldWeight in 1302, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=1302)
  0.33333334 = coord(3/9)

Abstract: This paper examines the reasons why approaches to facilitate document retrieval which apply AI (Artificial Intelligence) or Expert Systems techniques, relying on so-called "natural language" query statements from the end-user, will result in sub-optimal solutions. It does so by reflecting on the nature of language and the fundamental problems in document retrieval. Support is given to the work of thesaurus builders and indexers with illustrations of how their work may be utilised in a generally applicable computer-based document retrieval system using Multilingual MenUSE software. The EuroMenUSE interface providing multilingual document access to EPOQUE, the European Parliament's Online Query System, is described.
Imprint: Antigonish, NS : Canadian Association for Information Science
Series: Annual Conference / Canadian Association for Information Science ; 21
Source: Information as a Global Commodity - Communication, Processing and Use (CAIS/ACSI '93) : 21st Annual Conference Canadian Association for Information Science, Antigonish, Nova Scotia, Canada. July 1993

Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.04

0.03545514 = product of:
  0.10636542 = sum of:
    0.01582825 = weight(_text_:information in 2030) [ClassicSimilarity], result of:
      0.01582825 = score(doc=2030,freq=8.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.23274569 = fieldWeight in 2030, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2030)
    0.040700447 = weight(_text_:retrieval in 2030) [ClassicSimilarity], result of:
      0.040700447 = score(doc=2030,freq=6.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.34732026 = fieldWeight in 2030, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=2030)
    0.049836725 = weight(_text_:techniques in 2030) [ClassicSimilarity], result of:
      0.049836725 = score(doc=2030,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.2920283 = fieldWeight in 2030, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=2030)
  0.33333334 = coord(3/9)

Abstract: Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.
Source: Information processing and management. 44(2008) no.1, S.181-211

Peters, C.; Braschler, M.; Clough, P.: Multilingual information retrieval : from research to practice (2012) 0.04
```
0.035205204 = product of:
  0.10561561 = sum of:
    0.020434182 = weight(_text_:information in 361) [ClassicSimilarity], result of:
      0.020434182 = score(doc=361,freq=30.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.3004734 = fieldWeight in 361, product of:
          5.477226 = tf(freq=30.0), with freq of:
            30.0 = termFreq=30.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=361)
    0.051956944 = weight(_text_:retrieval in 361) [ClassicSimilarity], result of:
      0.051956944 = score(doc=361,freq=22.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.44337842 = fieldWeight in 361, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=361)
    0.033224486 = weight(_text_:techniques in 361) [ClassicSimilarity], result of:
      0.033224486 = score(doc=361,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.19468555 = fieldWeight in 361, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.03125 = fieldNorm(doc=361)
  0.33333334 = coord(3/9)
```
Abstract

We are living in a multilingual world and the diversity in languages which are used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also necessitate the adaptation of Information Retrieval (IR) methods to new, multilingual settings. Peters, Braschler and Clough present a comprehensive description of the technologies involved in designing and developing systems for Multilingual Information Retrieval (MLIR). They provide readers with broad coverage of the various issues involved in creating systems to make accessible digitally stored materials regardless of the language(s) they are written in. Details on Cross-Language Information Retrieval (CLIR) are also covered that help readers to understand how to develop retrieval systems that cross language boundaries. Their work is divided into six chapters and accompanies the reader step-by-step through the various stages involved in building, using and evaluating MLIR systems. The book concludes with some examples of recent applications that utilise MLIR technologies. Some of the techniques described have recently started to appear in commercial search systems, while others have the potential to be part of future incarnations. The book is intended for graduate students, scholars, and practitioners with a basic understanding of classical text retrieval methods. It offers guidelines and information on all aspects that need to be taken into consideration when building MLIR systems, while avoiding too many 'hands-on details' that could rapidly become obsolete. Thus it bridges the gap between the material covered by most of the classical IR textbooks and the novel requirements related to the acquisition and dissemination of information in whatever language it is stored.

Content

Contents: 1 Introduction; 2 Within-Language Information Retrieval; 3 Cross-Language Information Retrieval; 4 Interaction and User Interfaces; 5 Evaluation for Multilingual Information Retrieval Systems; 6 Applications of Multilingual Information Access

RSWK

Information-Retrieval-System / Mehrsprachigkeit / Abfrage / Zugriff

Subject

Information-Retrieval-System / Mehrsprachigkeit / Abfrage / Zugriff

Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.03

0.034915894 = product of:
  0.104747675 = sum of:
    0.007914125 = weight(_text_:information in 4224) [ClassicSimilarity], result of:
      0.007914125 = score(doc=4224,freq=2.0), product of:
        0.06800663 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.038739666 = queryNorm
        0.116372846 = fieldWeight in 4224, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4224)
    0.046996824 = weight(_text_:retrieval in 4224) [ClassicSimilarity], result of:
      0.046996824 = score(doc=4224,freq=8.0), product of:
        0.1171842 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.038739666 = queryNorm
        0.40105087 = fieldWeight in 4224, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=4224)
    0.049836725 = weight(_text_:techniques in 4224) [ClassicSimilarity], result of:
      0.049836725 = score(doc=4224,freq=2.0), product of:
        0.17065717 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.038739666 = queryNorm
        0.2920283 = fieldWeight in 4224, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.046875 = fieldNorm(doc=4224)
  0.33333334 = coord(3/9)
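
The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any library. All input numbers are copied from the listing for doc 4224.

```python
import math

MAX_DOCS = 44218          # maxDocs, as shown in the listing
QUERY_NORM = 0.038739666  # queryNorm, as shown in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming the ClassicSimilarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def explain_score(terms, matched, total):
    """Sum of the term scores, scaled by coord(matched/total)."""
    return sum(term_score(*t) for t in terms) * (matched / total)


# (freq, docFreq, fieldNorm) for "information", "retrieval", "techniques"
doc_4224_terms = [
    (2.0, 20772, 0.046875),
    (8.0, 5836, 0.046875),
    (2.0, 1467, 0.046875),
]

score_4224 = explain_score(doc_4224_terms, matched=3, total=9)
print(score_4224)
```

The result should land very close to the listed 0.034915894; small differences in the last digits are expected because the listing appears to use single-precision arithmetic while Python uses double precision.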

Abstract: Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed about equally in all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
Source: Information processing and management. 45(2009) no.6, pp. 703-713
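
The abstract describes s-gramming only as an extension of n-gramming. As a rough illustration, the sketch below assumes that s-grams are character pairs formed by skipping a fixed number of characters (a skip of 0 gives ordinary bigrams) and compares words with a Jaccard overlap. The function names, the choice of skip values, and the similarity measure are assumptions made for this example; the study's exact definitions are not given here.

```python
def s_grams(word, skip):
    """Character pairs whose members lie `skip` characters apart.

    skip=0 yields ordinary adjacent bigrams (assumed definition).
    """
    step = skip + 1
    return {word[i] + word[i + step] for i in range(len(word) - step)}


def s_gram_similarity(a, b, skips=(0, 1)):
    """Jaccard overlap of the s-gram sets of two words.

    Grams are tagged with their skip value so that pairs from
    different skip classes are never confused with each other.
    """
    grams_a = {(k, g) for k in skips for g in s_grams(a, k)}
    grams_b = {(k, g) for k in skips for g in s_grams(b, k)}
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# A lemma and an inflected form of it (Finnish "talo", "talossa")
print(s_grams("talo", 0))
print(s_grams("talo", 1))
print(s_gram_similarity("talo", "talossa"))
```

Under these assumptions, a lemma produced by dictionary translation can be matched approximately against inflected forms in a non-normalized index, because the two words share most of their s-grams even though they are not identical strings.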
