Search (90 results, page 5 of 5)

  • Active filter: theme_ss:"Multilinguale Probleme"
  1. Schlenkrich, C.: Aspekte neuer Regelwerksarbeit : Multimediales Datenmodell für ARD und ZDF (2003) 0.01
    
    Date
    22. 4.2003 12:05:56
  2. Ballesteros, L.A.: Cross-language retrieval via transitive relation (2000) 0.01
    
    Abstract
    The growth in the availability of multilingual data in all areas of the public and private sector is driving an increasing need for systems that facilitate access to multilingual resources. Cross-Language Retrieval (CLR) technology is a means of addressing this need. A CLR system must clear two main hurdles to be effective. First, it must resolve the ambiguity that arises when mapping the meaning of text across languages, that is, both within-language and cross-language ambiguity. Second, it has to incorporate multilingual resources that enable it to perform the mapping across languages. The difficulty here is that lexical resources are limited in number, and for some pairs of languages virtually none exist. This work focuses on a dictionary approach to the problem of limited lexical resources, since bilingual dictionaries are more prevalent and simpler to apply than other resources. We show that a transitive translation approach, in which a third language is employed as an interlingua between the source and target languages, is a viable means of performing CLR between languages for which no bilingual dictionary is available.
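    A minimal sketch of the transitive approach may make the idea concrete. Everything below (the dictionaries and the words in them) is an invented toy example, not the paper's data; the point is only that translations are chained through a pivot language, and that the candidate set can grow at each hop, which is exactly why the ambiguity problem above matters:

      # Transitive dictionary-based translation: source -> pivot -> target.
      # Toy dictionaries; a real system would use full bilingual lexicons.
      from itertools import chain

      ES_EN = {"cohete": ["rocket", "missile"]}                 # Spanish -> English (pivot)
      EN_DE = {"rocket": ["Rakete"], "missile": ["Rakete", "Geschoss"]}

      def transitive_translate(term, src_to_pivot, pivot_to_tgt):
          """Return all target-language candidates reachable via the pivot."""
          pivots = src_to_pivot.get(term, [])
          return sorted(set(chain.from_iterable(pivot_to_tgt.get(p, []) for p in pivots)))

      print(transitive_translate("cohete", ES_EN, EN_DE))       # ['Geschoss', 'Rakete']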
  3. Li, K.W.; Yang, C.C.: Conceptual analysis of parallel corpus collected from the Web (2006) 0.01
    
    Footnote
    Contribution to a special topic section on multilingual information systems
  4. Talvensaari, T.; Juhola, M.; Laurikkala, J.; Järvelin, K.: Corpus-based cross-language information retrieval in retrieval of highly relevant documents (2007) 0.01
    
    Abstract
    Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus-based cross-language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents.
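    Generalized recall and precision extend the usual binary measures to graded relevance by summing relevance gains rather than counting documents. A small sketch, with invented gain values (the gain scale and rankings below are illustrative assumptions, not the study's data):

      # Generalized precision/recall over graded relevance gains in [0, 1].
      def generalized_precision(retrieved_gains):
          """Average relevance gain of the retrieved documents."""
          return sum(retrieved_gains) / len(retrieved_gains) if retrieved_gains else 0.0

      def generalized_recall(retrieved_gains, all_relevant_gains):
          """Share of the collection's total gain captured by the retrieval."""
          total = sum(all_relevant_gains)
          return sum(retrieved_gains) / total if total else 0.0

      retrieved = [1.0, 0.5, 0.0, 0.5]          # gains of the retrieved documents
      collection = [1.0, 1.0, 0.5, 0.5, 0.5]    # gains of all relevant documents
      print(generalized_precision(retrieved))             # 0.5
      print(generalized_recall(retrieved, collection))    # ~0.571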
  5. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.01
    
    Abstract
    Out-of-vocabulary words, mostly proper nouns and technical terms, are one main source of performance degradation in Cross-Language Information Retrieval (CLIR) systems. These are words not found in the dictionary. Bilingual dictionaries in general do not cover most proper nouns, which are usually primary keys in the query. Since proper nouns are spelling variants of each other in most languages, the common approach is to use an approximate string-matching technique against the target database index to find the target-language correspondents of the original query key. The n-gram technique has proved the most effective of these string-matching techniques. The issue arises when the languages involved have different alphabets; transliteration is then applied, based on phonetic similarities between the languages. In this study, transliteration and the n-gram technique are combined to generate possible transliterations in an English-Arabic CLIR system. We refer to this technique as Transliteration N-Gram (TNG). We further enhance TNG by applying part-of-speech disambiguation to the set of transliterations, so that words with a similar spelling but a different meaning are excluded. Experimental results show that TNG gives promising results, and enhanced TNG further improves performance.
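    The n-gram matching step can be sketched as follows, here using the Dice coefficient over character bigrams; the sample index terms, the padding scheme, and the choice of Dice (rather than the paper's exact similarity function) are assumptions for illustration:

      # Rank index terms by character-bigram similarity to a transliteration.
      def ngrams(word, n=2):
          word = f"_{word}_"                    # pad so word edges count
          return {word[i:i + n] for i in range(len(word) - n + 1)}

      def dice(a, b, n=2):
          ga, gb = ngrams(a, n), ngrams(b, n)
          return 2 * len(ga & gb) / (len(ga) + len(gb))

      index = ["mohammed", "muhamad", "market"]           # toy index terms
      query = "muhammad"                                  # generated transliteration
      print(sorted(index, key=lambda t: dice(query, t), reverse=True))
      # -> ['muhamad', 'mohammed', 'market']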
  6. Luca, E.W. de: Extending the linked data cloud with multilingual lexical linked data (2013) 0.01
    
    Abstract
    A lot of information that is already available on the Web, or retrieved from local information systems and social networks, is structured in data silos that are not semantically related. Semantic technologies make it apparent that the use of typed links that directly express their relations is an advantage for every application that can reuse the incorporated knowledge about the data. For this reason, data integration, through reengineering (e.g., Triplify) or querying (e.g., D2R), is an important task in order to make information available for everyone. Thus, in order to build a semantic map of the data, we need knowledge about the data items themselves and the relations between heterogeneous data items. Here we present our work on providing Lexical Linked Data (LLD) through a meta-model that contains all the resources and makes it possible to retrieve and navigate them from different perspectives. After giving the definition of Lexical Linked Data, we describe the existing datasets we collected and the new datasets we included. We describe their format, show some use cases where we link lexical data, and show how to reuse and draw inferences from semantic data derived from lexical data. Different lexical resources (MultiWordNet, EuroWordNet, the MEMODATA Lexicon, the Hamburg Metaphor Database) are connected to each other towards an Integrated Vocabulary for LLD, which we evaluate and present.
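    What a typed link between two lexical resources looks like in practice can be sketched with rdflib; the namespaces and sense identifiers below are invented placeholders, not the project's actual URIs:

      # Typed links between lexical datasets in the linked-data style.
      from rdflib import Graph, Literal, Namespace
      from rdflib.namespace import OWL, SKOS

      g = Graph()
      mwn = Namespace("http://example.org/multiwordnet/")   # hypothetical namespaces
      ewn = Namespace("http://example.org/eurowordnet/")

      # State that two senses from different resources denote the same concept,
      # and attach a language-tagged label to one of them.
      g.add((mwn["sense/rocket-n-1"], OWL.sameAs, ewn["sense/razzo-n-1"]))
      g.add((ewn["sense/razzo-n-1"], SKOS.prefLabel, Literal("razzo", lang="it")))

      print(g.serialize(format="turtle"))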
  7. Ménard, E.: Ordinary image retrieval in a multilingual context : a comparison of two indexing vocabularies (2010) 0.01
    
    Abstract
    Purpose - This paper seeks to examine image retrieval within two different contexts: a monolingual context, where the language of the query is the same as the indexing language, and a multilingual context, where the language of the query is different from the indexing language. The study also aims to compare two different approaches to the indexing of ordinary images representing common objects: traditional image indexing with a controlled vocabulary, and free image indexing with uncontrolled vocabulary. Design/methodology/approach - This research uses three data collection methods. An analysis of the indexing terms was employed in order to examine the multiplicity of term types assigned to images. A simulation of the retrieval process involving a set of 30 images was performed with 60 participants. The quantification of the retrieval performance of each indexing approach was based on usability measures, that is, the effectiveness, efficiency, and satisfaction of the user. Finally, a questionnaire was used to gather information on searcher satisfaction during and after the retrieval process. Findings - The results of this research are twofold. The analysis of the indexing terms associated with all 3,950 images provides a comprehensive description of the characteristics of the four non-combined indexing forms used for the study. The retrieval simulation results also offer information about the relative performance of the six indexing forms (combined and non-combined) in terms of their effectiveness, efficiency (temporal and human), and the image searcher's satisfaction. Originality/value - The findings of the study suggest that, in the near future, information systems could benefit from allowing an increased coexistence of controlled vocabularies and uncontrolled vocabularies resulting from collaborative image tagging, for example, giving users the possibility to participate dynamically in the image-indexing process, in a more user-centred way.
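    How such usability measures might be quantified can be sketched as follows; the measure definitions and the trial data are assumptions chosen for illustration, since the abstract does not spell out the exact formulas:

      # Effectiveness as task success rate; temporal efficiency as successes
      # per minute of search time. Trial data below are invented.
      trials = [  # (found_target_image, seconds_spent) per search task
          (True, 40), (True, 65), (False, 90), (True, 30),
      ]
      successes = sum(ok for ok, _ in trials)
      effectiveness = successes / len(trials)
      minutes = sum(t for _, t in trials) / 60
      efficiency = successes / minutes
      print(f"effectiveness={effectiveness:.2f}, efficiency={efficiency:.2f}/min")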
  8. Cross-language information retrieval (1998) 0.00
    
    Footnote
    The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket', which is semantically ranked above 'military rocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to bypass the time-consuming and expensive process of assembling a standard collection of documents and user queries against which the performance of a CLIR system is manually assessed, Páraic Sheridan et al. (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g., for topic, country, and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Oard and Bonnie Dorr (University of Maryland) describe an evaluation methodology, which appears to apply LSI techniques in order to filter and rank incoming documents, designed for testing CLIR systems. The volume provides the reader with an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field.
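    The weighted Boolean idea can be sketched as follows; the query, weights, and dictionary are toy values, and the scoring rule (sum the weight of every query term that matches via some translation) is a simplifying assumption, not Hull's exact model:

      # Weighted Boolean CLIR: user-weighted query terms are dictionary-
      # translated, and documents score by the weights of matched terms.
      QUERY = {"big": 0.4, "rockets": 0.9}                 # user-assigned weights
      EN_ES = {"big": ["grande"], "rockets": ["cohete", "misil"]}

      def score(doc_tokens, query, dictionary):
          s = 0.0
          for term, weight in query.items():
              if any(t in doc_tokens for t in dictionary.get(term, [])):
                  s += weight                              # matched via a translation
          return s

      doc = {"el", "cohete", "gigante"}                    # toy Spanish document
      print(score(doc, QUERY, EN_ES))                      # 0.9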
  9. Borgman, C.L.: Multi-media, multi-cultural, and multi-lingual digital libraries : or how do we exchange data In 400 languages? (1997) 0.00
    
    Abstract
    The Internet would not be very useful if communication were limited to textual exchanges between speakers of English located in the United States. Rather, its value lies in its ability to enable people from multiple nations, speaking multiple languages, to employ multiple media in interacting with each other. While computer networks broke through national boundaries long ago, they remain much more effective for textual communication than for exchanges of sound, images, or mixed media, and more effective for communication in English than for exchanges in most other languages, much less interactions involving multiple languages. Supporting searching and display in multiple languages is an increasingly important issue for all digital libraries accessible on the Internet. Even if a digital library contains materials in only one language, the content needs to be searchable and displayable on computers in countries speaking other languages. We need to exchange data between digital libraries, whether in a single language or in multiple languages. Data exchanges may be large batch updates or interactive hyperlinks. In any of these cases, character sets must be represented in a consistent manner if exchanges are to succeed. Issues of interoperability, portability, and data exchange related to multilingual character sets have received surprisingly little attention in the digital library community or in discussions of standards for information infrastructure, except in Europe. The landmark collection of papers on Standards Policy for Information Infrastructure, for example, contains no discussion of multilingual issues except for a passing reference to the Unicode standard. The goal of this short essay is to draw attention to the multilingual issues involved in designing digital libraries accessible on the Internet. Many of the multilingual design issues parallel those of multimedia digital libraries, a topic more familiar to most readers of D-Lib Magazine. This essay draws examples from multimedia DLs to illustrate some of the urgent design challenges in creating a globally distributed network serving people who speak many languages other than English. First we introduce some general issues of medium, culture, and language; then we discuss the design challenges in the transition from local to global systems; lastly we address technical matters. The technical issues involve the choice of character sets to represent languages, similar to the choices made in representing images or sound. However, the scale of the language problem is far greater. Standards for multimedia representation are being adopted fairly rapidly, in parallel with the availability of multimedia content in electronic form. By contrast, we have hundreds (and sometimes thousands) of years' worth of textual materials in hundreds of languages, created long before data encoding standards existed. Textual content from past and present is being encoded in language- and application-specific representations that are difficult to exchange without losing data, if they can be exchanged at all. We illustrate the multi-language DL challenge with examples drawn from the research library community, which typically handles collections of materials in 400 or so languages. These are problems faced not only by developers of digital libraries, but by those who develop and manage any communication technology that crosses national or linguistic boundaries.
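    The character-set point is easy to demonstrate: the same bytes decode to different text under different encodings, so any exchange must agree on (or explicitly declare) the encoding. A minimal illustration:

      # Same bytes, different encodings: why exchanges must agree on a charset.
      text = "Bibliothèque multilingue"
      raw = text.encode("utf-8")            # bytes as sent by one system

      print(raw.decode("utf-8"))            # round-trips: Bibliothèque multilingue
      print(raw.decode("latin-1"))          # mangled:     BibliothÃ¨que multilingue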
  10. Markó, K.G.: Foundation, implementation and evaluation of the MorphoSaurus system (2008) 0.00
    
    Abstract
    The proper handling of acronyms plays a crucial role in medical texts, e.g. in patient records, as well as in the scientific literature. Chapter six presents an approach in which acronyms are automatically acquired from (bio-)medical literature. Furthermore, acronyms and their definitions in different languages are linked to each other using the MorphoSaurus text processing system. Automatic word sense disambiguation is still one of the most challenging tasks in Natural Language Processing. In Chapter seven, cross-lingual considerations lead to a new methodology for automatic disambiguation applied to subwords. Beginning with Chapter eight, a series of applications based on MorphoSaurus is introduced. First, the implementation of the subword approach within a cross-language information retrieval setting for the medical domain is described and evaluated on standard test document collections. In Chapter nine, this methodology is extended to multilingual information retrieval on the Web, for which user queries are translated into target languages based on their segmentation into subwords and the subwords' interlingual mappings. The cross-lingual, automatic assignment of document descriptors to documents is the topic of Chapter ten. A large-scale evaluation of both a heuristic and a statistical algorithm is carried out using a prominent medical thesaurus as a controlled vocabulary. Chapter eleven shows how MorphoSaurus can be used to map monolingual lexical resources across different languages. As a result, a large multilingual medical lexicon with high coverage and complete lexical information is built and evaluated against a comparable, already available and commonly used lexical repository for the medical domain. Chapter twelve sketches a few further applications based on MorphoSaurus: the generality and applicability of the subword approach to other domains is outlined, and proofs of concept in real-world scenarios are presented. Finally, Chapter thirteen recapitulates the most important aspects of MorphoSaurus, and the potential benefit of its employment in medical information systems is carefully assessed, both for medical experts in their everyday work and with regard to health care consumers and their existential information needs.
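    The subword idea behind MorphoSaurus can be sketched as follows; the lexicon entries, the identifier syntax, and the greedy longest-match segmentation are invented stand-ins for the system's actual resources and algorithm. Two words in different languages match when their interlingual identifier sequences agree:

      # Map words to language-independent subword identifiers (toy lexicon).
      SUBWORDS = {
          "gastr": "#stomach", "itis": "#inflammation",          # English/Greek-derived
          "magen": "#stomach", "entzündung": "#inflammation",    # German
      }

      def to_mids(word):
          """Greedy longest-match segmentation into interlingual identifiers."""
          mids, rest = [], word.lower()
          while rest:
              for sw in sorted(SUBWORDS, key=len, reverse=True):
                  if rest.startswith(sw):
                      mids.append(SUBWORDS[sw])
                      rest = rest[len(sw):]
                      break
              else:
                  rest = rest[1:]                                # skip unknown character
          return mids

      print(to_mids("Gastritis"))        # ['#stomach', '#inflammation']
      print(to_mids("Magenentzündung"))  # ['#stomach', '#inflammation'] -- same concepts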

Languages

  • e 79
  • d 8
  • f 1
  • m 1
  • ro 1

Types

  • a 80
  • el 8
  • m 3
  • s 3
  • x 2
  • r 1