Search (112 results, page 1 of 6)

  • theme_ss:"Multilinguale Probleme"
  1. Niininen, S.; Nykyri, S.; Suominen, O.: ¬The future of metadata : open, linked, and multilingual - the YSO case (2017) 0.03
    0.02514151 = product of:
      0.087995276 = sum of:
        0.019587006 = weight(_text_:based in 3707) [ClassicSimilarity], result of:
          0.019587006 = score(doc=3707,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.16644597 = fieldWeight in 3707, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3707)
        0.068408266 = weight(_text_:great in 3707) [ClassicSimilarity], result of:
          0.068408266 = score(doc=3707,freq=2.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.31105953 = fieldWeight in 3707, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3707)
      0.2857143 = coord(2/7)
    
    Abstract
    Purpose: The purpose of this paper is threefold: to focus on the process of multilingual concept scheme construction and the challenges involved; to address concrete challenges faced in the construction process, especially those related to equivalence between terms and concepts; and to briefly outline the translation strategies developed during the process of concept scheme construction. Design/methodology/approach: The analysis is based on experience acquired during the establishment of the Finnish thesaurus and ontology service Finto as well as the trilingual General Finnish Ontology YSO, both of which are being maintained and further developed at the National Library of Finland. Findings: Although uniform resource identifiers can be considered language-independent, they do not render concept schemes and their construction free of language-related challenges. The fundamental issue with all the challenges faced is how to maintain consistency and predictability when the nature of language requires each concept to be treated individually. The key to such challenges is to recognise the function of the vocabulary and the needs of its intended users. Social implications: Open science increases the transparency of not only research products, but also metadata tools. Gaining a deeper understanding of the challenges involved in their construction is important for a great variety of users, e.g. indexers, vocabulary builders and information seekers. Today, multilingualism is an essential aspect at both the national and international information society level. Originality/value: This paper draws on the practical challenges faced in concept scheme construction in a trilingual environment, with a focus on "concept scheme" as a translation and mapping unit.
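    Note on the score explanation for result 1: the tree can be recomputed by hand. The sketch below is a minimal illustration, assuming the usual ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function and variable names are hypothetical. The numeric inputs are copied from the explanation for doc 3707, and queryNorm and fieldNorm are taken as given rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as assumed for ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, mirroring the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 3707.
MAX_DOCS = 44218
QUERY_NORM = 0.03905679
FIELD_NORM = 0.0390625

based = term_score(2.0, 5906, MAX_DOCS, QUERY_NORM, FIELD_NORM)
great = term_score(2.0, 430, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/7): two of the seven query clauses matched this document.
score = (based + great) * (2 / 7)
print(f"based={based:.6f} great={great:.6f} score={score:.6f}")
```

    The printed total should land close to the 0.02514151 reported for this hit; small differences in the last digits are expected because the search engine works in single-precision floats.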
  2. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    0.0233882 = product of:
      0.081858695 = sum of:
        0.05540042 = weight(_text_:based in 4157) [ClassicSimilarity], result of:
          0.05540042 = score(doc=4157,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.47078028 = fieldWeight in 4157, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
        0.026458278 = product of:
          0.052916557 = sum of:
            0.052916557 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.052916557 = score(doc=4157,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill
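    Note on the score explanation for result 2: here one matching term ("22") sits inside a nested clause that carries its own coord(1/2) factor before the outer coord(2/7) is applied. The sketch below is a minimal illustration under the same assumed ClassicSimilarity formulas. The helper name is hypothetical and all numbers are copied from the explanation for doc 4157.

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight * fieldWeight with tf = sqrt(freq) and
    # idf = 1 + ln(maxDocs / (docFreq + 1)) (assumed formulas).
    term_idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (term_idf * query_norm) * (math.sqrt(freq) * term_idf * field_norm)

# Values copied from the explanation for doc 4157.
MAX_DOCS = 44218
QUERY_NORM = 0.03905679
FIELD_NORM = 0.078125

based = classic_term_score(4.0, 5906, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Nested clause: one of its two sub-clauses matched, hence coord(1/2).
nested = classic_term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)

# Outer coord(2/7): two of the seven top-level clauses matched.
total = (based + nested) * (2 / 7)
print(f"based={based:.6f} nested={nested:.6f} total={total:.6f}")
```

    The total should come out close to the 0.0233882 shown for this hit. The inner coord halves the contribution of the "22" match, which is why a date-like token adds less to the ranking than the field weights alone would suggest.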
  3. Francu, V.: Language-independent structures and multilingual information access (2003) 0.02
    0.020113206 = product of:
      0.070396215 = sum of:
        0.015669605 = weight(_text_:based in 2753) [ClassicSimilarity], result of:
          0.015669605 = score(doc=2753,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.13315678 = fieldWeight in 2753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03125 = fieldNorm(doc=2753)
        0.05472661 = weight(_text_:great in 2753) [ClassicSimilarity], result of:
          0.05472661 = score(doc=2753,freq=2.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.24884763 = fieldWeight in 2753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03125 = fieldNorm(doc=2753)
      0.2857143 = coord(2/7)
    
    Abstract
    The existence of huge amounts of information available in information systems and networks worldwide requires the creation of adequate tools able to organize it efficiently and allow its retrieval across geographical, linguistic and cultural boundaries. An indexing language covering all areas of knowledge and converting the language-independent structure of a classification system like the Universal Decimal Classification into a thesaurus structure in more than one language seems to be a solution. Among the key attributes of the indexing language thus obtained we can mention: consistency in indexing, control of terms, user-friendliness. The paper presents the great potential in information retrieval of the combined retrieval method by means of a case study. 1. Introduction Among the consequences of the rapid development of the global information society a major one is the existence of huge amounts of information stored in information systems and networks across geographical, linguistic and cultural boundaries. This created the need for tools and technologies able to efficiently organize and allow retrieval of information in this universal context. Information professionals had to cope not only with the multitude of knowledge organisation and representation systems but also with the multitude of languages in which the available information is stored, in order to provide the users with effective information retrieval tools. For this purpose a real language industry has been developed, with theoreticians and researchers making considerable efforts to find feasible solutions to problems of multilingual access by way of natural language processing and machine translation methodologies. Such corporate efforts belong to the CoBRA+ working group for multilingual access to subjects (MACS) or to the cross-language information retrieval (CLIR) tracks of the Text Retrieval Conferences that annually report the progress made in multilingual information access and retrieval. The encouraging results they have obtained so far are still confined to discipline/domain restrictions and most of their achievements are based on language pairs rather than multiple language combinations.
  4. Zhou, Y. et al.: Analysing entity context in multilingual Wikipedia to support entity-centric retrieval applications (2016) 0.02
    0.018752083 = product of:
      0.06563229 = sum of:
        0.039174013 = weight(_text_:based in 2758) [ClassicSimilarity], result of:
          0.039174013 = score(doc=2758,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.33289194 = fieldWeight in 2758, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.078125 = fieldNorm(doc=2758)
        0.026458278 = product of:
          0.052916557 = sum of:
            0.052916557 = weight(_text_:22 in 2758) [ClassicSimilarity], result of:
              0.052916557 = score(doc=2758,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.38690117 = fieldWeight in 2758, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2758)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  5. Cao, L.; Leong, M.-K.; Low, H.-B.: Searching heterogeneous multilingual bibliographic sources (1998) 0.02
    0.015001667 = product of:
      0.052505832 = sum of:
        0.03133921 = weight(_text_:based in 3564) [ClassicSimilarity], result of:
          0.03133921 = score(doc=3564,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.26631355 = fieldWeight in 3564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=3564)
        0.021166623 = product of:
          0.042333245 = sum of:
            0.042333245 = weight(_text_:22 in 3564) [ClassicSimilarity], result of:
              0.042333245 = score(doc=3564,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.30952093 = fieldWeight in 3564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3564)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Proposes a Web-based architecture for searching distributed heterogeneous multi-Asian language bibliographic sources, and describes a successful pilot implementation of the system at the Chinese Library (CLib) system developed in Singapore and tested at 2 university libraries and a public library.
    Date
    1. 8.1996 22:08:06
  6. Frâncu, V.; Sabo, C.-N.: Implementation of a UDC-based multilingual thesaurus in a library catalogue : the case of BiblioPhil (2010) 0.01
    0.01403292 = product of:
      0.04911522 = sum of:
        0.03324025 = weight(_text_:based in 3697) [ClassicSimilarity], result of:
          0.03324025 = score(doc=3697,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28246817 = fieldWeight in 3697, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=3697)
        0.015874967 = product of:
          0.031749934 = sum of:
            0.031749934 = weight(_text_:22 in 3697) [ClassicSimilarity], result of:
              0.031749934 = score(doc=3697,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.23214069 = fieldWeight in 3697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3697)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    In order to enhance the use of Universal Decimal Classification (UDC) numbers in information retrieval, the authors have represented classification with multilingual thesaurus descriptors and implemented this solution in an automated way. The authors illustrate a solution implemented in a BiblioPhil library system. The standard formats used are UNIMARC for subject authority records (i.e. the UDC-based multilingual thesaurus) and MARC XML support for data transfer. The multilingual thesaurus was built according to existing standards, the constituent parts of the classification notations being used as the basis for search terms in the multilingual information retrieval. The verbal equivalents, descriptors and non-descriptors, are used to expand the number of concepts and are given in Romanian, English and French. This approach saves the time of the indexer and provides more user-friendly and easier access to the bibliographic information. The multilingual aspect of the thesaurus enhances information access for a greater number of online users
    Date
    22. 7.2010 20:40:56
  7. De Luca, E.W.; Dahlberg, I.: Including knowledge domains from the ICC into the multilingual lexical linked data cloud (2014) 0.01
    0.013259726 = product of:
      0.046409037 = sum of:
        0.02770021 = weight(_text_:based in 1493) [ClassicSimilarity], result of:
          0.02770021 = score(doc=1493,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.23539014 = fieldWeight in 1493, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1493)
        0.018708827 = product of:
          0.037417654 = sum of:
            0.037417654 = weight(_text_:22 in 1493) [ClassicSimilarity], result of:
              0.037417654 = score(doc=1493,freq=4.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.27358043 = fieldWeight in 1493, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1493)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Much of the information that is already available on the Web, or retrieved from local information systems and social networks, is structured in data silos that are not semantically related. Semantic technologies show that the use of typed links that directly express their relations is an advantage for every application that can reuse the incorporated knowledge about the data. For this reason, data integration, through reengineering (e.g. triplify) or querying (e.g. D2R), is an important task in order to make information available for everyone. Thus, in order to build a semantic map of the data, we need knowledge about the data items themselves and the relations between heterogeneous data items. In this paper, we present our work on providing Lexical Linked Data (LLD) through a meta-model that contains all the resources and makes it possible to retrieve and navigate them from different perspectives. We combine the existing work done on knowledge domains (based on the Information Coding Classification) within the Multilingual Lexical Linked Data Cloud (based on the RDF/OWL EuroWordNet and the related integrated lexical resources: MultiWordNet, EuroWordNet, MEMODATA Lexicon, Hamburg Metaphor DB).
    Date
    22. 9.2014 19:01:18
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  8. Chen, H.-H.; Lin, W.-C.; Yang, C.; Lin, W.-H.: Translating-transliterating named entities for multilingual information access (2006) 0.01
    0.013126459 = product of:
      0.045942605 = sum of:
        0.02742181 = weight(_text_:based in 1080) [ClassicSimilarity], result of:
          0.02742181 = score(doc=1080,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.23302436 = fieldWeight in 1080, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1080)
        0.018520795 = product of:
          0.03704159 = sum of:
            0.03704159 = weight(_text_:22 in 1080) [ClassicSimilarity], result of:
              0.03704159 = score(doc=1080,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.2708308 = fieldWeight in 1080, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1080)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with formulation, transformation, translation, and transliteration of multilingual-named entities. The rules and similarity matrices for translation and transliteration are learned automatically from parallel-named-entity corpora. The results are applied in cross-language access to collections of images with captions. Experimental results demonstrate that the similarity-based transliteration of named entities is effective, and runs in which transliteration is considered outperform the runs in which it is neglected.
    Date
    4. 6.2006 19:52:22
  9. Dilevko, J.; Dali, K.: ¬The challenge of building multilingual collections in Canadian public libraries (2002) 0.01
    0.013126459 = product of:
      0.045942605 = sum of:
        0.02742181 = weight(_text_:based in 139) [ClassicSimilarity], result of:
          0.02742181 = score(doc=139,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.23302436 = fieldWeight in 139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=139)
        0.018520795 = product of:
          0.03704159 = sum of:
            0.03704159 = weight(_text_:22 in 139) [ClassicSimilarity], result of:
              0.03704159 = score(doc=139,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.2708308 = fieldWeight in 139, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=139)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    A Web-based survey was conducted to determine the extent to which Canadian public libraries are collecting multilingual materials (foreign languages other than English and French), the methods that they use to select these materials, and whether public librarians are sufficiently prepared to provide their multilingual clientele with an adequate range of materials and services. There is room for improvement with regard to collection development of multilingual materials in Canadian public libraries, as well as in educating staff about keeping multilingual collections current, diverse, and of sufficient interest to potential users to keep such materials circulating. The main constraints preventing public libraries from developing better multilingual collections are addressed, and recommendations for improving the state of multilingual holdings are provided.
    Date
    10. 9.2000 17:38:22
  10. Holley, R.P.: ¬The Répertoire de Vedettes-matière de l'Université Laval Library, 1946-92 : Francophone subject access in North America and Europe (2002) 0.01
    0.0116941 = product of:
      0.040929347 = sum of:
        0.02770021 = weight(_text_:based in 159) [ClassicSimilarity], result of:
          0.02770021 = score(doc=159,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.23539014 = fieldWeight in 159, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=159)
        0.013229139 = product of:
          0.026458278 = sum of:
            0.026458278 = weight(_text_:22 in 159) [ClassicSimilarity], result of:
              0.026458278 = score(doc=159,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.19345059 = fieldWeight in 159, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=159)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    In 1946, the Université Laval in Quebec City, Quebec, Canada, started using Library of Congress Subject Headings (LCSH) in French by creating an authority list, Répertoire de Vedettes-matière (RVM), whose first published edition appeared in 1962. In the 1970s, the most important libraries in Canada with an interest in French-language cataloging - the Université de Montréal, the Bibliothèque Nationale du Canada, and the Bibliothèque Nationale du Quebec - forged partnerships with the Université Laval to support RVM. In 1974, the Bibliothèque Publique d'Information, Centre Pompidou, Paris, France became the first library in Europe to adopt RVM. During the 1980s, the Bibliothèque Nationale de France (BNF) created an authority list, RAMEAU, based upon RVM, which is used by numerous French libraries of all types. The major libraries in Luxembourg adopted RVM in 1985. Individual libraries in Belgium also use RVM, often in combination with LCSH. The spread of RVM in the francophone world reflects the increasing importance of the pragmatic North American tradition of shared cataloging and library cooperation. RVM and its European versions are based upon literary warrant and make changes to LCSH to reflect the specific cultural and linguistic needs of their user communities. While the users of RVM seek to harmonize the various versions, differences in terminology and probably syntax are inevitable.
    Date
    10. 9.2000 17:38:22
  11. Larkey, L.S.; Connell, M.E.: Structured queries, language modelling, and relevance modelling in cross-language information retrieval (2005) 0.01
    0.0116941 = product of:
      0.040929347 = sum of:
        0.02770021 = weight(_text_:based in 1022) [ClassicSimilarity], result of:
          0.02770021 = score(doc=1022,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.23539014 = fieldWeight in 1022, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1022)
        0.013229139 = product of:
          0.026458278 = sum of:
            0.026458278 = weight(_text_:22 in 1022) [ClassicSimilarity], result of:
              0.026458278 = score(doc=1022,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.19345059 = fieldWeight in 1022, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1022)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries--one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
    Date
    26.12.2007 20:22:11
  12. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.01125125 = product of:
      0.039379373 = sum of:
        0.023504408 = weight(_text_:based in 4436) [ClassicSimilarity], result of:
          0.023504408 = score(doc=4436,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.19973516 = fieldWeight in 4436, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
        0.015874967 = product of:
          0.031749934 = sum of:
            0.031749934 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.031749934 = score(doc=4436,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
  13. Seo, H.-C.; Kim, S.-B.; Rim, H.-C.; Myaeng, S.-H.: Improving query translation in English-Korean Cross-language information retrieval (2005) 0.01
    0.01125125 = product of:
      0.039379373 = sum of:
        0.023504408 = weight(_text_:based in 1023) [ClassicSimilarity], result of:
          0.023504408 = score(doc=1023,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.19973516 = fieldWeight in 1023, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=1023)
        0.015874967 = product of:
          0.031749934 = sum of:
            0.031749934 = weight(_text_:22 in 1023) [ClassicSimilarity], result of:
              0.031749934 = score(doc=1023,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.23214069 = fieldWeight in 1023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1023)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with the candidate translations of other query terms. This paper proposes a new method where we examine all combinations of target query term translations corresponding to the source query terms, instead of looking at the candidates for each query term and selecting the best one at a time. The goodness value for a combination of target query terms is computed based on the association value between each pair of the terms in the combination. We tested our method using the NTCIR-3 English-Korean CLIR test collection. The results show some improvements regardless of the association measures we used.
    Date
    26.12.2007 20:22:38
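    The combination-based disambiguation described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `best_combination`, the toy candidate terms, the association values, and the choice to sum the pairwise associations into a single goodness value are all assumptions made for the example.

```python
from itertools import combinations, product

def best_combination(candidates, assoc):
    """Pick one translation per source term by scoring whole combinations.

    candidates: one list of candidate target terms per source query term.
    assoc: hypothetical pairwise association values keyed by frozenset pairs.
    """
    def goodness(combo):
        # Assumption: goodness is the sum of pairwise association values.
        return sum(assoc.get(frozenset(pair), 0.0)
                   for pair in combinations(combo, 2))

    # Enumerate every combination instead of fixing one term at a time.
    return max(product(*candidates), key=goodness)

# Toy data (invented for illustration only).
candidates = [["bank_fin", "bank_river"], ["interest_rate", "interest_hobby"]]
assoc = {
    frozenset(("bank_fin", "interest_rate")): 0.9,
    frozenset(("bank_river", "interest_hobby")): 0.1,
}
best = best_combination(candidates, assoc)
```

    Exhaustive enumeration grows multiplicatively with the number of candidates per term, so a sketch like this is only practical for short queries.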
  14. EuropeanaTech and Multilinguality : Issue 1 of EuropeanaTech Insight (2015) 0.01
    0.0110564465 = product of:
      0.07739512 = sum of:
        0.07739512 = weight(_text_:great in 1832) [ClassicSimilarity], result of:
          0.07739512 = score(doc=1832,freq=4.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.3519237 = fieldWeight in 1832, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03125 = fieldNorm(doc=1832)
      0.14285715 = coord(1/7)
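
    The score breakdown above can be re-derived by hand. The following is a small sketch that recomputes the figure shown for doc 1832 from the listed inputs, assuming the standard ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function name `classic_score` is made up for this example, and queryNorm, fieldNorm and coord are simply copied from the explanation rather than derived.

```python
import math

def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term ClassicSimilarity score from its explain values."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 5.6307793 for docFreq=430
    tf = math.sqrt(freq)                             # 2.0 for freq=4.0
    query_weight = idf * query_norm                  # queryWeight in the listing
    field_weight = tf * idf * field_norm             # fieldWeight in the listing
    return query_weight * field_weight * coord       # weight times coord(1/7)

# Inputs copied from the explanation for the term "great" in doc 1832.
score = classic_score(freq=4.0, doc_freq=430, max_docs=44218,
                      query_norm=0.03905679, field_norm=0.03125, coord=1 / 7)
```

    Small differences in the last digits are to be expected, since the listing reflects single-precision arithmetic while Python uses double precision.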
    
    Abstract
    Welcome to the very first issue of EuropeanaTech Insight, a multimedia publication about research and development within the EuropeanaTech community. EuropeanaTech is a very active community. It spans all of Europe and is made up of technical experts from the various disciplines within digital cultural heritage. At any given moment, members can be found presenting their work in project meetings, seminars and conferences around the world. Now, through EuropeanaTech Insight, we can share that inspiring work with the whole community. In our first three issues, we're showcasing topics discussed at the EuropeanaTech 2015 Conference, an exciting event that gave rise to lots of innovative ideas and fruitful conversations on the themes of data quality, data modelling, open data, data re-use, multilingualism and discovery. Welcome, bienvenue, bienvenido, Välkommen, Tervetuloa to the first Issue of EuropeanaTech Insight. Are we talking your language? No? Well I can guarantee you Europeana is. One of the European Union's great beauties and strengths is its diversity. That diversity is perhaps most evident in the 24 different languages spoken in the EU. Making it possible for all European citizens to easily and seamlessly communicate in their native language with others who do not speak that language is a huge technical undertaking. Translating documents, news, speeches and historical texts was once exclusively done manually. Clearly, that takes a huge amount of time and resources and means that not everything can be translated... However, with the advances in machine and automatic translation, it's becoming more possible to provide instant and pretty accurate translations. Europeana provides access to over 40 million digitised cultural heritage offering content in over 33 languages. But what value does Europeana provide if people can only find results in their native language? None. That's why the EuropeanaTech community is collectively working towards making it more possible for everyone to discover our collections in their native language. In this issue of EuropeanaTech Insight, we hear from community members who are making great strides in machine translation and enrichment tools to help improve not only access to data, but also how we retrieve, browse and understand it.
  15. Lonsdale, D.; Mitamura, T.; Nyberg, E.: Acquisition of large lexicons for practical knowledge-based MT (1994/95) 0.01
    0.008883832 = product of:
      0.062186822 = sum of:
        0.062186822 = weight(_text_:based in 7409) [ClassicSimilarity], result of:
          0.062186822 = score(doc=7409,freq=14.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.5284496 = fieldWeight in 7409, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=7409)
      0.14285715 = coord(1/7)
    
    Abstract
    Although knowledge based MT systems have the potential to achieve high translation accuracy, each successful application system requires a large amount of hand coded lexical knowledge. Systems like KBMT-89 and its descendants have demonstrated how knowledge based translation can produce good results in technical domains with tractable domain semantics. Nevertheless, the magnitude of the development task for large scale applications with tens of thousands of domain concepts precludes a purely hand crafted approach. The current challenge for the next generation of knowledge based MT systems is to utilize online textual resources and corpus analysis software in order to automate the most laborious aspects of the knowledge acquisition process. This partial automation can in turn maximize the productivity of human knowledge engineers and help to make large scale applications of knowledge based MT a viable approach. Discusses the corpus based knowledge acquisition methodology used in KANT, a knowledge based translation system for multilingual document production. This methodology can be generalized beyond the KANT interlingua approach for use with any system that requires similar kinds of knowledge.
  16. Francu, V.: UDC-based thesauri and multilingual access to information (2004) 0.01
    0.007834803 = product of:
      0.05484362 = sum of:
        0.05484362 = weight(_text_:based in 3767) [ClassicSimilarity], result of:
          0.05484362 = score(doc=3767,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.46604872 = fieldWeight in 3767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.109375 = fieldNorm(doc=3767)
      0.14285715 = coord(1/7)
    
  17. Landry, P.: MACS update : moving toward a link management production database (2003) 0.01
    0.007818088 = product of:
      0.05472661 = sum of:
        0.05472661 = weight(_text_:great in 2864) [ClassicSimilarity], result of:
          0.05472661 = score(doc=2864,freq=2.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.24884763 = fieldWeight in 2864, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03125 = fieldNorm(doc=2864)
      0.14285715 = coord(1/7)
    
    Abstract
    Conclusion After a few years of design work and testing, it now appears that the MACS project is almost ready to move to production. The latest LMI release has shown that it can be used in a federated work network and that it is robust enough to manage many thousands of links. Once in the production phase, consideration should be given to extending MACS to other SHLs in other languages. There is still great interest from other CENL members in participating in this project, and the consortium structure will need to be finalised in order to incorporate new partners gradually and successfully into the MACS system. Work will also continue to improve the Search Interface (SI) before it can be successfully integrated in each of the partners' OPACs. In this context, some form of access to the local authority files should be investigated so that users can select the most appropriate heading within each subject hierarchy before sending their search to the different target databases. Testing of Z39.50 access to the partners' library catalogues will also continue to further refine search results. The long range prospect of the MACS initiative will have to be addressed in the foreseeable future. Financial as well as institutional support will need to be reinforced and possibly new types of partnership identified. As the need to improve subject access continues to be an issue for many European national libraries, MACS will hopefully remain a viable tool for ensuring cross-language access. One of the potential targets is the TEL project. Within the scope of that initiative, is it possible and useful to envisage the integration of MACS in TEL as an additional access point? It is worth stating the question in light of the challenge to European national libraries to offer improved access to their collections.
  18. Tsuji, K.; Kageura, K.: Automatic generation of Japanese-English bilingual thesauri based on bilingual corpora (2006) 0.01
    0.0068540247 = product of:
      0.04797817 = sum of:
        0.04797817 = weight(_text_:based in 5061) [ClassicSimilarity], result of:
          0.04797817 = score(doc=5061,freq=12.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.4077077 = fieldWeight in 5061, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5061)
      0.14285715 = coord(1/7)
    
    Abstract
    The authors propose a method for automatically generating Japanese-English bilingual thesauri based on bilingual corpora. The term bilingual thesaurus refers to a set of bilingual equivalent words and their synonyms. Most of the methods proposed so far for extracting bilingual equivalent word clusters from bilingual corpora depend heavily on word frequency and are not effective for dealing with low-frequency clusters. These low-frequency bilingual clusters are worth extracting because they contain many newly coined terms that are in demand but are not listed in existing bilingual thesauri. Assuming that single language-pair-independent methods such as frequency-based ones have reached their limitations and that a language-pair-dependent method used in combination with other methods shows promise, the authors propose the following approach: (a) Extract translation pairs based on transliteration patterns; (b) remove the pairs from among the candidate words; (c) extract translation pairs based on word frequency from the remaining candidate words; and (d) generate bilingual clusters based on the extracted pairs using a graph-theoretic method. The proposed method has been found to be significantly more effective than other methods.
  19. Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.01
    0.0068540247 = product of:
      0.04797817 = sum of:
        0.04797817 = weight(_text_:based in 897) [ClassicSimilarity], result of:
          0.04797817 = score(doc=897,freq=12.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.4077077 = fieldWeight in 897, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=897)
      0.14285715 = coord(1/7)
    
    Abstract
    Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based on only the target document collections. First, we discuss two kinds of disambiguation technique: (1) a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback (PRF). Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English language as a pivot. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparable high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verify our findings. Again, similar results were observed except that the Dice coefficient slightly outperforms the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.
  20. Haruyama, A.; Yamashita, Y.; Kubota, H.: Development of a multilingual indexing vocabulary based on a faceted thesauri (1996) 0.01
    0.0067155454 = product of:
      0.047008816 = sum of:
        0.047008816 = weight(_text_:based in 3492) [ClassicSimilarity], result of:
          0.047008816 = score(doc=3492,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.39947033 = fieldWeight in 3492, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.09375 = fieldNorm(doc=3492)
      0.14285715 = coord(1/7)
    

Languages

  • e 102
  • d 6
  • ro 2
  • f 1
  • m 1

Types

  • a 103
  • el 7
  • m 2
  • s 2
  • x 2
  • r 1