Search (113 results, page 1 of 6)

  • theme_ss:"Multilinguale Probleme"
  1. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.07
    0.07087437 = product of:
      0.10631155 = sum of:
        0.071949646 = weight(_text_:based in 4157) [ClassicSimilarity], result of:
          0.071949646 = score(doc=4157,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.47078028 = fieldWeight in 4157, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
        0.034361906 = product of:
          0.06872381 = sum of:
            0.06872381 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.06872381 = score(doc=4157,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed. by Marlies Ockenfeld and Gerhard J. Mantwill
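The score breakdowns shown with each result follow Lucene's ClassicSimilarity formula: each matching term contributes queryWeight (idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and coord() factors scale for the fraction of query clauses that matched. A minimal sketch (not Lucene itself) reproducing the total for result 1 from the figures displayed above:

```python
import math

def term_score(freq, idf, query_norm, field_norm):
    """One term's contribution under ClassicSimilarity: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                    # tf(freq) = sqrt(freq), e.g. 2.0 for freq=4.0
    query_weight = idf * query_norm         # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm    # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight

# Figures taken from result 1 (doc 4157)
qn = 0.050723847
based = term_score(freq=4.0, idf=3.0129938, query_norm=qn, field_norm=0.078125)
t22 = 0.5 * term_score(freq=2.0, idf=3.5018296, query_norm=qn, field_norm=0.078125)  # coord(1/2)
score = (based + t22) * (2.0 / 3.0)         # coord(2/3): 2 of 3 query clauses matched
print(round(score, 8))                      # ≈ 0.07087437, the displayed total
```

The same arithmetic, with different freq/idf/fieldNorm values, reproduces every breakdown in this list.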
  2. Zhou, Y. et al.: Analysing entity context in multilingual Wikipedia to support entity-centric retrieval applications (2016) 0.06
    0.056825332 = product of:
      0.085237995 = sum of:
        0.050876085 = weight(_text_:based in 2758) [ClassicSimilarity], result of:
          0.050876085 = score(doc=2758,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.33289194 = fieldWeight in 2758, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.078125 = fieldNorm(doc=2758)
        0.034361906 = product of:
          0.06872381 = sum of:
            0.06872381 = weight(_text_:22 in 2758) [ClassicSimilarity], result of:
              0.06872381 = score(doc=2758,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.38690117 = fieldWeight in 2758, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2758)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  3. Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurence based approach to multilingual retrieval (1997) 0.05
    0.045772478 = product of:
      0.06865872 = sum of:
        0.025438042 = weight(_text_:based in 4144) [ClassicSimilarity], result of:
          0.025438042 = score(doc=4144,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.16644597 = fieldWeight in 4144, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4144)
        0.043220676 = product of:
          0.08644135 = sum of:
            0.08644135 = weight(_text_:training in 4144) [ClassicSimilarity], result of:
              0.08644135 = score(doc=4144,freq=4.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.3648797 = fieldWeight in 4144, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4144)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Indexing documents with descriptors from a multilingual thesaurus is an approach to multilingual information retrieval. However, manual indexing is expensive. Automated indexing methods in general use terms found in the document. Thesaurus descriptors are complex terms that are often not used in documents or have specific meanings within the thesaurus; therefore most weighting schemes of automated indexing methods are not suited to selecting thesaurus descriptors. In this paper a linear associative system is described that uses similarity values extracted from a large corpus of manually indexed documents to construct a rank ordering of the descriptors for a given document title. The system is adaptive and has to be tuned with a training sample of records for the specific task. The system was tested on a corpus of some 80,000 bibliographic records. The results show a high variability with changing parameter values. This indicates that it is very important to empirically adapt the model to the specific situation in which it is used. The overall median rank of the manually assigned descriptors in the automatically generated ranked list of all 3,631 descriptors is 14 for the set used to adapt the system and 11 for a test set not used in the optimization process. This result shows that the optimization is not a fitting to a specific training set but a real adaptation of the model to the setting.
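The linear associative ranking described in the abstract above can be sketched as a weighted sum of term-descriptor similarity values. This is a toy illustration, not Ferber's system: the similarity values, terms, and descriptors are invented, whereas in the paper they are extracted from a large corpus of manually indexed records.

```python
# sim(term, descriptor) values, which the real system learns from co-occurrence
# data in a training sample; these numbers are purely illustrative.
similarity = {
    ("multilingual", "Multilingualism"): 0.8,
    ("multilingual", "Information retrieval"): 0.3,
    ("retrieval", "Information retrieval"): 0.9,
    ("retrieval", "Thesauri"): 0.2,
    ("thesaurus", "Thesauri"): 0.9,
}

def rank_descriptors(title_terms, similarity):
    """Score each descriptor as a linear combination of term-descriptor similarities."""
    scores = {}
    for term in title_terms:
        for (t, descriptor), sim in similarity.items():
            if t == term:
                scores[descriptor] = scores.get(descriptor, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_descriptors(["multilingual", "retrieval", "thesaurus"], similarity)
print(ranking)  # ['Information retrieval', 'Thesauri', 'Multilingualism']
```

The descriptor ranked first need not appear verbatim in the title, which is the point of descriptor-based (rather than term-based) indexing.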
  4. Cao, L.; Leong, M.-K.; Low, H.-B.: Searching heterogeneous multilingual bibliographic sources (1998) 0.05
    0.045460265 = product of:
      0.068190396 = sum of:
        0.040700868 = weight(_text_:based in 3564) [ClassicSimilarity], result of:
          0.040700868 = score(doc=3564,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.26631355 = fieldWeight in 3564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=3564)
        0.027489524 = product of:
          0.05497905 = sum of:
            0.05497905 = weight(_text_:22 in 3564) [ClassicSimilarity], result of:
              0.05497905 = score(doc=3564,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.30952093 = fieldWeight in 3564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3564)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Proposes a Web-based architecture for searching distributed heterogeneous multilingual Asian-language bibliographic sources, and describes a successful pilot implementation of the system in the Chinese Library (CLib) system developed in Singapore and tested at two university libraries and a public library
    Date
    1. 8.1996 22:08:06
  5. Caumanns, J.; Hollfelde, S.: Web-basierte Repositories zur Speicherung, Verwaltung und Wiederverwendung multimedialer Lernfragmente (2001) 0.04
    0.044799745 = product of:
      0.06719962 = sum of:
        0.03052565 = weight(_text_:based in 5881) [ClassicSimilarity], result of:
          0.03052565 = score(doc=5881,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.19973516 = fieldWeight in 5881, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=5881)
        0.036673963 = product of:
          0.073347926 = sum of:
            0.073347926 = weight(_text_:training in 5881) [ClassicSimilarity], result of:
              0.073347926 = score(doc=5881,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.3096107 = fieldWeight in 5881, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5881)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
  6. Gupta, P.; Banchs, R.E.; Rosso, P.: Continuous space models for CLIR (2017) 0.04
    0.044799745 = product of:
      0.06719962 = sum of:
        0.03052565 = weight(_text_:based in 3295) [ClassicSimilarity], result of:
          0.03052565 = score(doc=3295,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.19973516 = fieldWeight in 3295, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=3295)
        0.036673963 = product of:
          0.073347926 = sum of:
            0.073347926 = weight(_text_:training in 3295) [ClassicSimilarity], result of:
              0.073347926 = score(doc=3295,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.3096107 = fieldWeight in 3295, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3295)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    We present and evaluate a novel technique for learning cross-lingual continuous space models to aid cross-language information retrieval (CLIR). Our model, which is referred to as external-data composition neural network (XCNN), is based on a composition function that is implemented on top of a deep neural network that provides a distributed learning framework. Different from most existing models, which rely only on available parallel data for training, our learning framework provides a natural way to exploit monolingual data and its associated relevance metadata for learning continuous space representations of language. Cross-language extensions of the obtained models can then be trained by using a small set of parallel data. This property is very helpful for resource-poor languages; therefore, we carry out experiments on the English-Hindi language pair. In the comparative evaluation, the proposed model is shown to outperform state-of-the-art continuous space models by a statistically significant margin on two different tasks: parallel sentence retrieval and ad hoc retrieval.
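This is not the XCNN model itself, only the retrieval step that any continuous space CLIR model enables: once queries and documents in different languages are embedded in one shared space, ranking reduces to nearest-neighbour search. The three-dimensional vectors below are invented stand-ins for learned representations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

query_en = [0.9, 0.1, 0.2]           # English query, embedded by the trained model
docs_hi = {                          # Hindi documents, embedded in the same space
    "doc1": [0.8, 0.2, 0.1],
    "doc2": [0.1, 0.9, 0.3],
}
ranked = sorted(docs_hi, key=lambda d: cosine(query_en, docs_hi[d]), reverse=True)
print(ranked)  # ['doc1', 'doc2']: doc1 is closest to the query in the shared space
```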
  7. Frâncu, V.; Sabo, C.-N.: Implementation of a UDC-based multilingual thesaurus in a library catalogue : the case of BiblioPhil (2010) 0.04
    0.04252462 = product of:
      0.06378693 = sum of:
        0.04316979 = weight(_text_:based in 3697) [ClassicSimilarity], result of:
          0.04316979 = score(doc=3697,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.28246817 = fieldWeight in 3697, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=3697)
        0.020617142 = product of:
          0.041234285 = sum of:
            0.041234285 = weight(_text_:22 in 3697) [ClassicSimilarity], result of:
              0.041234285 = score(doc=3697,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.23214069 = fieldWeight in 3697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3697)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In order to enhance the use of Universal Decimal Classification (UDC) numbers in information retrieval, the authors have represented the classification with multilingual thesaurus descriptors and implemented this solution in an automated way. The authors illustrate a solution implemented in the BiblioPhil library system. The standard formats used are UNIMARC for subject authority records (i.e. the UDC-based multilingual thesaurus) and MARC XML for data transfer. The multilingual thesaurus was built according to existing standards, with the constituent parts of the classification notations used as the basis for search terms in multilingual information retrieval. The verbal equivalents, descriptors and non-descriptors, are used to expand the number of concepts and are given in Romanian, English and French. This approach saves the indexer time and provides more user-friendly and easier access to the bibliographic information. The multilingual aspect of the thesaurus enhances information access for a greater number of online users
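The core idea in the abstract above, pairing a UDC notation with verbal equivalents in several languages so a search in any of them resolves to the classification number, can be sketched in a few lines. The notation and descriptor terms below are illustrative, not taken from BiblioPhil.

```python
# UDC notation -> multilingual descriptors (illustrative values; a real
# thesaurus would also carry non-descriptors and broader/narrower links).
udc_thesaurus = {
    "81'246.3": {"en": "multilingualism", "fr": "multilinguisme", "ro": "multilingvism"},
}

def expand_query(term):
    """Return the UDC notations whose descriptors match a search term in any language."""
    return [udc for udc, labels in udc_thesaurus.items() if term in labels.values()]

print(expand_query("multilinguisme"))  # ["81'246.3"]
```

A retrieval system can then match the returned notation against the classification field of bibliographic records, regardless of the language the searcher used.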
    Date
    22. 7.2010 20:40:56
  8. De Luca, E.W.; Dahlberg, I.: Including knowledge domains from the ICC into the multilingual lexical linked data cloud (2014) 0.04
    0.040181573 = product of:
      0.06027236 = sum of:
        0.035974823 = weight(_text_:based in 1493) [ClassicSimilarity], result of:
          0.035974823 = score(doc=1493,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.23539014 = fieldWeight in 1493, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1493)
        0.024297535 = product of:
          0.04859507 = sum of:
            0.04859507 = weight(_text_:22 in 1493) [ClassicSimilarity], result of:
              0.04859507 = score(doc=1493,freq=4.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.27358043 = fieldWeight in 1493, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1493)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A lot of information that is already available on the Web, or retrieved from local information systems and social networks, is structured in data silos that are not semantically related. Semantic technologies show that typed links which directly express relations are an advantage for every application that can reuse the knowledge incorporated in the data. For this reason, data integration, whether through reengineering (e.g. Triplify) or querying (e.g. D2R), is an important task in order to make information available to everyone. Thus, in order to build a semantic map of the data, we need knowledge about the data items themselves and about the relations between heterogeneous data items. In this paper, we present our work on providing Lexical Linked Data (LLD) through a meta-model that contains all the resources and gives the possibility to retrieve and navigate them from different perspectives. We combine the existing work done on knowledge domains (based on the Information Coding Classification) with the Multilingual Lexical Linked Data Cloud (based on the RDF/OWL EuroWordNet and the related integrated lexical resources: MultiWordNet, EuroWordNet, the MEMODATA Lexicon and the Hamburg Metaphor DB).
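The "typed links" the abstract above relies on are simply subject-predicate-object triples whose predicate names the relation. A minimal sketch, with invented resource and relation names in the spirit (but not the vocabulary) of the LLD meta-model:

```python
# Typed links as plain triples; the identifiers below are illustrative,
# not the actual URIs of the LLD cloud.
triples = [
    ("lld:dog_en", "lld:translationOf", "lld:Hund_de"),
    ("lld:dog_en", "lld:inDomain", "icc:Zoology"),
    ("lld:Hund_de", "lld:lexicalizedIn", "wn:EuroWordNet"),
]

def related(resource, predicate):
    """Follow typed links from a resource along one named relation."""
    return [o for s, p, o in triples if s == resource and p == predicate]

print(related("lld:dog_en", "lld:translationOf"))  # ['lld:Hund_de']
```

Because each link is typed, an application can follow exactly the relation it needs (translations, knowledge domains, source lexicons) without guessing what an untyped link means.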
    Date
    22. 9.2014 19:01:18
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  9. Chen, H.-H.; Lin, W.-C.; Yang, C.; Lin, W.-H.: Translating-transliterating named entities for multilingual information access (2006) 0.04
    0.039777733 = product of:
      0.059666596 = sum of:
        0.03561326 = weight(_text_:based in 1080) [ClassicSimilarity], result of:
          0.03561326 = score(doc=1080,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.23302436 = fieldWeight in 1080, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1080)
        0.024053333 = product of:
          0.048106667 = sum of:
            0.048106667 = weight(_text_:22 in 1080) [ClassicSimilarity], result of:
              0.048106667 = score(doc=1080,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.2708308 = fieldWeight in 1080, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1080)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with the formulation, transformation, translation, and transliteration of multilingual named entities. The rules and similarity matrices for translation and transliteration are learned automatically from parallel named-entity corpora. The results are applied in cross-language access to collections of images with captions. Experimental results demonstrate that the similarity-based transliteration of named entities is effective, and runs in which transliteration is considered outperform the runs in which it is neglected.
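Similarity-based transliteration matching, as described in the abstract above, can be sketched as scoring candidate names against a learned character-pair similarity matrix. The matrix values and the naive position-wise alignment below are invented for illustration; the paper learns the matrices from parallel named-entity corpora and real systems align characters flexibly.

```python
# Invented character-pair similarities (a learned matrix in the real system).
sim = {("p", "p"): 1.0, ("a", "a"): 1.0, ("r", "r"): 1.0,
       ("i", "i"): 1.0, ("s", "s"): 1.0, ("i", "e"): 0.4}

def score(source, candidate):
    """Naive position-wise alignment score over character pairs."""
    return sum(sim.get((s, c), 0.0) for s, c in zip(source, candidate))

candidates = ["paris", "peres"]
best = max(candidates, key=lambda c: score("paris", c))
print(best)  # 'paris'
```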
    Date
    4. 6.2006 19:52:22
  10. Dilevko, J.; Dali, K.: ¬The challenge of building multilingual collections in Canadian public libraries (2002) 0.04
    0.039777733 = product of:
      0.059666596 = sum of:
        0.03561326 = weight(_text_:based in 139) [ClassicSimilarity], result of:
          0.03561326 = score(doc=139,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.23302436 = fieldWeight in 139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=139)
        0.024053333 = product of:
          0.048106667 = sum of:
            0.048106667 = weight(_text_:22 in 139) [ClassicSimilarity], result of:
              0.048106667 = score(doc=139,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.2708308 = fieldWeight in 139, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=139)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A Web-based survey was conducted to determine the extent to which Canadian public libraries are collecting multilingual materials (foreign languages other than English and French), the methods that they use to select these materials, and whether public librarians are sufficiently prepared to provide their multilingual clientele with an adequate range of materials and services. There is room for improvement with regard to collection development of multilingual materials in Canadian public libraries, as well as in educating staff about keeping multilingual collections current, diverse, and of sufficient interest to potential users to keep such materials circulating. The main constraints preventing public libraries from developing better multilingual collections are addressed, and recommendations for improving the state of multilingual holdings are provided.
    Date
    10. 9.2000 17:38:22
  11. Holley, R.P.: ¬The Répertoire de Vedettes-matière de l'Université Laval Library, 1946-92 : Francophone subject access in North America and Europe (2002) 0.04
    0.035437185 = product of:
      0.053155776 = sum of:
        0.035974823 = weight(_text_:based in 159) [ClassicSimilarity], result of:
          0.035974823 = score(doc=159,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.23539014 = fieldWeight in 159, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=159)
        0.017180953 = product of:
          0.034361906 = sum of:
            0.034361906 = weight(_text_:22 in 159) [ClassicSimilarity], result of:
              0.034361906 = score(doc=159,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.19345059 = fieldWeight in 159, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=159)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In 1946, the Université Laval in Quebec City, Quebec, Canada, started using Library of Congress Subject Headings (LCSH) in French by creating an authority list, Répertoire de Vedettes-matière (RVM), whose first published edition appeared in 1962. In the 1970s, the most important libraries in Canada with an interest in French-language cataloging - the Université de Montréal, the Bibliothèque Nationale du Canada, and the Bibliothèque Nationale du Quebec - forged partnerships with the Université Laval to support RVM. In 1974, the Bibliothèque Publique d'Information, Centre Pompidou, Paris, France became the first library in Europe to adopt RVM. During the 1980s, the Bibliothèque Nationale de France (BNF) created an authority list, RAMEAU, based upon RVM, which is used by numerous French libraries of all types. The major libraries in Luxembourg adopted RVM in 1985. Individual libraries in Belgium also use RVM, often in combination with LCSH. The spread of RVM in the francophone world reflects the increasing importance of the pragmatic North American tradition of shared cataloging and library cooperation. RVM and its European versions are based upon literary warrant and make changes to LCSH to reflect the specific cultural and linguistic needs of their user communities. While the users of RVM seek to harmonize the various versions, differences in terminology and probably syntax are inevitable.
    Date
    10. 9.2000 17:38:22
  12. Larkey, L.S.; Connell, M.E.: Structured queries, language modelling, and relevance modelling in cross-language information retrieval (2005) 0.04
    0.035437185 = product of:
      0.053155776 = sum of:
        0.035974823 = weight(_text_:based in 1022) [ClassicSimilarity], result of:
          0.035974823 = score(doc=1022,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.23539014 = fieldWeight in 1022, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1022)
        0.017180953 = product of:
          0.034361906 = sum of:
            0.034361906 = weight(_text_:22 in 1022) [ClassicSimilarity], result of:
              0.034361906 = score(doc=1022,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.19345059 = fieldWeight in 1022, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1022)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries--one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
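The structured query translation discussed in the abstract above hinges on the synonym operator: all dictionary translations of one source term are grouped so they share term statistics rather than compete as independent query terms. A minimal sketch of the grouping step, with an invented two-entry bilingual dictionary:

```python
# Illustrative English-Spanish dictionary; a real one would come from a
# conventional dictionary or a parallel corpus, as the paper compares.
dictionary = {"bank": ["banco", "orilla"], "loan": ["préstamo"]}

def structured_query(source_terms):
    """Each source term becomes one synonym group over its translations."""
    return [("SYN", dictionary.get(t, [t])) for t in source_terms]

print(structured_query(["bank", "loan"]))
# [('SYN', ['banco', 'orilla']), ('SYN', ['préstamo'])]
```

At retrieval time the engine treats every member of a SYN group as an occurrence of the same term (combined document frequency, summed term frequency), which is what makes the operator robust to translation ambiguity.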
    Date
    26.12.2007 20:22:11
  13. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.03
    0.034095198 = product of:
      0.051142793 = sum of:
        0.03052565 = weight(_text_:based in 4436) [ClassicSimilarity], result of:
          0.03052565 = score(doc=4436,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.19973516 = fieldWeight in 4436, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
        0.020617142 = product of:
          0.041234285 = sum of:
            0.041234285 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.041234285 = score(doc=4436,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between speed and translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last four months of 1997 are used for a quantitative study of online and real-time Web page translation
    Date
    16. 2.2000 14:22:39
  14. Seo, H.-C.; Kim, S.-B.; Rim, H.-C.; Myaeng, S.-H.: lmproving query translation in English-Korean Cross-language information retrieval (2005) 0.03
    0.034095198 = product of:
      0.051142793 = sum of:
        0.03052565 = weight(_text_:based in 1023) [ClassicSimilarity], result of:
          0.03052565 = score(doc=1023,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.19973516 = fieldWeight in 1023, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=1023)
        0.020617142 = product of:
          0.041234285 = sum of:
            0.041234285 = weight(_text_:22 in 1023) [ClassicSimilarity], result of:
              0.041234285 = score(doc=1023,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.23214069 = fieldWeight in 1023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1023)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with the candidate translations of other query terms. This paper proposes a new method where we examine all combinations of target query term translations corresponding to the source query terms, instead of looking at the candidates for each query term and selecting the best one at a time. The goodness value for a combination of target query terms is computed based on the association value between each pair of the terms in the combination. We tested our method using the NTCIR-3 English-Korean CLIR test collection. The results show some improvements regardless of the association measures we used.
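The combination-based disambiguation described in the abstract above enumerates every combination of candidate translations and scores it by the pairwise association of its members. A compact sketch; the English-Korean candidates and association values are invented for illustration, and the paper's association measures would replace the toy lookup table.

```python
from itertools import combinations, product

candidates = {"bank": ["은행", "둑"], "interest": ["이자", "관심"]}
assoc = {("은행", "이자"): 0.9, ("은행", "관심"): 0.1,
         ("둑", "이자"): 0.05, ("둑", "관심"): 0.2}

def goodness(combo):
    """Sum association values over all pairs of terms in the combination."""
    return sum(assoc.get(pair, assoc.get(pair[::-1], 0.0))
               for pair in combinations(combo, 2))

# Examine all combinations of target-term translations, not one term at a time.
best = max(product(*candidates.values()), key=goodness)
print(best)  # ('은행', '이자'): 'bank'/'interest' in the financial sense
```

With two terms there are only four combinations, but the number grows multiplicatively with query length, which is why the one-term-at-a-time baseline the paper compares against exists at all.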
    Date
    26.12.2007 20:22:38
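The combination-scoring idea in the abstract above can be illustrated with a short sketch: score every combination of candidate translations by summing pairwise association values and keep the best one. All names and the toy association table are invented for illustration; the paper's actual association measures and test data (NTCIR-3) are not reproduced here.

```python
# Hypothetical sketch of combination-based translation disambiguation,
# assuming a pairwise association function is available (e.g. derived from
# co-occurrence statistics in the target collection).
from itertools import product

def best_translation_combination(candidates, assoc):
    """candidates: list of lists of translation options, one per source term.
    assoc: function (term_a, term_b) -> association score.
    Returns the combination whose summed pairwise association is highest."""
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        # goodness of a combination = sum of association over all term pairs
        score = sum(assoc(a, b)
                    for i, a in enumerate(combo)
                    for b in combo[i + 1:])
        if score > best_score:
            best, best_score = combo, score
    return best

# toy association table for illustration only
pairs = {frozenset({"bank", "money"}): 0.9,
         frozenset({"bank", "river"}): 0.1,
         frozenset({"shore", "money"}): 0.05,
         frozenset({"shore", "river"}): 0.8}
assoc = lambda a, b: pairs.get(frozenset({a, b}), 0.0)
print(best_translation_combination([["bank", "shore"], ["money"]], assoc))
# → ('bank', 'money')
```

Exhaustive enumeration is exponential in the number of query terms, which is why selecting one term at a time (the earlier approach the abstract contrasts against) is cheaper but can miss globally coherent combinations.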
  15. Lonsdale, D.; Mitamura, T.; Nyberg, E.: Acquisition of large lexicons for practical knowledge-based MT (1994/95) 0.03
    0.026921097 = product of:
      0.08076329 = sum of:
        0.08076329 = weight(_text_:based in 7409) [ClassicSimilarity], result of:
          0.08076329 = score(doc=7409,freq=14.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.5284496 = fieldWeight in 7409, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=7409)
      0.33333334 = coord(1/3)
    
    Abstract
    Although knowledge-based MT systems have the potential to achieve high translation accuracy, each successful application system requires a large amount of hand-coded lexical knowledge. Systems like KBMT-89 and its descendants have demonstrated how knowledge-based translation can produce good results in technical domains with tractable domain semantics. Nevertheless, the magnitude of the development task for large-scale applications with tens of thousands of domain concepts precludes a purely hand-crafted approach. The current challenge for the next generation of knowledge-based MT systems is to utilize online textual resources and corpus analysis software in order to automate the most laborious aspects of the knowledge acquisition process. This partial automation can in turn maximize the productivity of human knowledge engineers and help to make large-scale applications of knowledge-based MT a viable approach. Discusses the corpus-based knowledge acquisition methodology used in KANT, a knowledge-based translation system for multilingual document production. This methodology can be generalized beyond the KANT interlingua approach for use with any system that requires similar kinds of knowledge.

  16. Francu, V.: UDC-based thesauri and multilingual access to information (2004) 0.02
    0.023742175 = product of:
      0.07122652 = sum of:
        0.07122652 = weight(_text_:based in 3767) [ClassicSimilarity], result of:
          0.07122652 = score(doc=3767,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.46604872 = fieldWeight in 3767, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.109375 = fieldNorm(doc=3767)
      0.33333334 = coord(1/3)
    
  17. Tsuji, K.; Kageura, K.: Automatic generation of Japanese-English bilingual thesauri based on bilingual corpora (2006) 0.02
    0.020770075 = product of:
      0.062310223 = sum of:
        0.062310223 = weight(_text_:based in 5061) [ClassicSimilarity], result of:
          0.062310223 = score(doc=5061,freq=12.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.4077077 = fieldWeight in 5061, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5061)
      0.33333334 = coord(1/3)
    
    Abstract
    The authors propose a method for automatically generating Japanese-English bilingual thesauri based on bilingual corpora. The term bilingual thesaurus refers to a set of bilingual equivalent words and their synonyms. Most of the methods proposed so far for extracting bilingual equivalent word clusters from bilingual corpora depend heavily on word frequency and are not effective for dealing with low-frequency clusters. These low-frequency bilingual clusters are worth extracting because they contain many newly coined terms that are in demand but are not listed in existing bilingual thesauri. Assuming that single language-pair-independent methods such as frequency-based ones have reached their limitations and that a language-pair-dependent method used in combination with other methods shows promise, the authors propose the following approach: (a) Extract translation pairs based on transliteration patterns; (b) remove the pairs from among the candidate words; (c) extract translation pairs based on word frequency from the remaining candidate words; and (d) generate bilingual clusters based on the extracted pairs using a graph-theoretic method. The proposed method has been found to be significantly more effective than other methods.
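Step (d) of the approach above, generating bilingual clusters from extracted translation pairs with a graph-theoretic method, can be sketched as connected-component grouping over a term graph. The pair data below are invented; the paper's extraction steps (a)-(c) are not reproduced, and the authors' exact graph method may differ.

```python
# Illustrative sketch: bilingual clusters as connected components of a graph
# whose edges are extracted Japanese-English translation pairs.
from collections import defaultdict

def bilingual_clusters(pairs):
    """pairs: iterable of (japanese_term, english_term) tuples.
    Returns clusters of terms linked through shared translations."""
    graph = defaultdict(set)
    for ja, en in pairs:
        graph[ja].add(en)
        graph[en].add(ja)
    seen, clusters = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # depth-first traversal of one component
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(graph[cur] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

pairs = [("konpyuta", "computer"), ("keisanki", "computer"),
         ("jisho", "dictionary")]
print(bilingual_clusters(pairs))
```

Because "konpyuta" and "keisanki" share the translation "computer", they fall into one cluster, i.e. a set of bilingual equivalents and their synonyms in the sense the abstract describes.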
  18. Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.02
    0.020770075 = product of:
      0.062310223 = sum of:
        0.062310223 = weight(_text_:based in 897) [ClassicSimilarity], result of:
          0.062310223 = score(doc=897,freq=12.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.4077077 = fieldWeight in 897, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=897)
      0.33333334 = coord(1/3)
    
    Abstract
    Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based only on the target document collections. First, we discuss two kinds of disambiguation technique: (1) a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback. Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed using English as a pivot language. The experiments showed that a variation of the term co-occurrence-based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparably high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verify our findings. Again, similar results were observed, except that the Dice coefficient slightly outperforms the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.
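The co-occurrence technique compared in the abstract above can be sketched with document-frequency counts and the Cosine coefficient. The document collection, candidate translations, and function names below are toy illustrations, not the paper's implementation or test data.

```python
# Sketch of co-occurrence-based disambiguation with the Cosine coefficient:
# prefer the candidate translation most associated with another query term
# in the target collection.
import math

def cosine(term_a, term_b, docs):
    """Cosine association from document (co-)occurrence counts."""
    da = sum(1 for d in docs if term_a in d)
    db = sum(1 for d in docs if term_b in d)
    dab = sum(1 for d in docs if term_a in d and term_b in d)
    return dab / math.sqrt(da * db) if da and db else 0.0

def pick_translation(candidates, other_term, docs):
    """Choose the candidate most associated with another query term."""
    return max(candidates, key=lambda t: cosine(t, other_term, docs))

# toy target collection: each document is a set of terms
docs = [{"banca", "denaro"}, {"banca", "denaro", "conto"},
        {"riva", "fiume"}, {"riva", "fiume", "acqua"}]
print(pick_translation(["banca", "riva"], "denaro", docs))  # → banca
```

Swapping in the Dice coefficient, `2 * dab / (da + db)`, is the one-line change behind the English-to-Italian difference the abstract reports.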
  19. Haruyama, A.; Yamashita, Y.; Kubota, H.: Development of a multilingual indexing vocabulary based on a faceted thesauri (1996) 0.02
    0.020350434 = product of:
      0.0610513 = sum of:
        0.0610513 = weight(_text_:based in 3492) [ClassicSimilarity], result of:
          0.0610513 = score(doc=3492,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.39947033 = fieldWeight in 3492, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.09375 = fieldNorm(doc=3492)
      0.33333334 = coord(1/3)
    
  20. Weihs, J.: Three tales of multilingual cataloguing (1998) 0.02
    0.01832635 = product of:
      0.05497905 = sum of:
        0.05497905 = product of:
          0.1099581 = sum of:
            0.1099581 = weight(_text_:22 in 6063) [ClassicSimilarity], result of:
              0.1099581 = score(doc=6063,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.61904186 = fieldWeight in 6063, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=6063)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    2. 8.2001 8:55:22

Languages

  • e 103
  • d 6
  • ro 2
  • f 1
  • m 1

Types

  • a 105
  • el 6
  • m 2
  • r 2
  • s 2
  • x 2