Search (184 results, page 2 of 10)

  • theme_ss:"Multilinguale Probleme"
  • type_ss:"a"
  1. Seo, H.-C.; Kim, S.-B.; Rim, H.-C.; Myaeng, S.-H.: Improving query translation in English-Korean Cross-language information retrieval (2005) 0.01
    0.005982068 = product of:
      0.020937236 = sum of:
        0.006214436 = product of:
          0.03107218 = sum of:
            0.03107218 = weight(_text_:retrieval in 1023) [ClassicSimilarity], result of:
              0.03107218 = score(doc=1023,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2835858 = fieldWeight in 1023, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1023)
          0.2 = coord(1/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 1023) [ClassicSimilarity], result of:
              0.0294456 = score(doc=1023,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 1023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1023)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
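The explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown: fieldWeight = sqrt(freq) * idf * fieldNorm, queryWeight = idf * queryNorm, each leaf score is their product, and the coord() factors scale the partial sums. A minimal sketch, using only the numbers shown above, that reproduces the score of result 1:

```python
import math

# Inputs copied from the explain tree for doc 1023.
QUERY_NORM = 0.03622214

def field_weight(freq, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    return math.sqrt(freq) * idf * field_norm

def term_score(freq, idf, field_norm):
    # leaf score = queryWeight * fieldWeight, queryWeight = idf * queryNorm
    return (idf * QUERY_NORM) * field_weight(freq, idf, field_norm)

# _text_:retrieval -- freq=4.0, idf=3.024915, fieldNorm=0.046875
retrieval = term_score(4.0, 3.024915, 0.046875)   # ~0.03107218
# _text_:22 -- freq=2.0, idf=3.5018296, fieldNorm=0.046875
term22 = term_score(2.0, 3.5018296, 0.046875)     # ~0.0294456

# Outer tree: each leaf is scaled by its coord() factor (1/5 and 1/2),
# summed, then scaled by the top-level coord(2/7).
total = (retrieval * 0.2 + term22 * 0.5) * (2.0 / 7.0)  # ~0.005982068
```

The three intermediate values match the tree above to display precision, which confirms the formula reading.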
    
    Abstract
    Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with the candidate translations of other query terms. This paper proposes a new method where we examine all combinations of target query term translations corresponding to the source query terms, instead of looking at the candidates for each query term and selecting the best one at a time. The goodness value for a combination of target query terms is computed based on the association value between each pair of the terms in the combination. We tested our method using the NTCIR-3 English-Korean CLIR test collection. The results show some improvements regardless of the association measures we used.
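The combination method described in the abstract can be sketched as exhaustive scoring of translation tuples. This is only an illustration of the idea: the association function and the toy co-occurrence table below are hypothetical stand-ins for the statistical association measures the paper evaluates.

```python
from itertools import combinations, product

def best_translation(candidates, assoc):
    """candidates: one list of target-language translations per source
    query term.  assoc(t1, t2) -> association strength of two target
    terms.  Returns the combination whose summed pairwise association
    ("goodness") is highest, instead of picking each term separately."""
    def goodness(combo):
        return sum(assoc(a, b) for a, b in combinations(combo, 2))
    return max(product(*candidates), key=goodness)

# Toy example: hypothetical co-occurrence scores between target terms.
cooc = {frozenset(["bank", "loan"]): 0.9,
        frozenset(["bank", "river"]): 0.1,
        frozenset(["shore", "river"]): 0.7,
        frozenset(["shore", "loan"]): 0.0}
assoc = lambda a, b: cooc.get(frozenset([a, b]), 0.0)

best = best_translation([["bank", "shore"], ["loan", "river"]], assoc)
```

With these toy scores the mutually associated pair ("bank", "loan") wins, even though "shore" might look plausible for the first term in isolation.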
    Date
    26.12.2007 20:22:38
  2. Weihs, J.: Three tales of multilingual cataloguing (1998) 0.01
    0.0056086862 = product of:
      0.0392608 = sum of:
        0.0392608 = product of:
          0.0785216 = sum of:
            0.0785216 = weight(_text_:22 in 6063) [ClassicSimilarity], result of:
              0.0785216 = score(doc=6063,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.61904186 = fieldWeight in 6063, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=6063)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Date
    2. 8.2001 8:55:22
  3. Larkey, L.S.; Connell, M.E.: Structured queries, language modelling, and relevance modelling in cross-language information retrieval (2005) 0.01
    0.005597939 = product of:
      0.019592784 = sum of:
        0.007323784 = product of:
          0.03661892 = sum of:
            0.03661892 = weight(_text_:retrieval in 1022) [ClassicSimilarity], result of:
              0.03661892 = score(doc=1022,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33420905 = fieldWeight in 1022, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1022)
          0.2 = coord(1/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 1022) [ClassicSimilarity], result of:
              0.024538001 = score(doc=1022,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 1022, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1022)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach that fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval effectiveness and that good translation probabilities confer a small but significant advantage.
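The synonym-operator idea behind structured query translation can be sketched as follows. This is a simplified rendering of the concept (all translations of one source term are pooled before weighting), not INQUERY's actual implementation; the Spanish sample documents are invented for illustration.

```python
def syn_tf(doc_tokens, translations):
    # Synonym operator: all translations of one source term count as a
    # single term, so their in-document frequencies are pooled.
    return sum(doc_tokens.count(t) for t in translations)

def syn_df(docs, translations):
    # Document frequency of the pooled term: a document matches if it
    # contains any of the translations.
    return sum(1 for d in docs if any(t in d for t in translations))

# Hypothetical target-language documents and the translation set of
# the English source term "bank".
docs = [["banco", "credito", "banco"], ["orilla", "rio"], ["banco"]]
translations = ["banco", "orilla"]

tf0 = syn_tf(docs[0], translations)   # pooled tf in the first document
df = syn_df(docs, translations)       # pooled document frequency
```

Pooling the statistics this way keeps a rare (often wrong) translation from receiving an inflated idf weight of its own.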
    Date
    26.12.2007 20:22:11
  4. Dabbadie, M.; Blancherie, J.M.: Alexandria, a multilingual dictionary for knowledge management purposes (2006) 0.01
    0.005567615 = product of:
      0.01948665 = sum of:
        0.0047638514 = product of:
          0.023819257 = sum of:
            0.023819257 = weight(_text_:system in 2465) [ClassicSimilarity], result of:
              0.023819257 = score(doc=2465,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 2465, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2465)
          0.2 = coord(1/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 2465) [ClassicSimilarity], result of:
              0.0294456 = score(doc=2465,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 2465, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2465)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Alexandria is an innovation of international impact. It is the only multilingual dictionary for websites and PCs. A double click on a word opens a small window that gives interactive translations between 22 languages and includes meanings, synonyms, and associated expressions. It is an ASP application grounded on a semantic network that is portable to any operating system or platform. Behind the application is the Integral Dictionary, the semantic network created by Memodata. Alexandria can be customized with specific vocabulary, descriptive articles, images, sounds, videos, etc. Its domains of application are considerable: e-tourism, online media, language learning, and international websites. Alexandria has also proved to be a basic tool for knowledge management purposes. The application can be customized according to a user's or an organization's needs. An application dedicated to mobile devices is currently being developed. Future developments are planned in the field of e-tourism in relation with the French "pôles de compétitivité".
  5. Clough, P.; Sanderson, M.: User experiments with the Eurovision Cross-Language Image Retrieval System (2006) 0.01
    0.0055545247 = product of:
      0.03888167 = sum of:
        0.03888167 = product of:
          0.09720418 = sum of:
            0.0439427 = weight(_text_:retrieval in 5052) [ClassicSimilarity], result of:
              0.0439427 = score(doc=5052,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 5052, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5052)
            0.053261478 = weight(_text_:system in 5052) [ClassicSimilarity], result of:
              0.053261478 = score(doc=5052,freq=10.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.46686378 = fieldWeight in 5052, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5052)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    In this article the authors present Eurovision, a text-based system for cross-language (CL) image retrieval. The system is evaluated by multilingual users for two search tasks, with the system configured in English and five other languages. To the authors' knowledge, this is the first published set of user experiments for CL image retrieval. They show that (a) it is possible to create a usable multilingual search engine using little knowledge of any language other than English, (b) categorizing images assists the user's search, and (c) there are differences in the way users search between the proposed search tasks. Based on the two search tasks and user feedback, they describe important aspects of any CL image retrieval system.
  6. Fluhr, C.: Crosslingual access to photo databases (2012) 0.01
    0.00546202 = product of:
      0.01911707 = sum of:
        0.00439427 = product of:
          0.02197135 = sum of:
            0.02197135 = weight(_text_:retrieval in 93) [ClassicSimilarity], result of:
              0.02197135 = score(doc=93,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20052543 = fieldWeight in 93, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=93)
          0.2 = coord(1/5)
        0.0147228 = product of:
          0.0294456 = sum of:
            0.0294456 = weight(_text_:22 in 93) [ClassicSimilarity], result of:
              0.0294456 = score(doc=93,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23214069 = fieldWeight in 93, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=93)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    17. 4.2012 14:25:22
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis, u.a
  7. Baliková, M.: Looking for the best way of subject access (2008) 0.01
    0.0051095546 = product of:
      0.03576688 = sum of:
        0.03576688 = product of:
          0.089417204 = sum of:
            0.03107218 = weight(_text_:retrieval in 2187) [ClassicSimilarity], result of:
              0.03107218 = score(doc=2187,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2835858 = fieldWeight in 2187, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2187)
            0.058345027 = weight(_text_:system in 2187) [ClassicSimilarity], result of:
              0.058345027 = score(doc=2187,freq=12.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.51142365 = fieldWeight in 2187, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2187)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    M-CAST, which stands for »Multilingual Content Aggregation System based on TRUST Search Engine«, is a multilingual indexing and retrieval system based on semantic technology; it allows asking a question in one language and finding an exact answer in digitized resources in different languages. It can serve as a monolingual question-answering system as well. At present, we have a prototype of the M-CAST system; it was developed to evaluate both retrieval effectiveness and the correctness of the interpretation process, and it has been tested in real-world situations. Further research will be done to increase the capabilities of the system. M-CAST question answering could be applied in both digital and hybrid libraries, because it enables users to pose questions using either a set of search terms or natural-language questions. In addition, it enables users to narrow a search in the advanced search module using the UDC (Universal Decimal Classification) system, which is widely used in libraries.
  8. Hubrich, J.: Multilinguale Wissensorganisation im Zeitalter der Globalisierung : das Projekt CrissCross (2010) 0.00
    0.0049850564 = product of:
      0.017447697 = sum of:
        0.005178697 = product of:
          0.025893483 = sum of:
            0.025893483 = weight(_text_:retrieval in 4793) [ClassicSimilarity], result of:
              0.025893483 = score(doc=4793,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23632148 = fieldWeight in 4793, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4793)
          0.2 = coord(1/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 4793) [ClassicSimilarity], result of:
              0.024538001 = score(doc=4793,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 4793, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4793)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    In the course of increasing globalization, knowledge organization systems are required that enable language-independent retrieval without rendering existing, nationally proven knowledge systems obsolete. The CrissCross project, funded by the Deutsche Forschungsgemeinschaft (DFG) and carried out by the Deutsche Nationalbibliothek in cooperation with the Fachhochschule Köln, makes a substantial contribution to the creation of such a knowledge repository by linking the subject headings of the German subject headings authority file (SWD) with notations of the Dewey Decimal Classification as well as with their equivalents in the Library of Congress Subject Headings (LCSH) and the French subject heading language RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié). An extended multilingual, thesaurus-based retrieval vocabulary is created that can be used for subject searching in heterogeneously indexed collections. This article outlines the problems of linking semantically heterogeneous systems, with particular attention to the differences between the DDC and the SWD. The methodology chosen in CrissCross for linking the SWD and the DDC is presented. Finally, the benefit of the resulting data for retrieval is demonstrated.
    Source
    Wissensspeicher in digitalen Räumen: Nachhaltigkeit - Verfügbarkeit - semantische Interoperabilität. Proceedings der 11. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation, Konstanz, 20. bis 22. Februar 2008. Hrsg.: J. Sieglerschmidt u. H.P.Ohly
  9. Ménard, E.; Khashman, N.; Kochkina, S.; Torres-Moreno, J.-M.; Velazquez-Morales, P.; Zhou, F.; Jourlin, P.; Rawat, P.; Peinl, P.; Linhares Pontes, E.; Brunetti., I.: ¬A second life for TIIARA : from bilingual to multilingual! (2016) 0.00
    0.0049850564 = product of:
      0.017447697 = sum of:
        0.005178697 = product of:
          0.025893483 = sum of:
            0.025893483 = weight(_text_:retrieval in 2834) [ClassicSimilarity], result of:
              0.025893483 = score(doc=2834,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23632148 = fieldWeight in 2834, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2834)
          0.2 = coord(1/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 2834) [ClassicSimilarity], result of:
              0.024538001 = score(doc=2834,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 2834, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2834)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Multilingual controlled vocabularies are rare and often very limited in the choice of languages offered. TIIARA (Taxonomy for Image Indexing and RetrievAl) is a bilingual taxonomy developed for image indexing and retrieval. This controlled vocabulary offers indexers and image searchers innovative and coherent access points for ordinary images. The preliminary steps of the elaboration of the bilingual structure are presented. For its initial development, TIIARA included only two languages, French and English. As a logical follow-up, TIIARA was translated into eight languages (Arabic, Spanish, Brazilian Portuguese, Mandarin Chinese, Italian, German, Hindi, and Russian) in order to increase its international scope. This paper briefly describes the different stages of the development of the bilingual structure. The processes used in the translations are subsequently presented, as well as the main difficulties encountered by the translators. Adding more languages to TIIARA constitutes an added value for a controlled vocabulary meant to be used by image searchers, who are often limited by their lack of knowledge of multiple languages.
    Source
    Knowledge organization. 43(2016) no.1, S.22-34
  10. Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.00
    0.004896801 = product of:
      0.034277607 = sum of:
        0.034277607 = product of:
          0.085694015 = sum of:
            0.0380555 = weight(_text_:retrieval in 2030) [ClassicSimilarity], result of:
              0.0380555 = score(doc=2030,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34732026 = fieldWeight in 2030, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2030)
            0.047638513 = weight(_text_:system in 2030) [ClassicSimilarity], result of:
              0.047638513 = score(doc=2030,freq=8.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.41757566 = fieldWeight in 2030, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2030)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.
  11. Petrelli, D.; Levin, S.; Beaulieu, M.; Sanderson, M.: Which user interaction for cross-language information retrieval? : design issues and reflections (2006) 0.00
    0.004435898 = product of:
      0.031051284 = sum of:
        0.031051284 = product of:
          0.07762821 = sum of:
            0.0439427 = weight(_text_:retrieval in 5053) [ClassicSimilarity], result of:
              0.0439427 = score(doc=5053,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 5053, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5053)
            0.033685513 = weight(_text_:system in 5053) [ClassicSimilarity], result of:
              0.033685513 = score(doc=5053,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29527056 = fieldWeight in 5053, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5053)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native-language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. The authors present three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for low-density languages, and show how the user-interaction design evolved depending on the results of usability tests. The first test was instrumental in identifying weaknesses in both functionality and interface; the second was run to determine whether query translation should be shown or not; the final one was a global assessment focused on user satisfaction criteria. Lessons were learned at every stage of the process, leading to a much more informed view of what a cross-language retrieval system should offer to users.
  12. Tartakovski, O.; Shramko, M.: Implementierung eines Werkzeugs zur Sprachidentifikation in mono- und multilingualen Texten (2006) 0.00
    0.0043171803 = product of:
      0.030220259 = sum of:
        0.030220259 = product of:
          0.075550646 = sum of:
            0.036250874 = weight(_text_:retrieval in 5978) [ClassicSimilarity], result of:
              0.036250874 = score(doc=5978,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33085006 = fieldWeight in 5978, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5978)
            0.039299767 = weight(_text_:system in 5978) [ClassicSimilarity], result of:
              0.039299767 = score(doc=5978,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34448233 = fieldWeight in 5978, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5978)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Identifying the language or languages of text documents is one of the most important steps in machine text processing for information retrieval. This article presents LangIdent, a system for language identification in monolingual and multilingual electronic text documents. The system offers both a selection of established algorithms for language identification of monolingual text documents and a new algorithm for language identification of multilingual text documents.
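A common baseline for the language identification task described above ranks languages by character n-gram overlap between the input and per-language profiles. This is a generic sketch of that technique, not necessarily the algorithm the system in the article uses; the tiny training phrases stand in for the large corpora a real profile needs.

```python
from collections import Counter

def ngrams(text, n=3):
    # Character trigram counts, with padding so word edges are captured.
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Hypothetical miniature language profiles (real ones are corpus-sized).
profiles = {
    "de": ngrams("die sprache der dokumente und der texte"),
    "en": ngrams("the language of the documents and the texts"),
}

def identify(text):
    grams = ngrams(text)
    # Score = number of n-gram occurrences shared with each profile.
    return max(profiles, key=lambda lang: sum((grams & profiles[lang]).values()))

lang = identify("der text")
```

Extending this to multilingual documents typically means running the classifier over a sliding window and segmenting where the winning language changes.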
    Source
    Effektive Information Retrieval Verfahren in Theorie und Praxis: ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005. Hrsg.: T. Mandl u. C. Womser-Hacker
  13. Evens, M.: Thesaural relations in information retrieval (2002) 0.00
    0.004168497 = product of:
      0.029179478 = sum of:
        0.029179478 = product of:
          0.072948694 = sum of:
            0.049129434 = weight(_text_:retrieval in 1201) [ClassicSimilarity], result of:
              0.049129434 = score(doc=1201,freq=10.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.44838852 = fieldWeight in 1201, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1201)
            0.023819257 = weight(_text_:system in 1201) [ClassicSimilarity], result of:
              0.023819257 = score(doc=1201,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 1201, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1201)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Thesaural relations have long been used in information retrieval to enrich queries; they have sometimes been used to cluster documents as well. Sometimes the first query to an information retrieval system yields no results at all, or, what can be even more disconcerting, many thousands of hits. One solution is to rephrase the query, improving the choice of query terms by using related terms of different types. A collection of related terms is often called a thesaurus. This chapter describes the lexical-semantic relations that have been used in building thesauri and summarizes some of the effects of using these relational thesauri in information retrieval experiments
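The query enrichment described in the abstract can be sketched as a lookup over typed thesaural relations. The thesaurus contents and relation names below are hypothetical; the point is only that expansion can be restricted to chosen relation types (e.g. synonyms for precision, broader terms for recall).

```python
# Hypothetical thesaurus: each term maps to related terms by relation type.
thesaurus = {
    "retrieval": {"synonym": ["search"], "broader": ["information access"]},
    "query": {"synonym": ["request"], "narrower": ["boolean query"]},
}

def expand(query_terms, relations=("synonym",)):
    """Enrich a query with related terms of the chosen relation types."""
    expanded = list(query_terms)
    for term in query_terms:
        for rel in relations:
            expanded.extend(thesaurus.get(term, {}).get(rel, []))
    return expanded

q = expand(["retrieval", "query"])                 # synonyms only
q2 = expand(["retrieval"], ("synonym", "broader")) # cast a wider net
```

A zero-hit query would be retried with the broader expansion; a query with thousands of hits would keep only the original terms or narrower ones.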
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  14. Rosemblat, G.; Graham, L.: Cross-language search in a monolingual health information system : flexible designs and lexical processes (2006) 0.00
    0.0041330485 = product of:
      0.028931338 = sum of:
        0.028931338 = product of:
          0.072328344 = sum of:
            0.03107218 = weight(_text_:retrieval in 241) [ClassicSimilarity], result of:
              0.03107218 = score(doc=241,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2835858 = fieldWeight in 241, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=241)
            0.041256163 = weight(_text_:system in 241) [ClassicSimilarity], result of:
              0.041256163 = score(doc=241,freq=6.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.36163113 = fieldWeight in 241, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=241)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The predominance of English-only online health information poses a serious challenge to non-English speakers. To overcome this barrier, we incorporated cross-language information retrieval (CLIR) techniques into a fully functional prototype. It supports Spanish-language searches over an English data set using a Spanish-English bilingual term list (BTL). The modular design allows for system and BTL growth and takes advantage of English-system enhancements. Language-based design decisions and implications for integrating non-English components with the existing monolingual architecture are presented. Algorithmic and BTL improvements are used to bring CLIR retrieval scores in line with the monolingual values. After validating these changes, we conducted a failure analysis and error categorization for the worst performing queries. We conclude with a comprehensive discussion and directions for future work.
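The relevance figures printed above each record are Lucene ClassicSimilarity score explanations. Their arithmetic can be reproduced directly; the sketch below assumes Lucene's classic formulas (tf = sqrt(termFreq), idf = ln(numDocs / (docFreq + 1)) + 1) and checks them against the `_text_:retrieval` term of the record above:

```python
import math

def classic_idf(doc_freq, num_docs):
    """Lucene ClassicSimilarity idf: ln(numDocs / (docFreq + 1)) + 1."""
    return math.log(num_docs / (doc_freq + 1)) + 1

def term_score(freq, doc_freq, num_docs, query_norm, field_norm):
    """Single-term score = queryWeight * fieldWeight, as in the explanations above."""
    idf = classic_idf(doc_freq, num_docs)
    query_weight = idf * query_norm        # idf * queryNorm
    tf = math.sqrt(freq)                   # tf = sqrt(termFreq)
    field_weight = tf * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# For doc 241 above: docFreq=5836, maxDocs=44218 gives idf = 3.024915,
# and freq=4.0 with fieldNorm=0.046875 gives score = 0.03107218.
```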
  15. Pollitt, A.S.; Ellis, G.: Multilingual access to document databases (1993) 0.00
    0.004099487 = product of:
      0.028696407 = sum of:
        0.028696407 = product of:
          0.071741015 = sum of:
            0.0380555 = weight(_text_:retrieval in 1302) [ClassicSimilarity], result of:
              0.0380555 = score(doc=1302,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34732026 = fieldWeight in 1302, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1302)
            0.033685513 = weight(_text_:system in 1302) [ClassicSimilarity], result of:
              0.033685513 = score(doc=1302,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29527056 = fieldWeight in 1302, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1302)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper examines why approaches to document retrieval that apply AI (Artificial Intelligence) or Expert Systems techniques, relying on so-called "natural language" query statements from the end-user, will result in sub-optimal solutions. It does so by reflecting on the nature of language and the fundamental problems in document retrieval. Support is given to the work of thesaurus builders and indexers, with illustrations of how their work may be utilised in a generally applicable computer-based document retrieval system using Multilingual MenUSE software. The EuroMenUSE interface, providing multilingual document access to EPOQUE, the European Parliament's Online Query System, is described.
  16. Schlenkrich, C.: Aspekte neuer Regelwerksarbeit : Multimediales Datenmodell für ARD und ZDF (2003) 0.00
    0.004087601 = product of:
      0.014306603 = sum of:
        0.0044914023 = product of:
          0.022457011 = sum of:
            0.022457011 = weight(_text_:system in 1515) [ClassicSimilarity], result of:
              0.022457011 = score(doc=1515,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19684705 = fieldWeight in 1515, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1515)
          0.2 = coord(1/5)
        0.0098152 = product of:
          0.0196304 = sum of:
            0.0196304 = weight(_text_:22 in 1515) [ClassicSimilarity], result of:
              0.0196304 = score(doc=1515,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.15476047 = fieldWeight in 1515, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1515)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    We are in the middle of this work, so I can only report interim results. Everything is in flux, and we are indeed trying to bring the "old rule sets" up to date and rework them for the multimedia domain. Briefly, on the working group: it grew out of the AG Orgatec and the conferences of the sound/radio archive heads and the television archive heads, with the task of producing a binding multimedia rule set. Digitization has clearly changed the tasks in the archive areas. We are trying to capture these processes and to regulate and define them anew, from the production process through to archiving. We began our work in April of last year, so we have now been at it for almost exactly one year, and over the course of this short talk I will report on how we have organized our work. A word on the members of the working group, since I think it is quite interesting to see which areas and spectra our group is drawn from: we have representatives of Bayrischer Rundfunk, Norddeutscher Rundfunk, Westdeutscher Rundfunk and Mitteldeutscher Rundfunk, from east to west and from south to north, and from the most varied fields of work, from audio and video through to online and print. It is a very mixed group, which also makes for a highly stimulating discussion, precisely because of the diversity we want to, and must, represent. Our goals: we want to develop and adopt a binding multimedia data model, one that covers in particular the digital production-center and archive workflow of ARD and, we were especially pleased about this, in the good old tradition of joint cooperation, of ZDF as well. We want to define rules for capture and subject indexing.
We want to generate and provide intermediary data in order to map and safeguard the production workflow, and the data model we have set ourselves as a target is to lay the groundwork for program exchange, so that communication from system to system, internally and externally, becomes possible. Now, one might think that a new multimedia data model could simply be assembled from a mix of the old rule sets for television, spoken word and music: set the data lists of the individual rule sets side by side synoptically, clarify what is common and what is specific, add what is missing, eliminate what may no longer be needed, put it all back together, and the new rule set is finished. Unfortunately it is not quite that simple, for a whole series of aspects must be taken into account, and they make a preceding level of abstraction absolutely necessary.
    Date
    22. 4.2003 12:05:56
  17. Oard, D.W.: Alternative approaches for cross-language text retrieval (1997) 0.00
    0.0039593396 = product of:
      0.027715376 = sum of:
        0.027715376 = product of:
          0.06928844 = sum of:
            0.04963856 = weight(_text_:retrieval in 1164) [ClassicSimilarity], result of:
              0.04963856 = score(doc=1164,freq=30.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.45303512 = fieldWeight in 1164, product of:
                  5.477226 = tf(freq=30.0), with freq of:
                    30.0 = termFreq=30.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1164)
            0.019649884 = weight(_text_:system in 1164) [ClassicSimilarity], result of:
              0.019649884 = score(doc=1164,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.17224117 = fieldWeight in 1164, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1164)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The explosive growth of the Internet and other sources of networked information have made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus, automated systems capable of detecting useful documents are finding widespread application. With even a small number of languages it can be inconvenient to issue the same query repeatedly in every language, so users who are able to read more than one language will likely prefer a multilingual text retrieval system over a collection of monolingual systems. And since reading ability in a language does not always imply fluent writing ability in that language, such users will likely find cross-language text retrieval particularly useful for languages in which they are less confident of their ability to express their information needs effectively. The use of such systems can also be beneficial if the user is able to read only a single language. For example, when only a small portion of the document collection will ever be examined by the user, performing retrieval before translation can be significantly more economical than performing translation before retrieval. So when the application is sufficiently important to justify the time and effort required for translation, those costs can be minimized if an effective cross-language text retrieval system is available. Even when translation is not available, there are circumstances in which cross-language text retrieval could be useful to a monolingual user. For example, a researcher might find a paper published in an unfamiliar language useful if that paper contains references to works by the same author that are in the researcher's native language.
    Multilingual text retrieval can be defined as selection of useful documents from collections that may contain several languages (English, French, Chinese, etc.). This formulation allows for the possibility that individual documents might contain more than one language, a common occurrence in some applications. Both cross-language and within-language retrieval are included in this formulation, but it is the cross-language aspect of the problem which distinguishes multilingual text retrieval from its well studied monolingual counterpart. At the SIGIR 96 workshop on "Cross-Linguistic Information Retrieval" the participants discussed the proliferation of terminology being used to describe the field and settled on "Cross-Language" as the best single description of the salient aspect of the problem. "Multilingual" was felt to be too broad, since that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language capability. "Cross-lingual" and "cross-linguistic" were felt to be equally good descriptions of the field, but "cross-language" was selected as the preferred term in the interest of standardization. Unfortunately, at about the same time the U.S. Defense Advanced Research Projects Agency (DARPA) introduced "translingual" as their preferred term, so we are still some distance from reaching consensus on this matter.
    I will not attempt to draw a sharp distinction between retrieval and filtering in this survey. Although my own work on adaptive cross-language text filtering has led me to make this distinction fairly carefully in other presentations (cf. Oard 1997b), such an approach does little to help understand the fundamental techniques which have been applied or the results that have been obtained in this case. Since it is still common to view filtering (detection of useful documents in dynamic document streams) as a kind of retrieval, I will simply adopt that perspective here.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  18. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 0.00
    0.0038721121 = product of:
      0.027104784 = sum of:
        0.027104784 = product of:
          0.06776196 = sum of:
            0.0439427 = weight(_text_:retrieval in 1074) [ClassicSimilarity], result of:
              0.0439427 = score(doc=1074,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 1074, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1074)
            0.023819257 = weight(_text_:system in 1074) [ClassicSimilarity], result of:
              0.023819257 = score(doc=1074,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 1074, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1074)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    We will explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary, and were run in a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring using a proximity operator for the translation equivalents of query language compound components were tested. The method was not useful in syn-based queries but instead resulted in a decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages. This allows n-gram based translation of names not included in a dictionary. In the third test, a query structuring method where the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.
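The third test above translates proper names through n-gram matching. One common realization of such matching, used here only as an illustration (the abstract does not specify the exact measure), is the Dice coefficient over character bigrams:

```python
def bigrams(word):
    """Set of character bigrams of a word, lowercased."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}

def dice(a, b):
    """Dice coefficient over the bigram sets of two words (0.0 to 1.0)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

def best_variant(name, candidates):
    """Pick the candidate spelling variant most similar to the source name."""
    return max(candidates, key=lambda c: dice(name, c))
```

Because spelling variants of the same name share most of their bigrams, a French "Chirac" matches an inflected Finnish form such as "Chiracin" far better than an unrelated name, even though neither appears in the dictionary. The candidate names here are invented for illustration.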
  19. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.00
    0.0038721121 = product of:
      0.027104784 = sum of:
        0.027104784 = product of:
          0.06776196 = sum of:
            0.0439427 = weight(_text_:retrieval in 2342) [ClassicSimilarity], result of:
              0.0439427 = score(doc=2342,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 2342, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
            0.023819257 = weight(_text_:system in 2342) [ClassicSimilarity], result of:
              0.023819257 = score(doc=2342,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 2342, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. For English-German, machine translation was also utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query translation were better than in the traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial, especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
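The dictionary-based translation step described above can be sketched as follows. The toy term list is hypothetical, and untranslatable terms (out-of-vocabulary words, proper names) are passed through unchanged, which is one plausible handling and the reason dictionary coverage matters:

```python
# Toy bilingual dictionary (hypothetical Finnish -> English entries);
# a source term may have several target-language equivalents.
DICT = {
    "sota": ["war"],
    "rauha": ["peace", "quiet"],
}

def translate_query(source_terms, bilingual_dict):
    """Dictionary-based query translation; out-of-vocabulary terms pass through."""
    target_terms = []
    for term in source_terms:
        target_terms.extend(bilingual_dict.get(term, [term]))
    return target_terms
```

Ambiguous entries expand the query (here "rauha" yields two equivalents), so downstream structuring or disambiguation, as in the Pirkola et al. entry above, is usually needed.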
  20. Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurence based approach to multilingual retrieval (1997) 0.00
    0.0037481284 = product of:
      0.026236897 = sum of:
        0.026236897 = product of:
          0.065592244 = sum of:
            0.025893483 = weight(_text_:retrieval in 4144) [ClassicSimilarity], result of:
              0.025893483 = score(doc=4144,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.23632148 = fieldWeight in 4144, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4144)
            0.03969876 = weight(_text_:system in 4144) [ClassicSimilarity], result of:
              0.03969876 = score(doc=4144,freq=8.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.3479797 = fieldWeight in 4144, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4144)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Indexing documents with descriptors from a multilingual thesaurus is an approach to multilingual information retrieval. However, manual indexing is expensive. Automated indexing methods in general use terms found in the document. Thesaurus descriptors are complex terms that are often not used in documents or have specific meanings within the thesaurus; therefore most weighting schemes of automated indexing methods are not suited to select thesaurus descriptors. In this paper a linear associative system is described that uses similarity values extracted from a large corpus of manually indexed documents to construct a rank ordering of the descriptors for a given document title. The system is adaptive and has to be tuned with a training sample of records for the specific task. The system was tested on a corpus of some 80,000 bibliographic records. The results show a high variability with changing parameter values. This indicates that it is very important to empirically adapt the model to the specific situation it is used in. The overall median of the manually assigned descriptors in the automatically generated ranked list of all 3,631 descriptors is 14 for the set used to adapt the system and 11 for a test set not used in the optimization process. This result shows that the optimization is not a fit to a specific training set but a real adaptation of the model to the setting.
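The ranking step of such a co-occurrence based system can be illustrated schematically: score each thesaurus descriptor by summing its corpus-derived similarity values with the title terms, then sort. The similarity table below is invented for illustration; in the paper these values are extracted from manually indexed records and the model is tuned per task:

```python
# Invented term-descriptor similarity values, standing in for associations
# extracted from a corpus of manually indexed bibliographic records.
SIMILARITY = {
    ("retrieval", "Information Retrieval"): 0.9,
    ("thesaurus", "Thesauri"): 0.8,
    ("retrieval", "Thesauri"): 0.2,
}

def rank_descriptors(title_terms, descriptors, sim):
    """Rank thesaurus descriptors for a title by summed similarity to its terms."""
    def score(descriptor):
        return sum(sim.get((term, descriptor), 0.0) for term in title_terms)
    return sorted(descriptors, key=score, reverse=True)
```

Because descriptors are scored by association rather than literal occurrence, a descriptor like "Thesauri" can outrank one whose wording is closer to the title, which is the point of the approach.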
