Search (242 results, page 1 of 13)

Li, K.W.; Yang, C.C.: Conceptual analysis of parallel corpus collected from the Web (2006) 0.11

0.11475631 = product of:
  0.1836101 = sum of:
    0.04882569 = weight(_text_:world in 5051) [ClassicSimilarity], result of:
      0.04882569 = score(doc=5051,freq=4.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.30028677 = fieldWeight in 5051, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5051)
    0.0648802 = weight(_text_:wide in 5051) [ClassicSimilarity], result of:
      0.0648802 = score(doc=5051,freq=4.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.34615302 = fieldWeight in 5051, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5051)
    0.04310937 = weight(_text_:web in 5051) [ClassicSimilarity], result of:
      0.04310937 = score(doc=5051,freq=6.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.3122631 = fieldWeight in 5051, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5051)
    0.01610337 = weight(_text_:information in 5051) [ClassicSimilarity], result of:
      0.01610337 = score(doc=5051,freq=10.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.21684799 = fieldWeight in 5051, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5051)
    0.01069147 = product of:
      0.02138294 = sum of:
        0.02138294 = weight(_text_:retrieval in 5051) [ClassicSimilarity], result of:
          0.02138294 = score(doc=5051,freq=2.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.16710453 = fieldWeight in 5051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5051)
      0.5 = coord(1/2)
  0.625 = coord(5/8)
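
The arithmetic in the explain tree above can be reproduced by hand. The following is a minimal sketch, not the search engine's own code: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the tf and idf values printed in the tree, and it takes queryNorm and fieldNorm as given inputs copied from the output. The helper names `idf`, `term_score` and `s` are hypothetical.

```python
import math

def idf(doc_freq, max_docs):
    """Inverse document frequency (assumed formula, matches the printed idf values)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                       # e.g. tf(freq=4.0) = 2.0
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # idf * queryNorm
    field_weight = tf * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Constants copied from the explain output for doc 5051.
MAX_DOCS = 44218
QUERY_NORM = 0.042302497
FIELD_NORM = 0.0390625

def s(freq, doc_freq):
    """Shorthand for a term score in doc 5051."""
    return term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM)

world = s(4.0, 2573)
wide = s(4.0, 1430)
web = s(6.0, 4597)
information = s(10.0, 20772)
retrieval = s(2.0, 5836) * 0.5  # nested clause carries its own coord(1/2)

# Sum the clause weights, then apply the outer coord(5/8).
doc_5051 = (world + wide + web + information + retrieval) * (5 / 8)
print(f"world={world:.8f} total={doc_5051:.8f}")
```

Under these assumptions the printed values should land close to the 0.04882569 and 0.11475631 shown in the tree; small differences in the last digits are expected because the original output was computed in single precision.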

Abstract: As illustrated by the World Wide Web, the volume of information in languages other than English has grown significantly in recent years. This highlights the importance of multilingual corpora. Much effort has been devoted to the compilation of multilingual corpora for the purpose of cross-lingual information retrieval and machine translation. Existing parallel corpora mostly involve European languages, such as English-French and English-Spanish. There is still a lack of parallel corpora between European languages and Asian languages. In the authors' previous work, an alignment method to identify one-to-one Chinese and English title pairs was developed to construct an English-Chinese parallel corpus that works automatically from the World Wide Web, and 100% precision and 87% recall were obtained. Careful analysis of these results has helped the authors to understand how the alignment method can be improved. A conceptual analysis was conducted, which includes the analysis of conceptual equivalent and conceptual information alternation in the aligned and nonaligned English-Chinese title pairs that are obtained by the alignment method. The result of the analysis not only reflects the characteristics of parallel corpora, but also gives insight into the strengths and weaknesses of the alignment method. In particular, conceptual alternation, such as omission and addition, is found to have a significant impact on the performance of the alignment method.
Footnote: Contribution to a special topic section on multilingual information systems
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.632-644

Cao, L.; Leong, M.-K.; Low, H.-B.: Searching heterogeneous multilingual bibliographic sources (1998) 0.10

0.1039435 = product of:
  0.207887 = sum of:
    0.055239964 = weight(_text_:world in 3564) [ClassicSimilarity], result of:
      0.055239964 = score(doc=3564,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.33973572 = fieldWeight in 3564, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0625 = fieldNorm(doc=3564)
    0.07340356 = weight(_text_:wide in 3564) [ClassicSimilarity], result of:
      0.07340356 = score(doc=3564,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.3916274 = fieldWeight in 3564, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0625 = fieldNorm(doc=3564)
    0.056317843 = weight(_text_:web in 3564) [ClassicSimilarity], result of:
      0.056317843 = score(doc=3564,freq=4.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.4079388 = fieldWeight in 3564, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=3564)
    0.022925617 = product of:
      0.045851234 = sum of:
        0.045851234 = weight(_text_:22 in 3564) [ClassicSimilarity], result of:
          0.045851234 = score(doc=3564,freq=2.0), product of:
            0.14813614 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042302497 = queryNorm
            0.30952093 = fieldWeight in 3564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3564)
      0.5 = coord(1/2)
  0.5 = coord(4/8)
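
The coordination factor explains why this record ranks below the first one even though its summed clause weights are higher. A small sketch using only the sums and coord fractions printed in the two explain trees; `final_score` is a hypothetical helper name, not part of any library.

```python
def final_score(clause_sum, matched, total):
    """Scale the summed clause weights by coord(matched/total)."""
    return clause_sum * (matched / total)

# (sum of clause weights, matched clauses, total clauses), copied from the trees
records = {
    5051: (0.1836101, 5, 8),
    3564: (0.207887, 4, 8),
}

scores = {doc: final_score(*parts) for doc, parts in records.items()}

# Highest score first: doc 5051 overtakes doc 3564 once coord is applied.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Doc 3564 has the larger sum (0.207887 against 0.1836101), but it matches only 4 of the 8 query clauses, so the 0.5 factor pulls it below doc 5051, which matches 5 of 8.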

Abstract: Proposes a Web-based architecture for searching distributed heterogeneous multi-Asian language bibliographic sources, and describes a successful pilot implementation of the system at the Chinese Library (CLib) system developed in Singapore and tested at 2 university libraries and a public library.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia

Yang, C.C.; Lam, W.: Introduction to the special topic section on multilingual information systems (2006) 0.09

0.089771524 = product of:
  0.17954305 = sum of:
    0.07175882 = weight(_text_:world in 5043) [ClassicSimilarity], result of:
      0.07175882 = score(doc=5043,freq=6.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.44132966 = fieldWeight in 5043, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=5043)
    0.05505267 = weight(_text_:wide in 5043) [ClassicSimilarity], result of:
      0.05505267 = score(doc=5043,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.29372054 = fieldWeight in 5043, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=5043)
    0.029867046 = weight(_text_:web in 5043) [ClassicSimilarity], result of:
      0.029867046 = score(doc=5043,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.21634221 = fieldWeight in 5043, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5043)
    0.022864517 = weight(_text_:information in 5043) [ClassicSimilarity], result of:
      0.022864517 = score(doc=5043,freq=14.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.3078936 = fieldWeight in 5043, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5043)
  0.5 = coord(4/8)

Abstract: The information available in languages other than English on the World Wide Web and global information systems is increasing significantly. According to some recent reports, the growth of non-English-speaking Internet users is significantly higher than the growth of English-speaking Internet users. Asia and Europe have become the two most-populated regions of Internet users. However, there are many different languages in the many different countries of Asia and Europe. And there are many countries in the world using more than one language as their official languages. For example, Chinese and English are official languages in Hong Kong SAR; English and French are official languages in Canada. In the global economy, information systems are no longer utilized by users in a single geographical region but all over the world. Information can be generated, stored, processed, and accessed in several different languages. All of this reveals the importance of research in multilingual information systems.
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.629-631

Talvensaari, T.; Juhola, M.; Laurikkala, J.; Järvelin, K.: Corpus-based cross-language information retrieval in retrieval of highly relevant documents (2007) 0.09

0.08817352 = product of:
  0.14107764 = sum of:
    0.034524977 = weight(_text_:world in 139) [ClassicSimilarity], result of:
      0.034524977 = score(doc=139,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.21233483 = fieldWeight in 139, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=139)
    0.045877226 = weight(_text_:wide in 139) [ClassicSimilarity], result of:
      0.045877226 = score(doc=139,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.24476713 = fieldWeight in 139, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=139)
    0.024889207 = weight(_text_:web in 139) [ClassicSimilarity], result of:
      0.024889207 = score(doc=139,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.18028519 = fieldWeight in 139, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=139)
    0.014403292 = weight(_text_:information in 139) [ClassicSimilarity], result of:
      0.014403292 = score(doc=139,freq=8.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.19395474 = fieldWeight in 139, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=139)
    0.02138294 = product of:
      0.04276588 = sum of:
        0.04276588 = weight(_text_:retrieval in 139) [ClassicSimilarity], result of:
          0.04276588 = score(doc=139,freq=8.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.33420905 = fieldWeight in 139, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=139)
      0.5 = coord(1/2)
  0.625 = coord(5/8)

Abstract: Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus-based cross-language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents.
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.3, S.322-334

Peters, C.; Picchi, E.: Across languages, across cultures : issues in multilinguality and digital libraries (1997) 0.08

0.08102267 = product of:
  0.16204534 = sum of:
    0.055239964 = weight(_text_:world in 1233) [ClassicSimilarity], result of:
      0.055239964 = score(doc=1233,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.33973572 = fieldWeight in 1233, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0625 = fieldNorm(doc=1233)
    0.07340356 = weight(_text_:wide in 1233) [ClassicSimilarity], result of:
      0.07340356 = score(doc=1233,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.3916274 = fieldWeight in 1233, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0625 = fieldNorm(doc=1233)
    0.016295465 = weight(_text_:information in 1233) [ClassicSimilarity], result of:
      0.016295465 = score(doc=1233,freq=4.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.21943474 = fieldWeight in 1233, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=1233)
    0.01710635 = product of:
      0.0342127 = sum of:
        0.0342127 = weight(_text_:retrieval in 1233) [ClassicSimilarity], result of:
          0.0342127 = score(doc=1233,freq=2.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.26736724 = fieldWeight in 1233, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=1233)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: With the recent rapid diffusion over the international computer networks of world-wide distributed document bases, the question of multilingual access and multilingual information retrieval is becoming increasingly relevant. We briefly discuss just some of the issues that must be addressed in order to implement a multilingual interface for a Digital Library system and describe our own approach to this problem.
Theme: Information Gateway

Powell, J.; Fox, E.A.: Multilingual federated searching across heterogeneous collections (1998) 0.06

0.063174844 = product of:
  0.16846626 = sum of:
    0.055239964 = weight(_text_:world in 1250) [ClassicSimilarity], result of:
      0.055239964 = score(doc=1250,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.33973572 = fieldWeight in 1250, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0625 = fieldNorm(doc=1250)
    0.07340356 = weight(_text_:wide in 1250) [ClassicSimilarity], result of:
      0.07340356 = score(doc=1250,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.3916274 = fieldWeight in 1250, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0625 = fieldNorm(doc=1250)
    0.039822727 = weight(_text_:web in 1250) [ClassicSimilarity], result of:
      0.039822727 = score(doc=1250,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.2884563 = fieldWeight in 1250, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=1250)
  0.375 = coord(3/8)

Abstract: This article describes a scalable system for searching heterogeneous multilingual collections on the World Wide Web. It details a markup language for describing the characteristics of a search engine and its interface, and a protocol for requesting word translations between languages.

Fulford, H.: Monolingual or multilingual web sites? : An exploratory study of UK SMEs (2000) 0.06
```
0.058151186 = product of:
  0.15506983 = sum of:
    0.034524977 = weight(_text_:world in 5561) [ClassicSimilarity], result of:
      0.034524977 = score(doc=5561,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.21233483 = fieldWeight in 5561, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5561)
    0.045877226 = weight(_text_:wide in 5561) [ClassicSimilarity], result of:
      0.045877226 = score(doc=5561,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.24476713 = fieldWeight in 5561, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5561)
    0.07466762 = weight(_text_:web in 5561) [ClassicSimilarity], result of:
      0.07466762 = score(doc=5561,freq=18.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.5408555 = fieldWeight in 5561, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5561)
  0.375 = coord(3/8)
```
Abstract: The strategic importance of the internet as a tool for penetrating global markets is increasingly being realized by UK-based SMEs (Small and Medium-sized Enterprises). This may be evidenced by the proliferation over the past few years of SME web sites promoting products and services, and more recently still by the growing number of SMEs offering facilities on their web sites for conducting business transactions online. In this paper, we report on an exploratory study considering the use being made of the world wide web by UK-based SMEs. The study is focussed on the strategies SMEs are employing to communicate via the web with an international client base. We investigate in particular the languages being used to present web content, considering specifically the extent to which English is being employed. Preliminary results obtained to date suggest that there is heavy reliance on the assumption that the language of the web is English. Based on the findings of our study, we discuss some of the performance and competition issues surrounding the use of foreign languages in business, and consider some of the possible barriers to SMEs creating multilingual web sites. We conclude by making some recommendations for SMEs endeavouring to establish a multilingual online presence, and note the strategic role to be played by web designers, IT consultants, business strategists, professional translators, and localization specialists to help achieve this presence effectively and professionally.

Peters, C.; Braschler, M.; Clough, P.: Multilingual information retrieval : from research to practice (2012) 0.06
```
0.05750146 = product of:
  0.11500292 = sum of:
    0.027619982 = weight(_text_:world in 361) [ClassicSimilarity], result of:
      0.027619982 = score(doc=361,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.16986786 = fieldWeight in 361, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.03125 = fieldNorm(doc=361)
    0.03670178 = weight(_text_:wide in 361) [ClassicSimilarity], result of:
      0.03670178 = score(doc=361,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.1958137 = fieldWeight in 361, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=361)
    0.022313485 = weight(_text_:information in 361) [ClassicSimilarity], result of:
      0.022313485 = score(doc=361,freq=30.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.3004734 = fieldWeight in 361, product of:
          5.477226 = tf(freq=30.0), with freq of:
            30.0 = termFreq=30.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=361)
    0.028367674 = product of:
      0.056735348 = sum of:
        0.056735348 = weight(_text_:retrieval in 361) [ClassicSimilarity], result of:
          0.056735348 = score(doc=361,freq=22.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.44337842 = fieldWeight in 361, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=361)
      0.5 = coord(1/2)
  0.5 = coord(4/8)
```
Abstract: We are living in a multilingual world and the diversity in languages which are used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also necessitate the adaptation of Information Retrieval (IR) methods to new, multilingual settings. Peters, Braschler and Clough present a comprehensive description of the technologies involved in designing and developing systems for Multilingual Information Retrieval (MLIR). They provide readers with broad coverage of the various issues involved in creating systems to make accessible digitally stored materials regardless of the language(s) they are written in. Details on Cross-Language Information Retrieval (CLIR) are also covered that help readers to understand how to develop retrieval systems that cross language boundaries. Their work is divided into six chapters and accompanies the reader step-by-step through the various stages involved in building, using and evaluating MLIR systems. The book concludes with some examples of recent applications that utilise MLIR technologies. Some of the techniques described have recently started to appear in commercial search systems, while others have the potential to be part of future incarnations. The book is intended for graduate students, scholars, and practitioners with a basic understanding of classical text retrieval methods. It offers guidelines and information on all aspects that need to be taken into consideration when building MLIR systems, while avoiding too many 'hands-on details' that could rapidly become obsolete. Thus it bridges the gap between the material covered by most of the classical IR textbooks and the novel requirements related to the acquisition and dissemination of information in whatever language it is stored.
Content: 1 Introduction; 2 Within-Language Information Retrieval; 3 Cross-Language Information Retrieval; 4 Interaction and User Interfaces; 5 Evaluation for Multilingual Information Retrieval Systems; 6 Applications of Multilingual Information Access

RSWK: Information-Retrieval-System / Mehrsprachigkeit / Abfrage / Zugriff
Subject: Information-Retrieval-System / Mehrsprachigkeit / Abfrage / Zugriff

Gey, F.C.; Kando, N.; Peters, C.: Cross-Language Information Retrieval : the way ahead (2005) 0.05

0.052204695 = product of:
  0.10440939 = sum of:
    0.04142997 = weight(_text_:world in 1018) [ClassicSimilarity], result of:
      0.04142997 = score(doc=1018,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.25480178 = fieldWeight in 1018, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=1018)
    0.029867046 = weight(_text_:web in 1018) [ClassicSimilarity], result of:
      0.029867046 = score(doc=1018,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.21634221 = fieldWeight in 1018, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=1018)
    0.014968341 = weight(_text_:information in 1018) [ClassicSimilarity], result of:
      0.014968341 = score(doc=1018,freq=6.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.20156369 = fieldWeight in 1018, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1018)
    0.018144025 = product of:
      0.03628805 = sum of:
        0.03628805 = weight(_text_:retrieval in 1018) [ClassicSimilarity], result of:
          0.03628805 = score(doc=1018,freq=4.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.2835858 = fieldWeight in 1018, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1018)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: This introductory paper covers not only the research content of the articles in this special issue of IP&M but attempts to characterize the state-of-the-art in the Cross-Language Information Retrieval (CLIR) domain. We present our view of some major directions for CLIR research in the future. In particular, we find that insufficient attention has been given to the Web as a resource for multilingual research, and to languages which are spoken by hundreds of millions of people in the world but have been mainly neglected by the CLIR research community. In addition, we find that most CLIR evaluation has focussed narrowly on the news genre to the exclusion of other important genres such as scientific and technical literature. The paper concludes by describing an ambitious 5-year research plan proposed by James Mayfield and Paul McNamee.
Source: Information processing and management. 41(2005) no.3, S.415-432

Larkey, L.S.; Connell, M.E.: Structured queries, language modelling, and relevance modelling in cross-language information retrieval (2005) 0.05

0.04866516 = product of:
  0.12977375 = sum of:
    0.045877226 = weight(_text_:wide in 1022) [ClassicSimilarity], result of:
      0.045877226 = score(doc=1022,freq=2.0), product of:
        0.18743214 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.042302497 = queryNorm
        0.24476713 = fieldWeight in 1022, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1022)
    0.012473618 = weight(_text_:information in 1022) [ClassicSimilarity], result of:
      0.012473618 = score(doc=1022,freq=6.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.16796975 = fieldWeight in 1022, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1022)
    0.071422905 = sum of:
      0.04276588 = weight(_text_:retrieval in 1022) [ClassicSimilarity], result of:
        0.04276588 = score(doc=1022,freq=8.0), product of:
          0.12796146 = queryWeight, product of:
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.042302497 = queryNorm
          0.33420905 = fieldWeight in 1022, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1022)
      0.028657023 = weight(_text_:22 in 1022) [ClassicSimilarity], result of:
        0.028657023 = score(doc=1022,freq=2.0), product of:
          0.14813614 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.042302497 = queryNorm
          0.19345059 = fieldWeight in 1022, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1022)
  0.375 = coord(3/8)
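
The explanation tree above can be reproduced by hand. The following is a minimal Python sketch, assuming only the decomposition that the tree itself displays: tf is the square root of the term frequency, queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the summed term scores are scaled by the coord factor. The function name `term_score` is made up for illustration; all numbers are copied from the listing for doc 1022.

```python
import math

QUERY_NORM = 0.042302497  # queryNorm, copied from the explanation tree
FIELD_NORM = 0.0390625    # fieldNorm(doc=1022), copied from the explanation tree


def term_score(freq, idf, query_norm=QUERY_NORM, field_norm=FIELD_NORM):
    """Score of a single query term: queryWeight * fieldWeight."""
    query_weight = idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# (freq, idf) pairs as listed for doc 1022
wide = term_score(2.0, 4.4307585)
information = term_score(6.0, 1.7554779)
retrieval = term_score(8.0, 3.024915)
term_22 = term_score(2.0, 3.5018296)

# "retrieval" and "22" form their own inner sum in the tree
raw_sum = wide + information + (retrieval + term_22)
coord = 3 / 8                      # coord(3/8): 3 of 8 query clauses matched
total = raw_sum * coord

print(f"sum = {raw_sum:.8f}, score = {total:.8f}")
```

Up to floating-point rounding, `total` should agree with the 0.04866516 shown at the top of the tree.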

Abstract: Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries--one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
Date: 26.12.2007 20:22:11
Source: Information processing and management. 41(2005) no.3, S.457-474

Freitas-Junior, H.R.; Ribeiro-Neto, B.A.; Freitas-Vale, R. de; Laender, A.H.F.; Lima, L.R.S. de: Categorization-driven cross-language retrieval of medical information (2006) 0.05

0.046336457 = product of:
  0.123563886 = sum of:
    0.024889207 = weight(_text_:web in 5282) [ClassicSimilarity], result of:
      0.024889207 = score(doc=5282,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.18028519 = fieldWeight in 5282, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5282)
    0.017640358 = weight(_text_:information in 5282) [ClassicSimilarity], result of:
      0.017640358 = score(doc=5282,freq=12.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.23754507 = fieldWeight in 5282, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5282)
    0.08103432 = sum of:
      0.05237729 = weight(_text_:retrieval in 5282) [ClassicSimilarity], result of:
        0.05237729 = score(doc=5282,freq=12.0), product of:
          0.12796146 = queryWeight, product of:
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.042302497 = queryNorm
          0.40932083 = fieldWeight in 5282, product of:
            3.4641016 = tf(freq=12.0), with freq of:
              12.0 = termFreq=12.0
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5282)
      0.028657023 = weight(_text_:22 in 5282) [ClassicSimilarity], result of:
        0.028657023 = score(doc=5282,freq=2.0), product of:
          0.14813614 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.042302497 = queryNorm
          0.19345059 = fieldWeight in 5282, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5282)
  0.375 = coord(3/8)

Abstract: The Web has become a large repository of documents (or pages) written in many different languages. In this context, traditional information retrieval (IR) techniques cannot be used whenever the user query and the documents being retrieved are in different languages. To address this problem, new cross-language information retrieval (CLIR) techniques have been proposed. In this work, we describe a method for cross-language retrieval of medical information. This method combines query terms and related medical concepts obtained automatically through a categorization procedure. The medical concepts are used to create a linguistic abstraction that allows retrieval of information in a language-independent way, minimizing linguistic problems such as polysemy. To evaluate our method, we carried out experiments using the OHSUMED test collection, whose documents are written in English, with queries expressed in Portuguese, Spanish, and French. The results indicate that our cross-language retrieval method is as effective as a standard vector space model algorithm operating on queries and documents in the same language. Further, our results are better than previous results in the literature.
Date: 22. 7.2006 16:46:36
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.501-510

Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.04

0.043970507 = product of:
  0.11725468 = sum of:
    0.042238384 = weight(_text_:web in 4436) [ClassicSimilarity], result of:
      0.042238384 = score(doc=4436,freq=4.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.3059541 = fieldWeight in 4436, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=4436)
    0.014968341 = weight(_text_:information in 4436) [ClassicSimilarity], result of:
      0.014968341 = score(doc=4436,freq=6.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.20156369 = fieldWeight in 4436, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4436)
    0.060047954 = sum of:
      0.025659526 = weight(_text_:retrieval in 4436) [ClassicSimilarity], result of:
        0.025659526 = score(doc=4436,freq=2.0), product of:
          0.12796146 = queryWeight, product of:
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.042302497 = queryNorm
          0.20052543 = fieldWeight in 4436, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.024915 = idf(docFreq=5836, maxDocs=44218)
            0.046875 = fieldNorm(doc=4436)
      0.034388427 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
        0.034388427 = score(doc=4436,freq=2.0), product of:
          0.14813614 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.042302497 = queryNorm
          0.23214069 = fieldWeight in 4436, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=4436)
  0.375 = coord(3/8)
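
The idf values in these trees appear consistent with the formula idf = 1 + ln(maxDocs / (docFreq + 1)). The short Python sketch below checks that assumption against the docFreq and maxDocs figures printed in the listing; the function name `classic_idf` is a label chosen here, not an identifier from the search system.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Assumed formula: idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


MAX_DOCS = 44218  # maxDocs as printed in the listing

# term -> (docFreq, idf) as printed in the listing
listed = {
    "wide": (1430, 4.4307585),
    "web": (4597, 3.2635105),
    "information": (20772, 1.7554779),
    "retrieval": (5836, 3.024915),
    "22": (3622, 3.5018296),
}

for term, (doc_freq, listed_idf) in listed.items():
    computed = classic_idf(doc_freq, MAX_DOCS)
    print(f"{term:12s} computed={computed:.7f} listed={listed_idf}")
```

Rare terms such as "wide" receive a higher idf than common ones such as "information", which is why a single match on a rare term can outweigh many matches on a common one.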

Abstract: The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for a quantitative study of online and real-time Web page translation.
Date: 16. 2.2000 14:22:39
Source: Journal of the American Society for Information Science. 51(2000) no.3, S.281-296

Qin, J.; Zhou, Y.; Chau, M.; Chen, H.: Multilingual Web retrieval : an experiment in English-Chinese business intelligence (2006) 0.04
```
0.036919564 = product of:
  0.098452166 = sum of:
    0.060965855 = weight(_text_:web in 5054) [ClassicSimilarity], result of:
      0.060965855 = score(doc=5054,freq=12.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.4416067 = fieldWeight in 5054, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5054)
    0.01610337 = weight(_text_:information in 5054) [ClassicSimilarity], result of:
      0.01610337 = score(doc=5054,freq=10.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.21684799 = fieldWeight in 5054, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5054)
    0.02138294 = product of:
      0.04276588 = sum of:
        0.04276588 = weight(_text_:retrieval in 5054) [ClassicSimilarity], result of:
          0.04276588 = score(doc=5054,freq=8.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.33420905 = fieldWeight in 5054, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5054)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
```

Abstract: As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted that combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.
Footnote: Contribution to a special topic section on multilingual information systems
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.671-683

Mitchell, J.S.; Zeng, M.L.; Zumer, M.: Modeling classification systems in multicultural and multilingual contexts (2012) 0.04

0.03585499 = product of:
  0.09561331 = sum of:
    0.04142997 = weight(_text_:world in 1967) [ClassicSimilarity], result of:
      0.04142997 = score(doc=1967,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.25480178 = fieldWeight in 1967, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=1967)
    0.029867046 = weight(_text_:web in 1967) [ClassicSimilarity], result of:
      0.029867046 = score(doc=1967,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.21634221 = fieldWeight in 1967, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=1967)
    0.02431629 = product of:
      0.04863258 = sum of:
        0.04863258 = weight(_text_:22 in 1967) [ClassicSimilarity], result of:
          0.04863258 = score(doc=1967,freq=4.0), product of:
            0.14813614 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042302497 = queryNorm
            0.32829654 = fieldWeight in 1967, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1967)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
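
In this tree the coord factor appears twice: once for the inner group, where one of two sub-clauses matched, and once for the whole query, where three of eight clauses matched. A minimal Python sketch of that nesting, using the per-term scores exactly as printed for doc 1967 (the helper name `coord` simply mirrors the label in the listing):

```python
def coord(matched, total):
    """Fraction of clauses that matched, as in coord(matched/total)."""
    return matched / total


# Per-term scores copied from the explanation tree for doc 1967
world = 0.04142997
web = 0.029867046
term_22 = 0.04863258

# Inner group: only one of its two sub-clauses matched, so its sum is halved
inner = term_22 * coord(1, 2)

# Outer query: three of eight top-level clauses matched
score = (world + web + inner) * coord(3, 8)

print(f"inner = {inner:.8f}, score = {score:.8f}")
```

The result should match the 0.03585499 at the top of the tree up to rounding, showing that a document is penalized both for the sub-clauses and for the top-level clauses it fails to match.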

Abstract: This paper reports on the second part of an initiative of the authors on researching classification systems with the conceptual model defined by the Functional Requirements for Subject Authority Data (FRSAD) final report. In an earlier study, the authors explored whether the FRSAD conceptual model could be extended beyond subject authority data to model classification data. The focus of the current study is to determine if classification data modeled using FRSAD can be used to solve real-world discovery problems in multicultural and multilingual contexts. The paper discusses the relationships between entities (same type or different types) in the context of classification systems that involve multiple translations and/or multicultural implementations. Results of two case studies are presented in detail: (a) two instances of the DDC (DDC 22 in English, and the Swedish-English mixed translation of DDC 22), and (b) Chinese Library Classification. The use cases of conceptual models in practice are also discussed.
Source: Beyond libraries - subject metadata in the digital environment and semantic web. IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn

Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.04

0.035102777 = product of:
  0.09360741 = sum of:
    0.049778413 = weight(_text_:web in 4215) [ClassicSimilarity], result of:
      0.049778413 = score(doc=4215,freq=8.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.36057037 = fieldWeight in 4215, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4215)
    0.017640358 = weight(_text_:information in 4215) [ClassicSimilarity], result of:
      0.017640358 = score(doc=4215,freq=12.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.23754507 = fieldWeight in 4215, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4215)
    0.026188646 = product of:
      0.05237729 = sum of:
        0.05237729 = weight(_text_:retrieval in 4215) [ClassicSimilarity], result of:
          0.05237729 = score(doc=4215,freq=12.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.40932083 = fieldWeight in 4215, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4215)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalents in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval.
Source: Information processing and management. 45(2009) no.2, S.246-262

Markó, K.G.: Foundation, implementation and evaluation of the MorphoSaurus system (2008) 0.03
```
0.034947813 = product of:
  0.069895625 = sum of:
    0.024167484 = weight(_text_:world in 4415) [ClassicSimilarity], result of:
      0.024167484 = score(doc=4415,freq=2.0), product of:
        0.16259687 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.042302497 = queryNorm
        0.14863437 = fieldWeight in 4415, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4415)
    0.017422445 = weight(_text_:web in 4415) [ClassicSimilarity], result of:
      0.017422445 = score(doc=4415,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.12619963 = fieldWeight in 4415, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4415)
    0.013337635 = weight(_text_:information in 4415) [ClassicSimilarity], result of:
      0.013337635 = score(doc=4415,freq=14.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.1796046 = fieldWeight in 4415, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4415)
    0.014968057 = product of:
      0.029936114 = sum of:
        0.029936114 = weight(_text_:retrieval in 4415) [ClassicSimilarity], result of:
          0.029936114 = score(doc=4415,freq=8.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.23394634 = fieldWeight in 4415, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4415)
      0.5 = coord(1/2)
  0.5 = coord(4/8)
```

Abstract: This work proposes an approach which is intended to meet the particular challenges of Medical Language Processing, in particular medical information retrieval. At its core lies a new type of dictionary, in which the entries are equivalence classes of subwords, i.e., semantically minimal units. These equivalence classes capture intralingual as well as interlingual synonymy. As equivalence classes abstract away from subtle particularities within and between languages and reference to them is realized via a language-independent conceptual system, they form an interlingua. In this work, the theoretical foundations of this approach are elaborated on. Furthermore, design considerations of applications based on the subword methodology are drawn up and showcase implementations are evaluated in detail. Starting with the introduction of Medical Linguistics as a field of active research in Chapter two, its consideration as a domain separate from general linguistics is motivated. In particular, morphological phenomena inherent to medical language are described in more detail, which leads to an alternative view on medical terms and the introduction of the notion of subwords. Chapter three describes the formal foundation of subwords and the underlying linguistic declarative as well as procedural knowledge. An implementation of the subword model for the medical domain, the MorphoSaurus system, is presented in Chapter four. Emphasis is placed on the multilingual aspect of the proposed approach, including English, German, and Portuguese. The automatic acquisition of (medical) subwords for other languages (Spanish, French, and Swedish), and their integration in already available resources is described in the fifth Chapter.
The proper handling of acronyms plays a crucial role in medical texts, e.g. in patient records, as well as in scientific literature. Chapter six presents an approach in which acronyms are automatically acquired from (bio-)medical literature. Furthermore, acronyms and their definitions in different languages are linked to each other using the MorphoSaurus text processing system. Automatic word sense disambiguation is still one of the most challenging tasks in Natural Language Processing. In Chapter seven, cross-lingual considerations lead to a new methodology for automatic disambiguation applied to subwords. Beginning with Chapter eight, a series of applications based on MorphoSaurus are introduced. Firstly, the implementation of the subword approach within a cross-language information retrieval setting for the medical domain is described and evaluated on standard test document collections. In Chapter nine, this methodology is extended to multilingual information retrieval in the Web, for which user queries are translated into target languages based on the segmentation into subwords and their interlingual mappings. The cross-lingual, automatic assignment of document descriptors to documents is the topic of Chapter ten. A large-scale evaluation of a heuristic, as well as a statistical algorithm is carried out using a prominent medical thesaurus as a controlled vocabulary. In Chapter eleven, it will be shown how MorphoSaurus can be used to map monolingual, lexical resources across different languages. As a result, a large multilingual medical lexicon with high coverage and complete lexical information is built and evaluated against a comparable, already available and commonly used lexical repository for the medical domain. Chapter twelve sketches a few applications based on MorphoSaurus. The generality and applicability of the subword approach to other domains is outlined, and proof-of-concepts in real-world scenarios are presented.
Finally, Chapter thirteen recapitulates the most important aspects of MorphoSaurus and the potential benefit of its employment in medical information systems is carefully assessed, both for medical experts in their everyday life and with regard to health care consumers and their existential information needs.
Source: Subword indexing, lexical learning and word sense disambiguation for medical crosslanguage information retrieval

Cheng, P.J.; Teng, J.W.; Chen, R.C.; Wang, J.H.; Lu, W.H.; Chien, L.F.: Translating unknown queries with Web corpora for cross-language information retrieval (2004) 0.03

0.034324005 = product of:
  0.09153068 = sum of:
    0.049778413 = weight(_text_:web in 4131) [ClassicSimilarity], result of:
      0.049778413 = score(doc=4131,freq=2.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.36057037 = fieldWeight in 4131, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.078125 = fieldNorm(doc=4131)
    0.02036933 = weight(_text_:information in 4131) [ClassicSimilarity], result of:
      0.02036933 = score(doc=4131,freq=4.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.27429342 = fieldWeight in 4131, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=4131)
    0.02138294 = product of:
      0.04276588 = sum of:
        0.04276588 = weight(_text_:retrieval in 4131) [ClassicSimilarity], result of:
          0.04276588 = score(doc=4131,freq=2.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.33420905 = fieldWeight in 4131, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=4131)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.03

0.03360464 = product of:
  0.08961237 = sum of:
    0.051731247 = weight(_text_:web in 2342) [ClassicSimilarity], result of:
      0.051731247 = score(doc=2342,freq=6.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.37471575 = fieldWeight in 2342, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2342)
    0.012221599 = weight(_text_:information in 2342) [ClassicSimilarity], result of:
      0.012221599 = score(doc=2342,freq=4.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.16457605 = fieldWeight in 2342, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2342)
    0.025659526 = product of:
      0.05131905 = sum of:
        0.05131905 = weight(_text_:retrieval in 2342) [ClassicSimilarity], result of:
          0.05131905 = score(doc=2342,freq=8.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.40105087 = fieldWeight in 2342, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2342)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. For English-German, machine translation was also utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query translation were better than in the traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.

Wang, J.-H.; Teng, J.-W.; Lu, W.-H.; Chien, L.-F.: Exploiting the Web as the multilingual corpus for unknown query translation (2006) 0.03

0.032824576 = product of:
  0.0875322 = sum of:
    0.05973409 = weight(_text_:web in 5050) [ClassicSimilarity], result of:
      0.05973409 = score(doc=5050,freq=8.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.43268442 = fieldWeight in 5050, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5050)
    0.014968341 = weight(_text_:information in 5050) [ClassicSimilarity], result of:
      0.014968341 = score(doc=5050,freq=6.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.20156369 = fieldWeight in 5050, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5050)
    0.012829763 = product of:
      0.025659526 = sum of:
        0.025659526 = weight(_text_:retrieval in 5050) [ClassicSimilarity], result of:
          0.025659526 = score(doc=5050,freq=2.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.20052543 = fieldWeight in 5050, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5050)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
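
The arithmetic in the explain tree above can be retraced with a short sketch. It assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and not part of the search system, and all numeric inputs are copied from the tree for doc 5050.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in each weight(...) node of the tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 5050.
QUERY_NORM = 0.042302497
MAX_DOCS = 44218
FIELD_NORM = 0.046875

web = term_score(8.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
information = term_score(6.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "retrieval" clause sits in a nested query where 1 of 2 clauses matched.
retrieval = term_score(2.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5

# 3 of the 8 top-level query clauses matched, hence coord(3/8).
total = (web + information + retrieval) * (3 / 8)
print(f"web={web:.6f} information={information:.6f} "
      f"retrieval={retrieval:.6f} total={total:.6f}")
```

Small differences in the last digits are to be expected, since Lucene computes these values in single-precision floats.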

Abstract: Users' cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this article, the authors investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. They propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms, and in assisting bilingual lexicon construction for a real digital library system.
Footnote: Contribution to a special topic section on multilingual information systems
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.5, pp. 660-670

Jensen, N.: Evaluierung von mehrsprachigem Web-Retrieval : Experimente mit dem EuroGOV-Korpus im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.03

0.030973136 = product of:
  0.08259503 = sum of:
    0.051731247 = weight(_text_:web in 5964) [ClassicSimilarity], result of:
      0.051731247 = score(doc=5964,freq=6.0), product of:
        0.13805464 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.042302497 = queryNorm
        0.37471575 = fieldWeight in 5964, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5964)
    0.008641975 = weight(_text_:information in 5964) [ClassicSimilarity], result of:
      0.008641975 = score(doc=5964,freq=2.0), product of:
        0.0742611 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.042302497 = queryNorm
        0.116372846 = fieldWeight in 5964, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5964)
    0.022221804 = product of:
      0.044443607 = sum of:
        0.044443607 = weight(_text_:retrieval in 5964) [ClassicSimilarity], result of:
          0.044443607 = score(doc=5964,freq=6.0), product of:
            0.12796146 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.042302497 = queryNorm
            0.34732026 = fieldWeight in 5964, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5964)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: This article describes the experiments of the University of Hildesheim in the first Web Track of the CLEF initiative (WebCLEF) in 2005. Participation provided experience with a multilingual web corpus (EuroGOV) in preprocessing, topic and query development, language-independent indexing methods, and multilingual retrieval strategies. Because of the large size of the corpus and the time constraints, multilingual indexes were built. The article describes the approach taken by the University of Hildesheim and the results of the officially submitted runs as well as of further experiments. For the Multilingual Task, the best result in CLEF was achieved.
Source: Effektive Information Retrieval Verfahren in Theorie und Praxis: ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005. Eds.: T. Mandl and C. Womser-Hacker
