Search (251 results, page 1 of 13)

  • theme_ss:"Multilinguale Probleme"
  1. Lonsdale, D.; Mitamura, T.; Nyberg, E.: Acquisition of large lexicons for practical knowledge-based MT (1994/95) 0.08
    0.07804743 = product of:
      0.17560673 = sum of:
        0.07118686 = weight(_text_:applications in 7409) [ClassicSimilarity], result of:
          0.07118686 = score(doc=7409,freq=4.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.41273528 = fieldWeight in 7409, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.046875 = fieldNorm(doc=7409)
        0.021062955 = weight(_text_:of in 7409) [ClassicSimilarity], result of:
          0.021062955 = score(doc=7409,freq=22.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.34381276 = fieldWeight in 7409, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=7409)
        0.042483397 = weight(_text_:systems in 7409) [ClassicSimilarity], result of:
          0.042483397 = score(doc=7409,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.35286134 = fieldWeight in 7409, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=7409)
        0.040873505 = weight(_text_:software in 7409) [ClassicSimilarity], result of:
          0.040873505 = score(doc=7409,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.2629875 = fieldWeight in 7409, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.046875 = fieldNorm(doc=7409)
      0.44444445 = coord(4/9)
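The indented block above is Lucene "explain" output for the classic TF-IDF similarity: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coordination factor. A minimal sketch of that arithmetic, using the numbers from the first result; the formulas follow Lucene's ClassicSimilarity, but the code is illustrative only and not the engine's implementation:

```python
import math

# Minimal sketch of the arithmetic behind Lucene's ClassicSimilarity
# "explain" trees shown above (illustrative only, not the engine's code).

def tf(freq):                       # term-frequency factor
    return math.sqrt(freq)

def idf(doc_freq, max_docs):        # inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm;  fieldWeight = tf * idf * fieldNorm
    query_weight = idf(doc_freq, max_docs) * query_norm
    field_weight = tf(freq) * idf(doc_freq, max_docs) * field_norm
    return query_weight * field_weight

# Values taken from the first explain tree (term "applications", doc 7409):
w = term_weight(freq=4.0, doc_freq=1471, max_docs=44218,
                query_norm=0.03917671, field_norm=0.046875)
print(w)                            # ~0.07118686, as shown above

# The document score sums the matching term weights and scales the sum by
# the coordination factor coord(matching terms / query terms):
term_weights = [0.07118686, 0.021062955, 0.042483397, 0.040873505]
print(sum(term_weights) * (4 / 9))  # ~0.07804743, the score of result 1
```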
    
    Abstract
    Although knowledge-based MT systems have the potential to achieve high translation accuracy, each successful application system requires a large amount of hand-coded lexical knowledge. Systems like KBMT-89 and its descendants have demonstrated how knowledge-based translation can produce good results in technical domains with tractable domain semantics. Nevertheless, the magnitude of the development task for large-scale applications with tens of thousands of domain concepts precludes a purely hand-crafted approach. The current challenge for the next generation of knowledge-based MT systems is to utilize online textual resources and corpus analysis software in order to automate the most laborious aspects of the knowledge acquisition process. This partial automation can in turn maximize the productivity of human knowledge engineers and help to make large-scale applications of knowledge-based MT a viable approach. Discusses the corpus-based knowledge acquisition methodology used in KANT, a knowledge-based translation system for multilingual document production. This methodology can be generalized beyond the KANT interlingua approach for use with any system that requires similar kinds of knowledge
  2. Schubert, K.: Parameters for the design of an intermediate language for multilingual thesauri (1995) 0.06
    0.06359902 = product of:
      0.14309779 = sum of:
        0.01960283 = weight(_text_:of in 2092) [ClassicSimilarity], result of:
          0.01960283 = score(doc=2092,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.31997898 = fieldWeight in 2092, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2092)
        0.057231534 = weight(_text_:systems in 2092) [ClassicSimilarity], result of:
          0.057231534 = score(doc=2092,freq=8.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.47535738 = fieldWeight in 2092, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2092)
        0.047685754 = weight(_text_:software in 2092) [ClassicSimilarity], result of:
          0.047685754 = score(doc=2092,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.30681872 = fieldWeight in 2092, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2092)
        0.018577661 = product of:
          0.037155323 = sum of:
            0.037155323 = weight(_text_:22 in 2092) [ClassicSimilarity], result of:
              0.037155323 = score(doc=2092,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.2708308 = fieldWeight in 2092, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2092)
          0.5 = coord(1/2)
      0.44444445 = coord(4/9)
    
    Abstract
    The architecture of multilingual software systems is sometimes centred around an intermediate language. The question analyzed here is to what extent this approach can be useful for multilingual thesauri, in particular with regard to the functionality the thesaurus is designed to fulfil. Both the runtime use and the construction and maintenance of the system are taken into consideration. Adopting the perspective of general language technology makes it possible to draw on experience from a broader range of fields beyond thesaurus design itself, as well as to consider the possibility of using a thesaurus as a knowledge module in various systems which process natural language. Therefore the features which thesauri and other natural-language processing systems have in common are emphasized, especially at the level of systems design and their core functionality
    Source
    Knowledge organization. 22(1995) nos.3/4, S.136-140
  3. Multilingual information management : current levels and future abilities. A report commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency, April 1999 (1999) 0.05
    0.05319977 = product of:
      0.119699486 = sum of:
        0.05812384 = weight(_text_:applications in 6068) [ClassicSimilarity], result of:
          0.05812384 = score(doc=6068,freq=6.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.33699697 = fieldWeight in 6068, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03125 = fieldNorm(doc=6068)
        0.011201616 = weight(_text_:of in 6068) [ClassicSimilarity], result of:
          0.011201616 = score(doc=6068,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.18284513 = fieldWeight in 6068, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=6068)
        0.023125032 = weight(_text_:systems in 6068) [ClassicSimilarity], result of:
          0.023125032 = score(doc=6068,freq=4.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.19207339 = fieldWeight in 6068, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=6068)
        0.027249003 = weight(_text_:software in 6068) [ClassicSimilarity], result of:
          0.027249003 = score(doc=6068,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.17532499 = fieldWeight in 6068, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03125 = fieldNorm(doc=6068)
      0.44444445 = coord(4/9)
    
    Abstract
    Over the past 50 years, a variety of language-related capabilities has been developed in machine translation, information retrieval, speech recognition, text summarization, and so on. These applications rest upon a set of core techniques such as language modeling, information extraction, parsing, generation, and multimedia planning and integration; and they involve methods using statistics, rules, grammars, lexicons, ontologies, training techniques, and so on. It is a puzzling fact that although all of this work deals with language in some form or other, the major applications have each developed a separate research field. For example, there is no reason why speech recognition techniques involving n-grams and hidden Markov models could not have been used in machine translation 15 years earlier than they were, or why some of the lexical and semantic insights from the subarea called Computational Linguistics are still not used in information retrieval.
    This picture will rapidly change. The twin challenges of massive information overload via the web and ubiquitous computers present us with an unavoidable task: developing techniques to handle multilingual and multi-modal information robustly and efficiently, with as high quality performance as possible. The most effective way for us to address such a mammoth task, and to ensure that our various techniques and applications fit together, is to start talking across the artificial research boundaries. Extending the current technologies will require integrating the various capabilities into multi-functional and multi-lingual natural language systems. However, at this time there is no clear vision of how these technologies could or should be assembled into a coherent framework. What would be involved in connecting a speech recognition system to an information retrieval engine, and then using machine translation and summarization software to process the retrieved text? How can traditional parsing and generation be enhanced with statistical techniques? What would be the effect of carefully crafted lexicons on traditional information retrieval? At which points should machine translation be interleaved within information retrieval systems to enable multilingual processing?
  4. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.05
    0.045119576 = product of:
      0.13535872 = sum of:
        0.08389453 = weight(_text_:applications in 2027) [ClassicSimilarity], result of:
          0.08389453 = score(doc=2027,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.4864132 = fieldWeight in 2027, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
        0.010584532 = weight(_text_:of in 2027) [ClassicSimilarity], result of:
          0.010584532 = score(doc=2027,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 2027, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
        0.040879667 = weight(_text_:systems in 2027) [ClassicSimilarity], result of:
          0.040879667 = score(doc=2027,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.339541 = fieldWeight in 2027, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
      0.33333334 = coord(3/9)
    
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; vol. 9088
  5. Subirats, I.; Prasad, A.R.D.; Keizer, J.; Bagdanov, A.: Implementation of rich metadata formats and semantic tools using DSpace (2008) 0.04
    0.04260672 = product of:
      0.095865116 = sum of:
        0.04745791 = weight(_text_:applications in 2656) [ClassicSimilarity], result of:
          0.04745791 = score(doc=2656,freq=4.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.27515686 = fieldWeight in 2656, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03125 = fieldNorm(doc=2656)
        0.014666359 = weight(_text_:of in 2656) [ClassicSimilarity], result of:
          0.014666359 = score(doc=2656,freq=24.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.23940048 = fieldWeight in 2656, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=2656)
        0.023125032 = weight(_text_:systems in 2656) [ClassicSimilarity], result of:
          0.023125032 = score(doc=2656,freq=4.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.19207339 = fieldWeight in 2656, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=2656)
        0.010615807 = product of:
          0.021231614 = sum of:
            0.021231614 = weight(_text_:22 in 2656) [ClassicSimilarity], result of:
              0.021231614 = score(doc=2656,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.15476047 = fieldWeight in 2656, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2656)
          0.5 = coord(1/2)
      0.44444445 = coord(4/9)
    
    Abstract
    This poster explores the customization of DSpace to allow the use of the AGRIS Application Profile metadata standard and the AGROVOC thesaurus. The objective is the adaptation of DSpace, through the least invasive code changes, either in the form of plug-ins or add-ons, to the specific needs of the Agricultural Sciences and Technology community. Metadata standards such as AGRIS AP, and Knowledge Organization Systems such as the AGROVOC thesaurus, provide mechanisms for sharing information in a standardized manner by recommending the use of common semantics and interoperable syntax (Subirats et al., 2007). AGRIS AP was created to enhance the description, exchange and subsequent retrieval of agricultural Document-like Information Objects (DLIOs). It is a metadata schema which draws from metadata standards such as Dublin Core (DC), the Australian Government Locator Service Metadata (AGLS) and the Agricultural Metadata Element Set (AgMES) namespaces. It allows sharing of information across dispersed bibliographic systems (FAO, 2005). AGROVOC is a multilingual structured thesaurus covering agricultural and related domains. Its main role is to standardize the indexing process in order to make searching simpler and more efficient. AGROVOC is developed by FAO (Lauser et al., 2006). The customization of DSpace is taking place in several phases. First, the AGRIS AP metadata schema was mapped onto the DSpace metadata model, with several enhancements implemented to support AGRIS AP elements. Next, AGROVOC will be integrated as a controlled vocabulary accessed through a local SKOS or OWL file. Eventually the system will be configurable to access AGROVOC through local files or remotely via web services. Finally, spell checking and tooltips will be incorporated in the user interface to support metadata editing. Adapting DSpace to support AGRIS AP and annotation with the semantically rich AGROVOC thesaurus transforms it into a powerful, domain-specific system for annotation and exchange of bibliographic metadata in the agricultural domain.
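A minimal sketch of the kind of SKOS-based vocabulary access mentioned above (reading a thesaurus such as AGROVOC from a local SKOS file and collecting its multilingual preferred labels), assuming the rdflib library and a hypothetical local file name; this is illustrative only and not the DSpace add-on described in the poster:

```python
from rdflib import Graph
from rdflib.namespace import SKOS

# Sketch: read a locally stored SKOS vocabulary and collect the multilingual
# preferred labels per concept, e.g. to feed value suggestions when editing
# a metadata field. The file name "agrovoc.rdf" is a placeholder.
g = Graph()
g.parse("agrovoc.rdf")

labels = {}                          # concept URI -> {language: prefLabel}
for concept, _, label in g.triples((None, SKOS.prefLabel, None)):
    labels.setdefault(str(concept), {})[label.language] = str(label)

# Example use: show the English and Spanish labels of one concept.
uri, translations = next(iter(labels.items()))
print(uri, translations.get("en"), translations.get("es"))
```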
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  6. Muraszkiewicz, M.; Rybinski, H.; Struk, W.: Software problems of merging multilingual thesauri (1996) 0.04
    0.042445045 = product of:
      0.12733513 = sum of:
        0.018332949 = weight(_text_:of in 1130) [ClassicSimilarity], result of:
          0.018332949 = score(doc=1130,freq=6.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2992506 = fieldWeight in 1130, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=1130)
        0.040879667 = weight(_text_:systems in 1130) [ClassicSimilarity], result of:
          0.040879667 = score(doc=1130,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.339541 = fieldWeight in 1130, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.078125 = fieldNorm(doc=1130)
        0.068122506 = weight(_text_:software in 1130) [ClassicSimilarity], result of:
          0.068122506 = score(doc=1130,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.43831247 = fieldWeight in 1130, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.078125 = fieldNorm(doc=1130)
      0.33333334 = coord(3/9)
    
    Source
    Compatibility and integration of order systems: Research Seminar Proceedings of the TIP/ISKO Meeting, Warsaw, 13-15 September 1995
  7. Automated systems for access to multilingual and multiscript library materials : Proceedings of the ... IFLA satellite meeting ... Madrid, August 18-19, 1993 (1994) 0.04
    0.039862238 = product of:
      0.119586706 = sum of:
        0.010584532 = weight(_text_:of in 7705) [ClassicSimilarity], result of:
          0.010584532 = score(doc=7705,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 7705, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=7705)
        0.040879667 = weight(_text_:systems in 7705) [ClassicSimilarity], result of:
          0.040879667 = score(doc=7705,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.339541 = fieldWeight in 7705, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.078125 = fieldNorm(doc=7705)
        0.068122506 = weight(_text_:software in 7705) [ClassicSimilarity], result of:
          0.068122506 = score(doc=7705,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.43831247 = fieldWeight in 7705, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.078125 = fieldNorm(doc=7705)
      0.33333334 = coord(3/9)
    
    Content
    Contains, among others: CHACHRA, V.: Subject access in an automated multithesaurus and multilingual environment; GAVIN, T.A.G.: Recent developments in the MINISIS computer software for developing countries
  8. Smith, R.: National bibliographies on CD-ROM : development of a common approach (1994) 0.04
    0.036927555 = product of:
      0.11078266 = sum of:
        0.050336715 = weight(_text_:applications in 7406) [ClassicSimilarity], result of:
          0.050336715 = score(doc=7406,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2918479 = fieldWeight in 7406, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.046875 = fieldNorm(doc=7406)
        0.017962547 = weight(_text_:of in 7406) [ClassicSimilarity], result of:
          0.017962547 = score(doc=7406,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2932045 = fieldWeight in 7406, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=7406)
        0.042483397 = weight(_text_:systems in 7406) [ClassicSimilarity], result of:
          0.042483397 = score(doc=7406,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.35286134 = fieldWeight in 7406, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=7406)
      0.33333334 = coord(3/9)
    
    Abstract
    Describes the activities of the National Bibliographic Service, UK, in developing national bibliographies on CD-ROM. The key aims of the project, partly funded by the Commission of the European Communities under its IMPACT programme, are: to promote better and easier access to European and national bibliographies; to promote economies in library cataloguing through improved interchange of bibliographic records between European (and non-European) national libraries, irrespective of differing national MARC formats; and to develop shared approaches to strategies, applications and formats for bibliographic data on CD-ROM. Notes the involvement of 7 national libraries and describes: CD-ROM retrieval interface; MARC conversions; character sets; multilingual interface; links to online systems; links to local library systems; menu systems and compatibility; the Pilot CD-ROM database; and a framework for exchange of bibliographic records
  9. Peters, C.; Braschler, M.; Clough, P.: Multilingual information retrieval : from research to practice (2012) 0.04
    0.036516484 = product of:
      0.10954945 = sum of:
        0.04745791 = weight(_text_:applications in 361) [ClassicSimilarity], result of:
          0.04745791 = score(doc=361,freq=4.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.27515686 = fieldWeight in 361, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03125 = fieldNorm(doc=361)
        0.015841477 = weight(_text_:of in 361) [ClassicSimilarity], result of:
          0.015841477 = score(doc=361,freq=28.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.25858206 = fieldWeight in 361, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=361)
        0.046250064 = weight(_text_:systems in 361) [ClassicSimilarity], result of:
          0.046250064 = score(doc=361,freq=16.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.38414678 = fieldWeight in 361, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=361)
      0.33333334 = coord(3/9)
    
    Abstract
    We are living in a multilingual world, and the diversity of languages used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also necessitate the adaptation of Information Retrieval (IR) methods to new, multilingual settings. Peters, Braschler and Clough present a comprehensive description of the technologies involved in designing and developing systems for Multilingual Information Retrieval (MLIR). They provide readers with broad coverage of the various issues involved in creating systems to make digitally stored materials accessible regardless of the language(s) they are written in. Details on Cross-Language Information Retrieval (CLIR) are also covered to help readers understand how to develop retrieval systems that cross language boundaries. Their work is divided into six chapters and accompanies the reader step by step through the various stages involved in building, using and evaluating MLIR systems. The book concludes with some examples of recent applications that utilise MLIR technologies. Some of the techniques described have recently started to appear in commercial search systems, while others have the potential to be part of future incarnations. The book is intended for graduate students, scholars, and practitioners with a basic understanding of classical text retrieval methods. It offers guidelines and information on all aspects that need to be taken into consideration when building MLIR systems, while avoiding too many 'hands-on details' that could rapidly become obsolete. Thus it bridges the gap between the material covered by most of the classical IR textbooks and the novel requirements related to the acquisition and dissemination of information in whatever language it is stored.
    Content
    Contents: 1. Introduction - 2. Within-Language Information Retrieval - 3. Cross-Language Information Retrieval - 4. Interaction and User Interfaces - 5. Evaluation for Multilingual Information Retrieval Systems - 6. Applications of Multilingual Information Access
  10. Qin, J.; Zhou, Y.; Chau, M.; Chen, H.: Multilingual Web retrieval : an experiment in English-Chinese business intelligence (2006) 0.03
    0.033354186 = product of:
      0.10006255 = sum of:
        0.059322387 = weight(_text_:applications in 5054) [ClassicSimilarity], result of:
          0.059322387 = score(doc=5054,freq=4.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.34394607 = fieldWeight in 5054, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5054)
        0.011833867 = weight(_text_:of in 5054) [ClassicSimilarity], result of:
          0.011833867 = score(doc=5054,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.19316542 = fieldWeight in 5054, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5054)
        0.02890629 = weight(_text_:systems in 5054) [ClassicSimilarity], result of:
          0.02890629 = score(doc=5054,freq=4.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.24009174 = fieldWeight in 5054, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5054)
      0.33333334 = coord(3/9)
    
    Abstract
    As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted that combines phrasal translation, co-occurrence analysis, and pre- and post-translation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and post-translation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.
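A toy sketch of dictionary-based query translation with co-occurrence disambiguation, of the general kind described above; the bilingual dictionary, the co-occurrence counts, and the query are invented for illustration and are not data from the portal evaluated in the article. Pre- and post-translation query expansion would add related terms before and after this step.

```python
from itertools import product

# Toy bilingual dictionary: each English query term has several candidate
# Chinese translations (entries invented for illustration).
dictionary = {
    "market": ["市场", "集市"],
    "share":  ["份额", "分享"],
}

# Toy target-language co-occurrence counts, e.g. gathered from a corpus.
cooccurrence = {("市场", "份额"): 120, ("市场", "分享"): 5,
                ("集市", "份额"): 2,   ("集市", "分享"): 1}

def translate_query(terms):
    """Pick the translation combination whose terms co-occur most often."""
    candidates = [dictionary[t] for t in terms]
    best = max(product(*candidates),
               key=lambda combo: sum(cooccurrence.get((a, b), 0)
                                     for a in combo for b in combo if a != b))
    return list(best)

print(translate_query(["market", "share"]))   # -> ['市场', '份额']
```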
    Footnote
    Contribution to a special topic section on multilingual information systems
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.671-683
  11. Levergood, B.; Farrenkopf, S.; Frasnelli, E.: The specification of the language of the field and interoperability : cross-language access to catalogues and online libraries (CACAO) (2008) 0.03
    0.033270672 = product of:
      0.09981201 = sum of:
        0.07118686 = weight(_text_:applications in 2646) [ClassicSimilarity], result of:
          0.07118686 = score(doc=2646,freq=4.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.41273528 = fieldWeight in 2646, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.046875 = fieldNorm(doc=2646)
        0.012701439 = weight(_text_:of in 2646) [ClassicSimilarity], result of:
          0.012701439 = score(doc=2646,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.20732689 = fieldWeight in 2646, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2646)
        0.015923709 = product of:
          0.031847417 = sum of:
            0.031847417 = weight(_text_:22 in 2646) [ClassicSimilarity], result of:
              0.031847417 = score(doc=2646,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.23214069 = fieldWeight in 2646, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2646)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    The CACAO Project (Cross-language Access to Catalogues and Online Libraries) has been designed to implement natural language processing and cross-language information retrieval techniques to provide cross-language access to information in libraries, a critical issue in the linguistically diverse European Union. This project report addresses two metadata-related challenges for the library community in this context: "false friends" (identical words having different meanings in different languages) and term ambiguity. The possible solutions involve enriching the metadata with attributes specifying language or the source authority file, or associating potential search terms with classes in a classification system. The European Library will evaluate an early implementation of this work in late 2008.
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  12. Yu, L.-C.; Wu, C.-H.; Chang, R.-Y.; Liu, C.-H.; Hovy, E.H.: Annotation and verification of sense pools in OntoNotes (2010) 0.03
    0.030772962 = product of:
      0.092318885 = sum of:
        0.041947264 = weight(_text_:applications in 4236) [ClassicSimilarity], result of:
          0.041947264 = score(doc=4236,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2432066 = fieldWeight in 4236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4236)
        0.014968789 = weight(_text_:of in 4236) [ClassicSimilarity], result of:
          0.014968789 = score(doc=4236,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.24433708 = fieldWeight in 4236, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4236)
        0.03540283 = weight(_text_:systems in 4236) [ClassicSimilarity], result of:
          0.03540283 = score(doc=4236,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.29405114 = fieldWeight in 4236, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4236)
      0.33333334 = coord(3/9)
    
    Abstract
    The paper describes OntoNotes, a multilingual (English, Chinese and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes involves word senses that are grouped into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous senses of words, there is still no knowledge about whether two words in a pool are interchangeable in practical use. Therefore, this paper devises an unsupervised algorithm that incorporates Google n-grams and a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features are used to measure the degree of context mismatch for a substitution. The statistical test is then applied to determine whether the substitution is adequate based on the degree of mismatch. The proposed method is compared with a supervised method, namely Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method achieves performance comparable to that of the supervised method.
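A rough sketch of the kind of n-gram substitution test outlined above; the trigram counts, the one-word context window, and the threshold rule are invented here for illustration and do not reproduce the authors' algorithm or the statistical test they use:

```python
# Hypothetical trigram counts, standing in for the Google n-gram statistics
# used in the paper (all values invented for illustration).
trigram_counts = {
    ("a", "strong", "coffee"): 800, ("a", "powerful", "coffee"): 3,
    ("a", "big", "house"):     500, ("a", "large", "house"):    400,
}

def mismatch(left, word, right, counts):
    """Context-mismatch feature: rarer n-grams mean a worse fit in context."""
    return 1.0 / (1.0 + counts.get((left, word, right), 0))

def substitutable(original, substitute, left, right, counts, ratio=5.0):
    """Toy decision rule: accept if the substitute fits nearly as well."""
    return mismatch(left, substitute, right, counts) <= \
           ratio * mismatch(left, original, right, counts)

print(substitutable("big", "large", "a", "house", trigram_counts))        # True
print(substitutable("strong", "powerful", "a", "coffee", trigram_counts)) # False
```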
  13. Markó, K.G.: Foundation, implementation and evaluation of the MorphoSaurus system (2008) 0.03
    0.027896393 = product of:
      0.083689176 = sum of:
        0.050858356 = weight(_text_:applications in 4415) [ClassicSimilarity], result of:
          0.050858356 = score(doc=4415,freq=6.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.29487234 = fieldWeight in 4415, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4415)
        0.018522931 = weight(_text_:of in 4415) [ClassicSimilarity], result of:
          0.018522931 = score(doc=4415,freq=50.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.3023517 = fieldWeight in 4415, product of:
              7.071068 = tf(freq=50.0), with freq of:
                50.0 = termFreq=50.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4415)
        0.014307884 = weight(_text_:systems in 4415) [ClassicSimilarity], result of:
          0.014307884 = score(doc=4415,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.118839346 = fieldWeight in 4415, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4415)
      0.33333334 = coord(3/9)
    
    Abstract
    This work proposes an approach which is intended to meet the particular challenges of Medical Language Processing, in particular medical information retrieval. At its core lies a new type of dictionary, in which the entries are equivalence classes of subwords, i.e., semantically minimal units. These equivalence classes capture intralingual as well as interlingual synonymy. As equivalence classes abstract away from subtle particularities within and between languages, and reference to them is realized via a language-independent conceptual system, they form an interlingua. In this work, the theoretical foundations of this approach are elaborated on. Furthermore, design considerations for applications based on the subword methodology are drawn up, and showcase implementations are evaluated in detail. Starting with the introduction of Medical Linguistics as a field of active research in Chapter two, its treatment as a domain separate from general linguistics is motivated. In particular, morphological phenomena inherent to medical language are examined in more detail, which leads to an alternative view of medical terms and the introduction of the notion of subwords. Chapter three describes the formal foundation of subwords and the underlying linguistic declarative as well as procedural knowledge. An implementation of the subword model for the medical domain, the MorphoSaurus system, is presented in Chapter four. Emphasis is given to the multilingual aspect of the proposed approach, covering English, German, and Portuguese. The automatic acquisition of (medical) subwords for other languages (Spanish, French, and Swedish), and their integration into already available resources, is described in the fifth Chapter.
    The proper handling of acronyms plays a crucial role in medical texts, e.g. in patient records, as well as in scientific literature. Chapter six presents an approach in which acronyms are automatically acquired from (bio-)medical literature. Furthermore, acronyms and their definitions in different languages are linked to each other using the MorphoSaurus text processing system. Automatic word sense disambiguation is still one of the most challenging tasks in Natural Language Processing. In Chapter seven, cross-lingual considerations lead to a new methodology for automatic disambiguation applied to subwords. Beginning with Chapter eight, a series of applications based on MorphoSaurus is introduced. Firstly, the implementation of the subword approach within a cross-language information retrieval setting for the medical domain is described and evaluated on standard test document collections. In Chapter nine, this methodology is extended to multilingual information retrieval on the Web, for which user queries are translated into target languages based on the segmentation into subwords and their interlingual mappings. The cross-lingual, automatic assignment of document descriptors to documents is the topic of Chapter ten. A large-scale evaluation of a heuristic as well as a statistical algorithm is carried out using a prominent medical thesaurus as a controlled vocabulary. In Chapter eleven, it is shown how MorphoSaurus can be used to map monolingual lexical resources across different languages. As a result, a large multilingual medical lexicon with high coverage and complete lexical information is built and evaluated against a comparable, already available and commonly used lexical repository for the medical domain. Chapter twelve sketches a few applications based on MorphoSaurus. The generality and applicability of the subword approach to other domains is outlined, and proofs of concept in real-world scenarios are presented. Finally, Chapter thirteen recapitulates the most important aspects of MorphoSaurus, and the potential benefit of its employment in medical information systems is carefully assessed, both for medical experts in their everyday work and for health care consumers and their existential information needs.
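A rough sketch of the subword idea underlying MorphoSaurus: terms are segmented into semantically minimal subwords, which are mapped to language-independent equivalence-class identifiers, so that variant forms from different languages receive the same interlingual code. The lexicon entries and class identifiers below are invented for illustration and are not taken from the actual MorphoSaurus dictionary:

```python
# Toy subword lexicon: each subword maps to a language-independent
# equivalence-class identifier (all entries invented for illustration).
subword_classes = {
    "gastr": "#stomach", "hepat": "#liver",
    "itis":  "#inflammation", "ite": "#inflammation",
    "o": None, "e": None,                      # linking elements, no meaning
}

def segment(term, lexicon):
    """Greedy longest-match segmentation of a term into known subwords."""
    segments, i = [], 0
    while i < len(term):
        for j in range(len(term), i, -1):      # try the longest candidate first
            if term[i:j] in lexicon:
                segments.append(term[i:j])
                i = j
                break
        else:
            i += 1                             # skip characters not in the lexicon
    return segments

def interlingua(term, lexicon):
    """Map a term to its sequence of language-independent class identifiers."""
    return [lexicon[s] for s in segment(term, lexicon) if lexicon[s]]

# An English and a Portuguese form end up with the same interlingual code:
print(interlingua("gastritis", subword_classes))  # ['#stomach', '#inflammation']
print(interlingua("gastrite", subword_classes))   # ['#stomach', '#inflammation']
```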
  14. Mitchell, J.S.; Zeng, M.L.; Zumer, M.: Modeling classification systems in multicultural and multilingual contexts (2012) 0.03
    0.027655158 = product of:
      0.08296547 = sum of:
        0.017962547 = weight(_text_:of in 1967) [ClassicSimilarity], result of:
          0.017962547 = score(doc=1967,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2932045 = fieldWeight in 1967, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1967)
        0.042483397 = weight(_text_:systems in 1967) [ClassicSimilarity], result of:
          0.042483397 = score(doc=1967,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.35286134 = fieldWeight in 1967, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=1967)
        0.022519529 = product of:
          0.045039058 = sum of:
            0.045039058 = weight(_text_:22 in 1967) [ClassicSimilarity], result of:
              0.045039058 = score(doc=1967,freq=4.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.32829654 = fieldWeight in 1967, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1967)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    This paper reports on the second part of an initiative of the authors on researching classification systems with the conceptual model defined by the Functional Requirements for Subject Authority Data (FRSAD) final report. In an earlier study, the authors explored whether the FRSAD conceptual model could be extended beyond subject authority data to model classification data. The focus of the current study is to determine if classification data modeled using FRSAD can be used to solve real-world discovery problems in multicultural and multilingual contexts. The paper discusses the relationships between entities (same type or different types) in the context of classification systems that involve multiple translations and/or multicultural implementations. Results of two case studies are presented in detail: (a) two instances of the DDC (DDC 22 in English, and the Swedish-English mixed translation of DDC 22), and (b) Chinese Library Classification. The use cases of conceptual models in practice are also discussed.
  15. Pollitt, A.S.; Ellis, G.: Multilingual access to document databases (1993) 0.03
    0.025467023 = product of:
      0.07640107 = sum of:
        0.010999769 = weight(_text_:of in 1302) [ClassicSimilarity], result of:
          0.010999769 = score(doc=1302,freq=6.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17955035 = fieldWeight in 1302, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1302)
        0.0245278 = weight(_text_:systems in 1302) [ClassicSimilarity], result of:
          0.0245278 = score(doc=1302,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.2037246 = fieldWeight in 1302, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=1302)
        0.040873505 = weight(_text_:software in 1302) [ClassicSimilarity], result of:
          0.040873505 = score(doc=1302,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.2629875 = fieldWeight in 1302, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.046875 = fieldNorm(doc=1302)
      0.33333334 = coord(3/9)
    
    Abstract
    This paper examines the reasons why approaches to facilitating document retrieval which apply AI (Artificial Intelligence) or Expert Systems techniques, relying on so-called "natural language" query statements from the end-user, will result in sub-optimal solutions. It does so by reflecting on the nature of language and the fundamental problems in document retrieval. Support is given to the work of thesaurus builders and indexers, with illustrations of how their work may be utilised in a generally applicable computer-based document retrieval system using Multilingual MenUSE software. The EuroMenUSE interface, providing multilingual document access to EPOQUE, the European Parliament's Online Query System, is described.
  16. Francu, V.: Multilingual access to information using an intermediate language (2003) 0.03
    0.025205677 = product of:
      0.07561703 = sum of:
        0.03355781 = weight(_text_:applications in 1742) [ClassicSimilarity], result of:
          0.03355781 = score(doc=1742,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.19456528 = fieldWeight in 1742, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03125 = fieldNorm(doc=1742)
        0.018934188 = weight(_text_:of in 1742) [ClassicSimilarity], result of:
          0.018934188 = score(doc=1742,freq=40.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.3090647 = fieldWeight in 1742, product of:
              6.3245554 = tf(freq=40.0), with freq of:
                40.0 = termFreq=40.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=1742)
        0.023125032 = weight(_text_:systems in 1742) [ClassicSimilarity], result of:
          0.023125032 = score(doc=1742,freq=4.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.19207339 = fieldWeight in 1742, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=1742)
      0.33333334 = coord(3/9)
    
    Abstract
    Although information is theoretically widely available, linguistic barriers can restrict it from more general use. The linguistic aspects of information languages, and particularly the prospects for enhanced access to information by means of multilingual access facilities, form the substance of this thesis. The main problem of this research is thus to demonstrate that information retrieval can be improved by searching with multilingual thesaurus terms based on an intermediate or switching language. Universal classification systems in general can play the role of switching languages, for reasons dealt with in the following pages. The Universal Decimal Classification (UDC) in particular is the classification system used here as an example of a switching language. The question may arise: why a universal classification system and not another thesaurus? Because the UDC, like most classification systems, uses symbols; it is therefore language independent, and the problems of compatibility between such a thesaurus and other thesauri in different languages are avoided. Another question may arise: why not, then, assign running numbers to the descriptors in a thesaurus and make a switching language out of the resulting enumerative system? Because of some other characteristics of the UDC: hierarchical structure and terminological richness, consistency and control. One big question to answer is: can a thesaurus be built on the basis of a classification system in any and all of its parts, and to what extent can this question be answered affirmatively? This depends largely on the attributes of the universal classification system that can be used favourably for this purpose. Examples of different situations are given and discussed, beginning with those classes of the UDC which are best suited for building a thesaurus structure out of them (classes which are both hierarchical and faceted)...
    Content
    Contents: INFORMATION LANGUAGES: A LINGUISTIC APPROACH - MULTILINGUAL ASPECTS IN INFORMATION STORAGE AND RETRIEVAL - COMPATIBILITY AND CONVERTIBILITY OF INFORMATION LANGUAGES - CURRENT TRENDS IN MULTILINGUAL ACCESS - BUILDING UDC-BASED MULTILINGUAL THESAURI - ONLINE APPLICATIONS OF THE UDC-BASED MULTILINGUAL THESAURI - THE IMPACT OF SPECIFICITY ON THE RETRIEVAL POWER OF A UDC-BASED MULTILINGUAL THESAURUS - FINAL REMARKS AND GENERAL CONCLUSIONS. Doctoral dissertation submitted for the degree of Doctor in Language and Literature (Taal- en Letterkunde) at the University of Antwerp. - Cf.: http://dlist.sir.arizona.edu/1862/.
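A small sketch of the switching-language idea from this thesis: UDC notations act as language-independent pivots between descriptors in different languages, so a query term in one language can be mapped to its notation and from there to the equivalent term in another language. The notations and term translations below are simplified, invented examples and are not taken from the thesaurus built in the thesis:

```python
# Toy UDC-based switching structure: each (simplified) UDC notation acts as a
# language-independent pivot linking descriptors in several languages
# (notations and terms are illustrative only).
udc_index = {
    "025.4":  {"en": "classification systems",
               "fr": "systèmes de classification",
               "ro": "sisteme de clasificare"},
    "81'374": {"en": "lexicography",
               "fr": "lexicographie",
               "ro": "lexicografie"},
}

def translate_descriptor(term, source_lang, target_lang, index):
    """Find the term's notation, then return the target-language descriptor."""
    for notation, labels in index.items():
        if labels.get(source_lang, "").lower() == term.lower():
            return notation, labels.get(target_lang)
    return None, None

# A French query term is mapped to its UDC notation and then to English:
print(translate_descriptor("systèmes de classification", "fr", "en", udc_index))
# -> ('025.4', 'classification systems')
```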
  17. Oard, D.W.: Alternative approaches for cross-language text retrieval (1997) 0.02
    0.024708923 = product of:
      0.074126765 = sum of:
        0.029363085 = weight(_text_:applications in 1164) [ClassicSimilarity], result of:
          0.029363085 = score(doc=1164,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.17024462 = fieldWeight in 1164, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1164)
        0.016147917 = weight(_text_:of in 1164) [ClassicSimilarity], result of:
          0.016147917 = score(doc=1164,freq=38.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2635841 = fieldWeight in 1164, product of:
              6.164414 = tf(freq=38.0), with freq of:
                38.0 = termFreq=38.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1164)
        0.028615767 = weight(_text_:systems in 1164) [ClassicSimilarity], result of:
          0.028615767 = score(doc=1164,freq=8.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.23767869 = fieldWeight in 1164, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1164)
      0.33333334 = coord(3/9)
    
    Abstract
    The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus, automated systems capable of detecting useful documents are finding widespread application. With even a small number of languages it can be inconvenient to issue the same query repeatedly in every language, so users who are able to read more than one language will likely prefer a multilingual text retrieval system over a collection of monolingual systems. And since reading ability in a language does not always imply fluent writing ability in that language, such users will likely find cross-language text retrieval particularly useful for languages in which they are less confident of their ability to express their information needs effectively. The use of such systems can also be beneficial if the user is able to read only a single language. For example, when only a small portion of the document collection will ever be examined by the user, performing retrieval before translation can be significantly more economical than performing translation before retrieval. So when the application is sufficiently important to justify the time and effort required for translation, those costs can be minimized if an effective cross-language text retrieval system is available. Even when translation is not available, there are circumstances in which cross-language text retrieval could be useful to a monolingual user. For example, a researcher might find a paper published in an unfamiliar language useful if that paper contains references to works by the same author that are in the researcher's native language.
    Multilingual text retrieval can be defined as selection of useful documents from collections that may contain several languages (English, French, Chinese, etc.). This formulation allows for the possibility that individual documents might contain more than one language, a common occurrence in some applications. Both cross-language and within-language retrieval are included in this formulation, but it is the cross-language aspect of the problem which distinguishes multilingual text retrieval from its well-studied monolingual counterpart. At the SIGIR 96 workshop on "Cross-Linguistic Information Retrieval" the participants discussed the proliferation of terminology being used to describe the field and settled on "Cross-Language" as the best single description of the salient aspect of the problem. "Multilingual" was felt to be too broad, since that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language capability. "Cross-lingual" and "cross-linguistic" were felt to be equally good descriptions of the field, but "cross-language" was selected as the preferred term in the interest of standardization. Unfortunately, at about the same time the U.S. Defense Advanced Research Projects Agency (DARPA) introduced "translingual" as their preferred term, so we are still some distance from reaching consensus on this matter.
    I will not attempt to draw a sharp distinction between retrieval and filtering in this survey. Although my own work on adaptive cross-language text filtering has led me to make this distinction fairly carefully in other presentations (cf. Oard 1997b), such an approach does little to help understand the fundamental techniques which have been applied or the results that have been obtained in this case. Since it is still common to view filtering (detection of useful documents in dynamic document streams) as a kind of retrieval, I will simply adopt that perspective here.
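    One family of approaches in this area is dictionary-based query translation, in which the query rather than the document collection is translated and retrieval then proceeds monolingually, which is one reason retrieval-before-translation can be cheaper than translation-before-retrieval. The Python sketch below is a deliberately simplified illustration of that general technique; the term list, documents and scoring function are invented for the example and are not Oard's method.

        # Sketch of dictionary-based query translation followed by simple
        # monolingual term-overlap retrieval. All data is illustrative.
        from collections import Counter

        # Tiny bilingual term list (French -> English); a real system would use
        # a machine-readable dictionary and handle ambiguity, stemming, phrases.
        FR_EN = {
            "recherche": ["retrieval", "search", "research"],
            "documents": ["documents"],
            "multilingues": ["multilingual"],
        }

        DOCS = {
            "d1": "cross language text retrieval selects documents in several languages",
            "d2": "speech recognition converts recorded speech to electronic text",
        }

        def translate_query(query: str) -> list[str]:
            """Replace each source-language term by all its dictionary translations."""
            terms: list[str] = []
            for token in query.lower().split():
                terms.extend(FR_EN.get(token, [token]))
            return terms

        def score(doc_text: str, query_terms: list[str]) -> int:
            """Crude overlap score: how many query-term occurrences the document has."""
            counts = Counter(doc_text.lower().split())
            return sum(counts[t] for t in query_terms)

        if __name__ == "__main__":
            q = "recherche documents multilingues"
            translated = translate_query(q)
            ranking = sorted(DOCS, key=lambda d: score(DOCS[d], translated), reverse=True)
            print(translated, ranking)  # d1 ranks above d2

    Only the short query crosses the language barrier here; the documents are never translated, and a user could then decide which of the top-ranked documents are worth the cost of full translation.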
  18. Zhou, Y. et al.: Analysing entity context in multilingual Wikipedia to support entity-centric retrieval applications (2016) 0.02
    0.0245409 = product of:
      0.11043405 = sum of:
        0.08389453 = weight(_text_:applications in 2758) [ClassicSimilarity], result of:
          0.08389453 = score(doc=2758,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.4864132 = fieldWeight in 2758, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.078125 = fieldNorm(doc=2758)
        0.026539518 = product of:
          0.053079035 = sum of:
            0.053079035 = weight(_text_:22 in 2758) [ClassicSimilarity], result of:
              0.053079035 = score(doc=2758,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.38690117 = fieldWeight in 2758, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2758)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Date
    1.2.2016 18:25:22
  19. Mitchell, J.S.; Zeng, M.L.; Zumer, M.: Modeling classification systems in multicultural and multilingual contexts (2014) 0.02
    0.023045965 = product of:
      0.06913789 = sum of:
        0.014968789 = weight(_text_:of in 1962) [ClassicSimilarity], result of:
          0.014968789 = score(doc=1962,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.24433708 = fieldWeight in 1962, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1962)
        0.03540283 = weight(_text_:systems in 1962) [ClassicSimilarity], result of:
          0.03540283 = score(doc=1962,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.29405114 = fieldWeight in 1962, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1962)
        0.018766273 = product of:
          0.037532546 = sum of:
            0.037532546 = weight(_text_:22 in 1962) [ClassicSimilarity], result of:
              0.037532546 = score(doc=1962,freq=4.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.27358043 = fieldWeight in 1962, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1962)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    This article reports on the second part of an initiative of the authors on researching classification systems with the conceptual model defined by the Functional Requirements for Subject Authority Data (FRSAD) final report. In an earlier study, the authors explored whether the FRSAD conceptual model could be extended beyond subject authority data to model classification data. The focus of the current study is to determine if classification data modeled using FRSAD can be used to solve real-world discovery problems in multicultural and multilingual contexts. The article discusses the relationships between entities (same type or different types) in the context of classification systems that involve multiple translations and/or multicultural implementations. Results of two case studies are presented in detail: (a) two instances of the Dewey Decimal Classification [DDC] (DDC 22 in English, and the Swedish-English mixed translation of DDC 22), and (b) Chinese Library Classification. The use cases of conceptual models in practice are also discussed.
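    As a rough illustration of the kind of modeling discussed in this abstract, the following Python sketch approximates, in simplified form, the FRSAD distinction between a concept (thema) and the appellations (nomens) by which it is known, so that a single class can carry captions supplied by several translations of a scheme. The class number, captions and field names are invented for the example and do not reproduce any published DDC data.

        # Simplified FRSAD-style model: one thema, several nomens from
        # different translations of the same classification scheme.
        from dataclasses import dataclass, field

        @dataclass
        class Nomen:
            value: str      # the caption or notation string
            language: str   # e.g. "en", "sv"
            scheme: str     # which edition/translation supplied it

        @dataclass
        class Thema:
            notation: str                      # edition-independent class number (placeholder here)
            nomens: list[Nomen] = field(default_factory=list)

            def label(self, language: str) -> str | None:
                """Return the caption for this concept in the requested language."""
                for n in self.nomens:
                    if n.language == language:
                        return n.value
                return None

        if __name__ == "__main__":
            t = Thema("02x", [
                Nomen("Knowledge organization", "en", "DDC (English edition)"),
                Nomen("Kunskapsorganisation", "sv", "DDC (Swedish mixed translation)"),
            ])
            # The same thema is resolved through a different nomen per language.
            print(t.label("sv"))

    Keeping the concept separate from its labels is what allows discovery across translations: a search matched against any one nomen can be answered with documents classed under the shared notation.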
  20. Mustafa el Hadi, W.: Human language technology and its role in information access and management (2003) 0.02
    0.02304364 = product of:
      0.103696376 = sum of:
        0.08389453 = weight(_text_:applications in 5524) [ClassicSimilarity], result of:
          0.08389453 = score(doc=5524,freq=8.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.4864132 = fieldWeight in 5524, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
        0.019801848 = weight(_text_:of in 5524) [ClassicSimilarity], result of:
          0.019801848 = score(doc=5524,freq=28.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.32322758 = fieldWeight in 5524, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
      0.22222222 = coord(2/9)
    
    Abstract
    The role of linguistics in information access, extraction and dissemination is essential. Radical changes in the techniques of information and communication at the end of the twentieth century have had a significant effect on the function of the linguistic paradigm and its applications in all forms of communication. The introduction of new technical means has deeply changed the possibilities for the distribution of information. In this situation, what is the role of the linguistic paradigm and its practical applications, i.e., natural language processing (NLP) techniques when applied to information access? What solutions can linguistics offer in human-computer interaction, extraction and management? Many fields show the relevance of the linguistic paradigm through the various technologies that require NLP, such as document and message understanding, information detection, extraction, and retrieval, question and answer, cross-language information retrieval (CLIR), text summarization, filtering, and spoken document retrieval. This paper focuses on the central role of human language technologies in the information society, surveys the current situation, describes the benefits of the above-mentioned applications, outlines successes and challenges, and discusses solutions. It reviews the resources and means needed to advance information access and dissemination across language boundaries in the twenty-first century. Multilingualism, which is a natural result of globalization, requires more effort in the direction of language technology. The scope of human language technology (HLT) is large, so we limit our review to applications that involve multilinguality.

Types

  • a 227
  • el 21
  • p 4
  • m 3
  • r 3
  • s 3
  • x 2
  • n 1