Search (139 results, page 1 of 7)

Liu, S.; Liu, F.; Yu, C.; Meng, W.: An effective approach to document retrieval via utilizing WordNet and recognizing phrases (2004) 0.06

0.056392573 = product of:
  0.084588856 = sum of:
    0.06461004 = weight(_text_:development in 4078) [ClassicSimilarity], result of:
      0.06461004 = score(doc=4078,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.40352166 = fieldWeight in 4078, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.078125 = fieldNorm(doc=4078)
    0.01997881 = product of:
      0.059936427 = sum of:
        0.059936427 = weight(_text_:29 in 4078) [ClassicSimilarity], result of:
          0.059936427 = score(doc=4078,freq=2.0), product of:
            0.1542157 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04384008 = queryNorm
            0.38865322 = fieldWeight in 4078, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.078125 = fieldNorm(doc=4078)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)
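
The score tree above can be recomputed by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values printed in the tree. The function names (`idf`, `tf`, `term_score`) are hypothetical and chosen only for illustration; all numeric inputs are copied from the explain output for doc 4078.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as ClassicSimilarity is assumed to define it.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.04384008   # queryNorm from the explain output
MAX_DOCS = 44218          # maxDocs from the explain output
FIELD_NORM = 0.078125     # fieldNorm(doc=4078)

# Clause 1: the term "development" (freq=2.0, docFreq=3116).
development = term_score(2.0, 3116, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Clause 2: the term "29" (freq=2.0, docFreq=3565), scaled by its coord(1/3).
term_29 = term_score(2.0, 3565, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 3)

# Outer sum scaled by coord(2/3); should agree with the listed 0.056392573
# up to single-precision rounding.
total = (development + term_29) * (2 / 3)
print(development, term_29, total)
```

Because Lucene computes these values in single precision, the recomputed numbers are expected to match the listing only to about six decimal places.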

Date: 10.10.2005 10:29:08
Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05

0.054339416 = product of:
  0.08150912 = sum of:
    0.06962967 = product of:
      0.208889 = sum of:
        0.208889 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.208889 = score(doc=562,freq=2.0), product of:
            0.37167668 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04384008 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.011879452 = product of:
      0.035638355 = sum of:
        0.035638355 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.035638355 = score(doc=562,freq=2.0), product of:
            0.1535205 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04384008 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)

Content: Cf. http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.05
```
0.053522646 = product of:
  0.08028397 = sum of:
    0.03230502 = weight(_text_:development in 2541) [ClassicSimilarity], result of:
      0.03230502 = score(doc=2541,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.20176083 = fieldWeight in 2541, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2541)
    0.04797895 = product of:
      0.07196842 = sum of:
        0.029968213 = weight(_text_:29 in 2541) [ClassicSimilarity], result of:
          0.029968213 = score(doc=2541,freq=2.0), product of:
            0.1542157 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04384008 = queryNorm
            0.19432661 = fieldWeight in 2541, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.042000204 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
          0.042000204 = score(doc=2541,freq=4.0), product of:
            0.1535205 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04384008 = queryNorm
            0.27358043 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
      0.6666667 = coord(2/3)
  0.6666667 = coord(2/3)
```
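
The tree above nests one boolean clause inside another, each with its own coord factor. As a rough sketch of how the pieces combine, assuming the same ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and assuming coord is simply matched clauses divided by total clauses, the listed score for doc 2541 can be rebuilt as follows. The helper names `term_weight` and `boolean_score` are hypothetical.

```python
import math

QUERY_NORM = 0.04384008   # queryNorm from the explain output
MAX_DOCS = 44218          # maxDocs from the explain output
FIELD_NORM = 0.0390625    # fieldNorm(doc=2541)

def term_weight(freq, doc_freq):
    # queryWeight * fieldWeight for a single term in doc 2541.
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

def boolean_score(clause_scores, total_clauses):
    # Sum of the matching clauses, scaled by coord(matched/total).
    coord = len(clause_scores) / total_clauses
    return sum(clause_scores) * coord

# Inner clause: terms "29" (freq=2.0) and "22" (freq=4.0) match, coord(2/3).
inner = boolean_score([term_weight(2.0, 3565), term_weight(4.0, 3622)], 3)

# Outer clause: "development" plus the inner clause match, coord(2/3).
outer = boolean_score([term_weight(2.0, 3116), inner], 3)
print(inner, outer)
```

Note that the term "22" occurs four times, so its tf factor is 2.0 rather than 1.4142135, which is why it outweighs "29" despite a slightly lower idf.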
Abstract

The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.

Date

14. 8.2004 17:22:56

Source

Online. 28(2004) no.3, S.22-29

Dorr, B.J.: Large-scale dictionary construction for foreign language tutoring and interlingual machine translation (1997) 0.04
```
0.044468593 = product of:
  0.06670289 = sum of:
    0.054823436 = weight(_text_:development in 3244) [ClassicSimilarity], result of:
      0.054823436 = score(doc=3244,freq=4.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.34239948 = fieldWeight in 3244, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.046875 = fieldNorm(doc=3244)
    0.011879452 = product of:
      0.035638355 = sum of:
        0.035638355 = weight(_text_:22 in 3244) [ClassicSimilarity], result of:
          0.035638355 = score(doc=3244,freq=2.0), product of:
            0.1535205 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04384008 = queryNorm
            0.23214069 = fieldWeight in 3244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=3244)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)
```
Abstract

Describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language-independent representation called lexical conceptual structure (LCS). Demonstrates that synonymous verb senses share distribution patterns. Shows how the syntax-semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. Describes the structure of the LCS and shows how this representation is used in FLT and MT. Focuses on the problem of building LCS dictionaries for large-scale FLT and MT. Describes authoring tools for manual and semi-automatic construction of LCS dictionaries. Presents an approach that uses linguistic techniques for building word definitions automatically. The techniques have been implemented as part of a set of lexicon-development tools used in the MILT FLT project

Date

31. 7.1996 9:22:19

Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.04

0.039474797 = product of:
  0.059212193 = sum of:
    0.04522703 = weight(_text_:development in 1595) [ClassicSimilarity], result of:
      0.04522703 = score(doc=1595,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.28246516 = fieldWeight in 1595, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1595)
    0.013985164 = product of:
      0.041955493 = sum of:
        0.041955493 = weight(_text_:29 in 1595) [ClassicSimilarity], result of:
          0.041955493 = score(doc=1595,freq=2.0), product of:
            0.1542157 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04384008 = queryNorm
            0.27205724 = fieldWeight in 1595, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1595)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)

Abstract: This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks, as the machine learning algorithm, that learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
Date: 11. 5.2003 18:29:44

Bowker, L.: Information retrieval in translation memory systems : assessment of current limitations and possibilities for future development (2002) 0.04

0.039474797 = product of:
  0.059212193 = sum of:
    0.04522703 = weight(_text_:development in 1854) [ClassicSimilarity], result of:
      0.04522703 = score(doc=1854,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.28246516 = fieldWeight in 1854, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1854)
    0.013985164 = product of:
      0.041955493 = sum of:
        0.041955493 = weight(_text_:29 in 1854) [ClassicSimilarity], result of:
          0.041955493 = score(doc=1854,freq=2.0), product of:
            0.1542157 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04384008 = queryNorm
            0.27205724 = fieldWeight in 1854, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1854)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)

Source: Knowledge organization. 29(2002) nos.3/4, S.198-203

Litkowski, K.C.: Category development based on semantic principles (1997) 0.03
```
0.02611184 = product of:
  0.078335516 = sum of:
    0.078335516 = weight(_text_:development in 1824) [ClassicSimilarity], result of:
      0.078335516 = score(doc=1824,freq=6.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.48924404 = fieldWeight in 1824, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1824)
  0.33333334 = coord(1/3)
```
Abstract

Describes the beginnings of computerized information retrieval and text analysis, particularly from the perspective of the use of thesauri and cataloguing systems. Describes formalisations of linguistic principles in the development of formal grammars and semantics. Presents the principles for category development, based on research in linguistic formalism continuing with ever richer grammars and semantic formalism. Describes the progress of these formalisms in the examination of the categories used in the Minnesota Contextual Content Analysis approach. Describes current research toward an integration of semantic principles into content analysis abstraction procedures for characterising the category of any text

Chieu, H.L.; Lee, Y.K.: Query based event extraction along a timeline (2004) 0.03

0.025844015 = product of:
  0.077532046 = sum of:
    0.077532046 = weight(_text_:development in 4108) [ClassicSimilarity], result of:
      0.077532046 = score(doc=4108,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.484226 = fieldWeight in 4108, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.09375 = fieldNorm(doc=4108)
  0.33333334 = coord(1/3)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Noever, D.; Ciolino, M.: The Turing deception (2022) 0.02

0.02320989 = product of:
  0.06962967 = sum of:
    0.06962967 = product of:
      0.208889 = sum of:
        0.208889 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.208889 = score(doc=862,freq=2.0), product of:
            0.37167668 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04384008 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: https://arxiv.org/abs/2212.06721

Jones, I.; Cunliffe, D.; Tudhope, D.: Natural language processing and knowledge organization systems as an aid to retrieval (2004) 0.02
```
0.021668348 = product of:
  0.03250252 = sum of:
    0.022613514 = weight(_text_:development in 2677) [ClassicSimilarity], result of:
      0.022613514 = score(doc=2677,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.14123258 = fieldWeight in 2677, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.02734375 = fieldNorm(doc=2677)
    0.009889007 = product of:
      0.029667018 = sum of:
        0.029667018 = weight(_text_:29 in 2677) [ClassicSimilarity], result of:
          0.029667018 = score(doc=2677,freq=4.0), product of:
            0.1542157 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04384008 = queryNorm
            0.19237353 = fieldWeight in 2677, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2677)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)
```
Content

1. Introduction The need for research into the application of linguistic techniques in Information Retrieval (IR) in general, and a similar need in faceted Knowledge Organization Systems (KOS) has been indicated by various authors. Smeaton (1997) points out the inherent limitations of conventional approaches to IR based on "bags of words", mainly difficulties caused by lexical ambiguity in the words concerned, and goes on to suggest the possibility of using Natural Language Processing (NLP) in query formulation. Past experience with a faceted retrieval system highlighted the need for integrating the linguistic perspective in order to fully utilise the potential of a KOS (Tudhope et al., 2002). The present research seeks to address some of these needs in using NLP to improve the efficacy of KOS tools in query and retrieval systems. Syntactic parsing and part-of-speech tagging can substantially reduce lexical ambiguity through homograph disambiguation. Given the two strings "I table the motion" and "I put the motion on the table", for instance, the parser used in this research clearly indicates that 'table' in the first string is a verb, while 'table' in the second string is a noun, a distinction that would be missed in the "bag of words" approach. This syntactic disambiguation enables a more precise matching from free text to the controlled vocabulary of a KOS and vice versa. The use of a general linguistic resource, namely Roget's Thesaurus of English Words and Phrases (RTEWP), as an intermediary in this process, is investigated. The adaptation of the Link parser (Sleator & Temperley, 1993) to the purposes of the research is reported. The design and implementation of the early practical stages of the project are described, and the results of the initial experiments are presented and evaluated. Applications of the techniques developed are foreseen in the areas of query disambiguation, information retrieval and automatic indexing.
In the first section of the paper a brief review of the literature and relevant current work in the field is presented. The second section includes reports on the development of algorithms, the construction of data sets and theoretical and experimental work undertaken to date. The third section evaluates the results obtained, and outlines directions for future research.

Date

29. 8.2004 19:29:56

Sokirko, A.V.: Programnaya realizatsiya Russkogo abshchesemanticheskogo slovarya (1997) 0.02

0.021536682 = product of:
  0.06461004 = sum of:
    0.06461004 = weight(_text_:development in 2258) [ClassicSimilarity], result of:
      0.06461004 = score(doc=2258,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.40352166 = fieldWeight in 2258, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.078125 = fieldNorm(doc=2258)
  0.33333334 = coord(1/3)

Abstract: Discusses the Dolphi2 for Windows software which has been used for the development of the Russian Semantic Dictionary ROSS. Although not a relational database as such, Dolphi actively uses standard objects of relational databases

Pimenov, E.N.: Normativnost' i nekotorye problem razrabotki tezauruzov i drugikh lingvistiicheskikh sredstv IPS (2000) 0.02

0.021536682 = product of:
  0.06461004 = sum of:
    0.06461004 = weight(_text_:development in 3281) [ClassicSimilarity], result of:
      0.06461004 = score(doc=3281,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.40352166 = fieldWeight in 3281, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.078125 = fieldNorm(doc=3281)
  0.33333334 = coord(1/3)

Footnote: Translation of the title: Standardisation and some other issues connected with the development of thesauri and other linguistic information retrieval tools

Xu, J.; Weischedel, R.; Licuanan, A.: Evaluation of an extraction-based approach to answering definitional questions (2004) 0.02

0.021536682 = product of:
  0.06461004 = sum of:
    0.06461004 = weight(_text_:development in 4107) [ClassicSimilarity], result of:
      0.06461004 = score(doc=4107,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.40352166 = fieldWeight in 4107, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.078125 = fieldNorm(doc=4107)
  0.33333334 = coord(1/3)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Ekmekcioglu, F.C.; Lynch, M.F.; Willet, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.02
```
0.021320224 = product of:
  0.06396067 = sum of:
    0.06396067 = weight(_text_:development in 5797) [ClassicSimilarity], result of:
      0.06396067 = score(doc=5797,freq=4.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.39946604 = fieldWeight in 5797, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5797)
  0.33333334 = coord(1/3)
```
Abstract

Considers language processing techniques necessary for the implementation of a document retrieval system for Turkish text databases. Introduces the main characteristics of the Turkish language. Discusses the development of a stopword list and the evaluation of a stemming algorithm that takes account of the language's morphological structure. A two-level description of Turkish morphology developed at Bilkent University, Ankara, is incorporated into a morphological parser, PC-KIMMO, to carry out stemming in Turkish databases. Describes the evaluation of string similarity measures - n-gram matching techniques - for Turkish. Reports experiments on 6 different Turkish text corpora

Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.02
```
0.019695465 = product of:
  0.029543195 = sum of:
    0.022613514 = weight(_text_:development in 1616) [ClassicSimilarity], result of:
      0.022613514 = score(doc=1616,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.14123258 = fieldWeight in 1616, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1616)
    0.0069296802 = product of:
      0.02078904 = sum of:
        0.02078904 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
          0.02078904 = score(doc=1616,freq=2.0), product of:
            0.1535205 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04384008 = queryNorm
            0.1354154 = fieldWeight in 1616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)
```
Abstract

The information available in languages other than English in the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers for the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (US's Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs in the near future. Digital library research has been focusing on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines are widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.).
However, research in crossing language boundaries, especially across European languages and Oriental languages, is still in the initial stage. In this proposal, we put our focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When the searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture the unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The result shows that such a thesaurus is a promising tool to retrieve relevant terms, especially in the language that is not the same as the input term. The direct translation of the input term can also be retrieved in most of the cases.

Melby, A.: Some notes on 'The proper place of men and machines in language translation' (1997) 0.02

0.018563017 = product of:
  0.055689048 = sum of:
    0.055689048 = product of:
      0.08353357 = sum of:
        0.041955493 = weight(_text_:29 in 330) [ClassicSimilarity], result of:
          0.041955493 = score(doc=330,freq=2.0), product of:
            0.1542157 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04384008 = queryNorm
            0.27205724 = fieldWeight in 330, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=330)
        0.04157808 = weight(_text_:22 in 330) [ClassicSimilarity], result of:
          0.04157808 = score(doc=330,freq=2.0), product of:
            0.1535205 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04384008 = queryNorm
            0.2708308 = fieldWeight in 330, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=330)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Date: 31. 7.1996 9:22:19
Source: Machine translation. 12(1997) nos.1/2, S.29-34

Natural language processing and speech technology : Results of the 3rd KONVENS Conference, Bielefeld, October 1996 (1996) 0.02

0.018274479 = product of:
  0.054823436 = sum of:
    0.054823436 = weight(_text_:development in 7291) [ClassicSimilarity], result of:
      0.054823436 = score(doc=7291,freq=4.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.34239948 = fieldWeight in 7291, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.046875 = fieldNorm(doc=7291)
  0.33333334 = coord(1/3)

Abstract: Chapter headings: (1) Modelling cognition, perception and behaviour; (2) Language and speech systems; (3) Multilingual research and development; (4) Prosody; (5) Syntax, morphology, lexicon; (6) Semantics; (7) Formalisms and parsing; (8) Tools for development and teaching

Schwarz, C.: Content based text handling (1990) 0.02

0.017229345 = product of:
  0.05168803 = sum of:
    0.05168803 = weight(_text_:development in 5248) [ClassicSimilarity], result of:
      0.05168803 = score(doc=5248,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.32281733 = fieldWeight in 5248, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0625 = fieldNorm(doc=5248)
  0.33333334 = coord(1/3)

Abstract: Whereas up to now document analysis was mainly concerned with the handling of formal properties of documents (scanning, editing), AI (artificial intelligence) techniques in the field of Natural Language Processing have shown the possibility of "Content based text handling", i.e., a content analysis for textual documents. Research and development in this field at The Siemens Corporate Research Laboratories are described in this article.

Rahmstorf, G.: Compositional semantics and concept representation (1991) 0.02
```
0.017229345 = product of:
  0.05168803 = sum of:
    0.05168803 = weight(_text_:development in 6673) [ClassicSimilarity], result of:
      0.05168803 = score(doc=6673,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.32281733 = fieldWeight in 6673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0625 = fieldNorm(doc=6673)
  0.33333334 = coord(1/3)
```
Abstract

Concept systems are not only used in the sciences, but also in secondary supporting fields, e.g. in libraries, in documentation, in terminology and increasingly also in knowledge representation. It is suggested that the development of concept systems be based on semantic analysis. Methodical steps are described. The principle of morpho-syntactic composition in semantics will serve as a theoretical basis for the suggested method. The implications and limitations of this principle will be demonstrated

Derrington, S.: MT - myth, muddle or reality? (1994) 0.02

0.017229345 = product of:
  0.05168803 = sum of:
    0.05168803 = weight(_text_:development in 7047) [ClassicSimilarity], result of:
      0.05168803 = score(doc=7047,freq=2.0), product of:
        0.16011542 = queryWeight, product of:
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.04384008 = queryNorm
        0.32281733 = fieldWeight in 7047, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.652261 = idf(docFreq=3116, maxDocs=44218)
          0.0625 = fieldNorm(doc=7047)
  0.33333334 = coord(1/3)

Abstract: The trend away from the development of fully automatic machine translation (FAMT) is the result of failure to develop the foundation level of machine translation (MT) systems design theory. In order to create this level and establish reliably whether FAMT is achievable or not it is necessary to revise the currently accepted view of the interdisciplinary approach. Concludes with an assessment of the interdisciplinary approach as applied to date
