Search (80 results, page 1 of 4)

  • Filter: theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.38
    0.38179284 = product of:
      0.47724104 = sum of:
        0.065772705 = product of:
          0.1973181 = sum of:
            0.1973181 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.1973181 = score(doc=562,freq=2.0), product of:
                0.35108855 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.041411664 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.1973181 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.1973181 = score(doc=562,freq=2.0), product of:
            0.35108855 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.041411664 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.1973181 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.1973181 = score(doc=562,freq=2.0), product of:
            0.35108855 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.041411664 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.016832126 = product of:
          0.033664253 = sum of:
            0.033664253 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.033664253 = score(doc=562,freq=2.0), product of:
                0.1450166 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041411664 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.8 = coord(4/5)
    
    Content
     Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
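
     The score breakdown attached to each record is Lucene ClassicSimilarity "explain" output: a matching term contributes queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf(freq) * idf * fieldNorm, and partial matches are scaled down by coord factors. As a worked check, the short Python sketch below (the variable and constant names are ours, not part of the output) reproduces the 0.38179284 score of this first record from the numbers shown above.

       import math

       def field_weight(freq, idf, field_norm):
           # ClassicSimilarity uses tf(freq) = sqrt(freq)
           return math.sqrt(freq) * idf * field_norm

       def term_score(freq, idf, query_norm, field_norm):
           query_weight = idf * query_norm          # queryWeight = idf * queryNorm
           return query_weight * field_weight(freq, idf, field_norm)

       QUERY_NORM = 0.041411664
       FIELD_NORM = 0.046875                        # fieldNorm(doc=562)

       w_3a = term_score(2.0, 8.478011, QUERY_NORM, FIELD_NORM) * (1 / 3)   # wrapped in coord(1/3)
       w_2f = term_score(2.0, 8.478011, QUERY_NORM, FIELD_NORM)             # appears twice in the sum
       w_22 = term_score(2.0, 3.5018296, QUERY_NORM, FIELD_NORM) * (1 / 2)  # wrapped in coord(1/2)

       total = (w_3a + w_2f + w_2f + w_22) * (4 / 5)   # coord(4/5): 4 of 5 query clauses matched
       print(round(total, 5))                          # 0.38179, matching the explain tree above
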
  2. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.28
    0.27624536 = product of:
      0.46040893 = sum of:
        0.065772705 = product of:
          0.1973181 = sum of:
            0.1973181 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.1973181 = score(doc=862,freq=2.0), product of:
                0.35108855 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.041411664 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.1973181 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.1973181 = score(doc=862,freq=2.0), product of:
            0.35108855 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.041411664 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.1973181 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.1973181 = score(doc=862,freq=2.0), product of:
            0.35108855 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.041411664 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.6 = coord(3/5)
    
    Source
     https://arxiv.org/abs/2212.06721
  3. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.25
    0.24688101 = product of:
      0.41146833 = sum of:
        0.1973181 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.1973181 = score(doc=563,freq=2.0), product of:
            0.35108855 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.041411664 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.1973181 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.1973181 = score(doc=563,freq=2.0), product of:
            0.35108855 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.041411664 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.016832126 = product of:
          0.033664253 = sum of:
            0.033664253 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.033664253 = score(doc=563,freq=2.0), product of:
                0.1450166 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041411664 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Content
     A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  4. Dorr, B.J.: Large-scale dictionary construction for foreign language tutoring and interlingual machine translation (1997) 0.11
    0.11183228 = product of:
      0.2795807 = sum of:
        0.2627486 = weight(_text_:dictionaries in 3244) [ClassicSimilarity], result of:
          0.2627486 = score(doc=3244,freq=8.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.9171746 = fieldWeight in 3244, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.046875 = fieldNorm(doc=3244)
        0.016832126 = product of:
          0.033664253 = sum of:
            0.033664253 = weight(_text_:22 in 3244) [ClassicSimilarity], result of:
              0.033664253 = score(doc=3244,freq=2.0), product of:
                0.1450166 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041411664 = queryNorm
                0.23214069 = fieldWeight in 3244, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3244)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language-independent representation called lexical conceptual structure (LCS). Demonstrates that synonymous verb senses share distribution patterns. Shows how the syntax-semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. Describes the structure of the LCS and shows how this representation is used in FLT and MT. Focuses on the problem of building LCS dictionaries for large-scale FLT and MT. Describes authoring tools for manual and semi-automatic construction of LCS dictionaries. Presents an approach that uses linguistic techniques for building word definitions automatically. The techniques have been implemented as part of a set of lexicon-development tools used in the MILT FLT project.
    Date
    31. 7.1996 9:22:19
  5. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.11
    0.108065836 = product of:
      0.27016458 = sum of:
        0.24772175 = weight(_text_:dictionaries in 6752) [ClassicSimilarity], result of:
          0.24772175 = score(doc=6752,freq=4.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.86472046 = fieldWeight in 6752, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
        0.022442836 = product of:
          0.044885673 = sum of:
            0.044885673 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
              0.044885673 = score(doc=6752,freq=2.0), product of:
                0.1450166 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041411664 = queryNorm
                0.30952093 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in the terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned, and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog.
    Date
    6. 3.1997 16:22:15
  6. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.05
    0.051726174 = product of:
      0.12931544 = sum of:
        0.109478585 = weight(_text_:dictionaries in 2541) [ClassicSimilarity], result of:
          0.109478585 = score(doc=2541,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.38215607 = fieldWeight in 2541, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.019836852 = product of:
          0.039673705 = sum of:
            0.039673705 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
              0.039673705 = score(doc=2541,freq=4.0), product of:
                0.1450166 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041411664 = queryNorm
                0.27358043 = fieldWeight in 2541, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2541)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
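
     The spelling "suggestion" step described in the abstract above can be approximated, very roughly, as ranking vocabulary entries by string similarity to the user's query. The Python sketch below uses difflib and a made-up five-word vocabulary; it only illustrates the idea and is not NLM's AZdict or ChemSpell code.

       import difflib

       # Toy vocabulary standing in for the AZdict word list; the real list is
       # derived from SIS databases and resources such as UMLS.
       VOCABULARY = ["toxicology", "benzene", "formaldehyde", "arsenic", "toluene"]

       def suggest(word, n=3, cutoff=0.6):
           # Return up to n vocabulary words most similar to the (possibly misspelled) query.
           return difflib.get_close_matches(word.lower(), VOCABULARY, n=n, cutoff=cutoff)

       print(suggest("toxocology"))   # ['toxicology']
       print(suggest("benzine"))      # ['benzene']
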
  7. ¬The language engineering directory (1993) 0.04
    0.043791436 = product of:
      0.21895717 = sum of:
        0.21895717 = weight(_text_:dictionaries in 8408) [ClassicSimilarity], result of:
          0.21895717 = score(doc=8408,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.76431215 = fieldWeight in 8408, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.078125 = fieldNorm(doc=8408)
      0.2 = coord(1/5)
    
    Abstract
    This is a reference guide to language technology organizations and products around the world. Areas covered in the directory include: Artificial intelligence, Document storage and retrieval, Electronic dictionaries (mono- and multilingual), Expert language systems, Multilingual word processors, Natural language database interfaces, Term databanks, Terminology management, Text content analysis, Thesauri
  8. Sokirko, A.V.: Obzor zarubezhnykh sistem avtomaticheskoi obrabotki teksta, ispol'zuyushchikh poverkhnosto-semanticheskoe predstavlenie, i mashinnykh sematicheskikh slovarei (2000) 0.04
    0.043791436 = product of:
      0.21895717 = sum of:
        0.21895717 = weight(_text_:dictionaries in 8870) [ClassicSimilarity], result of:
          0.21895717 = score(doc=8870,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.76431215 = fieldWeight in 8870, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.078125 = fieldNorm(doc=8870)
      0.2 = coord(1/5)
    
    Footnote
     Translation of the title: Review of foreign systems for automatic text processing using surface-semantic representations, and of electronic semantic dictionaries.
  9. Egger, W.: Helferlein für jedermann : Elektronische Wörterbücher (2004) 0.04
    0.043791436 = product of:
      0.21895717 = sum of:
        0.21895717 = weight(_text_:dictionaries in 1501) [ClassicSimilarity], result of:
          0.21895717 = score(doc=1501,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.76431215 = fieldWeight in 1501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.078125 = fieldNorm(doc=1501)
      0.2 = coord(1/5)
    
    Abstract
     Countless online dictionaries and individual, in some cases excellent, electronic dictionaries are deliberately not covered here, since their advantages are partly offset by the following drawbacks: they require an Internet connection or a CD-ROM, and calling up the dictionaries or switching the language direction is time-consuming.
  10. Akman, K.I.: ¬A new text compression technique based on natural language structure (1995) 0.04
    0.043351308 = product of:
      0.21675654 = sum of:
        0.21675654 = weight(_text_:dictionaries in 1860) [ClassicSimilarity], result of:
          0.21675654 = score(doc=1860,freq=4.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.7566304 = fieldWeight in 1860, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1860)
      0.2 = coord(1/5)
    
    Abstract
     Describes a new data compression technique which utilizes some of the common structural characteristics of languages. The proposed algorithm partitions words into their roots and suffixes, which are then replaced by shorter bit representations. The method uses 3 dictionaries in the form of binary search trees and 1 character array. The first 2 dictionaries are for roots, and the third one is for suffixes. The character array is used both for searching compressible words and for coding incompressible words. The number of bits representing a substring depends on the number of entries in the dictionary in which the substring is found. The proposed algorithm is implemented for the Turkish language and tested using 3 different text groups of different lengths. Results indicate a compression factor of up to 47 per cent.
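
     A rough illustration of the root-plus-suffix coding described above, assuming tiny made-up Turkish root and suffix lists (the paper's system stores its dictionaries in binary search trees and assigns variable-length bit codes; here words are simply replaced by index pairs):

       # Hypothetical miniature dictionaries, not the paper's actual data.
       ROOTS = ["kitap", "ev", "okul"]                    # book, house, school
       SUFFIXES = ["", "lar", "ler", "da", "de", "larda"] # plural / locative markers

       def encode(word):
           # Greedily split a word into a known root plus a known suffix;
           # return (root_id, suffix_id) if compressible, else the raw word.
           for r_id, root in enumerate(ROOTS):
               rest = word[len(root):]
               if word.startswith(root) and rest in SUFFIXES:
                   return (r_id, SUFFIXES.index(rest))
           return word

       def decode(code):
           if isinstance(code, tuple):
               r_id, s_id = code
               return ROOTS[r_id] + SUFFIXES[s_id]
           return code

       text = ["kitaplar", "evde", "okullarda", "merhaba"]
       codes = [encode(w) for w in text]
       print(codes)                                       # [(0, 1), (1, 4), (2, 5), 'merhaba']
       assert [decode(c) for c in codes] == text
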
  11. Greengrass, M.: Conflation methods for searching databases of Latin text (1996) 0.04
    0.043351308 = product of:
      0.21675654 = sum of:
        0.21675654 = weight(_text_:dictionaries in 6987) [ClassicSimilarity], result of:
          0.21675654 = score(doc=6987,freq=4.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.7566304 = fieldWeight in 6987, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6987)
      0.2 = coord(1/5)
    
    Abstract
     Describes the results of a project to develop conflation tools for searching databases of Latin text. Reports on the results of a questionnaire sent to 64 users of Latin text retrieval systems. Describes a Latin stemming algorithm that uses a simple longest match with some recoding, but differs from most stemmers in its use of 2 separate suffix dictionaries for processing query and database words. Describes a retrieval system in which a user inputs the principal components of their search terms; these components are stemmed and the resulting stems matched against the noun-based and verb-based stem dictionaries. Evaluates the system, describing its limitations, and a more complex system.
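
     The longest-match suffix stripping with separate query-side and database-side suffix dictionaries can be sketched as follows; the suffix lists here are illustrative, not the project's actual dictionaries:

       QUERY_SUFFIXES = ["ibus", "arum", "orum", "us", "um", "ae", "is", "a", "o"]
       DB_SUFFIXES    = ["ibus", "arum", "orum", "us", "um", "ae", "is", "a", "o", "e"]

       def stem(word, suffixes):
           # Strip the longest matching suffix, keeping a minimum stem length of 3.
           for suf in sorted(suffixes, key=len, reverse=True):
               if word.endswith(suf) and len(word) - len(suf) >= 3:
                   return word[:-len(suf)]
           return word

       query_stem = stem("rosarum", QUERY_SUFFIXES)   # 'ros'
       db_stem    = stem("rosis", DB_SUFFIXES)        # 'ros'
       print(query_stem == db_stem)                   # True: query and database forms conflate
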
  12. Lezius, W.; Rapp, R.; Wettler, M.: ¬A morphology-system and part-of-speech tagger for German (1996) 0.04
    0.042089686 = product of:
      0.21044843 = sum of:
        0.21044843 = sum of:
          0.15434134 = weight(_text_:german in 1693) [ClassicSimilarity], result of:
            0.15434134 = score(doc=1693,freq=2.0), product of:
              0.24051933 = queryWeight, product of:
                5.808009 = idf(docFreq=360, maxDocs=44218)
                0.041411664 = queryNorm
              0.6417004 = fieldWeight in 1693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.808009 = idf(docFreq=360, maxDocs=44218)
                0.078125 = fieldNorm(doc=1693)
          0.05610709 = weight(_text_:22 in 1693) [ClassicSimilarity], result of:
            0.05610709 = score(doc=1693,freq=2.0), product of:
              0.1450166 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.041411664 = queryNorm
              0.38690117 = fieldWeight in 1693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.078125 = fieldNorm(doc=1693)
      0.2 = coord(1/5)
    
    Date
    22. 3.2015 9:37:18
  13. Santana Suárez, O.; Carreras Riudavets, F.J.; Hernández Figueroa, Z.; González Cabrera, A.C.: Integration of an XML electronic dictionary with linguistic tools for natural language processing (2007) 0.04
    0.037158266 = product of:
      0.18579133 = sum of:
        0.18579133 = weight(_text_:dictionaries in 921) [ClassicSimilarity], result of:
          0.18579133 = score(doc=921,freq=4.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.6485404 = fieldWeight in 921, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.046875 = fieldNorm(doc=921)
      0.2 = coord(1/5)
    
    Abstract
     This study proposes the codification of lexical information in electronic dictionaries, in accordance with a generic and extendable XML scheme model, and its conjunction with linguistic tools for the processing of natural language. Our approach differs from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. The use of XML as a container for the information allows the use of other XML tools for carrying out searches or for enabling presentation of the information in different resources. This model is particularly important as it combines two parallel paradigms, extendable labelling of documents and computational linguistics, and it is also applicable to other languages. We have included a comparison with the labelling proposal for printed dictionaries carried out by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145 000 accepted meanings.
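
     A minimal example of XML-coded lexical information in this spirit, built with Python's ElementTree; the element and attribute names are ours, not the XML scheme proposed in the article:

       import xml.etree.ElementTree as ET

       entry = ET.Element("entry", lemma="diccionario")
       sense = ET.SubElement(entry, "sense", n="1")
       ET.SubElement(sense, "definition").text = (
           "Libro en el que se recogen y explican las voces de una lengua.")
       # Morphology, syllabification, phonology etc. would be contributed by
       # external linguistic tools rather than stored with the meanings.
       ET.SubElement(entry, "morphology", pos="noun", gender="m", number="sg")

       print(ET.tostring(entry, encoding="unicode"))
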
  14. Gopestake, A.: Acquisition of lexical translation relations from MRDS (1994/95) 0.04
    0.035033148 = product of:
      0.17516573 = sum of:
        0.17516573 = weight(_text_:dictionaries in 4073) [ClassicSimilarity], result of:
          0.17516573 = score(doc=4073,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.6114497 = fieldWeight in 4073, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.0625 = fieldNorm(doc=4073)
      0.2 = coord(1/5)
    
    Abstract
     Presents a methodology for extracting information about lexical translation equivalences from the machine-readable versions of conventional dictionaries (MRDs), and describes a series of experiments on semi-automatic construction of a linked multilingual lexical knowledge base for English, Dutch and Spanish. Discusses the advantages and limitations of using MRDs that this has revealed, and some strategies developed to cover gaps where a direct translation cannot be found.
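
     At its simplest, linking such bilingual word lists into a multilingual lexical knowledge base amounts to joining them on a shared lemma; the toy sketch below (made-up entries, deliberately ignoring the sense distinctions the article is concerned with) shows the shape of the data:

       en_es = {"dog": ["perro"], "bank": ["banco", "orilla"]}   # toy English-Spanish pairs
       en_nl = {"dog": ["hond"], "bank": ["bank", "oever"]}      # toy English-Dutch pairs

       def link(en_es, en_nl):
           # Join two bilingual word lists on the shared English lemma.
           return {lemma: {"es": en_es[lemma], "nl": en_nl[lemma]}
                   for lemma in en_es.keys() & en_nl.keys()}

       print(link(en_es, en_nl)["bank"])
       # {'es': ['banco', 'orilla'], 'nl': ['bank', 'oever']}
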
  15. Yang, C.C.; Li, K.W.: Automatic construction of English/Chinese parallel corpora (2003) 0.03
    0.030339597 = product of:
      0.15169798 = sum of:
        0.15169798 = weight(_text_:dictionaries in 1683) [ClassicSimilarity], result of:
          0.15169798 = score(doc=1683,freq=6.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.529531 = fieldWeight in 1683, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.03125 = fieldNorm(doc=1683)
      0.2 = coord(1/5)
    
    Abstract
     As the demand for global information increases significantly, multilingual corpora have become a valuable linguistic resource for applications in cross-lingual information retrieval and natural language processing. In order to cross the boundaries that exist between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive to both genre and domain. It is also impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesauri for large applications. Corpus-based approaches, which do not have the limitations of dictionaries, provide a statistical translation model with which to cross the language boundary. There are many domain-specific parallel or comparable corpora that are employed in machine translation and cross-lingual information retrieval. Most of these are corpora between Indo-European languages, such as English/French and English/Spanish. The Asian/Indo-European corpus, especially the English/Chinese corpus, is relatively sparse. The objective of the present research is to construct an English/Chinese parallel corpus automatically from the World Wide Web. In this paper, an alignment method is presented which is based on dynamic programming to identify the one-to-one Chinese and English title pairs. The method includes alignment at the title level, word level and character level. The longest common subsequence (LCS) is applied to find the most reliable Chinese translation of an English word. As one word in a language may translate into two or more words repetitively in another language, the edit operation, deletion, is used to resolve redundancy. A score function is then proposed to determine the optimal title pairs. Experiments have been conducted to investigate the performance of the proposed method using the daily press release articles by the Hong Kong SAR government as the test bed. The precision of the result is 0.998 while the recall is 0.806. The release articles and speech articles published by the Hongkong & Shanghai Banking Corporation Limited are also used to test our method; the precision is 1.00 and the recall is 0.948.
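
     The core alignment device mentioned above, the longest common subsequence, can be computed with standard dynamic programming; the candidate titles below are invented, not taken from the Hong Kong corpora used in the paper:

       def lcs_len(a, b):
           # Length of the longest common subsequence of two sequences.
           dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
           for i, x in enumerate(a, 1):
               for j, y in enumerate(b, 1):
                   dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
           return dp[len(a)][len(b)]

       english = "press release"
       candidates = ["press releases issued today", "weather report"]
       best = max(candidates, key=lambda c: lcs_len(english, c) / max(len(english), len(c)))
       print(best)   # 'press releases issued today'
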
  16. Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.03
    0.02627486 = product of:
      0.1313743 = sum of:
        0.1313743 = weight(_text_:dictionaries in 5599) [ClassicSimilarity], result of:
          0.1313743 = score(doc=5599,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.4585873 = fieldWeight in 5599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.046875 = fieldNorm(doc=5599)
      0.2 = coord(1/5)
    
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
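
     One simple way to score a conflator in terms of accuracy and coverage, in the spirit of the adapted precision and recall measures mentioned above (the gold pairs and the stand-in conflator below are invented):

       gold = {"niñas": "niño", "cantaban": "cantar", "luces": "luz"}   # variant -> gold lemma

       def conflate(word):
           # Stand-in for an FST-based lemmatizer; it only knows two forms.
           return {"niñas": "niño", "cantaban": "cantar"}.get(word, word)

       handled = {w: conflate(w) for w in gold if conflate(w) != w}     # variants the system changed
       correct = sum(1 for w, lemma in handled.items() if lemma == gold[w])

       precision = correct / len(handled)   # accuracy over the forms that were conflated
       recall = correct / len(gold)         # coverage of all gold variants
       print(precision, recall)             # 1.0 0.666...
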
  17. Spitkovsky, V.I.; Chang, A.X.: ¬A cross-lingual dictionary for english Wikipedia concepts (2012) 0.03
    0.02627486 = product of:
      0.1313743 = sum of:
        0.1313743 = weight(_text_:dictionaries in 336) [ClassicSimilarity], result of:
          0.1313743 = score(doc=336,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.4585873 = fieldWeight in 336, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.046875 = fieldNorm(doc=336)
      0.2 = coord(1/5)
    
    Content
     Cf. also: Spitkovsky, V., P. Norvig: From words to concepts and back: dictionaries for linking text, entities and ideas. In: http://googleresearch.blogspot.de/2012/05/from-words-to-concepts-and-back.html. For the data pool cf.: nlp.stanford.edu/pubs/corsswikis-data.tar.bz2.
  18. Muresan, S.; Klavans, J.L.: Inducing terminologies from text : a case study for the consumer health domain (2013) 0.03
    0.02627486 = product of:
      0.1313743 = sum of:
        0.1313743 = weight(_text_:dictionaries in 682) [ClassicSimilarity], result of:
          0.1313743 = score(doc=682,freq=2.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.4585873 = fieldWeight in 682, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.046875 = fieldNorm(doc=682)
      0.2 = coord(1/5)
    
    Abstract
    Specialized medical ontologies and terminologies, such as SNOMED CT and the Unified Medical Language System (UMLS), have been successfully leveraged in medical information systems to provide a standard web-accessible medium for interoperability, access, and reuse. However, these clinically oriented terminologies and ontologies cannot provide sufficient support when integrated into consumer-oriented applications, because these applications must "understand" both technical and lay vocabulary. The latter is not part of these specialized terminologies and ontologies. In this article, we propose a two-step approach for building consumer health terminologies from text: 1) automatic extraction of definitions from consumer-oriented articles and web documents, which reflects language in use, rather than relying solely on dictionaries, and 2) learning to map definitions expressed in natural language to terminological knowledge by inducing a syntactic-semantic grammar rather than using hand-written patterns or grammars. We present quantitative and qualitative evaluations of our two-step approach, which show that our framework could be used to induce consumer health terminologies from text.
  19. Olsen, K.A.; Williams, J.G.: Spelling and grammar checking using the Web as a text repository (2004) 0.02
    0.024772175 = product of:
      0.12386087 = sum of:
        0.12386087 = weight(_text_:dictionaries in 2891) [ClassicSimilarity], result of:
          0.12386087 = score(doc=2891,freq=4.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.43236023 = fieldWeight in 2891, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.03125 = fieldNorm(doc=2891)
      0.2 = coord(1/5)
    
    Abstract
     Natural languages are both complex and dynamic. They are in part formalized through dictionaries and grammar. Dictionaries attempt to provide definitions and examples of various usages for all the words in a language. Grammar, on the other hand, is the system of rules that defines the structure of a language and is concerned with the correct use and application of the language in speaking or writing. The fact that these two mechanisms lag behind the language as currently used is not a serious problem for those living in a language culture and talking their native language. However, the correct choice of words, expressions, and word relationships is much more difficult when speaking or writing in a foreign language. The basics of the grammar of a language may have been learned in school decades ago, and even then there were always several choices for the correct expression for an idea, fact, opinion, or emotion. Although many different parts of speech and their relationships can make for difficult language decisions, prepositions tend to be problematic for nonnative speakers of English, and, in reality, prepositions are a major problem in most languages. Does a speaker or writer say "in the West Coast" or "on the West Coast," or perhaps "at the West Coast"? In Norwegian, we are "in" a city, but "at" a place. But the distinction between cities and places is vague. To be absolutely correct, one really has to learn the right preposition for every single place. A simplistic way of resolving these language issues is to ask a native speaker. But even native speakers may disagree about the right choice of words. If there is disagreement, then one will have to ask more than one native speaker, treat his/her response as a vote for a particular choice, and perhaps choose the majority choice as the best possible alternative. In real life, such a procedure may be impossible or impractical, but in the electronic world, as we shall see, this is quite easy to achieve. Using the vast text repository of the Web, we may get a significant voting base for even the most detailed and distinct phrases. We shall start by introducing a set of examples to present our idea of using the text repository on the Web to aid in making the best word selection, especially for the use of prepositions. Then we will present a more general discussion of the possibilities and limitations of using the Web as an aid for correct writing.
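
     The voting idea sketched above boils down to choosing the candidate phrase with the largest number of Web occurrences. The Python below uses a hypothetical web_hit_count function with made-up counts; a real implementation would query a search engine and read off its result counts.

       def web_hit_count(phrase):
           # Made-up counts for illustration only.
           counts = {'"on the West Coast"': 18_400_000,
                     '"in the West Coast"': 1_230_000,
                     '"at the West Coast"': 310_000}
           return counts.get(phrase, 0)

       def best_preposition(template, prepositions):
           # Pick the preposition whose phrase is most frequent on the Web.
           return max(prepositions, key=lambda p: web_hit_count(template.format(p)))

       print(best_preposition('"{} the West Coast"', ["in", "on", "at"]))   # 'on'
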
  20. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.02
    0.024772175 = product of:
      0.12386087 = sum of:
        0.12386087 = weight(_text_:dictionaries in 337) [ClassicSimilarity], result of:
          0.12386087 = score(doc=337,freq=4.0), product of:
            0.2864761 = queryWeight, product of:
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.041411664 = queryNorm
            0.43236023 = fieldWeight in 337, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.9177637 = idf(docFreq=118, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
      0.2 = coord(1/5)
    
    Abstract
     Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. Cf. also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for english Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
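
     The released dictionary is, conceptually, a weighted many-to-many mapping between strings and Wikipedia URLs. The sketch below uses a few invented (text, url, count) triples to show how such a mapping can be indexed and queried; it does not parse the actual data files.

       from collections import defaultdict

       triples = [   # invented counts in the spirit of the data set
           ("football", "https://en.wikipedia.org/wiki/Association_football", 190_000),
           ("football", "https://en.wikipedia.org/wiki/American_football", 100_000),
           ("football", "https://en.wikipedia.org/wiki/Football", 40_000),
       ]

       index = defaultdict(list)
       for text, url, count in triples:
           index[text].append((url, count))

       def concepts(text):
           # Rank candidate concepts for a string by their share of the counts.
           candidates = sorted(index[text], key=lambda uc: uc[1], reverse=True)
           total = sum(c for _, c in candidates)
           return [(url, count / total) for url, count in candidates]

       print(concepts("football")[0])   # the most likely concept and its weight
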

Languages

  • e 60
  • d 17
  • m 1
  • ru 1

Types

  • a 63
  • el 9
  • m 6
  • s 4
  • p 2
  • x 2
  • d 1
  • r 1