Search (102 results, page 4 of 6)

  • theme_ss:"Computerlinguistik"
  1. Rahmstorf, G.: Information retrieval using conceptual representations of phrases (1994) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 7862) [ClassicSimilarity], result of:
              0.048260607 = score(doc=7862,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 7862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7862)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The information retrieval problem is described starting from an analysis of the concepts 'user's information request' and 'information offerings of texts'. It is shown that natural language phrases are a more adequate medium for expressing information requests and information offerings than character-string-based query and indexing languages complemented by Boolean operators. The phrases must be represented as concepts to reach a language-invariant level for rule-based relevance analysis. The special type of representation called an advanced thesaurus is used for the semantic representation of natural language phrases and for relevance processing. The analysis of the retrieval problem leads to a symmetric system structure.
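    The score shown for each hit above is Lucene's ClassicSimilarity (TF-IDF) explanation. A minimal Python sketch, using only the factor values printed in the breakdown for doc 7862 (tf = sqrt(freq), the listed idf, queryNorm, fieldNorm and the two coord factors), reproduces the arithmetic:

```python
import math

# Factors copied from the explain tree above (doc 7862, term "indexing").
freq = 2.0
idf = 3.8278677          # idf(docFreq=2614, maxDocs=44218)
query_norm = 0.049684696
field_norm = 0.046875

tf = math.sqrt(freq)                  # 1.4142135 = tf(freq=2.0)
query_weight = idf * query_norm       # 0.19018644
field_weight = tf * idf * field_norm  # 0.2537542

term_score = query_weight * field_weight   # 0.048260607
score = term_score * 0.5 * (1.0 / 3.0)     # coord(1/2) * coord(1/3)
print(score)                               # ≈ 0.0080434345
```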
  2. Mustafa el Hadi, W.: Automatic term recognition & extraction tools : examining the new interfaces and their effective communication role in LSP discourse (1998) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 67) [ClassicSimilarity], result of:
              0.048260607 = score(doc=67,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 67, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=67)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper we will discuss the possibility of reorienting NLP (Natural Language Processing) systems towards the extraction not only of terms and their semantic relations, but also towards a variety of other uses: the storage, accessing and retrieving of Language for Special Purposes (LSP) lexical combinations, and the provision of contexts and other information on terms through the integration of further interfaces to terminological databases, term managing systems and existing NLP systems. The aim of making such interfaces available is to increase the efficiency of the systems and improve terminology-oriented text analysis. Since automatic term extraction is the backbone of many applications such as machine translation (MT), indexing, technical writing, thesaurus construction and knowledge representation, developments in this area will have a significant impact.
  3. Sidhom, S.; Hassoun, M.: Morpho-syntactic parsing for a text mining environment : An NP recognition model for knowledge visualization and information retrieval (2002) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 1852) [ClassicSimilarity], result of:
              0.048260607 = score(doc=1852,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 1852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1852)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Sidhom and Hassoun discuss the crucial role of NLP tools in Knowledge Extraction and Management as well as in the design of Information Retrieval Systems. The authors focus more specifically on the morpho-syntactic issues by describing their morpho-syntactic analysis platform, which has been implemented to cover the automatic indexing and information retrieval topics. To this end they implemented the cascaded Augmented Transition Network (ATN) and used this formalism to analyse French text descriptions of multimedia documents. An implementation of an ATN parsing automaton is briefly described. In its logical operation, the platform is considered an investigative tool towards knowledge organization (based on an NP recognition model) and the management of multiform e-documents (text, multimedia, audio, image) using their text descriptions.
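    A cascaded ATN over French is well beyond a results listing, but the NP-recognition idea can be suggested by a minimal finite-state-style chunker over POS-tagged tokens; the tag set and the DET? ADJ* NOUN+ pattern below are illustrative simplifications, not the authors' formalism:

```python
# Tagged tokens: (word, POS). Tags DET/ADJ/NOUN/PREP are illustrative.
tokens = [("the", "DET"), ("automatic", "ADJ"), ("indexing", "NOUN"),
          ("of", "PREP"), ("multimedia", "ADJ"), ("documents", "NOUN")]

def np_chunks(tagged):
    """Greedy left-to-right recognition of DET? ADJ* NOUN+ sequences."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if j < len(tagged) and tagged[j][1] == "DET":
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NOUN":
            k += 1
        if k > j:                        # at least one noun: accept the chunk
            chunks.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:                            # no NP starting here: move on
            i += 1
    return chunks

print(np_chunks(tokens))  # ['the automatic indexing', 'multimedia documents']
```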
  4. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 3390) [ClassicSimilarity], result of:
              0.048260607 = score(doc=3390,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 3390, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3390)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are very good effectiveness, considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.
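    The machine-learning paradigm sketched in this abstract, inducing a classifier from previously classified documents, can be illustrated in a few lines; scikit-learn and the tiny training set below are assumptions for illustration, not part of the tutorial:

```python
# pip install scikit-learn  (assumed dependency; training data is invented)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "stemming and indexing improve retrieval effectiveness",
    "query expansion with a thesaurus for document retrieval",
    "parsing noun phrases with a morphological analyzer",
    "part-of-speech tagging and syntactic parsing of corpora",
]
train_labels = ["information_retrieval", "information_retrieval",
                "computational_linguistics", "computational_linguistics"]

# Document indexing (tf-idf term weights) feeding an inductively built classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["automatic indexing of full-text documents"]))
```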
  5. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 4394) [ClassicSimilarity], result of:
              0.048260607 = score(doc=4394,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 4394, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4394)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - To propose a categorization of the different conflation procedures into the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and the use of syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FST), are emerging methods for the recognition and standardization of terms. Findings - The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
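    On the non-linguistic side surveyed here, conflation by suffix stripping can be sketched as follows; the toy stemmer and term list are illustrative only and not one of the algorithms the authors evaluate:

```python
from collections import defaultdict

def toy_stem(term: str) -> str:
    """Crude suffix stripping; real stemmers (Porter, etc.) are far more careful."""
    for suffix in ("ations", "ation", "ings", "ing", "ers", "er", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 4:
            return term[: -len(suffix)]
    return term

terms = ["index", "indexing", "indexes", "indexer", "indexers", "normalization"]

# Conflate variants: every term whose stem matches falls into one equivalence class.
classes = defaultdict(list)
for t in terms:
    classes[toy_stem(t)].append(t)

print(dict(classes))
# {'index': ['index', 'indexing', 'indexes', 'indexer', 'indexers'],
#  'normaliz': ['normalization']}
```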
  6. Warner, J.: Linguistics and information theory : analytic advantages (2007) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 77) [ClassicSimilarity], result of:
              0.048260607 = score(doc=77,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 77, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=77)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The analytic advantages of central concepts from linguistics and information theory, and the analogies demonstrated between them, for understanding patterns of retrieval from full-text indexes to documents are developed. The interaction between the syntagm and the paradigm in computational operations on written language in indexing, searching, and retrieval is used to account for transformations of the signified or meaning between documents and their representation and between queries and documents retrieved. Characteristics of the message, and messages for selection for written language, are brought to explain the relative frequency of occurrence of words and multiple word sequences in documents. The examples given in the companion article are revisited and a fuller example introduced. The signified of the sequence stood for, the term classically used in the definitions of the sign, as something standing for something else, can itself change rapidly according to its syntagm. A greater than ordinary discourse understanding of patterns in retrieval is obtained.
  7. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 2037) [ClassicSimilarity], result of:
              0.048260607 = score(doc=2037,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 2037, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2037)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better mean average precision (MAP). When compared to an IR scheme without stemming or one based on only a light stemmer, we find the differences to be statistically significant. Comparing probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is found to be about 35% better than the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%) compared to word-based indexing strategies.
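    The decompounding step mentioned at the end of this abstract can be suggested by a greedy dictionary-based splitter; the mini-lexicon and the pseudo-compounds below are illustrative, not the authors' Hungarian procedure:

```python
def decompound(word: str, lexicon: set[str], min_len: int = 3) -> list[str]:
    """Greedily split a compound into known parts; fall back to the whole word."""
    parts, rest = [], word
    while rest:
        for cut in range(len(rest), min_len - 1, -1):
            if rest[:cut] in lexicon:
                parts.append(rest[:cut])
                rest = rest[cut:]
                break
        else:                    # no known prefix found: give up on splitting
            return [word]
    return parts

# Toy lexicon; real decompounding needs a full morphological lexicon.
lexicon = {"information", "retrieval", "system", "language", "model"}
print(decompound("informationretrieval", lexicon))  # ['information', 'retrieval']
print(decompound("languagemodelsystem", lexicon))   # ['language', 'model', 'system']
```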
  8. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 2950) [ClassicSimilarity], result of:
              0.048260607 = score(doc=2950,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 2950, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2950)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    It is important in information retrieval (IR), information extraction, or classification tasks that morphologically related forms are conflated under the same stem (using a stemmer) or lemma (using a morphological analyzer). To achieve this for the English language, algorithmic stemming and various morphological analysis approaches have been suggested. Based on Cross-Language Evaluation Forum test collections containing 284 queries and various IR models, this article evaluates these word-normalization proposals. Stemming improves the mean average precision significantly, by around 7%, while performance differences are not significant when comparing various algorithmic stemmers, or algorithmic stemmers and morphological analysis. Accounting for thesaurus class numbers during indexing does not modify overall retrieval performance. Finally, we demonstrate that including a stop word list, even one containing only around 10 terms, might significantly improve retrieval performance, depending on the IR model.
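    The mean average precision (MAP) reported in this and the preceding abstract is computed as below; the ranked lists and relevance judgements are invented for illustration:

```python
def average_precision(ranked_docs, relevant):
    """AP: mean of precision@k at each rank k where a relevant document appears."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Two toy queries: the system's ranking vs. the judged relevant documents.
runs = [(["d3", "d1", "d7", "d2"], {"d1", "d2"}),
        (["d5", "d4", "d9"], {"d9"})]

mean_ap = sum(average_precision(r, rel) for r, rel in runs) / len(runs)
print(f"MAP = {mean_ap:.3f}")
```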
  9. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 4224) [ClassicSimilarity], result of:
              0.048260607 = score(doc=4224,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 4224, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e. no lemmatization, stemming or similar techniques have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string-matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed roughly equally for all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
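    S-gramming extends n-gramming by forming character pairs that may skip intermediate characters. A minimal sketch follows; the skip distances and the Jaccard comparison are illustrative choices, not the exact configuration used in the study:

```python
def s_grams(word: str, skips=(0, 1)) -> set[tuple[str, str]]:
    """Character pairs formed with the given skip distances (skip 0 = plain bigrams)."""
    grams = set()
    for skip in skips:
        for i in range(len(word) - skip - 1):
            grams.add((word[i], word[i + skip + 1]))
    return grams

def s_gram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two s-gram sets."""
    ga, gb = s_grams(a), s_grams(b)
    return len(ga & gb) / len(ga | gb) if ga or gb else 1.0

# Matching an inflected Finnish form against its lemma vs. an unrelated word.
print(round(s_gram_similarity("kirjasto", "kirjastossa"), 2))
print(round(s_gram_similarity("kirjasto", "sanakirja"), 2))
```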
  10. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.01
    0.007933255 = product of:
      0.023799766 = sum of:
        0.023799766 = product of:
          0.04759953 = sum of:
            0.04759953 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
              0.04759953 = score(doc=2541,freq=4.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.27358043 = fieldWeight in 2541, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2541)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
  11. Kay, M.: ¬The proper place of men and machines in language translation (1997) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 1178) [ClassicSimilarity], result of:
              0.047121134 = score(doc=1178,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 1178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1178)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    31. 7.1996 9:22:19
  12. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
              0.047121134 = score(doc=2345,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 2345, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2345)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 9.1997 19:16:05
  13. Godby, J.: WordSmith research project bridges gap between tokens and indexes (1998) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 4729) [ClassicSimilarity], result of:
              0.047121134 = score(doc=4729,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 4729, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4729)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    OCLC newsletter. 1998, no.234, Jul/Aug, S.22-24
  14. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
              0.047121134 = score(doc=5483,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 5483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5483)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    10.12.2000 18:22:35
  15. Melby, A.: Some notes on 'The proper place of men and machines in language translation' (1997) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 330) [ClassicSimilarity], result of:
              0.047121134 = score(doc=330,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 330, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=330)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    31. 7.1996 9:22:19
  16. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
              0.047121134 = score(doc=156,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    8. 3.2007 19:55:22
  17. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.047121134 = score(doc=3840,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    27. 8.2011 14:22:33
  18. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.01
    0.007853523 = product of:
      0.023560567 = sum of:
        0.023560567 = product of:
          0.047121134 = sum of:
            0.047121134 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
              0.047121134 = score(doc=4184,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2708308 = fieldWeight in 4184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4184)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 1.2011 10:38:28
  19. Humphrey, S.M.; Rogers, W.J.; Kilicoglu, H.; Demner-Fushman, D.; Rindflesch, T.C.: Word sense disambiguation by selecting the best semantic type based on journal descriptor indexing : preliminary experiment (2006) 0.01
    0.0075834226 = product of:
      0.022750268 = sum of:
        0.022750268 = product of:
          0.045500536 = sum of:
            0.045500536 = weight(_text_:indexing in 4912) [ClassicSimilarity], result of:
              0.045500536 = score(doc=4912,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23924173 = fieldWeight in 4912, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4912)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to terms corresponding to concepts in NLM's Unified Medical Language System® (UMLS®) Metathesaurus®. If the text maps to more than one Metathesaurus concept at the same high confidence score, MetaMap has no way of knowing which concept is the correct mapping. We describe the JDI methodology, which is ultimately based on statistical associations between words in a training set of MEDLINE® citations and a small set of journal descriptors (assigned by humans to journals per se) assumed to be inherited by the citations. JDI is the basis for selecting the best meaning that is correlated to UMLS semantic types (STs) assigned to ambiguous concepts in the Metathesaurus. For example, the ambiguity "transport" has two meanings: "Biological Transport", assigned the ST Cell Function, and "Patient transport", assigned the ST Health Care Activity. A JDI-based methodology can analyze text containing "transport" and determine which ST receives a higher score for that text, which then returns the associated meaning, presumed to apply to the ambiguity itself. We then present an experiment in which a baseline disambiguation method was compared to four versions of JDI in disambiguating 45 ambiguous strings from NLM's WSD Test Collection. Overall average precision for the highest-scoring JDI version was 0.7873, compared to 0.2492 for the baseline method, and average precision for individual ambiguities was greater than 0.90 for 23 of them (51%), greater than 0.85 for 24 (53%), and greater than 0.65 for 35 (79%). On the basis of these results, we hope to improve performance of JDI and test its use in applications.
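    The final selection step described here, picking the meaning whose UMLS semantic type scores highest for the surrounding text, reduces to a small argmax once per-ST scores are available; the scores below are invented stand-ins for JDI output, not real values:

```python
def pick_sense(sense_to_semantic_type: dict[str, str],
               st_scores: dict[str, float]) -> str:
    """Return the sense whose associated semantic type (ST) gets the highest score."""
    return max(sense_to_semantic_type,
               key=lambda sense: st_scores.get(sense_to_semantic_type[sense], 0.0))

# The 'transport' ambiguity from the abstract; the ST scores are illustrative only.
senses = {"Biological Transport": "Cell Function",
          "Patient transport": "Health Care Activity"}
st_scores = {"Cell Function": 0.31, "Health Care Activity": 0.72}

print(pick_sense(senses, st_scores))  # -> 'Patient transport'
```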
  20. Dorr, B.J.: Large-scale dictionary construction for foreign language tutoring and interlingual machine translation (1997) 0.01
    0.0067315903 = product of:
      0.02019477 = sum of:
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 3244) [ClassicSimilarity], result of:
              0.04038954 = score(doc=3244,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 3244, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3244)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    31. 7.1996 9:22:19

Languages

  • e 78
  • d 21
  • da 1
  • f 1
  • m 1

Types

  • a 82
  • m 9
  • el 8
  • s 4
  • p 3
  • x 2
  • b 1
  • d 1

Classifications