Search (31 results, page 1 of 2)

  • × theme_ss:"Computerlinguistik"
  • × type_ss:"a"
  • × year_i:[2000 TO 2010}
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23
    0.23356806 = product of:
      0.31142408 = sum of:
        0.07317444 = product of:
          0.21952331 = sum of:
            0.21952331 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.21952331 = score(doc=562,freq=2.0), product of:
                0.39059833 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046071928 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.21952331 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.21952331 = score(doc=562,freq=2.0), product of:
            0.39059833 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046071928 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.01872633 = product of:
          0.03745266 = sum of:
            0.03745266 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.03745266 = score(doc=562,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
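The indented breakdown above (and the similar ones in the results below) is Lucene ClassicSimilarity explain output: each leaf weight is queryWeight x fieldWeight, with queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, and tf = sqrt(termFreq); partial sums are then scaled by coord factors. A minimal Python sketch, using the values copied from this first tree (the helper name is illustrative, not a Lucene API), reproduces the 0.23 score:

```python
import math

def term_weight(freq, idf, query_norm, field_norm):
    """ClassicSimilarity leaf weight: queryWeight * fieldWeight (illustrative helper)."""
    tf = math.sqrt(freq)                   # 1.4142135 for freq = 2.0
    query_weight = idf * query_norm        # idf * queryNorm
    field_weight = tf * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain tree of result 1 (doc=562).
w_3a = term_weight(2.0, 8.478011, 0.046071928, 0.046875)   # ~0.21952331
w_2f = term_weight(2.0, 8.478011, 0.046071928, 0.046875)   # ~0.21952331
w_22 = term_weight(2.0, 3.5018296, 0.046071928, 0.046875)  # ~0.03745266

# Inner coord(1/3) and coord(1/2), then the outer coord(3/4).
score = (w_3a * (1 / 3) + w_2f + w_22 * (1 / 2)) * 0.75
print(round(score, 8))   # ~0.23356806
```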
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Manhart, K.: Digitales Kauderwelsch : Online-Übersetzungsdienste (2004) 0.05
    0.04749774 = product of:
      0.09499548 = sum of:
        0.069554016 = weight(_text_:sites in 2077) [ClassicSimilarity], result of:
          0.069554016 = score(doc=2077,freq=2.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.28878886 = fieldWeight in 2077, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2077)
        0.02544146 = product of:
          0.05088292 = sum of:
            0.05088292 = weight(_text_:design in 2077) [ClassicSimilarity], result of:
              0.05088292 = score(doc=2077,freq=4.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29373983 = fieldWeight in 2077, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2077)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Quickly translating an English or French website into German - nothing could be easier. Online translation services promise language transfer at the click of a mouse, and at no charge. But how good are they really? Online translation services set out to remove the language barrier on the WWW. The automatic translators promise to make e-mail correspondence intelligible and to let German speakers browse foreign-language websites in their own language. English, Spanish, or even Chinese e-mails and websites can thus be transferred into one's own language with a mouse click. Even complicated English manuals or Russian news reports are supposedly no problem for these services. And some homepage owners dream of using the digital translation aids to put their German website online in perfect English, hoping for international contacts and higher visitor numbers. That sounds appealing, but reality looks different. Anyone who has ever consulted such a service usually rubs their eyes in disbelief at the results. Even simple sentences cause problems for many online translators and provide unintentional humour. The CNN report "Iraq blast injures 31 U.S. troops" becomes, in German, "Der Irak Knall verletzt 31 Vereinigte Staaten Truppen." Sites with complex sentence structure are often rendered unintelligibly. The sentence "The Slider is equipped with a brilliant color screen and sports an innovative design that slides open with a push of your thumb" is turned by Babelfish, the best-known online translator, into the following gibberish: "Der Schweber wird mit einem leuchtenden Farbe Schirm ausgerüstet und ein erfinderisches Design sports, das geöffnetes mit einem Stoß Ihres Daumens schiebt." All of the translators subject their users to such dadaist texts.
  3. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02
    0.020157062 = product of:
      0.080628246 = sum of:
        0.080628246 = sum of:
          0.04317559 = weight(_text_:design in 4436) [ClassicSimilarity], result of:
            0.04317559 = score(doc=4436,freq=2.0), product of:
              0.17322445 = queryWeight, product of:
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.046071928 = queryNorm
              0.24924651 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
          0.03745266 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
            0.03745266 = score(doc=4436,freq=2.0), product of:
              0.16133605 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046071928 = queryNorm
              0.23214069 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
      0.25 = coord(1/4)
    
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last four months of 1997 are used for a quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
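As a rough illustration of the dictionary-based query translation with corpus-based term selection described in the Bian and Chen abstract above, here is a minimal Python sketch; the dictionary entries and frequency counts are invented toy data, and this is not the MTIR implementation:

```python
from collections import Counter

# Toy bilingual dictionary and target-language corpus frequencies (assumed data).
bilingual_dict = {
    "資訊": ["information", "data"],
    "檢索": ["retrieval", "lookup"],
}
corpus_freq = Counter({"information": 120, "data": 80, "retrieval": 60, "lookup": 5})

def translate_query(terms):
    """Pick, for each source term, the dictionary translation most frequent in the corpus."""
    translated = []
    for term in terms:
        candidates = bilingual_dict.get(term, [term])   # unknown terms pass through
        translated.append(max(candidates, key=lambda t: corpus_freq[t]))
    return translated

print(translate_query(["資訊", "檢索"]))   # ['information', 'retrieval']
```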
  4. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.02
    0.020029511 = product of:
      0.080118045 = sum of:
        0.080118045 = sum of:
          0.03597966 = weight(_text_:design in 2541) [ClassicSimilarity], result of:
            0.03597966 = score(doc=2541,freq=2.0), product of:
              0.17322445 = queryWeight, product of:
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.046071928 = queryNorm
              0.20770542 = fieldWeight in 2541, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.044138387 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.044138387 = score(doc=2541,freq=4.0), product of:
              0.16133605 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046071928 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.25 = coord(1/4)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
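The spelling-suggestion idea in the Doszkocs and Zamora abstract above can be sketched with a generic similarity lookup over a dictionary vocabulary; the sketch below uses Python's difflib and a made-up sample vocabulary, and is not the AZdict/ChemSpell code:

```python
import difflib

# Assumed sample vocabulary; the real dictionaries were built from NLM databases and UMLS.
vocabulary = ["toxicology", "benzene", "chloroform", "dioxin", "formaldehyde"]

def suggest(term, n=3, cutoff=0.75):
    """Return up to n vocabulary words whose similarity to `term` exceeds the cutoff."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=n, cutoff=cutoff)

print(suggest("toxocology"))   # ['toxicology']
print(suggest("benzine"))      # ['benzene']
```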
  5. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.01
    0.009363165 = product of:
      0.03745266 = sum of:
        0.03745266 = product of:
          0.07490532 = sum of:
            0.07490532 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
              0.07490532 = score(doc=5429,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.46428138 = fieldWeight in 5429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5429)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    c't. 2000, H.22, S.230-231
  6. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.01
    0.007802638 = product of:
      0.031210553 = sum of:
        0.031210553 = product of:
          0.062421106 = sum of:
            0.062421106 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
              0.062421106 = score(doc=5428,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.38690117 = fieldWeight in 5428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5428)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    c't. 2000, H.22, S.220-229
  7. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 1595) [ClassicSimilarity], result of:
              0.050371516 = score(doc=1595,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 1595, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
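A minimal sketch of the divide-and-conquer idea behind such a hierarchical classifier: descend the category tree, consult a local classifier at each node, and expand only the subtrees whose node fires. The tiny keyword "classifiers" and category fragment below are assumptions for illustration, not the authors' trained neural networks:

```python
# Toy category fragment and keyword scorers (assumed stand-ins for trained models).
hierarchy = {
    "Diseases": ["Neoplasms", "Virus Diseases"],
    "Neoplasms": [],
    "Virus Diseases": [],
}
keywords = {
    "Diseases": {"disease", "patient"},
    "Neoplasms": {"tumor", "cancer"},
    "Virus Diseases": {"virus", "infection"},
}

def score(node, tokens):
    """Stand-in for a node's local classifier: count matching keywords."""
    return len(keywords[node] & tokens)

def classify(tokens, node="Diseases", threshold=1):
    """Divide and conquer: descend into a subtree only when the node's classifier fires."""
    if score(node, tokens) < threshold:
        return []
    assigned = [node]
    for child in hierarchy[node]:
        assigned += classify(tokens, child, threshold)
    return assigned

doc = set("the patient presented with a malignant tumor".split())
print(classify(doc))   # ['Diseases', 'Neoplasms']
```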
  8. Neumann, H.: Inszenierung und Metabotschaften eines periodisch getakteten Fernsehauftritts : Die Neujahrsansprachen der Bundeskanzler Helmut Kohl und Gerhard Schröder im Vergleich (2003) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 1632) [ClassicSimilarity], result of:
              0.050371516 = score(doc=1632,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 1632, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1632)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Getting sender and receiver onto the same wavelength decides communicative success, not least in politics. This study subjects eight New Year's addresses delivered between 1994 and 2001 by Federal Chancellors Helmut Kohl and Gerhard Schröder to a systematic analysis from both a political science and a communication science perspective. Both the content level and the relationship level of the addresses are examined, and the verbal and visual rhetoric of the two chancellors is compared and decoded. The study sheds light on the meta-messages and the corporate design of both chancellors, and discusses the advantages and disadvantages of the communication strategies of two communicator types that could hardly be more different.
  9. Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 1851) [ClassicSimilarity], result of:
              0.050371516 = score(doc=1851,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 1851, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1851)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries in constructing test collections for cross-language multilingual text retrieval is also discussed in this paper. As well, a tool is designed to help assessors judge relevance and to gather relevance judgment events. The log file created by this tool will be used to analyze the behaviors of assessors in the future.
  10. Moens, M.F.; Dumortier, J.: Use of a text grammar for generating highlight abstracts of magazine articles (2000) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 4540) [ClassicSimilarity], result of:
              0.050371516 = score(doc=4540,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 4540, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4540)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Browsing a database of article abstracts is one way to select and buy relevant magazine articles online. Our research contributes to the design and development of text grammars for abstracting texts in unlimited subject domains. We developed a system that parses texts based on the text grammar of a specific text type and that extracts sentences and statements which are relevant for inclusion in the abstracts. The system employs knowledge of the discourse patterns that are typical of news stories. The results are encouraging and demonstrate the importance of discourse structures in text summarisation.
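A crude sketch of extracting highlight sentences with simple discourse cues (sentence position and cue phrases); the cue list is an assumption, and this is far simpler than the text grammar the Moens and Dumortier abstract describes:

```python
import re

CUE_PHRASES = ("announced", "said", "according to")   # assumed cue list

def highlight(text, max_sentences=2):
    """Keep the sentences that score highest on position and cue-phrase features."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scored = [((i == 0) + any(c in s.lower() for c in CUE_PHRASES), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:max_sentences]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]   # restore text order

article = ("The ministry announced a new budget on Monday. "
           "Critics were quick to respond. "
           "According to analysts, the plan favours large cities.")
print(highlight(article))
```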
  11. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 268) [ClassicSimilarity], result of:
              0.050371516 = score(doc=268,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 268, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=268)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  12. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
              0.04369477 = score(doc=5483,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 5483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5483)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    10.12.2000 18:22:35
  13. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
              0.04369477 = score(doc=156,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    8. 3.2007 19:55:22
  14. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.04369477 = score(doc=3840,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    27. 8.2011 14:22:33
  15. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
              0.04369477 = score(doc=4184,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 4184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4184)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 1.2011 10:38:28
  16. Sidhom, S.; Hassoun, M.: Morpho-syntactic parsing for a text mining environment : An NP recognition model for knowledge visualization and information retrieval (2002) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 1852) [ClassicSimilarity], result of:
              0.04317559 = score(doc=1852,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 1852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1852)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Sidhom and Hassoun discuss the crucial role of NLP tools in Knowledge Extraction and Management as well as in the design of Information Retrieval Systems. The authors focus more specifically on the morpho-syntactic issues by describing their morpho-syntactic analysis platform, which has been implemented to cover automatic indexing and information retrieval. To this end they implemented a cascaded Augmented Transition Network (ATN). They used this formalism to analyse French text descriptions of multimedia documents. An implementation of an ATN parsing automaton is briefly described. In its logical operation, the platform is considered an investigative tool towards knowledge organization (based on an NP recognition model) and the management of multiform e-documents (text, multimedia, audio, image) using their text descriptions.
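A minimal sketch of NP recognition over already POS-tagged tokens using one fixed pattern (DET ADJ* NOUN ADJ*); the tags and example are assumptions, and this is a much simpler stand-in for the cascaded ATN described above:

```python
def np_chunks(tagged):
    """Collect token spans matching DET ADJ* NOUN ADJ* from (word, tag) pairs."""
    chunks, i = [], 0
    while i < len(tagged):
        if tagged[i][1] == "DET":
            j = i + 1
            while j < len(tagged) and tagged[j][1] == "ADJ":
                j += 1
            if j < len(tagged) and tagged[j][1] == "NOUN":
                j += 1
                while j < len(tagged) and tagged[j][1] == "ADJ":
                    j += 1
                chunks.append(" ".join(w for w, _ in tagged[i:j]))
                i = j
                continue
        i += 1
    return chunks

sentence = [("le", "DET"), ("petit", "ADJ"), ("document", "NOUN"),
            ("multimédia", "ADJ"), ("est", "VERB"), ("indexé", "VERB")]
print(np_chunks(sentence))   # ['le petit document multimédia']
```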
  17. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 4394) [ClassicSimilarity], result of:
              0.04317559 = score(doc=4394,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 4394, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4394)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - To propose a categorization of the different conflation procedures into the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and the use of syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FST), are emerging methods for the recognition and standardization of terms. Findings - The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
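The non-linguistic versus linguistic distinction drawn in the abstract can be illustrated with a toy conflator: crude suffix stripping on one side, a lemma lookup table on the other. The suffix list and lemma entries below are assumptions, not the FST machinery the authors discuss:

```python
SUFFIXES = ("ations", "ation", "ings", "ing", "es", "s")   # toy suffix-stripping rules

def stem(term):
    """Non-linguistic conflation: strip the first matching suffix."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

LEMMAS = {"indexes": "index", "indexing": "index", "indexed": "index"}   # assumed entries

def lemmatize(term):
    """Linguistic normalization: look the variant up in a lemma table."""
    return LEMMAS.get(term, term)

for variant in ("indexing", "indexes", "indexed"):
    print(variant, "->", stem(variant), "/", lemmatize(variant))
# The stemmer misses "indexed", which the lemma lookup still normalizes to "index".
```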
  18. Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 5599) [ClassicSimilarity], result of:
              0.04317559 = score(doc=5599,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 5599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5599)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
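A sketch of evaluating conflation accuracy against gold lemmas, with precision and recall adapted toward accuracy and coverage; the exact adaptation and the toy Spanish data are assumptions, not the authors' measures or corpus:

```python
def evaluate(conflate, gold):
    """Precision over the forms the conflator changed, recall over all gold forms."""
    handled = {w: conflate(w) for w in gold if conflate(w) != w}
    correct = sum(1 for w, lemma in handled.items() if lemma == gold[w])
    precision = correct / len(handled) if handled else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {"casas": "casa", "niños": "niño", "habló": "hablar"}        # toy gold lemmas
toy_output = {"casas": "casa", "niños": "niños", "habló": "habló"}  # toy conflator output

print(evaluate(lambda w: toy_output.get(w, w), gold))   # (1.0, 0.333...)
```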
  19. Santana Suárez, O.; Carreras Riudavets, F.J.; Hernández Figueroa, Z.; González Cabrera, A.C.: Integration of an XML electronic dictionary with linguistic tools for natural language processing (2007) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 921) [ClassicSimilarity], result of:
              0.04317559 = score(doc=921,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 921, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=921)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This study proposes the codification of lexical information in electronic dictionaries, in accordance with a generic and extendable XML scheme model, and its conjunction with linguistic tools for the processing of natural language. Our approach differs from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. The use of XML as a container for the information allows the use of other XML tools for carrying out searches or for enabling presentation of the information in different resources. This model is particularly important as it combines two parallel paradigms, extendable labelling of documents and computational linguistics, and it is also applicable to other languages. We have included a comparison with the labelling proposal for printed dictionaries carried out by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145,000 accepted meanings.
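A minimal sketch of an XML-coded dictionary entry and its lookup with Python's standard xml.etree; the element and attribute names are invented for illustration and are not the authors' scheme:

```python
import xml.etree.ElementTree as ET

# Illustrative entry; the structure is an assumption, not the proposed XML scheme.
entry_xml = """
<entry lemma="archivo">
  <pos>noun</pos>
  <syllables>ar-chi-vo</syllables>
  <sense n="1">file, archive</sense>
  <sense n="2">archive (institution)</sense>
</entry>
"""

entry = ET.fromstring(entry_xml)
print(entry.get("lemma"), entry.findtext("pos"))
for sense in entry.findall("sense"):
    print(" sense", sense.get("n"), ":", sense.text)
```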
  20. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2342) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2342,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2342, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system. For English-German, machine translation was also utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query translation were better than in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial, especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
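A sketch of comparing a dictionary-translated query with a native target-language query by result overlap; the search function and result lists are stand-ins (the study used Google), so this only illustrates the comparison, not the experiment:

```python
def search(query):
    """Stand-in retrieval function returning canned result lists (assumed toy data)."""
    fake_index = {
        "climate change impact": ["d2", "d3", "d4"],
        "climate alteration effect": ["d3", "d5", "d6"],
    }
    return fake_index.get(query, [])

def overlap(results_a, results_b):
    """Jaccard overlap between two result lists."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 0.0

native = search("climate change impact")          # query formulated in the target language
translated = search("climate alteration effect")  # dictionary-translated source query
print(round(overlap(native, translated), 2))      # 0.2
```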