Search (26 results, page 1 of 2)

  • × language_ss:"e"
  • × theme_ss:"Computerlinguistik"
  • × year_i:[2000 TO 2010}
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23
    0.23356806 = product of:
      0.31142408 = sum of:
        0.07317444 = product of:
          0.21952331 = sum of:
            0.21952331 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.21952331 = score(doc=562,freq=2.0), product of:
                0.39059833 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046071928 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.21952331 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.21952331 = score(doc=562,freq=2.0), product of:
            0.39059833 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046071928 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.01872633 = product of:
          0.03745266 = sum of:
            0.03745266 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.03745266 = score(doc=562,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
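The explain tree above is standard Lucene ClassicSimilarity (TF-IDF) output. As a minimal sketch in plain Python (constants copied from the tree above; the script itself is not part of the original record), the leaf weight for the `3a` term is the product of queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm:

```python
import math

# Constants taken from the explain tree above (doc 562, term "3a").
freq, doc_freq, max_docs = 2.0, 24, 44218
query_norm = 0.046071928
field_norm = 0.046875

tf = math.sqrt(freq)                              # ≈ 1.4142135
idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # ≈ 8.478011
query_weight = idf * query_norm                   # ≈ 0.39059833
field_weight = tf * idf * field_norm              # ≈ 0.56201804
score = query_weight * field_weight               # ≈ 0.21952331

print(round(score, 6))
```

The top-level value follows the same pattern: the sum of the per-term products is scaled by the coord factors (3 of 4 query clauses matched, hence 0.75).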
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
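The link in this record is a Google redirect wrapping the actual CiteSeerX PDF. A small sketch with Python's standard urllib (the redirect string below is abridged from the Content field above) recovers the real target, and also suggests why the odd `_text_:3a` and `_text_:2f` terms appear in the score explanation: the percent-escapes in the stored URL were apparently tokenized and indexed.

```python
from urllib.parse import urlparse, parse_qs

# Abridged Google redirect URL from the "Content" field of this record.
redirect = ("http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1"
            "&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu"
            "%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1"
            "%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg")

# The real target sits percent-encoded in the "url" query parameter;
# %3A (":") and %2F ("/") are the escapes that likely surfaced as the
# indexed terms "3a" and "2f".  parse_qs percent-decodes the values.
target = parse_qs(urlparse(redirect).query)["url"][0]
print(target)
# http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
```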
  2. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02
    0.020157062 = product of:
      0.080628246 = sum of:
        0.080628246 = sum of:
          0.04317559 = weight(_text_:design in 4436) [ClassicSimilarity], result of:
            0.04317559 = score(doc=4436,freq=2.0), product of:
              0.17322445 = queryWeight, product of:
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.046071928 = queryNorm
              0.24924651 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
          0.03745266 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
            0.03745266 = score(doc=4436,freq=2.0), product of:
              0.16133605 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046071928 = queryNorm
              0.23214069 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
      0.25 = coord(1/4)
    
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for a quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
  3. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.02
    0.020029511 = product of:
      0.080118045 = sum of:
        0.080118045 = sum of:
          0.03597966 = weight(_text_:design in 2541) [ClassicSimilarity], result of:
            0.03597966 = score(doc=2541,freq=2.0), product of:
              0.17322445 = queryWeight, product of:
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.046071928 = queryNorm
              0.20770542 = fieldWeight in 2541, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.044138387 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.044138387 = score(doc=2541,freq=4.0), product of:
              0.16133605 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046071928 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.25 = coord(1/4)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
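The spelling-"suggestion" idea described in the abstract above can be sketched with generic string-similarity ranking over a vocabulary. This is a stand-in illustration using Python's difflib, not NLM's actual ChemSpell algorithm, and the tiny vocabulary below is invented for the example:

```python
from difflib import get_close_matches

# Hypothetical stand-in vocabulary; per the abstract, the real AZdict
# vocabulary was derived from SIS databases and the UMLS.
vocabulary = ["toxicology", "benzene", "arsenic", "formaldehyde"]

def suggest(query: str, n: int = 3) -> list[str]:
    """Rank vocabulary words by string similarity to a (mis)spelled query."""
    return get_close_matches(query.lower(), vocabulary, n=n, cutoff=0.6)

print(suggest("toxocology"))   # ['toxicology']
```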
  4. Collins, C.: WordNet explorer : applying visualization principles to lexical semantics (2006) 0.01
    0.010176584 = product of:
      0.040706336 = sum of:
        0.040706336 = product of:
          0.08141267 = sum of:
            0.08141267 = weight(_text_:design in 1288) [ClassicSimilarity], result of:
              0.08141267 = score(doc=1288,freq=4.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.46998373 = fieldWeight in 1288, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1288)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Interface designs for lexical databases in NLP have suffered from not following design principles developed in the information visualization research community. We present a design paradigm and show it can be used to generate visualizations which maximize the usability and utility of WordNet. The techniques can be generally applied to other lexical databases used in NLP research.
  5. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.01
    0.009363165 = product of:
      0.03745266 = sum of:
        0.03745266 = product of:
          0.07490532 = sum of:
            0.07490532 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
              0.07490532 = score(doc=4888,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.46428138 = fieldWeight in 4888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4888)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    1. 3.2013 14:56:22
  6. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 1595) [ClassicSimilarity], result of:
              0.050371516 = score(doc=1595,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 1595, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
  7. Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 1851) [ClassicSimilarity], result of:
              0.050371516 = score(doc=1851,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 1851, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1851)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries on the construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. In addition, a tool is designed to help assessors judge relevance and gather the events of relevance judgment. The log file created by this tool will be used to analyze the behaviors of assessors in the future.
  8. Moens, M.F.; Dumortier, J.: Use of a text grammar for generating highlight abstracts of magazine articles (2000) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 4540) [ClassicSimilarity], result of:
              0.050371516 = score(doc=4540,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 4540, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4540)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Browsing a database of article abstracts is one way to select and buy relevant magazine articles online. Our research contributes to the design and development of text grammars for abstracting texts in unlimited subject domains. We developed a system that parses texts based on the text grammar of a specific text type and that extracts sentences and statements which are relevant for inclusion in the abstracts. The system employs knowledge of the discourse patterns that are typical of news stories. The results are encouraging and demonstrate the importance of discourse structures in text summarisation.
  9. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 268) [ClassicSimilarity], result of:
              0.050371516 = score(doc=268,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 268, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=268)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  10. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
              0.04369477 = score(doc=5483,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 5483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5483)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    10.12.2000 18:22:35
  11. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
              0.04369477 = score(doc=156,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    8. 3.2007 19:55:22
  12. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.04369477 = score(doc=3840,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    27. 8.2011 14:22:33
  13. Sidhom, S.; Hassoun, M.: Morpho-syntactic parsing for a text mining environment : An NP recognition model for knowledge visualization and information retrieval (2002) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 1852) [ClassicSimilarity], result of:
              0.04317559 = score(doc=1852,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 1852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1852)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Sidhom and Hassoun discuss the crucial role of NLP tools in knowledge extraction and management as well as in the design of information retrieval systems. The authors focus more specifically on the morpho-syntactic issues by describing their morpho-syntactic analysis platform, which has been implemented to cover the automatic indexing and information retrieval topics. To this end they implemented the cascaded Augmented Transition Network (ATN). They used this formalism to analyse French text descriptions of multimedia documents. An implementation of an ATN parsing automaton is briefly described. The platform, in its logical operation, is considered an investigative tool towards knowledge organization (based on an NP recognition model) and the management of multiform e-documents (text, multimedia, audio, image) using their text descriptions.
  14. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 4394) [ClassicSimilarity], result of:
              0.04317559 = score(doc=4394,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 4394, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4394)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - To propose a categorization of the different conflation procedures into the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and the use of syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FST), are emerging methods for the recognition and standardization of terms. Findings - The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
  15. Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 5599) [ClassicSimilarity], result of:
              0.04317559 = score(doc=5599,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 5599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5599)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
  16. Santana Suárez, O.; Carreras Riudavets, F.J.; Hernández Figueroa, Z.; González Cabrera, A.C.: Integration of an XML electronic dictionary with linguistic tools for natural language processing (2007) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 921) [ClassicSimilarity], result of:
              0.04317559 = score(doc=921,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 921, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=921)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This study proposes the codification of lexical information in electronic dictionaries, in accordance with a generic and extendable XML scheme model, and its conjunction with linguistic tools for the processing of natural language. Our approach is different from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. The use of XML as a container for the information allows the use of other XML tools for carrying out searches or for enabling presentation of the information in different resources. This model is particularly important as it combines two parallel paradigms, extendable labelling of documents and computational linguistics, and is also applicable to other languages. We have included a comparison with the labelling proposal for printed dictionaries carried out by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145 000 accepted meanings.
  17. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2342) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2342,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2342, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. For English-German, machine translation was also utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, the results of query translation were better than in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
  18. Kreymer, O.: ¬An evaluation of help mechanisms in natural language information retrieval systems (2002) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2557) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2557,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2557, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2557)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The field of natural language processing (NLP) demonstrates rapid changes in the design of information retrieval systems and human-computer interaction. While natural language is being looked on as the most effective tool for information retrieval in a contemporary information environment, the systems using it are only beginning to emerge. This study attempts to evaluate the current state of NLP information retrieval systems from the user's point of view: what techniques are used by these systems to guide their users through the search process? The analysis focused on the structure and components of the systems' help mechanisms. Results of the study demonstrated that systems which claimed to be using natural language searching in fact used a wide range of information retrieval techniques from real natural language processing to Boolean searching. As a result, the user assistance mechanisms of these systems also varied. While pseudo-NLP systems would suit a more traditional method of instruction, real NLP systems primarily utilised the methods of explanation and user-system dialogue.
  19. Kettunen, K.: Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval : an overview (2009) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2835) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2835,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2835)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The purpose of this article is to discuss advantages and disadvantages of various means to manage morphological variation of keywords in monolingual information retrieval. Design/methodology/approach - The authors present a compilation of query results from 11 mostly European languages and a new general classification of the language dependent techniques for management of morphological variation. Variants of the different techniques are compared in some detail in terms of retrieval effectiveness and other criteria. The paper consists mainly of an overview of different management methods for keyword variation in information retrieval. Typical IR retrieval results of 11 languages and a new classification for keyword management methods are also presented. Findings - The main results of the paper are an overall comparison of reductive and generative keyword management methods in terms of retrieval effectiveness and other broader criteria. Originality/value - The paper is of value to anyone who wants to get an overall picture of keyword management techniques used in IR.
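The reductive/generative distinction surveyed in the abstract above can be illustrated with a toy sketch. The suffix rules below are hypothetical and cover only a couple of English patterns; they are not any of the paper's evaluated methods, merely an illustration of the two families:

```python
# Reductive approach: conflate surface variants to one index key
# (what stemmers and lemmatizers do at indexing time).
def reduce_variant(word: str) -> str:
    for suffix in ("ies", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word

# Generative approach: expand a query keyword into surface variants
# (what morphological generators do at query time).
def generate_variants(keyword: str) -> set[str]:
    return {keyword, keyword + "s", keyword + "es"}

# Either road makes the query match the document's inflected form:
print(reduce_variant("queries"))                 # query
print("stems" in generate_variants("stem"))      # True
```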
  20. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 6029) [ClassicSimilarity], result of:
              0.03597966 = score(doc=6029,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 6029, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications.