Search (18 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × type_ss:"a"
  • × year_i:[2000 TO 2010}
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.07164219 = sum of:
      0.053415764 = product of:
        0.21366306 = sum of:
          0.21366306 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.21366306 = score(doc=562,freq=2.0), product of:
              0.38017118 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.044842023 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.25 = coord(1/4)
      0.018226424 = product of:
        0.03645285 = sum of:
          0.03645285 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.03645285 = score(doc=562,freq=2.0), product of:
              0.15702912 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.044842023 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Peis, E.; Herrera-Viedma, E.; Herrera, J.C.: On the evaluation of XML documents using Fuzzy linguistic techniques (2003) 0.02
    0.020225393 = product of:
      0.040450785 = sum of:
        0.040450785 = product of:
          0.08090157 = sum of:
            0.08090157 = weight(_text_:e.g in 2778) [ClassicSimilarity], result of:
              0.08090157 = score(doc=2778,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.34583107 = fieldWeight in 2778, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2778)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Recommender systems evaluate and filter the great amount of information available an the Web to assist people in their search processes. A fuzzy evaluation method of XML documents based an computing with words is presented. Given an XML document type (e.g. scientific article), we consider that its elements are not equally informative. This is indicated by the use of a DTD and defining linguistic importance attributes to the more meaningful elements of the DTD designed. Then, the evaluation method generates linguistic recommendations from linguistic evaluation judgements provided by different recommenders an meaningful elements of DTD.
  3. Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02
    0.018226424 = product of:
      0.03645285 = sum of:
        0.03645285 = product of:
          0.0729057 = sum of:
            0.0729057 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
              0.0729057 = score(doc=5429,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.46428138 = fieldWeight in 5429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5429)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.230-231
  4. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.02
    0.016854495 = product of:
      0.03370899 = sum of:
        0.03370899 = product of:
          0.06741798 = sum of:
            0.06741798 = weight(_text_:e.g in 6029) [ClassicSimilarity], result of:
              0.06741798 = score(doc=6029,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.28819257 = fieldWeight in 6029, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications
  5. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.02
    0.016854495 = product of:
      0.03370899 = sum of:
        0.03370899 = product of:
          0.06741798 = sum of:
            0.06741798 = weight(_text_:e.g in 831) [ClassicSimilarity], result of:
              0.06741798 = score(doc=831,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.28819257 = fieldWeight in 831, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=831)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and were applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation-based approach was compared with the non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Apply the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
  6. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.02
    0.016854495 = product of:
      0.03370899 = sum of:
        0.03370899 = product of:
          0.06741798 = sum of:
            0.06741798 = weight(_text_:e.g in 1842) [ClassicSimilarity], result of:
              0.06741798 = score(doc=1842,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.28819257 = fieldWeight in 1842, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1842)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.
  7. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.02
    0.0151886875 = product of:
      0.030377375 = sum of:
        0.030377375 = product of:
          0.06075475 = sum of:
            0.06075475 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
              0.06075475 = score(doc=5428,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.38690117 = fieldWeight in 5428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5428)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    c't. 2000, H.22, S.220-229
  8. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: ¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.01
    0.013483594 = product of:
      0.026967188 = sum of:
        0.026967188 = product of:
          0.053934377 = sum of:
            0.053934377 = weight(_text_:e.g in 2804) [ClassicSimilarity], result of:
              0.053934377 = score(doc=2804,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.23055404 = fieldWeight in 2804, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2804)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    Vgl. auch: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realization of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in English language that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, is comprised of more than 69,000 candidate terms that are manually annotated as valid and invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. On the other hand, non-technology terms refer to important concepts other than technological; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset is created to serve as a gold standard for the comparison of the algorithms of term recognition and classification. [http://catalog.elra.info/product_info.php?products_id=1236].
  9. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.01
    0.010740024 = product of:
      0.021480048 = sum of:
        0.021480048 = product of:
          0.042960096 = sum of:
            0.042960096 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
              0.042960096 = score(doc=2541,freq=4.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.27358043 = fieldWeight in 2541, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2541)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
  10. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01
    0.010632081 = product of:
      0.021264162 = sum of:
        0.021264162 = product of:
          0.042528324 = sum of:
            0.042528324 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
              0.042528324 = score(doc=5483,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.2708308 = fieldWeight in 5483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5483)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    10.12.2000 18:22:35
  11. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01
    0.010632081 = product of:
      0.021264162 = sum of:
        0.021264162 = product of:
          0.042528324 = sum of:
            0.042528324 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
              0.042528324 = score(doc=156,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.2708308 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    8. 3.2007 19:55:22
  12. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01
    0.010632081 = product of:
      0.021264162 = sum of:
        0.021264162 = product of:
          0.042528324 = sum of:
            0.042528324 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.042528324 = score(doc=3840,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 8.2011 14:22:33
  13. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.01
    0.010632081 = product of:
      0.021264162 = sum of:
        0.021264162 = product of:
          0.042528324 = sum of:
            0.042528324 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
              0.042528324 = score(doc=4184,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.2708308 = fieldWeight in 4184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4184)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2011 10:38:28
  14. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.009113212 = product of:
      0.018226424 = sum of:
        0.018226424 = product of:
          0.03645285 = sum of:
            0.03645285 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.03645285 = score(doc=4436,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    16. 2.2000 14:22:39
  15. Sienel, J.; Weiss, M.; Laube, M.: Sprachtechnologien für die Informationsgesellschaft des 21. Jahrhunderts (2000) 0.01
    0.0075943437 = product of:
      0.0151886875 = sum of:
        0.0151886875 = product of:
          0.030377375 = sum of:
            0.030377375 = weight(_text_:22 in 5557) [ClassicSimilarity], result of:
              0.030377375 = score(doc=5557,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.19345059 = fieldWeight in 5557, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5557)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    26.12.2000 13:22:17
  16. Schürmann, H.: Software scannt Radio- und Fernsehsendungen : Recherche in Nachrichtenarchiven erleichtert (2001) 0.01
    0.0053160405 = product of:
      0.010632081 = sum of:
        0.010632081 = product of:
          0.021264162 = sum of:
            0.021264162 = weight(_text_:22 in 5759) [ClassicSimilarity], result of:
              0.021264162 = score(doc=5759,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.1354154 = fieldWeight in 5759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Handelsblatt. Nr.79 vom 24.4.2001, S.22
  17. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.01
    0.0053160405 = product of:
      0.010632081 = sum of:
        0.010632081 = product of:
          0.021264162 = sum of:
            0.021264162 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
              0.021264162 = score(doc=1616,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.1354154 = fieldWeight in 1616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1616)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The information available in languages other than English in the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics. com/new4/pr/pr990610.html). However, it is predicted that there will be only 60% increase in Internet users among English speakers verses a 150% growth among nonEnglish speakers for the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had been increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/ china.internet.reut/index.html). According to Nielsen/ NetRatings, there was a dramatic leap from 22.5 millions to 56.6 millions Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (US's Internet population was 166 millions) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00. html). All of the evidences reveal the importance of crosslingual research to satisfy the needs in the near future. Digital library research has been focusing in structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines are widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue an Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue an Digital Libraries, 32(2), 48-49.). However, research in crossing language boundaries, especially across European languages and Oriental languages, is still in the initial stage. In this proposal, we put our focus an cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based an English/Chinese parallel corpus. When the searchers encounter retrieval problems, Professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture the unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique history background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based an a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatic generated English/Chinese thesaurus. The result Shows that such thesaurus is a promising tool to retrieve relevant terms, especially in the language that is not the same as the input term. The direct translation of the input term can also be retrieved in most of the cases.
  18. Melzer, C.: ¬Der Maschine anpassen : PC-Spracherkennung - Programme sind mittlerweile alltagsreif (2005) 0.01
    0.0053160405 = product of:
      0.010632081 = sum of:
        0.010632081 = product of:
          0.021264162 = sum of:
            0.021264162 = weight(_text_:22 in 4044) [ClassicSimilarity], result of:
              0.021264162 = score(doc=4044,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.1354154 = fieldWeight in 4044, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=4044)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    3. 5.1997 8:44:22