Search (38 results, page 2 of 2)

  • × theme_ss:"Computerlinguistik"
  • × type_ss:"a"
  • × year_i:[2000 TO 2010}
  1. Kreymer, O.: ¬An evaluation of help mechanisms in natural language information retrieval systems (2002) 0.01
    0.013419857 = product of:
      0.04025957 = sum of:
        0.04025957 = weight(_text_:search in 2557) [ClassicSimilarity], result of:
          0.04025957 = score(doc=2557,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.230407 = fieldWeight in 2557, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2557)
      0.33333334 = coord(1/3)
    
    Abstract
    The field of natural language processing (NLP) demonstrates rapid changes in the design of information retrieval systems and human-computer interaction. While natural language is being looked on as the most effective tool for information retrieval in a contemporary information environment, the systems using it are only beginning to emerge. This study attempts to evaluate the current state of NLP information retrieval systems from the user's point of view: what techniques are used by these systems to guide their users through the search process? The analysis focused on the structure and components of the systems' help mechanisms. Results of the study demonstrated that systems which claimed to be using natural language searching in fact used a wide range of information retrieval techniques from real natural language processing to Boolean searching. As a result, the user assistance mechanisms of these systems also varied. While pseudo-NLP systems would suit a more traditional method of instruction, real NLP systems primarily utilised the methods of explanation and user-system dialogue.
  2. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.01
    0.011352143 = product of:
      0.03405643 = sum of:
        0.03405643 = product of:
          0.06811286 = sum of:
            0.06811286 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
              0.06811286 = score(doc=5428,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.38690117 = fieldWeight in 5428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5428)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    c't. 2000, H.22, S.220-229
  3. Aljlayl, M.; Frieder, O.; Grossman, D.: On bidirectional English-Arabic search (2002) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 5227) [ClassicSimilarity], result of:
          0.03354964 = score(doc=5227,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 5227, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5227)
      0.33333334 = coord(1/3)
    
  4. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 831) [ClassicSimilarity], result of:
          0.03354964 = score(doc=831,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 831, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=831)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and were applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation-based approach was compared with the non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Apply the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
  5. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 896) [ClassicSimilarity], result of:
          0.03354964 = score(doc=896,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 896, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=896)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The effects of combination of indexing strategies with query terms on retrieval effectiveness were studied. Three of five tested combinations involved indexing strategies that used conflation, in the form of normalization. Further, two of these three combinations used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right hand truncation of query terms. The four combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. The five combinations were evaluated under six different user scenarios, where each scenario simulated a certain user type. The four alternative combinations outperformed the baseline, for each user scenario. The truncation combination had the best performance under each user scenario. The main conclusion of the paper is that normalization and right hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline. The performance of the three combinations of indexing strategies with query terms based on normalization was not far below the performance of the truncation combination.
  6. Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 897) [ClassicSimilarity], result of:
          0.03354964 = score(doc=897,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 897, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=897)
      0.33333334 = coord(1/3)
    
    Abstract
    Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based on only the target document collections. First, we discuss two kinds of disambiguation technique: (1) one is a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback. Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English language as a pivot. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparable high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verity our findings. Again, similar results were observed except that the Dice coefficient outperforms slightly the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.
  7. Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 4215) [ClassicSimilarity], result of:
          0.03354964 = score(doc=4215,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 4215, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4215)
      0.33333334 = coord(1/3)
    
    Abstract
    For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalences in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalences in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries which can not be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval.
  8. Blair, D.C.: Information retrieval and the philosophy of language (2002) 0.01
    0.0089465715 = product of:
      0.026839713 = sum of:
        0.026839713 = weight(_text_:search in 4283) [ClassicSimilarity], result of:
          0.026839713 = score(doc=4283,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.15360467 = fieldWeight in 4283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=4283)
      0.33333334 = coord(1/3)
    
    Abstract
    Information retrieval - the retrieval, primarily, of documents or textual material - is fundamentally a linguistic process. At the very least we must describe what we want and match that description with descriptions of the information that is available to us. Furthermore, when we describe what we want, we must mean something by that description. This is a deceptively simple act, but such linguistic events have been the grist for philosophical analysis since Aristotle. Although there are complexities involved in referring to authors, document types, or other categories of information retrieval context, here I wish to focus an one of the most problematic activities in information retrieval: the description of the intellectual content of information items. And even though I take information retrieval to involve the description and retrieval of written text, what I say here is applicable to any information item whose intellectual content can be described for retrieval-books, documents, images, audio clips, video clips, scientific specimens, engineering schematics, and so forth. For convenience, though, I will refer only to the description and retrieval of documents. The description of intellectual content can go wrong in many obvious ways. We may describe what we want incorrectly; we may describe it correctly but in such general terms that its description is useless for retrieval; or we may describe what we want correctly, but misinterpret the descriptions of available information, and thereby match our description of what we want incorrectly. From a linguistic point of view, we can be misunderstood in the process of retrieval in many ways. Because the philosophy of language deals specifically with how we are understood and mis-understood, it should have some use for understanding the process of description in information retrieval. First, however, let us examine more closely the kinds of misunderstandings that can occur in information retrieval. We use language in searching for information in two principal ways. We use it to describe what we want and to discriminate what we want from other information that is available to us but that we do not want. Description and discrimination together articulate the goals of the information search process; they also delineate the two principal ways in which language can fail us in this process. Van Rijsbergen (1979) was the first to make this distinction, calling them "representation" and "discrimination.""
  9. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: ¬An effective approach to verbose queries using a limited dependencies language model (2009) 0.01
    0.0089465715 = product of:
      0.026839713 = sum of:
        0.026839713 = weight(_text_:search in 2122) [ClassicSimilarity], result of:
          0.026839713 = score(doc=2122,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.15360467 = fieldWeight in 2122, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
      0.33333334 = coord(1/3)
    
    Abstract
    Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation then just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
  10. Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01
    0.007946501 = product of:
      0.0238395 = sum of:
        0.0238395 = product of:
          0.047679 = sum of:
            0.047679 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
              0.047679 = score(doc=5483,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.2708308 = fieldWeight in 5483, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5483)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    10.12.2000 18:22:35
  11. Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01
    0.007946501 = product of:
      0.0238395 = sum of:
        0.0238395 = product of:
          0.047679 = sum of:
            0.047679 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
              0.047679 = score(doc=156,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.2708308 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    8. 3.2007 19:55:22
  12. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01
    0.007946501 = product of:
      0.0238395 = sum of:
        0.0238395 = product of:
          0.047679 = sum of:
            0.047679 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.047679 = score(doc=3840,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    27. 8.2011 14:22:33
  13. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.01
    0.007946501 = product of:
      0.0238395 = sum of:
        0.0238395 = product of:
          0.047679 = sum of:
            0.047679 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
              0.047679 = score(doc=4184,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.2708308 = fieldWeight in 4184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4184)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 1.2011 10:38:28
  14. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.0068112854 = product of:
      0.020433856 = sum of:
        0.020433856 = product of:
          0.040867712 = sum of:
            0.040867712 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.040867712 = score(doc=4436,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    16. 2.2000 14:22:39
  15. Sienel, J.; Weiss, M.; Laube, M.: Sprachtechnologien für die Informationsgesellschaft des 21. Jahrhunderts (2000) 0.01
    0.0056760716 = product of:
      0.017028214 = sum of:
        0.017028214 = product of:
          0.03405643 = sum of:
            0.03405643 = weight(_text_:22 in 5557) [ClassicSimilarity], result of:
              0.03405643 = score(doc=5557,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.19345059 = fieldWeight in 5557, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5557)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    26.12.2000 13:22:17
  16. Schürmann, H.: Software scannt Radio- und Fernsehsendungen : Recherche in Nachrichtenarchiven erleichtert (2001) 0.00
    0.0039732503 = product of:
      0.01191975 = sum of:
        0.01191975 = product of:
          0.0238395 = sum of:
            0.0238395 = weight(_text_:22 in 5759) [ClassicSimilarity], result of:
              0.0238395 = score(doc=5759,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.1354154 = fieldWeight in 5759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5759)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Handelsblatt. Nr.79 vom 24.4.2001, S.22
  17. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.00
    0.0039732503 = product of:
      0.01191975 = sum of:
        0.01191975 = product of:
          0.0238395 = sum of:
            0.0238395 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
              0.0238395 = score(doc=1616,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.1354154 = fieldWeight in 1616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1616)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The information available in languages other than English in the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics. com/new4/pr/pr990610.html). However, it is predicted that there will be only 60% increase in Internet users among English speakers verses a 150% growth among nonEnglish speakers for the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had been increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/ china.internet.reut/index.html). According to Nielsen/ NetRatings, there was a dramatic leap from 22.5 millions to 56.6 millions Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (US's Internet population was 166 millions) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00. html). All of the evidences reveal the importance of crosslingual research to satisfy the needs in the near future. Digital library research has been focusing in structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines are widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue an Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue an Digital Libraries, 32(2), 48-49.). However, research in crossing language boundaries, especially across European languages and Oriental languages, is still in the initial stage. In this proposal, we put our focus an cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based an English/Chinese parallel corpus. When the searchers encounter retrieval problems, Professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture the unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique history background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based an a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatic generated English/Chinese thesaurus. The result Shows that such thesaurus is a promising tool to retrieve relevant terms, especially in the language that is not the same as the input term. The direct translation of the input term can also be retrieved in most of the cases.
  18. Melzer, C.: ¬Der Maschine anpassen : PC-Spracherkennung - Programme sind mittlerweile alltagsreif (2005) 0.00
    0.0039732503 = product of:
      0.01191975 = sum of:
        0.01191975 = product of:
          0.0238395 = sum of:
            0.0238395 = weight(_text_:22 in 4044) [ClassicSimilarity], result of:
              0.0238395 = score(doc=4044,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.1354154 = fieldWeight in 4044, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=4044)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    3. 5.1997 8:44:22

Languages

  • e 32
  • d 6