Search (38 results, page 2 of 2)

  • × theme_ss:"Computerlinguistik"
  • × year_i:[2000 TO 2010}
  1. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 4394) [ClassicSimilarity], result of:
              0.04317559 = score(doc=4394,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 4394, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4394)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
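    The score breakdowns in this result list follow Lucene's ClassicSimilarity (tf-idf). As a reading aid, a minimal sketch reproducing the arithmetic of the tree above; the numeric inputs are copied from the breakdown, the variable names are ours:

    ```python
    import math

    # Inputs copied from the explain tree for doc 4394, term "design".
    freq = 2.0               # occurrences of "design" in the field
    doc_freq = 2798          # documents containing "design"
    max_docs = 44218         # documents in the index
    query_norm = 0.046071928
    field_norm = 0.046875    # stored length normalization for this field
    coord = 0.5 * 0.25       # coord(1/2) * coord(1/4): query clauses matched

    tf = math.sqrt(freq)                               # 1.4142135
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 3.7598698
    query_weight = idf * query_norm                    # 0.17322445
    field_weight = tf * idf * field_norm               # 0.24924651
    score = coord * query_weight * field_weight        # 0.0053969487
    print(f"{score:.10f}")
    ```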
    
    Abstract
    Purpose - To propose a categorization of the different conflation procedures into the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and the use of syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FSTs), are emerging methods for the recognition and standardization of terms. Findings - The survey points out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
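    As a toy illustration of the two families the abstract contrasts, the sketch below conflates variants once with a naive suffix-stripping stemmer (non-linguistic) and once with a small exception table standing in for a lemmatizer (linguistic); both the suffix list and the table are invented for the example:

    ```python
    SUFFIXES = ("ization", "ations", "ation", "ings", "ing", "ies", "es", "s")

    def naive_stem(word: str) -> str:
        """Non-linguistic conflation: strip the longest known suffix."""
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) > len(suf) + 2:
                return word[: -len(suf)]
        return word

    # Linguistic conflation: map variants to a canonical lemma.  A real
    # lemmatizer consults a full dictionary or FST; this hand-made table
    # only covers the demo words.
    LEMMAS = {"indexes": "index", "indexing": "index", "indices": "index",
              "normalization": "normalize", "normalizing": "normalize"}

    def lemmatize(word: str) -> str:
        return LEMMAS.get(word, word)

    for w in ["indexing", "indices", "normalization"]:
        print(f"{w:15} stem={naive_stem(w):10} lemma={lemmatize(w)}")
    ```

    The output shows the typical trade-off: the stemmer reaches every word but produces non-words such as "indic" and "normal", while the lemmatizer is exact on the forms it knows.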
  2. Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 5599) [ClassicSimilarity], result of:
              0.04317559 = score(doc=5599,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 5599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5599)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
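    The evaluation idea can be recast in a few lines: compare each conflated form against a gold lemma and score accuracy (correct merges among the forms the method changed) and coverage (variants handled at all). This is one plausible reading of the adapted measures, not the paper's exact formulas, and the data is invented:

    ```python
    def evaluate_conflation(predicted: dict, gold: dict):
        """predicted/gold map each variant to a canonical form.
        A variant counts as 'covered' if the method changed it at all."""
        covered = [w for w, c in predicted.items() if c != w]
        correct = [w for w in covered if gold.get(w) == predicted[w]]
        accuracy = len(correct) / len(covered) if covered else 0.0
        coverage = len(covered) / len(gold)
        return accuracy, coverage

    gold = {"niños": "niño", "canciones": "canción", "fue": "ser"}
    fst_out = {"niños": "niño", "canciones": "canción", "fue": "fue"}   # lemmatizer
    stem_out = {"niños": "niñ", "canciones": "cancion", "fue": "fue"}   # stemmer

    for name, out in [("FST lemmatizer", fst_out), ("stemmer", stem_out)]:
        acc, cov = evaluate_conflation(out, gold)
        print(f"{name}: accuracy={acc:.2f} coverage={cov:.2f}")
    ```

    The toy numbers mirror the finding: the lemmatizer is accurate on what it analyzes but leaves the irregular form "fue" untouched (underanalysis), while the stemmer covers as much but merges into incorrect forms.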
  3. Santana Suárez, O.; Carreras Riudavets, F.J.; Hernández Figueroa, Z.; González Cabrera, A.C.: Integration of an XML electronic dictionary with linguistic tools for natural language processing (2007) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 921) [ClassicSimilarity], result of:
              0.04317559 = score(doc=921,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 921, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=921)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This study proposes the codification of lexical information in electronic dictionaries, in accordance with a generic and extendable XML scheme model, and its conjunction with linguistic tools for the processing of natural language. Our approach differs from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. The use of XML as a container for the information allows the use of other XML tools for carrying out searches or for enabling presentation of the information in different resources. This model is particularly important as it combines two parallel paradigms, extendable labelling of documents and computational linguistics, and it is also applicable to other languages. We have included a comparison with the labelling proposal of printed dictionaries carried out by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145 000 accepted meanings.
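    To make the idea concrete, the sketch below parses a hypothetical dictionary entry in the spirit of the proposal and queries it with standard XML tooling; the element names are our invention, not the authors' actual scheme:

    ```python
    import xml.etree.ElementTree as ET

    # Hypothetical entry; the scheme in the paper differs in detail.
    ENTRY = """
    <entry lemma="casa">
      <meaning id="1" pos="noun">house, dwelling</meaning>
      <meaning id="2" pos="noun">household, family</meaning>
      <morphology><plural>casas</plural></morphology>
    </entry>
    """

    root = ET.fromstring(ENTRY)
    print("lemma:", root.get("lemma"))
    for m in root.findall("meaning"):        # XPath-style search over the entry
        print(f"  sense {m.get('id')} ({m.get('pos')}): {m.text}")
    print("plural:", root.findtext("morphology/plural"))
    ```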
  4. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2342) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2342,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2342, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. For English-German, machine translation was also utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, the results of query translation were better than in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
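    A dictionary-based query translation step of the kind used in the study can be sketched as follows; the toy bilingual dictionary and the OR-expansion of ambiguous entries are our assumptions about one common design, not a description of the actual system:

    ```python
    # Toy Finnish -> English dictionary; real coverage decides CLIR quality.
    DICT = {
        "tietokone": ["computer"],
        "virus": ["virus"],
        "haitta": ["harm", "drawback", "nuisance"],
    }

    def translate_query(source_terms):
        """Replace each source term by all its dictionary translations,
        OR-ing alternatives; out-of-vocabulary terms pass through unchanged."""
        parts = []
        for term in source_terms:
            alts = DICT.get(term, [term])
            parts.append("(" + " OR ".join(alts) + ")" if len(alts) > 1 else alts[0])
        return " ".join(parts)

    print(translate_query(["tietokone", "virus", "haitta"]))
    # -> computer virus (harm OR drawback OR nuisance)
    ```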
  5. Kreymer, O.: ¬An evaluation of help mechanisms in natural language information retrieval systems (2002) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2557) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2557,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2557, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2557)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The field of natural language processing (NLP) demonstrates rapid changes in the design of information retrieval systems and human-computer interaction. While natural language is looked upon as the most effective tool for information retrieval in a contemporary information environment, the systems using it are only beginning to emerge. This study attempts to evaluate the current state of NLP information retrieval systems from the user's point of view: what techniques are used by these systems to guide their users through the search process? The analysis focused on the structure and components of the systems' help mechanisms. Results of the study demonstrated that systems which claimed to be using natural language searching in fact used a wide range of information retrieval techniques, from real natural language processing to Boolean searching. As a result, the user assistance mechanisms of these systems also varied. While pseudo-NLP systems would suit a more traditional method of instruction, real NLP systems primarily utilised the methods of explanation and user-system dialogue.
  6. Kettunen, K.: Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval : an overview (2009) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2835) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2835,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2835)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The purpose of this article is to discuss the advantages and disadvantages of various means to manage morphological variation of keywords in monolingual information retrieval. Design/methodology/approach - The authors present a compilation of query results from 11 mostly European languages and a new general classification of the language-dependent techniques for managing morphological variation. Variants of the different techniques are compared in some detail in terms of retrieval effectiveness and other criteria; the paper thus serves as an overview of the management methods for keyword variation in information retrieval. Findings - The main results of the paper are an overall comparison of reductive and generative keyword management methods in terms of retrieval effectiveness and other, broader criteria. Originality/value - The paper is of value to anyone who wants an overall picture of the keyword management techniques used in IR.
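    The two families can be contrasted in a few lines: a reductive method normalizes index and query terms to a single form, while a generative method leaves the index untouched and expands the query with inflected variants. The suffix list and the generation table below are invented for the example; a real system would derive them from a morphological model:

    ```python
    def reduce_term(word: str) -> str:
        """Reductive: strip a (toy) Finnish case ending so variants collide."""
        for suffix in ("ssa", "sta", "lla", "n"):
            if word.endswith(suffix):
                return word[: -len(suffix)]
        return word

    # Generative: expand the query keyword into surface variants.
    VARIANTS = {"talo": ["talo", "talon", "talossa", "talosta", "talolla"]}

    query = "talo"
    print("reductive index key:", reduce_term("talossa"))    # -> talo
    print("generative query   :", " OR ".join(VARIANTS[query]))
    ```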
  7. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.00
    0.0046815826 = product of:
      0.01872633 = sum of:
        0.01872633 = product of:
          0.03745266 = sum of:
            0.03745266 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.03745266 = score(doc=1746,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2015 9:17:30
  8. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 6029) [ClassicSimilarity], result of:
              0.03597966 = score(doc=6029,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 6029, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn identifies the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provides the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications.
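    The frame idea can be sketched as a small data structure: each extracted temporal concept carries its type and is linked to a reference time by an absolute or relative relation. The names and fields here are illustrative only, not the paper's actual schema:

    ```python
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class TemporalConcept:
        text: str                     # surface expression, e.g. "last Friday"
        ttc_type: str                 # type of temporal concept (TTC)
        relation: str                 # "absolute", "before", "after", ...
        anchor: Optional[str] = None  # reference time a relative concept uses

    @dataclass
    class EventFrame:
        verb: str
        concepts: list = field(default_factory=list)

    frame = EventFrame(verb="announce")
    frame.concepts.append(
        TemporalConcept("last Friday", ttc_type="date", relation="before",
                        anchor="publication date"))
    print(frame)
    ```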
  9. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 4395) [ClassicSimilarity], result of:
              0.03597966 = score(doc=4395,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 4395, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4395)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - To show that stem generation compares well with lemmatization as a morphological tool for a highly inflectional language for IR purposes in a best-match retrieval system. Design/methodology/approach - The effects of three different morphological methods - lemmatization, stemming and stem production - for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Findings - Results show that stem production, a lighter method than morphological lemmatization, compares well with lemmatization in a best-match IR environment. Differences in performance between stem production and lemmatization are small and not statistically significant in most of the tested settings. It is also shown that stemming, hitherto a rather neglected method of morphological processing for Finnish, performs reasonably well, although the stemmer used - a Porter stemmer implementation - is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested. Practical implications - The usefulness of morphological lemmatization and stem generation for IR purposes can be judged by many factors. At the average precision-recall level they behave very similarly in a probabilistic IR system; the choice of method for highly inflectional languages therefore needs to be made along other dimensions too. Originality/value - Results are achieved using Finnish as an example of a highly inflectional language. The results are of interest for anyone interested in processing the morphological variation of a highly inflected language for IR purposes.
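    The evaluation detail, a four-point relevance scale partitioned differently per test setting, amounts to choosing a binarization threshold before computing precision. A minimal sketch with invented judgments:

    ```python
    def precision_at_k(ranked_docs, judgments, threshold, k=10):
        """Binarize a graded scale (0-3) at `threshold`, then compute P@k."""
        top = ranked_docs[:k]
        hits = sum(1 for d in top if judgments.get(d, 0) >= threshold)
        return hits / len(top)

    judgments = {"d1": 3, "d2": 1, "d3": 2, "d4": 0, "d5": 1}
    ranking = ["d1", "d2", "d3", "d4", "d5"]

    for threshold, label in [(1, "liberal"), (3, "strict")]:
        p = precision_at_k(ranking, judgments, threshold, k=5)
        print(f"{label} relevance (grade >= {threshold}): P@5 = {p:.2f}")
    ```

    Moving the threshold changes which method looks best, which is why the paper reports results under several partitionings.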
  10. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 609) [ClassicSimilarity], result of:
              0.03597966 = score(doc=609,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 609, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=609)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - Aims to measure syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records. Also aims to verify if the term frequency distributions satisfy conventional bibliometric laws. Design/methodology/approach - Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov-Smirnov test. Findings - Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not abide by the law. Research limitations/implications - The findings are limited to only two sets of bibliographic data (for aggregation consistency analysis) and to one set of data for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high. Therefore the main argument against syllable aggregation does not appear to hold true. The analysis revealed that Chinese words and characters behave differently in terms of frequency distribution but that there is no noticeable difference between vernacular and Romanized data. The distribution of Romanized characters exhibits the worst case in terms of fit to either Lotka's or Zipf's laws, which indicates that Romanized data in aggregated form appear to be a preferable option. Originality/value - Provides empirical data on consistency and distribution of Romanized Chinese titles in bibliographic records.
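    Two of the instruments used here are easy to state in code: Cooper's interindexer consistency (shared terms over the union of terms) and a check of an observed frequency distribution against Lotka's law C/n^alpha, with a Kolmogorov-Smirnov statistic as the distance. The sketch below uses toy counts and alpha = 2; the paper's exact fitting procedure may differ:

    ```python
    def cooper_consistency(a: set, b: set) -> float:
        """Cooper's interindexer measure: agreement over union."""
        return len(a & b) / len(a | b) if a | b else 1.0

    def lotka_ks(freq_counts, n_max, alpha=2.0):
        """KS distance between the empirical distribution of 'number of
        items occurring n times' and a normalized Lotka distribution."""
        total = sum(freq_counts.values())
        norm = sum(1 / n**alpha for n in range(1, n_max + 1))
        emp_cdf = theo_cdf = ks = 0.0
        for n in range(1, n_max + 1):
            emp_cdf += freq_counts.get(n, 0) / total
            theo_cdf += (1 / n**alpha) / norm
            ks = max(ks, abs(emp_cdf - theo_cdf))
        return ks

    # Two indexers aggregating the same Romanized title differently.
    print(cooper_consistency({"bei jing", "da xue"}, {"bei jing", "daxue"}))
    # Word-frequency counts: 600 words occur once, 160 twice, ...
    print(lotka_ks({1: 600, 2: 160, 3: 70, 4: 40}, n_max=4))
    ```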
  11. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 831) [ClassicSimilarity], result of:
              0.03597966 = score(doc=831,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 831, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=831)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese, where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation-based approach was compared with the non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word-level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Applying the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
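    The language-modeling approach sidesteps segmentation by scoring character sequences directly. A minimal character-bigram classifier with add-one smoothing, our simplification of the method with toy training data:

    ```python
    import math
    from collections import Counter

    def train_bigram_lm(texts):
        """Collect character-bigram and context counts for one class."""
        bigrams, contexts = Counter(), Counter()
        for t in texts:
            t = "^" + t                      # start-of-text marker
            contexts.update(t[:-1])
            bigrams.update(zip(t, t[1:]))
        return bigrams, contexts

    def log_prob(text, model, vocab_size):
        """Add-one-smoothed log P(text | class) under the bigram model."""
        bigrams, contexts = model
        return sum(
            math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
            for a, b in zip("^" + text, text))

    # Toy corpora standing in for labeled Chinese documents.
    classes = {
        "sports":  train_bigram_lm(["比赛 胜利", "球队 比赛"]),
        "finance": train_bigram_lm(["股票 市场", "市场 利率"]),
    }
    V = 5000  # assumed character vocabulary size for smoothing

    doc = "比赛 市场"
    scores = {c: log_prob(doc, m, V) for c, m in classes.items()}
    print(max(scores, key=scores.get), scores)
    ```

    No word boundaries are ever needed: the model ranks classes by how well their character statistics predict the unsegmented document.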
  12. Jones, I.; Cunliffe, D.; Tudhope, D.: Natural language processing and knowledge organization systems as an aid to retrieval (2004) 0.00
    0.0044522556 = product of:
      0.017809022 = sum of:
        0.017809022 = product of:
          0.035618044 = sum of:
            0.035618044 = weight(_text_:design in 2677) [ClassicSimilarity], result of:
              0.035618044 = score(doc=2677,freq=4.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20561787 = fieldWeight in 2677, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2677)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This paper discusses research that employs methods from Natural Language Processing (NLP) in exploiting the intellectual resources of Knowledge Organization Systems (KOS), particularly in the retrieval of information. A technique for the disambiguation of homographs and nominal compounds in free text, where these are known ambiguous terms in the KOS itself, is described. The use of Roget's Thesaurus as an intermediary in the process is also reported. A short review of the relevant literature in the field is given. Design considerations, results and conclusions are presented from the implementation of a prototype system. The linguistic techniques are applied at two complementary levels, namely on a free text string used as an entry point to the KOS, and on the underlying controlled vocabulary itself.
    Content
    1. Introduction The need for research into the application of linguistic techniques in Information Retrieval (IR) in general, and a similar need in faceted Knowledge Organization Systems (KOS), have been indicated by various authors. Smeaton (1997) points out the inherent limitations of conventional approaches to IR based on "bags of words", mainly difficulties caused by lexical ambiguity in the words concerned, and goes on to suggest the possibility of using Natural Language Processing (NLP) in query formulation. Past experience with a faceted retrieval system highlighted the need for integrating the linguistic perspective in order to fully utilise the potential of a KOS (Tudhope et al., 2002). The present research seeks to address some of these needs by using NLP to improve the efficacy of KOS tools in query and retrieval systems. Syntactic parsing and part-of-speech tagging can substantially reduce lexical ambiguity through homograph disambiguation. Given the two strings "I table the motion" and "I put the motion on the table", for instance, the parser used in this research clearly indicates that 'table' in the first string is a verb, while 'table' in the second string is a noun, a distinction that would be missed in the "bag of words" approach. This syntactic disambiguation enables a more precise matching from free text to the controlled vocabulary of a KOS and vice versa. The use of a general linguistic resource, namely Roget's Thesaurus of English Words and Phrases (RTEWP), as an intermediary in this process, is investigated. The adaptation of the Link parser (Sleator & Temperley, 1993) to the purposes of the research is reported. The design and implementation of the early practical stages of the project are described, and the results of the initial experiments are presented and evaluated. Applications of the techniques developed are foreseen in the areas of query disambiguation, information retrieval and automatic indexing. In the first section of the paper a brief review of the literature and relevant current work in the field is presented. The second section reports on the development of algorithms, the construction of data sets, and theoretical and experimental work undertaken to date. The third section evaluates the results obtained and outlines directions for future research.
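    The syntactic disambiguation step can be illustrated with a deliberately tiny rule: decide whether an ambiguous KOS term is used as a noun or a verb from its left context before matching it against the vocabulary. The Link parser used in the project does far more; this toy tagger and the KOS mapping are our invention, showing only the shape of the idea:

    ```python
    DETERMINERS = {"the", "a", "an", "this", "that"}
    PRONOUNS = {"i", "you", "we", "they", "he", "she"}

    def guess_pos(tokens, i):
        """Toy tagger: a determiner to the left suggests a noun,
        a pronoun suggests a verb. A real system parses the sentence."""
        prev = tokens[i - 1].lower() if i > 0 else ""
        if prev in DETERMINERS:
            return "noun"
        if prev in PRONOUNS:
            return "verb"
        return "unknown"

    KOS_TERMS = {("table", "noun"): "Furniture > Tables"}  # invented mapping

    for sent in ["I table the motion", "I put the motion on the table"]:
        toks = sent.split()
        i = toks.index("table")
        pos = guess_pos(toks, i)
        print(f"{sent!r}: 'table' tagged {pos}, KOS match:",
              KOS_TERMS.get(("table", pos), "-"))
    ```

    Only the noun reading is matched to the controlled vocabulary, which is exactly the false hit a bag-of-words match would not avoid.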
  13. Sienel, J.; Weiss, M.; Laube, M.: Sprachtechnologien für die Informationsgesellschaft des 21. Jahrhunderts (2000) 0.00
    0.003901319 = product of:
      0.015605276 = sum of:
        0.015605276 = product of:
          0.031210553 = sum of:
            0.031210553 = weight(_text_:22 in 5557) [ClassicSimilarity], result of:
              0.031210553 = score(doc=5557,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.19345059 = fieldWeight in 5557, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5557)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    26.12.2000 13:22:17
  14. Pinker, S.: Wörter und Regeln : Die Natur der Sprache (2000) 0.00
    0.003901319 = product of:
      0.015605276 = sum of:
        0.015605276 = product of:
          0.031210553 = sum of:
            0.031210553 = weight(_text_:22 in 734) [ClassicSimilarity], result of:
              0.031210553 = score(doc=734,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.19345059 = fieldWeight in 734, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=734)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    19. 7.2002 14:22:31
  15. Computational linguistics for the new millennium : divergence or synergy? Proceedings of the International Symposium held at the Ruprecht-Karls Universität Heidelberg, 21-22 July 2000. Festschrift in honour of Peter Hellwig on the occasion of his 60th birthday (2002) 0.00
    0.003901319 = product of:
      0.015605276 = sum of:
        0.015605276 = product of:
          0.031210553 = sum of:
            0.031210553 = weight(_text_:22 in 4900) [ClassicSimilarity], result of:
              0.031210553 = score(doc=4900,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.19345059 = fieldWeight in 4900, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4900)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  16. Schürmann, H.: Software scannt Radio- und Fernsehsendungen : Recherche in Nachrichtenarchiven erleichtert (2001) 0.00
    0.0027309232 = product of:
      0.010923693 = sum of:
        0.010923693 = product of:
          0.021847386 = sum of:
            0.021847386 = weight(_text_:22 in 5759) [ClassicSimilarity], result of:
              0.021847386 = score(doc=5759,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.1354154 = fieldWeight in 5759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5759)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Handelsblatt. Nr.79 vom 24.4.2001, S.22
  17. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.00
    0.0027309232 = product of:
      0.010923693 = sum of:
        0.010923693 = product of:
          0.021847386 = sum of:
            0.021847386 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
              0.021847386 = score(doc=1616,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.1354154 = fieldWeight in 1616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1616)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users were English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it was predicted that there would be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years; by 2005, 57% of Internet users would be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002, and China became the second largest global at-home Internet population in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy needs in the near future. Digital library research has in the past focused on structural and semantic interoperability. Searching and retrieving objects across variations in protocols, formats and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49). However, research on crossing language boundaries, especially between European and Oriental languages, is still at an initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabulary. For the problem of searching across language boundaries, a cross-lingual thesaurus, generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in the courts and the government. In this paper, we develop an automatic thesaurus by means of a Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language different from that of the input term. The direct translation of the input term can also be retrieved in most cases.
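    The generation pipeline the abstract describes, co-occurrence weights from aligned text plus Hopfield-style spreading activation, can be sketched as follows. The weights and vocabulary are invented, and the activation rule is a simplified stand-in for what such systems use:

    ```python
    import math

    # Toy cross-lingual co-occurrence weights; a real system estimates
    # these from sentence-aligned English/Chinese legal documents.
    W = {
        "court": {"法院": 0.9, "judge": 0.6, "法官": 0.5},
        "judge": {"法官": 0.9, "court": 0.6},
        "法院":  {"court": 0.9, "法官": 0.4},
        "法官":  {"judge": 0.9, "法院": 0.4},
    }
    TERMS = list(W)

    def activate(seeds, iterations=3, gain=1.0, bias=0.5):
        """Hopfield-style spreading activation with a sigmoid transfer;
        the query terms stay clamped at full activation."""
        act = {t: 1.0 if t in seeds else 0.0 for t in TERMS}
        for _ in range(iterations):
            nxt = {t: 1.0 / (1.0 + math.exp(-gain * (
                      sum(W.get(s, {}).get(t, 0.0) * a
                          for s, a in act.items()) - bias)))
                   for t in TERMS}
            for s in seeds:
                nxt[s] = 1.0          # clamp the input term
            act = nxt
        return sorted(act.items(), key=lambda kv: -kv[1])

    print(activate({"court"}))  # ranks related English and Chinese terms
    ```

    After a few iterations, activation settles on terms in both languages that co-occur with the query, which is how the thesaurus can suggest vocabulary a bilingual dictionary lacks.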
  18. Melzer, C.: ¬Der Maschine anpassen : PC-Spracherkennung - Programme sind mittlerweile alltagsreif (2005) 0.00
    0.0027309232 = product of:
      0.010923693 = sum of:
        0.010923693 = product of:
          0.021847386 = sum of:
            0.021847386 = weight(_text_:22 in 4044) [ClassicSimilarity], result of:
              0.021847386 = score(doc=4044,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.1354154 = fieldWeight in 4044, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=4044)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    3. 5.1997 8:44:22