Search (189 results, page 1 of 10)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.12

0.11659854 = product of:
  0.29149634 = sum of:
    0.24901254 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
      0.24901254 = score(doc=562,freq=2.0), product of:
        0.4430686 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.052260913 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.042483795 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
      0.042483795 = score(doc=562,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.23214069 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
  0.4 = coord(2/5)
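
The score tree above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); these formulas are consistent with the numbers listed but are not stated on this page, and the function names are hypothetical. The queryNorm, fieldNorm, and coord values are copied from the listing rather than derived.

```python
import math

# Values copied from the explain output for doc 562.
MAX_DOCS = 44218
QUERY_NORM = 0.052260913
FIELD_NORM = 0.046875


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight, as in the tree above."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # assumed tf formula: sqrt(termFreq)
    query_weight = idf * query_norm        # queryWeight line
    field_weight = tf * idf * field_norm   # fieldWeight line
    return query_weight * field_weight


# Term "3a": freq=2, docFreq=24; term "22": freq=2, docFreq=3622.
score_3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/5): two of the five query clauses matched this document.
coord = 2 / 5
total = coord * (score_3a + score_22)

print(f"3a: {score_3a:.6f}  22: {score_22:.6f}  total: {total:.6f}")
```

Small differences in the last digits are expected, since the listing appears to use single-precision floats while this sketch uses Python doubles.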

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.11
```
0.11119729 = product of:
  0.27799323 = sum of:
    0.24901254 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
      0.24901254 = score(doc=862,freq=2.0), product of:
        0.4430686 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.052260913 = queryNorm
        0.56201804 = fieldWeight in 862, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=862)
    0.028980678 = weight(_text_:it in 862) [ClassicSimilarity], result of:
      0.028980678 = score(doc=862,freq=2.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.19173169 = fieldWeight in 862, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.046875 = fieldNorm(doc=862)
  0.4 = coord(2/5)
```
Abstract

This research revisits the classic Turing test and compares recent large language models such as ChatGPT for their abilities to reproduce human-level comprehension and compelling text generation. Two task challenges, summary and question answering, prompt ChatGPT to produce original content (98-99%) from a single text entry and sequential questions initially posed by Turing in 1950. We score the original and generated content against the OpenAI GPT-2 Output Detector from 2019, and establish multiple cases where the generated content proves original and undetectable (98%). The question of a machine fooling a human judge recedes in this work relative to the question of "how would one prove it?" The original contribution of the work presents a metric and simple grammatical set for understanding the writing mechanics of chatbots in evaluating their readability and statistical clarity, engagement, delivery, overall quality, and plagiarism risks. While Turing's original prose scores at least 14% below the machine-generated output, whether an algorithm displays hints of Turing's true initial thoughts (the "Lovelace 2.0" test) remains unanswerable.

Source

https://arxiv.org/abs/2212.06721

Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.04

0.038952045 = product of:
  0.09738011 = sum of:
    0.04781568 = weight(_text_:it in 1361) [ClassicSimilarity], result of:
      0.04781568 = score(doc=1361,freq=4.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.31634116 = fieldWeight in 1361, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1361)
    0.04956443 = weight(_text_:22 in 1361) [ClassicSimilarity], result of:
      0.04956443 = score(doc=1361,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.2708308 = fieldWeight in 1361, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1361)
  0.4 = coord(2/5)

Abstract: THESYS is based on the natural language processing of free-text databases. It yields statistically evaluated correlations between words of the database. These correlations correspond to traditional thesaurus relations. The person who has to build a thesaurus is thus assisted by the proposals made by THESYS. THESYS is being tested on commercial databases under real world conditions. It is part of a text processing project at Siemens, called TINA (Text-Inhalts-Analyse). Software from TINA is actually being applied and evaluated by the US Department of Commerce for patent search and indexing (REALIST: REtrieval Aids by Linguistics and STatistics).
Date: 6. 1.1999 10:22:07

Paolillo, J.C.: Linguistics and the information sciences (2009) 0.04

0.038952045 = product of:
  0.09738011 = sum of:
    0.04781568 = weight(_text_:it in 3840) [ClassicSimilarity], result of:
      0.04781568 = score(doc=3840,freq=4.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.31634116 = fieldWeight in 3840, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3840)
    0.04956443 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
      0.04956443 = score(doc=3840,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.2708308 = fieldWeight in 3840, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3840)
  0.4 = coord(2/5)

Abstract: Linguistics is the scientific study of language which emphasizes language spoken in everyday settings by human beings. It has a long history of interdisciplinarity, both internally and in contribution to other fields, including information science. A linguistic perspective is beneficial in many ways in information science, since it examines the relationship between the forms of meaningful expressions and their social, cognitive, institutional, and communicative context, these being two perspectives on information that are actively studied, to different degrees, in information science. Examples of issues relevant to information science are presented for which the approach taken under a linguistic perspective is illustrated.
Date: 27. 8.2011 14:22:33

Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.03

0.02858579 = product of:
  0.07146447 = sum of:
    0.028980678 = weight(_text_:it in 75) [ClassicSimilarity], result of:
      0.028980678 = score(doc=75,freq=2.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.19173169 = fieldWeight in 75, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.046875 = fieldNorm(doc=75)
    0.042483795 = weight(_text_:22 in 75) [ClassicSimilarity], result of:
      0.042483795 = score(doc=75,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.23214069 = fieldWeight in 75, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=75)
  0.4 = coord(2/5)

Abstract: A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology data bases, thesauri, library classification systems etc. Essential features of the technology are a lexicographic user interface, variable word description, unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
Date: 30.12.2001 19:01:22

Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.03
```
0.02858579 = product of:
  0.07146447 = sum of:
    0.028980678 = weight(_text_:it in 563) [ClassicSimilarity], result of:
      0.028980678 = score(doc=563,freq=2.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.19173169 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.042483795 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
      0.042483795 = score(doc=563,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.23214069 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
  0.4 = coord(2/5)
```
Abstract

In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.

Date

10. 1.2013 19:22:47
Computational linguistics for the new millennium : divergence or synergy? Proceedings of the International Symposium held at the Ruprecht-Karls Universität Heidelberg, 21-22 July 2000. Festschrift in honour of Peter Hellwig on the occasion of his 60th birthday (2002) 0.03
```
0.027822888 = product of:
  0.06955722 = sum of:
    0.034154054 = weight(_text_:it in 4900) [ClassicSimilarity], result of:
      0.034154054 = score(doc=4900,freq=4.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.22595796 = fieldWeight in 4900, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4900)
    0.035403162 = weight(_text_:22 in 4900) [ClassicSimilarity], result of:
      0.035403162 = score(doc=4900,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.19345059 = fieldWeight in 4900, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4900)
  0.4 = coord(2/5)
```
Content

Contents: Manfred Klenner / Henriette Visser: Introduction - Khurshid Ahmad: Writing Linguistics: When I use a word it means what I choose it to mean - Jürgen Handke: 2000 and Beyond: The Potential of New Technologies in Linguistics - Jurij Apresjan / Igor Boguslavsky / Leonid Iomdin / Leonid Tsinman: Lexical Functions in NU: Possible Uses - Hubert Lehmann: Practical Machine Translation and Linguistic Theory - Karin Haenelt: A Contextbased Approach towards Content Processing of Electronic Documents - Petr Sgall / Eva Hajicová: Are Linguistic Frameworks Comparable? - Wolfgang Menzel: Theory and Applications in Computational Linguistics - Is there Common Ground? - Robert Porzel / Michael Strube: Towards Context-adaptive Natural Language Processing Systems - Nicoletta Calzolari: Language Resources in a Multilingual Setting: The European Perspective - Piek Vossen: Computational Linguistics for Theory and Practice.
Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.03
```
0.025033532 = product of:
  0.06258383 = sum of:
    0.037801612 = weight(_text_:it in 3807) [ClassicSimilarity], result of:
      0.037801612 = score(doc=3807,freq=10.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.25008965 = fieldWeight in 3807, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3807)
    0.024782214 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
      0.024782214 = score(doc=3807,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.1354154 = fieldWeight in 3807, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3807)
  0.4 = coord(2/5)
```
Abstract

Purpose: Academic authors tend to define terms that meet their own needs. Knowledge Management (KM) is a term that comes to mind and is examined in this study. Lexicographical research identified KM terms used by authors from 1996 to 2006 in academic outlets to define KM. Data were collected based on strict criteria which included that definitions should be unique instances. From 2006 onwards, these authors could not identify new unique instances of definitions with repetitive usage of such definition instances. Analysis revealed that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, and Process) and Contextualised Content (Information). The paper aims to discuss these issues. Design/methodology/approach: The aim of this paper is to add to the body of knowledge in the KM discipline and supply KM practitioners and scholars with insight into what is commonly regarded to be KM so as to reignite the debate on what one could consider as KM. The lexicon used by KM scholars was evaluated through the application of lexicographical research methods as extended through Knowledge Discovery and Text Analysis methods. Findings: By simplifying term relationships through the application of lexicographical research methods, as extended through Knowledge Discovery and Text Analysis methods, it was found that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, Process) and Contextualised Content (Information). One would therefore be able to indicate that KM, from an academic point of view, refers to people processing contextualised content.
Research limitations/implications: In total, 42 definitions were identified spanning a period of 11 years. This represented the first use of KM through the estimated apex of terms used. From 2006 onwards definitions were used in repetition, and all definitions that were considered to repeat were therefore subsequently excluded as not being unique instances. All definitions listed are by no means complete and exhaustive. The definitions are viewed outside the scope and context in which they were originally formulated and then used to review the key concepts in the definitions themselves. Social implications: When the authors refer to the aforementioned discussion of KM content as well as the presentation of the method followed in this paper, the authors may have a few implications for future research in KM. First the research validates ideas presented by the OECD in 2005 pertaining to KM. It also validates that through the evolution of KM, the authors ended with a description of KM that may be seen as a standardised description. If the authors as academics and practitioners, for example, refer to KM as the same construct and/or idea, it has the potential to speculatively distinguish between what KM may or may not be. Originality/value: By simplifying the term used to define KM, by focusing on the most common definitions, the paper assists in refocusing KM by reconsidering the dimensions that are the most common in how it has been defined over time. This would hopefully assist in reigniting discussions about KM and how it may be used to the benefit of an organisation.

Date

20. 1.2015 18:30:22
Fóris, A.: Network theory and terminology (2013) 0.02
```
0.023821492 = product of:
  0.059553728 = sum of:
    0.024150565 = weight(_text_:it in 1365) [ClassicSimilarity], result of:
      0.024150565 = score(doc=1365,freq=2.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.15977642 = fieldWeight in 1365, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1365)
    0.035403162 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
      0.035403162 = score(doc=1365,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.19345059 = fieldWeight in 1365, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1365)
  0.4 = coord(2/5)
```
Abstract

The paper aims to present the relations of network theory and terminology. The model of scale-free networks, which has been recently developed and widely applied since, can be effectively used in terminology research as well. Operation based on the principle of networks is a universal characteristic of complex systems. Networks are governed by general laws. The model of scale-free networks can be viewed as a statistical-probability model, and it can be described with mathematical tools. Its main feature is that "everything is connected to everything else," that is, every node is reachable (in a few steps) starting from any other node; this phenomenon is called "the small world phenomenon." The existence of a linguistic network and the general laws of the operation of networks enable us to place issues of language use in the complex system of relations that reveal the deeper connections between phenomena with the help of networks embedded in each other. The realization of the metaphor that language also has a network structure is the basis of the classification methods of the terminological system, and likewise of the ways of creating terminology databases, which serve the purpose of providing easy and versatile accessibility to specialised knowledge.

Date

2. 9.2014 21:22:48

Warner, A.J.: Natural language processing (1987) 0.02

0.022658026 = product of:
  0.11329012 = sum of:
    0.11329012 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
      0.11329012 = score(doc=337,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.61904186 = fieldWeight in 337, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.125 = fieldNorm(doc=337)
  0.2 = coord(1/5)

Source: Annual review of information science and technology. 22(1987), pp. 79-108

McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.02

0.019825771 = product of:
  0.09912886 = sum of:
    0.09912886 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
      0.09912886 = score(doc=3164,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.5416616 = fieldWeight in 3164, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.109375 = fieldNorm(doc=3164)
  0.2 = coord(1/5)

Source: Computational linguistics. 22(1996) no.2, pp. 217-248

Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02

0.019825771 = product of:
  0.09912886 = sum of:
    0.09912886 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
      0.09912886 = score(doc=4506,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.5416616 = fieldWeight in 4506, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.109375 = fieldNorm(doc=4506)
  0.2 = coord(1/5)

Date: 8.10.2000 11:52:22

Somers, H.: Example-based machine translation : Review article (1999) 0.02

0.019825771 = product of:
  0.09912886 = sum of:
    0.09912886 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
      0.09912886 = score(doc=6672,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.5416616 = fieldWeight in 6672, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.109375 = fieldNorm(doc=6672)
  0.2 = coord(1/5)

Date: 31. 7.1996 9:22:19

New tools for human translators (1997) 0.02

0.019825771 = product of:
  0.09912886 = sum of:
    0.09912886 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
      0.09912886 = score(doc=1179,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.5416616 = fieldWeight in 1179, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.109375 = fieldNorm(doc=1179)
  0.2 = coord(1/5)

Date: 31. 7.1996 9:22:19

Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.02

0.019825771 = product of:
  0.09912886 = sum of:
    0.09912886 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
      0.09912886 = score(doc=3117,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.5416616 = fieldWeight in 3117, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.109375 = fieldNorm(doc=3117)
  0.2 = coord(1/5)

Date: 28. 2.1999 10:48:22

¬Der Student aus dem Computer (2023) 0.02

0.019825771 = product of:
  0.09912886 = sum of:
    0.09912886 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
      0.09912886 = score(doc=1079,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.5416616 = fieldWeight in 1079, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.109375 = fieldNorm(doc=1079)
  0.2 = coord(1/5)

Date: 27. 1.2023 16:22:55

Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02

0.016993519 = product of:
  0.08496759 = sum of:
    0.08496759 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
      0.08496759 = score(doc=4483,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.46428138 = fieldWeight in 4483, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.09375 = fieldNorm(doc=4483)
  0.2 = coord(1/5)

Date: 15. 3.2000 10:22:37

Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02

0.016993519 = product of:
  0.08496759 = sum of:
    0.08496759 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
      0.08496759 = score(doc=4888,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.46428138 = fieldWeight in 4888, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.09375 = fieldNorm(doc=4888)
  0.2 = coord(1/5)

Date: 1. 3.2013 14:56:22

Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02

0.016993519 = product of:
  0.08496759 = sum of:
    0.08496759 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
      0.08496759 = score(doc=5429,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.46428138 = fieldWeight in 5429, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.09375 = fieldNorm(doc=5429)
  0.2 = coord(1/5)

Source: c't. 2000, no. 22, pp. 230-231

Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.02
```
0.016675044 = product of:
  0.041687608 = sum of:
    0.016905395 = weight(_text_:it in 1616) [ClassicSimilarity], result of:
      0.016905395 = score(doc=1616,freq=2.0), product of:
        0.15115225 = queryWeight, product of:
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.052260913 = queryNorm
        0.11184349 = fieldWeight in 1616, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.892262 = idf(docFreq=6664, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1616)
    0.024782214 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
      0.024782214 = score(doc=1616,freq=2.0), product of:
        0.18300882 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.052260913 = queryNorm
        0.1354154 = fieldWeight in 1616, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1616)
  0.4 = coord(2/5)
```
Abstract

The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers for the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US's Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs in the near future. Digital library research has been focusing on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines are widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.).
However, research in crossing language boundaries, especially across European languages and Oriental languages, is still in the initial stage. In this proposal, we put our focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture the unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The result shows that such a thesaurus is a promising tool to retrieve relevant terms, especially in the language that is not the same as the input term. The direct translation of the input term can also be retrieved in most of the cases.
