Search (21 results, page 1 of 2)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10

0.10109107 = sum of:
  0.080492064 = product of:
    0.24147618 = sum of:
      0.24147618 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.24147618 = score(doc=562,freq=2.0), product of:
          0.42965913 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.050679237 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.020599011 = product of:
    0.041198023 = sum of:
      0.041198023 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.041198023 = score(doc=562,freq=2.0), product of:
          0.17747006 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050679237 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)
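
The score tree above can be re-derived by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF form in which tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)), a formula inferred here because it reproduces the idf values printed in the explanation. The function names are illustrative; queryNorm, fieldNorm and the coord factors are copied directly from the listing.

```python
import math

# Constants copied from the explanation for doc 562.
QUERY_NORM = 0.050679237
MAX_DOCS = 44218
FIELD_NORM = 0.046875


def idf(doc_freq, max_docs):
    # Assumed form: 1 + ln(maxDocs / (docFreq + 1)); matches the listed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Assumed form: square root of the raw term frequency (sqrt(2) = 1.4142135).
    return math.sqrt(freq)


def term_score(freq, doc_freq, field_norm):
    # score = queryWeight * fieldWeight, as laid out in the explanation tree.
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Term "3a": docFreq=24, scaled by coord(1/3).
score_3a = term_score(2.0, 24, FIELD_NORM) * (1.0 / 3.0)
# Term "22": docFreq=3622, scaled by coord(1/2).
score_22 = term_score(2.0, 3622, FIELD_NORM) * 0.5

total = score_3a + score_22
print(round(total, 6))
```

Under these assumptions the two partial scores come out close to the listed 0.080492064 and 0.020599011, and their sum is close to the record's 0.10109107. Small differences in the last digits are expected because the listing uses single-precision floats.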

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Humphrey, S.M.; Rogers, W.J.; Kilicoglu, H.; Demner-Fushman, D.; Rindflesch, T.C.: Word sense disambiguation by selecting the best semantic type based on journal descriptor indexing : preliminary experiment (2006) 0.03
```
0.0250038 = product of:
  0.0500076 = sum of:
    0.0500076 = product of:
      0.1000152 = sum of:
        0.1000152 = weight(_text_:maps in 4912) [ClassicSimilarity], result of:
          0.1000152 = score(doc=4912,freq=4.0), product of:
            0.28477904 = queryWeight, product of:
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.050679237 = queryNorm
            0.35120282 = fieldWeight in 4912, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.03125 = fieldNorm(doc=4912)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to terms corresponding to concepts in NLM's Unified Medical Language System® (UMLS®) Metathesaurus®. If the text maps to more than one Metathesaurus concept at the same high confidence score, MetaMap has no way of knowing which concept is the correct mapping. We describe the JDI methodology, which is ultimately based on statistical associations between words in a training set of MEDLINE® citations and a small set of journal descriptors (assigned by humans to journals per se) assumed to be inherited by the citations. JDI is the basis for selecting the best meaning that is correlated to UMLS semantic types (STs) assigned to ambiguous concepts in the Metathesaurus. For example, the ambiguity transport has two meanings: "Biological Transport" assigned the ST Cell Function and "Patient transport" assigned the ST Health Care Activity. A JDI-based methodology can analyze text containing transport and determine which ST receives a higher score for that text, which then returns the associated meaning, presumed to apply to the ambiguity itself. We then present an experiment in which a baseline disambiguation method was compared to four versions of JDI in disambiguating 45 ambiguous strings from NLM's WSD Test Collection. Overall average precision for the highest-scoring JDI version was 0.7873 compared to 0.2492 for the baseline method, and average precision for individual ambiguities was greater than 0.90 for 23 of them (51%), greater than 0.85 for 24 (53%), and greater than 0.65 for 35 (79%). On the basis of these results, we hope to improve performance of JDI and test its use in applications.
Witschel, H.F.: Terminologie-Extraktion : Möglichkeiten der Kombination statistischer und musterbasierter Verfahren (2004) 0.02
```
0.022100445 = product of:
  0.04420089 = sum of:
    0.04420089 = product of:
      0.08840178 = sum of:
        0.08840178 = weight(_text_:maps in 123) [ClassicSimilarity], result of:
          0.08840178 = score(doc=123,freq=2.0), product of:
            0.28477904 = queryWeight, product of:
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.050679237 = queryNorm
            0.31042236 = fieldWeight in 123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.0390625 = fieldNorm(doc=123)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The search for information in unstructured natural-language data is the subject of what is known as text mining. This work examines one subfield of text mining, namely the extraction of domain-specific technical terms from specialist texts of the respective domain. Why terminology extraction at all? The answer is simple: the key to understanding many specialist fields lies in knowing the associated terminology. Of course, knowing a list of a domain's technical terms is not enough to master that domain. Such a list is, however, an important prerequisite for compiling specialist dictionaries (consider, for example, reference works such as the clinical dictionary "Pschyrembel"): it must first be settled which terms are to be included in the dictionary before one can think about the exact definition of the individual terms. A specialist dictionary should contain exactly those terms of a domain that are or were the subject of research in that field. What could be more natural, then, than to examine the relevant specialist literature and extract the knowledge it contains in the form of technical terms? Further applications of terminology extraction are also conceivable, such as the automatic subject indexing of texts or the creation of so-called topic maps, which present the important terms of a subject and relate them to one another. The first question to be settled is therefore what terminology actually is; above all, however, various methods are developed that exploit the properties of technical terms in order to find them. The methods are derived from the linguistic and 'statistical' characteristics of technical terms and combined in a suitable way.
Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.02
```
0.022100445 = product of:
  0.04420089 = sum of:
    0.04420089 = product of:
      0.08840178 = sum of:
        0.08840178 = weight(_text_:maps in 1842) [ClassicSimilarity], result of:
          0.08840178 = score(doc=1842,freq=2.0), product of:
            0.28477904 = queryWeight, product of:
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.050679237 = queryNorm
            0.31042236 = fieldWeight in 1842, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.

Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02

0.020599011 = product of:
  0.041198023 = sum of:
    0.041198023 = product of:
      0.082396045 = sum of:
        0.082396045 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
          0.082396045 = score(doc=4888,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.46428138 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 1. 3.2013 14:56:22

Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02

0.020599011 = product of:
  0.041198023 = sum of:
    0.041198023 = product of:
      0.082396045 = sum of:
        0.082396045 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
          0.082396045 = score(doc=5429,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.46428138 = fieldWeight in 5429, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5429)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: c't. 2000, H.22, S.230-231

Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.02

0.017165843 = product of:
  0.034331687 = sum of:
    0.034331687 = product of:
      0.06866337 = sum of:
        0.06866337 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
          0.06866337 = score(doc=5428,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.38690117 = fieldWeight in 5428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=5428)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: c't. 2000, H.22, S.220-229

Information und Sprache : Beiträge zu Informationswissenschaft, Computerlinguistik, Bibliothekswesen und verwandten Fächern. Festschrift für Harald H. Zimmermann. Herausgegeben von Ilse Harms, Heinz-Dirk Luckhardt und Hans W. Giessen (2006) 0.01
```
0.0125019 = product of:
  0.0250038 = sum of:
    0.0250038 = product of:
      0.0500076 = sum of:
        0.0500076 = weight(_text_:maps in 91) [ClassicSimilarity], result of:
          0.0500076 = score(doc=91,freq=4.0), product of:
            0.28477904 = queryWeight, product of:
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.050679237 = queryNorm
            0.17560141 = fieldWeight in 91, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.619245 = idf(docFreq=435, maxDocs=44218)
              0.015625 = fieldNorm(doc=91)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Content

Jiri Panyr: Thesauri, Semantische Netze, Frames, Topic Maps, Taxonomien, Ontologien - begriffliche Verwirrung oder konzeptionelle Vielfalt? Heinz-Dieter Maas: Indexieren mit AUTINDEX Wilhelm Gaus, Rainer Kaluscha: Maschinelle inhaltliche Erschließung von Arztbriefen und Auswertung von Reha-Entlassungsberichten Klaus Lepsky: Automatische Indexierung des Reallexikons zur Deutschen Kunstgeschichte - Analysen und Entwicklungen Ilse Harms: Die computervermittelte Kommunikation als ein Instrument des Wissensmanagements in Organisationen August-Wilhelm Scheer, Dirk Werth: Geschäftsregel-basiertes Geschäftsprozessmanagement Thomas Seeger: Akkreditierung und Evaluierung von Hochschullehre und -forschung in Großbritannien. Hinweise für die Situation in Deutschland Bernd Hagenau: Gehabte Sorgen hab' ich gern? Ein Blick zurück auf die Deutschen Bibliothekartage 1975 bis 1980 - Persönliches Jorgo Chatzimarkakis: Sprache und Information in Europa Alfred Gulden: 7 Briefe und eine Anmerkung Günter Scholdt: Der Weg nach Europa im Spiegel von Mundartgedichten Alfred Guldens Wolfgang Müller: Prof. Dr. Harald H. Zimmermann - Seit 45 Jahren der Universität des Saarlandes verbunden Heinz-Dirk Luckhardt: Computerlinguistik und Informationswissenschaft: Facetten des wissenschaftlichen Wirkens von Harald H. Zimmermann Schriftenverzeichnis Harald H. Zimmermanns 1967-2005 - Projekte in Verantwortung von Harald H. Zimmermann - Adressen der Beiträgerinnen und Beiträger

Footnote

In Thesauri, Semantische Netze, Frames, Topic Maps, Taxonomien, Ontologien - begriffliche Verwirrung oder konzeptionelle Vielfalt? (pp. 139-151), Jiri Panyr (München/Saarbrücken) gives a readable and useful overview of the semantic representation forms named in the title of his contribution, which are invoked again and again, often imprecisely or even incorrectly, in connection with the Internet and in particular with the proposed Semantic Web. The remarks on the buzzword "ontology" in particular show that it must not be used carelessly as a quasi-synonym for thesaurus or classification. Panyr's contribution is, incidentally, thematically related to that of K.-D. Schmitz (Köln), Wörterbuch, Thesaurus, Terminologie, Ontologie (pp. 129-137). Apart from its unimaginative title Wer suchet, der findet? (pp. 107-118), fortunately supplied with the subtitle Verbesserung der inhaltlichen Suchmöglichkeiten im Informationssystem Der Deutschen Bibliothek, the article by Elisabeth Niggemann (Frankfurt am Main) is not a scholarly one, but it is certainly the most practical, most readable and, from a librarian's point of view, most interesting in the book. Niggemann gives an overview of the subject indexing so far applied to the bibliographic data of the DDB, which has since mutated into the Deutsche Nationalbibliothek, together with a status report and outlook on current and planned improvements to content-based searching. These include the broad use of an automatic indexing procedure (MILOS/IDX) as well as activities in the field of classification (DDC), the linking of national subject heading systems (project MACS), and work on cross-concordances (CARMEN) and approaches to handling heterogeneity.
The "commitment" declared here by a central authority to improving the subject indexing of the national online information system fills the Phaeacian observer, more accustomed to faint-heartedness and indifference, with respect and wistful envy.

Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.01

0.012138084 = product of:
  0.024276167 = sum of:
    0.024276167 = product of:
      0.048552334 = sum of:
        0.048552334 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
          0.048552334 = score(doc=2541,freq=4.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.27358043 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 14. 8.2004 17:22:56
Source: Online. 28(2004) no.3, S.22-29

Hammwöhner, R.: TransRouter revisited : Decision support in the routing of translation projects (2000) 0.01

0.01201609 = product of:
  0.02403218 = sum of:
    0.02403218 = product of:
      0.04806436 = sum of:
        0.04806436 = weight(_text_:22 in 5483) [ClassicSimilarity], result of:
          0.04806436 = score(doc=5483,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.2708308 = fieldWeight in 5483, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5483)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 10.12.2000 18:22:35

Schneider, J.W.; Borlund, P.: ¬A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01

0.01201609 = product of:
  0.02403218 = sum of:
    0.02403218 = product of:
      0.04806436 = sum of:
        0.04806436 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
          0.04806436 = score(doc=156,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.2708308 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 8. 3.2007 19:55:22

Paolillo, J.C.: Linguistics and the information sciences (2009) 0.01

0.01201609 = product of:
  0.02403218 = sum of:
    0.02403218 = product of:
      0.04806436 = sum of:
        0.04806436 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
          0.04806436 = score(doc=3840,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.2708308 = fieldWeight in 3840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 27. 8.2011 14:22:33

Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.01

0.01201609 = product of:
  0.02403218 = sum of:
    0.02403218 = product of:
      0.04806436 = sum of:
        0.04806436 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
          0.04806436 = score(doc=4184,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.2708308 = fieldWeight in 4184, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4184)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 1.2011 10:38:28

Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01

0.010299506 = product of:
  0.020599011 = sum of:
    0.020599011 = product of:
      0.041198023 = sum of:
        0.041198023 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
          0.041198023 = score(doc=4436,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.23214069 = fieldWeight in 4436, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 16. 2.2000 14:22:39

Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01

0.010299506 = product of:
  0.020599011 = sum of:
    0.020599011 = product of:
      0.041198023 = sum of:
        0.041198023 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.041198023 = score(doc=1746,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 3.2015 9:17:30

Sienel, J.; Weiss, M.; Laube, M.: Sprachtechnologien für die Informationsgesellschaft des 21. Jahrhunderts (2000) 0.01

0.008582922 = product of:
  0.017165843 = sum of:
    0.017165843 = product of:
      0.034331687 = sum of:
        0.034331687 = weight(_text_:22 in 5557) [ClassicSimilarity], result of:
          0.034331687 = score(doc=5557,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.19345059 = fieldWeight in 5557, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5557)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 26.12.2000 13:22:17

Pinker, S.: Wörter und Regeln : Die Natur der Sprache (2000) 0.01

0.008582922 = product of:
  0.017165843 = sum of:
    0.017165843 = product of:
      0.034331687 = sum of:
        0.034331687 = weight(_text_:22 in 734) [ClassicSimilarity], result of:
          0.034331687 = score(doc=734,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.19345059 = fieldWeight in 734, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=734)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 19. 7.2002 14:22:31

Computational linguistics for the new millennium : divergence or synergy? Proceedings of the International Symposium held at the Ruprecht-Karls Universität Heidelberg, 21-22 July 2000. Festschrift in honour of Peter Hellwig on the occasion of his 60th birthday (2002) 0.01

0.008582922 = product of:
  0.017165843 = sum of:
    0.017165843 = product of:
      0.034331687 = sum of:
        0.034331687 = weight(_text_:22 in 4900) [ClassicSimilarity], result of:
          0.034331687 = score(doc=4900,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.19345059 = fieldWeight in 4900, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4900)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Schürmann, H.: Software scannt Radio- und Fernsehsendungen : Recherche in Nachrichtenarchiven erleichtert (2001) 0.01

0.006008045 = product of:
  0.01201609 = sum of:
    0.01201609 = product of:
      0.02403218 = sum of:
        0.02403218 = weight(_text_:22 in 5759) [ClassicSimilarity], result of:
          0.02403218 = score(doc=5759,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.1354154 = fieldWeight in 5759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=5759)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Handelsblatt. Nr.79 vom 24.4.2001, S.22

Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.01
```
0.006008045 = product of:
  0.01201609 = sum of:
    0.01201609 = product of:
      0.02403218 = sum of:
        0.02403218 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
          0.02403218 = score(doc=1616,freq=2.0), product of:
            0.17747006 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050679237 = queryNorm
            0.1354154 = fieldWeight in 1616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs of the near future. Digital library research has in the past focused on structural and semantic interoperability. Searching and retrieving objects across variations in protocols, formats and disciplines has been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.).
However, research on crossing language boundaries, especially between European languages and Oriental languages, is still at an initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabulary. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in the language that is not the same as that of the input term. The direct translation of the input term can also be retrieved in most cases.
