Search (141 results, page 2 of 8)

Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.01
```
0.0056180023 = product of:
  0.028090011 = sum of:
    0.023567477 = weight(_text_:web in 1616) [ClassicSimilarity], result of:
      0.023567477 = score(doc=1616,freq=8.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.25239927 = fieldWeight in 1616, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1616)
    0.004522534 = product of:
      0.013567602 = sum of:
        0.013567602 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
          0.013567602 = score(doc=1616,freq=2.0), product of:
            0.10019246 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.028611459 = queryNorm
            0.1354154 = fieldWeight in 1616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
      0.33333334 = coord(1/3)
  0.2 = coord(2/10)
```
Abstract

The amount of information available on the World Wide Web in languages other than English is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that over the next five years the number of Internet users will grow by only 60% among English speakers versus 150% among non-English speakers. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China increased from 8.9 million to 16.9 million between January and June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy needs in the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats, and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49).
However, research on crossing language boundaries, especially between European and Oriental languages, is still at an initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabulary. For searching across language boundaries, a cross-lingual thesaurus generated by co-occurrence analysis and a Hopfield network can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus with the Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in the language that is not the same as that of the input term. The direct translation of the input term can also be retrieved in most cases.

Footnote

Part of a special issue: "Web retrieval and mining: A machine learning perspective"
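
The score breakdown for the entry above can be reproduced by hand. The following is a minimal sketch, assuming the tf and idf formulas of Lucene's ClassicSimilarity that are consistent with the numbers shown (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative, not part of any Lucene API, and the constants are copied from the explain tree of doc 1616.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency, in the form that matches the values shown.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Constants copied from the explain output for doc 1616.
MAX_DOCS = 44218
QUERY_NORM = 0.028611459
FIELD_NORM = 0.02734375

web = term_score(8.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "22" clause sits in a nested query with coord(1/3);
# the outer query applies coord(2/10) to the sum.
total = (web + term_22 * (1 / 3)) * (2 / 10)
print(f"{total:.7f}")  # close to the 0.0056180023 shown above, up to float rounding
```

The same arithmetic applies to the other entries on this page; only freq, docFreq, fieldNorm, and the coord fractions change.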

Zhang, C.; Zeng, D.; Li, J.; Wang, F.-Y.; Zuo, W.: Sentiment analysis of Chinese documents : from sentence to document level (2009) 0.01

0.005604797 = product of:
  0.028023984 = sum of:
    0.020200694 = weight(_text_:web in 3296) [ClassicSimilarity], result of:
      0.020200694 = score(doc=3296,freq=2.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.21634221 = fieldWeight in 3296, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3296)
    0.007823291 = product of:
      0.023469873 = sum of:
        0.023469873 = weight(_text_:29 in 3296) [ClassicSimilarity], result of:
          0.023469873 = score(doc=3296,freq=2.0), product of:
            0.10064617 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.028611459 = queryNorm
            0.23319192 = fieldWeight in 3296, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=3296)
      0.33333334 = coord(1/3)
  0.2 = coord(2/10)

Abstract: User-generated content on the Web has become an extremely valuable source for mining and analyzing user opinions on any topic. Recent years have seen an increasing body of work investigating methods to recognize favorable and unfavorable sentiments toward specific subjects from online text. However, most of these efforts focus on English, and there have been very few studies on sentiment analysis of Chinese content. This paper aims to address the unique challenges posed by Chinese sentiment analysis. We propose a rule-based approach comprising two phases: (1) determining each sentence's sentiment based on word dependency, and (2) aggregating sentences to predict the document sentiment. We report the results of an experimental study comparing our approach with three machine learning-based approaches using two sets of Chinese articles. These results illustrate the effectiveness of our proposed method and its advantages over learning-based approaches.
Date: 2. 2.2010 19:29:56

Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.01
```
0.0051931893 = product of:
  0.05193189 = sum of:
    0.05193189 = weight(_text_:log in 5188) [ClassicSimilarity], result of:
      0.05193189 = score(doc=5188,freq=2.0), product of:
        0.18335998 = queryWeight, product of:
          6.4086204 = idf(docFreq=197, maxDocs=44218)
          0.028611459 = queryNorm
        0.2832237 = fieldWeight in 5188, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.4086204 = idf(docFreq=197, maxDocs=44218)
          0.03125 = fieldNorm(doc=5188)
  0.1 = coord(1/10)
```
Abstract

Kim and Wilbur present three techniques for the algorithmic identification in text of content-bearing terms and phrases intended for human use as entry points or hyperlinks. Using a set of 1,075 terms from MEDLINE rated on a scale from zero (stop word) to four (definite content word), they evaluate the ranked lists produced by their three methods according to how they place content words in the top ranks. The data consist of the natural language elements of 304,057 MEDLINE records from 1996 and 173,252 Wall Street Journal records from the TIPSTER collection. Phrases are extracted by breaking at punctuation marks and stop words, and normalized by lowercasing, replacing non-alphanumerics with spaces, and reducing multiple spaces. In the "strength of context" approach, each document is a vector of binary values for each word or word pair. The words or word pairs are removed from all documents, and the Robertson-Sparck Jones relevance weight is computed for each term; negative weights are replaced with zero, those below a randomness threshold are ignored, and the remainder are summed for each document to yield a document score. The term is then assigned the average score of the documents in which it occurred. The average of these word scores is assigned to the original phrase. The "frequency clumping" approach defines a random phrase as one whose distribution among documents is Poisson in character. A p-value, the probability that a phrase's frequency of occurrence would be equal to or less than Poisson expectations, is computed, and the score assigned is the negative log of that value. In the "database comparison" approach, if a phrase occurring in a document allows prediction that the document is in MEDLINE rather than in the Wall Street Journal, it is considered to be content bearing for MEDLINE. The score is computed by dividing the number of occurrences of the term in MEDLINE by occurrences in the Journal, and taking the product of all these values.
The one hundred top- and bottom-ranked phrases that occurred in at least 500 documents were collected for each method. The union set had 476 phrases. A second selection was made of two-word phrases each occurring in only three documents, with a union of 599 phrases. A judge then rated the two sets of terms for subject specificity on a 0 to 4 scale. Precision was the average subject specificity of the first r ranks, recall was the fraction of the subject-specific phrases in the first r ranks, and eleven-point average precision was used as a summary measure. The three methods all move content-bearing terms forward in the lists, as does the use of the sum of the logs of the three methods.
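
The "frequency clumping" idea described in the abstract above can be illustrated with a small sketch. This is one possible reading, not the authors' implementation: it assumes that occurrences of a random phrase fall on documents as a Poisson process, so the number of documents containing the phrase is binomial, and it takes the p-value to be the probability of seeing no more documents than observed. The function names and the example counts are hypothetical.

```python
import math
import sys

def binom_log_pmf(k, n, p):
    # Log of the binomial probability mass, computed with lgamma for stability.
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def clumping_score(total_occurrences, docs_with_phrase, num_docs):
    # Assumed model: occurrences are Poisson with rate total/num_docs per
    # document, so a document contains the phrase with probability 1 - exp(-rate).
    rate = total_occurrences / num_docs
    p_doc = 1.0 - math.exp(-rate)
    # p-value: probability of the phrase appearing in this few documents or fewer.
    p_value = sum(math.exp(binom_log_pmf(k, num_docs, p_doc))
                  for k in range(docs_with_phrase + 1))
    # Clamp to a valid range so the logarithm is always defined.
    p_value = min(1.0, max(p_value, sys.float_info.min))
    return -math.log(p_value)

# Hypothetical counts: 50 occurrences packed into 5 documents (clumped)
# versus 50 occurrences spread over 49 documents (near random).
clumped = clumping_score(50, 5, 1000)
spread = clumping_score(50, 49, 1000)
print(clumped > spread)
```

Under this reading, a phrase whose occurrences are concentrated in few documents receives a small p-value and therefore a high score, which is the behavior the abstract attributes to content-bearing phrases.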
Rahmstorf, G.: Rückkehr von Ordnung in die Informationstechnik? (2000) 0.01
```
0.005011191 = product of:
  0.05011191 = sum of:
    0.05011191 = weight(_text_:kommunikation in 5504) [ClassicSimilarity], result of:
      0.05011191 = score(doc=5504,freq=2.0), product of:
        0.14706601 = queryWeight, product of:
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.028611459 = queryNorm
        0.34074432 = fieldWeight in 5504, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.046875 = fieldNorm(doc=5504)
  0.1 = coord(1/10)
```
Abstract

With current information technology, worldwide communication, and electronic publishing, the traditional instruments for establishing order (library classification systems and thesauri) appear to be pushed to the margins or even to disappear entirely. On the other hand, end users are often dissatisfied with the results of searching the unforeseeably growing supply of information. Is a precise and complete search still achievable at all under the given technical and economic conditions?
Schmitz, K.-D.: Projektforschung und Infrastrukturen im Bereich der Terminologie : Wie kann die Wirtschaft davon profitieren? (2000) 0.01
```
0.005011191 = product of:
  0.05011191 = sum of:
    0.05011191 = weight(_text_:kommunikation in 5568) [ClassicSimilarity], result of:
      0.05011191 = score(doc=5568,freq=2.0), product of:
        0.14706601 = queryWeight, product of:
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.028611459 = queryNorm
        0.34074432 = fieldWeight in 5568, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.046875 = fieldNorm(doc=5568)
  0.1 = coord(1/10)
```
Abstract

In today's information society, industry has new prospects for communication and trade on the European and international markets; both markets are characterized by great linguistic, cultural, and social diversity. To take advantage of these new opportunities and remain competitive, industry must find specific and adequate solutions for overcoming language barriers. The prerequisite for this is the precise definition, systematic ordering, and exact naming of concepts within the respective subject fields, in one's own language as well as in foreign languages. These are exactly the topics with which terminology science and practical terminology work are concerned. The results of terminology work in a company influence design, production, purchasing, marketing and sales, contracts, technical documentation, and translation.

Mengel, T.: Wie viel Terminologiearbeit steckt in der Übersetzung der Dewey-Dezimalklassifikation? (2019) 0.01

0.005011191 = product of:
  0.05011191 = sum of:
    0.05011191 = weight(_text_:kommunikation in 5603) [ClassicSimilarity], result of:
      0.05011191 = score(doc=5603,freq=2.0), product of:
        0.14706601 = queryWeight, product of:
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.028611459 = queryNorm
        0.34074432 = fieldWeight in 5603, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.046875 = fieldNorm(doc=5603)
  0.1 = coord(1/10)

Series: Kommunikation und Medienmanagement - Springer eBooks. Computer Science and Engineering

Susen, A.: Spracherkennung : Aktuelle Einsatzmöglichkeiten im Bereich der Telekommunikation (2000) 0.00
```
0.004175992 = product of:
  0.04175992 = sum of:
    0.04175992 = weight(_text_:kommunikation in 5555) [ClassicSimilarity], result of:
      0.04175992 = score(doc=5555,freq=2.0), product of:
        0.14706601 = queryWeight, product of:
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.028611459 = queryNorm
        0.28395358 = fieldWeight in 5555, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5555)
  0.1 = coord(1/10)
```
Abstract

The topic of speech processing, and speech recognition in particular, has stirred up researchers and developers for many years. At the very beginning of the computer age it was predicted that computers would soon understand human language. The performance of the brain in recognition was underestimated, however. Only through the multiplication of storage capacity and computing speed were serious methods developed about 20 years ago that made a minimal command of language possible. Development has since progressed to the point where we can talk about products that have already been introduced to the market. Probably the best-known examples of the use of speech recognition, besides dictation systems, are the so-called telephone voice computers in companies, which connect the caller to the desired department without the use of a classic switchboard. Speech recognition is of particular importance for telecommunications, since the field of communication is subject to the greatest changes. The amount of information available for retrieval every day has already become so vast that intelligent organization is required to handle it sensibly. Usable content can only be filtered out and processed further with new tools and additional aids. Various well-known variations can bring only short-term success here, e.g. increased reachability through mobile telephony. A closer look at the possible uses of speech recognition in telecommunications first requires a more precise definition of the user group. A first subdivision results from use in the private or the business sphere.
Erbach, G.: Sprachdialogsysteme für Telefondienste : Stand der Technik und zukünftige Entwicklungen (2000) 0.00
```
0.004175992 = product of:
  0.04175992 = sum of:
    0.04175992 = weight(_text_:kommunikation in 5556) [ClassicSimilarity], result of:
      0.04175992 = score(doc=5556,freq=2.0), product of:
        0.14706601 = queryWeight, product of:
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.028611459 = queryNorm
        0.28395358 = fieldWeight in 5556, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5556)
  0.1 = coord(1/10)
```
Abstract

Despite the unchecked growth of the Internet, the telephone will remain one of the most important media for communication between companies and their customers. The importance of spoken language is further reinforced by the rapid spread of mobile phones. Almost all large companies operate or commission call centers to serve their customers by telephone. Call centers are often equipped with so-called IVR (Interactive Voice Response) systems, which offer the user a restricted menu selection via the telephone keys or rudimentary voice input. However, this kind of input is perceived as tiresome when there are more than five options. Here there is great potential for automatic speech recognition and spoken dialogue systems. This article presents the technical foundations as well as the current possibilities and limits of automatic speech recognition technology. We report on experience with a system for postal rate information by telephone, which was implemented and tested at the Forschungszentrum Telekommunikation Wien (FTW) in cooperation with Philips Speech Processing and Österreichische Post AG. The state of the art in speech output and speaker recognition is briefly presented. Finally, an outlook is given on the role of spoken dialogues in future mobile multimedia applications.
Granitzer, M.: Statistische Verfahren der Textanalyse (2006) 0.00
```
0.004082007 = product of:
  0.04082007 = sum of:
    0.04082007 = weight(_text_:web in 5809) [ClassicSimilarity], result of:
      0.04082007 = score(doc=5809,freq=6.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.43716836 = fieldWeight in 5809, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5809)
  0.1 = coord(1/10)
```
Abstract

This article provides an overview of statistical methods of text analysis in the context of the Semantic Web. As an introduction, methods and common techniques for preprocessing texts, such as stemming or part-of-speech tagging, are discussed. The forms of representation introduced in this way serve as the basis for statistical feature analyses and for more advanced techniques such as information extraction and machine learning methods. These specific techniques are presented in overview, with the most important aspects relating to the Semantic Web covered in detail. The application of the techniques presented to the creation and maintenance of ontologies, together with pointers to further literature, concludes the article.

Source

Semantic Web: Wege zur vernetzten Wissensgesellschaft. Ed. by T. Pellegrini and A. Blumauer

Theme

Semantic Web
Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.00
```
0.0037641774 = product of:
  0.037641775 = sum of:
    0.037641775 = weight(_text_:web in 604) [ClassicSimilarity], result of:
      0.037641775 = score(doc=604,freq=10.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.40312994 = fieldWeight in 604, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=604)
  0.1 = coord(1/10)
```
Abstract

Modern information retrieval systems use keywords within documents as indexing terms to search for relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm that mines Web data with the help of search engines. In addition, the Romanized pinyin of the Chinese language indicates boundaries of words in the text. Our algorithm is the first to utilize Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.

Footnote

Contribution to a special topic section "Mining Web resources for enhancing information retrieval"

Melby, A.: Some notes on 'The proper place of men and machines in language translation' (1997) 0.00

0.0036344484 = product of:
  0.036344483 = sum of:
    0.036344483 = product of:
      0.054516725 = sum of:
        0.027381519 = weight(_text_:29 in 330) [ClassicSimilarity], result of:
          0.027381519 = score(doc=330,freq=2.0), product of:
            0.10064617 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.028611459 = queryNorm
            0.27205724 = fieldWeight in 330, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=330)
        0.027135205 = weight(_text_:22 in 330) [ClassicSimilarity], result of:
          0.027135205 = score(doc=330,freq=2.0), product of:
            0.10019246 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.028611459 = queryNorm
            0.2708308 = fieldWeight in 330, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=330)
      0.6666667 = coord(2/3)
  0.1 = coord(1/10)

Date: 31. 7.1996 9:22:19
Source: Machine translation. 12(1997) nos.1/2, p.29-34

Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.00
```
0.0034988632 = product of:
  0.03498863 = sum of:
    0.03498863 = weight(_text_:web in 3455) [ClassicSimilarity], result of:
      0.03498863 = score(doc=3455,freq=6.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.37471575 = fieldWeight in 3455, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3455)
  0.1 = coord(1/10)
```
Abstract

Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
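
The total reciprocal document rank mentioned in the abstract above can be sketched as follows, assuming the usual reading of the measure: the sum of the reciprocals of the ranks at which correct documents appear in a result list. The function name and the example relevance flags are hypothetical, and the abstract does not state how the reported value was averaged over questions.

```python
def total_reciprocal_document_rank(is_correct):
    # is_correct[i] is True when the document at rank i + 1 contains the answer.
    return sum(1.0 / rank
               for rank, ok in enumerate(is_correct, start=1)
               if ok)

# Hypothetical ranked list: correct documents at ranks 2 and 4.
example = total_reciprocal_document_rank([False, True, False, True])
print(example)  # 1/2 + 1/4 = 0.75
```

Unlike a plain reciprocal rank, this measure rewards every correct document in the list, with earlier ranks contributing more.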
Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 0.00
```
0.0034988632 = product of:
  0.03498863 = sum of:
    0.03498863 = weight(_text_:web in 5896) [ClassicSimilarity], result of:
      0.03498863 = score(doc=5896,freq=6.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.37471575 = fieldWeight in 5896, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5896)
  0.1 = coord(1/10)
```
Abstract

Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
Jensen, N.: Evaluierung von mehrsprachigem Web-Retrieval : Experimente mit dem EuroGOV-Korpus im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.00
```
0.0034988632 = product of:
  0.03498863 = sum of:
    0.03498863 = weight(_text_:web in 5964) [ClassicSimilarity], result of:
      0.03498863 = score(doc=5964,freq=6.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.37471575 = fieldWeight in 5964, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5964)
  0.1 = coord(1/10)
```
Abstract

This article describes the experiments of the University of Hildesheim in the first Web Track of the CLEF initiative (WebCLEF) in 2005. Participation provided experience with a multilingual Web corpus (EuroGOV) in preprocessing, topic and query development, language-independent indexing methods, and multilingual retrieval strategies. Because of the large size of the corpus and the time constraints, multilingual indexes were built. The article describes the approach taken in the University of Hildesheim's participation and the results of the officially submitted experiments as well as additional ones. For the Multilingual Task, the best result in CLEF was achieved.
Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.00
```
0.0034988632 = product of:
  0.03498863 = sum of:
    0.03498863 = weight(_text_:web in 2342) [ClassicSimilarity], result of:
      0.03498863 = score(doc=2342,freq=6.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.37471575 = fieldWeight in 2342, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2342)
  0.1 = coord(1/10)
```
Abstract

Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system. For English-German, machine translation was also used. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, the results of query translation were better than in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.

Dreehsen, B.: Der PC als Dolmetscher (1998) 0.00

0.0033667826 = product of:
  0.033667825 = sum of:
    0.033667825 = weight(_text_:web in 1474) [ClassicSimilarity], result of:
      0.033667825 = score(doc=1474,freq=2.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.36057037 = fieldWeight in 1474, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.078125 = fieldNorm(doc=1474)
  0.1 = coord(1/10)

Abstract: Translation software that transfers text into German and back at the click of a mouse is helpful for English Web pages and foreign-language correspondence. The new versions already reproduce the content well in substance. CHIP tested the performance of 5 programs.

Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.00
```
0.0033667826 = product of:
  0.033667825 = sum of:
    0.033667825 = weight(_text_:web in 4215) [ClassicSimilarity], result of:
      0.033667825 = score(doc=4215,freq=8.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.36057037 = fieldWeight in 4215, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4215)
  0.1 = coord(1/10)
```
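The score explanation above can be recomputed by hand. The following is a minimal sketch, assuming the standard formulas of Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function name `classic_term_score` is hypothetical and not part of any library. Lucene works with 32-bit floats, so the last digits may differ slightly from the listing.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, coord=1.0):
    """Recompute a single-term score as laid out in a ClassicSimilarity explain tree.

    Hypothetical helper for illustration; all inputs are read off the listing.
    """
    tf = math.sqrt(freq)                              # tf(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    return query_weight * field_weight * coord        # term score times coord


# Values taken from the explanation for doc 4215 (term "web").
score_4215 = classic_term_score(
    freq=8.0,
    doc_freq=4597,
    max_docs=44218,
    query_norm=0.028611459,
    field_norm=0.0390625,
    coord=1 / 10,
)
print(f"{score_4215:.10f}")
```

The same function applied to the values for doc 1474 (freq=2.0, fieldNorm=0.078125) gives the same total: the higher field norm of the shorter record offsets its lower term frequency.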
Abstract

For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalents in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of search query terms that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval.

Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.00
```
0.0033667826 = product of:
  0.033667825 = sum of:
    0.033667825 = weight(_text_:web in 2861) [ClassicSimilarity], result of:
      0.033667825 = score(doc=2861,freq=8.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.36057037 = fieldWeight in 2861, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2861)
  0.1 = coord(1/10)
```
Abstract

Today's conventional search engines hardly provide the essential content relevant to the user's search query. This is because the context and semantics of the user's request are not analyzed to the full extent. Hence the need for semantic web search (SWS) arises. SWS is an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses ontology as a knowledge base for the information retrieval process. It is not a mere keyword search. It is one layer above what Google or any other search engine retrieves by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves the web results more relevant to the user query through keyword expansion. The results obtained here will be accurate enough to satisfy the request made by the user. The level of accuracy will be enhanced since the query is analyzed semantically. The system will be of great use to the developers and researchers who work on the web. The Google results are re-ranked and optimized for providing the relevant links. For ranking, an algorithm has been applied which fetches more apt results for the user query.

Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.00
```
0.0033667826 = product of:
  0.033667825 = sum of:
    0.033667825 = weight(_text_:web in 3488) [ClassicSimilarity], result of:
      0.033667825 = score(doc=3488,freq=8.0), product of:
        0.0933738 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.028611459 = queryNorm
        0.36057037 = fieldWeight in 3488, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3488)
  0.1 = coord(1/10)
```
Abstract

There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is represented by the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in this field. Therefore, a more comfortable way of formulating user queries would be of great value. One direction could be to allow the user to use natural language. To make this task easier we have proposed a method for translating a natural language query into a SPARQL query. It is based on sentence structure, utilizing dependencies between the words in user queries. Dependencies are used to map the query to the semantic web structure, which is in the next step translated into a SPARQL query. According to our first experiments we are able to answer a significant group of user queries.

Series

Information Systems and Applications, incl. Internet/Web, and HCI; 10151

Weßels, D.: ChatGPT - ein Meilenstein der KI-Entwicklung (2022) 0.00
```
0.0033407938 = product of:
  0.033407938 = sum of:
    0.033407938 = weight(_text_:kommunikation in 929) [ClassicSimilarity], result of:
      0.033407938 = score(doc=929,freq=2.0), product of:
        0.14706601 = queryWeight, product of:
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.028611459 = queryNorm
        0.22716287 = fieldWeight in 929, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.140109 = idf(docFreq=703, maxDocs=44218)
          0.03125 = fieldNorm(doc=929)
  0.1 = coord(1/10)
```
Content

"Since November 30, 2022, my world, and that of many education experts, has felt like a different one, leading us into a "new era" that we do not yet know whether to love or fear. ChatGPT, an offshoot and prototype of GPT-3 from OpenAI, currently the leading generative AI language model (at least in the Western world), was released on November 30 and has since been freely accessible to everyone at no cost. What at first seemed an unspectacular announcement by OpenAI, namely that the AI language model GPT-3, available since 2020, would now be offered in a slightly modified version (GPT-3.5) as a chat variant for real-time communication, turns out in use, from the users' point of view, to be a milestone in AI development. The fact is that the range and strength of ChatGPT's capabilities have surprised even IT experts and prompted a wealth of superlatives in their assessments, though always combined with warnings about the lack of factual accuracy and reliability of such generative AI models. With WebGPT, however, OpenAI already has a research prototype that, with an integrated internet search function, could eliminate the "hallucinations" of current GPT variants. For the education sector, the question is how teaching and learning at universities (and not only there) will change when such AI tools are omnipresent and can be used to produce not only a term paper "at the push of a button". Also impressive is ChatGPT's breadth of subject knowledge; see the tweet by @davidtsong, who had ChatGPT take the SAT college admission test."
