Search (85 results, page 1 of 5)

  • language_ss:"e"
  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    0.105155796 = product of:
      0.1577337 = sum of:
        0.08062613 = product of:
          0.24187839 = sum of:
            0.24187839 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.24187839 = score(doc=562,freq=2.0), product of:
                0.43037477 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050763648 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.07710756 = sum of:
          0.035840917 = weight(_text_:web in 562) [ClassicSimilarity], result of:
            0.035840917 = score(doc=562,freq=2.0), product of:
              0.1656677 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.050763648 = queryNorm
              0.21634221 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
          0.041266643 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.041266643 = score(doc=562,freq=2.0), product of:
              0.17776565 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050763648 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
      0.6666667 = coord(2/3)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
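The indented score breakdowns attached to each result are Lucene ClassicSimilarity explain trees. As a rough reading aid, the sketch below recombines the factors of the first leaf above (tf = sqrt(termFreq), the given idf, queryNorm and fieldNorm) and then applies the printed coord factors; the function name is illustrative and not part of Lucene's API.

```python
from math import sqrt

def classic_term_score(freq, idf, query_norm, field_norm):
    """Recombine the factors of one ClassicSimilarity explain leaf:
    queryWeight = idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm,
    leaf score = queryWeight * fieldWeight."""
    query_weight = idf * query_norm
    field_weight = sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the "weight(_text_:3a in 562)" leaf above.
leaf = classic_term_score(freq=2.0, idf=8.478011,
                          query_norm=0.050763648, field_norm=0.046875)
print(leaf)   # ~0.2419, the 0.24187839 shown in the explain tree

# The document total then applies the coord factors printed in the tree:
# (leaf * coord(1/3) + 0.07710756) * coord(2/3)
total = (leaf * (1 / 3) + 0.07710756) * (2 / 3)
print(total)  # ~0.105, matching the 0.105155796 top-level score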
  2. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.07
    0.06961599 = product of:
      0.104423985 = sum of:
        0.03853737 = weight(_text_:wide in 1616) [ClassicSimilarity], result of:
          0.03853737 = score(doc=1616,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.171337 = fieldWeight in 1616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
        0.06588662 = sum of:
          0.041814405 = weight(_text_:web in 1616) [ClassicSimilarity], result of:
            0.041814405 = score(doc=1616,freq=8.0), product of:
              0.1656677 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.050763648 = queryNorm
              0.25239927 = fieldWeight in 1616, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.02734375 = fieldNorm(doc=1616)
          0.024072208 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
            0.024072208 = score(doc=1616,freq=2.0), product of:
              0.17776565 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050763648 = queryNorm
              0.1354154 = fieldWeight in 1616, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=1616)
      0.6666667 = coord(2/3)
    
    Abstract
    The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million between January and June 2000 ("Report: China Internet users double to 17 million," CNN.com, July 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second-largest at-home Internet population in the world in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy these needs in the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49). However, research in crossing language boundaries, especially between European and Oriental languages, is still in its initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabulary. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus with the Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language other than that of the input term. The direct translation of the input term can also be retrieved in most cases.
    Footnote
    Part of a special issue: "Web retrieval and mining: A machine learning perspective"
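The abstract above builds a cross-lingual thesaurus from an aligned English/Chinese corpus using co-occurrence analysis and a Hopfield network. The following is a minimal sketch of the co-occurrence step only, under the assumption of document-level alignment; the toy corpus, the normalisation, and the function names are illustrative and omit the Hopfield spreading-activation stage entirely.

```python
from collections import Counter
from itertools import product

def cooccurrence_weights(parallel_docs):
    """Count how often an English term and a Chinese term appear in aligned
    document pairs, then normalise by the English term's document frequency."""
    en_freq, pair_freq = Counter(), Counter()
    for en_terms, zh_terms in parallel_docs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_freq.update(en_set)
        pair_freq.update(product(en_set, zh_set))
    return {(e, z): c / en_freq[e] for (e, z), c in pair_freq.items()}

def suggest(term, weights, k=3):
    """Return the k Chinese terms most strongly associated with an English term."""
    cands = [(z, w) for (e, z), w in weights.items() if e == term]
    return sorted(cands, key=lambda x: -x[1])[:k]

# Toy aligned corpus (hypothetical): each pair is (English terms, Chinese terms).
corpus = [
    (["court", "judge"], ["法院", "法官"]),
    (["court", "appeal"], ["法院", "上訴"]),
    (["judge", "ruling"], ["法官", "裁決"]),
]
weights = cooccurrence_weights(corpus)
print(suggest("court", weights))   # e.g. [('法院', 1.0), ('法官', 0.5), ('上訴', 0.5)]
```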
  3. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 0.06
    0.06473547 = product of:
      0.0971032 = sum of:
        0.06606405 = weight(_text_:wide in 5896) [ClassicSimilarity], result of:
          0.06606405 = score(doc=5896,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 5896, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5896)
        0.031039147 = product of:
          0.062078293 = sum of:
            0.062078293 = weight(_text_:web in 5896) [ClassicSimilarity], result of:
              0.062078293 = score(doc=5896,freq=6.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.37471575 = fieldWeight in 5896, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5896)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
  4. Chowdhury, G.G.: Natural language processing (2002) 0.06
    0.055989675 = product of:
      0.08398451 = sum of:
        0.06606405 = weight(_text_:wide in 4284) [ClassicSimilarity], result of:
          0.06606405 = score(doc=4284,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 4284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
        0.017920459 = product of:
          0.035840917 = sum of:
            0.035840917 = weight(_text_:web in 4284) [ClassicSimilarity], result of:
              0.035840917 = score(doc=4284,freq=2.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.21634221 = fieldWeight in 4284, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4284)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. NLP researchers aim to gather knowledge on how human beings understand and use language so that appropriate tools and techniques can be developed to make computer systems understand and manipulate natural languages to perform desired tasks. The foundations of NLP lie in a number of disciplines, namely, computer and information sciences, linguistics, mathematics, electrical and electronic engineering, artificial intelligence and robotics, and psychology. Applications of NLP include a number of fields of study, such as machine translation, natural language text processing and summarization, user interfaces, multilingual and cross-language information retrieval (CLIR), speech recognition, artificial intelligence, and expert systems. One important application area that is relatively new and has not been covered in previous ARIST chapters on NLP relates to the proliferation of the World Wide Web and digital libraries.
  5. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.05
    0.046658065 = product of:
      0.069987096 = sum of:
        0.055053383 = weight(_text_:wide in 1338) [ClassicSimilarity], result of:
          0.055053383 = score(doc=1338,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.24476713 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.014933716 = product of:
          0.029867431 = sum of:
            0.029867431 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
              0.029867431 = score(doc=1338,freq=2.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.18028519 = fieldWeight in 1338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1338)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
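The abstract above distinguishes syntagmatic associations (terms that co-occur in context) from paradigmatic associations (terms that occur in similar contexts and are therefore substitutable). The sketch below computes toy versions of both from window-based co-occurrence counts; the corpus, window size and weighting are illustrative assumptions, not the authors' formal model of word meaning.

```python
from collections import Counter, defaultdict
from math import sqrt

def build_cooccurrence(sentences, window=2):
    """For every term, count the other terms that appear within a small window
    around it (its syntagmatic neighbours); occurrences of the term itself are skipped."""
    cooc = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for n in sent[max(0, i - window): i + window + 1]:
                if n != w:
                    cooc[w][n] += 1
    return cooc

def cosine(v1, v2):
    """Cosine similarity of two context-count vectors; terms with similar
    neighbours (paradigmatic associates) score highly."""
    num = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    den = sqrt(sum(c * c for c in v1.values())) * sqrt(sum(c * c for c in v2.values()))
    return num / den if den else 0.0

# Toy corpus (hypothetical); real systems estimate this from large collections.
corpus = [
    "the doctor treats the patient".split(),
    "the physician treats the illness".split(),
    "the patient visits the doctor".split(),
]
cooc = build_cooccurrence(corpus)
print(cooc["doctor"].most_common(2))                        # syntagmatic neighbours of 'doctor'
print(round(cosine(cooc["doctor"], cooc["physician"]), 2))  # paradigmatic similarity
```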
  6. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.04
    0.037649494 = product of:
      0.11294848 = sum of:
        0.11294848 = sum of:
          0.071681835 = weight(_text_:web in 563) [ClassicSimilarity], result of:
            0.071681835 = score(doc=563,freq=8.0), product of:
              0.1656677 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.050763648 = queryNorm
              0.43268442 = fieldWeight in 563, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
          0.041266643 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
            0.041266643 = score(doc=563,freq=2.0), product of:
              0.17776565 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050763648 = queryNorm
              0.23214069 = fieldWeight in 563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
      0.33333334 = coord(1/3)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
    Date
    10. 1.2013 19:22:47
  7. Yang, C.C.; Li, K.W.: Automatic construction of English/Chinese parallel corpora (2003) 0.04
    0.03732645 = product of:
      0.055989675 = sum of:
        0.044042703 = weight(_text_:wide in 1683) [ClassicSimilarity], result of:
          0.044042703 = score(doc=1683,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.1958137 = fieldWeight in 1683, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.03125 = fieldNorm(doc=1683)
        0.011946972 = product of:
          0.023893945 = sum of:
            0.023893945 = weight(_text_:web in 1683) [ClassicSimilarity], result of:
              0.023893945 = score(doc=1683,freq=2.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.14422815 = fieldWeight in 1683, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1683)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    As the demand for global information increases significantly, multilingual corpora have become a valuable linguistic resource for applications to cross-lingual information retrieval and natural language processing. In order to cross the boundaries that exist between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive in both genre and domain. It is also impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesauri for large applications. Corpus-based approaches, which do not have the limitation of dictionaries, provide a statistical translation model with which to cross the language boundary. There are many domain-specific parallel or comparable corpora that are employed in machine translation and cross-lingual information retrieval. Most of these are corpora between Indo-European languages, such as English/French and English/Spanish. The Asian/Indo-European corpus, especially the English/Chinese corpus, is relatively sparse. The objective of the present research is to construct an English/Chinese parallel corpus automatically from the World Wide Web. In this paper, an alignment method is presented which is based on dynamic programming to identify the one-to-one Chinese and English title pairs. The method includes alignment at title level, word level and character level. The longest common subsequence (LCS) is applied to find the most reliable Chinese translation of an English word. As one word in a language may translate into two or more words repetitively in another language, the edit operation, deletion, is used to resolve redundancy. A score function is then proposed to determine the optimal title pairs. Experiments have been conducted to investigate the performance of the proposed method using the daily press release articles by the Hong Kong SAR government as the test bed. The precision of the result is 0.998 while the recall is 0.806. The release articles and speech articles, published by Hongkong & Shanghai Banking Corporation Limited, are also used to test our method; the precision is 1.00 and the recall is 0.948.
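The abstract above applies the longest common subsequence (LCS) within a dynamic-programming alignment to select reliable English/Chinese pairs. Below is a minimal sketch of the LCS building block with a hypothetical, monolingual candidate-ranking example; the normalised score and the sample strings are illustrative assumptions, not the authors' score function.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences,
    computed with the standard dynamic-programming table."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_similarity(a, b):
    """Normalised LCS score in [0, 1]; higher means a more plausible pair."""
    return 2 * lcs_length(a, b) / (len(a) + len(b)) if a and b else 0.0

# Hypothetical example: pick the candidate most similar to a reference string.
candidates = ["press release 27 july", "speech by the secretary", "press releases july"]
reference = "press release july"
best = max(candidates, key=lambda c: lcs_similarity(reference, c))
print(best)  # 'press releases july' (largest normalised LCS with the reference)
```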
  8. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.03
    0.03345504 = product of:
      0.10036512 = sum of:
        0.10036512 = sum of:
          0.05173191 = weight(_text_:web in 2541) [ClassicSimilarity], result of:
            0.05173191 = score(doc=2541,freq=6.0), product of:
              0.1656677 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.050763648 = queryNorm
              0.3122631 = fieldWeight in 2541, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.048633203 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.048633203 = score(doc=2541,freq=4.0), product of:
              0.17776565 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050763648 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.33333334 = coord(1/3)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response times and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
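As a rough illustration of the spelling-suggestion idea described above, the sketch below ranks a tiny stand-in vocabulary by character-sequence similarity to a misspelled query. It uses Python's standard difflib as the similarity measure; the vocabulary and cutoff are assumptions, and none of this reproduces NLM's AZdict or ChemSpell components.

```python
import difflib

# A tiny stand-in vocabulary; the real AZdict vocabulary is built from
# NLM databases and chemical nomenclature.
vocabulary = ["toxicology", "benzene", "toluene", "acetaminophen", "arsenic"]

def suggest_spelling(query, vocab, n=3):
    """Return up to n vocabulary words whose character sequences are most
    similar to the (possibly misspelled) query."""
    return difflib.get_close_matches(query.lower(), vocab, n=n, cutoff=0.6)

print(suggest_spelling("tolulene", vocabulary))   # ['toluene']
print(suggest_spelling("benzine", vocabulary))    # ['benzene']
```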
  9. Working with conceptual structures : contributions to ICCS 2000. 8th International Conference on Conceptual Structures: Logical, Linguistic, and Computational Issues. Darmstadt, August 14-18, 2000 (2000) 0.03
    0.03266065 = product of:
      0.04899097 = sum of:
        0.03853737 = weight(_text_:wide in 5089) [ClassicSimilarity], result of:
          0.03853737 = score(doc=5089,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.171337 = fieldWeight in 5089, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.02734375 = fieldNorm(doc=5089)
        0.010453601 = product of:
          0.020907203 = sum of:
            0.020907203 = weight(_text_:web in 5089) [ClassicSimilarity], result of:
              0.020907203 = score(doc=5089,freq=2.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.12619963 = fieldWeight in 5089, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5089)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The 8th International Conference on Conceptual Structures - Logical, Linguistic, and Computational Issues (ICCS 2000) brings together a wide range of researchers and practitioners working with conceptual structures. During the last few years, the ICCS conference series has considerably widened its scope on different kinds of conceptual structures, stimulating research across domain boundaries. We hope that this stimulation is further enhanced by ICCS 2000 joining the long tradition of conferences in Darmstadt with extensive, lively discussions. This volume consists of contributions presented at ICCS 2000, complementing the volume "Conceptual Structures: Logical, Linguistic, and Computational Issues" (B. Ganter, G.W. Mineau (Eds.), LNAI 1867, Springer, Berlin-Heidelberg 2000). It contains submissions reviewed by the program committee, and position papers. We wish to express our appreciation to all the authors of submitted papers, to the general chair, the program chair, the editorial board, the program committee, and to the additional reviewers for making ICCS 2000 a valuable contribution in the knowledge processing research field. Special thanks go to the local organizers for making the conference an enjoyable and inspiring event. We are grateful to Darmstadt University of Technology, the Ernst Schröder Center for Conceptual Knowledge Processing, the Center for Interdisciplinary Studies in Technology, the Deutsche Forschungsgemeinschaft, Land Hessen, and NaviCon GmbH for their generous support
    Content
    Concepts & Language: Knowledge organization by procedures of natural language processing. A case study using the method GABEK (J. Zelger, J. Gadner) - Computer aided narrative analysis using conceptual graphs (H. Schärfe, P. Øhrstrom) - Pragmatic representation of argumentative text: a challenge for the conceptual graph approach (H. Irandoust, B. Moulin) - Conceptual graphs as a knowledge representation core in a complex language learning environment (G. Angelova, A. Nenkova, S. Boycheva, T. Nikolov) - Conceptual Modeling and Ontologies: Relationships and actions in conceptual categories (Ch. Landauer, K.L. Bellman) - Concept approximations for formal concept analysis (J. Saquer, J.S. Deogun) - Faceted information representation (U. Priß) - Simple concept graphs with universal quantifiers (J. Tappe) - A framework for comparing methods for using or reusing multiple ontologies in an application (J. van Zyl, D. Corbett) - Designing task/method knowledge-based systems with conceptual graphs (M. Leclère, F. Trichet, Ch. Choquet) - A logical ontology (J. Farkas, J. Sarbo) - Algorithms and Tools: Fast concept analysis (Ch. Lindig) - A framework for conceptual graph unification (D. Corbett) - Visual CP representation of knowledge (H.D. Pfeiffer, R.T. Hartley) - Maximal isojoin for representing software textual specifications and detecting semantic anomalies (Th. Charnois) - Troika: using grids, lattices and graphs in knowledge acquisition (H.S. Delugach, B.E. Lampkin) - Open world theorem prover for conceptual graphs (J.E. Heaton, P. Kocura) - NetCare: a practical conceptual graphs software tool (S. Polovina, D. Strang) - CGWorld - a web-based workbench for conceptual graphs management and applications (P. Dobrev, K. Toutanova) - Position papers: The edition project: Peirce's existential graphs (R. Müller) - Mining association rules using formal concept analysis (N. Pasquier) - Contextual logic summary (R. Wille) - Information channels and conceptual scaling (K.E. Wolff) - Spatial concepts - a rule exploration (S. Rudolph) - The TEXT-TO-ONTO learning environment (A. Mädche, St. Staab) - Controlling the semantics of metadata on audio-visual documents using ontologies (Th. Dechilly, B. Bachimont) - Building the ontological foundations of a terminology from natural language to conceptual graphs with Ribosome, a knowledge extraction system (Ch. Jacquelinet, A. Burgun) - CharGer: some lessons learned and new directions (H.S. Delugach) - Knowledge management using conceptual graphs (W.K. Pun)
  10. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I.: Improving language understanding by Generative Pre-Training 0.03
    0.031142896 = product of:
      0.093428686 = sum of:
        0.093428686 = weight(_text_:wide in 870) [ClassicSimilarity], result of:
          0.093428686 = score(doc=870,freq=4.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.4153836 = fieldWeight in 870, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=870)
      0.33333334 = coord(1/3)
    
    Abstract
    Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
  11. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.03
    0.030651119 = product of:
      0.09195335 = sum of:
        0.09195335 = sum of:
          0.05068671 = weight(_text_:web in 4436) [ClassicSimilarity], result of:
            0.05068671 = score(doc=4436,freq=4.0), product of:
              0.1656677 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.050763648 = queryNorm
              0.3059541 = fieldWeight in 4436, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
          0.041266643 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
            0.041266643 = score(doc=4436,freq=2.0), product of:
              0.17776565 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050763648 = queryNorm
              0.23214069 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
      0.33333334 = coord(1/3)
    
    Abstract
    Language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable tranlated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what from the translated result is presented in. About 100.000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation
    Date
    16. 2.2000 14:22:39
  12. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.03
    0.026875377 = product of:
      0.08062613 = sum of:
        0.08062613 = product of:
          0.24187839 = sum of:
            0.24187839 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24187839 = score(doc=862,freq=2.0), product of:
                0.43037477 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050763648 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    https://arxiv.org/abs/2212.06721
  13. Sparck Jones, K.; Galliers, J.R.: Evaluating natural language processing systems : an analysis and review (1996) 0.02
    0.022021351 = product of:
      0.06606405 = sum of:
        0.06606405 = weight(_text_:wide in 2934) [ClassicSimilarity], result of:
          0.06606405 = score(doc=2934,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 2934, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=2934)
      0.33333334 = coord(1/3)
    
    Abstract
    This comprehensive state-of-the-art book is the first devoted to the important and timely issue of evaluating NLP systems. It addresses the whole area of NLP system evaluation, including aims and scope, problems and methodology. The authors provide a wide-ranging and careful analysis of evaluation concepts, reinforced with extensive illustrations; they relate systems to their environments and develop a framework for proper evaluation. The discussion of principles is completed by a detailed review of practice and strategies in the field, covering both systems for specific tasks, like translation, and core language processors. The methodology lessons drawn from the analysis and review are applied in a series of example cases. A comprehensive bibliography, a subject index, and term glossary are included
  14. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.02
    0.022021351 = product of:
      0.06606405 = sum of:
        0.06606405 = weight(_text_:wide in 5480) [ClassicSimilarity], result of:
          0.06606405 = score(doc=5480,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 5480, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
      0.33333334 = coord(1/3)
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods
  15. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.02
    0.022021351 = product of:
      0.06606405 = sum of:
        0.06606405 = weight(_text_:wide in 4394) [ClassicSimilarity], result of:
          0.06606405 = score(doc=4394,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 4394, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4394)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - To propose a categorization of the different conflation procedures within the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and the use of syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FST), are emerging methods for the recognition and standardization of terms. Findings - The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
  16. Kreymer, O.: ¬An evaluation of help mechanisms in natural language information retrieval systems (2002) 0.02
    0.022021351 = product of:
      0.06606405 = sum of:
        0.06606405 = weight(_text_:wide in 2557) [ClassicSimilarity], result of:
          0.06606405 = score(doc=2557,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 2557, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=2557)
      0.33333334 = coord(1/3)
    
    Abstract
    The field of natural language processing (NLP) demonstrates rapid changes in the design of information retrieval systems and human-computer interaction. While natural language is being looked on as the most effective tool for information retrieval in a contemporary information environment, the systems using it are only beginning to emerge. This study attempts to evaluate the current state of NLP information retrieval systems from the user's point of view: what techniques are used by these systems to guide their users through the search process? The analysis focused on the structure and components of the systems' help mechanisms. Results of the study demonstrated that systems which claimed to be using natural language searching in fact used a wide range of information retrieval techniques from real natural language processing to Boolean searching. As a result, the user assistance mechanisms of these systems also varied. While pseudo-NLP systems would suit a more traditional method of instruction, real NLP systems primarily utilised the methods of explanation and user-system dialogue.
  17. Vasalou, A.; Gill, A.J.; Mazanderani, F.; Papoutsi, C.; Joinson, A.: Privacy dictionary : a new resource for the automated content analysis of privacy (2011) 0.02
    0.022021351 = product of:
      0.06606405 = sum of:
        0.06606405 = weight(_text_:wide in 4915) [ClassicSimilarity], result of:
          0.06606405 = score(doc=4915,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 4915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4915)
      0.33333334 = coord(1/3)
    
    Abstract
    This article presents the privacy dictionary, a new linguistic resource for automated content analysis on privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
  18. Korman, D.Z.; Mack, E.; Jett, J.; Renear, A.H.: Defining textual entailment (2018) 0.02
    0.022021351 = product of:
      0.06606405 = sum of:
        0.06606405 = weight(_text_:wide in 4284) [ClassicSimilarity], result of:
          0.06606405 = score(doc=4284,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.29372054 = fieldWeight in 4284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
      0.33333334 = coord(1/3)
    
    Abstract
    Textual entailment is a relationship that obtains between fragments of text when one fragment in some sense implies the other fragment. The automation of textual entailment recognition supports a wide variety of text-based tasks, including information retrieval, information extraction, question answering, text summarization, and machine translation. Much ingenuity has been devoted to developing algorithms for identifying textual entailments, but relatively little to saying what textual entailment actually is. This article is a review of the logical and philosophical issues involved in providing an adequate definition of textual entailment. We show that many natural definitions of textual entailment are refuted by counterexamples, including the most widely cited definition of Dagan et al. We then articulate and defend the following revised definition: T textually entails H =df typically, a human reading T would be justified in inferring the proposition expressed by H from the proposition expressed by T. We also show that textual entailment is context-sensitive, nontransitive, and nonmonotonic.
  19. Aydin, Ö.; Karaarslan, E.: OpenAI ChatGPT generated literature review : digital twin in healthcare (2022) 0.02
    0.020761931 = product of:
      0.062285792 = sum of:
        0.062285792 = weight(_text_:wide in 851) [ClassicSimilarity], result of:
          0.062285792 = score(doc=851,freq=4.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.2769224 = fieldWeight in 851, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.03125 = fieldNorm(doc=851)
      0.33333334 = coord(1/3)
    
    Abstract
    Literature review articles are essential to summarize the related work in the selected field. However, covering all related studies takes too much time and effort. This study questions how Artificial Intelligence can be used in this process. We used ChatGPT to create a literature review article to show the stage of the OpenAI ChatGPT artificial intelligence application. As the subject, the applications of Digital Twin in the health field were chosen. Abstracts of papers from the last three years (2020, 2021 and 2022) were obtained from the keyword "Digital twin in healthcare" search results on Google Scholar and paraphrased by ChatGPT. Later on, we asked ChatGPT questions. The results are promising; however, the paraphrased parts had significant matches when checked with the iThenticate tool. This article is the first attempt to show that the compilation and expression of knowledge will be accelerated with the help of artificial intelligence. We are still at the beginning of such advances. The future academic publishing process will require less human effort, which in turn will allow academics to focus on their studies. In future studies, we will monitor citations to this study to evaluate the academic validity of the content produced by ChatGPT.
    1. Introduction
    OpenAI ChatGPT (ChatGPT, 2022) is a chatbot based on the OpenAI GPT-3 language model. It is designed to generate human-like text responses to user input in a conversational context. OpenAI ChatGPT is trained on a large dataset of human conversations and can be used to create responses to a wide range of topics and prompts. The chatbot can be used for customer service, content creation, and language translation tasks, creating replies in multiple languages. OpenAI ChatGPT is available through the OpenAI API, which allows developers to access and integrate the chatbot into their applications and systems. OpenAI ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model developed by OpenAI. It is designed to generate human-like text, allowing it to engage in conversation with users naturally and intuitively. OpenAI ChatGPT is trained on a large dataset of human conversations, allowing it to understand and respond to a wide range of topics and contexts. It can be used in various applications, such as chatbots, customer service agents, and language translation systems. OpenAI ChatGPT is a state-of-the-art language model able to generate coherent and natural text that can be indistinguishable from text written by a human. As an artificial intelligence, ChatGPT may need help to change academic writing practices. However, it can provide information and guidance on ways to improve people's academic writing skills.
  20. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: ¬A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.02
    0.018351128 = product of:
      0.055053383 = sum of:
        0.055053383 = weight(_text_:wide in 3426) [ClassicSimilarity], result of:
          0.055053383 = score(doc=3426,freq=2.0), product of:
            0.22492146 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.050763648 = queryNorm
            0.24476713 = fieldWeight in 3426, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
      0.33333334 = coord(1/3)
    
    Abstract
    Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other important fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two similarity measures are used, the dissimilarity measure called the Manhattan distance, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qu'ran and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qu'ran contains most of the ancient Arabic words while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to the original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
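The abstract above scores candidate roots by the overlap of character bigrams, using Dice's measure of similarity. Below is a minimal sketch of that idea on a single hypothetical word and two candidate roots; the Manhattan-distance variant and the paper's handling of affixes are not reproduced.

```python
def bigrams(word):
    """Character bigrams of a word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def dice(a, b):
    """Dice's coefficient over the two words' bigram sets."""
    ba, bb = set(bigrams(a)), set(bigrams(b))
    return 2 * len(ba & bb) / (len(ba) + len(bb)) if ba and bb else 0.0

def best_root(word, roots):
    """Pick the candidate root sharing the most bigrams with the word."""
    return max(roots, key=lambda r: dice(word, r))

# Hypothetical example: the word 'مكتبة' (library) against two candidate roots.
roots = ["كتب", "درس"]
print(best_root("مكتبة", roots))   # 'كتب' (Dice = 2*2/(2+4) ≈ 0.67)
```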

Years

Languages

Types

  • a 68
  • m 9
  • el 8
  • s 5
  • p 3
  • x 1