Search (30 results, page 1 of 2)

  • theme_ss:"Computerlinguistik"
  • type_ss:"el"
  1. Artemenko, O.; Shramko, M.: Entwicklung eines Werkzeugs zur Sprachidentifikation in mono- und multilingualen Texten (2005) 0.02
    0.021248309 = product of:
      0.058432847 = sum of:
        0.024445795 = weight(_text_:wide in 572) [ClassicSimilarity], result of:
          0.024445795 = score(doc=572,freq=2.0), product of:
            0.14267668 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.032201413 = queryNorm
            0.171337 = fieldWeight in 572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.02734375 = fieldNorm(doc=572)
        0.01875569 = weight(_text_:web in 572) [ClassicSimilarity], result of:
          0.01875569 = score(doc=572,freq=4.0), product of:
            0.10508965 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.032201413 = queryNorm
            0.17847323 = fieldWeight in 572, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=572)
        0.003837415 = weight(_text_:information in 572) [ClassicSimilarity], result of:
          0.003837415 = score(doc=572,freq=2.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.06788416 = fieldWeight in 572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=572)
        0.011393951 = weight(_text_:retrieval in 572) [ClassicSimilarity], result of:
          0.011393951 = score(doc=572,freq=2.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.11697317 = fieldWeight in 572, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02734375 = fieldNorm(doc=572)
      0.36363637 = coord(4/11)
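    The indented block above is Lucene's ClassicSimilarity explain output: each matching term contributes queryWeight * fieldWeight, where queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm and tf = sqrt(termFreq); the sum is then multiplied by the coordination factor coord(4/11), because four of the eleven query terms match this record. As a check, a small Python sketch (variable names are ours, not Lucene's) recomputes this entry's score from the values shown above:

      from math import isclose, sqrt

      query_norm = 0.032201413   # queryNorm shared by all terms
      field_norm = 0.02734375    # fieldNorm(doc=572)

      # (term, idf, termFreq) as reported in the explain tree above
      terms = [
          ("wide",        4.4307585, 2.0),
          ("web",         3.2635105, 4.0),
          ("information", 1.7554779, 2.0),
          ("retrieval",   3.024915,  2.0),
      ]

      total = 0.0
      for _term, idf, freq in terms:
          query_weight = idf * query_norm               # e.g. 0.14267668 for "wide"
          field_weight = sqrt(freq) * idf * field_norm  # tf(freq) = sqrt(freq)
          total += query_weight * field_weight

      score = total * (4 / 11)       # coord(4/11)
      print(round(score, 9))         # ~0.021248309, the value displayed for this hit
      assert isclose(score, 0.021248309, rel_tol=1e-4)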
    
    Abstract
    With the spread of the Internet, the number of documents available on the World Wide Web keeps growing. Guaranteeing Internet users efficient access to the information they want is becoming a major challenge for the modern information society. A variety of tools is already in use to help users find their way through the growing flood of information. The enormous amount of unstructured and distributed information is, however, not the only difficulty to be overcome in developing tools of this kind. The increasing multilingualism of web content creates a need for language identification software that determines the language(s) of electronic documents so that they can be processed in a targeted way. Such language identifiers can, for example, be used effectively in multilingual information retrieval, since automatic indexing processes such as stemming and stop-word extraction build on the results of language identification. This thesis presents the new system "LangIdent" for identifying the language of electronic text documents, intended primarily for teaching and research at the University of Hildesheim. "LangIdent" contains a selection of common algorithms for monolingual language identification, which the user can select and configure interactively. In addition, a new algorithm was implemented in the system that identifies the languages in which a multilingual document is written. The identification is not limited to listing the languages found; instead, the text is split into monolingual sections, each labelled with the identified language.
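    The thesis's own algorithms are not reproduced in this record. Purely as an illustration of the character n-gram profiling commonly used for monolingual language identification (in the spirit of Cavnar and Trenkle's out-of-place measure), here is a small Python sketch; the reference snippets, the trigram size and the profile length are invented for the example and are not taken from the thesis:

      from collections import Counter

      def ngram_profile(text, n=3, top=300):
          """Most frequent character n-grams of a text, as a ranked list."""
          text = " ".join(text.lower().split())
          grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
          return [g for g, _ in grams.most_common(top)]

      def out_of_place(profile, reference):
          """Rank-difference distance between a text profile and a language profile."""
          position = {g: i for i, g in enumerate(reference)}
          penalty = len(reference)
          return sum(abs(i - position.get(g, penalty)) for i, g in enumerate(profile))

      def identify(text, references):
          """Pick the reference language whose profile is closest to the text."""
          profile = ngram_profile(text)
          return min(references, key=lambda lang: out_of_place(profile, references[lang]))

      # Toy reference profiles; real profiles would be trained on large monolingual corpora.
      references = {
          "de": ngram_profile("Dies ist ein kurzer deutscher Beispieltext über Sprache und Wörter."),
          "en": ngram_profile("This is a short English sample text about language and words."),
      }
      print(identify("Die Sprache dieses Abschnitts soll erkannt werden.", references))  # 'de' with these toy profiles

    Splitting a multilingual document into monolingual sections, as the abstract describes, can be built on top of such a classifier by labelling a sliding window of sentences and starting a new section whenever the predicted language changes.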
  2. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.02
    0.019201802 = product of:
      0.07040661 = sum of:
        0.037892215 = weight(_text_:web in 2861) [ClassicSimilarity], result of:
          0.037892215 = score(doc=2861,freq=8.0), product of:
            0.10508965 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.032201413 = queryNorm
            0.36057037 = fieldWeight in 2861, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
        0.00949514 = weight(_text_:information in 2861) [ClassicSimilarity], result of:
          0.00949514 = score(doc=2861,freq=6.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.16796975 = fieldWeight in 2861, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
        0.023019256 = weight(_text_:retrieval in 2861) [ClassicSimilarity], result of:
          0.023019256 = score(doc=2861,freq=4.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.23632148 = fieldWeight in 2861, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
      0.27272728 = coord(3/11)
    
    Abstract
    Today's conventional search engines rarely provide exactly the content relevant to a user's search query, because the context and semantics of the request are not analyzed to any depth. This is where semantic web search (SWS) comes in: an emerging area of web search that combines natural language processing and artificial intelligence. The objective of this work is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as the knowledge base for its information retrieval process. It is not a mere keyword search; it works one layer above what Google or other search engines retrieve by analyzing keywords alone, since the query is analyzed both syntactically and semantically. Through keyword expansion, the system retrieves web results that are more relevant to the user's query, and accuracy is further enhanced by the semantic analysis of the query. The Google results are re-ranked and optimized by a ranking algorithm so that the most apt links for the user query are presented first. The system should be of use to developers and researchers working on the web.
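    SIEU itself is not available from this record; the following Python sketch only illustrates the two steps named in the abstract, expanding query keywords with ontology neighbours and re-ranking already-fetched results by overlap with the expanded query. The mini-ontology, the field names and the scoring are invented for the example:

      # Hypothetical mini-ontology: term -> related terms (synonyms, narrower concepts).
      ontology = {
          "lecturer": {"faculty", "professor", "teaching staff"},
          "course":   {"module", "subject", "class"},
      }

      def expand(query_terms):
          """Add the ontology neighbours of every query term."""
          expanded = set(query_terms)
          for term in query_terms:
              expanded.update(ontology.get(term, set()))
          return expanded

      def rerank(results, query_terms):
          """Re-order fetched results by overlap with the expanded query."""
          terms = expand(query_terms)
          def overlap(result):
              words = set(result["snippet"].lower().split())
              return len(words.intersection(terms))
          return sorted(results, key=overlap, reverse=True)

      results = [
          {"url": "u1", "snippet": "timetable of every module taught by faculty members"},
          {"url": "u2", "snippet": "campus parking regulations"},
      ]
      print([r["url"] for r in rerank(results, {"course", "lecturer"})])  # 'u1' comes first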
  3. Rindflesch, T.C.; Aronson, A.R.: Semantic processing in information retrieval (1993) 0.01
    0.012055447 = product of:
      0.06630496 = sum of:
        0.01534966 = weight(_text_:information in 4121) [ClassicSimilarity], result of:
          0.01534966 = score(doc=4121,freq=8.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.27153665 = fieldWeight in 4121, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4121)
        0.0509553 = weight(_text_:retrieval in 4121) [ClassicSimilarity], result of:
          0.0509553 = score(doc=4121,freq=10.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.5231199 = fieldWeight in 4121, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4121)
      0.18181819 = coord(2/11)
    
    Abstract
    Intuition suggests that one way to enhance the information retrieval process would be the use of phrases to characterize the contents of text. A number of researchers, however, have noted that phrases alone do not improve retrieval effectiveness. In this paper we briefly review the use of phrases in information retrieval and then suggest extensions to this paradigm using semantic information. We claim that semantic processing, which can be viewed as expressing relations between the concepts represented by phrases, will in fact enhance retrieval effectiveness. The availability of the UMLS® domain model, which we exploit extensively, significantly contributes to the feasibility of this processing.
  4. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.01
    0.0119072115 = product of:
      0.043659773 = sum of:
        0.026252497 = weight(_text_:web in 337) [ClassicSimilarity], result of:
          0.026252497 = score(doc=337,freq=6.0), product of:
            0.10508965 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.032201413 = queryNorm
            0.24981049 = fieldWeight in 337, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
        0.004385617 = weight(_text_:information in 337) [ClassicSimilarity], result of:
          0.004385617 = score(doc=337,freq=2.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.0775819 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
        0.013021658 = weight(_text_:retrieval in 337) [ClassicSimilarity], result of:
          0.013021658 = score(doc=337,freq=2.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.13368362 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
      0.27272728 = coord(3/11)
    
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. See also: Spitkovsky, V.I.; Chang, A.X.: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
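    The release is described as a set of (text, url, count) triples. A small sketch of how such triples could be loaded and turned into a string-to-concept probability table; the tab-separated file name and column order are assumptions for the example, not details taken from the release:

      import csv
      from collections import defaultdict

      def load_dictionary(path):
          """Read (text, url, count) rows and accumulate the counts per surface string."""
          table = defaultdict(dict)
          with open(path, newline="", encoding="utf-8") as handle:
              for text, url, count in csv.reader(handle, delimiter="\t"):
                  table[text][url] = table[text].get(url, 0) + int(count)
          return table

      def concept_probabilities(table, text):
          """Estimate P(concept | text) from the association counts."""
          counts = table.get(text, {})
          total = sum(counts.values())
          return {url: c / total for url, c in counts.items()} if total else {}

      # concept_probabilities(load_dictionary("dictionary.tsv"), "football") would, per the
      # description above, put most of the probability mass on the association football article.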
  5. Aizawa, A.; Kohlhase, M.: Mathematical information retrieval (2021) 0.01
    0.011077357 = product of:
      0.060925465 = sum of:
        0.01534966 = weight(_text_:information in 667) [ClassicSimilarity], result of:
          0.01534966 = score(doc=667,freq=8.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.27153665 = fieldWeight in 667, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=667)
        0.045575805 = weight(_text_:retrieval in 667) [ClassicSimilarity], result of:
          0.045575805 = score(doc=667,freq=8.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.46789268 = fieldWeight in 667, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=667)
      0.18181819 = coord(2/11)
    
    Abstract
    We present an overview of the NTCIR Math Tasks organized during NTCIR-10, 11, and 12. These tasks are primarily dedicated to techniques for searching mathematical content with formula expressions. In this chapter, we first summarize the task design and introduce test collections generated in the tasks. We also describe the features and main challenges of mathematical information retrieval systems and discuss future perspectives in the field.
    Series
    The Information retrieval series, vol 43
    Source
    Evaluating information retrieval and access tasks. Eds.: Sakai, T., Oard, D., Kando, N. [https://doi.org/10.1007/978-981-15-5554-1_12]
  6. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.01
    0.008793678 = product of:
      0.048365228 = sum of:
        0.03751138 = weight(_text_:web in 4733) [ClassicSimilarity], result of:
          0.03751138 = score(doc=4733,freq=4.0), product of:
            0.10508965 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.032201413 = queryNorm
            0.35694647 = fieldWeight in 4733, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4733)
        0.010853848 = weight(_text_:information in 4733) [ClassicSimilarity], result of:
          0.010853848 = score(doc=4733,freq=4.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.1920054 = fieldWeight in 4733, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4733)
      0.18181819 = coord(2/11)
    
    Abstract
    Ontologies are often viewed as the answer to the need for interoperable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web, coupled with the increasing demand for ontologies to power the Semantic Web, has made (semi-)automatic ontology learning from text a very promising research area. This, together with the advanced state of related areas such as natural language processing, has fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium and discusses the remaining challenges that will define the research directions in this area in the near future.
  7. Aydin, Ö.; Karaarslan, E.: OpenAI ChatGPT generated literature review : digital twin in healthcare (2022) 0.01
    0.007981089 = product of:
      0.043895986 = sum of:
        0.03951037 = weight(_text_:wide in 851) [ClassicSimilarity], result of:
          0.03951037 = score(doc=851,freq=4.0), product of:
            0.14267668 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.032201413 = queryNorm
            0.2769224 = fieldWeight in 851, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.03125 = fieldNorm(doc=851)
        0.004385617 = weight(_text_:information in 851) [ClassicSimilarity], result of:
          0.004385617 = score(doc=851,freq=2.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.0775819 = fieldWeight in 851, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=851)
      0.18181819 = coord(2/11)
    
    Abstract
    Literature review articles are essential to summarize the related work in the selected field. However, covering all related studies takes too much time and effort. This study questions how Artificial Intelligence can be used in this process. We used ChatGPT to create a literature review article to show the stage of the OpenAI ChatGPT artificial intelligence application. As the subject, the applications of Digital Twin in the health field were chosen. Abstracts of papers from the last three years (2020, 2021 and 2022) were obtained from the keyword "Digital twin in healthcare" search results on Google Scholar and paraphrased by ChatGPT. Later on, we asked ChatGPT questions. The results are promising; however, the paraphrased parts had significant matches when checked with the iThenticate tool. This article is the first attempt to show that the compilation and expression of knowledge will be accelerated with the help of artificial intelligence. We are still at the beginning of such advances. The future academic publishing process will require less human effort, which in turn will allow academics to focus on their studies. In future studies, we will monitor citations to this study to evaluate the academic validity of the content produced by ChatGPT.
    1. Introduction
    OpenAI ChatGPT (ChatGPT, 2022) is a chatbot based on the OpenAI GPT-3 language model. It is designed to generate human-like text responses to user input in a conversational context. OpenAI ChatGPT is trained on a large dataset of human conversations and can be used to create responses to a wide range of topics and prompts. The chatbot can be used for customer service, content creation, and language translation tasks, creating replies in multiple languages. OpenAI ChatGPT is available through the OpenAI API, which allows developers to access and integrate the chatbot into their applications and systems. OpenAI ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model developed by OpenAI. It is designed to generate human-like text, allowing it to engage in conversation with users naturally and intuitively. OpenAI ChatGPT is trained on a large dataset of human conversations, allowing it to understand and respond to a wide range of topics and contexts. It can be used in various applications, such as chatbots, customer service agents, and language translation systems. OpenAI ChatGPT is a state-of-the-art language model able to generate coherent and natural text that can be indistinguishable from text written by a human. As an artificial intelligence, ChatGPT may need help to change academic writing practices. However, it can provide information and guidance on ways to improve people's academic writing skills.
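    The study used the ChatGPT web interface rather than published scripts. As a sketch only, the paraphrasing step it describes could be automated roughly as follows with the openai Python client; the model name and prompt are illustrative assumptions, not taken from the paper:

      from openai import OpenAI  # assumes the openai Python package is installed

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      def paraphrase(abstract):
          """Ask a chat model to paraphrase one collected abstract."""
          response = client.chat.completions.create(
              model="gpt-4o-mini",  # illustrative model name, not the one used in the study
              messages=[
                  {"role": "system", "content": "Paraphrase the following paper abstract."},
                  {"role": "user", "content": abstract},
              ],
          )
          return response.choices[0].message.content

      # literature_review = "\n\n".join(paraphrase(a) for a in collected_abstracts)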
  8. Chowdhury, A.; Mccabe, M.C.: Improving information retrieval systems using part of speech tagging (1993) 0.01
    0.0067138923 = product of:
      0.036926407 = sum of:
        0.009303299 = weight(_text_:information in 1061) [ClassicSimilarity], result of:
          0.009303299 = score(doc=1061,freq=4.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.16457605 = fieldWeight in 1061, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1061)
        0.02762311 = weight(_text_:retrieval in 1061) [ClassicSimilarity], result of:
          0.02762311 = score(doc=1061,freq=4.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.2835858 = fieldWeight in 1061, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1061)
      0.18181819 = coord(2/11)
    
    Abstract
    The object of information retrieval is to retrieve all relevant documents for a user query, and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of part-of-speech tagging to improve the index storage overhead and the general speed of the system with only a minimal reduction in precision/recall measurements. We tagged 500 MB of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant parts of speech to index. We show that 90% of precision/recall is achieved with 40% of the document collection's terms. We also show that this is an improvement in overhead with only a 1% reduction in precision/recall.
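    As an illustration of the indexing idea, keeping only the most content-bearing parts of speech and dropping the rest, here is a sketch using NLTK's off-the-shelf tokenizer and tagger; this is not the tagger, tag set or collection used in the paper:

      import nltk  # needs the "punkt" and "averaged_perceptron_tagger" resources (nltk.download)

      # Index only nouns, adjectives and verbs; drop function words entirely.
      KEEP = ("NN", "JJ", "VB")

      def index_terms(text):
          """Return the tokens whose POS tag starts with one of the kept prefixes."""
          tagged = nltk.pos_tag(nltk.word_tokenize(text))
          return [token.lower() for token, tag in tagged if tag.startswith(KEEP)]

      document = "The council approved a new budget for road maintenance in 1990."
      print(index_terms(document))
      # e.g. ['council', 'approved', 'new', 'budget', 'road', 'maintenance']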
  9. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I.: Improving language understanding by Generative Pre-Training 0.01
    0.005387778 = product of:
      0.059265554 = sum of:
        0.059265554 = weight(_text_:wide in 870) [ClassicSimilarity], result of:
          0.059265554 = score(doc=870,freq=4.0), product of:
            0.14267668 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.032201413 = queryNorm
            0.4153836 = fieldWeight in 870, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=870)
      0.09090909 = coord(1/11)
    
    Abstract
    Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
  10. Galitsky, B.: Can many agents answer questions better than one? (2005) 0.00
    0.004747439 = product of:
      0.026110914 = sum of:
        0.0065784254 = weight(_text_:information in 3094) [ClassicSimilarity], result of:
          0.0065784254 = score(doc=3094,freq=2.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.116372846 = fieldWeight in 3094, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3094)
        0.019532489 = weight(_text_:retrieval in 3094) [ClassicSimilarity], result of:
          0.019532489 = score(doc=3094,freq=2.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.20052543 = fieldWeight in 3094, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=3094)
      0.18181819 = coord(2/11)
    
    Abstract
    The paper addresses the issue of how online natural language question answering, based on deep semantic analysis, may compete with currently popular keyword search, open domain information retrieval systems, covering a horizontal domain. We suggest the multiagent question answering approach, where each domain is represented by an agent which tries to answer questions taking into account its specific knowledge. The meta-agent controls the cooperation between question answering agents and chooses the most relevant answer(s). We argue that multiagent question answering is optimal in terms of access to business and financial knowledge, flexibility in query phrasing, and efficiency and usability of advice. The knowledge and advice encoded in the system are initially prepared by domain experts. We analyze the commercial application of multiagent question answering and the robustness of the meta-agent. The paper suggests that a multiagent architecture is optimal when a real world question answering domain combines a number of vertical ones to form a horizontal domain.
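    A schematic sketch of the dispatch pattern described above: each domain agent answers from its own knowledge, and the meta-agent keeps the answer it judges most relevant. The keyword-based confidence score below is invented for the example; the actual system relies on deep semantic analysis of the question:

      from dataclasses import dataclass

      @dataclass
      class Answer:
          agent: str
          text: str
          confidence: float  # how well the agent's domain covered the question

      class DomainAgent:
          def __init__(self, name, keywords, canned_answer):
              self.name, self.keywords, self.canned_answer = name, keywords, canned_answer

          def answer(self, question):
              question = question.lower()
              hits = sum(keyword in question for keyword in self.keywords)
              return Answer(self.name, self.canned_answer, hits / len(self.keywords))

      class MetaAgent:
          """Collects candidate answers from all domain agents and keeps the best one."""
          def __init__(self, agents):
              self.agents = agents

          def answer(self, question):
              candidates = [agent.answer(question) for agent in self.agents]
              return max(candidates, key=lambda c: c.confidence)

      meta = MetaAgent([
          DomainAgent("tax",  ["tax", "deduction"], "Consult the deduction rules ..."),
          DomainAgent("loan", ["loan", "mortgage", "rate"], "Compare current mortgage rates ..."),
      ])
      print(meta.answer("What mortgage rate can I expect?").agent)  # 'loan'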
  11. Rötzer, F.: Computer ergooglen die Bedeutung von Worten (2005) 0.00
    0.0044256407 = product of:
      0.024341023 = sum of:
        0.019689374 = weight(_text_:web in 3385) [ClassicSimilarity], result of:
          0.019689374 = score(doc=3385,freq=6.0), product of:
            0.10508965 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.032201413 = queryNorm
            0.18735787 = fieldWeight in 3385, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3385)
        0.0046516494 = weight(_text_:information in 3385) [ClassicSimilarity], result of:
          0.0046516494 = score(doc=3385,freq=4.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.08228803 = fieldWeight in 3385, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3385)
      0.18181819 = coord(2/11)
    
    Content
    "How could computers learn language and, in doing so, also understand the meaning of words and the relations between them? This problem of semantics is an enormous task that has so far been solved only in part, since words and word combinations often have several or even many meanings, which in addition depend on extra-linguistic context. The two Dutch researchers Paul Vitanyi and Rudi Cilibrasi of the national institute for mathematics and computer science in Amsterdam (see also the earlier article "Ein künstliches Bewusstsein aus einfachen Aussagen") propose an elegant solution: to look things up in the Internet, the largest database in existence, they simply use Google. Objects such as a mouse can be referred to by their name "mouse"; the meaning of more general concepts has to be learned from their context. A semantic web for representing knowledge consists of the possible connections that objects and their names can enter into. In reality, of course, new names, but also new meanings and thus new connections, can be created; language is alive and flexible. To teach an artificial intelligence all word meanings, a huge database of the possible semantic networks would have to be built with the help of human experts or large numbers of contributors, and then kept constantly up to date. That may not be necessary at all, because the Web is not only the largest and largely free-to-use semantic database, it is also constantly updated by countless Internet users. In addition, there are search engines such as Google that measure, quantitatively and in practice, how probable connections between words are, and thus their context of meaning, by reporting the web pages on which they are found.
    With a method previously developed by Paul Vitanyi and others for measuring how strongly objects are related (the normalized information distance, NID), the closeness of particular objects (images, words, patterns, intervals, genomes, programs, etc.) can be analysed across all their properties and determined on the basis of the dominant shared property. In a similar way, the commonly used, though not necessarily "true", meanings of names can be uncovered with a Google search. 'At this moment one database stands out as the pinnacle of computer-accessible human knowledge and the most inclusive summary of statistical information: the Google search engine. There can be no doubt that Google has already enabled science to accelerate tremendously and revolutionized the research process. It has dominated the attention of internet users for years, and has recently attracted substantial attention of many Wall Street investors, even reshaping their ideas of company financing.' (Paul Vitanyi and Rudi Cilibrasi) If you enter a word such as "Pferd" (horse), Google reports 4,310,000 indexed pages. For "Reiter" (rider) it is 3,400,000 pages. Combining the two terms still yields 315,000 pages. For the joint occurrence of, say, "Pferd" and "Bart" (beard), an astonishing 67,100 pages are still listed, but it is already apparent that "Pferd" and "Reiter" are more closely related. This yields a certain probability for the joint occurrence of terms. From this frequency, set against the maximum number of indexed pages (5,000,000,000), the two researchers derived a statistical measure that they call the "normalised Google distance" (NGD), which normally lies between 0 and 1. The smaller the NGD, the more closely two terms are related. "This is automatic meaning generation," Vitanyi told the New Scientist. "It could well be a way of letting a computer understand things and act semi-intelligently." If such searches are carried out over and over again, a map of the connections between words can be built up. And from this map, so the hope goes, a computer can in turn grasp the meaning of individual words in different natural languages and contexts. A few searches, it is reported, were enough for a computer to distinguish colours from numbers, to tell seventeenth-century Dutch painters apart, to separate emergencies from near-emergencies, and to understand electrical or religious terms. A simple automatic English-Spanish translation has also been accomplished. In this way, the researchers hope, the meaning of words could be learned, speech recognition improved, a semantic web built and, finally, better automatic translation from one language to another achieved."
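    The measure sketched above can be checked directly. Using the standard formula for the normalised Google distance and the page counts quoted in the text (4,310,000 for "Pferd", 3,400,000 for "Reiter", 315,000 for both together, and 5,000,000,000 indexed pages as the maximum), a short Python calculation gives a value well below 1, i.e. two closely related terms:

      from math import log

      def ngd(fx, fy, fxy, n):
          """Normalised Google distance computed from page counts."""
          lfx, lfy, lfxy, ln = log(fx), log(fy), log(fxy), log(n)
          return (max(lfx, lfy) - lfxy) / (ln - min(lfx, lfy))

      # Page counts quoted in the article: "Pferd", "Reiter", both together, index size.
      print(round(ngd(4_310_000, 3_400_000, 315_000, 5_000_000_000), 3))  # ~0.359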
  12. Bedathur, S.; Narang, A.: Mind your language : effects of spoken query formulation on retrieval effectiveness (2013) 0.00
    0.0029297236 = product of:
      0.032226957 = sum of:
        0.032226957 = weight(_text_:retrieval in 1150) [ClassicSimilarity], result of:
          0.032226957 = score(doc=1150,freq=4.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.33085006 = fieldWeight in 1150, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1150)
      0.09090909 = coord(1/11)
    
    Abstract
    Voice search is becoming a popular mode for interacting with search engines. As a result, research has gone into building better voice transcription engines, interfaces, and search engines that better handle the inherent verbosity of queries. However, when one considers its use by non-native speakers of English, another aspect that becomes important is the formulation of the query by users. In this paper, we present the results of a preliminary study that we conducted with non-native English speakers who formulate queries for given retrieval tasks. Our results show that current search engines are sensitive in their rankings to the query formulation, which highlights the need for developing more robust ranking methods.
  13. Perovsek, M.; Kranjca, J.; Erjaveca, T.; Cestnika, B.; Lavraca, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.00
    0.0029229645 = product of:
      0.032152608 = sum of:
        0.032152608 = weight(_text_:web in 2697) [ClassicSimilarity], result of:
          0.032152608 = score(doc=2697,freq=4.0), product of:
            0.10508965 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.032201413 = queryNorm
            0.3059541 = fieldWeight in 2697, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2697)
      0.09090909 = coord(1/11)
    
    Abstract
    Text mining and natural language processing are fast growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
  14. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.00
    0.0027693394 = product of:
      0.015231366 = sum of:
        0.003837415 = weight(_text_:information in 1536) [ClassicSimilarity], result of:
          0.003837415 = score(doc=1536,freq=2.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.06788416 = fieldWeight in 1536, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1536)
        0.011393951 = weight(_text_:retrieval in 1536) [ClassicSimilarity], result of:
          0.011393951 = score(doc=1536,freq=2.0), product of:
            0.09740654 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.032201413 = queryNorm
            0.11697317 = fieldWeight in 1536, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1536)
      0.18181819 = coord(2/11)
    
    Abstract
    Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword named entities may require a similar approach for their automatic detection as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.
  15. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.00
    0.0015864897 = product of:
      0.017451387 = sum of:
        0.017451387 = product of:
          0.05235416 = sum of:
            0.05235416 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
              0.05235416 = score(doc=4888,freq=2.0), product of:
                0.11276386 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032201413 = queryNorm
                0.46428138 = fieldWeight in 4888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4888)
          0.33333334 = coord(1/3)
      0.09090909 = coord(1/11)
    
    Date
    1. 3.2013 14:56:22
  16. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D.: Language models are few-shot learners (2020) 0.00
    0.0013778987 = product of:
      0.015156886 = sum of:
        0.015156886 = weight(_text_:web in 872) [ClassicSimilarity], result of:
          0.015156886 = score(doc=872,freq=2.0), product of:
            0.10508965 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.032201413 = queryNorm
            0.14422815 = fieldWeight in 872, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=872)
      0.09090909 = coord(1/11)
    
    Abstract
    Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
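    The few-shot setting described above specifies tasks "purely via text interaction with the model". As a small illustration of what such a prompt looks like when assembled as a plain string (the instruction and demonstrations are invented examples in this format, not prompts from the paper):

      def few_shot_prompt(instruction, demonstrations, query):
          """Join an instruction, K worked demonstrations and the new query into one prompt."""
          shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in demonstrations)
          return f"{instruction}\n\n{shots}\nQ: {query}\nA:"

      prompt = few_shot_prompt(
          "Unscramble the letters into an English word.",
          [("tca", "cat"), ("odg", "dog"), ("sheuo", "house")],
          "rcah",
      )
      print(prompt)  # the language model is then asked to continue the text after the final "A:"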
  17. Ramisch, C.; Schreiner, P.; Idiart, M.; Villavicencio, A.: ¬An evaluation of methods for the extraction of multiword expressions (20xx) 0.00
    0.0011276726 = product of:
      0.012404398 = sum of:
        0.012404398 = weight(_text_:information in 962) [ClassicSimilarity], result of:
          0.012404398 = score(doc=962,freq=4.0), product of:
            0.05652887 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.032201413 = queryNorm
            0.21943474 = fieldWeight in 962, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=962)
      0.09090909 = coord(1/11)
    
    Abstract
    This paper focuses on the evaluation of some methods for the automatic acquisition of Multiword Expressions (MWEs). First we investigate the hypothesis that MWEs can be detected solely by the distinct statistical properties of their component words, regardless of their type, comparing 3 statistical measures: Mutual Information, Chi**2 and Permutation Entropy. Moreover, we also look at the impact that the addition of type-specific linguistic information has on the performance of these methods.
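    As a pointer to what the first of these measures computes in collocation work, here is a sketch of (pointwise) mutual information for a candidate two-word expression, estimated from corpus counts; the counts are invented for the example:

      from math import log2

      def pmi(count_xy, count_x, count_y, total_bigrams):
          """Pointwise mutual information of a word pair, from corpus counts."""
          p_xy = count_xy / total_bigrams
          p_x = count_x / total_bigrams
          p_y = count_y / total_bigrams
          return log2(p_xy / (p_x * p_y))

      # Invented counts: word x occurs 1200 times, word y 900 times, the pair 350 times,
      # in a corpus of 1,000,000 bigrams.
      print(round(pmi(350, 1200, 900, 1_000_000), 2))  # ~8.34: a strongly associated pair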
  18. Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche (2013) 0.00
    0.0010576599 = product of:
      0.011634259 = sum of:
        0.011634259 = product of:
          0.034902774 = sum of:
            0.034902774 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
              0.034902774 = score(doc=1490,freq=2.0), product of:
                0.11276386 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032201413 = queryNorm
                0.30952093 = fieldWeight in 1490, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1490)
          0.33333334 = coord(1/3)
      0.09090909 = coord(1/11)
    
    Date
    22. 3.2015 9:30:24
  19. Bager, J.: Die Text-KI ChatGPT schreibt Fachtexte, Prosa, Gedichte und Programmcode (2023) 0.00
    0.0010576599 = product of:
      0.011634259 = sum of:
        0.011634259 = product of:
          0.034902774 = sum of:
            0.034902774 = weight(_text_:22 in 835) [ClassicSimilarity], result of:
              0.034902774 = score(doc=835,freq=2.0), product of:
                0.11276386 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032201413 = queryNorm
                0.30952093 = fieldWeight in 835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=835)
          0.33333334 = coord(1/3)
      0.09090909 = coord(1/11)
    
    Date
    29.12.2022 18:22:55
  20. Rieger, F.: Lügende Computer (2023) 0.00
    0.0010576599 = product of:
      0.011634259 = sum of:
        0.011634259 = product of:
          0.034902774 = sum of:
            0.034902774 = weight(_text_:22 in 912) [ClassicSimilarity], result of:
              0.034902774 = score(doc=912,freq=2.0), product of:
                0.11276386 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032201413 = queryNorm
                0.30952093 = fieldWeight in 912, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=912)
          0.33333334 = coord(1/3)
      0.09090909 = coord(1/11)
    
    Date
    16. 3.2023 19:22:55
