Search (14 results, page 1 of 1)

  • theme_ss:"Computerlinguistik"
  • type_ss:"a"
  • type_ss:"el"
  1. Franke-Maier, M.: Computerlinguistik und Bibliotheken : Editorial (2016) 0.03
    0.028887425 = product of:
      0.05777485 = sum of:
        0.03657866 = weight(_text_:data in 3206) [ClassicSimilarity], result of:
          0.03657866 = score(doc=3206,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24703519 = fieldWeight in 3206, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3206)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 3206) [ClassicSimilarity], result of:
              0.042392377 = score(doc=3206,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 3206, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3206)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
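    The explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown: each matching term contributes queryWeight x fieldWeight, with queryWeight = idf x queryNorm and fieldWeight = sqrt(tf) x idf x fieldNorm, scaled by the coord factors. A minimal Python sketch that reproduces the numbers shown for result 1 (all values are taken from the tree; the helper name is ours):
      import math

      def term_score(freq, idf, query_norm, field_norm):
          query_weight = idf * query_norm                      # idf * queryNorm
          field_weight = math.sqrt(freq) * idf * field_norm    # tf * idf * fieldNorm
          return query_weight * field_weight

      query_norm = 0.046827413
      data = term_score(freq=4.0, idf=3.1620505, query_norm=query_norm, field_norm=0.0390625)
      processing = term_score(freq=2.0, idf=4.048147, query_norm=query_norm, field_norm=0.0390625)
      score = (data + processing * 0.5) * 0.5                  # coord(1/2), then coord(2/4)
      print(round(score, 9))                                   # ~ 0.028887425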
    
    Abstract
    Fifty years ago, in February 1966, Floyd M. Cammack pointed out the connection between "Linguistics and Libraries". His starting point was the entry for "Linguistics" in the 1957 Library of Congress Subject Headings (LCSH), which contained the cross-reference "See Language and Languages; Philology; Philology, Comparative". Eight years later, additions such as "language data processing", "automatic indexing", "machine translation" and "psycholinguistics" appeared under the heading "Language and Languages". For Cammack this revealed a web of complex interrelations that should be brought together under the term "Linguistics". This system, he argued, had an important influence on everyone concerned with collecting, organizing, storing and retrieving information (Cammack 1966:73). In that spirit, the issue in front of you deals with computational-linguistic methods in libraries. Ultimately it is about bringing objectivity back into the discussion, about the standing of subject indexing and the recalibration of its appreciation in times of mega-indexes and Big Data. The current contradiction between the wish for relevant result sets in search interfaces and the actual experience of relevance ranking needs to be resolved, including, explicitly, the question of how often the latter has disappointed us and what must be done to bring the relationship between recall and precision back into an appropriate balance. Our users will thank us for it.
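    The editorial's closing point about recall and precision refers to the standard retrieval measures; as a brief reminder (a generic sketch, not taken from the editorial itself):
      def precision(relevant_retrieved, retrieved_total):
          return relevant_retrieved / retrieved_total      # share of retrieved items that are relevant

      def recall(relevant_retrieved, relevant_total):
          return relevant_retrieved / relevant_total       # share of relevant items that were retrieved

      # Example: 40 of 100 retrieved documents are relevant, out of 80 relevant documents overall
      print(precision(40, 100), recall(40, 80))            # 0.4 0.5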
  2. Perovsek, M.; Kranjc, J.; Erjavec, T.; Cestnik, B.; Lavrac, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.02
    0.015575955 = product of:
      0.06230382 = sum of:
        0.06230382 = product of:
          0.12460764 = sum of:
            0.12460764 = weight(_text_:processing in 2697) [ClassicSimilarity], result of:
              0.12460764 = score(doc=2697,freq=12.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.6573372 = fieldWeight in 2697, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2697)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Text mining and natural language processing are fast-growing areas of research, with numerous applications in business, science and the creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
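    The first use case, comparing document classifiers on a text categorization problem, corresponds to a small reusable workflow. A rough scikit-learn analogue is sketched below; this is not the TextFlows API, only an illustration of the kind of pipeline the platform composes visually:
      from sklearn.pipeline import Pipeline
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.svm import LinearSVC
      from sklearn.model_selection import cross_val_score

      docs = ["cheap meds online", "project meeting at noon", "win a free prize", "quarterly report attached"]
      labels = [1, 0, 1, 0]  # toy spam/ham categorization

      for name, clf in [("naive_bayes", MultinomialNB()), ("linear_svm", LinearSVC())]:
          workflow = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
          scores = cross_val_score(workflow, docs, labels, cv=2)
          print(name, scores.mean())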
  3. Rindflesch, T.C.; Aronson, A.R.: Semantic processing in information retrieval (1993) 0.01
    0.012849508 = product of:
      0.05139803 = sum of:
        0.05139803 = product of:
          0.10279606 = sum of:
            0.10279606 = weight(_text_:processing in 4121) [ClassicSimilarity], result of:
              0.10279606 = score(doc=4121,freq=6.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.54227555 = fieldWeight in 4121, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4121)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Intuition suggests that one way to enhance the information retrieval process would be the use of phrases to characterize the contents of text. A number of researchers, however, have noted that phrases alone do not improve retrieval effectiveness. In this paper we briefly review the use of phrases in information retrieval and then suggest extensions to this paradigm using semantic information. We claim that semantic processing, which can be viewed as expressing relations between the concepts represented by phrases, will in fact enhance retrieval effectiveness. The availability of the UMLS® domain model, which we exploit extensively, significantly contributes to the feasibility of this processing.
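    The notion of semantic processing as relations between the concepts behind phrases can be pictured with a toy representation; the concept names and the relation label below are hypothetical, not the authors' UMLS-based mapping:
      # Phrases mapped to concepts, plus a relation asserted between those concepts (illustrative only)
      phrase_to_concept = {
          "oral contraceptives": "Contraceptives, Oral",
          "ocular complications": "Eye Complication",
      }
      doc_relations = {("Contraceptives, Oral", "CAUSES", "Eye Complication")}
      query_relations = {("Contraceptives, Oral", "CAUSES", "Eye Complication")}

      # A document matches when it expresses a relation from the query, not merely its phrases
      print(bool(doc_relations & query_relations))  # True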
  4. Griffiths, T.L.; Steyvers, M.: ¬A probabilistic approach to semantic representation (2002) 0.01
    0.0103460075 = product of:
      0.04138403 = sum of:
        0.04138403 = weight(_text_:data in 3671) [ClassicSimilarity], result of:
          0.04138403 = score(doc=3671,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2794884 = fieldWeight in 3671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3671)
      0.25 = coord(1/4)
    
    Abstract
    Semantic networks produced from human data have statistical properties that cannot be easily captured by spatial representations. We explore a probabilistic approach to semantic representation that explicitly models the probability with which words occur in different contexts, and hence captures the probabilistic relationships between words. We show that this representation has statistical properties consistent with the large-scale structure of semantic networks constructed by humans, and trace the origins of these properties.
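    The probabilistic representation alluded to treats a word's probability in a context as a mixture over latent topics; a minimal sketch of that idea (a generic topic-mixture formulation assumed here, not necessarily the authors' exact model):
      import numpy as np

      # P(word | context) = sum over topics z of P(word | z) * P(z | context)
      p_word_given_topic = np.array([[0.6, 0.1],    # "bank" under a finance vs. a river topic
                                     [0.1, 0.7]])   # "water"
      p_topic_given_context = np.array([0.2, 0.8])  # the context suggests the river topic

      p_word_given_context = p_word_given_topic @ p_topic_given_context
      print(p_word_given_context)  # [0.2  0.58]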
  5. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I.: Language models are unsupervised multitask learners 0.01
    0.008992782 = product of:
      0.035971127 = sum of:
        0.035971127 = product of:
          0.071942255 = sum of:
            0.071942255 = weight(_text_:processing in 871) [ClassicSimilarity], result of:
              0.071942255 = score(doc=871,freq=4.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3795138 = fieldWeight in 871, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=871)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset, matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer, and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems that learn to perform tasks from their naturally occurring demonstrations.
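    The zero-shot setting described, conditioning the model on a document plus a question and reading its continuation as the answer, amounts to simple prompt construction; in the sketch below, generate() stands in for any autoregressive language model and is not OpenAI's actual interface:
      def zero_shot_answer(generate, document, question):
          # Condition the LM on the document plus the question; its continuation is taken as the answer.
          prompt = f"{document}\n\nQ: {question}\nA:"
          return generate(prompt).strip()

      # Toy stand-in for a language model
      fake_lm = lambda prompt: " Paris" if "capital of France" in prompt else " unknown"
      print(zero_shot_answer(fake_lm, "France is a country in Europe. Its capital is Paris.",
                             "What is the capital of France?"))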
  6. Altmann, E.G.; Cristadoro, G.; Esposti, M.D.: On the origin of long-range correlations in texts (2012) 0.01
    0.0077595054 = product of:
      0.031038022 = sum of:
        0.031038022 = weight(_text_:data in 330) [ClassicSimilarity], result of:
          0.031038022 = score(doc=330,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 330, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=330)
      0.25 = coord(1/4)
    
    Abstract
    The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. An example is the robust, still largely unexplained observation of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
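    The long-range correlations in question can be quantified, for instance, as the autocorrelation of a word's occurrence signal along the text; a minimal sketch of such a measurement (an assumed procedure, not the authors' exact analysis):
      import numpy as np

      def autocorrelation(series, lag):
          x = np.asarray(series, dtype=float) - np.mean(series)
          return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

      # 0/1 signal: does the target word occur in each successive window of the text?
      occurrences = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1]
      for lag in (1, 2, 4, 8):
          print(lag, round(autocorrelation(occurrences, lag), 3))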
  7. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I.: Improving language understanding by Generative Pre-Training 0.01
    0.0077595054 = product of:
      0.031038022 = sum of:
        0.031038022 = weight(_text_:data in 870) [ClassicSimilarity], result of:
          0.031038022 = score(doc=870,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 870, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=870)
      0.25 = coord(1/4)
    
    Abstract
    Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
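    The two-stage recipe, unsupervised language-model pre-training followed by discriminative fine-tuning, can be summarized as a combined objective; a schematic sketch (the auxiliary weight and function names are assumptions, not the paper's exact formulation):
      def pretrain_loss(lm_log_probs):
          # Stage 1: maximize the log-likelihood of unlabeled text (written here as a loss to minimize)
          return -sum(lm_log_probs)

      def finetune_loss(task_log_prob, lm_log_probs, aux_weight=0.5):
          # Stage 2: supervised task loss, optionally keeping the LM objective as an auxiliary term
          return -task_log_prob + aux_weight * pretrain_loss(lm_log_probs)

      print(finetune_loss(task_log_prob=-0.3, lm_log_probs=[-1.2, -0.8, -0.5]))  # 1.55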
  8. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I.: Attention Is all you need (2017) 0.01
    0.0077595054 = product of:
      0.031038022 = sum of:
        0.031038022 = weight(_text_:data in 970) [ClassicSimilarity], result of:
          0.031038022 = score(doc=970,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 970, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=970)
      0.25 = coord(1/4)
    
    Abstract
    The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
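    The attention mechanism the Transformer is built on is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V in the paper's notation; a minimal NumPy sketch:
      import numpy as np

      def scaled_dot_product_attention(Q, K, V):
          d_k = Q.shape[-1]
          scores = Q @ K.T / np.sqrt(d_k)                     # similarity of queries and keys
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
          return weights @ V                                  # weighted sum of the values

      Q = np.random.rand(3, 4)  # 3 query positions, d_k = 4
      K = np.random.rand(5, 4)  # 5 key/value positions
      V = np.random.rand(5, 4)
      print(scaled_dot_product_attention(Q, K, V).shape)      # (3, 4)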
  9. Shree, P.: ¬The journey of Open AI GPT models (2020) 0.01
    0.0063588563 = product of:
      0.025435425 = sum of:
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 869) [ClassicSimilarity], result of:
              0.05087085 = score(doc=869,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 869, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=869)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Generative Pre-trained Transformer (GPT) models by OpenAI have taken the natural language processing (NLP) community by storm by introducing very powerful language models. These models can perform various NLP tasks like question answering, textual entailment and text summarisation without any supervised training. They need very few to no examples to understand the tasks and perform equivalently to, or even better than, state-of-the-art models trained in a supervised fashion. In this article we will cover the journey of these models and understand how they have evolved over a period of two years. 1. Discussion of the GPT-1 paper (Improving Language Understanding by Generative Pre-Training). 2. Discussion of the GPT-2 paper (Language Models are Unsupervised Multitask Learners) and its subsequent improvements over GPT-1. 3. Discussion of the GPT-3 paper (Language Models are Few-Shot Learners) and the improvements which have made it one of the most powerful models NLP has seen to date. This article assumes familiarity with the basics of NLP terminology and the transformer architecture.
  10. Bager, J.: ¬Die Text-KI ChatGPT schreibt Fachtexte, Prosa, Gedichte und Programmcode (2023) 0.01
    0.006344468 = product of:
      0.025377871 = sum of:
        0.025377871 = product of:
          0.050755743 = sum of:
            0.050755743 = weight(_text_:22 in 835) [ClassicSimilarity], result of:
              0.050755743 = score(doc=835,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.30952093 = fieldWeight in 835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=835)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    29.12.2022 18:22:55
  11. Rieger, F.: Lügende Computer (2023) 0.01
    0.006344468 = product of:
      0.025377871 = sum of:
        0.025377871 = product of:
          0.050755743 = sum of:
            0.050755743 = weight(_text_:22 in 912) [ClassicSimilarity], result of:
              0.050755743 = score(doc=912,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.30952093 = fieldWeight in 912, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=912)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    16. 3.2023 19:22:55
  12. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: ¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.01
    0.0059951874 = product of:
      0.02398075 = sum of:
        0.02398075 = product of:
          0.0479615 = sum of:
            0.0479615 = weight(_text_:processing in 2804) [ClassicSimilarity], result of:
              0.0479615 = score(doc=2804,freq=4.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.2530092 = fieldWeight in 2804, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2804)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
  13. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.01
    0.005299047 = product of:
      0.021196188 = sum of:
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 2861) [ClassicSimilarity], result of:
              0.042392377 = score(doc=2861,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 2861, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2861)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Today's conventional search engines hardly provide content that is truly relevant to the user's search query, because the context and semantics of the user's request are not analyzed to the full extent. This is where semantic web search comes in: an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of the work presented here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as the knowledge base for the information retrieval process. It is not a mere keyword search; it works one layer above what Google or other search engines retrieve by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion, and the results are accurate enough to satisfy the user's request, since the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links; for the ranking, an algorithm has been applied which fetches more apt results for the user query.
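    The keyword expansion step, widening the query with terms related through the university ontology before retrieval and re-ranking, can be sketched as a simple lookup; the ontology entries and function name below are hypothetical, not the SIEU implementation:
      # Toy ontology fragment: each concept maps to related terms (illustrative entries only)
      ontology = {
          "professor": ["faculty", "lecturer", "academic staff"],
          "course": ["module", "class", "subject"],
      }

      def expand_query(query):
          terms = query.lower().split()
          expanded = list(terms)
          for term in terms:
              expanded.extend(ontology.get(term, []))
          return expanded

      print(expand_query("professor course list"))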
  14. Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache (2018) 0.00
    0.003172234 = product of:
      0.012688936 = sum of:
        0.012688936 = product of:
          0.025377871 = sum of:
            0.025377871 = weight(_text_:22 in 4217) [ClassicSimilarity], result of:
              0.025377871 = score(doc=4217,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.15476047 = fieldWeight in 4217, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4217)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 1.2018 11:32:44