Search (42 results, page 1 of 3)

Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.00

0.0026206372 = product of:
  0.02620637 = sum of:
    0.0052914224 = weight(_text_:in in 4888) [ClassicSimilarity], result of:
      0.0052914224 = score(doc=4888,freq=2.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.18034597 = fieldWeight in 4888, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.09375 = fieldNorm(doc=4888)
    0.0033805002 = weight(_text_:s in 4888) [ClassicSimilarity], result of:
      0.0033805002 = score(doc=4888,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.14414869 = fieldWeight in 4888, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.09375 = fieldNorm(doc=4888)
    0.017534448 = product of:
      0.035068896 = sum of:
        0.035068896 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
          0.035068896 = score(doc=4888,freq=2.0), product of:
            0.07553371 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.021569785 = queryNorm
            0.46428138 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
      0.5 = coord(1/2)
  0.1 = coord(3/30)

Date: 2013-03-01 14:56:22
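Every explain tree in this listing follows Lucene's ClassicSimilarity (TF-IDF) scoring. As a reading aid, here is a minimal sketch that recomputes the first entry's score. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The queryNorm and fieldNorm values are copied from the tree rather than recomputed, and the helper names are illustrative, not Lucene API.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    # tf = sqrt(termFreq); freq=2.0 gives 1.4142135 as in the tree
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # "queryWeight"
    field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight"
    return query_weight * field_weight                  # "score(doc=..., freq=...)"

def coord_sum(clause_scores, matched: int, total: int) -> float:
    # "sum of" the matching clauses, scaled by coord(matched/total)
    return sum(clause_scores) * matched / total

# Values copied from the explain tree of doc 4888 above.
MAX_DOCS, QUERY_NORM, FIELD_NORM = 44218, 0.021569785, 0.09375
w_in = term_score(2.0, 30841, MAX_DOCS, QUERY_NORM, FIELD_NORM)
w_s = term_score(2.0, 40523, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# "_text_:22" sits in a nested two-clause group where one clause matched: coord(1/2)
w_22 = coord_sum([term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)], 1, 2)
total = coord_sum([w_in, w_s, w_22], 3, 30)
print(f"{total:.7f}")  # about 0.0026206; Lucene uses 32-bit floats, so last digits may differ
```

The same four helpers reproduce every other tree in the listing once the per-document freq and fieldNorm values are substituted.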

Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N.: Generating Wikipedia by summarizing long sequences (2018) 0.00

8.766588E-4 = product of:
  0.013149882 = sum of:
    0.004365201 = weight(_text_:in in 773) [ClassicSimilarity], result of:
      0.004365201 = score(doc=773,freq=4.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.14877784 = fieldWeight in 773, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=773)
    0.008784681 = product of:
      0.026354041 = sum of:
        0.026354041 = weight(_text_:l in 773) [ClassicSimilarity], result of:
          0.026354041 = score(doc=773,freq=2.0), product of:
            0.0857324 = queryWeight, product of:
              3.9746525 = idf(docFreq=2257, maxDocs=44218)
              0.021569785 = queryNorm
            0.30739886 = fieldWeight in 773, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9746525 = idf(docFreq=2257, maxDocs=44218)
              0.0546875 = fieldNorm(doc=773)
      0.33333334 = coord(1/3)
  0.06666667 = coord(2/30)

Abstract: We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical encoder-decoder architectures used in sequence transduction. We show that this model can generate fluent, coherent multi-sentence paragraphs and even whole Wikipedia articles. When given reference documents, we show it can extract relevant factual information as reflected in perplexity, ROUGE scores and human evaluations.
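The extractive stage only needs to pick out the most salient source paragraphs before the abstractive model runs. A minimal sketch of such a ranker, assuming paragraphs are scored by tf-idf overlap with the article title. The abstract does not specify the ranking function, so treat this as one plausible choice rather than the paper's method.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_paragraphs(title: str, paragraphs: list[str], k: int) -> list[str]:
    """Return the k paragraphs whose tf-idf overlap with the title is highest."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    # document frequency: in how many paragraphs each term occurs
    df = Counter(term for d in docs for term in d)
    query = set(tokenize(title))

    def score(d: Counter) -> float:
        # smoothed idf stays positive even for terms present in every paragraph
        return sum(d[t] * (1.0 + math.log((n + 1) / (df[t] + 1))) for t in query if t in d)

    order = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    return [paragraphs[i] for i in order[:k]]

paragraphs = [
    "The cat sat on the mat.",
    "Transformers attend over long sequences.",
    "Decoder-only transformers scale to long inputs.",
]
print(rank_paragraphs("long transformers", paragraphs, k=2))
```

The selected paragraphs would then be concatenated as input to the abstractive model; the cut-off k trades coverage against input length.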

Zadeh, B.Q.; Handschuh, S.: The ACL RD-TEC : a dataset for benchmarking terminology extraction and classification in computational linguistics (2014) 0.00
```
5.537577E-4 = product of:
  0.008306365 = sum of:
    0.005915991 = weight(_text_:in in 2803) [ClassicSimilarity], result of:
      0.005915991 = score(doc=2803,freq=10.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.20163295 = fieldWeight in 2803, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=2803)
    0.002390375 = weight(_text_:s in 2803) [ClassicSimilarity], result of:
      0.002390375 = score(doc=2803,freq=4.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.101928525 = fieldWeight in 2803, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.046875 = fieldNorm(doc=2803)
  0.06666667 = coord(2/30)
```
Abstract

This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational Linguistics anthology reference corpus (ACL ARC). In its first release, the ACL RD-TEC consists of automatically segmented, part-of-speech-tagged ACL ARC documents, three lists of candidate terms, and more than 82,000 manually annotated terms. The annotated terms are marked as either valid or invalid, and valid terms are further classified as technology and non-technology terms. Technology terms signify methods, algorithms, and solutions in computational linguistics. The paper describes the dataset and reports the relevant statistics. We hope the step described in this paper encourages a collaborative effort towards building a full-fledged annotated corpus from the computational linguistics literature.

Pages

pp. 52-63

Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.00
```
5.305355E-4 = product of:
  0.007958031 = sum of:
    0.0068311975 = weight(_text_:in in 2804) [ClassicSimilarity], result of:
      0.0068311975 = score(doc=2804,freq=30.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.23282567 = fieldWeight in 2804, product of:
          5.477226 = tf(freq=30.0), with freq of:
            30.0 = termFreq=30.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=2804)
    0.0011268335 = weight(_text_:s in 2804) [ClassicSimilarity], result of:
      0.0011268335 = score(doc=2804,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.048049565 = fieldWeight in 2804, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.03125 = fieldNorm(doc=2804)
  0.06666667 = coord(2/30)
```
Abstract

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.

Content

See also: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realizations of significant concepts in a domain of expertise. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in the English language that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, comprises more than 69,000 candidate terms that are manually annotated as valid or invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. Non-technology terms, on the other hand, refer to important concepts other than technological ones; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset is created to serve as a gold standard for comparing term recognition and classification algorithms. [http://catalog.elra.info/product_info.php?products_id=1236].
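Because the dataset is meant as a gold standard for comparing term recognition and classification algorithms, a typical use is to score an extractor's output against the annotated valid terms. A minimal sketch, assuming exact string matching after lowercasing. The toy gold labels below reuse only the example terms quoted in the description and are not drawn from the actual dataset files.

```python
def normalize(term: str) -> str:
    return " ".join(term.lower().split())

def precision_recall_f1(extracted, gold):
    """Exact-match evaluation of extracted terms against gold valid terms."""
    pred = {normalize(t) for t in extracted}
    ref = {normalize(t) for t in gold}
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def label_accuracy(predicted: dict, gold: dict) -> float:
    """Share of correctly extracted terms that also get the right class label."""
    shared = [t for t in predicted if t in gold]
    if not shared:
        return 0.0
    return sum(predicted[t] == gold[t] for t in shared) / len(shared)

# Toy gold standard built from the example terms in the description above.
GOLD = {
    "machine translation": "technology",
    "word sense disambiguation": "technology",
    "language modelling": "technology",
    "multilingual lexicon": "non-technology",
    "corpora": "non-technology",
    "word sense": "non-technology",
    "language model": "non-technology",
}

predicted = {"machine translation": "technology", "language model": "technology"}
extracted = list(predicted) + ["parsing results"]
print(precision_recall_f1(extracted, GOLD))  # roughly (0.667, 0.286, 0.4)
print(label_accuracy(predicted, GOLD))       # 0.5
```

Extraction and classification are scored separately here, mirroring the two annotation layers (valid/invalid, then technology/non-technology).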

Aizawa, A.; Kohlhase, M.: Mathematical information retrieval (2021) 0.00

4.8788113E-4 = product of:
  0.0073182164 = sum of:
    0.0053462577 = weight(_text_:in in 667) [ClassicSimilarity], result of:
      0.0053462577 = score(doc=667,freq=6.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.1822149 = fieldWeight in 667, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=667)
    0.0019719584 = weight(_text_:s in 667) [ClassicSimilarity], result of:
      0.0019719584 = score(doc=667,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.08408674 = fieldWeight in 667, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0546875 = fieldNorm(doc=667)
  0.06666667 = coord(2/30)

Abstract: We present an overview of the NTCIR Math Tasks organized during NTCIR-10, 11, and 12. These tasks are primarily dedicated to techniques for searching mathematical content with formula expressions. In this chapter, we first summarize the task design and introduce test collections generated in the tasks. We also describe the features and main challenges of mathematical information retrieval systems and discuss future perspectives in the field.
Pages: pp. 169-185

Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.00

4.8283124E-4 = product of:
  0.007242468 = sum of:
    0.004988801 = weight(_text_:in in 1161) [ClassicSimilarity], result of:
      0.004988801 = score(doc=1161,freq=4.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.17003182 = fieldWeight in 1161, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0625 = fieldNorm(doc=1161)
    0.002253667 = weight(_text_:s in 1161) [ClassicSimilarity], result of:
      0.002253667 = score(doc=1161,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.09609913 = fieldWeight in 1161, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0625 = fieldNorm(doc=1161)
  0.06666667 = coord(2/30)

Abstract: We present a novel unsupervised approach to detecting the compositionality of multi-word expressions. We compute the compositionality of a phrase through substituting the constituent words with their "neighbours" in a semantic vector space and averaging over the distance between the original phrase and the substituted neighbour phrases. Several methods of obtaining neighbours are presented. The results are compared to existing supervised results and achieve state-of-the-art performance on a verb-object dataset of human compositionality ratings.
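The substitution procedure can be sketched in a few lines. This is a toy illustration, not the authors' implementation. Its assumptions: vectors are hand-made, a phrase vector exists for the phrase and for every substituted variant observed in the corpus, and neighbours are the k most cosine-similar single words. The score averages cosine similarity (the complement of a distance), so higher means more compositional.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def neighbours(word, vectors, k):
    """k nearest single-word neighbours of `word` by cosine similarity."""
    words = [w for w in vectors if w != word and " " not in w]
    words.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return words[:k]

def compositionality(phrase, vectors, k=1):
    """Mean similarity between a phrase and its neighbour-substituted variants."""
    parts = phrase.split()
    sims = []
    for i, word in enumerate(parts):
        for n in neighbours(word, vectors, k):
            variant = " ".join(parts[:i] + [n] + parts[i + 1:])
            if variant in vectors:  # skip variants never observed in the corpus
                sims.append(cosine(vectors[phrase], vectors[variant]))
    return sum(sims) / len(sims) if sims else 0.0

# Hand-made 5-d vectors (hypothetical); the last dimension stands for an idiomatic sense.
VECTORS = {
    "eat": [1, 0, 0, 0, 0], "consume": [0.9, 0.1, 0, 0, 0],
    "apple": [0, 1, 0, 0, 0], "pear": [0.1, 0.9, 0, 0, 0],
    "kick": [0, 0, 1, 0, 0], "punt": [0, 0, 0.9, 0.1, 0],
    "bucket": [0, 0, 0, 1, 0], "pail": [0, 0, 0.1, 0.9, 0],
    "eat apple": [0.5, 0.5, 0, 0, 0],
    "consume apple": [0.45, 0.55, 0, 0, 0],
    "eat pear": [0.55, 0.45, 0, 0, 0],
    "kick bucket": [0, 0, 0, 0, 1],
    "punt bucket": [0, 0, 0.5, 0.5, 0],
    "kick pail": [0, 0, 0.5, 0.5, 0],
}

print(compositionality("eat apple", VECTORS))    # close to 1: compositional
print(compositionality("kick bucket", VECTORS))  # 0.0: idiomatic
```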

Snajder, J.: Distributional semantics of multi-word expressions (2013) 0.00

4.8177355E-4 = product of:
  0.0072266026 = sum of:
    0.004409519 = weight(_text_:in in 2868) [ClassicSimilarity], result of:
      0.004409519 = score(doc=2868,freq=2.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.15028831 = fieldWeight in 2868, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.078125 = fieldNorm(doc=2868)
    0.0028170836 = weight(_text_:s in 2868) [ClassicSimilarity], result of:
      0.0028170836 = score(doc=2868,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.120123915 = fieldWeight in 2868, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.078125 = fieldNorm(doc=2868)
  0.06666667 = coord(2/30)

Content: Slides from a presentation given at the COST Action IC1207 PARSEME meeting, Warsaw, September 16, 2013. See also the paper: Snajder, J., P. Almic: Modeling semantic compositionality of Croatian multiword expressions. In: Informatica. 39(2015) no. 3, pp. 301-309.

Stoykova, V.; Petkova, E.: Automatic extraction of mathematical terms for precalculus (2012) 0.00

4.7693148E-4 = product of:
  0.007153972 = sum of:
    0.004365201 = weight(_text_:in in 156) [ClassicSimilarity], result of:
      0.004365201 = score(doc=156,freq=4.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.14877784 = fieldWeight in 156, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=156)
    0.0027887707 = weight(_text_:s in 156) [ClassicSimilarity], result of:
      0.0027887707 = score(doc=156,freq=4.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.118916616 = fieldWeight in 156, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0546875 = fieldNorm(doc=156)
  0.06666667 = coord(2/30)

Abstract: In this work, we present the results of research evaluating a methodology for extracting mathematical terms for precalculus using techniques for semantically oriented statistical search. We use a corpus-based approach and a combination of different statistically based techniques for extracting keywords, collocations and co-occurrences, as incorporated in the Sketch Engine software. We evaluate the collocation candidate terms for the basic concept function(s) and validate the methodology against definitions of precalculus domain conceptual terms. Finally, we offer a hierarchical representation of the conceptual terms and discuss the results with respect to their possible applications.
Source: Procedia Technology. 1(2012), pp. 464-468
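Sketch Engine scores collocations with association measures, and its word sketches rank collocates by logDice by default, defined as 14 + log2(2 * f(x,y) / (f(x) + f(y))). The abstract does not say which measure the authors used, so this is only a sketch of that one score, with invented counts for collocates of the node word "function".

```python
import math

def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2*f_xy / (f_x + f_y)); maximum 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def rank_collocates(node_freq: int, counts: dict) -> list:
    """Sort collocates by logDice; counts maps collocate -> (co-occurrence, collocate freq)."""
    return sorted(
        counts,
        key=lambda c: log_dice(counts[c][0], node_freq, counts[c][1]),
        reverse=True,
    )

# Hypothetical corpus counts for the node word "function" (frequency 200).
COUNTS = {"inverse": (30, 60), "the": (150, 5000), "domain": (20, 80)}
print(rank_collocates(200, COUNTS))  # ['inverse', 'domain', 'the']
```

Note how "the" co-occurs most often yet ranks last: logDice normalizes by the frequencies of both words, which suits term-candidate extraction.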

Galitsky, B.: Can many agents answer questions better than one? (2005) 0.00
```
4.181838E-4 = product of:
  0.0062727565 = sum of:
    0.0045825066 = weight(_text_:in in 3094) [ClassicSimilarity], result of:
      0.0045825066 = score(doc=3094,freq=6.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.1561842 = fieldWeight in 3094, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=3094)
    0.0016902501 = weight(_text_:s in 3094) [ClassicSimilarity], result of:
      0.0016902501 = score(doc=3094,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.072074346 = fieldWeight in 3094, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.046875 = fieldNorm(doc=3094)
  0.06666667 = coord(2/30)
```
Abstract

The paper addresses the issue of how online natural language question answering, based on deep semantic analysis, may compete with currently popular keyword search, open domain information retrieval systems, covering a horizontal domain. We suggest the multiagent question answering approach, where each domain is represented by an agent which tries to answer questions taking into account its specific knowledge. The meta-agent controls the cooperation between question answering agents and chooses the most relevant answer(s). We argue that multiagent question answering is optimal in terms of access to business and financial knowledge, flexibility in query phrasing, and efficiency and usability of advice. The knowledge and advice encoded in the system are initially prepared by domain experts. We analyze the commercial application of multiagent question answering and the robustness of the meta-agent. The paper suggests that a multiagent architecture is optimal when a real world question answering domain combines a number of vertical ones to form a horizontal domain.

Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.00
```
4.1722815E-4 = product of:
  0.006258422 = sum of:
    0.0038187557 = weight(_text_:in in 2861) [ClassicSimilarity], result of:
      0.0038187557 = score(doc=2861,freq=6.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.1301535 = fieldWeight in 2861, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2861)
    0.0024396663 = weight(_text_:s in 2861) [ClassicSimilarity], result of:
      0.0024396663 = score(doc=2861,freq=6.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.10403037 = fieldWeight in 2861, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2861)
  0.06666667 = coord(2/30)
```
Abstract

Today's conventional search engines rarely provide the essential content relevant to the user's search query, because the context and semantics of the user's request are not fully analyzed. This creates the need for semantic web search (SWS), an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of this work is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as a knowledge base for the information retrieval process. It is not a mere keyword search: it operates one layer above what Google or other search engines retrieve by analyzing keywords alone. Here the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion. The results obtained here will be accurate enough to satisfy the request made by the user. The level of accuracy will be enhanced since the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links. For ranking, an algorithm is applied that fetches more apt results for the user query.
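The keyword-expansion and re-ranking steps described above can be sketched as follows. The toy ontology, its concepts, and the overlap-count ranking are all hypothetical stand-ins; the abstract does not describe SIEU's actual ontology or ranking algorithm.

```python
import re

# Hypothetical university-domain ontology: concept -> related concepts
# (synonyms or subclasses) used for keyword expansion.
ONTOLOGY = {
    "professor": {"faculty", "lecturer"},
    "course": {"subject", "syllabus"},
    "hostel": {"dormitory", "accommodation"},
}

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def expand(query: str) -> set[str]:
    """Add the ontology neighbours of every query keyword."""
    terms = words(query)
    expanded = set(terms)
    for term in terms:
        expanded |= ONTOLOGY.get(term, set())
    return expanded

def rerank(query: str, snippets: list[str]) -> list[str]:
    """Order result snippets by how many expanded terms they contain."""
    terms = expand(query)
    return sorted(snippets, key=lambda s: len(terms & words(s)), reverse=True)

results = ["Campus map and parking", "Faculty directory: lecturer contacts"]
print(rerank("professor", results))
```

A plain keyword match on "professor" finds neither snippet; the expanded query lifts the faculty directory to the top.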

Nielsen, R.D.; Ward, W.; Martin, J.H.; Palmer, M.: Extracting a representation from text for semantic analysis (2008) 0.00

3.854188E-4 = product of:
  0.0057812817 = sum of:
    0.003527615 = weight(_text_:in in 3365) [ClassicSimilarity], result of:
      0.003527615 = score(doc=3365,freq=2.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.120230645 = fieldWeight in 3365, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0625 = fieldNorm(doc=3365)
    0.002253667 = weight(_text_:s in 3365) [ClassicSimilarity], result of:
      0.002253667 = score(doc=3365,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.09609913 = fieldWeight in 3365, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0625 = fieldNorm(doc=3365)
  0.06666667 = coord(2/30)

Abstract: We present a novel fine-grained semantic representation of text and an approach to constructing it. This representation is largely extractable by today's technologies and facilitates more detailed semantic analysis. We discuss the requirements driving the representation, suggest how it might be of value in the automated tutoring domain, and provide evidence of its validity.
Pages: pp. 241-244

Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D.: Language models are few-shot learners (2020) 0.00
```
3.854188E-4 = product of:
  0.0057812817 = sum of:
    0.003527615 = weight(_text_:in in 872) [ClassicSimilarity], result of:
      0.003527615 = score(doc=872,freq=8.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.120230645 = fieldWeight in 872, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=872)
    0.002253667 = weight(_text_:s in 872) [ClassicSimilarity], result of:
      0.002253667 = score(doc=872,freq=8.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.09609913 = fieldWeight in 872, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.03125 = fieldNorm(doc=872)
  0.06666667 = coord(2/30)
```
Abstract

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
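In the few-shot setting the task is specified entirely in the input text, with no gradient updates. A minimal sketch of assembling such a prompt follows. The "=>" layout and the function name are illustrative assumptions, and the call to the model itself is left out because no particular API is implied.

```python
def few_shot_prompt(instruction: str, demonstrations: list[tuple[str, str]], query: str) -> str:
    """Build a prompt: instruction, K solved examples, then the unsolved query.

    K = len(demonstrations): 0 gives zero-shot, 1 one-shot, more few-shot.
    """
    lines = [instruction, ""]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue after "=>"
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "bread",
)
print(prompt)
```

Changing the task only means changing the instruction and demonstrations; the model weights stay fixed.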

Shen, M.; Liu, D.-R.; Huang, Y.-S.: Extracting semantic relations to enrich domain ontologies (2012) 0.00

3.3724142E-4 = product of:
  0.005058621 = sum of:
    0.0030866629 = weight(_text_:in in 267) [ClassicSimilarity], result of:
      0.0030866629 = score(doc=267,freq=2.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.10520181 = fieldWeight in 267, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=267)
    0.0019719584 = weight(_text_:s in 267) [ClassicSimilarity], result of:
      0.0019719584 = score(doc=267,freq=2.0), product of:
        0.023451481 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.021569785 = queryNorm
        0.08408674 = fieldWeight in 267, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0546875 = fieldNorm(doc=267)
  0.06666667 = coord(2/30)

Abstract: Domain ontologies facilitate the organization, sharing and reuse of domain knowledge, and enable various vertical domain applications to operate successfully. Most methods for automatically constructing ontologies focus on taxonomic relations, such as is-kind-of and is-part-of relations. However, much of the domain-specific semantics is ignored. This work proposes a semi-unsupervised approach for extracting semantic relations from domain-specific text documents. The approach effectively utilizes text mining and existing taxonomic relations in domain ontologies to discover candidate keywords that can represent semantic relations. A preliminary experiment on the natural science domain (Taiwan K9 education) indicates that the proposed method yields valuable recommendations. This work enriches domain ontologies by adding distilled semantics.
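One way to read the candidate-keyword step: for two concepts already linked in the ontology, collect the words that appear between them in domain sentences and keep the most frequent ones as candidate relation labels. This is a hypothetical simplification (single-word concepts, a tiny stopword list), not the authors' algorithm.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def candidate_relation_keywords(sentences, concept_a, concept_b, top_n=3):
    """Most frequent non-stopwords found between two concepts in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        if concept_a in tokens and concept_b in tokens:
            i, j = sorted((tokens.index(concept_a), tokens.index(concept_b)))
            counts.update(t for t in tokens[i + 1:j] if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

sentences = [
    "Plants absorb sunlight.",
    "Green plants absorb and use sunlight.",
    "Sunlight feeds plants.",
    "Animals eat plants.",
]
print(candidate_relation_keywords(sentences, "plants", "sunlight", top_n=1))  # ['absorb']
```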

Hausser, R.: Language and nonlanguage cognition (2021) 0.00
```
2.4944008E-4 = product of:
  0.007483202 = sum of:
    0.007483202 = weight(_text_:in in 255) [ClassicSimilarity], result of:
      0.007483202 = score(doc=255,freq=16.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.25504774 = fieldWeight in 255, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=255)
  0.033333335 = coord(1/30)
```
Abstract

A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language-data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage-data as input. In either case, the output is a content which is stored in the agent's onboard short-term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of placeholder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.

Caseiro, D.: Automatic language identification bibliography : Last Update: 20 September 1999 (1999) 0.00

2.0577753E-4 = product of:
  0.0061733257 = sum of:
    0.0061733257 = weight(_text_:in in 1842) [ClassicSimilarity], result of:
      0.0061733257 = score(doc=1842,freq=2.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.21040362 = fieldWeight in 1842, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.109375 = fieldNorm(doc=1842)
  0.033333335 = coord(1/30)

Abstract: This bibliography lists research in Automatic Identification of Spoken Language.
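The fieldNorm factor in these explain trees takes only coarse values (0.109375 in the entry above, 0.09375, 0.0546875 or 0.03125 elsewhere) because it is stored in a single byte. The coord() factors suggest a pre-7.0 Lucene. In those versions ClassicSimilarity stores lengthNorm = 1/sqrt(number of terms) via SmallFloat.floatToByte315 and decodes it with byte315ToFloat. The Python port below is a sketch under that assumption, ignoring index-time boosts.

```python
import math
import struct

ZERO_EXP_OFFSET = (63 - 15) << 3  # 384, as in Lucene's floatToByte315

def float_bits(f: float) -> int:
    """IEEE-754 single-precision bit pattern as a signed 32-bit int (like Java)."""
    return struct.unpack(">i", struct.pack(">f", f))[0]

def encode_norm(f: float) -> int:
    """Port of floatToByte315; returns 0..255 instead of a signed Java byte."""
    bits = float_bits(f)
    small = bits >> 21  # keep sign, exponent and the top 2 explicit mantissa bits
    if small <= ZERO_EXP_OFFSET:
        return 0 if bits <= 0 else 1  # underflow: smallest positive code
    if small >= ZERO_EXP_OFFSET + 0x100:
        return 255  # overflow: Java returns (byte) -1
    return small - ZERO_EXP_OFFSET

def decode_norm(b: int) -> float:
    """Port of byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def stored_field_norm(num_terms: int) -> float:
    # lengthNorm = 1/sqrt(numTerms), followed by the lossy one-byte round trip
    return decode_norm(encode_norm(1.0 / math.sqrt(num_terms)))

for n in (80, 100, 1000):
    print(n, stored_field_norm(n))
```

Under these assumptions a stored 0.109375 corresponds to any field whose 1/sqrt(length) falls in [0.109375, 0.125), i.e. 65 to 83 terms, so the exact length of a document cannot be recovered from the explain output.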

Rindflesch, T.C.; Aronson, A.R.: Semantic processing in information retrieval (1993) 0.00
```
2.0577753E-4 = product of:
  0.0061733257 = sum of:
    0.0061733257 = weight(_text_:in in 4121) [ClassicSimilarity], result of:
      0.0061733257 = score(doc=4121,freq=8.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.21040362 = fieldWeight in 4121, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4121)
  0.033333335 = coord(1/30)
```
Abstract

Intuition suggests that one way to enhance the information retrieval process would be the use of phrases to characterize the contents of text. A number of researchers, however, have noted that phrases alone do not improve retrieval effectiveness. In this paper we briefly review the use of phrases in information retrieval and then suggest extensions to this paradigm using semantic information. We claim that semantic processing, which can be viewed as expressing relations between the concepts represented by phrases, will in fact enhance retrieval effectiveness. The availability of the UMLS® domain model, which we exploit extensively, significantly contributes to the feasibility of this processing.

Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.00
```
2.0577753E-4 = product of:
  0.0061733257 = sum of:
    0.0061733257 = weight(_text_:in in 4733) [ClassicSimilarity], result of:
      0.0061733257 = score(doc=4733,freq=8.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.21040362 = fieldWeight in 4733, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4733)
  0.033333335 = coord(1/30)
```
Abstract

Ontologies are often viewed as the answer to the need for inter-operable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web, coupled with the increasing demand for ontologies to power the Semantic Web, has made (semi-)automatic ontology learning from text a very promising research area. This, together with the advanced state of related areas such as natural language processing, has fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium, and discusses the remaining challenges that will define the research directions in this area in the near future.

Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.00
```
2.0577753E-4 = product of:
  0.0061733257 = sum of:
    0.0061733257 = weight(_text_:in in 1536) [ClassicSimilarity], result of:
      0.0061733257 = score(doc=1536,freq=32.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.21040362 = fieldWeight in 1536, product of:
          5.656854 = tf(freq=32.0), with freq of:
            32.0 = termFreq=32.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1536)
  0.033333335 = coord(1/30)
```
Abstract

Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword named entities may require a similar approach for their automatic detection, as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.
In this thesis, we focused on the automatic detection of multiword expressions in natural language texts. On the basis of the main contributions, we can argue that:

- Supervised machine learning methods can be successfully applied to the automatic detection of different types of multiword expressions in natural language texts.
- Machine learning-based multiword expression detection can be successfully carried out for English as well as for Hungarian.
- Our supervised machine learning-based model was successfully applied to the automatic detection of nominal compounds in English raw texts.
- We developed a Wikipedia-based dictionary labeling method to automatically detect English nominal compounds.
- Prior knowledge of nominal compounds can enhance Named Entity Recognition, while previously identified named entities can assist the nominal compound identification process.
- The machine learning-based method can also provide acceptable results when trained on an automatically generated silver standard corpus.
- As named entities form one semantic unit, may consist of more than one word and function as nouns, we can treat them in a similar way to nominal compounds.
- Our sequence labeling-based tool can be successfully applied to identifying verbal light verb constructions in two typologically different languages, namely English and Hungarian.
- Domain adaptation techniques may help reduce the distance between domains in the automatic detection of light verb constructions.
- Our syntax-based method can be successfully applied to the full-coverage identification of light verb constructions. As a first step, a data-driven candidate extraction method can be used. Afterwards, a machine learning approach that makes use of an extended, rich feature set selects LVCs among the extracted candidates.
- When a precise syntactic parser is available for the actual domain, full-coverage identification performs better. In other cases, the sequence labeling method is recommended.
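Dictionary labeling, one of the techniques named above, can be sketched as greedy longest-match lookup that emits BIO tags. This is a generic illustration, not the thesis's method: the toy dictionary is hypothetical and is not derived from Wikipedia, and `dictionary_label` is a made-up helper name.

```python
from typing import Iterable, List, Sequence


def dictionary_label(tokens: Sequence[str], mwe_entries: Iterable[str]) -> List[str]:
    """Tag tokens with BIO labels by greedy longest match against an MWE dictionary."""
    # Normalize entries to lower-cased token tuples; keep multiword entries only.
    entries = {tuple(e.lower().split()) for e in mwe_entries}
    entries = {e for e in entries if len(e) > 1}
    max_len = max((len(e) for e in entries), default=0)

    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in entries:
                match_len = n
                break
        if match_len:
            labels[i] = "B-MWE"
            for j in range(i + 1, i + match_len):
                labels[j] = "I-MWE"
            i += match_len  # matched spans never overlap
        else:
            i += 1
    return labels


toy_dictionary = ["rock 'n' roll", "make a decision", "machine translation"]
sentence = "They listened to rock 'n' roll and made a decision".split()
print(list(zip(sentence, dictionary_label(sentence, toy_dictionary))))
```

Here "rock 'n' roll" is tagged, but the inflected light verb construction "made a decision" stays `O` because exact lookup cannot generalize beyond listed forms. That limitation is one reason context-sensitive, learned approaches such as sequence labeling are attractive for this task.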

Collard, J.; Paiva, V. de; Fong, B.; Subrahmanian, E.: Extracting mathematical concepts from text (2022) 0.00
```
2.0577753E-4 = product of:
  0.0061733257 = sum of:
    0.0061733257 = weight(_text_:in in 668) [ClassicSimilarity], result of:
      0.0061733257 = score(doc=668,freq=8.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.21040362 = fieldWeight in 668, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=668)
  0.033333335 = coord(1/30)
```
Abstract

We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory as a first step for constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues with the construction and evaluation of terms extracted from noisy domain text. We also make available two open corpora in research mathematics, in particular in category theory: a small corpus of 755 abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).

Collins, C.: WordNet explorer : applying visualization principles to lexical semantics (2006) 0.00

2.0366698E-4 = product of:
  0.006110009 = sum of:
    0.006110009 = weight(_text_:in in 1288) [ClassicSimilarity], result of:
      0.006110009 = score(doc=1288,freq=6.0), product of:
        0.029340398 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.021569785 = queryNorm
        0.2082456 = fieldWeight in 1288, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0625 = fieldNorm(doc=1288)
  0.033333335 = coord(1/30)

Abstract: Interface designs for lexical databases in NLP have suffered from not following design principles developed in the information visualization research community. We present a design paradigm and show that it can be used to generate visualizations which maximize the usability and utility of WordNet. The techniques can be generally applied to other lexical databases used in NLP research.
