Search (41 results, page 1 of 3)

  • language_ss:"e"
  • theme_ss:"Computerlinguistik"
  • type_ss:"el"
  1. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.09
    0.09050408 = coord(3/7) × [0.16279 (methoden) + 0.01122 (in) + 0.03717 (22, i.e. 0.07434 × coord(1/2))]
    Per matching term: score = tf · idf² · queryNorm · fieldNorm; e.g. "methoden": √2 · 5.1822² · 0.04572 · 0.09375 ≈ 0.1628, with idf 5.1822 from docFreq 674 of maxDocs 44218.
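    These relevance values are Lucene ClassicSimilarity (TF-IDF) scores; restated in LaTeX from the quantities in the breakdown above (a reference rendering of the standard formula, not text from the source page):

      \[
      \mathrm{score}(q,d) = \mathrm{coord}(q,d) \sum_{t \in q} \underbrace{\mathrm{idf}(t)\,\mathrm{queryNorm}}_{\mathrm{queryWeight}} \cdot \underbrace{\sqrt{\mathrm{freq}(t,d)}\,\mathrm{idf}(t)\,\mathrm{fieldNorm}(d)}_{\mathrm{fieldWeight}}, \qquad \mathrm{idf}(t) = 1 + \ln\frac{N}{\mathrm{docFreq}(t)+1}
      \]

    With N = maxDocs = 44218: idf(methoden) = 1 + ln(44218/675) ≈ 5.182, and √2 · 5.182² · 0.0457 · 0.09375 ≈ 0.163, matching the first summand above.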
    
    Abstract
    With an overview of problems, methods, the state of research, and the literature.
    Date
    1.3.2013 14:56:22
  2. Snajder, J.: Distributional semantics of multi-word expressions (2013) 0.01
    Content
    Slides from a presentation at the COST Action IC1207 PARSEME meeting, Warsaw, September 16, 2013. See also: Snajder, J., P. Almic: Modeling semantic compositionality of Croatian multiword expressions. In: Informatica 39(2015) no.3, pp.301-309.
  3. Spitkovsky, V.I.; Chang, A.X.: A cross-lingual dictionary for English Wikipedia concepts (2012) 0.01
    Abstract
    We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal interoperability, we release our resource as a set of flat line-based text files, lexicographically sorted and encoded with UTF-8. These files capture joint probability distributions underlying concepts (we use the terms article, concept and Wikipedia URL interchangeably) and associated snippets of text, as well as other features that can come in handy when working with Wikipedia articles and related information.
    Content
    See also: Spitkovsky, V., P. Norvig: From words to concepts and back: dictionaries for linking text, entities and ideas. In: http://googleresearch.blogspot.de/2012/05/from-words-to-concepts-and-back.html. For the data pool see: nlp.stanford.edu/pubs/corsswikis-data.tar.bz2.
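    The abstract describes the release as flat, lexicographically sorted, UTF-8, line-based files pairing strings of text with concepts and counts. A minimal Python reader under that description; the tab-separated column order below is an assumption for illustration, not the documented layout:

      # Minimal sketch; assumes one "string<TAB>count<TAB>concept-URL" triple
      # per line, which is a guess at the layout, not the documented format.
      from collections import defaultdict

      def load_dictionary(path):
          counts = defaultdict(dict)
          with open(path, encoding="utf-8") as f:        # release is UTF-8
              for line in f:
                  text, count, url = line.rstrip("\n").split("\t")[:3]
                  counts[text][url] = counts[text].get(url, 0) + int(count)
          return counts

      def p_url_given_text(counts, text):
          # Empirical distribution over Wikipedia concepts for one string.
          total = sum(counts[text].values())
          return {url: c / total for url, c in counts[text].items()}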
  4. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.01
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas.
    How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories.
    The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. See also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
    Content
    For the data pool see: nlp.stanford.edu/pubs/corsswikis-data.tar.bz2.
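    Given such (text, url, count) triples, disambiguation reduces to comparing association weights. A toy rendering of the abstract's football example; the counts below are invented, since the abstract only states that the soccer reading is roughly twice as likely:

      # Invented counts mirroring the "football" example in the abstract.
      football = {
          "en.wikipedia.org/wiki/Association_football": 20000,
          "en.wikipedia.org/wiki/American_football": 11000,
      }

      def best_concept(candidates):
          # Pick the concept with the largest count and report its share.
          total = sum(candidates.values())
          url, count = max(candidates.items(), key=lambda kv: kv[1])
          return url, count / total

      url, share = best_concept(football)
      print(url, round(share, 2))   # .../Association_football 0.65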
  5. Hausser, R.: Language and nonlanguage cognition (2021) 0.00
    Abstract
    A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language-data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage-data as input. In either case, the output is a content which is stored in the agent's onboard short term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of placeholder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.
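    As a hedged illustration of the type/token distinction (a toy sketch, not the DBS machinery): a concept type can be pictured as a pattern with placeholder slots, and recognition as binding the placeholders to values measured in the raw data, which yields a concept token. The slots and the "?" placeholder convention here are invented for the example:

      # Toy sketch of recognition = applying a concept type to raw data.
      CONCEPT_TYPE = {"shape": "square", "edges": 4, "side_cm": "?"}   # "?" marks a placeholder

      def recognize(concept_type, raw_data):
          token = {}
          for slot, value in concept_type.items():
              if value == "?":                      # placeholder: bind the measured value
                  token[slot] = raw_data[slot]
              elif raw_data.get(slot) != value:     # constant slot must match exactly
                  return None
              else:
                  token[slot] = value
          return token

      print(recognize(CONCEPT_TYPE, {"shape": "square", "edges": 4, "side_cm": 2.5}))
      # -> {'shape': 'square', 'edges': 4, 'side_cm': 2.5}  (a concept token)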
  6. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.00
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
    Content
    See also: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realization of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in English language that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, is comprised of more than 69,000 candidate terms that are manually annotated as valid and invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. On the other hand, non-technology terms refer to important concepts other than technological; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset is created to serve as a gold standard for the comparison of the algorithms of term recognition and classification. [http://catalog.elra.info/product_info.php?products_id=1236].
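    Work with such an annotated term list usually starts by filtering on the two annotation layers (valid/invalid, technology/non-technology). A short sketch under an assumed tab-separated export; the distribution's actual file format is not specified here:

      # Assumed layout (hypothetical): term<TAB>valid|invalid<TAB>tech|non-tech
      def technology_terms(path):
          keep = []
          with open(path, encoding="utf-8") as f:
              for line in f:
                  term, validity, category = line.rstrip("\n").split("\t")
                  if validity == "valid" and category == "tech":
                      keep.append(term)          # e.g. "machine translation"
          return keep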
  7. Caseiro, D.: Automatic language identification bibliography : Last Update: 20 September 1999 (1999) 0.00
    Abstract
    This bibliography lists research in Automatic Identification of Spoken Language.
  8. Rindflesch, T.C.; Aronson, A.R.: Semantic processing in information retrieval (1993) 0.00
    Abstract
    Intuition suggests that one way to enhance the information retrieval process would be the use of phrases to characterize the contents of text. A number of researchers, however, have noted that phrases alone do not improve retrieval effectiveness. In this paper we briefly review the use of phrases in information retrieval and then suggest extensions to this paradigm using semantic information. We claim that semantic processing, which can be viewed as expressing relations between the concepts represented by phrases, will in fact enhance retrieval effectiveness. The availability of the UMLS® domain model, which we exploit extensively, significantly contributes to the feasibility of this processing.
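    The proposal (phrases plus relations between the concepts they denote) can be pictured with a toy index; everything below is an illustrative sketch, not the authors' UMLS-based system. Documents are indexed both by phrase and by (concept, relation, concept) triples, and a relation-aware query separates documents that plain phrase matching cannot:

      # Toy illustration: phrase match alone cannot separate doc 1 from doc 2,
      # the added relation constraint can.
      docs = {
          1: {"phrases": {"aspirin", "headache"},
              "relations": {("aspirin", "treats", "headache")}},
          2: {"phrases": {"aspirin", "headache"},
              "relations": {("aspirin", "causes", "headache")}},
      }

      def search(phrases, relation=None):
          hits = []
          for doc_id, d in docs.items():
              if phrases <= d["phrases"] and (relation is None or relation in d["relations"]):
                  hits.append(doc_id)
          return hits

      print(search({"aspirin", "headache"}))                                     # [1, 2]
      print(search({"aspirin", "headache"}, ("aspirin", "treats", "headache")))  # [1]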
  9. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.00
    Abstract
    Ontologies are often viewed as the answer to the need for inter-operable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web coupled with the increasing demand for ontologies to power the Semantic Web have made (semi-)automatic ontology learning from text a very promising research area. This together with the advanced state in related areas such as natural language processing have fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium, and discusses the remaining challenges that will define the research directions in this area in the near future.
  10. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.00
    Abstract
    Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword named entities may require a similar approach for their automatic detection as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.
    In this thesis, we focused on the automatic detection of multiword expressions in natural language texts. On the basis of the main contributions, we can argue that:
    - Supervised machine learning methods can be successfully applied for the automatic detection of different types of multiword expressions in natural language texts.
    - Machine learning-based multiword expression detection can be successfully carried out for English as well as for Hungarian.
    - Our supervised machine learning-based model was successfully applied to the automatic detection of nominal compounds from English raw texts.
    - We developed a Wikipedia-based dictionary labeling method to automatically detect English nominal compounds.
    - Prior knowledge of nominal compounds can enhance Named Entity Recognition, while previously identified named entities can assist the nominal compound identification process.
    - The machine learning-based method can also provide acceptable results when it was trained on an automatically generated silver standard corpus.
    - As named entities form one semantic unit and may consist of more than one word and function as a noun, we can treat them in a similar way to nominal compounds.
    - Our sequence labelling-based tool can be successfully applied for identifying verbal light verb constructions in two typologically different languages, namely English and Hungarian.
    - Domain adaptation techniques may help diminish the distance between domains in the automatic detection of light verb constructions.
    - Our syntax-based method can be successfully applied for the full-coverage identification of light verb constructions. As a first step, a data-driven candidate extraction method can be utilized. After that, a machine learning approach that makes use of an extended and rich feature set selects LVCs among the extracted candidates.
    - When a precise syntactic parser is available for the actual domain, the full-coverage identification can be performed better. In other cases, the usage of the sequence labeling method is recommended.
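    As a hedged sketch of the general recipe summarized above (candidate extraction followed by feature-based supervised classification), with toy features and toy training data standing in for the thesis's far richer feature set:

      # Candidate extraction + supervised classification for light verb
      # constructions; features, data and the verb list are illustrative only.
      from sklearn.feature_extraction import DictVectorizer
      from sklearn.linear_model import LogisticRegression

      LIGHT_VERBS = {"make", "take", "have", "give"}

      def candidates(tagged):                     # tagged: [(word, POS), ...]
          for (w1, p1), (w2, p2) in zip(tagged, tagged[1:]):
              if p1 == "VB" and p2 == "NN":       # verb directly followed by noun
                  yield {"verb": w1, "noun": w2, "light_verb": w1 in LIGHT_VERBS}

      train = [({"verb": "make", "noun": "decision", "light_verb": True}, 1),
               ({"verb": "take", "noun": "photo",    "light_verb": True}, 1),
               ({"verb": "eat",  "noun": "apple",    "light_verb": False}, 0),
               ({"verb": "read", "noun": "book",     "light_verb": False}, 0)]

      vec = DictVectorizer()
      X = vec.fit_transform([features for features, _ in train])
      clf = LogisticRegression().fit(X, [label for _, label in train])

      test = list(candidates([("take", "VB"), ("photo", "NN")]))
      print(clf.predict(vec.transform(test)))     # [1] -> flagged as an LVC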
  11. Collard, J.; Paiva, V. de; Fong, B.; Subrahmanian, E.: Extracting mathematical concepts from text (2022) 0.00
    Abstract
    We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory as a first step for constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues with the construction and evaluation of terms extracted from noisy domain text. We also make available two open corpora in research mathematics, in particular in category theory: a small corpus of 755 abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).
  12. Collins, C.: WordNet explorer : applying visualization principles to lexical semantics (2006) 0.00
    Abstract
    Interface designs for lexical databases in NLP have suffered from not following design principles developed in the information visualization research community. We present a design paradigm and show it can be used to generate visualizations which maximize the usability and utility of WordNet. The techniques can be generally applied to other lexical databases used in NLP research.
  13. Dunning, T.: Statistical identification of language (1994) 0.00
    Abstract
    A statistically based program has been written which learns to distinguish between languages. The amount of training text that such a program needs is surprisingly small, and the amount of text needed to make an identification is also quite small. The program incorporates no linguistic presuppositions other than the assumption that text can be encoded as a string of bytes. Such a program can be used to determine which language small bits of text are in. It also shows a potential for what might be called 'statistical philology' in that it may be applied directly to phonetic transcriptions to help elucidate family trees among language dialects. A variant of this program has been shown to be useful as a quality control in biochemistry. In this application, genetic sequences are assumed to be expressions in a language peculiar to the organism from which the sequence is taken. Thus language identification becomes species identification.
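    The idea can be sketched in a few lines: train per-language byte-bigram models and score unseen text by summed log-probabilities. This is a minimal rendering under assumed details (add-one smoothing, tiny invented training strings); Dunning's actual system uses proper estimation and real training corpora:

      # Minimal byte-bigram language identifier in the spirit of the paper.
      import math
      from collections import Counter

      def train(text):
          data = text.encode("utf-8")               # text as a string of bytes
          return Counter(zip(data, data[1:]))

      def log_prob(model, text):
          data = text.encode("utf-8")
          total = sum(model.values())
          # Add-one smoothing over the 256x256 byte-bigram space (an assumption).
          return sum(math.log((model[pair] + 1) / (total + 256 * 256))
                     for pair in zip(data, data[1:]))

      models = {"en": train("the quick brown fox jumps over the lazy dog and then some"),
                "de": train("der schnelle braune fuchs springt über den faulen hund")}

      sample = "the dog jumps"
      print(max(models, key=lambda lang: log_prob(models[lang], sample)))   # en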
  14. Zadeh, B.Q.; Handschuh, S.: ¬The ACL RD-TEC : a dataset for benchmarking terminology extraction and classification in computational linguistics (2014) 0.00
    Abstract
    This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational Linguistics anthology reference corpus (ACL ARC). In its first release, the ACL RD-TEC consists of automatically segmented, part-of-speech-tagged ACL ARC documents, three lists of candidate terms, and more than 82,000 manually annotated terms. The annotated terms are marked as either valid or invalid, and valid terms are further classified as technology and non-technology terms. Technology terms signify methods, algorithms, and solutions in computational linguistics. The paper describes the dataset and reports the relevant statistics. We hope the step described in this paper encourages a collaborative effort towards building a full-fledged annotated corpus from the computational linguistics literature.
  15. Aydin, Ö.; Karaarslan, E.: OpenAI ChatGPT generated literature review : digital twin in healthcare (2022) 0.00
    Abstract
    Literature review articles are essential to summarize the related work in the selected field. However, covering all related studies takes too much time and effort. This study questions how Artificial Intelligence can be used in this process. We used ChatGPT to create a literature review article to show the stage of the OpenAI ChatGPT artificial intelligence application. As the subject, the applications of Digital Twin in the health field were chosen. Abstracts of the last three years (2020, 2021 and 2022) papers were obtained from the keyword "Digital twin in healthcare" search results on Google Scholar and paraphrased by ChatGPT. Later on, we asked ChatGPT questions. The results are promising; however, the paraphrased parts had significant matches when checked with the Ithenticate tool. This article is the first attempt to show that the compilation and expression of knowledge will be accelerated with the help of artificial intelligence. We are still at the beginning of such advances. The future academic publishing process will require less human effort, which in turn will allow academics to focus on their studies. In future studies, we will monitor citations to this study to evaluate the academic validity of the content produced by ChatGPT.
    1. Introduction
    OpenAI ChatGPT (ChatGPT, 2022) is a chatbot based on the OpenAI GPT-3 language model. It is designed to generate human-like text responses to user input in a conversational context. OpenAI ChatGPT is trained on a large dataset of human conversations and can be used to create responses to a wide range of topics and prompts. The chatbot can be used for customer service, content creation, and language translation tasks, creating replies in multiple languages. OpenAI ChatGPT is available through the OpenAI API, which allows developers to access and integrate the chatbot into their applications and systems. OpenAI ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model developed by OpenAI. It is designed to generate human-like text, allowing it to engage in conversation with users naturally and intuitively. OpenAI ChatGPT is trained on a large dataset of human conversations, allowing it to understand and respond to a wide range of topics and contexts. It can be used in various applications, such as chatbots, customer service agents, and language translation systems. OpenAI ChatGPT is a state-of-the-art language model able to generate coherent and natural text that can be indistinguishable from text written by a human. As an artificial intelligence, ChatGPT may need help to change academic writing practices. However, it can provide information and guidance on ways to improve people's academic writing skills.
  16. Aizawa, A.; Kohlhase, M.: Mathematical information retrieval (2021) 0.00
    Abstract
    We present an overview of the NTCIR Math Tasks organized during NTCIR-10, 11, and 12. These tasks are primarily dedicated to techniques for searching mathematical content with formula expressions. In this chapter, we first summarize the task design and introduce test collections generated in the tasks. We also describe the features and main challenges of mathematical information retrieval systems and discuss future perspectives in the field.
  17. Perovsek, M.; Kranjca, J.; Erjaveca, T.; Cestnika, B.; Lavraca, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.00
    Abstract
    Text mining and natural language processing are fast growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
    Source
    Science of computer programming. In Press, 2016
  18. Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.00
    Abstract
    We present a novel unsupervised approach to detecting the compositionality of multi-word expressions. We compute the compositionality of a phrase through substituting the constituent words with their "neighbours" in a semantic vector space and averaging over the distance between the original phrase and the substituted neighbour phrases. Several methods of obtaining neighbours are presented. The results are compared to existing supervised results and achieve state-of-the-art performance on a verb-object dataset of human compositionality ratings.
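    A minimal numpy rendering of the idea: replace each constituent with a distributional neighbour, re-compose the phrase vector, and average the similarity between the observed phrase vector and the substituted variants. The vectors and neighbour lists below are invented stand-ins for a real distributional model, and composition by vector addition is an assumption, not the paper's only setting:

      # Toy neighbour-substitution compositionality score.
      import numpy as np

      vec = {"red": np.array([1.0, 0.2]), "crimson": np.array([0.9, 0.3]),
             "car": np.array([0.1, 1.0]), "vehicle": np.array([0.2, 0.9]),
             "red car": np.array([1.0, 1.1])}          # observed phrase vector
      neighbours = {"red": ["crimson"], "car": ["vehicle"]}

      def cosine(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      def compositionality(w1, w2):
          # Substitute each constituent by a neighbour, re-compose by addition,
          # and average similarity to the observed phrase vector.
          phrase = vec[f"{w1} {w2}"]
          variants = [vec[n] + vec[w2] for n in neighbours[w1]] + \
                     [vec[w1] + vec[n] for n in neighbours[w2]]
          return sum(cosine(phrase, v) for v in variants) / len(variants)

      print(round(compositionality("red", "car"), 3))   # ~0.996: high, i.e. compositional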
  19. ChatGPT : Optimizing language models for dialogue (2022) 0.00
    Abstract
    We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.
  20. Galitsky, B.: Can many agents answer questions better than one? (2005) 0.00
    Abstract
    The paper addresses the issue of how online natural language question answering, based on deep semantic analysis, may compete with currently popular keyword search, open domain information retrieval systems, covering a horizontal domain. We suggest the multiagent question answering approach, where each domain is represented by an agent which tries to answer questions taking into account its specific knowledge. The meta-agent controls the cooperation between question answering agents and chooses the most relevant answer(s). We argue that multiagent question answering is optimal in terms of access to business and financial knowledge, flexibility in query phrasing, and efficiency and usability of advice. The knowledge and advice encoded in the system are initially prepared by domain experts. We analyze the commercial application of multiagent question answering and the robustness of the meta-agent. The paper suggests that a multiagent architecture is optimal when a real world question answering domain combines a number of vertical ones to form a horizontal domain.
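    The described architecture reduces to a simple control loop, sketched below with hypothetical domain agents and a max-confidence selection rule; the paper's meta-agent performs domain-specific relevance reasoning, not this toy rule:

      # Toy multiagent QA loop; agents, answers and confidences are invented.
      def tax_agent(q):
          return ("File by April 15.", 0.9) if "tax" in q else None

      def loan_agent(q):
          return ("Compare APRs first.", 0.7) if "loan" in q else None

      def meta_agent(question, agents):
          # Collect candidate answers; the real meta-agent reasons about
          # relevance, here we simply take the highest-confidence answer.
          answers = [ans for agent in agents for ans in [agent(question)] if ans]
          if not answers:
              return "No domain agent can answer."
          return max(answers, key=lambda pair: pair[1])[0]

      print(meta_agent("When are taxes due?", [tax_agent, loan_agent]))  # File by April 15.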

Types

  • a 23
  • p 5
  • b 1
  • x 1