Search (103 results, page 1 of 6)

  • theme_ss:"Computerlinguistik"
  1. Godby, J.: WordSmith research project bridges gap between tokens and indexes (1998) 0.12
    0.11779368 = product of:
      0.17669052 = sum of:
        0.06476502 = weight(_text_:reference in 4729) [ClassicSimilarity], result of:
          0.06476502 = score(doc=4729,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.31464687 = fieldWeight in 4729, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4729)
        0.111925505 = sum of:
          0.06394224 = weight(_text_:database in 4729) [ClassicSimilarity], result of:
            0.06394224 = score(doc=4729,freq=2.0), product of:
              0.20452234 = queryWeight, product of:
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.050593734 = queryNorm
              0.31264183 = fieldWeight in 4729, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4729)
          0.047983266 = weight(_text_:22 in 4729) [ClassicSimilarity], result of:
            0.047983266 = score(doc=4729,freq=2.0), product of:
              0.17717063 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050593734 = queryNorm
              0.2708308 = fieldWeight in 4729, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4729)
      0.6666667 = coord(2/3)
    
    Abstract
    Reports on an OCLC natural language processing research project to develop methods for identifying terminology in unstructured electronic text, especially material associated with new cultural trends and emerging subjects. Current OCLC production software can only identify single words as indexable terms in full-text documents; thus a major goal of the WordSmith project is to develop software that can automatically identify and intelligently organize phrases for use in database indexes. By analyzing user terminology from local newspapers in the USA, the latest cultural trends and technical developments, as well as personal and geographic names, have been drawn out. Notes that this new vocabulary can also be mapped into reference works
    Source
    OCLC newsletter. 1998, no.234, Jul/Aug, S.22-24
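    The indented tree under each hit is the search engine's "explain" output for Lucene's ClassicSimilarity (TF-IDF) ranking: each weight(_text_:term) leaf multiplies a queryWeight (idf * queryNorm) by a fieldWeight (tf * idf * fieldNorm), the clause scores are summed, and the sum is scaled by a coordination factor for the fraction of query clauses that matched. A minimal Python sketch that reproduces result 1's displayed score from the factors shown above (the helper names are illustrative, not Lucene API):

        import math

        def idf(doc_freq, max_docs):
            # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
            return 1.0 + math.log(max_docs / (doc_freq + 1.0))

        def leaf_weight(freq, idf_value, query_norm, field_norm):
            # One weight(_text_:term) leaf: queryWeight * fieldWeight,
            # i.e. (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm).
            return (idf_value * query_norm) * (math.sqrt(freq) * idf_value * field_norm)

        QUERY_NORM = 0.050593734
        FIELD_NORM = 0.0546875  # fieldNorm(doc=4729)

        reference = leaf_weight(2.0, idf(2055, 44218), QUERY_NORM, FIELD_NORM)  # ~0.06476502
        database = leaf_weight(2.0, idf(2109, 44218), QUERY_NORM, FIELD_NORM)   # ~0.06394224
        term_22 = leaf_weight(2.0, idf(3622, 44218), QUERY_NORM, FIELD_NORM)    # ~0.047983266

        # Two of the three query clauses matched, hence coord(2/3).
        print(f"{(reference + database + term_22) * (2.0 / 3.0):.8f}")  # ~0.11779368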
  2. ¬The language engineering directory (1993) 0.09
    0.09212967 = product of:
      0.1381945 = sum of:
        0.09252147 = weight(_text_:reference in 8408) [ClassicSimilarity], result of:
          0.09252147 = score(doc=8408,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.44949555 = fieldWeight in 8408, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.078125 = fieldNorm(doc=8408)
        0.045673028 = product of:
          0.091346055 = sum of:
            0.091346055 = weight(_text_:database in 8408) [ClassicSimilarity], result of:
              0.091346055 = score(doc=8408,freq=2.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.4466312 = fieldWeight in 8408, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.078125 = fieldNorm(doc=8408)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This is a reference guide to language technology organizations and products around the world. Areas covered in the directory include: Artificial intelligence, Document storage and retrieval, Electronic dictionaries (mono- and multilingual), Expert language systems, Multilingual word processors, Natural language database interfaces, Term databanks, Terminology management, Text content analysis, Thesauri
  3. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.06728035 = product of:
      0.10092052 = sum of:
        0.08035626 = product of:
          0.24106878 = sum of:
            0.24106878 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.24106878 = score(doc=562,freq=2.0), product of:
                0.42893425 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050593734 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.020564256 = product of:
          0.041128512 = sum of:
            0.041128512 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.041128512 = score(doc=562,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    See: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  4. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.05
    0.046064835 = product of:
      0.06909725 = sum of:
        0.046260733 = weight(_text_:reference in 6029) [ClassicSimilarity], result of:
          0.046260733 = score(doc=6029,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.22474778 = fieldWeight in 6029, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6029)
        0.022836514 = product of:
          0.045673028 = sum of:
            0.045673028 = weight(_text_:database in 6029) [ClassicSimilarity], result of:
              0.045673028 = score(doc=6029,freq=2.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.2233156 = fieldWeight in 6029, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications
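    A hypothetical sketch of the kind of frame structure the abstract describes, holding an extracted event with absolute or relative temporal anchors; the field names and example values are invented for illustration, not the authors' design:

        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class TemporalConceptFrame:
            # One extracted event with its temporal concept type and anchors.
            verb: str
            concept_type: str                    # e.g. "absolute" or "relative"
            action_time: Optional[str] = None    # e.g. "1997-10-23"
            reference_time: Optional[str] = None
            relation: Optional[str] = None       # e.g. "before", "after", "during"

        frames = [
            TemporalConceptFrame("rise", "absolute", action_time="1997-10-23"),
            TemporalConceptFrame("rebound", "relative",
                                 reference_time="1997-10-23", relation="after"),
        ]
        # Linking frames in chronological order, as the abstract describes,
        # can then support question-answering over the extracted events.
        frames.sort(key=lambda f: f.action_time or f.reference_time)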
  5. Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.04
    0.037308503 = product of:
      0.111925505 = sum of:
        0.111925505 = sum of:
          0.06394224 = weight(_text_:database in 1361) [ClassicSimilarity], result of:
            0.06394224 = score(doc=1361,freq=2.0), product of:
              0.20452234 = queryWeight, product of:
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.050593734 = queryNorm
              0.31264183 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1361)
          0.047983266 = weight(_text_:22 in 1361) [ClassicSimilarity], result of:
            0.047983266 = score(doc=1361,freq=2.0), product of:
              0.17717063 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050593734 = queryNorm
              0.2708308 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1361)
      0.33333334 = coord(1/3)
    
    Abstract
    THESYS is based on the natural language processing of free-text databases. It yields statistically evaluated correlations between words of the database. These correlations correspond to traditional thesaurus relations. The person who has to build a thesaurus is thus assisted by the proposals made by THESYS. THESYS is being tested on commercial databases under real-world conditions. It is part of a text processing project at Siemens called TINA (Text-Inhalts-Analyse, i.e. text content analysis). Software from TINA is currently being applied and evaluated by the US Department of Commerce for patent search and indexing (REALIST: REtrieval Aids by Linguistics and STatistics)
    Date
    6. 1.1999 10:22:07
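    The abstract does not spell out which statistics THESYS uses, but the general idea it describes, scoring word co-occurrences and proposing strong correlations as thesaurus-relation candidates, can be sketched with an association measure such as the Dice coefficient; the measure and threshold below are assumptions:

        from collections import Counter
        from itertools import combinations

        def relation_candidates(documents, threshold=0.5):
            # Score word pairs by the Dice coefficient 2*cooc(a,b)/(freq(a)+freq(b));
            # pairs above the threshold become proposals for the thesaurus builder.
            freq, cooc = Counter(), Counter()
            for doc in documents:
                words = set(doc.lower().split())
                freq.update(words)
                cooc.update(combinations(sorted(words), 2))
            for (a, b), n in cooc.items():
                dice = 2 * n / (freq[a] + freq[b])
                if dice >= threshold:
                    yield a, b, dice

        docs = ["thesaurus relations between terms",
                "automatic thesaurus building from free text",
                "relations between terms in free text"]
        for a, b, d in sorted(relation_candidates(docs), key=lambda t: -t[2]):
            print(f"{a} -- {b}: {d:.2f}")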
  6. L'Homme, M.-C.: Processing word combinations in existing terms banks (1995) 0.04
    0.037008584 = product of:
      0.11102575 = sum of:
        0.11102575 = weight(_text_:reference in 2949) [ClassicSimilarity], result of:
          0.11102575 = score(doc=2949,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.5393946 = fieldWeight in 2949, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.09375 = fieldNorm(doc=2949)
      0.33333334 = coord(1/3)
    
    Abstract
    How can specific word combinations be stored in computerized reference tools? The focus of this paper is on lexical word groups in special languages and their representation for translation purposes
  7. Allen, E.E.: Searching, naturally (1998) 0.04
    0.037008584 = product of:
      0.11102575 = sum of:
        0.11102575 = weight(_text_:reference in 2602) [ClassicSimilarity], result of:
          0.11102575 = score(doc=2602,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.5393946 = fieldWeight in 2602, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.09375 = fieldNorm(doc=2602)
      0.33333334 = coord(1/3)
    
    Source
    Internet reference services quarterly. 3(1998) no.2, S.75-81
  8. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: ¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.03
    0.03489203 = product of:
      0.10467609 = sum of:
        0.10467609 = weight(_text_:reference in 2804) [ClassicSimilarity], result of:
          0.10467609 = score(doc=2804,freq=16.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.5085462 = fieldWeight in 2804, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
      0.33333334 = coord(1/3)
    
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
    Content
    See also: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realizations of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in English that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, comprises more than 69,000 candidate terms that are manually annotated as valid and invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. On the other hand, non-technology terms refer to important concepts other than technological ones; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset is created to serve as a gold standard for the comparison of term recognition and classification algorithms. [http://catalog.elra.info/product_info.php?products_id=1236].
    Object
    ACL Anthology Reference Corpus
  9. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.03
    0.031381153 = product of:
      0.09414345 = sum of:
        0.09414345 = sum of:
          0.045673028 = weight(_text_:database in 2541) [ClassicSimilarity], result of:
            0.045673028 = score(doc=2541,freq=2.0), product of:
              0.20452234 = queryWeight, product of:
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.050593734 = queryNorm
              0.2233156 = fieldWeight in 2541, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.04847042 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.04847042 = score(doc=2541,freq=4.0), product of:
              0.17717063 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050593734 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.33333334 = coord(1/3)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
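    As a toy approximation of the spelling-suggestion behaviour the abstract describes (difflib's string similarity stands in for the ChemSpell component, and the vocabulary is invented):

        import difflib

        # Invented stand-in for the AZdict vocabulary list.
        VOCABULARY = ["toxicology", "benzene", "formaldehyde", "arsenic", "chlorine"]

        def suggest(query, n=3, cutoff=0.6):
            # Rank vocabulary entries by string similarity to a possibly
            # misspelled query term and return the closest matches.
            return difflib.get_close_matches(query.lower(), VOCABULARY, n=n, cutoff=cutoff)

        print(suggest("formaldahyde"))  # ['formaldehyde']
        print(suggest("tocsicology"))   # ['toxicology']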
  10. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.03
    0.026785422 = product of:
      0.08035626 = sum of:
        0.08035626 = product of:
          0.24106878 = sum of:
            0.24106878 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24106878 = score(doc=862,freq=2.0), product of:
                0.42893425 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050593734 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    https://arxiv.org/abs/2212.06721
  11. Czejdo, B.D.; Tucci, R.P.: ¬A dataflow graphical language for database applications (1994) 0.03
    0.026369337 = product of:
      0.07910801 = sum of:
        0.07910801 = product of:
          0.15821601 = sum of:
            0.15821601 = weight(_text_:database in 559) [ClassicSimilarity], result of:
              0.15821601 = score(doc=559,freq=6.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.77358794 = fieldWeight in 559, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.078125 = fieldNorm(doc=559)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Discusses a graphical language for information retrieval and processing. A lot of recent activity has occurred in the area of improving access to database systems. However, current results are restricted to simple interfacing of database systems. Proposes a graphical language for specifying complex applications
  12. Zadeh, B.Q.; Handschuh, S.: ¬The ACL RD-TEC : a dataset for benchmarking terminology extraction and classification in computational linguistics (2014) 0.03
    0.026169024 = product of:
      0.07850707 = sum of:
        0.07850707 = weight(_text_:reference in 2803) [ClassicSimilarity], result of:
          0.07850707 = score(doc=2803,freq=4.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.38140965 = fieldWeight in 2803, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.046875 = fieldNorm(doc=2803)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational Linguistics anthology reference corpus (ACL ARC). In its first release, the ACL RD-TEC consists of automatically segmented, part-of-speech-tagged ACL ARC documents, three lists of candidate terms, and more than 82,000 manually annotated terms. The annotated terms are marked as either valid or invalid, and valid terms are further classified as technology and non-technology terms. Technology terms signify methods, algorithms, and solutions in computational linguistics. The paper describes the dataset and reports the relevant statistics. We hope the step described in this paper encourages a collaborative effort towards building a full-fledged annotated corpus from the computational linguistics literature.
    Object
    ACL Anthology Reference Corpus
  13. Belbachir, F.; Boughanem, M.: Using language models to improve opinion detection (2018) 0.02
    0.024672393 = product of:
      0.074017175 = sum of:
        0.074017175 = weight(_text_:reference in 5044) [ClassicSimilarity], result of:
          0.074017175 = score(doc=5044,freq=8.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.35959643 = fieldWeight in 5044, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.03125 = fieldNorm(doc=5044)
      0.33333334 = coord(1/3)
    
    Abstract
    Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which itself focuses on detecting opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analysis document and reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and Internet Movie Data Bases (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collection. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different combinations of opinion and retrieval scores. We then use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
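    A minimal sketch of one of the smoothed language models the abstract mentions, Jelinek-Mercer, used here to score a document against an opinionated reference collection; the smoothing weight, the add-one background handling, and the toy texts are assumptions, not the paper's settings:

        import math
        from collections import Counter

        def unigram_model(texts):
            # Maximum-likelihood unigram counts for a collection.
            counts = Counter(w for t in texts for w in t.lower().split())
            return counts, sum(counts.values())

        def opinion_score(document, opinion_texts, background_texts, lam=0.7):
            # Jelinek-Mercer: P(w) = lam*P_ml(w|opinion) + (1-lam)*P(w|background).
            # A higher log-likelihood suggests the document resembles the
            # opinionated reference collection.
            op, op_total = unigram_model(opinion_texts)
            bg, bg_total = unigram_model(background_texts)
            score = 0.0
            for w in document.lower().split():
                # Add-one smoothing on the background avoids log(0) for unseen words.
                p = lam * op[w] / op_total + (1 - lam) * (bg[w] + 1) / (bg_total + len(bg))
                score += math.log(p)
            return score

        opinions = ["i really love this movie", "terrible plot i hate it"]
        background = ["the film was released in 2004", "the plot concerns a studio"]
        print(opinion_score("i love the plot", opinions, background))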
  14. Owei, V.; Higa, K.: ¬A paradigm for natural language explanation of database queries : a semantic data model approach (1994) 0.02
    0.024358949 = product of:
      0.073076844 = sum of:
        0.073076844 = product of:
          0.14615369 = sum of:
            0.14615369 = weight(_text_:database in 8189) [ClassicSimilarity], result of:
              0.14615369 = score(doc=8189,freq=8.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.7146099 = fieldWeight in 8189, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0625 = fieldNorm(doc=8189)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    An interface that provides the user with automatic feedback in the form of an explanation of how the database management system interprets user-specified queries. Proposes an approach that exploits the rich semantics of graphical semantic data models to construct a restricted natural language explanation of database queries that are specified in a very high-level declarative form. These interpretations of the specified query represent the system's 'understanding' of the query, and are returned to the user for validation
    Source
    Journal of database management. 5(1994) no.1, S.18-30
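    A toy sketch of the kind of restricted-English feedback the abstract describes; the query representation below is invented for illustration and is not the authors' semantic data model:

        def explain_query(entity, conditions, attributes):
            # Render a declarative selection as restricted English so the
            # user can validate the system's interpretation of the query.
            preds = " and ".join(f"whose {a} is {op} {v!r}" for a, op, v in conditions)
            return f"Retrieve the {', '.join(attributes)} of every {entity} {preds}."

        print(explain_query(
            "employee",
            [("salary", "greater than", 50000), ("department", "equal to", "R&D")],
            ["name", "hire date"],
        ))
        # Retrieve the name, hire date of every employee whose salary is
        # greater than 50000 and whose department is equal to 'R&D'.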
  15. Mustafa el Hadi, W.; Jouis, C.: Natural language processing-based systems for terminological construction and their contribution to information retrieval (1996) 0.02
    0.02158834 = product of:
      0.06476502 = sum of:
        0.06476502 = weight(_text_:reference in 6331) [ClassicSimilarity], result of:
          0.06476502 = score(doc=6331,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.31464687 = fieldWeight in 6331, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6331)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper will survey the capacity of natural language processing (NLP) systems to identify terms or concept names related to a specific field of knowledge (construction of a reference terminology) and the logico-semantic relations they entertain. The scope of our study will be limited to French language NLP systems whose purpose is automatic term identification, with textual area-grounded terms providing access keys to information
  16. Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N.: Generating Wikipedia by summarizing long sequences (2018) 0.02
    0.02158834 = product of:
      0.06476502 = sum of:
        0.06476502 = weight(_text_:reference in 773) [ClassicSimilarity], result of:
          0.06476502 = score(doc=773,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.31464687 = fieldWeight in 773, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0546875 = fieldNorm(doc=773)
      0.33333334 = coord(1/3)
    
    Abstract
    We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical encoder-decoder architectures used in sequence transduction. We show that this model can generate fluent, coherent multi-sentence paragraphs and even whole Wikipedia articles. When given reference documents, we show it can extract relevant factual information as reflected in perplexity, ROUGE scores and human evaluations.
  17. Göpferich, S.: Von der Terminographie zur Textographie : computergestützte Verwaltung textsortenspezifischer Textversatzstücke (1995) 0.02
    0.02109547 = product of:
      0.06328641 = sum of:
        0.06328641 = product of:
          0.12657282 = sum of:
            0.12657282 = weight(_text_:database in 4567) [ClassicSimilarity], result of:
              0.12657282 = score(doc=4567,freq=6.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.6188704 = fieldWeight in 4567, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4567)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The paper presents 2 different types of computer-based retrieval systems for text-type specific information ranging from phrases to whole standardized passages. The first part describes the structure of a full-text database for text prototypes, the second part, ways of storing text-type specific phrases and passages in a combined terminological and textographic database. The program used to illustrate this second kind of retrieval system is the terminology system CATS, which the Terminology Centre at the Faculty of Applied Linguistics and Cultural Studies of the University of Mainz in Germersheim uses for its FASTERM database
  18. Engerer, V.: Exploring interdisciplinary relationships between linguistics and information retrieval from the 1960s to today (2017) 0.02
    0.018504292 = product of:
      0.055512875 = sum of:
        0.055512875 = weight(_text_:reference in 3434) [ClassicSimilarity], result of:
          0.055512875 = score(doc=3434,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.2696973 = fieldWeight in 3434, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.046875 = fieldNorm(doc=3434)
      0.33333334 = coord(1/3)
    
    Abstract
    This article explores how linguistics has influenced information retrieval (IR) and attempts to explain the impact of linguistics through an analysis of internal developments in information science generally, and IR in particular. It notes that information science/IR has been evolving from a case science into a fully fledged, "disciplined"/disciplinary science. The article establishes correspondences between linguistics and information science/IR using the three established IR paradigms-physical, cognitive, and computational-as a frame of reference. The current relationship between information science/IR and linguistics is elucidated through discussion of some recent information science publications dealing with linguistic topics and a novel technique, "keyword collocation analysis," is introduced. Insights from interdisciplinarity research and case theory are also discussed. It is demonstrated that the three stages of interdisciplinarity, namely multidisciplinarity, interdisciplinarity (in the narrow sense), and transdisciplinarity, can be linked to different phases of the information science/IR-linguistics relationship and connected to different ways of using linguistic theory in information science and IR.
  19. Warner, A.J.: Natural language processing (1987) 0.02
    0.01827934 = product of:
      0.05483802 = sum of:
        0.05483802 = product of:
          0.10967604 = sum of:
            0.10967604 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.10967604 = score(doc=337,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  20. Priß, U.: ¬The formalization of WordNet by methods of relational concept analysis (1998) 0.02
    0.018269213 = product of:
      0.054807637 = sum of:
        0.054807637 = product of:
          0.109615274 = sum of:
            0.109615274 = weight(_text_:database in 3079) [ClassicSimilarity], result of:
              0.109615274 = score(doc=3079,freq=2.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.53595746 = fieldWeight in 3079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3079)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    WordNet: an electronic lexical database (language, speech and communication). Ed.: C. Fellbaum

Languages

  • e 81
  • d 20
  • ru 2

Types

  • a 82
  • el 11
  • m 7
  • s 5
  • x 4
  • p 3
  • d 1
  • r 1