Search (75 results, page 1 of 4)

  • theme_ss:"Computerlinguistik"
  1. Bredack, J.: Automatische Extraktion fachterminologischer Mehrwortbegriffe : ein Verfahrensvergleich (2016) 0.08
    0.07832358 = product of:
      0.15664716 = sum of:
        0.11337131 = weight(_text_:master in 3194) [ClassicSimilarity], result of:
          0.11337131 = score(doc=3194,freq=2.0), product of:
            0.3116585 = queryWeight, product of:
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.047329273 = queryNorm
            0.36376774 = fieldWeight in 3194, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3194)
        0.04327585 = weight(_text_:reference in 3194) [ClassicSimilarity], result of:
          0.04327585 = score(doc=3194,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.22474778 = fieldWeight in 3194, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3194)
      0.5 = coord(2/4)
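     The indented figures above are Lucene ClassicSimilarity explain output. As a minimal sketch (not part of the retrieval engine itself), the displayed 0.08 can be recomputed from the listed factors, where tf = sqrt(freq), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and coord(2/4) = 0.5:

```python
# Recompute the ClassicSimilarity explanation for result 1 (doc 3194) from the
# factors shown above: score = coord * sum(queryWeight * fieldWeight), with
# queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm.
from math import sqrt

QUERY_NORM = 0.047329273
COORD = 2 / 4  # 2 of 4 query terms matched

def term_score(freq: float, idf: float, field_norm: float) -> float:
    tf = sqrt(freq)                       # 1.4142135 for freq=2.0
    query_weight = idf * QUERY_NORM       # e.g. 0.3116585 for "master"
    field_weight = tf * idf * field_norm  # e.g. 0.36376774 for "master"
    return query_weight * field_weight

score = COORD * (
    term_score(2.0, 6.5848994, 0.0390625)    # _text_:master    -> 0.11337131
    + term_score(2.0, 4.0683694, 0.0390625)  # _text_:reference -> 0.04327585
)
print(round(score, 8))  # ~0.07832358, displayed as 0.08
```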
    
    Abstract
     In this study, two systems were used to automatically extract multi-word terms (MWT) from a document collection with a specialized-language focus (full texts of the ACL Anthology Reference Corpus). The thematic scope covered all areas of natural language processing, in particular computational linguistics (CL) as an interdisciplinary science. The goal was to extract MWT that can serve as potential index terms in information retrieval (IR). These terms were intended to point to, or name, concepts, methods, procedures and algorithms in CL and adjacent subfields such as linguistics and computer science.
    Content
     Master's thesis submitted for the degree of Master of Arts at the Universität Trier, Fachbereich II, Studiengang Computerlinguistik.
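     The thesis above compares two existing extraction systems. Purely as a generic illustration (not the systems evaluated in the thesis), multi-word term candidate extraction by adjacency frequency might be sketched as follows; the stopword list and frequency threshold are assumptions:

```python
# Generic sketch of multi-word term (MWT) candidate extraction: count adjacent
# word pairs, drop pairs containing stopwords, and keep frequent candidates.
# Thresholds and the stopword list are illustrative only.
from collections import Counter
import re

STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "is", "on"}

def mwt_candidates(text: str, min_freq: int = 2) -> list[tuple[str, int]]:
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    bigrams = Counter(
        (w1, w2)
        for w1, w2 in zip(tokens, tokens[1:])
        if w1 not in STOPWORDS and w2 not in STOPWORDS
    )
    return [(" ".join(pair), n) for pair, n in bigrams.most_common() if n >= min_freq]

sample = ("word sense disambiguation and machine translation are core tasks; "
          "word sense disambiguation relies on machine translation corpora")
print(mwt_candidates(sample))
# e.g. [('word sense', 2), ('sense disambiguation', 2), ('machine translation', 2)]
```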
  2. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.08
    0.07764148 = product of:
      0.15528296 = sum of:
        0.13604558 = weight(_text_:master in 563) [ClassicSimilarity], result of:
          0.13604558 = score(doc=563,freq=2.0), product of:
            0.3116585 = queryWeight, product of:
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.047329273 = queryNorm
            0.4365213 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.019237388 = product of:
          0.038474776 = sum of:
            0.038474776 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.038474776 = score(doc=563,freq=2.0), product of:
                0.16573904 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047329273 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Content
     A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  3. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05
    0.047204413 = product of:
      0.094408825 = sum of:
        0.07517143 = product of:
          0.2255143 = sum of:
            0.2255143 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.2255143 = score(doc=562,freq=2.0), product of:
                0.4012581 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.047329273 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.019237388 = product of:
          0.038474776 = sum of:
            0.038474776 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.038474776 = score(doc=562,freq=2.0), product of:
                0.16573904 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047329273 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Content
     Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  4. Addison, E.R.; Wilson, H.D.; Feder, J.: ¬The impact of plain English searching on end users (1993) 0.05
    0.045348525 = product of:
      0.1813941 = sum of:
        0.1813941 = weight(_text_:master in 5354) [ClassicSimilarity], result of:
          0.1813941 = score(doc=5354,freq=2.0), product of:
            0.3116585 = queryWeight, product of:
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.047329273 = queryNorm
            0.5820284 = fieldWeight in 5354, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.0625 = fieldNorm(doc=5354)
      0.25 = coord(1/4)
    
    Abstract
    Commercial software products are available with plain English searching capabilities as engines for online and CD-ROM information services, and for internal text information management. With plain English interfaces, end users do not need to master the keyword and connector approach of the Boolean search query language. Describes plain English searching and its impact on the process of full text retrieval. Explores the issues of ease of use, reliability and implications for the total research process
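     As a hedged illustration of the translation step that plain-English interfaces spare the user (not the commercial engines reviewed above), a plain-English question might be reduced to a Boolean keyword query like this; the stopword list and the AND strategy are assumptions:

```python
# Toy sketch: reduce a plain-English question to a Boolean keyword query,
# the kind of translation a plain-English search front end spares the user.
import re

STOPWORDS = {"what", "is", "the", "of", "on", "a", "an", "do", "does",
             "how", "in", "for", "and", "to"}

def to_boolean(plain: str, operator: str = "AND") -> str:
    terms = [t for t in re.findall(r"[a-z]+", plain.lower()) if t not in STOPWORDS]
    return f" {operator} ".join(terms)

print(to_boolean("What is the impact of plain English searching on end users?"))
# impact AND plain AND english AND searching AND end AND users
```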
  5. Godby, J.: WordSmith research project bridges gap between tokens and indexes (1998) 0.04
    0.041514903 = product of:
      0.08302981 = sum of:
        0.060586188 = weight(_text_:reference in 4729) [ClassicSimilarity], result of:
          0.060586188 = score(doc=4729,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.31464687 = fieldWeight in 4729, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4729)
        0.02244362 = product of:
          0.04488724 = sum of:
            0.04488724 = weight(_text_:22 in 4729) [ClassicSimilarity], result of:
              0.04488724 = score(doc=4729,freq=2.0), product of:
                0.16573904 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047329273 = queryNorm
                0.2708308 = fieldWeight in 4729, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4729)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
     Reports on an OCLC natural language processing research project to develop methods for identifying terminology in unstructured electronic text, especially material associated with new cultural trends and emerging subjects. Current OCLC production software can only identify single words as indexable terms in full-text documents; thus a major goal of the WordSmith project is to develop software that can automatically identify and intelligently organize phrases for use in database indexes. By analyzing user terminology from local newspapers in the USA, the latest cultural trends and technical developments as well as personal and geographic names have been drawn out. Notes that this new vocabulary can also be mapped into reference works
    Source
    OCLC newsletter. 1998, no.234, Jul/Aug, S.22-24
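     The phrase-identification goal of the WordSmith project described in the abstract above can be illustrated with a generic sketch (unrelated to the OCLC implementation): pull multi-word candidates out of running text and count them as potential index phrases. The capitalized-run heuristic is an assumption:

```python
# Generic sketch (not the WordSmith software): pull multi-word candidates
# out of running text as potential index phrases. The heuristic here is
# simply "runs of capitalized words", which catches many names and titles.
from collections import Counter
import re

def phrase_candidates(text: str) -> Counter:
    runs = re.findall(r"(?:[A-Z][a-zA-Z]+\s+)+[A-Z][a-zA-Z]+", text)
    return Counter(run.strip() for run in runs)

news = ("Fans of the Seattle Grunge Revival followed the Seattle Grunge Revival "
        "tour through New Orleans.")
print(phrase_candidates(news))
# Counter({'Seattle Grunge Revival': 2, 'New Orleans': 1})
```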
  6. Scherer Auberson, K.: Counteracting concept drift in natural language classifiers : proposal for an automated method (2018) 0.03
    0.034011394 = product of:
      0.13604558 = sum of:
        0.13604558 = weight(_text_:master in 2849) [ClassicSimilarity], result of:
          0.13604558 = score(doc=2849,freq=2.0), product of:
            0.3116585 = queryWeight, product of:
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.047329273 = queryNorm
            0.4365213 = fieldWeight in 2849, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.046875 = fieldNorm(doc=2849)
      0.25 = coord(1/4)
    
    Content
     This publication originated as a thesis for the Master of Science FHO in Business Administration, Major Information and Data Management.
  7. Renker, L.: Exploration von Textkorpora : Topic Models als Grundlage der Interaktion (2015) 0.03
    0.028342828 = product of:
      0.11337131 = sum of:
        0.11337131 = weight(_text_:master in 2380) [ClassicSimilarity], result of:
          0.11337131 = score(doc=2380,freq=2.0), product of:
            0.3116585 = queryWeight, product of:
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.047329273 = queryNorm
            0.36376774 = fieldWeight in 2380, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2380)
      0.25 = coord(1/4)
    
    Footnote
     Master's thesis submitted for the academic degree of Master of Science (M.Sc.) at the Fachhochschule Köln, Fakultät für Informatik und Ingenieurwissenschaften, Studiengang Medieninformatik.
  8. L'Homme, M.-C.: Processing word combinations in existing terms banks (1995) 0.03
    0.025965508 = product of:
      0.10386203 = sum of:
        0.10386203 = weight(_text_:reference in 2949) [ClassicSimilarity], result of:
          0.10386203 = score(doc=2949,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.5393946 = fieldWeight in 2949, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.09375 = fieldNorm(doc=2949)
      0.25 = coord(1/4)
    
    Abstract
     How can specific word combinations be stored in computerized reference tools? The focus of this paper is on lexical word groups in special languages and their representation for translation purposes
  9. Allen, E.E.: Searching, naturally (1998) 0.03
    0.025965508 = product of:
      0.10386203 = sum of:
        0.10386203 = weight(_text_:reference in 2602) [ClassicSimilarity], result of:
          0.10386203 = score(doc=2602,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.5393946 = fieldWeight in 2602, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.09375 = fieldNorm(doc=2602)
      0.25 = coord(1/4)
    
    Source
    Internet reference services quarterly. 3(1998) no.2, S.75-81
  10. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: ¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.02
    0.024480518 = product of:
      0.09792207 = sum of:
        0.09792207 = weight(_text_:reference in 2804) [ClassicSimilarity], result of:
          0.09792207 = score(doc=2804,freq=16.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.5085462 = fieldWeight in 2804, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
      0.25 = coord(1/4)
    
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
    Content
     Cf. also: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realizations of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in the English language that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, comprises more than 69,000 candidate terms that are manually annotated as valid and invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. On the other hand, non-technology terms refer to important concepts other than technological; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset was created to serve as a gold standard for the comparison of algorithms for term recognition and classification. [http://catalog.elra.info/product_info.php?products_id=1236].
    Object
    ACL Anthology Reference Corpus
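     Since the ACL RD-TEC described in the content note above is intended as a gold standard for comparing term-recognition algorithms, evaluating an extractor against such an annotated term list might be sketched as follows; the in-memory sets stand in for the real annotation files, whose layout is not reproduced here:

```python
# Sketch of scoring an extractor against a gold-standard term list such as
# the ACL RD-TEC described above. The sets below stand in for the actual
# annotation files; their contents and format are purely illustrative.
def precision_recall(extracted: set[str], gold_valid: set[str]) -> tuple[float, float]:
    true_pos = len(extracted & gold_valid)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold_valid) if gold_valid else 0.0
    return precision, recall

gold_valid = {"machine translation", "word sense disambiguation", "language model"}
extracted = {"machine translation", "language model", "anthology reference"}

p, r = precision_recall(extracted, gold_valid)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```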
  11. Menge-Sonnentag, R.: Google veröffentlicht einen Parser für natürliche Sprache (2016) 0.02
    0.022674263 = product of:
      0.09069705 = sum of:
        0.09069705 = weight(_text_:master in 2941) [ClassicSimilarity], result of:
          0.09069705 = score(doc=2941,freq=2.0), product of:
            0.3116585 = queryWeight, product of:
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.047329273 = queryNorm
            0.2910142 = fieldWeight in 2941, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5848994 = idf(docFreq=165, maxDocs=44218)
              0.03125 = fieldNorm(doc=2941)
      0.25 = coord(1/4)
    
    Footnote
     Download at: https://github.com/tensorflow/models/tree/master/syntaxnet. Further information on the model and comparative figures on its recognition rate can also be found there.
  12. ¬The language engineering directory (1993) 0.02
    0.021637926 = product of:
      0.0865517 = sum of:
        0.0865517 = weight(_text_:reference in 8408) [ClassicSimilarity], result of:
          0.0865517 = score(doc=8408,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.44949555 = fieldWeight in 8408, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.078125 = fieldNorm(doc=8408)
      0.25 = coord(1/4)
    
    Abstract
    This is a reference guide to language technology organizations and products around the world. Areas covered in the directory include: Artificial intelligence, Document storage and retrieval, Electronic dictionaries (mono- and multilingual), Expert language systems, Multilingual word processors, Natural language database interfaces, Term databanks, Terminology management, Text content analysis, Thesauri
  13. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.02
    0.018792858 = product of:
      0.07517143 = sum of:
        0.07517143 = product of:
          0.2255143 = sum of:
            0.2255143 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.2255143 = score(doc=862,freq=2.0), product of:
                0.4012581 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.047329273 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
     https://arxiv.org/abs/2212.06721
  14. Wenzel, F.: Semantische Eingrenzung im Freitext-Retrieval auf der Basis morphologischer Segmentierungen (1980) 0.02
    0.018779518 = product of:
      0.07511807 = sum of:
        0.07511807 = product of:
          0.15023614 = sum of:
            0.15023614 = weight(_text_:file in 2037) [ClassicSimilarity], result of:
              0.15023614 = score(doc=2037,freq=2.0), product of:
                0.25368783 = queryWeight, product of:
                  5.3600616 = idf(docFreq=564, maxDocs=44218)
                  0.047329273 = queryNorm
                0.59220874 = fieldWeight in 2037, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.3600616 = idf(docFreq=564, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2037)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The basic problem in freetext retrieval is that the retrieval language is not properly adapted to that of the author. Morphological segmentation, where words with the same root are grouped together in the inverted file, is a good eliminator of noise and information loss, providing high recall but low precision
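     A minimal sketch of the grouping idea in the abstract above, where word forms sharing a root are collected under one entry of the inverted file, assuming a toy suffix-stripping stemmer in place of real morphological segmentation:

```python
# Toy inverted file in which word forms sharing a root are grouped under one
# key, as in the morphological-segmentation approach described above. The
# suffix-stripping "stemmer" is only a stand-in for real segmentation.
from collections import defaultdict
import re

SUFFIXES = ("ings", "ing", "ers", "er", "ed", "es", "s")

def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_inverted_file(docs: dict[int, str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[stem(token)].add(doc_id)
    return index

docs = {1: "retrieval of indexed documents", 2: "indexing and retrieving documents"}
index = build_inverted_file(docs)
print(sorted(index["index"]))  # [1, 2] -- "indexed" and "indexing" share one entry
```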
  15. Zadeh, B.Q.; Handschuh, S.: ¬The ACL RD-TEC : a dataset for benchmarking terminology extraction and classification in computational linguistics (2014) 0.02
    0.01836039 = product of:
      0.07344156 = sum of:
        0.07344156 = weight(_text_:reference in 2803) [ClassicSimilarity], result of:
          0.07344156 = score(doc=2803,freq=4.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.38140965 = fieldWeight in 2803, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.046875 = fieldNorm(doc=2803)
      0.25 = coord(1/4)
    
    Abstract
    This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational Linguistics anthology reference corpus (ACL ARC). In its first release, the ACL RD-TEC consists of automatically segmented, part-of-speech-tagged ACL ARC documents, three lists of candidate terms, and more than 82,000 manually annotated terms. The annotated terms are marked as either valid or invalid, and valid terms are further classified as technology and non-technology terms. Technology terms signify methods, algorithms, and solutions in computational linguistics. The paper describes the dataset and reports the relevant statistics. We hope the step described in this paper encourages a collaborative effort towards building a full-fledged annotated corpus from the computational linguistics literature.
    Object
    ACL Anthology Reference Corpus
  16. Belbachir, F.; Boughanem, M.: Using language models to improve opinion detection (2018) 0.02
    0.01731034 = product of:
      0.06924136 = sum of:
        0.06924136 = weight(_text_:reference in 5044) [ClassicSimilarity], result of:
          0.06924136 = score(doc=5044,freq=8.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.35959643 = fieldWeight in 5044, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.03125 = fieldNorm(doc=5044)
      0.25 = coord(1/4)
    
    Abstract
     Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which itself focuses on detecting opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analysis document and reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and the Internet Movie Database (IMDb), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collection. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different combinations of opinion and retrieval scores. We then use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
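     A hedged sketch of the language-model comparison described above, scoring a document against an opinionated reference collection with Jelinek-Mercer smoothing; the lambda value, tokenization and toy collections are assumptions, not the authors' configuration:

```python
# Sketch: score a document against an opinionated reference collection with a
# Jelinek-Mercer-smoothed unigram language model, in the spirit of the approach
# above. Lambda, the toy collections and the tokenizer are illustrative only.
from collections import Counter
from math import log
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def jm_log_likelihood(doc: str, reference: str, background: str, lam: float = 0.7) -> float:
    ref_counts, bg_counts = Counter(tokenize(reference)), Counter(tokenize(background))
    ref_total, bg_total = sum(ref_counts.values()), sum(bg_counts.values())
    score = 0.0
    for token in tokenize(doc):
        p_ref = ref_counts[token] / ref_total
        p_bg = (bg_counts[token] + 1) / (bg_total + len(bg_counts))  # add-one floor
        score += log(lam * p_ref + (1 - lam) * p_bg)
    return score

opinionated_ref = "i loved this film it was wonderful terrible pacing but a great cast"
background = "the film was released in cinemas and reviewed by the press"
doc_a = "a wonderful film with a great cast"   # opinion-bearing
doc_b = "the film was released by the press"   # factual

print(jm_log_likelihood(doc_a, opinionated_ref, background) >
      jm_log_likelihood(doc_b, opinionated_ref, background))  # True on this toy data
```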
  17. Mustafa el Hadi, W.; Jouis, C.: Natural language processing-based systems for terminological construction and their contribution to information retrieval (1996) 0.02
    0.015146547 = product of:
      0.060586188 = sum of:
        0.060586188 = weight(_text_:reference in 6331) [ClassicSimilarity], result of:
          0.060586188 = score(doc=6331,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.31464687 = fieldWeight in 6331, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6331)
      0.25 = coord(1/4)
    
    Abstract
     This paper will survey the capacity of natural language processing (NLP) systems to identify terms or concept names related to a specific field of knowledge (construction of a reference terminology) and the logico-semantic relations they entertain. The scope of our study will be limited to French-language NLP systems whose purpose is automatic term identification, with textual area-grounded terms providing access keys to information
  18. Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N.: Generating Wikipedia by summarizing long sequences (2018) 0.02
    0.015146547 = product of:
      0.060586188 = sum of:
        0.060586188 = weight(_text_:reference in 773) [ClassicSimilarity], result of:
          0.060586188 = score(doc=773,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.31464687 = fieldWeight in 773, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0546875 = fieldNorm(doc=773)
      0.25 = coord(1/4)
    
    Abstract
     We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical encoder-decoder architectures used in sequence transduction. We show that this model can generate fluent, coherent multi-sentence paragraphs and even whole Wikipedia articles. When given reference documents, we show it can extract relevant factual information as reflected in perplexity, ROUGE scores and human evaluations.
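     The extractive first stage described above, which coarsely selects salient source passages before abstractive generation, could be roughly sketched by ranking sentences by word overlap with the article topic; the overlap heuristic is an assumption, not the paper's exact method:

```python
# Rough sketch of an extractive first stage: rank source sentences by overlap
# with the target topic and keep the top ones as input for an abstractive
# model. The overlap heuristic is illustrative, not the paper's exact method.
import re

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def extract_salient(sentences: list[str], topic: str, k: int = 2) -> list[str]:
    topic_words = tokenize(topic)
    ranked = sorted(sentences, key=lambda s: len(tokenize(s) & topic_words), reverse=True)
    return ranked[:k]

sources = [
    "The lighthouse was built in 1902 on a granite outcrop.",
    "Local fishermen funded the lighthouse after repeated shipwrecks.",
    "The nearby town holds an annual seafood festival.",
]
print(extract_salient(sources, "history of the lighthouse"))
```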
  19. Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.01
    0.013145663 = product of:
      0.05258265 = sum of:
        0.05258265 = product of:
          0.1051653 = sum of:
            0.1051653 = weight(_text_:file in 1851) [ClassicSimilarity], result of:
              0.1051653 = score(doc=1851,freq=2.0), product of:
                0.25368783 = queryWeight, product of:
                  5.3600616 = idf(docFreq=564, maxDocs=44218)
                  0.047329273 = queryNorm
                0.4145461 = fieldWeight in 1851, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.3600616 = idf(docFreq=564, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1851)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries for construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. As well, a tool is designed to help assessors judge relevance and gather relevance-judgment events. The log file created by this tool will be used to analyze the behaviors of assessors in the future.
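     The judgment log mentioned in the abstract above might, as a minimal sketch with assumed field names and format, be an append-only JSON-lines file recording each relevance decision with a timestamp:

```python
# Sketch of an append-only judgment log like the one described above: each
# relevance decision is written as one JSON line with a timestamp, so assessor
# behaviour can be analysed later. Field names and format are illustrative.
import json
import time
from pathlib import Path

def log_judgment(log_path: Path, assessor: str, query_id: str,
                 doc_id: str, relevant: bool) -> None:
    event = {
        "timestamp": time.time(),
        "assessor": assessor,
        "query_id": query_id,
        "doc_id": doc_id,
        "relevant": relevant,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")

log_judgment(Path("judgments.jsonl"), "assessor01", "Q042", "doc_1851", True)
```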
  20. Engerer, V.: Exploring interdisciplinary relationships between linguistics and information retrieval from the 1960s to today (2017) 0.01
    0.012982754 = product of:
      0.051931016 = sum of:
        0.051931016 = weight(_text_:reference in 3434) [ClassicSimilarity], result of:
          0.051931016 = score(doc=3434,freq=2.0), product of:
            0.19255297 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.047329273 = queryNorm
            0.2696973 = fieldWeight in 3434, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.046875 = fieldNorm(doc=3434)
      0.25 = coord(1/4)
    
    Abstract
    This article explores how linguistics has influenced information retrieval (IR) and attempts to explain the impact of linguistics through an analysis of internal developments in information science generally, and IR in particular. It notes that information science/IR has been evolving from a case science into a fully fledged, "disciplined"/disciplinary science. The article establishes correspondences between linguistics and information science/IR using the three established IR paradigms-physical, cognitive, and computational-as a frame of reference. The current relationship between information science/IR and linguistics is elucidated through discussion of some recent information science publications dealing with linguistic topics and a novel technique, "keyword collocation analysis," is introduced. Insights from interdisciplinarity research and case theory are also discussed. It is demonstrated that the three stages of interdisciplinarity, namely multidisciplinarity, interdisciplinarity (in the narrow sense), and transdisciplinarity, can be linked to different phases of the information science/IR-linguistics relationship and connected to different ways of using linguistic theory in information science and IR.

Languages

  • e 53
  • d 22

Types

  • a 57
  • el 10
  • m 6
  • x 6
  • s 3
  • p 2
  • d 1