Search (536 results, page 1 of 27)

  • Filter: theme_ss:"Computerlinguistik"
  1. Brill, E.: An overview of empirical natural language processing (1997) 0.09
    0.09455121 = product of:
      0.15758535 = sum of:
        0.12269233 = weight(_text_:section in 3249) [ClassicSimilarity], result of:
          0.12269233 = score(doc=3249,freq=2.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.46641576 = fieldWeight in 3249, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.0625 = fieldNorm(doc=3249)
        0.02131451 = weight(_text_:on in 3249) [ClassicSimilarity], result of:
          0.02131451 = score(doc=3249,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.19440265 = fieldWeight in 3249, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=3249)
        0.013578499 = weight(_text_:information in 3249) [ClassicSimilarity], result of:
          0.013578499 = score(doc=3249,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.1551638 = fieldWeight in 3249, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=3249)
      0.6 = coord(3/5)
    
    Abstract
    Introduces a special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation. Presents a series of specialized articles on these topics and attempts to describe and explain the growing interest in using learning methods to aid the development of natural language processing systems.
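Each hit's score breakdown above and below follows Lucene's ClassicSimilarity (TF-IDF). As a minimal sketch of how the printed numbers combine - assuming the standard ClassicSimilarity formulas tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final coord(matched/total) factor - the following Python reproduces the 0.09455121 score of this first result from the values shown; the helper names are illustrative, not part of any Lucene API.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    # weight = queryWeight * fieldWeight
    #        = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)
    i = idf(doc_freq, max_docs)
    return (i * query_norm) * (math.sqrt(freq) * i * field_norm)

QUERY_NORM = 0.049850095  # the queryNorm printed in every breakdown above

# The three matching terms of result 1 (doc 3249): section, on, information.
# Each tuple: (termFreq, docFreq, maxDocs, fieldNorm)
terms = [
    (2.0, 613, 44218, 0.0625),    # "section"
    (2.0, 13325, 44218, 0.0625),  # "on"
    (2.0, 20772, 44218, 0.0625),  # "information"
]

raw = sum(term_weight(f, df, md, QUERY_NORM, fn) for f, df, md, fn in terms)
score = raw * (3 / 5)  # coord(3/5): 3 of the 5 query terms matched
print(f"{score:.8f}")  # -> 0.09455121, matching the printed score
```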
  2. Warner, A.J.: Natural language processing (1987) 0.09
    0.08535734 = product of:
      0.21339335 = sum of:
        0.027156997 = weight(_text_:information in 337) [ClassicSimilarity], result of:
          0.027156997 = score(doc=337,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.3103276 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.125 = fieldNorm(doc=337)
        0.18623635 = sum of:
          0.07817236 = weight(_text_:technology in 337) [ClassicSimilarity], result of:
            0.07817236 = score(doc=337,freq=2.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.5265094 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.125 = fieldNorm(doc=337)
          0.108063996 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
            0.108063996 = score(doc=337,freq=2.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.61904186 = fieldWeight in 337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.125 = fieldNorm(doc=337)
      0.4 = coord(2/5)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  3. Empirical natural language processing (1997) 0.08
    0.08176249 = product of:
      0.20440623 = sum of:
        0.18403849 = weight(_text_:section in 3328) [ClassicSimilarity], result of:
          0.18403849 = score(doc=3328,freq=2.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.69962364 = fieldWeight in 3328, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.09375 = fieldNorm(doc=3328)
        0.020367749 = weight(_text_:information in 3328) [ClassicSimilarity], result of:
          0.020367749 = score(doc=3328,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.23274569 = fieldWeight in 3328, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=3328)
      0.4 = coord(2/5)
    
    Footnote
    A special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation
  4. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.08
    0.08110962 = product of:
      0.1351827 = sum of:
        0.07917517 = product of:
          0.2375255 = sum of:
            0.2375255 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.2375255 = score(doc=562,freq=2.0), product of:
                0.42262965 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.049850095 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.03574552 = weight(_text_:on in 562) [ClassicSimilarity], result of:
          0.03574552 = score(doc=562,freq=10.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.32602316 = fieldWeight in 562, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.020261997 = product of:
          0.040523995 = sum of:
            0.040523995 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.040523995 = score(doc=562,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
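The abstract describes boosting weak learners over term features enriched with concept features. A hedged sketch of that general recipe - standard AdaBoost over a bag-of-words matrix extended with extra concept columns, not the authors' actual system; the corpus, labels, and concept indicators below are invented for illustration:

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus and labels (invented for illustration).
docs = ["cheap car insurance online", "new car road test review",
        "insurance quote online", "road test of the new coupe"]
y = [1, 0, 1, 0]

term_matrix = CountVectorizer().fit_transform(docs)  # classical Bag-Of-Words features
# Hypothetical concept indicators (e.g. FINANCE, AUTOMOBILE) that background
# knowledge would supply; here simply hand-coded per document.
concepts = csr_matrix([[1, 1], [0, 1], [1, 0], [0, 1]])
X = hstack([term_matrix, concepts])  # terms + concepts in one feature space

clf = AdaBoostClassifier(n_estimators=50).fit(X, y)  # boosting of weak learners
print(clf.predict(X))
```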
  5. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.08
    0.08065254 = product of:
      0.1344209 = sum of:
        0.02637536 = weight(_text_:on in 2345) [ClassicSimilarity], result of:
          0.02637536 = score(doc=2345,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24056101 = fieldWeight in 2345, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2345)
        0.02656714 = weight(_text_:information in 2345) [ClassicSimilarity], result of:
          0.02656714 = score(doc=2345,freq=10.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.3035872 = fieldWeight in 2345, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2345)
        0.08147841 = sum of:
          0.03420041 = weight(_text_:technology in 2345) [ClassicSimilarity], result of:
            0.03420041 = score(doc=2345,freq=2.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.23034787 = fieldWeight in 2345, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
          0.047278 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
            0.047278 = score(doc=2345,freq=2.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.2708308 = fieldWeight in 2345, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2345)
      0.6 = coord(3/5)
    
    Abstract
    Natural language processing (NLP) is a powerful technology for the vital tasks of information retrieval (IR) and knowledge discovery (KD) which, in turn, feed the visualization systems of the present and future and enable knowledge workers to focus more of their time on the vital tasks of analysis and prediction
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  6. Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 0.08
    0.08018135 = product of:
      0.13363558 = sum of:
        0.02131451 = weight(_text_:on in 7415) [ClassicSimilarity], result of:
          0.02131451 = score(doc=7415,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.19440265 = fieldWeight in 7415, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=7415)
        0.019202897 = weight(_text_:information in 7415) [ClassicSimilarity], result of:
          0.019202897 = score(doc=7415,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.21943474 = fieldWeight in 7415, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=7415)
        0.093118176 = sum of:
          0.03908618 = weight(_text_:technology in 7415) [ClassicSimilarity], result of:
            0.03908618 = score(doc=7415,freq=2.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.2632547 = fieldWeight in 7415, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.0625 = fieldNorm(doc=7415)
          0.054031998 = weight(_text_:22 in 7415) [ClassicSimilarity], result of:
            0.054031998 = score(doc=7415,freq=2.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.30952093 = fieldWeight in 7415, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=7415)
      0.6 = coord(3/5)
    
    Abstract
    State-of-the-art review of natural language processing updating an earlier review published in ARIST 22(1987). Discusses important developments that have allowed for significant advances in the field of natural language processing: materials and resources; knowledge-based systems and statistical approaches; and a strong emphasis on evaluation. Reviews some natural language processing applications and common problems still awaiting solution. Considers closely related applications such as language generation and the generation phase of machine translation, which face the same problems as natural language processing. Covers natural language methodologies for information retrieval only briefly.
    Source
    Annual review of information science and technology. 31(1996), S.83-119
  7. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.08
    0.07722899 = product of:
      0.12871498 = sum of:
        0.026643137 = weight(_text_:on in 2541) [ClassicSimilarity], result of:
          0.026643137 = score(doc=2541,freq=8.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24300331 = fieldWeight in 2541, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.012001811 = weight(_text_:information in 2541) [ClassicSimilarity], result of:
          0.012001811 = score(doc=2541,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13714671 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
        0.090070024 = sum of:
          0.042312037 = weight(_text_:technology in 2541) [ClassicSimilarity], result of:
            0.042312037 = score(doc=2541,freq=6.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.2849816 = fieldWeight in 2541, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.047757987 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.047757987 = score(doc=2541,freq=4.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.6 = coord(3/5)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
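The spelling-suggestion behavior described above can be approximated, at its simplest, by similarity matching against a dictionary vocabulary. A toy sketch using Python's standard difflib - not ChemSpell or AZdict themselves; the vocabulary below is an invented stand-in for the SIS/UMLS-derived word list:

```python
import difflib

# Hypothetical slice of a dictionary vocabulary; the real AZdict list is
# derived from SIS databases and the UMLS.
vocabulary = ["toxicology", "benzene", "acetaminophen", "chloroform",
              "naphthalene", "formaldehyde"]

def suggest(word: str, n: int = 3) -> list:
    # Closest vocabulary entries by sequence similarity (cutoff 0.6).
    return difflib.get_close_matches(word, vocabulary, n=n, cutoff=0.6)

print(suggest("acetominophen"))  # -> ['acetaminophen']
print(suggest("benzine"))        # -> ['benzene']
```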
  8. Anizi, M.; Dichy, J.: Improving information retrieval in Arabic through a multi-agent approach and a rich lexical resource (2011) 0.07
    0.07406627 = product of:
      0.12344377 = sum of:
        0.07668271 = weight(_text_:section in 4738) [ClassicSimilarity], result of:
          0.07668271 = score(doc=4738,freq=2.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.29150987 = fieldWeight in 4738, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4738)
        0.029787935 = weight(_text_:on in 4738) [ClassicSimilarity], result of:
          0.029787935 = score(doc=4738,freq=10.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.271686 = fieldWeight in 4738, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4738)
        0.016973123 = weight(_text_:information in 4738) [ClassicSimilarity], result of:
          0.016973123 = score(doc=4738,freq=8.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.19395474 = fieldWeight in 4738, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4738)
      0.6 = coord(3/5)
    
    Abstract
    This paper addresses the optimization of information retrieval in Arabic. The results derived from the expanding development of sites in Arabic are often spectacular. Nevertheless, several observations indicate that the responses remain disappointing, particularly upon comparing users' requests and quality of responses. One of the problems encountered by users is the loss of time when navigating between different URLs to find adequate responses. This, in many cases, is due to the absence of forms morphologically related to the research keyword. Such problems can be approached through a morphological analyzer drawing on the DIINAR.1 morpho-lexical resource. A second problem concerns the formulation of the query, which may prove ambiguous, as in everyday language. We then focus on contextual disambiguation based on a rich lexical resource that includes collocations and set expressions. The overall scheme of such a resource will only be hinted at here. Our approach leads to the elaboration of a multi-agent system, motivated by a need to solve problems encountered when using conventional methods of analysis, and to improve the results of queries thanks to a better collaboration between different levels of analysis. We suggest resorting to four agents: morphological, morpho-lexical, contextualization, and an interface agent. These agents 'negotiate' and 'cooperate' throughout the analysis process, starting from the submission of the initial query, and going on until an adequate query is obtained.
    Content
    Contribution within a special section: Knowledge Organization, Competitive Intelligence, and Information Systems - Papers from the 4th International Conference on "Information Systems & Economic Intelligence," February 17-19th, 2011, Marrakech, Morocco.
  9. Gachot, D.A.; Lange, E.; Yang, J.: The SYSTRAN NLP browser : an application of machine translation technology in cross-language information retrieval (1998) 0.06
    0.057938628 = product of:
      0.096564375 = sum of:
        0.031971764 = weight(_text_:on in 6213) [ClassicSimilarity], result of:
          0.031971764 = score(doc=6213,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.29160398 = fieldWeight in 6213, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.09375 = fieldNorm(doc=6213)
        0.035277974 = weight(_text_:information in 6213) [ClassicSimilarity], result of:
          0.035277974 = score(doc=6213,freq=6.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.40312737 = fieldWeight in 6213, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=6213)
        0.029314637 = product of:
          0.058629274 = sum of:
            0.058629274 = weight(_text_:technology in 6213) [ClassicSimilarity], result of:
              0.058629274 = score(doc=6213,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.39488205 = fieldWeight in 6213, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6213)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Series
    The Kluwer International series on information retrieval
    Source
    Cross-language information retrieval. Ed.: G. Grefenstette
  10. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.06
    0.057605032 = product of:
      0.09600838 = sum of:
        0.015985882 = weight(_text_:on in 563) [ClassicSimilarity], result of:
          0.015985882 = score(doc=563,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.14580199 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.0101838745 = weight(_text_:information in 563) [ClassicSimilarity], result of:
          0.0101838745 = score(doc=563,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.116372846 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.06983863 = sum of:
          0.029314637 = weight(_text_:technology in 563) [ClassicSimilarity], result of:
            0.029314637 = score(doc=563,freq=2.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.19744103 = fieldWeight in 563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
          0.040523995 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
            0.040523995 = score(doc=563,freq=2.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.23214069 = fieldWeight in 563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
      0.6 = coord(3/5)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
    Date
    10. 1.2013 19:22:47
  11. Stolcke, A.: Linguistic knowledge and empirical methods in speech recognition (1997) 0.05
    0.054508336 = product of:
      0.13627084 = sum of:
        0.12269233 = weight(_text_:section in 2660) [ClassicSimilarity], result of:
          0.12269233 = score(doc=2660,freq=2.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.46641576 = fieldWeight in 2660, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.0625 = fieldNorm(doc=2660)
        0.013578499 = weight(_text_:information in 2660) [ClassicSimilarity], result of:
          0.013578499 = score(doc=2660,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.1551638 = fieldWeight in 2660, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=2660)
      0.4 = coord(2/5)
    
    Footnote
    Contribution to a special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation
  12. Mustafa El Hadi, W.: Evaluating human language technology : general applications to information access and management (2002) 0.05
    0.04899249 = product of:
      0.08165415 = sum of:
        0.031971764 = weight(_text_:on in 1840) [ClassicSimilarity], result of:
          0.031971764 = score(doc=1840,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.29160398 = fieldWeight in 1840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.09375 = fieldNorm(doc=1840)
        0.020367749 = weight(_text_:information in 1840) [ClassicSimilarity], result of:
          0.020367749 = score(doc=1840,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.23274569 = fieldWeight in 1840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=1840)
        0.029314637 = product of:
          0.058629274 = sum of:
            0.058629274 = weight(_text_:technology in 1840) [ClassicSimilarity], result of:
              0.058629274 = score(doc=1840,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.39488205 = fieldWeight in 1840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1840)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Footnote
    Guest editorial to a special issue of Knowledge Organization on "Evaluation of HLT"
  13. Witschel, H.F.: Global and local resources for peer-to-peer text retrieval (2008) 0.05
    0.046292994 = product of:
      0.07715499 = sum of:
        0.053677894 = weight(_text_:section in 127) [ClassicSimilarity], result of:
          0.053677894 = score(doc=127,freq=2.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.20405689 = fieldWeight in 127, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.02734375 = fieldNorm(doc=127)
        0.01318768 = weight(_text_:on in 127) [ClassicSimilarity], result of:
          0.01318768 = score(doc=127,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.120280504 = fieldWeight in 127, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.02734375 = fieldNorm(doc=127)
        0.01028941 = weight(_text_:information in 127) [ClassicSimilarity], result of:
          0.01028941 = score(doc=127,freq=6.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.11757882 = fieldWeight in 127, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=127)
      0.6 = coord(3/5)
    
    Abstract
    This thesis is organised as follows: Chapter 2 gives a general introduction to the field of information retrieval, covering its most important aspects. Further, the tasks of distributed and peer-to-peer information retrieval (P2PIR) are introduced, motivating their application and characterising the special challenges that they involve, including a review of existing architectures and search protocols in P2PIR. Finally, chapter 2 presents approaches to evaluating the effectiveness of both traditional and peer-to-peer IR systems. Chapter 3 contains a detailed account of state-of-the-art information retrieval models and algorithms. This encompasses models for matching queries against document representations, term weighting algorithms, approaches to feedback and associative retrieval as well as distributed retrieval. It thus defines important terminology for the following chapters. The notion of "multi-level association graphs" (MLAGs) is introduced in chapter 4. An MLAG is a simple, graph-based framework that allows one to model most of the theoretical and practical approaches to IR presented in chapter 3. Moreover, it provides an easy-to-grasp way of defining and including new entities into IR modeling, such as paragraphs or peers, dividing them conceptually while at the same time connecting them to each other in a meaningful way. This allows for a unified view on many IR tasks, including that of distributed and peer-to-peer search. Starting from related work and a formal definition of the framework, the possibilities of modeling that it provides are discussed in detail, followed by an experimental section that shows how new insights gained from modeling inside the framework can lead to novel combinations of principles and eventually to improved retrieval effectiveness.
    Chapter 5 empirically tackles the first of the two research questions formulated above, namely the question of global collection statistics. More precisely, it studies possibilities of radically simplified results merging. The simplification comes from the attempt - without having knowledge of the complete collection - to equip all peers with the same global statistics, making document scores comparable across peers. What is examined is the question of how we can obtain such global statistics and to what extent their use will lead to a drop in retrieval effectiveness. In chapter 6, the second research question is tackled, namely that of making forwarding decisions for queries, based on profiles of other peers. After a review of related work in that area, the chapter first defines the approaches that will be compared against each other. Then, a novel evaluation framework is introduced, including a new measure for comparing results of a distributed search engine against those of a centralised one. Finally, the actual evaluation is performed using the new framework.
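The point about global collection statistics is easy to see in miniature. A toy sketch - not the thesis' actual experiments; peers, counts, and the query term are invented - of why peers need shared document-frequency statistics before their TF-IDF scores can be merged:

```python
import math

def idf(doc_freq: int, n_docs: int) -> float:
    return 1.0 + math.log(n_docs / (doc_freq + 1))

# Two peers with different local collections: the query term is rare on
# peer A but common on peer B. (docFreq, collection size) per peer:
local = {"A": (5, 1_000), "B": (400, 1_000)}
tf = math.sqrt(3.0)  # same term frequency in both candidate documents

# Local IDFs make otherwise identical documents score differently per peer:
for peer, (df, n) in local.items():
    print(peer, "local score:", round(tf * idf(df, n), 3))

# Shared global statistics (as if the collections were one) restore
# comparability, so merged result lists can simply be sorted by score:
g_idf = idf(sum(df for df, _ in local.values()), sum(n for _, n in local.values()))
for peer in local:
    print(peer, "global score:", round(tf * g_idf, 3))
```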
  14. Paolillo, J.C.: Linguistics and the information sciences (2009) 0.04
    0.044234317 = product of:
      0.07372386 = sum of:
        0.018650195 = weight(_text_:on in 3840) [ClassicSimilarity], result of:
          0.018650195 = score(doc=3840,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.17010231 = fieldWeight in 3840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
        0.031434666 = weight(_text_:information in 3840) [ClassicSimilarity], result of:
          0.031434666 = score(doc=3840,freq=14.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.3592092 = fieldWeight in 3840, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
        0.023639 = product of:
          0.047278 = sum of:
            0.047278 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
              0.047278 = score(doc=3840,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2708308 = fieldWeight in 3840, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3840)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Linguistics is the scientific study of language which emphasizes language spoken in everyday settings by human beings. It has a long history of interdisciplinarity, both internally and in contribution to other fields, including information science. A linguistic perspective is beneficial in many ways in information science, since it examines the relationship between the forms of meaningful expressions and their social, cognitive, institutional, and communicative context, these being two perspectives on information that are actively studied, to different degrees, in information science. Examples of issues relevant to information science are presented for which the approach taken under a linguistic perspective is illustrated.
    Date
    27. 8.2011 14:22:33
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  15. Liddy, E.D.: Natural language processing for information retrieval (2009) 0.04
    0.04273203 = product of:
      0.07122005 = sum of:
        0.02131451 = weight(_text_:on in 3854) [ClassicSimilarity], result of:
          0.02131451 = score(doc=3854,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.19440265 = fieldWeight in 3854, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=3854)
        0.030362446 = weight(_text_:information in 3854) [ClassicSimilarity], result of:
          0.030362446 = score(doc=3854,freq=10.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.3469568 = fieldWeight in 3854, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=3854)
        0.01954309 = product of:
          0.03908618 = sum of:
            0.03908618 = weight(_text_:technology in 3854) [ClassicSimilarity], result of:
              0.03908618 = score(doc=3854,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2632547 = fieldWeight in 3854, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3854)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Natural language processing (NLP) is the computerized approach to analyzing text that is based on both a set of theories and a set of technologies. Although NLP is a relatively recent area of research and application, compared with other information technology approaches, there have been sufficient successes to date that suggest that NLP-based information access technologies will continue to be a major area of research and development in information systems now and into the future.
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  16. Jones, I.; Cunliffe, D.; Tudhope, D.: Natural language processing and knowledge organization systems as an aid to retrieval (2004) 0.04
    0.041941617 = product of:
      0.10485404 = sum of:
        0.09297285 = weight(_text_:section in 2677) [ClassicSimilarity], result of:
          0.09297285 = score(doc=2677,freq=6.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.35343695 = fieldWeight in 2677, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2677)
        0.011881187 = weight(_text_:information in 2677) [ClassicSimilarity], result of:
          0.011881187 = score(doc=2677,freq=8.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13576832 = fieldWeight in 2677, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2677)
      0.4 = coord(2/5)
    
    Abstract
    This paper discusses research that employs methods from Natural Language Processing (NLP) in exploiting the intellectual resources of Knowledge Organization Systems (KOS), particularly in the retrieval of information. A technique for the disambiguation of homographs and nominal compounds in free text, where these are known ambiguous terms in the KOS itself, is described. The use of Roget's Thesaurus as an intermediary in the process is also reported. A short review of the relevant literature in the field is given. Design considerations, results and conclusions are presented from the implementation of a prototype system. The linguistic techniques are applied at two complementary levels, namely on a free text string used as an entry point to the KOS, and on the underlying controlled vocabulary itself.
    Content
    1. Introduction The need for research into the application of linguistic techniques in Information Retrieval (IR) in general, and a similar need in faceted Knowledge Organization Systems (KOS), has been indicated by various authors. Smeaton (1997) points out the inherent limitations of conventional approaches to IR based on "bags of words", mainly difficulties caused by lexical ambiguity in the words concerned, and goes on to suggest the possibility of using Natural Language Processing (NLP) in query formulation. Past experience with a faceted retrieval system highlighted the need for integrating the linguistic perspective in order to fully utilise the potential of a KOS (Tudhope et al., 2002). The present research seeks to address some of these needs in using NLP to improve the efficacy of KOS tools in query and retrieval systems. Syntactic parsing and part-of-speech tagging can substantially reduce lexical ambiguity through homograph disambiguation. Given the two strings "I table the motion" and "I put the motion on the table", for instance, the parser used in this research clearly indicates that 'table' in the first string is a verb, while 'table' in the second string is a noun, a distinction that would be missed in the "bag of words" approach. This syntactic disambiguation enables a more precise matching from free text to the controlled vocabulary of a KOS and vice versa. The use of a general linguistic resource, namely Roget's Thesaurus of English Words and Phrases (RTEWP), as an intermediary in this process, is investigated. The adaptation of the Link parser (Sleator & Temperley, 1993) to the purposes of the research is reported. The design and implementation of the early practical stages of the project are described, and the results of the initial experiments are presented and evaluated. Applications of the techniques developed are foreseen in the areas of query disambiguation, information retrieval and automatic indexing. In the first section of the paper a brief review of the literature and relevant current work in the field is presented. The second section includes reports on the development of algorithms, the construction of data sets and theoretical and experimental work undertaken to date. The third section evaluates the results obtained, and outlines directions for future research.
    Source
    Knowledge organization and the global information society: Proceedings of the 8th International ISKO Conference 13-16 July 2004, London, UK. Ed.: I.C. McIlwaine
  17. Karlova-Bourbonus, N.: Automatic detection of contradictions in texts (2018) 0.04
    0.04033534 = product of:
      0.10083835 = sum of:
        0.07969101 = weight(_text_:section in 5976) [ClassicSimilarity], result of:
          0.07969101 = score(doc=5976,freq=6.0), product of:
            0.26305357 = queryWeight, product of:
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.049850095 = queryNorm
            0.30294594 = fieldWeight in 5976, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.276892 = idf(docFreq=613, maxDocs=44218)
              0.0234375 = fieldNorm(doc=5976)
        0.021147337 = weight(_text_:on in 5976) [ClassicSimilarity], result of:
          0.021147337 = score(doc=5976,freq=14.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.19287792 = fieldWeight in 5976, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0234375 = fieldNorm(doc=5976)
      0.4 = coord(2/5)
    
    Abstract
    Natural language contradictions are of a complex nature. As will be shown in Chapter 5, the realization of contradictions is not limited to examples such as Socrates is a man and Socrates is not a man (under the condition that Socrates refers to the same object in the real world), which is discussed by Aristotle (Section 3.1.1). Empirical evidence (see Chapter 5 for more details) shows that only a few contradictions occurring in real life are of that explicit (prototypical) kind. Rather, contradictions make use of a variety of natural language devices such as, e.g., paraphrasing, synonyms and antonyms, passive and active voice, diversity of negation expression, and figurative linguistic means such as idioms, irony, and metaphors. Additionally, the most sophisticated kind of contradictions, the so-called implicit contradictions, can be found only when applying world knowledge and after conducting a sequence of logical operations such as, e.g., in: (1.1) The first prize was given to the experienced grandmaster L. Stein who, in total, collected ten points (7 wins and 3 draws). Those familiar with the chess rules know that a chess player gets one point for winning and zero points for losing the game. In case of a draw, each player gets a half point. Built on this idea and by conducting some simple mathematical operations, we can infer that in the case of 7 wins and 3 draws (the second part of the sentence), a player can only collect 8.5 points and not 10 points (see the worked sum after this abstract). Hence, we observe that there is a contradiction between the first and the second parts of the sentence.
    Implicit contradictions will only partially be the subject of the present study, aiming primarily at identifying the realization mechanism and cues (Chapter 5) as well as finding the parts of contradictions by applying state-of-the-art algorithms for natural language processing without conducting deep meaning processing. Further in focus are the explicit and implicit contradictions that can be detected by means of explicit linguistic, structural, and lexical cues, and by conducting some additional processing operations (e.g., counting the sum in order to detect contradictions arising from numerical divergencies). One should note that additional complexity in finding contradictions can arise in case parts of the contradictions occur on different levels of realization. Thus, a contradiction can be observed on the word- and phrase-level, such as in a married bachelor (for variations of contradictions on the lexical level, see Ganeev 2004), on the sentence level - between parts of a sentence or between two or more sentences - or on the text level - between portions of a text or between whole texts, such as a contradiction between the Bible and the Quran, for example. Only contradictions arising at the level of single sentences occurring in one or more texts, as well as parts of a sentence, will be considered for the purpose of this study. Though the focus of interest will be on single sentences, the study will make use of text particularities such as coreference resolution without establishing the referents in the real world. Finally, another aspect to be considered is that parts of the contradictions do not necessarily appear at the same time. They can be separated by many years and centuries, with or without a time expression, making their recognition by humans and detection by machine challenging. According to Aristotle's ontological version of the LNC (Section 3.1.1), however, the same time reference is required in order for two statements to be judged as a contradiction. Taking this into account, we set the borders for the study by limiting the analyzed textual data thematically (only nine world events) and temporally (three days after the reported event had happened) (Section 5.1). No sophisticated time processing will thus be conducted.
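For the chess example in (1.1), the world knowledge invoked above reduces to one line of arithmetic (one point per win, half a point per draw, as stated in the abstract):

    7 · 1 + 3 · 0.5 = 8.5 ≠ 10

so the stated total of ten points contradicts the stated record of 7 wins and 3 draws.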
  18. Meng, K.; Ba, Z.; Ma, Y.; Li, G.: ¬A network coupling approach to detecting hierarchical linkages between science and technology (2024) 0.04
    0.040312126 = product of:
      0.06718688 = sum of:
        0.027688364 = weight(_text_:on in 1205) [ClassicSimilarity], result of:
          0.027688364 = score(doc=1205,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.25253648 = fieldWeight in 1205, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1205)
        0.0101838745 = weight(_text_:information in 1205) [ClassicSimilarity], result of:
          0.0101838745 = score(doc=1205,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.116372846 = fieldWeight in 1205, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1205)
        0.029314637 = product of:
          0.058629274 = sum of:
            0.058629274 = weight(_text_:technology in 1205) [ClassicSimilarity], result of:
              0.058629274 = score(doc=1205,freq=8.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.39488205 = fieldWeight in 1205, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1205)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Detecting science-technology hierarchical linkages is beneficial for understanding deep interactions between science and technology (S&T). Previous studies have mainly focused on linear linkages between S&T but ignored their structural linkages. In this paper, we propose a network coupling approach to inspect hierarchical interactions of S&T by integrating their knowledge linkages and structural linkages. S&T knowledge networks are first enhanced with bidirectional encoder representation from transformers (BERT) knowledge alignment, and then their hierarchical structures are identified based on K-core decomposition. Hierarchical coupling preferences and strengths of the S&T networks over time are further calculated based on similarities of coupling nodes' degree distribution and similarities of coupling edges' weight distribution. Extensive experimental results indicate that our approach is feasible and robust in identifying the coupling hierarchy with superior performance compared to other isomorphism and dissimilarity algorithms. Our research extends the mindset of S&T linkage measurement by identifying patterns and paths of the interaction of S&T hierarchical knowledge.
    Source
    Journal of the Association for Information Science and Technology. 75(2023) no.2, S.167-187
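The abstract's K-core decomposition step is a standard graph operation. A hedged illustration with networkx on a toy graph - a stand-in for an S&T knowledge network, not the authors' data or pipeline:

```python
import networkx as nx

# Toy knowledge network: a dense triangle plus a sparse tail.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])

core = nx.core_number(G)  # node -> largest k for which the node survives in the k-core
shells = {}
for node, k in core.items():
    shells.setdefault(k, []).append(node)

print({k: sorted(v) for k, v in shells.items()})
# -> {2: ['a', 'b', 'c'], 1: ['d', 'e']}: the triangle forms the deeper
#    hierarchical layer, the tail the outer shell.
```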
  19. Mustafa el Hadi, W.: Human language technology and its role in information access and management (2003) 0.04
    0.04009946 = product of:
      0.06683243 = sum of:
        0.018839544 = weight(_text_:on in 5524) [ClassicSimilarity], result of:
          0.018839544 = score(doc=5524,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.1718293 = fieldWeight in 5524, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
        0.026836865 = weight(_text_:information in 5524) [ClassicSimilarity], result of:
          0.026836865 = score(doc=5524,freq=20.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.30666938 = fieldWeight in 5524, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
        0.021156019 = product of:
          0.042312037 = sum of:
            0.042312037 = weight(_text_:technology in 5524) [ClassicSimilarity], result of:
              0.042312037 = score(doc=5524,freq=6.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2849816 = fieldWeight in 5524, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5524)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    The role of linguistics in information access, extraction and dissemination is essential. Radical changes in the techniques of information and communication at the end of the twentieth century have had a significant effect on the function of the linguistic paradigm and its applications in all forms of communication. The introduction of new technical means has deeply changed the possibilities for the distribution of information. In this situation, what is the role of the linguistic paradigm and its practical applications, i.e., natural language processing (NLP) techniques, when applied to information access? What solutions can linguistics offer in human-computer interaction, extraction and management? Many fields show the relevance of the linguistic paradigm through the various technologies that require NLP, such as document and message understanding, information detection, extraction, and retrieval, question and answer, cross-language information retrieval (CLIR), text summarization, filtering, and spoken document retrieval. This paper focuses on the central role of human language technologies in the information society, surveys the current situation, describes the benefits of the above-mentioned applications, outlines successes and challenges, and discusses solutions. It reviews the resources and means needed to advance information access and dissemination across language boundaries in the twenty-first century. Multilingualism, which is a natural result of globalization, requires more effort in the direction of language technology. The scope of human language technology (HLT) is large, so we limit our review to applications that involve multilinguality.
    Content
    Beitrag eines Themenheftes "Knowledge organization and classification in international information retrieval"
  20. Wu, D.-S.; Liang, T.: Chinese pronominal anaphora resolution using lexical knowledge and entropy-based weight (2008) 0.04
    0.03677069 = product of:
      0.061284482 = sum of:
        0.03230309 = weight(_text_:on in 2367) [ClassicSimilarity], result of:
          0.03230309 = score(doc=2367,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.29462588 = fieldWeight in 2367, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2367)
        0.011881187 = weight(_text_:information in 2367) [ClassicSimilarity], result of:
          0.011881187 = score(doc=2367,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13576832 = fieldWeight in 2367, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2367)
        0.017100206 = product of:
          0.03420041 = sum of:
            0.03420041 = weight(_text_:technology in 2367) [ClassicSimilarity], result of:
              0.03420041 = score(doc=2367,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.23034787 = fieldWeight in 2367, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2367)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Pronominal anaphors are commonly observed in written texts. In this article, effective Chinese pronominal anaphora resolution is addressed by using lexical knowledge acquisition and salience measurement. The lexical knowledge acquisition is aimed at extracting more semantic features, such as gender, number, and collocate compatibility, by employing multiple resources. The presented salience measurement is based on entropy-based weighting for selecting antecedent candidates. The resolution is justified with a real corpus and compared with a rule-based model. Experimental results by five-fold cross-validation show that our approach yields an 82.5% success rate on 1343 anaphoric instances. In comparison with a general rule-based approach, the performance is improved by 7%.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.13, S.2138-2145

Types

  • a 449
  • el 51
  • m 43
  • s 25
  • x 11
  • p 5
  • d 2
  • b 1
  • r 1
