Search (171 results, page 1 of 9)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05

0.050282635 = product of:
  0.10056527 = sum of:
    0.0800734 = product of:
      0.2402202 = sum of:
        0.2402202 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.2402202 = score(doc=562,freq=2.0), product of:
            0.42742437 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.050415643 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.02049187 = product of:
      0.04098374 = sum of:
        0.04098374 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04098374 = score(doc=562,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
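
The score breakdown for the result above can be recomputed by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical, and queryNorm, fieldNorm, and the coord factors are copied from the listing rather than derived.

```python
import math

# Values copied from the explanation tree for doc 562.
MAX_DOCS = 44218
QUERY_NORM = 0.050415643


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for a single term in one document."""
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM        # idf * queryNorm
    field_weight = tf * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Term "3a": freq=2, docFreq=24, matched 1 of 3 clauses in its sub-query.
score_3a = term_score(2.0, 24, 0.046875) * (1 / 3)
# Term "22": freq=2, docFreq=3622, matched 1 of 2 clauses in its sub-query.
score_22 = term_score(2.0, 3622, 0.046875) * (1 / 2)
# Outer coord: 2 of 4 top-level clauses matched.
total = (score_3a + score_22) * (2 / 4)

print(f"{total:.6f}")  # approximately 0.050283, up to float rounding
```

The same helper applies to the other results on this page: only freq, docFreq, fieldNorm, and the coord factors change from entry to entry. Small differences in the last digits are expected, because the listing was computed in single precision.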

Mustafa El Hadi, W.: Evaluating human language technology : general applications to information access and management (2002) 0.04

0.040340282 = product of:
  0.080680564 = sum of:
    0.02059882 = weight(_text_:information in 1840) [ClassicSimilarity], result of:
      0.02059882 = score(doc=1840,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.23274569 = fieldWeight in 1840, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.09375 = fieldNorm(doc=1840)
    0.060081743 = product of:
      0.120163485 = sum of:
        0.120163485 = weight(_text_:organization in 1840) [ClassicSimilarity], result of:
          0.120163485 = score(doc=1840,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.66850436 = fieldWeight in 1840, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.09375 = fieldNorm(doc=1840)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Footnote: Guest editorial to a special issue of Knowledge Organization on "Evaluation of HLT"
Source: Knowledge organization. 29(2002) nos.3/4, pp.124-134

Wright, S.E.: Leveraging terminology resources across application boundaries : accessing resources in future integrated environments (2000) 0.04
```
0.03509516 = product of:
  0.07019032 = sum of:
    0.014865918 = weight(_text_:information in 5528) [ClassicSimilarity], result of:
      0.014865918 = score(doc=5528,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.16796975 = fieldWeight in 5528, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5528)
    0.0553244 = weight(_text_:standards in 5528) [ClassicSimilarity], result of:
      0.0553244 = score(doc=5528,freq=2.0), product of:
        0.22470023 = queryWeight, product of:
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.050415643 = queryNorm
        0.24621427 = fieldWeight in 5528, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4569545 = idf(docFreq=1393, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5528)
  0.5 = coord(2/4)
```
Abstract: The title for this conference, stated in English, is Language Technology for a Dynamic Economy - in the Media Age. The question arises as to what the media are that we are dealing with, and to what extent we are moving away from the reality of different media to a world in which all sub-categories flow together into a unified stream of information that is constantly rescaled to appear in different hardware configurations. A few years ago, people who were interested in sharing data or getting different electronic "boxes" to talk to each other were focused on two major aspects: 1) developing data conversion technology, and 2) convincing potential users that sharing information was an even remotely interesting option. Although some content "owners" are still reticent about releasing their data, it has become dramatically apparent in the Web environment that a broad range of users does indeed want this technology. Even as researchers struggle with the remaining technical, legal, and ethical impediments that stand in the way of unlimited information access to existing multi-platform resources, the future view of the world will no longer be as obsessed with conversion capability as it will be with creating content, with an eye to morphing technologies that will enable the delivery of that content from an open-standards-based format such as XML (eXtensible Markup Language), MPEG (Moving Picture Experts Group), or WAP (Wireless Application Protocol) to a rich variety of display options.

Paolillo, J.C.: Linguistics and the information sciences (2009) 0.03

0.027849235 = product of:
  0.05569847 = sum of:
    0.03179129 = weight(_text_:information in 3840) [ClassicSimilarity], result of:
      0.03179129 = score(doc=3840,freq=14.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.3592092 = fieldWeight in 3840, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3840)
    0.023907183 = product of:
      0.047814365 = sum of:
        0.047814365 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
          0.047814365 = score(doc=3840,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.2708308 = fieldWeight in 3840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Linguistics is the scientific study of language which emphasizes language spoken in everyday settings by human beings. It has a long history of interdisciplinarity, both internally and in contribution to other fields, including information science. A linguistic perspective is beneficial in many ways in information science, since it examines the relationship between the forms of meaningful expressions and their social, cognitive, institutional, and communicative context, these being two perspectives on information that are actively studied, to different degrees, in information science. Examples of issues relevant to information science are presented for which the approach taken under a linguistic perspective is illustrated.
Date: 27. 8.2011 14:22:33
Source: Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates

Herrera-Viedma, E.; Cordón, O.; Herrera, J.C.; Luque, M.: An IRS based on multi-granular linguistic information (2003) 0.03
```
0.026535526 = product of:
  0.05307105 = sum of:
    0.02303018 = weight(_text_:information in 2740) [ClassicSimilarity], result of:
      0.02303018 = score(doc=2740,freq=10.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.2602176 = fieldWeight in 2740, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2740)
    0.030040871 = product of:
      0.060081743 = sum of:
        0.060081743 = weight(_text_:organization in 2740) [ClassicSimilarity], result of:
          0.060081743 = score(doc=2740,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.33425218 = fieldWeight in 2740, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=2740)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: An information retrieval system (IRS) based on fuzzy multi-granular linguistic information is proposed. The system has an evaluation method to process multi-granular linguistic information, in such a way that the inputs to the IRS are represented in a different linguistic domain than the outputs. The system accepts Boolean queries whose terms are weighted by means of the ordinal linguistic values represented by the linguistic variable "Importance" assessed on a label set S. The system evaluates the weighted queries according to a threshold semantic and obtains the linguistic retrieval status values (RSV) of documents represented by a linguistic variable "Relevance" expressed in a different label set S'. The advantage of this linguistic IRS with respect to others is that the use of the multi-granular linguistic information facilitates and improves the IRS-user interaction.
Series: Advances in knowledge organization; vol.8
Source: Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas

Sidhom, S.; Hassoun, M.: Morpho-syntactic parsing to text mining environment : NP recognition model to knowledge visualization and information (2003) 0.03

0.026284594 = product of:
  0.05256919 = sum of:
    0.017165681 = weight(_text_:information in 3546) [ClassicSimilarity], result of:
      0.017165681 = score(doc=3546,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.19395474 = fieldWeight in 3546, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=3546)
    0.035403505 = product of:
      0.07080701 = sum of:
        0.07080701 = weight(_text_:organization in 3546) [ClassicSimilarity], result of:
          0.07080701 = score(doc=3546,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.39391994 = fieldWeight in 3546, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.078125 = fieldNorm(doc=3546)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Tendencias de investigación en organización del conocimiento: IV Coloquio International de Ciencias de la Documentación, VI Congreso del Capitulo Espanol de ISKO = Trends in knowledge organization research. Eds.: J.A. Frias and C. Travieso

Mustafa El Hadi, W.: Terminologies, ontologies and information access (2006) 0.03

0.026054136 = product of:
  0.052108273 = sum of:
    0.023785468 = weight(_text_:information in 1488) [ClassicSimilarity], result of:
      0.023785468 = score(doc=1488,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.2687516 = fieldWeight in 1488, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=1488)
    0.028322803 = product of:
      0.056645606 = sum of:
        0.056645606 = weight(_text_:organization in 1488) [ClassicSimilarity], result of:
          0.056645606 = score(doc=1488,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.31513596 = fieldWeight in 1488, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0625 = fieldNorm(doc=1488)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Ontologies have become an important issue in research communities across several disciplines. This paper briefly discusses some of the innovative techniques involving automatic acquisition of terminology resources. Suggests that NLP-based ontologies are useful in reducing the cost of ontology engineering. Emphasizes that linguistic ontologies covering both ontological and lexical information can offer solutions since they can be more easily updated by the resources of NLP products.
Source: Knowledge organization, information systems and other essays: Professor A. Neelameghan Festschrift. Ed. by K.S. Raghavan and K.N. Prasad

Navarretta, C.; Pedersen, B.S.; Hansen, D.H.: Language technology in knowledge-organization systems (2006) 0.03

0.025678985 = product of:
  0.05135797 = sum of:
    0.014565565 = weight(_text_:information in 5706) [ClassicSimilarity], result of:
      0.014565565 = score(doc=5706,freq=4.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.16457605 = fieldWeight in 5706, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5706)
    0.036792405 = product of:
      0.07358481 = sum of:
        0.07358481 = weight(_text_:organization in 5706) [ClassicSimilarity], result of:
          0.07358481 = score(doc=5706,freq=6.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.40937364 = fieldWeight in 5706, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=5706)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This paper describes the language technology methods developed in the Danish research project VID to extract from Danish text material relevant information for the population of knowledge organization systems (KOS) within specific corporate domains. The results achieved by applying these methods to a prototype search engine tuned to the patent and trademark domain indicate that the use of human language technology can support the construction of a linguistically based KOS and that linguistic information in search improves recall substantially without harming precision (near 90%). Finally, we describe two research experiments where (1) linguistic analysis of Danish compounds is exploited to improve search strategies on these, and (2) linguistic knowledge is used to model corporate knowledge into a language-based ontology.
Content: Contribution to the special issue "Knowledge organization systems and services"

Mustafa el Hadi, W.: Dynamics of the linguistic paradigm in information retrieval (2000) 0.03

0.025319844 = product of:
  0.05063969 = sum of:
    0.02059882 = weight(_text_:information in 151) [ClassicSimilarity], result of:
      0.02059882 = score(doc=151,freq=8.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.23274569 = fieldWeight in 151, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=151)
    0.030040871 = product of:
      0.060081743 = sum of:
        0.060081743 = weight(_text_:organization in 151) [ClassicSimilarity], result of:
          0.060081743 = score(doc=151,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.33425218 = fieldWeight in 151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=151)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: In this paper we briefly sketch the dynamics of the linguistic paradigm in Information Retrieval (IR) and its adaptation to the Internet. The emergence of Natural Language Processing (NLP) techniques has been a major factor leading to this adaptation. These techniques and tools try to adapt to the current needs, i.e. retrieving information from documents written and indexed in a foreign language by using a native language query to express the information need. This process, known as cross-language IR (CLIR), is a field at the crossroads of both Machine Translation and IR. This field represents a real challenge to the IR community and will require solid cooperation with the NLP community.
Series: Advances in knowledge organization; vol.7
Source: Dynamism and stability in knowledge organization: Proceedings of the 6th International ISKO-Conference, 10-13 July 2000, Toronto, Canada. Ed.: C. Beghtol et al

Bowker, L.: Information retrieval in translation memory systems : assessment of current limitations and possibilities for future development (2002) 0.02

0.024407204 = product of:
  0.04881441 = sum of:
    0.024031956 = weight(_text_:information in 1854) [ClassicSimilarity], result of:
      0.024031956 = score(doc=1854,freq=8.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.27153665 = fieldWeight in 1854, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1854)
    0.024782453 = product of:
      0.049564905 = sum of:
        0.049564905 = weight(_text_:organization in 1854) [ClassicSimilarity], result of:
          0.049564905 = score(doc=1854,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.27574396 = fieldWeight in 1854, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1854)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: A translation memory system is a new type of human language technology (HLT) tool that is gaining popularity among translators. Such tools allow translators to store previously translated texts in a type of aligned bilingual database, and to recycle relevant parts of these texts when producing new translations. Currently, these tools retrieve information from the database using superficial character string matching, which often results in poor precision and recall. This paper explains how translation memory systems work, and it considers some possible ways of introducing more sophisticated information retrieval techniques into such systems by taking syntactic and semantic similarity into account. Some of the suggested techniques are inspired by those used in other areas of HLT, and some by techniques used in information science.
Source: Knowledge organization. 29(2002) nos.3/4, pp.198-203

Sidhom, S.; Hassoun, M.: Morpho-syntactic parsing for a text mining environment : An NP recognition model for knowledge visualization and information retrieval (2002) 0.02

0.023939986 = product of:
  0.04787997 = sum of:
    0.017839102 = weight(_text_:information in 1852) [ClassicSimilarity], result of:
      0.017839102 = score(doc=1852,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.20156369 = fieldWeight in 1852, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1852)
    0.030040871 = product of:
      0.060081743 = sum of:
        0.060081743 = weight(_text_:organization in 1852) [ClassicSimilarity], result of:
          0.060081743 = score(doc=1852,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.33425218 = fieldWeight in 1852, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=1852)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Sidhom and Hassoun discuss the crucial role of NLP tools in Knowledge Extraction and Management as well as in the design of Information Retrieval Systems. The authors focus more specifically on the morpho-syntactic issues by describing their morpho-syntactic analysis platform, which has been implemented to cover the automatic indexing and information retrieval topics. To this end they implemented the Cascaded "Augmented Transition Network (ATN)". They used this formalism in order to analyse French text descriptions of Multimedia documents. An implementation of an ATN parsing automaton is briefly described. The Platform in its logical operation is considered as an investigative tool towards the knowledge organization (based on an NP recognition model) and management of multiform e-documents (text, multimedia, audio, image) using their text descriptions.
Source: Knowledge organization. 29(2002) nos.3/4, pp.171-180

Mustafa el Hadi, W.: Terminology & information retrieval : new tools for new needs. Integration of knowledge across boundaries (2003) 0.02

0.023939986 = product of:
  0.04787997 = sum of:
    0.017839102 = weight(_text_:information in 2688) [ClassicSimilarity], result of:
      0.017839102 = score(doc=2688,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.20156369 = fieldWeight in 2688, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2688)
    0.030040871 = product of:
      0.060081743 = sum of:
        0.060081743 = weight(_text_:organization in 2688) [ClassicSimilarity], result of:
          0.060081743 = score(doc=2688,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.33425218 = fieldWeight in 2688, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=2688)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The radical changes in information and communication techniques at the end of the 20th century have significantly modified the function of terminology and its applications in all forms of communication. The introduction of new media has deeply changed the possibilities of distribution of scientific information. What in this situation is the role of terminology and its practical applications? What is the place for multiple functions of terminology in the communication society? What is the impact of natural language processing (NLP) techniques used in its processing and management? In this article we will focus on the possibilities NLP techniques offer and how they can be directed towards the satisfaction of the newly expressed needs.
Series: Advances in knowledge organization; vol.8
Source: Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas

Mustafa el Hadi, W.: Human language technology and its role in information access and management (2003) 0.02
```
0.02242154 = product of:
  0.04484308 = sum of:
    0.027141329 = weight(_text_:information in 5524) [ClassicSimilarity], result of:
      0.027141329 = score(doc=5524,freq=20.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.30666938 = fieldWeight in 5524, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5524)
    0.017701752 = product of:
      0.035403505 = sum of:
        0.035403505 = weight(_text_:organization in 5524) [ClassicSimilarity], result of:
          0.035403505 = score(doc=5524,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.19695997 = fieldWeight in 5524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: The role of linguistics in information access, extraction and dissemination is essential. Radical changes in the techniques of information and communication at the end of the twentieth century have had a significant effect on the function of the linguistic paradigm and its applications in all forms of communication. The introduction of new technical means has deeply changed the possibilities for the distribution of information. In this situation, what is the role of the linguistic paradigm and its practical applications, i.e., natural language processing (NLP) techniques when applied to information access? What solutions can linguistics offer in human-computer interaction, extraction and management? Many fields show the relevance of the linguistic paradigm through the various technologies that require NLP, such as document and message understanding, information detection, extraction, and retrieval, question and answer, cross-language information retrieval (CLIR), text summarization, filtering, and spoken document retrieval. This paper focuses on the central role of human language technologies in the information society, surveys the current situation, describes the benefits of the above-mentioned applications, outlines successes and challenges, and discusses solutions. It reviews the resources and means needed to advance information access and dissemination across language boundaries in the twenty-first century. Multilingualism, which is a natural result of globalization, requires more effort in the direction of language technology. The scope of human language technology (HLT) is large, so we limit our review to applications that involve multilinguality.
Content: Contribution to the special issue "Knowledge organization and classification in international information retrieval"

Rosemblat, G.; Tse, T.; Gemoets, D.: Adapting a monolingual consumer health system for Spanish cross-language information retrieval (2004) 0.02
```
0.02109987 = product of:
  0.04219974 = sum of:
    0.017165681 = weight(_text_:information in 2673) [ClassicSimilarity], result of:
      0.017165681 = score(doc=2673,freq=8.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.19395474 = fieldWeight in 2673, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2673)
    0.025034059 = product of:
      0.050068118 = sum of:
        0.050068118 = weight(_text_:organization in 2673) [ClassicSimilarity], result of:
          0.050068118 = score(doc=2673,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.27854347 = fieldWeight in 2673, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2673)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: This preliminary study applies a bilingual term list (BTL) approach to cross-language information retrieval (CLIR) in the consumer health domain and compares it to a machine translation (MT) approach. We compiled a Spanish-English BTL of 34,980 medical and general terms. We collected a training set of 466 general health queries from MedlinePlus en español and 488 domain-specific queries from ClinicalTrials.gov translated into Spanish. We submitted the training set queries in English against a test bed of 7,170 ClinicalTrials.gov English documents, and compared MT and BTL against this English monolingual standard. The BTL approach was less effective (F = 0.420) than the MT approach (F = 0.578). A failure analysis of the results led to substitution of BTL dictionary sources and the addition of rudimentary normalisation of plural forms. These changes improved the CLIR effectiveness of the same training set queries (F = 0.474), and yielded comparable results for a test set of 954 new queries (F = 0.484). These results will shape our efforts to support Spanish speakers' needs for consumer health information currently only available in English.
Series: Advances in knowledge organization; vol.9
Source: Knowledge organization and the global information society: Proceedings of the 8th International ISKO Conference 13-16 July 2004, London, UK. Ed.: I.C. McIlwaine

Martínez, F.; Martín, M.T.; Rivas, V.M.; Díaz, M.C.; Ureña, L.A.: Using neural networks for multiword recognition in IR (2003) 0.02

0.020170141 = product of:
  0.040340282 = sum of:
    0.01029941 = weight(_text_:information in 2777) [ClassicSimilarity], result of:
      0.01029941 = score(doc=2777,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.116372846 = fieldWeight in 2777, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2777)
    0.030040871 = product of:
      0.060081743 = sum of:
        0.060081743 = weight(_text_:organization in 2777) [ClassicSimilarity], result of:
          0.060081743 = score(doc=2777,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.33425218 = fieldWeight in 2777, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=2777)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: In this paper, a supervised neural network has been used to classify pairs of terms as being multiwords or non-multiwords. Classification is based on the values yielded by different estimators, currently available in the literature, used as inputs for the neural network. Lists of multiwords and non-multiwords have been built to train the net. Afterward, many other pairs of terms have been classified using the trained net. Results obtained in this classification have been used to perform information retrieval tasks. Experiments show that detecting multiwords results in better performance of the IR methods.
Series: Advances in knowledge organization; vol.8
Source: Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
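
The score explanation for the entry above (doc 2777) can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names `idf` and `term_score` are illustrative only, and `queryNorm` and `fieldNorm` are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation tree
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explanation for doc 2777
QUERY_NORM = 0.050415643
MAX_DOCS = 44218
FIELD_NORM = 0.046875

information = term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "organization" clause sits in a nested query with coord(1/2)
organization = term_score(4.0, 3399, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5
# Two of four top-level clauses matched: coord(2/4)
total = (information + organization) * 0.5

print(round(total, 6))
```

The result should agree with the listed score of 0.020170141 up to rounding, since Lucene computes these values in single precision while Python uses double precision.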

Peis, E.; Herrera-Viedma, E.; Herrera, J.C.: On the evaluation of XML documents using Fuzzy linguistic techniques (2003) 0.02

0.020170141 = product of:
  0.040340282 = sum of:
    0.01029941 = weight(_text_:information in 2778) [ClassicSimilarity], result of:
      0.01029941 = score(doc=2778,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.116372846 = fieldWeight in 2778, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2778)
    0.030040871 = product of:
      0.060081743 = sum of:
        0.060081743 = weight(_text_:organization in 2778) [ClassicSimilarity], result of:
          0.060081743 = score(doc=2778,freq=4.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.33425218 = fieldWeight in 2778, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.046875 = fieldNorm(doc=2778)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Recommender systems evaluate and filter the great amount of information available on the Web to assist people in their search processes. A fuzzy evaluation method of XML documents based on computing with words is presented. Given an XML document type (e.g. scientific article), we consider that its elements are not equally informative. This is indicated by the use of a DTD and defining linguistic importance attributes to the more meaningful elements of the DTD designed. Then, the evaluation method generates linguistic recommendations from linguistic evaluation judgements provided by different recommenders on meaningful elements of DTD.
Series: Advances in knowledge organization; vol.8
Source: Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas

Jones, I.; Cunliffe, D.; Tudhope, D.: Natural language processing and knowledge organization systems as an aid to retrieval (2004) 0.02
```
0.019861802 = product of:
  0.039723605 = sum of:
    0.012015978 = weight(_text_:information in 2677) [ClassicSimilarity], result of:
      0.012015978 = score(doc=2677,freq=8.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.13576832 = fieldWeight in 2677, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.02734375 = fieldNorm(doc=2677)
    0.027707627 = product of:
      0.055415254 = sum of:
        0.055415254 = weight(_text_:organization in 2677) [ClassicSimilarity], result of:
          0.055415254 = score(doc=2677,freq=10.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.30829114 = fieldWeight in 2677, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2677)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

This paper discusses research that employs methods from Natural Language Processing (NLP) in exploiting the intellectual resources of Knowledge Organization Systems (KOS), particularly in the retrieval of information. A technique for the disambiguation of homographs and nominal compounds in free text, where these are known ambiguous terms in the KOS itself, is described. The use of Roget's Thesaurus as an intermediary in the process is also reported. A short review of the relevant literature in the field is given. Design considerations, results and conclusions are presented from the implementation of a prototype system. The linguistic techniques are applied at two complementary levels, namely on a free text string used as an entry point to the KOS, and on the underlying controlled vocabulary itself.

Content

1. Introduction The need for research into the application of linguistic techniques in Information Retrieval (IR) in general, and a similar need in faceted Knowledge Organization Systems (KOS) has been indicated by various authors. Smeaton (1997) points out the inherent limitations of conventional approaches to IR based on "bags of words", mainly difficulties caused by lexical ambiguity in the words concerned, and goes on to suggest the possibility of using Natural Language Processing (NLP) in query formulation. Past experience with a faceted retrieval system highlighted the need for integrating the linguistic perspective in order to fully utilise the potential of a KOS (Tudhope et al., 2002). The present research seeks to address some of these needs in using NLP to improve the efficacy of KOS tools in query and retrieval systems. Syntactic parsing and part-of-speech tagging can substantially reduce lexical ambiguity through homograph disambiguation. Given the two strings "I table the motion" and "I put the motion on the table", for instance, the parser used in this research clearly indicates that 'table' in the first string is a verb, while 'table' in the second string is a noun, a distinction that would be missed in the "bag of words" approach. This syntactic disambiguation enables a more precise matching from free text to the controlled vocabulary of a KOS and vice versa. The use of a general linguistic resource, namely Roget's Thesaurus of English Words and Phrases (RTEWP), as an intermediary in this process, is investigated. The adaptation of the Link parser (Sleator & Temperley, 1993) to the purposes of the research is reported. The design and implementation of the early practical stages of the project are described, and the results of the initial experiments are presented and evaluated. Applications of the techniques developed are foreseen in the areas of query disambiguation, information retrieval and automatic indexing.
In the first section of the paper a brief review of the literature and relevant current work in the field is presented. The second section includes reports on the development of algorithms, the construction of data sets and theoretical and experimental work undertaken to date. The third section evaluates the results obtained, and outlines directions for future research.
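
The "bag of words" limitation mentioned in the introduction above can be illustrated with a small sketch. This is not the authors' system (which relies on the Link parser); it only shows, under the assumption of naive whitespace tokenisation, that both example strings yield the same token for "table" with no trace of its part of speech. The helper name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; word order and syntactic role are discarded.
    return Counter(text.lower().split())

verb_use = bag_of_words("I table the motion")             # 'table' used as a verb
noun_use = bag_of_words("I put the motion on the table")  # 'table' used as a noun

# Both bags hold the identical token "table"; nothing distinguishes verb from noun.
shared = sorted(set(verb_use) & set(noun_use))
print(shared)
```

A retrieval system matching on these bags alone would treat the two occurrences of "table" as equivalent, which is the ambiguity that syntactic parsing is meant to resolve.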

Series

Advances in knowledge organization; vol.9

Source

Knowledge organization and the global information society: Proceedings of the 8th International ISKO Conference 13-16 July 2004, London, UK. Ed.: I.C. McIlwaine

Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02

0.019165486 = product of:
  0.038330972 = sum of:
    0.017839102 = weight(_text_:information in 4436) [ClassicSimilarity], result of:
      0.017839102 = score(doc=4436,freq=6.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.20156369 = fieldWeight in 4436, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4436)
    0.02049187 = product of:
      0.04098374 = sum of:
        0.04098374 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
          0.04098374 = score(doc=4436,freq=2.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.23214069 = fieldWeight in 4436, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation.
Date: 16. 2.2000 14:22:39
Source: Journal of the American Society for Information Science. 51(2000) no.3, S.281-296

Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.02

0.018399216 = product of:
  0.036798432 = sum of:
    0.012015978 = weight(_text_:information in 1851) [ClassicSimilarity], result of:
      0.012015978 = score(doc=1851,freq=2.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.13576832 = fieldWeight in 1851, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1851)
    0.024782453 = product of:
      0.049564905 = sum of:
        0.049564905 = weight(_text_:organization in 1851) [ClassicSimilarity], result of:
          0.049564905 = score(doc=1851,freq=2.0), product of:
            0.17974974 = queryWeight, product of:
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.050415643 = queryNorm
            0.27574396 = fieldWeight in 1851, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5653565 = idf(docFreq=3399, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1851)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries for construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. As well, a tool is designed to help assessors judge relevance and gather the events of relevance judgment. The log file created by this tool will be used to analyze the behaviors of assessors in the future.
Source: Knowledge organization. 29(2002) nos.3/4, S.156-170

Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.02
```
0.018143935 = product of:
  0.03628787 = sum of:
    0.01213797 = weight(_text_:information in 2541) [ClassicSimilarity], result of:
      0.01213797 = score(doc=2541,freq=4.0), product of:
        0.08850355 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.050415643 = queryNorm
        0.13714671 = fieldWeight in 2541, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2541)
    0.0241499 = product of:
      0.0482998 = sum of:
        0.0482998 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
          0.0482998 = score(doc=2541,freq=4.0), product of:
            0.17654699 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050415643 = queryNorm
            0.27358043 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.

Date

14. 8.2004 17:22:56

Source

Online. 28(2004) no.3, S.22-29
