Search (169 results, page 1 of 9)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07

0.068927675 = product of:
  0.10339151 = sum of:
    0.08232375 = product of:
      0.24697125 = sum of:
        0.24697125 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.24697125 = score(doc=562,freq=2.0), product of:
            0.43943653 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0518325 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.021067765 = product of:
      0.04213553 = sum of:
        0.04213553 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04213553 = score(doc=562,freq=2.0), product of:
            0.18150859 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0518325 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
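
The score explanation for this record can be recomputed by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity relations tf = sqrt(freq), queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and assuming the idf variant 1 + ln(maxDocs / (docFreq + 1)), which is the form that reproduces the idf values printed here. The function names are illustrative and not part of any library.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it reproduces 8.478011 for docFreq=24, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # score = queryWeight * fieldWeight, as in the explanation tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 562.
MAX_DOCS = 44218
QUERY_NORM = 0.0518325
FIELD_NORM = 0.046875

score_3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Apply the coord factors in the same order as the tree:
# coord(1/3) and coord(1/2) on the inner clauses, coord(2/3) on the outer one.
total = (score_3a * (1 / 3) + score_22 * (1 / 2)) * (2 / 3)

print(f"3a: {score_3a:.8f}  22: {score_22:.8f}  total: {total:.8f}")
```

Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic while this sketch uses Python floats.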

Perez-Carballo, J.; Strzalkowski, T.: Natural language information retrieval : progress report (2000) 0.05

0.05365639 = product of:
  0.080484584 = sum of:
    0.034941453 = weight(_text_:information in 6421) [ClassicSimilarity], result of:
      0.034941453 = score(doc=6421,freq=4.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.3840108 = fieldWeight in 6421, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.109375 = fieldNorm(doc=6421)
    0.045543127 = product of:
      0.09108625 = sum of:
        0.09108625 = weight(_text_:management in 6421) [ClassicSimilarity], result of:
          0.09108625 = score(doc=6421,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.521365 = fieldWeight in 6421, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.109375 = fieldNorm(doc=6421)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Source: Information processing and management. 36(2000) no.1, S.155-205

0.046833646 = product of:
  0.07025047 = sum of:
    0.02470734 = weight(_text_:information in 4844) [ClassicSimilarity], result of:
      0.02470734 = score(doc=4844,freq=2.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.27153665 = fieldWeight in 4844, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.109375 = fieldNorm(doc=4844)
    0.045543127 = product of:
      0.09108625 = sum of:
        0.09108625 = weight(_text_:management in 4844) [ClassicSimilarity], result of:
          0.09108625 = score(doc=4844,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.521365 = fieldWeight in 4844, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.109375 = fieldNorm(doc=4844)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Source: Information processing and management. 36(2000) no.5, S.717-736
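
The explanation trees in this listing are indented by two spaces per level, so they can be checked mechanically. The following is a rough sketch, assuming that layout holds throughout; `parse_explain`, `children` and `check` are hypothetical helper names, and the sample is an excerpt of the explanation for doc 6421 with the deeper levels omitted.

```python
import math
import re

# One explanation line: indentation, numeric value, " = ", description.
LINE = re.compile(r"^( *)([0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?) = (.*)$")

SAMPLE = """\
0.045543127 = product of:
  0.09108625 = sum of:
    0.09108625 = weight(_text_:management in 6421) [ClassicSimilarity], result of:
  0.5 = coord(1/2)
"""

def parse_explain(text):
    """Return (depth, value, description) for each explanation line."""
    rows = []
    for raw in text.splitlines():
        m = LINE.match(raw)
        if m:
            rows.append((len(m.group(1)) // 2, float(m.group(2)), m.group(3)))
    return rows

def children(rows, i):
    """Values of the direct children of row i."""
    depth = rows[i][0]
    values = []
    for d, v, _ in rows[i + 1:]:
        if d <= depth:          # left the subtree of row i
            break
        if d == depth + 1:      # direct child only
            values.append(v)
    return values

def check(rows, rel_tol=1e-5):
    """Compare each 'product of:' / 'sum of:' row with its children."""
    results = []
    for i, (_, value, desc) in enumerate(rows):
        kids = children(rows, i)
        if desc.endswith("product of:") and kids:
            results.append(math.isclose(value, math.prod(kids), rel_tol=rel_tol))
        elif desc.endswith("sum of:") and kids:
            results.append(math.isclose(value, sum(kids), rel_tol=rel_tol))
    return results

rows = parse_explain(SAMPLE)
print(check(rows))
```

The relative tolerance is deliberately loose because the printed values are rounded.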

Kettunen, K.: Reductive and generative approaches to management of morphological variation of keywords in monolingual information retrieval : an overview (2009) 0.04
```
0.044100516 = product of:
  0.06615077 = sum of:
    0.018340444 = weight(_text_:information in 2835) [ClassicSimilarity], result of:
      0.018340444 = score(doc=2835,freq=6.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.20156369 = fieldWeight in 2835, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2835)
    0.047810324 = product of:
      0.09562065 = sum of:
        0.09562065 = weight(_text_:management in 2835) [ClassicSimilarity], result of:
          0.09562065 = score(doc=2835,freq=12.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.54731923 = fieldWeight in 2835, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2835)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Purpose - The purpose of this article is to discuss advantages and disadvantages of various means to manage morphological variation of keywords in monolingual information retrieval. Design/methodology/approach - The authors present a compilation of query results from 11 mostly European languages and a new general classification of the language dependent techniques for management of morphological variation. Variants of the different techniques are compared in some detail in terms of retrieval effectiveness and other criteria. The paper consists mainly of an overview of different management methods for keyword variation in information retrieval. Typical IR retrieval results of 11 languages and a new classification for keyword management methods are also presented. Findings - The main results of the paper are an overall comparison of reductive and generative keyword management methods in terms of retrieval effectiveness and other broader criteria. Originality/value - The paper is of value to anyone who wants to get an overall picture of keyword management techniques used in IR.
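
In the explanation for this record, "management" occurs 12 times, yet its fieldWeight (0.54731923) is only slightly higher than that of doc 6421, where the term occurs twice (0.521365). The difference comes from fieldNorm. The following sketch recomputes both values from the numbers in the listing, assuming fieldWeight = sqrt(freq) * idf * fieldNorm; the reading that a smaller fieldNorm indicates a longer field holds only if no index-time boosts were applied, which the listing does not state.

```python
import math

IDF_MANAGEMENT = 3.3706124  # idf(docFreq=4130, maxDocs=44218) from the listing

def field_weight(freq: float, idf: float, field_norm: float) -> float:
    # tf grows with the square root of the term frequency.
    return math.sqrt(freq) * idf * field_norm

# doc 2835: 12 occurrences, fieldNorm 0.046875
doc_2835 = field_weight(12.0, IDF_MANAGEMENT, 0.046875)
# doc 6421: 2 occurrences, fieldNorm 0.109375
doc_6421 = field_weight(2.0, IDF_MANAGEMENT, 0.109375)

print(f"doc 2835: {doc_2835:.8f}")
print(f"doc 6421: {doc_6421:.8f}")
print(f"ratio:    {doc_2835 / doc_6421:.3f}")
```

Six times as many occurrences raise tf by a factor of sqrt(6), about 2.45, but the smaller fieldNorm cancels most of that gain.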

Mustafa El Hadi, W.: Evaluating human language technology : general applications to information access and management (2002) 0.04

0.040143125 = product of:
  0.060214683 = sum of:
    0.02117772 = weight(_text_:information in 1840) [ClassicSimilarity], result of:
      0.02117772 = score(doc=1840,freq=2.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.23274569 = fieldWeight in 1840, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.09375 = fieldNorm(doc=1840)
    0.039036963 = product of:
      0.078073926 = sum of:
        0.078073926 = weight(_text_:management in 1840) [ClassicSimilarity], result of:
          0.078073926 = score(doc=1840,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.44688427 = fieldWeight in 1840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.09375 = fieldNorm(doc=1840)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Paolillo, J.C.: Linguistics and the information sciences (2009) 0.04

0.038175866 = product of:
  0.057263795 = sum of:
    0.032684736 = weight(_text_:information in 3840) [ClassicSimilarity], result of:
      0.032684736 = score(doc=3840,freq=14.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.3592092 = fieldWeight in 3840, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3840)
    0.02457906 = product of:
      0.04915812 = sum of:
        0.04915812 = weight(_text_:22 in 3840) [ClassicSimilarity], result of:
          0.04915812 = score(doc=3840,freq=2.0), product of:
            0.18150859 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0518325 = queryNorm
            0.2708308 = fieldWeight in 3840, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3840)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Linguistics is the scientific study of language which emphasizes language spoken in everyday settings by human beings. It has a long history of interdisciplinarity, both internally and in contribution to other fields, including information science. A linguistic perspective is beneficial in many ways in information science, since it examines the relationship between the forms of meaningful expressions and their social, cognitive, institutional, and communicative context, these being two perspectives on information that are actively studied, to different degrees, in information science. Examples of issues relevant to information science are presented for which the approach taken under a linguistic perspective is illustrated.
Date: 27. 8.2011 14:22:33
Source: Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates

Mustafa el Hadi, W.: Human language technology and its role in information access and management (2003) 0.03
```
0.0339379 = product of:
  0.050906852 = sum of:
    0.027904097 = weight(_text_:information in 5524) [ClassicSimilarity], result of:
      0.027904097 = score(doc=5524,freq=20.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.30666938 = fieldWeight in 5524, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5524)
    0.023002753 = product of:
      0.046005506 = sum of:
        0.046005506 = weight(_text_:management in 5524) [ClassicSimilarity], result of:
          0.046005506 = score(doc=5524,freq=4.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.2633291 = fieldWeight in 5524, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5524)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The role of linguistics in information access, extraction and dissemination is essential. Radical changes in the techniques of information and communication at the end of the twentieth century have had a significant effect on the function of the linguistic paradigm and its applications in all forms of communication. The introduction of new technical means has deeply changed the possibilities for the distribution of information. In this situation, what is the role of the linguistic paradigm and its practical applications, i.e., natural language processing (NLP) techniques when applied to information access? What solutions can linguistics offer in human computer interaction, extraction and management? Many fields show the relevance of the linguistic paradigm through the various technologies that require NLP, such as document and message understanding, information detection, extraction, and retrieval, question and answer, cross-language information retrieval (CLIR), text summarization, filtering, and spoken document retrieval. This paper focuses on the central role of human language technologies in the information society, surveys the current situation, describes the benefits of the above mentioned applications, outlines successes and challenges, and discusses solutions. It reviews the resources and means needed to advance information access and dissemination across language boundaries in the twenty-first century. Multilingualism, which is a natural result of globalization, requires more effort in the direction of language technology. The scope of human language technology (HLT) is large, so we limit our review to applications that involve multilinguality.

Content

Contribution to a special issue on "Knowledge organization and classification in international information retrieval"

Sidhom, S.; Hassoun, M.: Morpho-syntactic parsing for a text mining environment : An NP recognition model for knowledge visualization and information retrieval (2002) 0.03

0.030629165 = product of:
  0.04594375 = sum of:
    0.018340444 = weight(_text_:information in 1852) [ClassicSimilarity], result of:
      0.018340444 = score(doc=1852,freq=6.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.20156369 = fieldWeight in 1852, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1852)
    0.027603304 = product of:
      0.055206608 = sum of:
        0.055206608 = weight(_text_:management in 1852) [ClassicSimilarity], result of:
          0.055206608 = score(doc=1852,freq=4.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.31599492 = fieldWeight in 1852, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=1852)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Sidhom and Hassoun discuss the crucial role of NLP tools in Knowledge Extraction and Management as well as in the design of Information Retrieval Systems. The authors focus more specifically on the morpho-syntactic issues by describing their morpho-syntactic analysis platform, which has been implemented to cover the automatic indexing and information retrieval topics. To this end they implemented the Cascaded "Augmented Transition Network (ATN)". They used this formalism in order to analyse French text descriptions of Multimedia documents. An implementation of an ATN parsing automaton is briefly described. The Platform in its logical operation is considered as an investigative tool towards the knowledge organization (based on an NP recognition model) and management of multiform e-documents (text, multimedia, audio, image) using their text descriptions.

Sprachtechnologie für eine dynamische Wirtschaft im Medienzeitalter - Language technologies for dynamic business in the age of the media - L'ingénierie linguistique au service de la dynamisation économique à l'ère du multimédia : Tagungsakten der XXVI. Jahrestagung der Internationalen Vereinigung Sprache und Wirtschaft e.V., 23.-25.11.2000 Fachhochschule Köln (2000) 0.03
```
0.028970806 = product of:
  0.043456208 = sum of:
    0.015283704 = weight(_text_:information in 5527) [ClassicSimilarity], result of:
      0.015283704 = score(doc=5527,freq=6.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.16796975 = fieldWeight in 5527, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5527)
    0.028172504 = product of:
      0.05634501 = sum of:
        0.05634501 = weight(_text_:management in 5527) [ClassicSimilarity], result of:
          0.05634501 = score(doc=5527,freq=6.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.32251096 = fieldWeight in 5527, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5527)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Content

Contains the contributions: WRIGHT, S.E.: Leveraging terminology resources across application boundaries: accessing resources in future integrated environments; PALME, K.: E-Commerce: Verhindert Sprache Business-to-business?; RÜEGGER, R.: Die qualität der virtuellen Information als Wettbewerbsvorteil: Information im Internet ist Sprache - noch; SCHIRMER, K. and J. HALLER: Zugang zu mehrsprachigen Nachrichten im Internet; WEISS, A. and W. WIEDEN: Die Herstellung mehrsprachiger Informations- und Wissensressourcen in Unternehmen; FULFORD, H.: Monolingual or multilingual web sites? An exploratory study of UK SMEs; SCHMIDTKE-NIKELLA, M.: Effiziente Hypermediaentwicklung: Die Autorenentlastung durch eine Engine; SCHMIDT, R.: Maschinelle Text-Ton-Synchronisation in Wissenschaft und Wirtschaft; HELBIG, H. et al.: Natürlichsprachlicher Zugang zu Informationsanbietern im Internet und zu lokalen Datenbanken; SIENEL, J. et al.: Sprachtechnologien für die Informationsgesellschaft des 21. Jahrhunderts; ERBACH, G.: Sprachdialogsysteme für Telefondienste: Stand der Technik und zukünftige Entwicklungen; SUSEN, A.: Spracherkennung: Aktuelle Einsatzmöglichkeiten im Bereich der Telekommunikation; BENZMÜLLER, R.: Logox WebSpeech: die neue Technologie für sprechende Internetseiten; JAARANEN, K. et al.: Webtran tools for in-company language support; SCHMITZ, K.-D.: Projektforschung und Infrastrukturen im Bereich der Terminologie: Wie kann die Wirtschaft davon profitieren?; SCHRÖTER, F. and U. MEYER: Entwicklung sprachlicher Handlungskompetenz in englisch mit hilfe eines Multimedia-Sprachlernsystems; KLEIN, A.: Der Einsatz von Sprachverarbeitungstools beim Sprachenlernen im Intranet; HAUER, M.: Knowledge Management braucht Terminologie Management; HEYER, G. et al.: Texttechnologische Anwendungen am Beispiel Text Mining

Theme

Information Resources Management

Santana Suárez, O.; Carreras Riudavets, F.J.; Hernández Figueroa, Z.; González Cabrera, A.C.: Integration of an XML electronic dictionary with linguistic tools for natural language processing (2007) 0.03
```
0.028797261 = product of:
  0.043195892 = sum of:
    0.02367741 = weight(_text_:information in 921) [ClassicSimilarity], result of:
      0.02367741 = score(doc=921,freq=10.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.2602176 = fieldWeight in 921, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=921)
    0.019518482 = product of:
      0.039036963 = sum of:
        0.039036963 = weight(_text_:management in 921) [ClassicSimilarity], result of:
          0.039036963 = score(doc=921,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.22344214 = fieldWeight in 921, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=921)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

This study proposes the codification of lexical information in electronic dictionaries, in accordance with a generic and extendable XML scheme model, and its conjunction with linguistic tools for the processing of natural language. Our approach is different from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. The use of XML as a container for the information allows the use of other XML tools for carrying out searches or for enabling presentation of the information in different resources. This model is particularly important as it combines two parallel paradigms (extendable labelling of documents and computational linguistics), and it is also applicable to other languages. We have included a comparison with the labelling proposal of printed dictionaries carried out by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145 000 accepted meanings.

Source

Information processing and management. 43(2007) no.4, S.946-957

Carter-Sigglow, J.: Die Rolle der Sprache bei der Informationsvermittlung (2001) 0.03

0.028385475 = product of:
  0.042578213 = sum of:
    0.014974909 = weight(_text_:information in 5882) [ClassicSimilarity], result of:
      0.014974909 = score(doc=5882,freq=4.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.16457605 = fieldWeight in 5882, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5882)
    0.027603304 = product of:
      0.055206608 = sum of:
        0.055206608 = weight(_text_:management in 5882) [ClassicSimilarity], result of:
          0.055206608 = score(doc=5882,freq=4.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.31599492 = fieldWeight in 5882, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=5882)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Source: Information Research & Content Management: Orientierung, Ordnung und Organisation im Wissensmarkt; 23. DGI-Online-Tagung der DGI und 53. Jahrestagung der Deutschen Gesellschaft für Informationswissenschaft und Informationspraxis e.V. DGI, Frankfurt am Main, 8.-10.5.2001. Proceedings. Ed.: R. Schmidt
Theme: Information Resources Management

Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.03
```
0.027130803 = product of:
  0.040696204 = sum of:
    0.02117772 = weight(_text_:information in 2030) [ClassicSimilarity], result of:
      0.02117772 = score(doc=2030,freq=8.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.23274569 = fieldWeight in 2030, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2030)
    0.019518482 = product of:
      0.039036963 = sum of:
        0.039036963 = weight(_text_:management in 2030) [ClassicSimilarity], result of:
          0.039036963 = score(doc=2030,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.22344214 = fieldWeight in 2030, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2030)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.

Source

Information processing and management. 44(2008) no.1, S.181-211

Vilares, J.; Alonso, M.A.; Vilares, M.: Extraction of complex index terms in non-English IR : a shallow parsing based approach (2008) 0.03
```
0.027100569 = product of:
  0.040650852 = sum of:
    0.017648099 = weight(_text_:information in 2107) [ClassicSimilarity], result of:
      0.017648099 = score(doc=2107,freq=8.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.19395474 = fieldWeight in 2107, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2107)
    0.023002753 = product of:
      0.046005506 = sum of:
        0.046005506 = weight(_text_:management in 2107) [ClassicSimilarity], result of:
          0.046005506 = score(doc=2107,freq=4.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.2633291 = fieldWeight in 2107, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2107)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has been also studied, as has the restriction of the dependencies applied to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements with regard to classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions of storage and management costs can be achieved, albeit at the expense of a slight reduction in performance.

Source

Information processing and management. 44(2008) no.4, S.1517-1537

Bacchin, M.; Ferro, N.; Melucci, M.: A probabilistic model for stemmer generation (2005) 0.03

0.026828196 = product of:
  0.040242292 = sum of:
    0.017470727 = weight(_text_:information in 1001) [ClassicSimilarity], result of:
      0.017470727 = score(doc=1001,freq=4.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.1920054 = fieldWeight in 1001, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1001)
    0.022771563 = product of:
      0.045543127 = sum of:
        0.045543127 = weight(_text_:management in 1001) [ClassicSimilarity], result of:
          0.045543127 = score(doc=1001,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.2606825 = fieldWeight in 1001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1001)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: In this paper we present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems; however, designing and implementing stemmers requires considerable effort because documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers used for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.
Source: Information processing and management. 41(2005) no.1, S.121-137

Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.03

0.026762083 = product of:
  0.040143125 = sum of:
    0.01411848 = weight(_text_:information in 2585) [ClassicSimilarity], result of:
      0.01411848 = score(doc=2585,freq=2.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.1551638 = fieldWeight in 2585, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=2585)
    0.026024643 = product of:
      0.052049287 = sum of:
        0.052049287 = weight(_text_:management in 2585) [ClassicSimilarity], result of:
          0.052049287 = score(doc=2585,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.29792285 = fieldWeight in 2585, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0625 = fieldNorm(doc=2585)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Source: Information processing and management. 38(2002) no.4, S.547-558
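The score tree for this record (doc 2585) can be recomputed by hand. The following is a minimal sketch, assuming the tree follows Lucene's ClassicSimilarity conventions, which the printed numbers are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names `idf` and `term_score` are hypothetical helpers, not part of any library; the constants are copied from the tree above.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as printed in the tree (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight for a single term in a single field.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 2585.
MAX_DOCS = 44218
QUERY_NORM = 0.0518325
FIELD_NORM = 0.0625

information = term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
management = term_score(2.0, 4130, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "management" clause sits in a nested query with coord(1/2);
# the outer query then applies coord(2/3) to the sum of both clauses.
total = (information + 0.5 * management) * (2.0 / 3.0)

print(f"information={information:.8f} management={management:.8f} total={total:.8f}")
```

The results should agree with the tree to roughly six decimal places; small differences in the last digits are expected because the tree was presumably computed in single precision.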

Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.03

0.026272139 = product of:
  0.039408207 = sum of:
    0.018340444 = weight(_text_:information in 4436) [ClassicSimilarity], result of:
      0.018340444 = score(doc=4436,freq=6.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.20156369 = fieldWeight in 4436, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4436)
    0.021067765 = product of:
      0.04213553 = sum of:
        0.04213553 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
          0.04213553 = score(doc=4436,freq=2.0), product of:
            0.18150859 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0518325 = queryNorm
            0.23214069 = fieldWeight in 4436, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for a quantitative study of online and real-time Web page translation.
Date: 16. 2.2000 14:22:39
Source: Journal of the American Society for Information Science. 51(2000) no.3, S.281-296

Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.03
```
0.025253214 = product of:
  0.03787982 = sum of:
    0.02161442 = weight(_text_:information in 6029) [ClassicSimilarity], result of:
      0.02161442 = score(doc=6029,freq=12.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.23754507 = fieldWeight in 6029, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6029)
    0.016265402 = product of:
      0.032530803 = sum of:
        0.032530803 = weight(_text_:management in 6029) [ClassicSimilarity], result of:
          0.032530803 = score(doc=6029,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.18620178 = fieldWeight in 6029, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6029)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications.
Source: Journal of the American Society for Information Science and Technology. 52(2001) no.9, S.748-762

Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.03
```
0.025253214 = product of:
  0.03787982 = sum of:
    0.02161442 = weight(_text_:information in 4215) [ClassicSimilarity], result of:
      0.02161442 = score(doc=4215,freq=12.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.23754507 = fieldWeight in 4215, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4215)
    0.016265402 = product of:
      0.032530803 = sum of:
        0.032530803 = weight(_text_:management in 4215) [ClassicSimilarity], result of:
          0.032530803 = score(doc=4215,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.18620178 = fieldWeight in 4215, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4215)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalents in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval.
Source: Information processing and management. 45(2009) no.2, S.246-262

Mustafa el Hadi, W.: Terminology & information retrieval : new tools for new needs. Integration of knowledge across boundaries (2003) 0.03

0.025239285 = product of:
  0.037858926 = sum of:
    0.018340444 = weight(_text_:information in 2688) [ClassicSimilarity], result of:
      0.018340444 = score(doc=2688,freq=6.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.20156369 = fieldWeight in 2688, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2688)
    0.019518482 = product of:
      0.039036963 = sum of:
        0.039036963 = weight(_text_:management in 2688) [ClassicSimilarity], result of:
          0.039036963 = score(doc=2688,freq=2.0), product of:
            0.17470726 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0518325 = queryNorm
            0.22344214 = fieldWeight in 2688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2688)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The radical changes in information and communication techniques at the end of the 20th century have significantly modified the function of terminology and its applications in all forms of communication. The introduction of new media has deeply changed the possibilities for distributing scientific information. What, in this situation, is the role of terminology and its practical applications? What is the place for the multiple functions of terminology in the communication society? What is the impact of natural language processing (NLP) techniques used in its processing and management? In this article we will focus on the possibilities NLP techniques offer and how they can be directed towards the satisfaction of the newly expressed needs.

Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.02
```
0.024871793 = product of:
  0.037307687 = sum of:
    0.01247909 = weight(_text_:information in 2541) [ClassicSimilarity], result of:
      0.01247909 = score(doc=2541,freq=4.0), product of:
        0.09099081 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0518325 = queryNorm
        0.13714671 = fieldWeight in 2541, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2541)
    0.024828598 = product of:
      0.049657196 = sum of:
        0.049657196 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
          0.049657196 = score(doc=2541,freq=4.0), product of:
            0.18150859 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0518325 = queryNorm
            0.27358043 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract: The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or web-based interfaces, the dictionaries and other computer components must have fast response, and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language Systems (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
Date: 14. 8.2004 17:22:56
Source: Online. 28(2004) no.3, S.22-29
