Search (45 results, page 1 of 3)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23
```
0.22834091 = product of:
  0.45668182 = sum of:
    0.06293926 = product of:
      0.18881777 = sum of:
        0.18881777 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.18881777 = score(doc=562,freq=2.0), product of:
            0.3359639 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03962768 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.18881777 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.18881777 = score(doc=562,freq=2.0), product of:
        0.3359639 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.03962768 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.18881777 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.18881777 = score(doc=562,freq=2.0), product of:
        0.3359639 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.03962768 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.01610701 = product of:
      0.03221402 = sum of:
        0.03221402 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.03221402 = score(doc=562,freq=2.0), product of:
            0.13876937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03962768 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.5 = coord(4/8)
```
Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 2013-01-08 10:22:32
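Every explain tree in this listing follows the same classic Lucene TF-IDF arithmetic: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and each nested or top-level sum is scaled by coord(matched/total). The sketch below rebuilds the score of document 562 from those pieces. The helper names are hypothetical, the query boost is assumed to be 1, and queryNorm and fieldNorm are copied from the output rather than recomputed, because they depend on the full query and the encoded field length.

```python
import math

# Hypothetical helpers reproducing classic Lucene TF-IDF (ClassicSimilarity)
# arithmetic as reported in the explain output; boosts are assumed to be 1.

def idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    """Classic Lucene tf: square root of the raw term frequency."""
    return math.sqrt(freq)

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one matching term clause: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

def coord(overlap: int, max_overlap: int) -> float:
    """Coordination factor: fraction of the query clauses that matched."""
    return overlap / max_overlap

QUERY_NORM = 0.03962768  # copied from the explain output, not recomputed
MAX_DOCS = 44218
FIELD_NORM_562 = 0.046875

w_3a = term_weight(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM_562)
w_2f = term_weight(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM_562)
w_22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM_562)

# Nested boolean clauses (3a: 1 of 3 matched, 22: 1 of 2 matched) are scaled
# by their own coord; the 2f clause occurs twice in the query and counts twice.
total = w_3a * coord(1, 3) + w_2f + w_2f + w_22 * coord(1, 2)
score = total * coord(4, 8)

print(f"w_3a={w_3a:.8f} score={score:.8f}")
```

Summing 0.06293926 + 2 × 0.18881777 + 0.01610701 gives the printed 0.45668182, so the repeated `2f` clause is a genuine second query clause rather than a duplicated line.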

Semantic role universals and argument linking : theoretical, typological, and psycholinguistic perspectives (2006) 0.03
```
0.031122763 = product of:
  0.08299404 = sum of:
    0.033850174 = weight(_text_:case in 3670) [ClassicSimilarity], result of:
      0.033850174 = score(doc=3670,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.1942959 = fieldWeight in 3670, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03125 = fieldNorm(doc=3670)
    0.027884906 = weight(_text_:studies in 3670) [ClassicSimilarity], result of:
      0.027884906 = score(doc=3670,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.17634688 = fieldWeight in 3670, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03125 = fieldNorm(doc=3670)
    0.02125896 = product of:
      0.04251792 = sum of:
        0.04251792 = weight(_text_:area in 3670) [ClassicSimilarity], result of:
          0.04251792 = score(doc=3670,freq=2.0), product of:
            0.1952553 = queryWeight, product of:
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03962768 = queryNorm
            0.21775553 = fieldWeight in 3670, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03125 = fieldNorm(doc=3670)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
```
Abstract

The concept of semantic roles has been central to linguistic theory for many decades. More specifically, the assumption of such representations as mediators in the correspondence between a linguistic form and its associated meaning has helped to address a number of critical issues related to grammatical phenomena. Furthermore, in addition to featuring in all major theories of grammar, semantic (or 'thematic') roles have been referred to extensively within a wide range of other linguistic subdisciplines, including language typology and psycho-/neurolinguistics. This volume brings together insights from these different perspectives and thereby, for the first time, seeks to build upon the obvious potential for cross-fertilisation between hitherto autonomous approaches to a common theme. To this end, a view on semantic roles is adopted that goes beyond the mere assumption of generalised roles and also focuses on their hierarchical organisation. The book is thus centred around the interdisciplinary examination of how these hierarchical dependencies subserve argument linking - both in terms of linguistic theory and with respect to real-time language processing - and how they interact with other information types in this process. Furthermore, the contributions examine the interaction between the role hierarchy and the conceptual content of (generalised) semantic roles and investigate their cross-linguistic applicability and psychological reality, as well as their explanatory potential in accounting for phenomena in the domain of language disorders. In bridging the gap between different disciplines, the book provides a valuable overview of current thought on semantic roles and argument linking, and may further serve as a point of departure for future interdisciplinary research in this area. As such, it will be of interest to scientists and advanced students in all domains of linguistics and cognitive science.

Content

Contents: Argument hierarchy and other factors determining argument realization / Dieter Wunderlich - Mismatches in semantic-role hierarchies and the dimensions of role semantics / Beatrice Primus - Thematic roles : universal, particular, and idiosyncratic aspects / Manfred Bierwisch - Experiencer constructions in Daghestanian languages / Bernard Comrie and Helma van den Berg - Clause-level vs. predicate-level linking / Balthasar Bickel - From meaning to syntax: semantic roles and beyond / Walter Bisang - Meaning, form and function in basic case roles / Georg Bossong - Semantic macroroles and language processing / Robert D. Van Valin, Jr. - Thematic roles as event structure relations / Maria Mercedes Piñango - Generalised semantic roles and syntactic templates: A new framework for language comprehension / Ina Bornkessel and Matthias Schlesewsky

Series

Trends in linguistics. Studies and monographs; 165

Kracht, M.: Mathematical linguistics (2002) 0.03
```
0.02748202 = product of:
  0.10992808 = sum of:
    0.05077526 = weight(_text_:case in 3572) [ClassicSimilarity], result of:
      0.05077526 = score(doc=3572,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.29144385 = fieldWeight in 3572, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=3572)
    0.05915282 = weight(_text_:studies in 3572) [ClassicSimilarity], result of:
      0.05915282 = score(doc=3572,freq=4.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.37408823 = fieldWeight in 3572, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.046875 = fieldNorm(doc=3572)
  0.25 = coord(2/8)
```
Abstract

This book studies language(s) and linguistic theories from a mathematical point of view. Starting with ideas already contained in Montague's work, it develops the mathematical foundations of present day linguistics. It equips the reader with all the background necessary to understand and evaluate theories as diverse as Montague Grammar, Categorial Grammar, HPSG and GB. The mathematical tools are mainly from universal algebra and logic, but no particular knowledge is presupposed beyond a certain mathematical sophistication that is in any case needed in order to fruitfully work within these theories. The presentation focuses on abstract mathematical structures and their computational properties, but plenty of examples from different natural languages are provided to illustrate the main concepts and results. In contrast to books devoted to so-called formal language theory, languages are seen here as semiotic systems, that is, as systems of signs. A language sign correlates form with meaning. Using the principle of compositionality it is possible to gain substantial insight into the interaction between form and meaning in natural languages.

Series

Studies in generative grammar; 63

Chowdhury, G.G.: Natural language processing (2002) 0.02
```
0.018361554 = product of:
  0.073446214 = sum of:
    0.02834915 = weight(_text_:libraries in 4284) [ClassicSimilarity], result of:
      0.02834915 = score(doc=4284,freq=2.0), product of:
        0.13017908 = queryWeight, product of:
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03962768 = queryNorm
        0.2177704 = fieldWeight in 4284, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.046875 = fieldNorm(doc=4284)
    0.045097064 = product of:
      0.09019413 = sum of:
        0.09019413 = weight(_text_:area in 4284) [ClassicSimilarity], result of:
          0.09019413 = score(doc=4284,freq=4.0), product of:
            0.1952553 = queryWeight, product of:
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03962768 = queryNorm
            0.46192923 = fieldWeight in 4284, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract: Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. NLP researchers aim to gather knowledge on how human beings understand and use language so that appropriate tools and techniques can be developed to make computer systems understand and manipulate natural languages to perform desired tasks. The foundations of NLP lie in a number of disciplines, namely, computer and information sciences, linguistics, mathematics, electrical and electronic engineering, artificial intelligence and robotics, and psychology. Applications of NLP include a number of fields of study, such as machine translation, natural language text processing and summarization, user interfaces, multilingual and cross-language information retrieval (CLIR), speech recognition, artificial intelligence, and expert systems. One important application area that is relatively new and has not been covered in previous ARIST chapters on NLP relates to the proliferation of the World Wide Web and digital libraries.

Schneider, J.W.; Borlund, P.: A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.02
```
0.017851189 = product of:
  0.14280951 = sum of:
    0.14280951 = sum of:
      0.10522648 = weight(_text_:area in 156) [ClassicSimilarity], result of:
        0.10522648 = score(doc=156,freq=4.0), product of:
          0.1952553 = queryWeight, product of:
            4.927245 = idf(docFreq=870, maxDocs=44218)
            0.03962768 = queryNorm
          0.5389174 = fieldWeight in 156, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.927245 = idf(docFreq=870, maxDocs=44218)
            0.0546875 = fieldNorm(doc=156)
      0.037583023 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
        0.037583023 = score(doc=156,freq=2.0), product of:
          0.13876937 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03962768 = queryNorm
          0.2708308 = fieldWeight in 156, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=156)
  0.125 = coord(1/8)
```
Abstract: The present study investigates the ability of a bibliometric-based semi-automatic method to select candidate thesaurus terms from citation contexts. The method consists of document co-citation analysis, citation context analysis, and noun phrase parsing. The investigation is carried out within the specialty area of periodontology. The results clearly demonstrate that the method is able to select important candidate thesaurus terms within the chosen specialty area.
Date: 2007-03-08 19:55:22

Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.02
```
0.015357459 = product of:
  0.061429836 = sum of:
    0.034856133 = weight(_text_:studies in 1853) [ClassicSimilarity], result of:
      0.034856133 = score(doc=1853,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.22043361 = fieldWeight in 1853, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1853)
    0.0265737 = product of:
      0.0531474 = sum of:
        0.0531474 = weight(_text_:area in 1853) [ClassicSimilarity], result of:
          0.0531474 = score(doc=1853,freq=2.0), product of:
            0.1952553 = queryWeight, product of:
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03962768 = queryNorm
            0.27219442 = fieldWeight in 1853, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1853)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics (bibliometrics and scientometrics studies) for STW rely solely on statistical data analysis methods (co-citation analysis, co-word analysis). Such methods usually work on structured databases where the units of analysis (words, keywords) are already attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has made it necessary to integrate natural language processing (NLP) techniques that first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted di-graphs, which the clustering algorithm CPCL (Classification by Preferential Clustered Link) seeks to reduce in order to produce classes. These classes ideally represent the research topics present in the corpus. The results of the classification are subjected to validation by an expert in STW.

Working with conceptual structures : contributions to ICCS 2000. 8th International Conference on Conceptual Structures: Logical, Linguistic, and Computational Issues. Darmstadt, August 14-18, 2000 (2000) 0.01
```
0.013504548 = product of:
  0.054018192 = sum of:
    0.029618902 = weight(_text_:case in 5089) [ClassicSimilarity], result of:
      0.029618902 = score(doc=5089,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.17000891 = fieldWeight in 5089, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.02734375 = fieldNorm(doc=5089)
    0.024399292 = weight(_text_:studies in 5089) [ClassicSimilarity], result of:
      0.024399292 = score(doc=5089,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.15430352 = fieldWeight in 5089, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.02734375 = fieldNorm(doc=5089)
  0.25 = coord(2/8)
```
Abstract

The 8th International Conference on Conceptual Structures - Logical, Linguistic, and Computational Issues (ICCS 2000) brings together a wide range of researchers and practitioners working with conceptual structures. During the last few years, the ICCS conference series has considerably widened its scope on different kinds of conceptual structures, stimulating research across domain boundaries. We hope that this stimulation is further enhanced by ICCS 2000 joining the long tradition of conferences in Darmstadt with extensive, lively discussions. This volume consists of contributions presented at ICCS 2000, complementing the volume "Conceptual Structures: Logical, Linguistic, and Computational Issues" (B. Ganter, G.W. Mineau (Eds.), LNAI 1867, Springer, Berlin-Heidelberg 2000). It contains submissions reviewed by the program committee, and position papers. We wish to express our appreciation to all the authors of submitted papers, to the general chair, the program chair, the editorial board, the program committee, and to the additional reviewers for making ICCS 2000 a valuable contribution in the knowledge processing research field. Special thanks go to the local organizers for making the conference an enjoyable and inspiring event. We are grateful to Darmstadt University of Technology, the Ernst Schröder Center for Conceptual Knowledge Processing, the Center for Interdisciplinary Studies in Technology, the Deutsche Forschungsgemeinschaft, Land Hessen, and NaviCon GmbH for their generous support.

Content

Concepts & Language: Knowledge organization by procedures of natural language processing. A case study using the method GABEK (J. Zelger, J. Gadner) - Computer aided narrative analysis using conceptual graphs (H. Schärfe, P. Øhrstrøm) - Pragmatic representation of argumentative text: a challenge for the conceptual graph approach (H. Irandoust, B. Moulin) - Conceptual graphs as a knowledge representation core in a complex language learning environment (G. Angelova, A. Nenkova, S. Boycheva, T. Nikolov) - Conceptual Modeling and Ontologies: Relationships and actions in conceptual categories (Ch. Landauer, K.L. Bellman) - Concept approximations for formal concept analysis (J. Saquer, J.S. Deogun) - Faceted information representation (U. Priß) - Simple concept graphs with universal quantifiers (J. Tappe) - A framework for comparing methods for using or reusing multiple ontologies in an application (J. van Zyl, D. Corbett) - Designing task/method knowledge-based systems with conceptual graphs (M. Leclère, F. Trichet, Ch. Choquet) - A logical ontology (J. Farkas, J. Sarbo) - Algorithms and Tools: Fast concept analysis (Ch. Lindig) - A framework for conceptual graph unification (D. Corbett) - Visual CP representation of knowledge (H.D. Pfeiffer, R.T. Hartley) - Maximal isojoin for representing software textual specifications and detecting semantic anomalies (Th. Charnois) - Troika: using grids, lattices and graphs in knowledge acquisition (H.S. Delugach, B.E. Lampkin) - Open world theorem prover for conceptual graphs (J.E. Heaton, P. Kocura) - NetCare: a practical conceptual graphs software tool (S. Polovina, D. Strang) - CGWorld - a web based workbench for conceptual graphs management and applications (P. Dobrev, K. Toutanova) - Position papers: The edition project: Peirce's existential graphs (R. Müller) - Mining association rules using formal concept analysis (N. Pasquier) - Contextual logic summary (R. Wille) - Information channels and conceptual scaling (K.E. Wolff) - Spatial concepts - a rule exploration (S. Rudolph) - The TEXT-TO-ONTO learning environment (A. Mädche, St. Staab) - Controlling the semantics of metadata on audio-visual documents using ontologies (Th. Dechilly, B. Bachimont) - Building the ontological foundations of a terminology from natural language to conceptual graphs with Ribosome, a knowledge extraction system (Ch. Jacquelinet, A. Burgun) - CharGer: some lessons learned and new directions (H.S. Delugach) - Knowledge management using conceptual graphs (W.K. Pun)

Witschel, H.F.: Global and local resources for peer-to-peer text retrieval (2008) 0.01
```
0.01327685 = product of:
  0.0531074 = sum of:
    0.03450581 = weight(_text_:studies in 127) [ClassicSimilarity], result of:
      0.03450581 = score(doc=127,freq=4.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.21821813 = fieldWeight in 127, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.02734375 = fieldNorm(doc=127)
    0.018601589 = product of:
      0.037203178 = sum of:
        0.037203178 = weight(_text_:area in 127) [ClassicSimilarity], result of:
          0.037203178 = score(doc=127,freq=2.0), product of:
            0.1952553 = queryWeight, product of:
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03962768 = queryNorm
            0.19053608 = fieldWeight in 127, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.02734375 = fieldNorm(doc=127)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

Chapter 5 empirically tackles the first of the two research questions formulated above, namely the question of global collection statistics. More precisely, it studies possibilities of radically simplified results merging. The simplification comes from the attempt - without having knowledge of the complete collection - to equip all peers with the same global statistics, making document scores comparable across peers. What is examined is the question of how we can obtain such global statistics and to what extent their use will lead to a drop in retrieval effectiveness. In chapter 6, the second research question is tackled, namely that of making forwarding decisions for queries, based on profiles of other peers. After a review of related work in that area, the chapter first defines the approaches that will be compared against each other. Then, a novel evaluation framework is introduced, including a new measure for comparing results of a distributed search engine against those of a centralised one. Finally, the actual evaluation is performed using the new framework.

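Why shared statistics make scores comparable can be shown with a toy case. In the sketch below, two peers score the same document with their own local idf values and disagree; once both use pooled statistics they agree by construction. This is only the idealised baseline: the exact merge of every peer's counts is an assumption made for illustration, not Witschel's method, which explicitly tries to obtain such statistics without knowledge of the complete collection. The `PeerStats` structure, the tf × idf weighting (borrowed from classic Lucene scoring) and all counts are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class PeerStats:
    """Hypothetical per-peer collection statistics."""
    num_docs: int
    doc_freq: dict  # term -> number of local documents containing the term

def idf(doc_freq: int, num_docs: int) -> float:
    # Classic Lucene-style idf, used here only as an example weighting.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def merge_global_stats(peers):
    """Idealised baseline: pool document counts and frequencies of all peers."""
    total_docs = sum(p.num_docs for p in peers)
    merged = {}
    for p in peers:
        for term, df in p.doc_freq.items():
            merged[term] = merged.get(term, 0) + df
    return PeerStats(total_docs, merged)

def score(tf_counts: dict, query: list, stats: PeerStats) -> float:
    """Simple sum of sqrt(tf) * idf over the query terms for one document."""
    return sum(
        math.sqrt(tf_counts.get(t, 0)) * idf(stats.doc_freq.get(t, 0), stats.num_docs)
        for t in query
    )

# Invented statistics: the peers see very different term distributions.
peer_a = PeerStats(1000, {"retrieval": 10, "peer": 500})
peer_b = PeerStats(1000, {"retrieval": 400, "peer": 20})
global_stats = merge_global_stats([peer_a, peer_b])

doc = {"retrieval": 4, "peer": 1}
query = ["retrieval", "peer"]

local_a = score(doc, query, peer_a)        # differs from local_b
local_b = score(doc, query, peer_b)
shared = score(doc, query, global_stats)   # identical on every peer
print(f"local A={local_a:.3f} local B={local_b:.3f} shared={shared:.3f}")
```

The open question the chapter examines is how far effectiveness drops when `global_stats` can only be approximated rather than merged exactly.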
Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.01
```
0.0106174415 = product of:
  0.042469766 = sum of:
    0.03307401 = weight(_text_:libraries in 1616) [ClassicSimilarity], result of:
      0.03307401 = score(doc=1616,freq=8.0), product of:
        0.13017908 = queryWeight, product of:
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03962768 = queryNorm
        0.25406548 = fieldWeight in 1616, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1616)
    0.009395756 = product of:
      0.018791512 = sum of:
        0.018791512 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
          0.018791512 = score(doc=1616,freq=2.0), product of:
            0.13876937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03962768 = queryNorm
            0.1354154 = fieldWeight in 1616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1616)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China increased from 8.9 million to 16.9 million from January to June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US's Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs of the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.).

However, research in crossing language boundaries, especially across European languages and Oriental languages, is still in the initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus by the Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The result shows that such a thesaurus is a promising tool to retrieve relevant terms, especially in the language that is not the same as the input term. The direct translation of the input term can also be retrieved in most cases.

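The co-occurrence step of such a cross-lingual thesaurus can be sketched on aligned sentence pairs: an English term is linked to the Chinese terms that appear in the aligned counterparts of the sentences containing it. The asymmetric weight co(e, c) / df(e) used below is a simple assumption for illustration, not necessarily the weighting in the paper, and the Hopfield network activation that Yang and Luk apply on top of these associations is not shown. The function names and the three-pair corpus are invented.

```python
from collections import Counter
from itertools import product

def cooccurrence_weights(aligned_pairs):
    """
    aligned_pairs: list of (english_terms, chinese_terms) for aligned sentences.
    Returns w[(e, c)] = co(e, c) / df(e): the share of aligned pairs containing
    English term e whose Chinese side also contains term c (assumed weighting).
    """
    df_en = Counter()
    co = Counter()
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        df_en.update(en_set)
        co.update(product(en_set, zh_set))
    return {(e, c): n / df_en[e] for (e, c), n in co.items()}

def related_terms(term, weights, top_k=3):
    """Rank Chinese candidates for an English term by association weight."""
    candidates = [(c, w) for (e, c), w in weights.items() if e == term]
    return sorted(candidates, key=lambda x: (-x[1], x[0]))[:top_k]

# Invented toy corpus of aligned, already tokenised legal sentences.
corpus = [
    (["court", "appeal"], ["法院", "上訴"]),
    (["court", "judge"], ["法院", "法官"]),
    (["appeal", "judge"], ["上訴", "法官"]),
]
weights = cooccurrence_weights(corpus)
print(related_terms("court", weights))
```

Because the weights come from aligned text rather than a dictionary, a name that never appears in any dictionary can still acquire a translation candidate, which is the property the abstract emphasises.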
Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.01
```
0.008462544 = product of:
  0.06770035 = sum of:
    0.06770035 = weight(_text_:case in 2585) [ClassicSimilarity], result of:
      0.06770035 = score(doc=2585,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.3885918 = fieldWeight in 2585, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.0625 = fieldNorm(doc=2585)
  0.125 = coord(1/8)
```
Abstract

This paper presents an algorithm for generating stemmers from text stemmer specification files. A small study shows that the generated stemmers are computationally efficient, often running faster than stemmers custom written to implement particular stemming algorithms. The stemmer specification files are easily written and modified by non-programmers, making it much easier to create a stemmer, or tune a stemmer's performance, than would be the case with a custom stemmer program. Stemmer generation is thus also human-resource efficient.
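The idea of generating a stemmer from a declarative specification rather than hand-coding it can be sketched in a few lines. The specification format below (suffix, replacement, minimum stem length, with `-` meaning "remove") is hypothetical: the abstract does not describe the Fox and Fox specification language, and their generator is reported to produce efficient stemmers, whereas this sketch simply compiles the rules into a Python closure.

```python
def generate_stemmer(spec: str):
    """
    Compile a hypothetical stemmer specification into a stemming function.
    Each non-empty, non-comment line reads: <suffix> <replacement> <min_stem_len>,
    where '-' as the replacement means 'just remove the suffix'.
    """
    rules = []
    for line in spec.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        suffix, replacement, min_len = line.split()
        rules.append((suffix, "" if replacement == "-" else replacement, int(min_len)))
    # Try longer suffixes first so that, e.g., 'ies' wins over 's'.
    rules.sort(key=lambda r: len(r[0]), reverse=True)

    def stem(word: str) -> str:
        for suffix, replacement, min_len in rules:
            if word.endswith(suffix):
                base = word[: len(word) - len(suffix)]
                if len(base) >= min_len:  # keep very short words intact
                    return base + replacement
        return word

    return stem

SPEC = """
# suffix  replacement  minimum stem length
ies       y            2
ing       -            3
s         -            3
"""

stem = generate_stemmer(SPEC)
print([stem(w) for w in ["libraries", "indexing", "terms", "is"]])
```

Tuning the stemmer then means editing `SPEC` rather than the code, which mirrors the abstract's point that non-programmers can write and modify the specification files.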

Liddy, E.D.: Natural language processing for information retrieval (2009) 0.01
```
0.0075161774 = product of:
  0.06012942 = sum of:
    0.06012942 = product of:
      0.12025884 = sum of:
        0.12025884 = weight(_text_:area in 3854) [ClassicSimilarity], result of:
          0.12025884 = score(doc=3854,freq=4.0), product of:
            0.1952553 = queryWeight, product of:
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03962768 = queryNorm
            0.61590564 = fieldWeight in 3854, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.0625 = fieldNorm(doc=3854)
      0.5 = coord(1/2)
  0.125 = coord(1/8)
```
Abstract: Natural language processing (NLP) is the computerized approach to analyzing text that is based on both a set of theories and a set of technologies. Although NLP is a relatively recent area of research and application, compared with other information technology approaches, there have been sufficient successes to date that suggest that NLP-based information access technologies will continue to be a major area of research and development in information systems now and into the future.

Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.01
```
0.007479902 = product of:
  0.059839215 = sum of:
    0.059839215 = weight(_text_:case in 897) [ClassicSimilarity], result of:
      0.059839215 = score(doc=897,freq=4.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.34346986 = fieldWeight in 897, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.0390625 = fieldNorm(doc=897)
  0.125 = coord(1/8)
```
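
Each `weight(...)` node in these explanations follows Lucene's `ClassicSimilarity` (TF-IDF) scoring: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf · queryNorm, fieldWeight = tf · idf · fieldNorm, and the term score is queryWeight · fieldWeight; `coord(1/8)` then scales the sum by the fraction of query clauses that matched. The sketch below recomputes the doc=897 values from the numbers printed above, taking `queryNorm` and `fieldNorm` as given and assuming a query boost of 1. It uses Python doubles, so the last digits may differ slightly from Lucene's float arithmetic.

```python
import math

def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight, as in the weight(...) explain nodes."""
    tf = math.sqrt(freq)                      # square root of the raw term frequency
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm           # queryWeight (query boost assumed to be 1)
    field_weight = tf * idf * field_norm      # fieldWeight
    return query_weight * field_weight

# Values copied from the doc=897 explanation above.
term_score = classic_term_score(freq=4.0, doc_freq=1480, max_docs=44218,
                                query_norm=0.03962768, field_norm=0.0390625)
final_score = term_score * (1 / 8)            # coord(1/8): 1 of 8 query clauses matched
print(round(term_score, 6), round(final_score, 7))
```

The same function reproduces the freq=2.0 entries (tf = sqrt(2) ≈ 1.4142135) once their `fieldNorm` values are substituted.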
Abstract

Dictionary-based query translation for cross-language information retrieval often yields several translation candidates with different meanings for a source term in the query. This paper examines methods for resolving translation ambiguity based only on the target document collection. First, we discuss two kinds of disambiguation technique: (1) a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback (PRF). Next, these techniques are compared empirically using the CLEF 2003 test collection for German-to-Italian bilingual searches, which are executed with English as a pivot language. The experiments showed that a variant of the term co-occurrence techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparably high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for French-to-Italian (pivot) and English-to-Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verify our findings. Again, similar results were observed, except that the Dice coefficient slightly outperforms the Cosine coefficient for co-occurrence-based disambiguation in English-to-Italian searches.
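
To make the two co-occurrence coefficients concrete: for document counts n_a, n_b and joint count n_ab, Dice = 2·n_ab / (n_a + n_b) and Cosine = n_ab / sqrt(n_a · n_b). The sketch below, a simplification that does not implement the paper's best sequence algorithm, picks the translation candidate that co-occurs most strongly with one other (already translated) query term. The Italian terms and all counts are hypothetical.

```python
import math

def dice(n_ab, n_a, n_b):
    """Dice coefficient: twice the joint count over the sum of individual counts."""
    return 2.0 * n_ab / (n_a + n_b) if n_a + n_b else 0.0

def cosine(n_ab, n_a, n_b):
    """Cosine coefficient on binary occurrence vectors: joint / sqrt(product)."""
    return n_ab / math.sqrt(n_a * n_b) if n_a and n_b else 0.0

def pick_translation(candidates, context_term, df, co_df, measure):
    """Choose the candidate that co-occurs most strongly with a context term.

    df maps term -> number of target documents containing it;
    co_df maps frozenset({t1, t2}) -> number of documents containing both.
    """
    def strength(candidate):
        n_ab = co_df.get(frozenset((candidate, context_term)), 0)
        return measure(n_ab, df[candidate], df[context_term])
    return max(candidates, key=strength)

# Hypothetical counts: German "Bank" -> Italian "banca" (bank) or "panchina" (bench),
# disambiguated against the context term "tasso" (interest rate).
df = {"banca": 120, "panchina": 40, "tasso": 80}
co_df = {frozenset(("banca", "tasso")): 30, frozenset(("panchina", "tasso")): 2}
best = pick_translation(["banca", "panchina"], "tasso", df, co_df, dice)
```

Here both coefficients favour `"banca"`; they diverge mainly when candidate document frequencies differ strongly, which is where a comparison like the one in the abstract becomes informative.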
Pirkola, A.: Morphological typology of languages for IR (2001) 0.01
```
0.0073941024 = product of:
  0.05915282 = sum of:
    0.05915282 = weight(_text_:studies in 4476) [ClassicSimilarity], result of:
      0.05915282 = score(doc=4476,freq=4.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.37408823 = fieldWeight in 4476, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.046875 = fieldNorm(doc=4476)
  0.125 = coord(1/8)
```
Abstract

This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables: the index of synthesis and the index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and of CLIR systems that process different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
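
The index of synthesis is conventionally the average number of morphemes per word. A minimal sketch, assuming the words have already been segmented into morphemes (the segmentations below are supplied by hand for illustration and are not data from the paper):

```python
def index_of_synthesis(segmented_words):
    """Average number of morphemes per word.

    Each word is given as a list of morphemes; segmentation is assumed to be
    done beforehand (by hand or by an external analyzer), not computed here.
    """
    if not segmented_words:
        return 0.0
    return sum(len(word) for word in segmented_words) / len(segmented_words)

# "in our houses": English spreads the meaning over three words,
# Finnish packs it into one: talo-i-ssa-mme (house-PL-INESSIVE-1PL.POSS).
english = [["in"], ["our"], ["house", "s"]]
finnish = [["talo", "i", "ssa", "mme"]]
```

Here the English phrase scores 4/3 ≈ 1.33 morphemes per word and the Finnish form 4.0, illustrating why highly synthetic languages put more pressure on stemming and normalization in IR.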
Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
```
0.0063469075 = product of:
  0.05077526 = sum of:
    0.05077526 = weight(_text_:case in 2502) [ClassicSimilarity], result of:
      0.05077526 = score(doc=2502,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.29144385 = fieldWeight in 2502, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=2502)
  0.125 = coord(1/8)
```
Abstract

Prior efforts have shown that in certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been assumed that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
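
The abstract does not name the fusion methods tested. As one widely used example of score-based data fusion, the sketch below implements CombMNZ with per-run min-max normalization: each document's normalized scores are summed and multiplied by the number of runs that retrieved it. The run contents are made up for illustration.

```python
def min_max_normalize(run):
    """Scale one run's scores to [0, 1]; a run with identical scores maps to 1.0."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}

def comb_mnz(runs):
    """CombMNZ: sum of normalized scores times the number of runs retrieving the doc."""
    totals, hits = {}, {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            totals[doc] = totals.get(doc, 0.0) + score
            hits[doc] = hits.get(doc, 0) + 1
    fused = {doc: totals[doc] * hits[doc] for doc in totals}
    return sorted(fused, key=fused.get, reverse=True)

# Two hypothetical runs (document id -> retrieval score) on different scales.
run_a = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
run_b = {"d2": 0.9, "d3": 0.5, "d4": 0.1}
ranking = comb_mnz([run_a, run_b])
```

`d2` rises to the top because both runs retrieve it. When the fused runs come from similar strategies inside one system, they tend to retrieve the same documents, so this overlap bonus adds little new evidence, which is consistent with the abstract's finding.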
Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 0.01
```
0.0063469075 = product of:
  0.05077526 = sum of:
    0.05077526 = weight(_text_:case in 5896) [ClassicSimilarity], result of:
      0.05077526 = score(doc=5896,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.29144385 = fieldWeight in 5896, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=5896)
  0.125 = coord(1/8)
```
Abstract

Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
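
The described procedure works through Web searches; as a much simpler, corpus-side illustration only, the sketch below lists unfamiliar words that share a common letter sequence, the kind of hybrid-form candidate the abstract targets. The seed sequence `"blog"`, the sample text, and the known-word list are all hypothetical.

```python
import re
from collections import Counter

def hybrid_candidates(text, seed, known_words):
    """Return (word, count) pairs for unknown words containing `seed`, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if seed in t and t not in known_words)
    return counts.most_common()

sample = ("Bloggers on the blogosphere debate; warblogs and moblogs spread. "
          "The blogosphere grows.")
candidates = hybrid_candidates(sample, "blog", known_words={"blog", "blogs", "bloggers"})
```

Filtering out known forms leaves `blogosphere` (twice), `warblogs`, and `moblogs` as candidate members of the hybrid word family; in practice each candidate would still need manual checking.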
Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.01
```
0.0063469075 = product of:
  0.05077526 = sum of:
    0.05077526 = weight(_text_:case in 4224) [ClassicSimilarity], result of:
      0.05077526 = score(doc=4224,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.29144385 = fieldWeight in 4224, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=4224)
  0.125 = coord(1/8)
```
Abstract

Many operational IR indexes are non-normalized, i.e., no lemmatization, stemming, or similar techniques have been applied during indexing. This poses a challenge for dictionary-based cross-language information retrieval (CLIR), because dictionary translations are mostly lemmas. In this study, we address the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed about equally in all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
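
S-grams extend n-grams by also pairing characters that are not adjacent. The sketch below is a simplified variant, assumed for illustration rather than taken from the paper: it builds character bigrams with skip distances 0 and 1, keeps the skip distance in each gram so the two classes stay separate, and scores two strings by the Jaccard overlap of their gram sets. Padding and the paper's exact gram classes are omitted.

```python
def s_grams(word, skips=(0, 1)):
    """Character bigrams whose two characters are `skip` positions apart."""
    grams = set()
    for skip in skips:
        gap = skip + 1
        for i in range(len(word) - gap):
            grams.add((skip, word[i], word[i + gap]))
    return grams

def s_gram_similarity(a, b, skips=(0, 1)):
    """Jaccard overlap of the two strings' s-gram sets."""
    ga, gb = s_grams(a, skips), s_grams(b, skips)
    if not ga and not gb:
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)

# Match a dictionary lemma against inflected forms found in a non-normalized index.
lemma = "talo"                                   # Finnish 'house'
index_forms = ["talossa", "taloja", "tuoli"]     # 'in a house', 'houses' (partitive), 'chair'
best_match = max(index_forms, key=lambda form: s_gram_similarity(lemma, form))
```

Both inflected forms of `talo` overlap with the lemma's grams, while the unrelated word `tuoli` scores 0.0; a retrieval system would keep every index form above some similarity threshold rather than only the single best match.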
Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.01
```
0.0052890894 = product of:
  0.042312715 = sum of:
    0.042312715 = weight(_text_:case in 609) [ClassicSimilarity], result of:
      0.042312715 = score(doc=609,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.24286987 = fieldWeight in 609, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.0390625 = fieldNorm(doc=609)
  0.125 = coord(1/8)
```
Abstract

Purpose - Aims to measure syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records. Also aims to verify if the term frequency distributions satisfy conventional bibliometric laws.

Design/methodology/approach - Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov-Smirnov test.

Findings - Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not abide by the law.

Research limitations/implications - The findings are limited to only two sets of bibliographic data (for aggregation consistency analysis) and to one set of data for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high. Therefore the main argument against syllable aggregation does not appear to hold true. The analysis revealed that Chinese words and characters behave differently in terms of frequency distribution but that there is no noticeable difference between vernacular and Romanized data. The distribution of Romanized characters exhibits the worst case in terms of fit to either Lotka's or Zipf's laws, which indicates that Romanized data in aggregated form appear to be a preferable option.

Originality/value - Provides empirical data on consistency and distribution of Romanized Chinese titles in bibliographic records.
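
The Kolmogorov-Smirnov fit test compares cumulative distributions. A minimal sketch of its statistic D for Lotka's law, assuming the classic exponent n = 2 (the study may fit the exponent instead) and normalizing the predicted shares over the observed frequency range; the frequency lists are made up:

```python
from collections import Counter

def lotka_ks_distance(term_frequencies, n=2.0):
    """Maximum gap between observed and Lotka cumulative distributions.

    term_frequencies: how many times each distinct term occurs.
    Lotka's law predicts the share of terms occurring x times is C / x**n;
    here C is chosen so the predicted shares sum to 1 over 1..max(x).
    """
    size_counts = Counter(term_frequencies)
    x_max = max(size_counts)
    total_terms = len(term_frequencies)
    c = 1.0 / sum(1.0 / x ** n for x in range(1, x_max + 1))
    observed_cdf = expected_cdf = 0.0
    d = 0.0
    for x in range(1, x_max + 1):
        observed_cdf += size_counts.get(x, 0) / total_terms
        expected_cdf += c / x ** n
        d = max(d, abs(observed_cdf - expected_cdf))
    return d

# Four terms occurring once and one occurring twice match n = 2 exactly (shares 0.8, 0.2).
good_fit = lotka_ks_distance([1, 1, 1, 1, 2])
poor_fit = lotka_ks_distance([1] * 9 + [5])
```

D would then be compared with a critical value that depends on sample size and significance level; a small D (as in `good_fit`) means the data cannot be distinguished from the law.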
Vilares, J.; Alonso, M.A.; Vilares, M.: Extraction of complex index terms in non-English IR : a shallow parsing based approach (2008) 0.01
```
0.0052890894 = product of:
  0.042312715 = sum of:
    0.042312715 = weight(_text_:case in 2107) [ClassicSimilarity], result of:
      0.042312715 = score(doc=2107,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.24286987 = fieldWeight in 2107, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2107)
  0.125 = coord(1/8)
```
Abstract

The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on extending these techniques to deal with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has also been studied, as has restricting the dependencies used to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements over classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions of storage and management costs can be achieved, albeit at the expense of a slight reduction in performance.
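
To show what a dependency used as a complex index term looks like, the sketch below extracts head-modifier pairs from an already POS-tagged Spanish noun phrase. It is a toy pattern matcher with two hypothetical patterns (noun + adjective, noun + "de" + noun) and assumed tag names; it is not the authors' finite-state transducer cascade and does no lemmatization.

```python
def noun_phrase_dependencies(tagged):
    """Extract (head, modifier) pairs from POS-tagged tokens.

    Toy patterns only: NOUN ADJ -> (noun, adj); NOUN 'de' NOUN -> (noun1, noun2).
    tagged: list of (word, tag) pairs; the tag names are assumptions.
    """
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "NOUN":
            continue
        if i + 1 < len(tagged) and tagged[i + 1][1] == "ADJ":
            pairs.append((word, tagged[i + 1][0]))
        if (i + 2 < len(tagged) and tagged[i + 1][0] == "de"
                and tagged[i + 2][1] == "NOUN"):
            pairs.append((word, tagged[i + 2][0]))
    return pairs

# "sistema de recuperación automática" ('automatic retrieval system')
phrase = [("sistema", "NOUN"), ("de", "PREP"),
          ("recuperación", "NOUN"), ("automática", "ADJ")]
dependencies = noun_phrase_dependencies(phrase)
```

Each pair, for example `("sistema", "recuperación")`, could then be indexed as a single term such as `sistema:recuperación`, so that a document matches it regardless of how the phrase is worded.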
He, Q.: A study of the strength indexes in co-word analysis (2000) 0.01
```
0.00522842 = product of:
  0.04182736 = sum of:
    0.04182736 = weight(_text_:studies in 111) [ClassicSimilarity], result of:
      0.04182736 = score(doc=111,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.26452032 = fieldWeight in 111, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.046875 = fieldNorm(doc=111)
  0.125 = coord(1/8)
```
Abstract

Co-word analysis is a technique for detecting the knowledge structure of scientific literature and mapping the dynamics in a research field. It is used to count the co-occurrences of term pairs, compute the strength between term pairs, and map the research field by inserting terms and their linkages into a graphical structure according to the strength values. Previous co-word studies have used two indexes to measure the strength between term pairs in order to identify the major areas in a research field: the inclusion index (I) and the equivalence index (E). This study conducts two co-word analysis experiments, one with each index, and compares the results. The results show that, owing to the difference in their computation, index I is more likely to identify general subject areas in a research field, while index E is more likely to identify subject areas at more specific levels.
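
The abstract does not restate the formulas. Using the definitions commonly applied in co-word analysis, with c_i and c_j the number of documents containing each term and c_ij the number containing both, I = c_ij / min(c_i, c_j) and E = c_ij² / (c_i · c_j). The sketch below uses made-up counts to show why I favours pairs involving a general term while E favours pairs of specific terms:

```python
def inclusion_index(c_ij, c_i, c_j):
    """Share of the rarer term's documents that also contain the other term."""
    return c_ij / min(c_i, c_j)

def equivalence_index(c_ij, c_i, c_j):
    """Product of the two conditional co-occurrence probabilities."""
    return c_ij ** 2 / (c_i * c_j)

# Hypothetical counts: a general term (100 docs) that fully contains a specific one
# (10 docs), versus two specific terms (12 docs each) sharing 6 documents.
general_pair = (10, 100, 10)
specific_pair = (6, 12, 12)
```

Here I rates the general pair higher (1.0 versus 0.5) because it ignores the general term's many other documents, whereas E rates the specific pair higher (0.25 versus 0.1) because it penalizes that asymmetry.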
Abdelali, A.: Localization in modern standard Arabic (2004) 0.01
```
0.00522842 = product of:
  0.04182736 = sum of:
    0.04182736 = weight(_text_:studies in 2066) [ClassicSimilarity], result of:
      0.04182736 = score(doc=2066,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.26452032 = fieldWeight in 2066, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.046875 = fieldNorm(doc=2066)
  0.125 = coord(1/8)
```
Abstract

Modern Standard Arabic (MSA) is the official language used in all Arabic countries. In this paper we describe an investigation of the uniformity of MSA across different countries. Many studies have been carried out locally or regionally on Arabic and its dialects. Here we look at a more global scale by studying language variations between countries. The source material used in this investigation was derived from national newspapers available on the Web, which provided samples of common media usage in each country. This corpus has been used to investigate the lexical characteristics of Modern Standard Arabic as found in 10 different Arabic-speaking countries. We describe our collection methods, the types of lexical analysis performed, and the results of our investigations. With respect to newspaper articles, MSA seems to be very uniform across all the countries included in the study, but we have detected various types of differences, with implications for computational processing of MSA.
