Search (334 results, page 17 of 17)

  • × theme_ss:"Computerlinguistik"
  1. Niemi, T.; Jämsen, J.: ¬A query language for discovering semantic associations, part II : sample queries and query evaluation (2007) 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 580) [ClassicSimilarity], result of:
              0.01830946 = score(doc=580,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 580, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=580)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  2. Niemi, T.; Jämsen , J.: ¬A query language for discovering semantic associations, part I : approach and formal definition of query primitives (2007) 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 591) [ClassicSimilarity], result of:
              0.01830946 = score(doc=591,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 591, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=591)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  3. Vilares, J.; Alonso, M.A.; Vilares, M.: Extraction of complex index terms in non-English IR : a shallow parsing based approach (2008) 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 2107) [ClassicSimilarity], result of:
              0.01830946 = score(doc=2107,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 2107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2107)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has been also studied, as has the restriction of the dependencies applied to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements with regard to classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions of storage and management costs can be achieved, albeit at the expense of a slight reduction in performance.
  4. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: ¬A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 3426) [ClassicSimilarity], result of:
              0.01830946 = score(doc=3426,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 3426, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3426)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other important fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two similarity measures are used, the dissimilarity measure called the Manhattan distance, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qu'ran and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qu'ran contains most of the ancient Arabic words while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to the original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
  5. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 1842) [ClassicSimilarity], result of:
              0.01830946 = score(doc=1842,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 1842, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1842)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.
  6. Renker, L.: Exploration von Textkorpora : Topic Models als Grundlage der Interaktion (2015) 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 2380) [ClassicSimilarity], result of:
              0.01830946 = score(doc=2380,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 2380, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2380)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  7. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.00
    5.2312744E-4 = product of:
      0.003661892 = sum of:
        0.003661892 = product of:
          0.01830946 = sum of:
            0.01830946 = weight(_text_:retrieval in 2686) [ClassicSimilarity], result of:
              0.01830946 = score(doc=2686,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 2686, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2686)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Research into unsupervised ways of stemming has resulted, in the past few years, in the development of methods that are reliable and perform well. Our approach further shifts the boundaries of the state of the art by providing more accurate stemming results. The idea of the approach consists in building a stemmer in two stages. In the first stage, a stemming algorithm based upon clustering, which exploits the lexical and semantic information of words, is used to prepare large-scale training data for the second-stage algorithm. The second-stage algorithm uses a maximum entropy classifier. The stemming-specific features help the classifier decide when and how to stem a particular word. In our research, we have pursued the goal of creating a multi-purpose stemming tool. Its design opens up possibilities of solving non-traditional tasks such as approximating lemmas or improving language modeling. However, we still aim at very good results in the traditional task of information retrieval. The conducted tests reveal exceptional performance in all the above mentioned tasks. Our stemming method is compared with three state-of-the-art statistical algorithms and one rule-based algorithm. We used corpora in the Czech, Slovak, Polish, Hungarian, Spanish and English languages. In the tests, our algorithm excels in stemming previously unseen words (the words that are not present in the training set). Moreover, it was discovered that our approach demands very little text data for training when compared with competing unsupervised algorithms.
  8. Olsen, K.A.; Williams, J.G.: Spelling and grammar checking using the Web as a text repository (2004) 0.00
    4.5370017E-4 = product of:
      0.003175901 = sum of:
        0.003175901 = product of:
          0.015879504 = sum of:
            0.015879504 = weight(_text_:system in 2891) [ClassicSimilarity], result of:
              0.015879504 = score(doc=2891,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.13919188 = fieldWeight in 2891, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2891)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Natural languages are both complex and dynamic. They are in part formalized through dictionaries and grammar. Dictionaries attempt to provide definitions and examples of various usages for all the words in a language. Grammar, on the other hand, is the system of rules that defines the structure of a language and is concerned with the correct use and application of the language in speaking or writing. The fact that these two mechanisms lag behind the language as currently used is not a serious problem for those living in a language culture and talking their native language. However, the correct choice of words, expressions, and word relationships is much more difficult when speaking or writing in a foreign language. The basics of the grammar of a language may have been learned in school decades ago, and even then there were always several choices for the correct expression for an idea, fact, opinion, or emotion. Although many different parts of speech and their relationships can make for difficult language decisions, prepositions tend to be problematic for nonnative speakers of English, and, in reality, prepositions are a major problem in most languages. Does a speaker or writer say "in the West Coast" or "on the West Coast," or perhaps "at the West Coast"? In Norwegian, we are "in" a city, but "at" a place. But the distinction between cities and places is vague. To be absolutely correct, one really has to learn the right preposition for every single place. A simplistic way of resolving these language issues is to ask a native speaker. But even native speakers may disagree about the right choice of words. If there is disagreement, then one will have to ask more than one native speaker, treat his/her response as a vote for a particular choice, and perhaps choose the majority choice as the best possible alternative. In real life, such a procedure may be impossible or impractical, but in the electronic world, as we shall see, this is quite easy to achieve. Using the vast text repository of the Web, we may get a significant voting base for even the most detailed and distinct phrases. We shall start by introducing a set of examples to present our idea of using the text repository an the Web to aid in making the best word selection, especially for the use of prepositions. Then we will present a more general discussion of the possibilities and limitations of using the Web as an aid for correct writing.
  9. Pepper, S.: ¬The typology and semantics of binominal lexemes : noun-noun compounds and their functional equivalents (2020) 0.00
    4.5370017E-4 = product of:
      0.003175901 = sum of:
        0.003175901 = product of:
          0.015879504 = sum of:
            0.015879504 = weight(_text_:system in 104) [ClassicSimilarity], result of:
              0.015879504 = score(doc=104,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.13919188 = fieldWeight in 104, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03125 = fieldNorm(doc=104)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The dissertation establishes 'binominal lexeme' as a comparative concept and discusses its cross-linguistic typology and semantics. Informally, a binominal lexeme is a noun-noun compound or functional equivalent; more precisely, it is a lexical item that consists primarily of two thing-morphs between which there exists an unstated semantic relation. Examples of binominals include Mandarin Chinese ?? (tielù) [iron road], French chemin de fer [way of iron] and Russian ???????? ?????? (zeleznaja doroga) [iron:adjz road]. All of these combine a word denoting 'iron' and a word denoting 'road' or 'way' to denote the meaning railway. In each case, the unstated semantic relation is one of composition: a railway is conceptualized as a road that is composed (or made) of iron. However, three different morphosyntactic strategies are employed: compounding, prepositional phrase and relational adjective. This study explores the range of such strategies used by a worldwide sample of 106 languages to express a set of 100 meanings from various semantic domains, resulting in a classification consisting of nine different morphosyntactic types. The semantic relations found in the data are also explored and a classification called the Hatcher-Bourque system is developed that operates at two levels of granularity, together with a tool for classifying binominals, the Bourquifier. The classification is extended to other subfields of language, including metonymy and lexical semantics, and beyond language to the domain of knowledge representation, resulting in a proposal for a general model of associative relations called the PHAB model. The many findings of the research include universals concerning the recruitment of anchoring nominal modification strategies, a method for comparing non-binary typologies, the non-universality (despite its predominance) of compounding, and a scale of frequencies for semantic relations which may provide insights into the associative nature of human thought.
  10. Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen : Beiträge zur GLDV Tagung 2005 in Bonn (2005) 0.00
    4.438883E-4 = product of:
      0.003107218 = sum of:
        0.003107218 = product of:
          0.01553609 = sum of:
            0.01553609 = weight(_text_:retrieval in 3578) [ClassicSimilarity], result of:
              0.01553609 = score(doc=3578,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.1417929 = fieldWeight in 3578, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=3578)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Content
    INHALT: Chris Biemann/Rainer Osswald: Automatische Erweiterung eines semantikbasierten Lexikons durch Bootstrapping auf großen Korpora - Ernesto William De Luca/Andreas Nürnberger: Supporting Mobile Web Search by Ontology-based Categorization - Rüdiger Gleim: HyGraph - Ein Framework zur Extraktion, Repräsentation und Analyse webbasierter Hypertextstrukturen - Felicitas Haas/Bernhard Schröder: Freges Grundgesetze der Arithmetik: Dokumentbaum und Formelwald - Ulrich Held/ Andre Blessing/Bettina Säuberlich/Jürgen Sienel/Horst Rößler/Dieter Kopp: A personalized multimodal news service -Jürgen Hermes/Christoph Benden: Fusion von Annotation und Präprozessierung als Vorschlag zur Behebung des Rohtextproblems - Sonja Hüwel/Britta Wrede/Gerhard Sagerer: Semantisches Parsing mit Frames für robuste multimodale Mensch-Maschine-Kommunikation - Brigitte Krenn/Stefan Evert: Separating the wheat from the chaff- Corpus-driven evaluation of statistical association measures for collocation extraction - Jörn Kreutel: An application-centered Perspective an Multimodal Dialogue Systems - Jonas Kuhn: An Architecture for Prallel Corpusbased Grammar Learning - Thomas Mandl/Rene Schneider/Pia Schnetzler/Christa Womser-Hacker: Evaluierung von Systemen für die Eigennamenerkennung im crosslingualen Information Retrieval - Alexander Mehler/Matthias Dehmer/Rüdiger Gleim: Zur Automatischen Klassifikation von Webgenres - Charlotte Merz/Martin Volk: Requirements for a Parallel Treebank Search Tool - Sally YK. Mok: Multilingual Text Retrieval an the Web: The Case of a Cantonese-Dagaare-English Trilingual e-Lexicon -
  11. Vlachidis, A.; Binding, C.; Tudhope, D.; May, K.: Excavating grey literature : a case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources (2010) 0.00
    4.1850194E-4 = product of:
      0.0029295133 = sum of:
        0.0029295133 = product of:
          0.014647567 = sum of:
            0.014647567 = weight(_text_:retrieval in 3948) [ClassicSimilarity], result of:
              0.014647567 = score(doc=3948,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.13368362 = fieldWeight in 3948, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3948)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Purpose - This paper sets out to discuss the use of information extraction (IE), a natural language-processing (NLP) technique to assist "rich" semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic-aware "rich" indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project. Design/methodology/approach - The paper proposes use of the English Heritage extension (CRM-EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology-Oriented Information Extraction process. The process of semantic indexing is based on a rule-based Information Extraction technique, which is facilitated by the General Architecture of Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules. Findings - Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic-aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms. Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents often referred to as "Grey Literature", from the Archaeological Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and P19.Physical Object.
  12. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.00
    4.1850194E-4 = product of:
      0.0029295133 = sum of:
        0.0029295133 = product of:
          0.014647567 = sum of:
            0.014647567 = weight(_text_:retrieval in 337) [ClassicSimilarity], result of:
              0.014647567 = score(doc=337,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.13368362 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=337)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. Vgl. auch: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for english Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
  13. Working with conceptual structures : contributions to ICCS 2000. 8th International Conference on Conceptual Structures: Logical, Linguistic, and Computational Issues. Darmstadt, August 14-18, 2000 (2000) 0.00
    3.9698763E-4 = product of:
      0.0027789134 = sum of:
        0.0027789134 = product of:
          0.013894566 = sum of:
            0.013894566 = weight(_text_:system in 5089) [ClassicSimilarity], result of:
              0.013894566 = score(doc=5089,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.1217929 = fieldWeight in 5089, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5089)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Content
    Concepts & Language: Knowledge organization by procedures of natural language processing. A case study using the method GABEK (J. Zelger, J. Gadner) - Computer aided narrative analysis using conceptual graphs (H. Schärfe, P. 0hrstrom) - Pragmatic representation of argumentative text: a challenge for the conceptual graph approach (H. Irandoust, B. Moulin) - Conceptual graphs as a knowledge representation core in a complex language learning environment (G. Angelova, A. Nenkova, S. Boycheva, T. Nikolov) - Conceptual Modeling and Ontologies: Relationships and actions in conceptual categories (Ch. Landauer, K.L. Bellman) - Concept approximations for formal concept analysis (J. Saquer, J.S. Deogun) - Faceted information representation (U. Priß) - Simple concept graphs with universal quantifiers (J. Tappe) - A framework for comparing methods for using or reusing multiple ontologies in an application (J. van ZyI, D. Corbett) - Designing task/method knowledge-based systems with conceptual graphs (M. Leclère, F.Trichet, Ch. Choquet) - A logical ontology (J. Farkas, J. Sarbo) - Algorithms and Tools: Fast concept analysis (Ch. Lindig) - A framework for conceptual graph unification (D. Corbett) - Visual CP representation of knowledge (H.D. Pfeiffer, R.T. Hartley) - Maximal isojoin for representing software textual specifications and detecting semantic anomalies (Th. Charnois) - Troika: using grids, lattices and graphs in knowledge acquisition (H.S. Delugach, B.E. Lampkin) - Open world theorem prover for conceptual graphs (J.E. Heaton, P. Kocura) - NetCare: a practical conceptual graphs software tool (S. Polovina, D. Strang) - CGWorld - a web based workbench for conceptual graphs management and applications (P. Dobrev, K. Toutanova) - Position papers: The edition project: Peirce's existential graphs (R. Mülller) - Mining association rules using formal concept analysis (N. Pasquier) - Contextual logic summary (R Wille) - Information channels and conceptual scaling (K.E. Wolff) - Spatial concepts - a rule exploration (S. Rudolph) - The TEXT-TO-ONTO learning environment (A. Mädche, St. Staab) - Controlling the semantics of metadata on audio-visual documents using ontologies (Th. Dechilly, B. Bachimont) - Building the ontological foundations of a terminology from natural language to conceptual graphs with Ribosome, a knowledge extraction system (Ch. Jacquelinet, A. Burgun) - CharGer: some lessons learned and new directions (H.S. Delugach) - Knowledge management using conceptual graphs (W.K. Pun)
  14. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.00
    3.661892E-4 = product of:
      0.0025633243 = sum of:
        0.0025633243 = product of:
          0.012816621 = sum of:
            0.012816621 = weight(_text_:retrieval in 1536) [ClassicSimilarity], result of:
              0.012816621 = score(doc=1536,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.11697317 = fieldWeight in 1536, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1536)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword amed entities may require a similar approach for their automatic detection as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.

Authors

Years

Languages

Types

  • a 277
  • m 30
  • el 18
  • s 17
  • x 12
  • d 2
  • p 2
  • b 1
  • pat 1
  • r 1
  • More… Less…

Subjects

Classifications