Search (9 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × type_ss:"x"
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.02
    0.022235535 = product of:
      0.04447107 = sum of:
        0.04447107 = sum of:
          0.007030784 = weight(_text_:a in 563) [ClassicSimilarity], result of:
            0.007030784 = score(doc=563,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.13239266 = fieldWeight in 563, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
          0.037440285 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
            0.037440285 = score(doc=563,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
      0.5 = coord(1/2)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
    Content
    A Thesis presented to The University of Guelph In partial fulfilment of requirements for the degree of Master of Science in Computer Science. Vgl. Unter: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
    Date
    10. 1.2013 19:22:47
  2. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01
    0.009360071 = product of:
      0.018720143 = sum of:
        0.018720143 = product of:
          0.037440285 = sum of:
            0.037440285 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.037440285 = score(doc=1746,freq=2.0), product of:
                0.16128273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046056706 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2015 9:17:30
  3. Pepper, S.: ¬The typology and semantics of binominal lexemes : noun-noun compounds and their functional equivalents (2020) 0.00
    0.0027894354 = product of:
      0.005578871 = sum of:
        0.005578871 = product of:
          0.011157742 = sum of:
            0.011157742 = weight(_text_:a in 104) [ClassicSimilarity], result of:
              0.011157742 = score(doc=104,freq=34.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21010503 = fieldWeight in 104, product of:
                  5.8309517 = tf(freq=34.0), with freq of:
                    34.0 = termFreq=34.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=104)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The dissertation establishes 'binominal lexeme' as a comparative concept and discusses its cross-linguistic typology and semantics. Informally, a binominal lexeme is a noun-noun compound or functional equivalent; more precisely, it is a lexical item that consists primarily of two thing-morphs between which there exists an unstated semantic relation. Examples of binominals include Mandarin Chinese ?? (tielù) [iron road], French chemin de fer [way of iron] and Russian ???????? ?????? (zeleznaja doroga) [iron:adjz road]. All of these combine a word denoting 'iron' and a word denoting 'road' or 'way' to denote the meaning railway. In each case, the unstated semantic relation is one of composition: a railway is conceptualized as a road that is composed (or made) of iron. However, three different morphosyntactic strategies are employed: compounding, prepositional phrase and relational adjective. This study explores the range of such strategies used by a worldwide sample of 106 languages to express a set of 100 meanings from various semantic domains, resulting in a classification consisting of nine different morphosyntactic types. The semantic relations found in the data are also explored and a classification called the Hatcher-Bourque system is developed that operates at two levels of granularity, together with a tool for classifying binominals, the Bourquifier. The classification is extended to other subfields of language, including metonymy and lexical semantics, and beyond language to the domain of knowledge representation, resulting in a proposal for a general model of associative relations called the PHAB model. The many findings of the research include universals concerning the recruitment of anchoring nominal modification strategies, a method for comparing non-binary typologies, the non-universality (despite its predominance) of compounding, and a scale of frequencies for semantic relations which may provide insights into the associative nature of human thought.
  4. Witschel, H.F.: Global and local resources for peer-to-peer text retrieval (2008) 0.00
    0.0021343792 = product of:
      0.0042687585 = sum of:
        0.0042687585 = product of:
          0.008537517 = sum of:
            0.008537517 = weight(_text_:a in 127) [ClassicSimilarity], result of:
              0.008537517 = score(doc=127,freq=26.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.16076508 = fieldWeight in 127, product of:
                  5.0990195 = tf(freq=26.0), with freq of:
                    26.0 = termFreq=26.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=127)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This thesis is organised as follows: Chapter 2 gives a general introduction to the field of information retrieval, covering its most important aspects. Further, the tasks of distributed and peer-to-peer information retrieval (P2PIR) are introduced, motivating their application and characterising the special challenges that they involve, including a review of existing architectures and search protocols in P2PIR. Finally, chapter 2 presents approaches to evaluating the e ectiveness of both traditional and peer-to-peer IR systems. Chapter 3 contains a detailed account of state-of-the-art information retrieval models and algorithms. This encompasses models for matching queries against document representations, term weighting algorithms, approaches to feedback and associative retrieval as well as distributed retrieval. It thus defines important terminology for the following chapters. The notion of "multi-level association graphs" (MLAGs) is introduced in chapter 4. An MLAG is a simple, graph-based framework that allows to model most of the theoretical and practical approaches to IR presented in chapter 3. Moreover, it provides an easy-to-grasp way of defining and including new entities into IR modeling, such as paragraphs or peers, dividing them conceptually while at the same time connecting them to each other in a meaningful way. This allows for a unified view on many IR tasks, including that of distributed and peer-to-peer search. Starting from related work and a formal defiition of the framework, the possibilities of modeling that it provides are discussed in detail, followed by an experimental section that shows how new insights gained from modeling inside the framework can lead to novel combinations of principles and eventually to improved retrieval effectiveness.
    Chapter 5 empirically tackles the first of the two research questions formulated above, namely the question of global collection statistics. More precisely, it studies possibilities of radically simplified results merging. The simplification comes from the attempt - without having knowledge of the complete collection - to equip all peers with the same global statistics, making document scores comparable across peers. Chapter 5 empirically tackles the first of the two research questions formulated above, namely the question of global collection statistics. More precisely, it studies possibilities of radically simplified results merging. The simplification comes from the attempt - without having knowledge of the complete collection - to equip all peers with the same global statistics, making document scores comparable across peers. What is examined, is the question of how we can obtain such global statistics and to what extent their use will lead to a drop in retrieval effectiveness. In chapter 6, the second research question is tackled, namely that of making forwarding decisions for queries, based on profiles of other peers. After a review of related work in that area, the chapter first defines the approaches that will be compared against each other. Then, a novel evaluation framework is introduced, including a new measure for comparing results of a distributed search engine against those of a centralised one. Finally, the actual evaluation is performed using the new framework.
  5. Karlova-Bourbonus, N.: Automatic detection of contradictions in texts (2018) 0.00
    0.0020920765 = product of:
      0.004184153 = sum of:
        0.004184153 = product of:
          0.008368306 = sum of:
            0.008368306 = weight(_text_:a in 5976) [ClassicSimilarity], result of:
              0.008368306 = score(doc=5976,freq=34.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15757877 = fieldWeight in 5976, product of:
                  5.8309517 = tf(freq=34.0), with freq of:
                    34.0 = termFreq=34.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=5976)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Natural language contradictions are of complex nature. As will be shown in Chapter 5, the realization of contradictions is not limited to the examples such as Socrates is a man and Socrates is not a man (under the condition that Socrates refers to the same object in the real world), which is discussed by Aristotle (Section 3.1.1). Empirical evidence (see Chapter 5 for more details) shows that only a few contradictions occurring in the real life are of that explicit (prototypical) kind. Rather, con-tradictions make use of a variety of natural language devices such as, e.g., paraphrasing, synonyms and antonyms, passive and active voice, diversity of negation expression, and figurative linguistic means such as idioms, irony, and metaphors. Additionally, the most so-phisticated kind of contradictions, the so-called implicit contradictions, can be found only when applying world knowledge and after conducting a sequence of logical operations such as e.g. in: (1.1) The first prize was given to the experienced grandmaster L. Stein who, in total, col-lected ten points (7 wins and 3 draws). Those familiar with the chess rules know that a chess player gets one point for winning and zero points for losing the game. In case of a draw, each player gets a half point. Built on this idea and by conducting some simple mathematical operations, we can infer that in the case of 7 wins and 3 draws (the second part of the sentence), a player can only collect 8.5 points and not 10 points. Hence, we observe that there is a contradiction between the first and the second parts of the sentence.
    Implicit contradictions will only partially be the subject of the present study, aiming primarily at identifying the realization mechanism and cues (Chapter 5) as well as finding the parts of contradictions by applying the state of the art algorithms for natural language processing without conducting deep meaning processing. Further in focus are the explicit and implicit contradictions that can be detected by means of explicit linguistic, structural, lexical cues, and by conducting some additional processing operations (e.g., counting the sum in order to detect contradictions arising from numerical divergencies). One should note that an additional complexity in finding contradictions can arise in case parts of the contradictions occur on different levels of realization. Thus, a contradiction can be observed on the word- and phrase-level, such as in a married bachelor (for variations of contradictions on lexical level, see Ganeev 2004), on the sentence level - between parts of a sentence or between two or more sentences, or on the text level - between the portions of a text or between the whole texts such as a contradiction between the Bible and the Quran, for example. Only contradictions arising at the level of single sentences occurring in one or more texts, as well as parts of a sentence, will be considered for the purpose of this study. Though the focus of interest will be on single sentences, it will make use of text particularities such as coreference resolution without establishing the referents in the real world. Finally, another aspect to be considered is that parts of the contradictions are not neces-sarily to appear at the same time. They can be separated by many years and centuries with or without time expression making their recognition by human and detection by machine challenging. According to Aristotle's ontological version of the LNC (Section 3.1.1), how-ever, the same time reference is required in order for two statements to be judged as a contradiction. Taking this into account, we set the borders for the study by limiting the ana-lyzed textual data thematically (only nine world events) and temporally (three days after the reported event had happened) (Section 5.1). No sophisticated time processing will thus be conducted.
  6. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.00
    0.0019633435 = product of:
      0.003926687 = sum of:
        0.003926687 = product of:
          0.007853374 = sum of:
            0.007853374 = weight(_text_:a in 1536) [ClassicSimilarity], result of:
              0.007853374 = score(doc=1536,freq=22.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.14788237 = fieldWeight in 1536, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1536)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword amed entities may require a similar approach for their automatic detection as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.
    In this thesis, we focused on the automatic detection of multiword expressions in natural language texts. On the basis of the main contributions, we can argue that: - Supervised machine learning methods can be successfully applied for the automatic detection of different types of multiword expressions in natural language texts. - Machine learning-based multiword expression detection can be successfully carried out for English as well as for Hungarian. - Our supervised machine learning-based model was successfully applied to the automatic detection of nominal compounds from English raw texts. - We developed a Wikipedia-based dictionary labeling method to automatically detect English nominal compounds. - A prior knowledge of nominal compounds can enhance Named Entity Recognition, while previously identified named entities can assist the nominal compound identification process. - The machine learning-based method can also provide acceptable results when it was trained on an automatically generated silver standard corpus. - As named entities form one semantic unit and may consist of more than one word and function as a noun, we can treat them in a similar way to nominal compounds. - Our sequence labelling-based tool can be successfully applied for identifying verbal light verb constructions in two typologically different languages, namely English and Hungarian. - Domain adaptation techniques may help diminish the distance between domains in the automatic detection of light verb constructions. - Our syntax-based method can be successfully applied for the full-coverage identification of light verb constructions. As a first step, a data-driven candidate extraction method can be utilized. After, a machine learning approach that makes use of an extended and rich feature set selects LVCs among extracted candidates. - When a precise syntactic parser is available for the actual domain, the full-coverage identification can be performed better. In other cases, the usage of the sequence labeling method is recommended.
  7. Schmolz, H.: Anaphora resolution and text retrieval : a lnguistic analysis of hypertexts (2015) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 1172) [ClassicSimilarity], result of:
              0.006765375 = score(doc=1172,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 1172, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1172)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  8. Schmolz, H.: Anaphora resolution and text retrieval : a lnguistic analysis of hypertexts (2013) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 1810) [ClassicSimilarity], result of:
              0.006765375 = score(doc=1810,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 1810, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1810)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  9. Rösener, C.: ¬Die Stecknadel im Heuhaufen : Natürlichsprachlicher Zugang zu Volltextdatenbanken (2005) 0.00
    6.765375E-4 = product of:
      0.001353075 = sum of:
        0.001353075 = product of:
          0.00270615 = sum of:
            0.00270615 = weight(_text_:a in 548) [ClassicSimilarity], result of:
              0.00270615 = score(doc=548,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.050957955 = fieldWeight in 548, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=548)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    5: Interaktion 5.1 Frage-Antwort- bzw. Dialogsysteme: Forschungen und Projekte 5.2 Darstellung und Visualisierung von Wissen 5.3 Das Dialogsystem im Rahmen des LeWi-Projektes 5.4 Ergebnisdarstellung und Antwortpräsentation im LeWi-Kontext 6: Testumgebungen und -ergebnisse 7: Ergebnisse und Ausblick 7.1 Ausgangssituation 7.2 Schlussfolgerungen 7.3 Ausblick Anhang A Auszüge aus der Grob- bzw. Feinklassifikation des BMM Anhang B MPRO - Formale Beschreibung der wichtigsten Merkmale ... Anhang C Fragentypologie mit Beispielsätzen (Auszug) Anhang D Semantische Merkmale im morphologischen Lexikon (Auszug) Anhang E Regelbeispiele für die Fragentypzuweisung Anhang F Aufstellung der möglichen Suchen im LeWi-Dialogmodul (Auszug) Anhang G Vollständiger Dialogbaum zu Beginn des Projektes Anhang H Statuszustände zur Ermittlung der Folgefragen (Auszug)