Search (15 results, page 1 of 1)

  • theme_ss:"Computerlinguistik"
  • year_i:[2010 TO 2020}
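The second filter uses Lucene/Solr range syntax, in which a square bracket marks an inclusive bound and a curly brace an exclusive one. So `year_i:[2010 TO 2020}` keeps publication years 2010 through 2019, and all 15 hits below fall in that window. The helper below is a hypothetical illustration of those bracket semantics for integer values, not Solr's query parser; the name `parse_range` is made up for this sketch.

```python
import re
from typing import Callable

# opening bracket/brace, lower bound, "TO", upper bound, closing bracket/brace
RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")

def parse_range(expr: str) -> Callable[[int], bool]:
    """Return a predicate for a range such as "[2010 TO 2020}" ("*" = unbounded)."""
    m = RANGE.match(expr.strip())
    if m is None:
        raise ValueError(f"not a range expression: {expr!r}")
    open_b, lo, hi, close_b = m.groups()
    lo_incl, hi_incl = open_b == "[", close_b == "]"  # "[" / "]" are inclusive
    lo_v = None if lo == "*" else int(lo)
    hi_v = None if hi == "*" else int(hi)

    def matches(year: int) -> bool:
        if lo_v is not None and (year < lo_v or (year == lo_v and not lo_incl)):
            return False
        if hi_v is not None and (year > hi_v or (year == hi_v and not hi_incl)):
            return False
        return True

    return matches

in_window = parse_range("[2010 TO 2020}")
print([y for y in (2009, 2010, 2019, 2020) if in_window(y)])  # prints [2010, 2019]
```

Keeping the bound type separate from the bound value makes it easy to see that only the closing `}` excludes 2020.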
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.06
    0.060179763 = product of:
      0.18053928 = sum of:
        0.16634896 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.16634896 = score(doc=563,freq=2.0), product of:
            0.2959851 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03491209 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.014190319 = product of:
          0.028380638 = sum of:
            0.028380638 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.028380638 = score(doc=563,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Content
    A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. See: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
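The explain tree for hit 1 follows Lucene's ClassicSimilarity (TF-IDF) formula: each matched term contributes queryWeight × fieldWeight, where queryWeight = idf · queryNorm and fieldWeight = tf · idf · fieldNorm, with tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)); coord factors then scale each sum by the fraction of clauses matched. The minimal sketch below recomputes the score of doc 563 from the numbers in the tree. It assumes a query boost of 1.0 and takes queryNorm and fieldNorm as given rather than deriving them; the function names are illustrative, not Lucene's API.

```python
import math

MAX_DOCS = 44218          # maxDocs from the idf(...) lines
QUERY_NORM = 0.03491209   # queryNorm, taken as given

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)

def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """weight(term) = queryWeight * fieldWeight, assuming a query boost of 1.0."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

# doc 563: "2f" is a top-level clause; "22" sits in a two-clause sub-query
w_2f = term_score(2.0, 24, 0.046875)
w_22 = term_score(2.0, 3622, 0.046875) * (1 / 2)   # coord(1/2)
score = (w_2f + w_22) * (2 / 6)                   # coord(2/6): 2 of 6 clauses matched
print(f"weight(2f)={w_2f:.6f}  weight(22)*coord={w_22:.6f}  score={score:.6f}")
```

The printed values agree with the explain tree (0.16634896, 0.014190319 and 0.060179763) to about six decimal places; the small residual comes from Lucene's single-precision arithmetic and the rounded queryNorm.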
  2. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.02
    0.01995085 = product of:
      0.059852548 = sum of:
        0.031560984 = weight(_text_:searching in 2335) [ClassicSimilarity], result of:
          0.031560984 = score(doc=2335,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.22347288 = fieldWeight in 2335, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2335)
        0.028291566 = product of:
          0.056583133 = sum of:
            0.056583133 = weight(_text_:etc in 2335) [ClassicSimilarity], result of:
              0.056583133 = score(doc=2335,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.2992217 = fieldWeight in 2335, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2335)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structured ways using a combination of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent, and the sequence of these blocks captures changing discourse. Previous work shows that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this study we use the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features derived from the blocks of text and their combinations is used in a learning-to-rank setting. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves performance comparable to a supervised method using manually tagged Tweets. Topic-specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
  3. Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.01
    0.008926794 = product of:
      0.053560764 = sum of:
        0.053560764 = weight(_text_:searching in 4102) [ClassicSimilarity], result of:
          0.053560764 = score(doc=4102,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.37924606 = fieldWeight in 4102, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=4102)
      0.16666667 = coord(1/6)
    
    Abstract
    This article describes and evaluates various information retrieval models used to search document collections written in English by submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when the query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. The causes of difficulties arising during searching or translation are analyzed, and the results for concrete examples are explained.
  4. Soo, J.; Frieder, O.: On searching misspelled collections (2015) 0.01
    0.008416262 = product of:
      0.050497573 = sum of:
        0.050497573 = weight(_text_:searching in 1862) [ClassicSimilarity], result of:
          0.050497573 = score(doc=1862,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.3575566 = fieldWeight in 1862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=1862)
      0.16666667 = coord(1/6)
    
  5. Altmann, E.G.; Cristadoro, G.; Esposti, M.D.: On the origin of long-range correlations in texts (2012) 0.01
    0.0056583136 = product of:
      0.03394988 = sum of:
        0.03394988 = product of:
          0.06789976 = sum of:
            0.06789976 = weight(_text_:etc in 330) [ClassicSimilarity], result of:
              0.06789976 = score(doc=330,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.35906604 = fieldWeight in 330, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=330)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. One example is the robust, still poorly understood observation of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can equally be applied to other hierarchical settings.
  6. AL-Smadi, M.; Jaradat, Z.; AL-Ayyoub, M.; Jararweh, Y.: Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features (2017) 0.01
    0.0056583136 = product of:
      0.03394988 = sum of:
        0.03394988 = product of:
          0.06789976 = sum of:
            0.06789976 = weight(_text_:etc in 5095) [ClassicSimilarity], result of:
              0.06789976 = score(doc=5095,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.35906604 = fieldWeight in 5095, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5095)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    The rapid growth in digital information has raised considerable challenges, in particular when it comes to automated content analysis. Social media such as Twitter carry a great deal of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) classifiers are trained using these features and are evaluated using a dataset prepared for this research. The experimental results show that the approach achieves good results compared to the baseline results.
  7. Multi-source, multilingual information extraction and summarization (2013) 0.00
    0.004715261 = product of:
      0.028291566 = sum of:
        0.028291566 = product of:
          0.056583133 = sum of:
            0.056583133 = weight(_text_:etc in 978) [ClassicSimilarity], result of:
              0.056583133 = score(doc=978,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.2992217 = fieldWeight in 978, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=978)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherently multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but large-scale repositories and streams (generally in multiple languages) containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought (names vs. events) and the nature of the sources (news streams vs. image captions vs. scientific research papers, etc.). This volume aims to offer a broad and representative sample of studies from this very active research field.
  8. Helbig, H.: Knowledge representation and the semantics of natural language (2014) 0.00
    0.004715261 = product of:
      0.028291566 = sum of:
        0.028291566 = product of:
          0.056583133 = sum of:
            0.056583133 = weight(_text_:etc in 2396) [ClassicSimilarity], result of:
              0.056583133 = score(doc=2396,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.2992217 = fieldWeight in 2396, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2396)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Natural language is not only the most important means of communication between human beings; it has also been used throughout history to preserve cultural achievements and pass them on from one generation to the next. During the last few decades, the flood of digitized information has grown tremendously. This trend will continue with the globalisation of information societies and the growing importance of national and international computer networks. This is one reason why the theoretical understanding and automated treatment of communication processes based on natural language have such a decisive social and economic impact. In this context, the semantic representation of knowledge originally formulated in natural language plays a central part, because it connects all components of natural language processing systems, be it the automatic understanding of natural language (analysis), rational reasoning over knowledge bases, or the generation of natural language expressions from formal representations. This book presents a method for the semantic representation of natural language expressions (texts, sentences, phrases, etc.) that can be used as a universal knowledge representation paradigm in the human sciences, such as linguistics, cognitive psychology, or philosophy of language, as well as in computational linguistics and artificial intelligence. It is also an attempt to close the gap between these disciplines, which to a large extent still work separately.
  9. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.00
    0.004208131 = product of:
      0.025248786 = sum of:
        0.025248786 = weight(_text_:searching in 1264) [ClassicSimilarity], result of:
          0.025248786 = score(doc=1264,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.1787783 = fieldWeight in 1264, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
      0.16666667 = coord(1/6)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked their interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and the performance of our system through experimental studies.
    Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than with self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
  10. Ramisch, C.: Multiword expressions acquisition : a generic and open framework (2015) 0.00
    0.0037722092 = product of:
      0.022633255 = sum of:
        0.022633255 = product of:
          0.04526651 = sum of:
            0.04526651 = weight(_text_:etc in 1649) [ClassicSimilarity], result of:
              0.04526651 = score(doc=1649,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.23937736 = fieldWeight in 1649, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1649)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    This book is an excellent introduction to multiword expressions. It provides a unique, comprehensive and up-to-date overview of this exciting topic in computational linguistics. The first part describes the diversity and richness of multiword expressions, including many examples in several languages. These constructions are not only complex and arbitrary, but also much more frequent than one would guess, making them a real nightmare for natural language processing applications. The second part introduces a new generic framework for automatic acquisition of multiword expressions from texts. Furthermore, it describes the accompanying free software tool, the mwetoolkit, which comes in handy when looking for expressions in texts (regardless of the language). Evaluation is greatly emphasized, underlining the fact that results depend on parameters like corpus size, language, MWE type, etc. The last part contains solid experimental results and evaluates the mwetoolkit, demonstrating its usefulness for computer-assisted lexicography and machine translation. This is the first book to cover the whole pipeline of multiword expression acquisition in a single volume. It addresses the needs of students and researchers in computational and theoretical linguistics, cognitive sciences, artificial intelligence and computer science. Its good balance between computational and linguistic views makes it the perfect starting point for anyone interested in multiword expressions, and in language and text processing in general.
  11. Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche (2013) 0.00
    0.0031534042 = product of:
      0.018920425 = sum of:
        0.018920425 = product of:
          0.03784085 = sum of:
            0.03784085 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
              0.03784085 = score(doc=1490,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.30952093 = fieldWeight in 1490, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1490)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    22. 3.2015 9:30:24
  12. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, P.W.: Cross-language person-entity linking from 20 languages (2015) 0.00
    0.0023650532 = product of:
      0.014190319 = sum of:
        0.014190319 = product of:
          0.028380638 = sum of:
            0.028380638 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
              0.028380638 = score(doc=1848,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.23214069 = fieldWeight in 1848, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1848)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    The goal of entity linking is to link references to an entity found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded resolution accuracies between 0.84 (Serbian) and 0.98 (Romanian), which compare favorably with previously reported cross-language entity linking results for Spanish.
  13. Fóris, A.: Network theory and terminology (2013) 0.00
    0.0019708779 = product of:
      0.011825266 = sum of:
        0.011825266 = product of:
          0.023650533 = sum of:
            0.023650533 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
              0.023650533 = score(doc=1365,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.19345059 = fieldWeight in 1365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    2. 9.2014 21:22:48
  14. Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache (2018) 0.00
    0.0015767021 = product of:
      0.009460213 = sum of:
        0.009460213 = product of:
          0.018920425 = sum of:
            0.018920425 = weight(_text_:22 in 4217) [ClassicSimilarity], result of:
              0.018920425 = score(doc=4217,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.15476047 = fieldWeight in 4217, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4217)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    22. 1.2018 11:32:44
  15. Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.00
    0.0013796145 = product of:
      0.008277686 = sum of:
        0.008277686 = product of:
          0.016555373 = sum of:
            0.016555373 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
              0.016555373 = score(doc=3807,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.1354154 = fieldWeight in 3807, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3807)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    20. 1.2015 18:30:22
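The same ClassicSimilarity formula accounts for the whole ranking. The sketch below, under the same assumptions as for hit 1 (query boost 1.0, queryNorm and fieldNorm copied from the explain output), recomputes all 15 scores from each hit's matched terms, frequencies and fieldNorms and then sorts them. Ties, such as hits 5 and 6, are broken here by ascending doc id; that tie-break is an assumption that happens to match this listing. It also makes visible why hits 11 to 15 are ordered as they are: each matches only "22" with freq=2, so their order is decided entirely by fieldNorm, which is larger for shorter fields.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.03491209
TOP_LEVEL_CLAUSES = 6  # denominator of coord(x/6) in the explain output

# docFreq of each matched term, copied from the idf(...) lines
DOC_FREQ = {"2f": 24, "22": 3622, "searching": 2103, "etc": 533}
# terms that sit inside a two-clause sub-query, hence the extra coord(1/2)
NESTED = {"22", "etc"}

def term_weight(term: str, freq: float, field_norm: float) -> float:
    idf = 1.0 + math.log(MAX_DOCS / (DOC_FREQ[term] + 1))
    query_weight = idf * QUERY_NORM                      # boost assumed 1.0
    field_weight = math.sqrt(freq) * idf * field_norm
    weight = query_weight * field_weight
    return weight * 0.5 if term in NESTED else weight    # coord(1/2)

def doc_score(matches: list[tuple[str, float, float]]) -> float:
    # each matched term belongs to a different top-level clause
    total = sum(term_weight(t, f, n) for t, f, n in matches)
    return total * len(matches) / TOP_LEVEL_CLAUSES     # coord(k/6)

# (term, freq, fieldNorm) per hit, in the order of the listing
HITS = {
    563:  [("2f", 2, 0.046875), ("22", 2, 0.046875)],
    2335: [("searching", 2, 0.0390625), ("etc", 2, 0.0390625)],
    4102: [("searching", 4, 0.046875)],
    1862: [("searching", 2, 0.0625)],
    330:  [("etc", 2, 0.046875)],
    5095: [("etc", 2, 0.046875)],
    978:  [("etc", 2, 0.0390625)],
    2396: [("etc", 2, 0.0390625)],
    1264: [("searching", 2, 0.03125)],
    1649: [("etc", 2, 0.03125)],
    1490: [("22", 2, 0.0625)],
    1848: [("22", 2, 0.046875)],
    1365: [("22", 2, 0.0390625)],
    4217: [("22", 2, 0.03125)],
    3807: [("22", 2, 0.02734375)],
}

# sort by descending score, breaking ties by ascending doc id (assumed)
ranking = sorted(HITS, key=lambda d: (-doc_score(HITS[d]), d))
for rank, doc in enumerate(ranking, start=1):
    print(f"{rank:2d}. doc={doc:<5d} score={doc_score(HITS[doc]):.7f}")
```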