Search (31 results, page 2 of 2)

  • theme_ss:"Computerlinguistik"
  • year_i:[2010 TO 2020}
  1. Sünkler, S.; Kerkmann, F.; Schultheiß, S.: Ok Google ... the end of search as we know it : sprachgesteuerte Websuche im Test [Voice-controlled web search put to the test] (2018) 0.00
    0.0018871318 = product of:
      0.026419844 = sum of:
        0.026419844 = weight(_text_:web in 5626) [ClassicSimilarity], result of:
          0.026419844 = score(doc=5626,freq=2.0), product of:
            0.10467481 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0320743 = queryNorm
            0.25239927 = fieldWeight in 5626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5626)
      0.071428575 = coord(1/14)
    
    Abstract
    Voice control systems that assist users on spoken command are becoming increasingly popular as smartphones and speaker systems such as Amazon Echo and Google Home spread. One of their central applications is searching with web search engines. But how does "googling" work when users speak their query instead of typing it? A project team at HAW Hamburg investigated this question and, on behalf of Deutsche Telekom, examined how effectively, efficiently and satisfactorily Google Now, Apple Siri, Microsoft Cortana and Amazon Fire OS perform. The study identified the strengths and weaknesses of the systems as well as success criteria for high usability. These findings led to a prototype of an optimal voice web search.
  2. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.00
    0.0018677762 = product of:
      0.026148865 = sum of:
        0.026148865 = weight(_text_:web in 337) [ClassicSimilarity], result of:
          0.026148865 = score(doc=337,freq=6.0), product of:
            0.10467481 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0320743 = queryNorm
            0.24981049 = fieldWeight in 337, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
      0.071428575 = coord(1/14)
    
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. 
For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. See also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
  3. Levin, M.; Krawczyk, S.; Bethard, S.; Jurafsky, D.: Citation-based bootstrapping for large-scale author disambiguation (2012) 0.00
    0.0013479512 = product of:
      0.018871317 = sum of:
        0.018871317 = weight(_text_:web in 246) [ClassicSimilarity], result of:
          0.018871317 = score(doc=246,freq=2.0), product of:
            0.10467481 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0320743 = queryNorm
            0.18028519 = fieldWeight in 246, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=246)
      0.071428575 = coord(1/14)
    
    Abstract
    We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first "bootstrap" stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with B3 F1 of .807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from .748 to .767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1.
  4. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.00
    0.0013479512 = product of:
      0.018871317 = sum of:
        0.018871317 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
          0.018871317 = score(doc=1338,freq=2.0), product of:
            0.10467481 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0320743 = queryNorm
            0.18028519 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.071428575 = coord(1/14)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
  5. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.00
    0.0013479512 = product of:
      0.018871317 = sum of:
        0.018871317 = weight(_text_:web in 2335) [ClassicSimilarity], result of:
          0.018871317 = score(doc=2335,freq=2.0), product of:
            0.10467481 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0320743 = queryNorm
            0.18028519 = fieldWeight in 2335, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2335)
      0.071428575 = coord(1/14)
    
    Abstract
    Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structured ways using a combination of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intents, and the sequence of these blocks captures changing discourse. Previous work shows that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features derived from the blocks of text and their combinations is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves performance comparable to a supervised method using manually tagged Tweets. Topic-related, specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
  6. Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.00
    0.0013479512 = product of:
      0.018871317 = sum of:
        0.018871317 = weight(_text_:web in 2688) [ClassicSimilarity], result of:
          0.018871317 = score(doc=2688,freq=2.0), product of:
            0.10467481 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0320743 = queryNorm
            0.18028519 = fieldWeight in 2688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2688)
      0.071428575 = coord(1/14)
    
    Abstract
    The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in different aspects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
  7. Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche [Morphy: morphology and tagging for German] (2013) 0.00
    0.0012416071 = product of:
      0.017382499 = sum of:
        0.017382499 = product of:
          0.034764998 = sum of:
            0.034764998 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
              0.034764998 = score(doc=1490,freq=2.0), product of:
                0.11231873 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0320743 = queryNorm
                0.30952093 = fieldWeight in 1490, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1490)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    22. 3.2015 9:30:24
  8. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, P.W.: Cross-language person-entity linking from 20 languages (2015) 0.00
    9.3120534E-4 = product of:
      0.013036874 = sum of:
        0.013036874 = product of:
          0.026073748 = sum of:
            0.026073748 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
              0.026073748 = score(doc=1848,freq=2.0), product of:
                0.11231873 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0320743 = queryNorm
                0.23214069 = fieldWeight in 1848, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1848)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.
  9. Fóris, A.: Network theory and terminology (2013) 0.00
    7.7600445E-4 = product of:
      0.010864062 = sum of:
        0.010864062 = product of:
          0.021728124 = sum of:
            0.021728124 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
              0.021728124 = score(doc=1365,freq=2.0), product of:
                0.11231873 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0320743 = queryNorm
                0.19345059 = fieldWeight in 1365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    2. 9.2014 21:22:48
  10. Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache [AI program better than humans at understanding natural language] (2018) 0.00
    6.2080356E-4 = product of:
      0.008691249 = sum of:
        0.008691249 = product of:
          0.017382499 = sum of:
            0.017382499 = weight(_text_:22 in 4217) [ClassicSimilarity], result of:
              0.017382499 = score(doc=4217,freq=2.0), product of:
                0.11231873 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0320743 = queryNorm
                0.15476047 = fieldWeight in 4217, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4217)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    22. 1.2018 11:32:44
  11. Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.00
    5.4320315E-4 = product of:
      0.0076048435 = sum of:
        0.0076048435 = product of:
          0.015209687 = sum of:
            0.015209687 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
              0.015209687 = score(doc=3807,freq=2.0), product of:
                0.11231873 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0320743 = queryNorm
                0.1354154 = fieldWeight in 3807, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3807)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    20. 1.2015 18:30:22
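Every relevance score above is printed as a Lucene ClassicSimilarity explain tree. The sketch below recomputes two of them (result 1, doc 5626, and result 7, doc 1490) using only the factors the explain lines print: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and one coord(matched/total) factor per enclosing boolean query. queryNorm and fieldNorm are copied from the output rather than derived, because queryNorm depends on all 14 query clauses, which the page does not show, and fieldNorm is a stored per-field length normalization. This is an illustrative reconstruction assuming a query boost of 1, not Lucene's own code; the function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as printed in the explain lines: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf as printed in the explain lines: sqrt(termFreq)."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """weight(_text_:term in doc) = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # assumes a query boost of 1
    field_weight = classic_tf(freq) * idf * field_norm  # fieldNorm taken as printed
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of the clauses of a boolean query that matched the document."""
    return matched / total


# Constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.0320743

# Result 1 (doc 5626): the flat clause _text_:web matched 1 of 14 top-level clauses.
web_5626 = term_score(2.0, 4597, MAX_DOCS, QUERY_NORM, 0.0546875)
total_5626 = web_5626 * coord(1, 14)

# Result 7 (doc 1490): _text_:22 sits inside a nested 2-clause query,
# so two coord factors apply: coord(1/2), then coord(1/14).
t22_1490 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0625)
total_1490 = t22_1490 * coord(1, 2) * coord(1, 14)

if __name__ == "__main__":
    print(f"doc 5626: weight={web_5626:.9f} total={total_5626:.10f}")
    print(f"doc 1490: weight={t22_1490:.9f} total={total_1490:.10f}")
```

Run as a script, it should print values that agree with the corresponding explain lines to roughly six significant figures; small residual differences would come from Lucene working partly in single precision.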
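The dictionary described in result 2 consists of (text, url, count) triples, with the counts acting as association weights. A minimal sketch, assuming one simply normalizes the counts for a string into a distribution over concepts, shows how such a table exposes an ambiguous string. The function names and the triples are hypothetical and invented for illustration; they are not taken from the released data set or from the authors' code.

```python
from collections import defaultdict


def build_dictionary(triples):
    """Group (text, url, count) triples by their text string."""
    table = defaultdict(dict)
    for text, url, count in triples:
        # Sum counts in case the same (text, url) pair occurs more than once.
        table[text][url] = table[text].get(url, 0) + count
    return table


def concept_distribution(table, text):
    """Return [(url, P(url | text))], sorted by descending probability."""
    counts = table.get(text, {})
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(url, c / total) for url, c in ranked]


# Hypothetical triples: strings, URLs and counts are invented for illustration.
TRIPLES = [
    ("jaguar", "https://en.wikipedia.org/wiki/Jaguar", 30),
    ("jaguar", "https://en.wikipedia.org/wiki/Jaguar_Cars", 10),
    ("big cat", "https://en.wikipedia.org/wiki/Jaguar", 5),
]

table = build_dictionary(TRIPLES)

if __name__ == "__main__":
    for url, p in concept_distribution(table, "jaguar"):
        print(f"{p:.2f}  {url}")
```

A string whose probability mass is split across several URLs is ambiguous in the sense the abstract describes; a string such as "big cat" here maps to a single concept.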

Languages

  • English (e) 25
  • German (d) 6

Types

  • a 24
  • el 8
  • m 2
  • s 1
  • x 1