Search (123 results, page 1 of 7)

  • year_i:[2010 TO 2020}
  • theme_ss:"Computerlinguistik"
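The first filter uses Lucene/Solr range syntax. A square bracket marks an inclusive bound and a curly brace an exclusive one, so `year_i:[2010 TO 2020}` should match 2010 through 2019. The Python sketch below illustrates only that bracket semantics. The `parse_range` helper is hypothetical: it accepts integer bounds only and ignores `*` wildcards and date math.

```python
import re
from typing import Callable, Tuple

# Hypothetical helper: integer bounds only, no '*' wildcards or date math.
RANGE_RE = re.compile(r"^(\w+):([\[{])\s*(-?\d+)\s+TO\s+(-?\d+)\s*([\]}])$")


def parse_range(expr: str) -> Tuple[str, Callable[[int], bool]]:
    """Return (field, predicate) for an expression such as 'year_i:[2010 TO 2020}'."""
    m = RANGE_RE.match(expr.strip())
    if m is None:
        raise ValueError(f"unsupported range expression: {expr!r}")
    field, open_b, lo, hi, close_b = m.groups()
    lo_i, hi_i = int(lo), int(hi)
    lower_inclusive = open_b == "["   # '[' keeps the lower bound, '{' drops it
    upper_inclusive = close_b == "]"  # ']' keeps the upper bound, '}' drops it

    def predicate(value: int) -> bool:
        above = value >= lo_i if lower_inclusive else value > lo_i
        below = value <= hi_i if upper_inclusive else value < hi_i
        return above and below

    return field, predicate


field, in_range = parse_range("year_i:[2010 TO 2020}")
print(field, [y for y in (2009, 2010, 2015, 2019, 2020) if in_range(y)])
# prints: year_i [2010, 2015, 2019]
```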
  1. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.02
    0.022779368 = product of:
      0.045558736 = sum of:
        0.045558736 = sum of:
          0.008118451 = weight(_text_:a in 1848) [ClassicSimilarity], result of:
            0.008118451 = score(doc=1848,freq=8.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.15287387 = fieldWeight in 1848, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=1848)
          0.037440285 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
            0.037440285 = score(doc=1848,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 1848, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=1848)
      0.5 = coord(1/2)
    
    Abstract
    The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.
    Type
    a
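This record's explain tree can be recomputed from the numbers it prints. The minimal Python sketch below assumes the Lucene 6.x ClassicSimilarity formulas: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), and term score = queryWeight × fieldWeight. queryNorm and fieldNorm are copied from the output rather than derived, and the function names are illustrative.

```python
import math

QUERY_NORM = 0.046056706  # copied from the explain output, not recomputed
MAX_DOCS = 44218


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf (Lucene 6.x): 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """queryWeight * fieldWeight for one matching term, as in the explain tree."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM                    # "queryWeight, product of"
    field_weight = math.sqrt(freq) * idf * field_norm  # "fieldWeight, product of"
    return query_weight * field_weight


w_a = term_score(freq=8.0, doc_freq=37942, field_norm=0.046875)
w_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.046875)
total = (w_a + w_22) * 0.5  # "sum of" the two weights, then coord(1/2)
print(f"a={w_a:.7f}  22={w_22:.7f}  total={total:.7f}")
```

The final `* 0.5` is the coord(1/2) factor: only one of the two top-level query clauses matched this document. Small differences in the last digits come from Lucene computing in 32-bit floats.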
  2. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.02
    0.022235535 = product of:
      0.04447107 = sum of:
        0.04447107 = sum of:
          0.007030784 = weight(_text_:a in 563) [ClassicSimilarity], result of:
            0.007030784 = score(doc=563,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.13239266 = fieldWeight in 563, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
          0.037440285 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
            0.037440285 = score(doc=563,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
      0.5 = coord(1/2)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
    Content
    A thesis presented to The University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. See: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
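In both records so far, most of the score comes from the term "22" rather than "a", even though "a" occurs more often. With idf = 1 + ln(maxDocs / (docFreq + 1)) (the Lucene 6.x ClassicSimilarity formula, assumed here), idf appears in both queryWeight and fieldWeight. For equal tf and fieldNorm, a term's score therefore scales with idf squared. The sketch below computes that comparison.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Lucene 6.x ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


idf_a = classic_idf(37942, 44218)   # "a": in roughly 86% of documents
idf_22 = classic_idf(3622, 44218)   # "22": in roughly 8% of documents
# For equal tf and fieldNorm, a term's score scales with idf**2.
ratio = (idf_22 / idf_a) ** 2
print(f"idf(a)={idf_a:.6f}  idf(22)={idf_22:.6f}  idf^2 ratio={ratio:.2f}")
```

A ratio of roughly 9 explains why two occurrences of "22" outweigh six or eight occurrences of "a", since tf grows only with the square root of frequency.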
  3. Fóris, A.: Network theory and terminology (2013) 0.02
    0.020074995 = product of:
      0.04014999 = sum of:
        0.04014999 = sum of:
          0.00894975 = weight(_text_:a in 1365) [ClassicSimilarity], result of:
            0.00894975 = score(doc=1365,freq=14.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.1685276 = fieldWeight in 1365, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1365)
          0.03120024 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
            0.03120024 = score(doc=1365,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.19345059 = fieldWeight in 1365, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1365)
      0.5 = coord(1/2)
    
    Abstract
    The paper aims to present the relationship between network theory and terminology. The model of scale-free networks, which was developed recently and has been widely applied since, can be effectively used in terminology research as well. Operation based on the principle of networks is a universal characteristic of complex systems. Networks are governed by general laws. The model of scale-free networks can be viewed as a statistical-probability model, and it can be described with mathematical tools. Its main feature is that "everything is connected to everything else," that is, every node is reachable (in a few steps) starting from any other node; this phenomenon is called "the small world phenomenon." The existence of a linguistic network and the general laws of the operation of networks enable us to place issues of language use in the complex system of relations that reveal the deeper connections between phenomena with the help of networks embedded in each other. The realization of the metaphor that language also has a network structure is the basis of the classification methods of the terminological system, and likewise of the ways of creating terminology databases, which serve the purpose of providing easy and versatile accessibility to specialised knowledge.
    Date
    2. 9.2014 21:22:48
    Type
    a
  4. Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.01
    0.014052495 = product of:
      0.02810499 = sum of:
        0.02810499 = sum of:
          0.0062648254 = weight(_text_:a in 3807) [ClassicSimilarity], result of:
            0.0062648254 = score(doc=3807,freq=14.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.11796933 = fieldWeight in 3807, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.02734375 = fieldNorm(doc=3807)
          0.021840166 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
            0.021840166 = score(doc=3807,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.1354154 = fieldWeight in 3807, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=3807)
      0.5 = coord(1/2)
    
    Abstract
    Purpose: Academic authors tend to define terms that meet their own needs. Knowledge Management (KM) is one such term and is examined in this study. Lexicographical research identified KM terms used by authors from 1996 to 2006 in academic outlets to define KM. Data were collected based on strict criteria, which included that definitions should be unique instances. From 2006 onwards, the authors could identify no new unique definitions, only repeated use of existing ones. Analysis revealed that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, and Process) and Contextualised Content (Information). The paper aims to discuss these issues. Design/methodology/approach: The aim of this paper is to add to the body of knowledge in the KM discipline and supply KM practitioners and scholars with insight into what is commonly regarded as KM, so as to reignite the debate on what one could consider KM. The lexicon used by KM scholars was evaluated through the application of lexicographical research methods, as extended through Knowledge Discovery and Text Analysis methods. Findings: By simplifying term relationships through the application of lexicographical research methods, as extended through Knowledge Discovery and Text Analysis methods, it was found that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, Process) and Contextualised Content (Information). One would therefore be able to indicate that KM, from an academic point of view, refers to people processing contextualised content.
    Research limitations/implications: In total, 42 definitions were identified, spanning a period of 11 years. This represented the first use of KM through the estimated apex of terms used. From 2006 onwards, definitions were used repeatedly, and all definitions considered to be repeats were therefore excluded as not being unique instances. The list of definitions is by no means complete or exhaustive. The definitions are viewed outside the scope and context in which they were originally formulated and then used to review the key concepts in the definitions themselves. Social implications: The discussion of KM content above, together with the method followed in this paper, has a few implications for future research in KM. First, the research validates ideas presented by the OECD in 2005 pertaining to KM. It also validates that, through the evolution of KM, the authors arrived at a description of KM that may be seen as a standardised description. If academics and practitioners, for example, refer to KM as the same construct and/or idea, this has the potential to speculatively distinguish between what KM may or may not be. Originality/value: By simplifying the terms used to define KM and focusing on the most common definitions, the paper assists in refocusing KM by reconsidering the dimensions that are most common in how it has been defined over time. This would hopefully assist in reigniting discussions about KM and how it may be used to the benefit of an organisation.
    Date
    20. 1.2015 18:30:22
    Type
    a
  5. Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache [AI program better than humans at understanding natural language] (2018) 0.01
    0.01383317 = product of:
      0.02766634 = sum of:
        0.02766634 = sum of:
          0.00270615 = weight(_text_:a in 4217) [ClassicSimilarity], result of:
            0.00270615 = score(doc=4217,freq=2.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.050957955 = fieldWeight in 4217, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.03125 = fieldNorm(doc=4217)
          0.02496019 = weight(_text_:22 in 4217) [ClassicSimilarity], result of:
            0.02496019 = score(doc=4217,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.15476047 = fieldWeight in 4217, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=4217)
      0.5 = coord(1/2)
    
    Date
    22. 1.2018 11:32:44
    Type
    a
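All fieldNorm values on this page (0.0625, 0.046875, 0.0390625, 0.03125, 0.02734375, ...) are coarse binary fractions. In Lucene 6.x ClassicSimilarity, the norm is lengthNorm = boost / sqrt(numTerms), stored in a single byte with SmallFloat's "3.15" encoding, so many different field lengths collapse to the same value. The sketch below is an unofficial Python port. It assumes the Lucene 6.x `SmallFloat.floatToByte315`/`byte315ToFloat` logic and should be checked against the Lucene source. The field lengths in the example are hypothetical.

```python
import math
import struct

FZERO = (63 - 15) << 3  # zero-exponent offset for the "3.15" small-float format


def float_to_byte315(f: float) -> int:
    """Encode a float into one unsigned byte (lossy), assuming Lucene's SmallFloat 3.15 logic."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw IEEE-754 float32 bits
    small = bits >> 21                                  # keep sign, exponent, top mantissa bits
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> smallest value
    if small >= FZERO + 0x100:
        return 255                    # overflow -> largest value
    return small - FZERO


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms: int, boost: float = 1.0) -> float:
    """lengthNorm = boost / sqrt(numTerms), round-tripped through the one-byte encoding."""
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))


for n in (256, 400, 500, 1024):  # hypothetical field lengths in terms
    print(n, round(1 / math.sqrt(n), 6), field_norm(n))
```

The encoding keeps only four steps per power of two (1, 1.25, 1.5, 1.75) and rounds down. A 400-term field (exact norm 0.05) therefore stores 0.046875.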
  6. Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche [Morphy: morphology and tagging for German] (2013) 0.01
    0.012480095 = product of:
      0.02496019 = sum of:
        0.02496019 = product of:
          0.04992038 = sum of:
            0.04992038 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
              0.04992038 = score(doc=1490,freq=2.0), product of:
                0.16128273 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046056706 = queryNorm
                0.30952093 = fieldWeight in 1490, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1490)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2015 9:30:24
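Unlike records 1 to 5, this record matched only the term "22", so its explain tree applies coord(1/2) twice. The first factor applies inside the nested query, where one of two clauses matched, and the second at the top level. The sketch below shows how ClassicSimilarity's coord (overlap / maxOverlap) combines clause scores in a Lucene 6.x BooleanQuery; the function names are illustrative.

```python
from typing import List


def coord(overlap: int, max_overlap: int) -> float:
    """ClassicSimilarity coord: fraction of a BooleanQuery's clauses that matched."""
    return overlap / max_overlap


def boolean_score(matching_clause_scores: List[float], max_overlap: int) -> float:
    """Sum the matching clauses' scores, then scale by coord (Lucene 6.x BooleanQuery)."""
    return sum(matching_clause_scores) * coord(len(matching_clause_scores), max_overlap)


weight_22 = 0.04992038  # weight(_text_:22 in 1490), copied from the explain output
inner = boolean_score([weight_22], max_overlap=2)  # nested query: 1 of 2 clauses matched
outer = boolean_score([inner], max_overlap=2)      # top-level query: 1 of 2 clauses matched
print(f"inner={inner:.8f}  outer={outer:.9f}")
```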
  7. Vechtomova, O.: A method for automatic extraction of multiword units representing business aspects from user reviews (2014) 0.00
    0.003515392 = product of:
      0.007030784 = sum of:
        0.007030784 = product of:
          0.014061568 = sum of:
            0.014061568 = weight(_text_:a in 1304) [ClassicSimilarity], result of:
              0.014061568 = score(doc=1304,freq=24.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.26478532 = fieldWeight in 1304, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1304)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words, representing the target category, and calculates distributional similarity between the candidate and seed words. We compare 3 distributional similarity measures (Lin's, Weeds's, and balAPinc), and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews.
    Type
    a
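The abstract mentions adapting BM25, a document retrieval function, as a word similarity measure. The abstract does not describe that adaptation, so the sketch below shows only the standard Okapi BM25 per-term weight. It uses the idf variant found in Lucene's BM25Similarity and the conventional defaults k1 = 1.2 and b = 0.75. All inputs in the example are hypothetical.

```python
import math


def bm25_term_weight(tf: float, doc_len: float, avg_doc_len: float,
                     n_docs: int, doc_freq: int,
                     k1: float = 1.2, b: float = 0.75) -> float:
    """Standard Okapi BM25 weight of one term in one document."""
    # idf variant used by Lucene's BM25Similarity; always positive
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: longer-than-average documents are penalised when b > 0
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # tf saturates: the weight approaches idf * (k1 + 1) as tf grows
    return idf * tf * (k1 + 1) / (tf + norm)


# Hypothetical numbers: a term found in 10 of 1,000 documents, average-length document.
for tf in (1, 2, 5, 20):
    print(tf, round(bm25_term_weight(tf, 100, 100, 1000, 10), 4))
```

Because tf appears in both the numerator and the denominator, the weight levels off toward idf × (k1 + 1) as a term repeats. By contrast, the sqrt(freq) tf of the ClassicSimilarity scores on this page keeps growing.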
  8. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.00
    0.0033826875 = product of:
      0.006765375 = sum of:
        0.006765375 = product of:
          0.01353075 = sum of:
            0.01353075 = weight(_text_:a in 2027) [ClassicSimilarity], result of:
              0.01353075 = score(doc=2027,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.25478977 = fieldWeight in 2027, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2027)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
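Every term on this page shares queryNorm = 0.046056706. In Lucene 6.x ClassicSimilarity, queryNorm = 1 / sqrt(sumOfSquaredWeights), where each query term contributes (boost × idf)². Because queryNorm rescales every document's score equally, it does not change the ranking. In the minimal sketch below, the two idf values are taken from the explain trees and the helper name is illustrative.

```python
import math
from typing import List, Optional


def query_norm(idfs: List[float], boosts: Optional[List[float]] = None) -> float:
    """ClassicSimilarity queryNorm: 1 / sqrt(sum((boost * idf) ** 2)) over the query terms."""
    if boosts is None:
        boosts = [1.0] * len(idfs)
    sum_sq = sum((bo * idf) ** 2 for bo, idf in zip(boosts, idfs))
    # Guard for an empty or zero-weight query (illustrative choice)
    return 1.0 / math.sqrt(sum_sq) if sum_sq > 0 else 1.0


idfs = [1.153047, 3.5018296]  # idf of "a" and "22", copied from the explain trees
qn = query_norm(idfs)
# After normalisation the query-weight vector has unit length.
print(round(qn, 4), round(sum((idf * qn) ** 2 for idf in idfs), 6))
```

With only these two idf values, the result is about 0.27, well above the printed 0.046. This suggests the full query behind this page had additional clauses or boosts beyond the two terms that appear in these explain trees.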
  9. Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.00
    0.0030255679 = product of:
      0.0060511357 = sum of:
        0.0060511357 = product of:
          0.012102271 = sum of:
            0.012102271 = weight(_text_:a in 1161) [ClassicSimilarity], result of:
              0.012102271 = score(doc=1161,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22789092 = fieldWeight in 1161, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1161)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We present a novel unsupervised approach to detecting the compositionality of multi-word expressions. We compute the compositionality of a phrase through substituting the constituent words with their "neighbours" in a semantic vector space and averaging over the distance between the original phrase and the substituted neighbour phrases. Several methods of obtaining neighbours are presented. The results are compared to existing supervised results and achieve state-of-the-art performance on a verb-object dataset of human compositionality ratings.
    Type
    a
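The abstract's idea can be sketched as follows: score a phrase by how close it stays to variants in which one constituent is replaced by a distributional neighbour. This is not the authors' implementation. The sketch assumes precomputed phrase vectors and uses cosine similarity (higher means more compositional) in place of an unspecified distance. The toy vectors and neighbour lists are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def compositionality(phrase: Phrase,
                     neighbours: Dict[str, List[str]],
                     vectors: Dict[Phrase, List[float]]) -> float:
    """Mean cosine similarity between a phrase and its neighbour-substituted variants.

    Higher values are read as "more compositional"; variants without a vector are skipped.
    """
    original = vectors[phrase]
    sims = []
    for i, word in enumerate(phrase):
        for n in neighbours.get(word, []):
            variant = phrase[:i] + (n,) + phrase[i + 1:]
            if variant in vectors:
                sims.append(cosine(original, vectors[variant]))
    return sum(sims) / len(sims) if sims else float("nan")


# Hypothetical toy vectors and neighbour lists, for illustration only.
vectors = {
    ("eat", "apple"): [1.0, 0.9, 0.0],
    ("eat", "pear"): [0.9, 1.0, 0.1],
    ("kick", "bucket"): [0.0, 0.1, 1.0],
    ("kick", "pail"): [1.0, 0.2, 0.0],
}
neighbours = {"apple": ["pear"], "bucket": ["pail"]}
print(round(compositionality(("eat", "apple"), neighbours, vectors), 3))
print(round(compositionality(("kick", "bucket"), neighbours, vectors), 3))
```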
  10. Colace, F.; Santo, M. De; Greco, L.; Napoletano, P.: Weighted word pairs for query expansion (2015) 0.00
    0.0029000505 = product of:
      0.005800101 = sum of:
        0.005800101 = product of:
          0.011600202 = sum of:
            0.011600202 = weight(_text_:a in 2687) [ClassicSimilarity], result of:
              0.011600202 = score(doc=2687,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21843673 = fieldWeight in 2687, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2687)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper proposes a novel query expansion method to improve the accuracy of text retrieval systems. Our method makes use of minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. Such a structure is obtained from the relevance feedback through a word-pair selection method based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.
    Type
    a
  11. Vasalou, A.; Gill, A.J.; Mazanderani, F.; Papoutsi, C.; Joinson, A.: Privacy dictionary : a new resource for the automated content analysis of privacy (2011) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 4915) [ClassicSimilarity], result of:
              0.011481222 = score(doc=4915,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 4915, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4915)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article presents the privacy dictionary, a new linguistic resource for automated content analysis on privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
    Type
    a
  12. Biselli, A.: Unter Generalverdacht durch Algorithmen [Under general suspicion by algorithms] (2014) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 809) [ClassicSimilarity], result of:
              0.011481222 = score(doc=809,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 809, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=809)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  13. Kim, S.; Ko, Y.; Oard, D.W.: Combining lexical and statistical translation evidence for cross-language information retrieval (2015) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 1606) [ClassicSimilarity], result of:
              0.011481222 = score(doc=1606,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 1606, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1606)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.
    Type
    a
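The abstract names Jensen-Shannon divergence as the term-association measure used for translation weighting. The sketch below computes only the standard divergence between two discrete distributions over the same outcomes, in nats, so it is symmetric and bounded by ln 2. The abstract does not say how the article builds the distributions being compared, and the example distributions are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric, bounded by ln 2: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical distributions over the same three outcomes
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
print(round(jensen_shannon(p, q), 4), round(jensen_shannon(p, p), 4))
```

Because every m_i is positive wherever p_i or q_i is, the divergence stays finite even when the two distributions have disjoint support, unlike plain KL divergence.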
  14. Nissim, M.; Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method (2013) 0.00
    0.0028047764 = product of:
      0.005609553 = sum of:
        0.005609553 = product of:
          0.011219106 = sum of:
            0.011219106 = weight(_text_:a in 990) [ClassicSimilarity], result of:
              0.011219106 = score(doc=990,freq=22.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21126054 = fieldWeight in 990, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=990)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The issue of internal variability of multiword expressions (MWEs) is crucial for their identification and extraction in running text. We present a corpus-supported and computational study on Italian MWEs, aimed at defining an automatic method for modeling internal variation, exploiting frequency and part-of-speech (POS) information. We do so by deriving an XML-encoded lexicon of MWEs based on a manually compiled dictionary, which is then projected onto a large corpus. Since a search for fixed forms suffers from low recall, while an unconstrained flexible search for lemmas yields a loss in precision, we suggest a procedure aimed at maximizing precision in the identification of MWEs within a flexible search. Our method builds on the idea that internal variability can be modelled via the novel introduction of variation patterns, which work over POS patterns and can be used as working tools for controlling precision. We also compare the performance of variation patterns to that of association measures, and explore the possibility of using variation patterns in MWE extraction in addition to identification. Finally, we suggest that corpus-derived, pattern-related information can be included in the original MWE lexicon by means of an enriched coding and the creation of an XML-based repository of patterns.
    Type
    a
  15. Panicheva, P.; Cardiff, J.; Rosso, P.: Identifying subjective statements in news titles using a personal sense annotation framework (2013) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 968) [ClassicSimilarity], result of:
              0.010739701 = score(doc=968,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 968, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=968)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Subjective language contains information about private states. The goal of subjective language identification is to determine that a private state is expressed, without considering its polarity or specific emotion. A component of word meaning, "Personal Sense," has clear potential in the field of subjective language identification, as it reflects a meaning of words in terms of unique personal experience and carries personal characteristics. In this paper we investigate how Personal Sense can be harnessed for the purpose of identifying subjectivity in news titles. In the process, we develop a new Personal Sense annotation framework for annotating and classifying subjectivity, polarity, and emotion. The Personal Sense framework yields high performance in a fine-grained subsentence subjectivity classification. Our experiments demonstrate lexico-syntactic features to be useful for the identification of subjectivity indicators and the targets that receive the subjective Personal Sense.
    Type
    a
  16. Schöneberg, U.; Sperber, W.: POS tagging and its applications for mathematics (2014) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 1748) [ClassicSimilarity], result of:
              0.010739701 = score(doc=1748,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 1748, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1748)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Content analysis of scientific publications is a nontrivial task, but a useful and important one for scientific information services. In the Gutenberg era it was a domain of human experts; in the digital age many machine-based methods, e.g., graph analysis tools and machine-learning techniques, have been developed for it. Natural Language Processing (NLP) is a powerful machine-learning approach to semiautomatic speech and language processing, which is also applicable to mathematics. The well-established methods of NLP have to be adjusted for the special needs of mathematics, in particular for handling mathematical formulae. We demonstrate a mathematics-aware part-of-speech tagger and give a short overview of our adaptation of NLP methods for mathematical publications. We show the use of the tools developed for key phrase extraction and classification in the database zbMATH.
    Type
    a
  17. Anizi, M.; Dichy, J.: Improving information retrieval in Arabic through a multi-agent approach and a rich lexical resource (2011) 0.00
    0.0026742492 = product of:
      0.0053484985 = sum of:
        0.0053484985 = product of:
          0.010696997 = sum of:
            0.010696997 = weight(_text_:a in 4738) [ClassicSimilarity], result of:
              0.010696997 = score(doc=4738,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20142901 = fieldWeight in 4738, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4738)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper addresses the optimization of information retrieval in Arabic. The results derived from the expanding development of sites in Arabic are often spectacular. Nevertheless, several observations indicate that the responses remain disappointing, particularly when users' requests are compared with the quality of the responses. One of the problems encountered by users is the loss of time when navigating between different URLs to find adequate responses. In many cases, this is due to the absence of forms morphologically related to the search keyword. Such problems can be approached through a morphological analyzer drawing on the DIINAR.1 morpho-lexical resource. A second problem concerns the formulation of the query, which may prove ambiguous, as in everyday language. We then focus on contextual disambiguation based on a rich lexical resource that includes collocations and set expressions. The overall scheme of such a resource will only be hinted at here. Our approach leads to the elaboration of a multi-agent system, motivated by a need to solve problems encountered when using conventional methods of analysis, and to improve the results of queries through better collaboration between different levels of analysis. We suggest resorting to four agents: morphological, morpho-lexical, contextualization, and an interface agent. These agents 'negotiate' and 'cooperate' throughout the analysis process, from the submission of the initial query until an adequate query is obtained.
    Type
    a
  18. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.00
    0.0026742492 = product of:
      0.0053484985 = sum of:
        0.0053484985 = product of:
          0.010696997 = sum of:
            0.010696997 = weight(_text_:a in 2335) [ClassicSimilarity], result of:
              0.010696997 = score(doc=2335,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20142901 = fieldWeight in 2335, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2335)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structured ways, using a combination of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent, and the sequence of these blocks captures changing discourse. Previous work shows that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features derived from the blocks of text and their combinations is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves performance comparable to a supervised method using manually tagged Tweets. Topic-related specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
    Type
    a
  19. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.00
    0.0026742492 = product of:
      0.0053484985 = sum of:
        0.0053484985 = product of:
          0.010696997 = sum of:
            0.010696997 = weight(_text_:a in 4675) [ClassicSimilarity], result of:
              0.010696997 = score(doc=4675,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20142901 = fieldWeight in 4675, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4675)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes input text without word boundaries one token at a time, where a token can be a character or a byte, and uses the information gathered by the language model to determine whether a boundary must be placed at the current position. Our aim is to use this system as a preprocessing step for a microtext normalization system. This means that it needs to cope effectively with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
    Type
    a
  20. Wu, H.; He, J.; Pei, Y.: Scientific impact at the topic level : a case study in computational linguistics (2010) 0.00
    0.0026473717 = product of:
      0.0052947435 = sum of:
        0.0052947435 = product of:
          0.010589487 = sum of:
            0.010589487 = weight(_text_:a in 4103) [ClassicSimilarity], result of:
              0.010589487 = score(doc=4103,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.19940455 = fieldWeight in 4103, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4103)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this article, we propose to apply the topic model and topic-level eigenfactor (TEF) algorithm to assess the relative importance of academic entities including articles, authors, journals, and conferences. Scientific impact is measured by the biased PageRank score toward topics created by the latent topic model. The TEF metric considers the impact of an academic entity in multiple granular views as well as in a global view. Experiments on a computational linguistics corpus show that the method is a useful and promising measure to assess scientific impact.
    Type
    a
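Result 18 (Luo et al.) treats a Tweet as a sequence of blocks: plain text, hashtags, links, and mentions. The abstract does not give the exact block rules, so the regular expression below is an assumption, and split_into_blocks and block_sequence_feature are hypothetical helpers. The sketch only shows how such a block sequence could be derived and turned into one simple structural feature of the kind a learning-to-rank model might consume.

```python
import re
from typing import List, Tuple

# Assumed block syntax; the paper's own tokenization rules may differ.
BLOCK_PATTERN = re.compile(r"(?P<link>https?://\S+)|(?P<hashtag>#\w+)|(?P<mention>@\w+)")


def split_into_blocks(tweet: str) -> List[Tuple[str, str]]:
    """Split a Tweet into (block_type, text) pairs in their original order."""
    blocks: List[Tuple[str, str]] = []
    pos = 0
    for match in BLOCK_PATTERN.finditer(tweet):
        plain = tweet[pos:match.start()].strip()
        if plain:
            blocks.append(("text", plain))
        blocks.append((match.lastgroup, match.group()))  # name of the alternative that matched
        pos = match.end()
    tail = tweet[pos:].strip()
    if tail:
        blocks.append(("text", tail))
    return blocks


def block_sequence_feature(blocks: List[Tuple[str, str]]) -> str:
    """Encode the order of block types, e.g. 'mention>text>hashtag', as one structural feature."""
    return ">".join(kind for kind, _ in blocks)


blocks = split_into_blocks("@alice loving the talk on #NLP http://example.com so good")
print(block_sequence_feature(blocks))  # prints mention>text>hashtag>link>text
```

The paper combines many block-derived features; a single type-sequence string is only the simplest possible example.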
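Result 19 (Doval and Gómez-Rodríguez) describes segmentation as a beam search over a character- or byte-level language model. The following is a toy illustration of that idea, not the authors' system. It swaps in an add-one-smoothed character bigram model, and the names CharBigramLM and segment are hypothetical. For each input character, every hypothesis in the beam is extended in two ways, "continue the word" and "insert a space first", and only the beam_width best log-probability scores are kept.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class CharBigramLM:
    """Add-one-smoothed character bigram model; '^' marks the start of a text."""

    def __init__(self, corpus: List[str]) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for sentence in corpus:
            chars = "^" + sentence
            self.vocab.update(chars)
            for prev, cur in zip(chars, chars[1:]):
                self.counts[prev][cur] += 1

    def logprob(self, prev: str, cur: str) -> float:
        row = self.counts[prev]
        return math.log((row[cur] + 1) / (sum(row.values()) + len(self.vocab)))


def segment(text: str, lm: CharBigramLM, beam_width: int = 5) -> str:
    """Insert spaces into `text` by beam search over boundary decisions."""
    beam: List[Tuple[float, str]] = [(0.0, "^")]
    for ch in text:
        candidates = []
        for score, hyp in beam:
            # Option 1: continue the current word with ch
            candidates.append((score + lm.logprob(hyp[-1], ch), hyp + ch))
            # Option 2: place a word boundary before ch (never before the first character)
            if hyp != "^":
                s = score + lm.logprob(hyp[-1], " ") + lm.logprob(" ", ch)
                candidates.append((s, hyp + " " + ch))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][1][1:]  # drop the '^' start marker


lm = CharBigramLM(["ha ha ha"])
print(segment("hahaha", lm))  # prints ha ha ha
```

A bigram model is far too weak for real text; the n-gram and recurrent models in the paper condition on much longer histories, which is what makes boundary decisions reliable.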
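Result 20 (Wu, He, and Pei) measures impact with a PageRank score biased toward topics produced by a latent topic model. The sketch below illustrates only the general topic-biased (personalized) PageRank idea, not the TEF algorithm itself. The teleport vector is a hypothetical per-entity topic weight (for instance, how strongly a paper belongs to one topic), and entities without outgoing links redistribute their mass along that same vector. The graph and weights are invented for illustration.

```python
from typing import Dict, List


def biased_pagerank(
    links: Dict[str, List[str]],
    bias: Dict[str, float],
    damping: float = 0.85,
    max_iter: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Power iteration for PageRank whose random jumps follow `bias` instead of a uniform vector."""
    nodes = sorted(set(links) | {t for outs in links.values() for t in outs} | set(bias))
    total = sum(bias.get(n, 0.0) for n in nodes)  # assumed > 0
    teleport = {n: bias.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(max_iter):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        dangling = 0.0
        for n in nodes:
            outs = links.get(n, [])
            if outs:
                share = damping * rank[n] / len(outs)
                for t in outs:
                    new[t] += share
            else:
                # Entities with no outgoing links jump according to the topic bias
                dangling += damping * rank[n]
        for n in nodes:
            new[n] += dangling * teleport[n]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical citation graph (paper -> papers it cites) and topic weights
citations = {"p1": ["p3"], "p2": ["p3"], "p3": ["p4"], "p4": []}
topic_weight = {"p1": 0.1, "p2": 0.1, "p3": 0.4, "p4": 0.4}
scores = biased_pagerank(citations, topic_weight)
```

Running it once per topic, each with its own bias vector, gives one score per entity per topic, which is the "multiple granular views" the abstract mentions in spirit.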
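The explain trees in these results follow Lucene's ClassicSimilarity (TF-IDF) scoring. As a minimal sketch, the code below recomputes the score of the term "a" in doc 4738 (result 17) using the relations the trees display: tf = sqrt(freq), weight = queryWeight × fieldWeight, queryWeight = idf × queryNorm, and fieldWeight = tf × idf × fieldNorm. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is inferred because it reproduces the printed idf values. queryNorm and fieldNorm are copied from the output rather than derived, since they depend on the full query and on Lucene's lossy length-norm encoding.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Reproduces the printed idf values, e.g. idf(docFreq=37942, maxDocs=44218) = 1.153047
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_weight(freq: float, doc_freq: int, max_docs: int,
                        query_norm: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                      # tf(freq=20.0) = 4.472136
    query_weight = idf * query_norm           # queryWeight
    field_weight = tf * idf * field_norm      # fieldWeight
    return query_weight * field_weight        # weight(_text_:a in 4738)


# Values copied from the explain tree of doc 4738 (result 17)
term = classic_term_weight(20.0, 37942, 44218, query_norm=0.046056706, field_norm=0.0390625)
total = term * 0.5 * 0.5  # the two coord(1/2) factors
print(f"{term:.6f} {total:.6f}")  # prints 0.010697 0.002674
```

The last few digits differ slightly from the explain output because Lucene computes in 32-bit floats.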

Languages

  • e 95
  • d 26
  • el 1

Types

  • a 106
  • el 25
  • m 5
  • x 5
  • s 2