Search (37 results, page 1 of 2)

  • × theme_ss:"Computerlinguistik"
  • × year_i:[2010 TO 2020}
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.13
    0.12907062 = product of:
      0.25814125 = sum of:
        0.19076009 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.19076009 = score(doc=563,freq=2.0), product of:
            0.33941987 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04003532 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.056532677 = weight(_text_:web in 563) [ClassicSimilarity], result of:
          0.056532677 = score(doc=563,freq=8.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.43268442 = fieldWeight in 563, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.010848465 = product of:
          0.032545395 = sum of:
            0.032545395 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.032545395 = score(doc=563,freq=2.0), product of:
                0.14019686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04003532 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.33333334 = coord(1/3)
      0.5 = coord(3/6)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
    Content
    A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
    Date
    10. 1.2013 19:22:47
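
The score breakdown shown for result 1 can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are not stated on this page, although they are consistent with every idf and tf value listed. The function names are illustrative, and the constants are copied from the explain tree for doc 563.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    # Assumed ClassicSimilarity tf: square root of the term frequency
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 563
MAX_DOCS = 44218
QUERY_NORM = 0.04003532
FIELD_NORM = 0.046875

s_2f = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
s_web = term_score(8.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "22" clause sits in a nested query and is scaled by coord(1/3)
s_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 3)

# The outer query matched 3 of 6 clauses, hence coord(3/6)
total = (s_2f + s_web + s_22) * (3 / 6)
print(f"{total:.4f}")
```

Small differences in the last digits are to be expected, since the listed values appear to be single-precision floats while Python computes in double precision.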
  2. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.02
    0.022324583 = product of:
      0.066973746 = sum of:
        0.043418463 = weight(_text_:wide in 1338) [ClassicSimilarity], result of:
          0.043418463 = score(doc=1338,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.24476713 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.023555283 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
          0.023555283 = score(doc=1338,freq=2.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.18028519 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.33333334 = coord(2/6)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
  3. Stoykova, V.; Petkova, E.: Automatic extraction of mathematical terms for precalculus (2012) 0.02
    0.019505307 = product of:
      0.058515918 = sum of:
        0.045744486 = weight(_text_:world in 156) [ClassicSimilarity], result of:
          0.045744486 = score(doc=156,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.29726875 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
        0.012771431 = product of:
          0.038314294 = sum of:
            0.038314294 = weight(_text_:29 in 156) [ClassicSimilarity], result of:
              0.038314294 = score(doc=156,freq=2.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.27205724 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Content
    Contribution to: First World Conference on Innovation and Computer Sciences (INSODE 2011). Cf.: http://www.sciencedirect.com/science/article/pii/S221201731200103X.
    Date
    29. 5.2012 10:17:08
  4. Clark, M.; Kim, Y.; Kruschwitz, U.; Song, D.; Albakour, D.; Dignum, S.; Beresi, U.C.; Fasli, M.; Roeck, A De: Automatically structuring domain knowledge from text : an overview of current research (2012) 0.02
    0.018485319 = product of:
      0.055455957 = sum of:
        0.03997464 = weight(_text_:web in 2738) [ClassicSimilarity], result of:
          0.03997464 = score(doc=2738,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.3059541 = fieldWeight in 2738, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2738)
        0.015481315 = product of:
          0.046443943 = sum of:
            0.046443943 = weight(_text_:29 in 2738) [ClassicSimilarity], result of:
              0.046443943 = score(doc=2738,freq=4.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.3297832 = fieldWeight in 2738, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2738)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.
    Content
    Contribution to a special issue on "Soft Approaches to IA on the Web". Cf.: doi:10.1016/j.ipm.2011.07.002.
    Date
    29. 1.2016 18:29:51
  5. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.02
    0.018443786 = product of:
      0.055331357 = sum of:
        0.046208907 = weight(_text_:world in 3682) [ClassicSimilarity], result of:
          0.046208907 = score(doc=3682,freq=4.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.30028677 = fieldWeight in 3682, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
        0.009122452 = product of:
          0.027367353 = sum of:
            0.027367353 = weight(_text_:29 in 3682) [ClassicSimilarity], result of:
              0.027367353 = score(doc=3682,freq=2.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.19432661 = fieldWeight in 3682, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3682)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Date
    16.11.2017 14:00:29
  6. Babik, W.: Keywords as linguistic tools in information and knowledge organization (2017) 0.02
    0.01524961 = product of:
      0.04574883 = sum of:
        0.0329774 = weight(_text_:web in 3510) [ClassicSimilarity], result of:
          0.0329774 = score(doc=3510,freq=2.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.25239927 = fieldWeight in 3510, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3510)
        0.012771431 = product of:
          0.038314294 = sum of:
            0.038314294 = weight(_text_:29 in 3510) [ClassicSimilarity], result of:
              0.038314294 = score(doc=3510,freq=2.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.27205724 = fieldWeight in 3510, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3510)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Ed. by W. Babik, H.P. Ohly and K. Weber
  7. Fóris, A.: Network theory and terminology (2013) 0.01
    0.013905007 = product of:
      0.041715022 = sum of:
        0.032674633 = weight(_text_:world in 1365) [ClassicSimilarity], result of:
          0.032674633 = score(doc=1365,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.21233483 = fieldWeight in 1365, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1365)
        0.009040388 = product of:
          0.027121164 = sum of:
            0.027121164 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
              0.027121164 = score(doc=1365,freq=2.0), product of:
                0.14019686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04003532 = queryNorm
                0.19345059 = fieldWeight in 1365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    The paper aims to present the relations of network theory and terminology. The model of scale-free networks, which has been recently developed and widely applied since, can be effectively used in terminology research as well. Operation based on the principle of networks is a universal characteristic of complex systems. Networks are governed by general laws. The model of scale-free networks can be viewed as a statistical-probability model, and it can be described with mathematical tools. Its main feature is that "everything is connected to everything else," that is, every node is reachable (in a few steps) starting from any other node; this phenomenon is called "the small world phenomenon." The existence of a linguistic network and the general laws of the operation of networks enable us to place issues of language use in the complex system of relations that reveal the deeper connections between phenomena with the help of networks embedded in each other. The realization of the metaphor that language also has a network structure is the basis of the classification methods of the terminological system, and likewise of the ways of creating terminology databases, which serve the purpose of providing easy and versatile accessibility to specialised knowledge.
    Date
    2. 9.2014 21:22:48
  8. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.01
    0.013599649 = product of:
      0.081597894 = sum of:
        0.081597894 = weight(_text_:web in 2027) [ClassicSimilarity], result of:
          0.081597894 = score(doc=2027,freq=6.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.6245262 = fieldWeight in 2027, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
      0.16666667 = coord(1/6)
    
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; vol. 9088
    Source
    The Semantic Web: latest advances and new domains. 12th European Semantic Web Conference, ESWC 2015 Portoroz, Slovenia, May 31 -- June 4, 2015. Proceedings. Eds.: F. Gandon et al.
  9. Vasalou, A.; Gill, A.J.; Mazanderani, F.; Papoutsi, C.; Joinson, A.: Privacy dictionary : a new resource for the automated content analysis of privacy (2011) 0.01
    0.008683693 = product of:
      0.052102152 = sum of:
        0.052102152 = weight(_text_:wide in 4915) [ClassicSimilarity], result of:
          0.052102152 = score(doc=4915,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.29372054 = fieldWeight in 4915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4915)
      0.16666667 = coord(1/6)
    
    Abstract
    This article presents the privacy dictionary, a new linguistic resource for automated content analysis on privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
  10. Korman, D.Z.; Mack, E.; Jett, J.; Renear, A.H.: Defining textual entailment (2018) 0.01
    0.008683693 = product of:
      0.052102152 = sum of:
        0.052102152 = weight(_text_:wide in 4284) [ClassicSimilarity], result of:
          0.052102152 = score(doc=4284,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.29372054 = fieldWeight in 4284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
      0.16666667 = coord(1/6)
    
    Abstract
    Textual entailment is a relationship that obtains between fragments of text when one fragment in some sense implies the other fragment. The automation of textual entailment recognition supports a wide variety of text-based tasks, including information retrieval, information extraction, question answering, text summarization, and machine translation. Much ingenuity has been devoted to developing algorithms for identifying textual entailments, but relatively little to saying what textual entailment actually is. This article is a review of the logical and philosophical issues involved in providing an adequate definition of textual entailment. We show that many natural definitions of textual entailment are refuted by counterexamples, including the most widely cited definition of Dagan et al. We then articulate and defend the following revised definition: T textually entails H =df typically, a human reading T would be justified in inferring the proposition expressed by H from the proposition expressed by T. We also show that textual entailment is context-sensitive, nontransitive, and nonmonotonic.
  11. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.01
    0.007851761 = product of:
      0.047110565 = sum of:
        0.047110565 = weight(_text_:web in 2861) [ClassicSimilarity], result of:
          0.047110565 = score(doc=2861,freq=8.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.36057037 = fieldWeight in 2861, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
      0.16666667 = coord(1/6)
    
    Abstract
    Today's conventional search engines hardly provide the essential content relevant to the user's search query. This is because the context and semantics of the request made by the user are not analyzed to the full extent. Hence the need for semantic web search (SWS) arises. SWS is an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as a knowledge base for the information retrieval process. It is not a mere keyword search. It is one layer above what Google or any other search engine retrieves by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves the web results more relevant to the user query through keyword expansion. The results obtained here will be accurate enough to satisfy the request made by the user. The level of accuracy will be enhanced since the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized for providing the relevant links. For ranking, an algorithm has been applied which fetches more apt results for the user query.
  12. Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.01
    0.007851761 = product of:
      0.047110565 = sum of:
        0.047110565 = weight(_text_:web in 3488) [ClassicSimilarity], result of:
          0.047110565 = score(doc=3488,freq=8.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.36057037 = fieldWeight in 3488, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3488)
      0.16666667 = coord(1/6)
    
    Abstract
    There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is represented by the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in this field. Therefore, a more comfortable way of querying would be of great value. One direction could be to allow the user to use natural language. To make this task easier we have proposed a method for translating a natural language query into a SPARQL query. It is based on sentence structure, utilizing dependencies between the words in user queries. Dependencies are used to map the query to the semantic web structure, which is in the next step translated into a SPARQL query. According to our first experiments we are able to answer a significant group of user queries.
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; 10151
  13. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.01
    0.007772847 = product of:
      0.04663708 = sum of:
        0.04663708 = weight(_text_:web in 4733) [ClassicSimilarity], result of:
          0.04663708 = score(doc=4733,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.35694647 = fieldWeight in 4733, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4733)
      0.16666667 = coord(1/6)
    
    Abstract
    Ontologies are often viewed as the answer to the need for inter-operable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web coupled with the increasing demand for ontologies to power the Semantic Web has made (semi-)automatic ontology learning from text a very promising research area. This, together with the advanced state in related areas such as natural language processing, has fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium, and discusses the remaining challenges that will define the research directions in this area in the near future.
  14. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.01
    0.007236411 = product of:
      0.043418463 = sum of:
        0.043418463 = weight(_text_:wide in 3426) [ClassicSimilarity], result of:
          0.043418463 = score(doc=3426,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.24476713 = fieldWeight in 3426, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
      0.16666667 = coord(1/6)
    
    Abstract
    Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other important fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two similarity measures are used, the dissimilarity measure called the Manhattan distance, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qu'ran and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qu'ran contains most of the ancient Arabic words while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to the original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
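
The two measures named in this abstract can be illustrated on character bigrams. The sketch below is only an illustration under stated assumptions, not the authors' root-extraction algorithm: it uses the common set-based Dice coefficient 2|A ∩ B| / (|A| + |B|) and a Manhattan distance over bigram counts, and the Latin-script example words are hypothetical stand-ins rather than Arabic data from the study.

```python
from collections import Counter

def bigrams(word: str) -> list[str]:
    # Overlapping character bigrams, e.g. "night" -> ["ni", "ig", "gh", "ht"]
    return [word[i:i + 2] for i in range(len(word) - 1)]

def dice(a: str, b: str) -> float:
    # Dice similarity on the sets of unique bigrams (1.0 = identical sets)
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def manhattan(a: str, b: str) -> int:
    # Manhattan (L1) distance between bigram count vectors (0 = identical)
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    return sum(abs(ca[k] - cb[k]) for k in set(ca) | set(cb))

print(dice("night", "nacht"))       # shares only the bigram "ht"
print(manhattan("night", "nacht"))  # counts the bigrams not shared
```

Dice is a similarity (higher means closer) and Manhattan a dissimilarity (lower means closer), so a candidate root would be chosen by maximizing the former or minimizing the latter.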
  15. Muresan, S.; Klavans, J.L.: Inducing terminologies from text : a case study for the consumer health domain (2013) 0.01
    0.0066624405 = product of:
      0.03997464 = sum of:
        0.03997464 = weight(_text_:web in 682) [ClassicSimilarity], result of:
          0.03997464 = score(doc=682,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.3059541 = fieldWeight in 682, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=682)
      0.16666667 = coord(1/6)
    
    Abstract
    Specialized medical ontologies and terminologies, such as SNOMED CT and the Unified Medical Language System (UMLS), have been successfully leveraged in medical information systems to provide a standard web-accessible medium for interoperability, access, and reuse. However, these clinically oriented terminologies and ontologies cannot provide sufficient support when integrated into consumer-oriented applications, because these applications must "understand" both technical and lay vocabulary. The latter is not part of these specialized terminologies and ontologies. In this article, we propose a two-step approach for building consumer health terminologies from text: 1) automatic extraction of definitions from consumer-oriented articles and web documents, which reflects language in use, rather than relying solely on dictionaries, and 2) learning to map definitions expressed in natural language to terminological knowledge by inducing a syntactic-semantic grammar rather than using hand-written patterns or grammars. We present quantitative and qualitative evaluations of our two-step approach, which show that our framework could be used to induce consumer health terminologies from text.
  16. Perovsek, M.; Kranjc, J.; Erjavec, T.; Cestnik, B.; Lavrac, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.01
    0.0066624405 = product of:
      0.03997464 = sum of:
        0.03997464 = weight(_text_:web in 2697) [ClassicSimilarity], result of:
          0.03997464 = score(doc=2697,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.3059541 = fieldWeight in 2697, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2697)
      0.16666667 = coord(1/6)
    
    Abstract
    Text mining and natural language processing are fast growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
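The score breakdown listed with each result follows the TF-IDF arithmetic of Lucene's ClassicSimilarity. The sketch below recomputes the figures shown for result 16 from the values in its explain tree. It assumes the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical and are not part of Lucene.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float,
                       coord: float = 1.0) -> float:
    """Recompute one term's contribution as laid out in the explain tree."""
    tf = math.sqrt(freq)                   # tf(freq=4.0) = 2.0
    idf = classic_idf(doc_freq, max_docs)  # about 3.2635 for docFreq=4597
    query_weight = idf * query_norm        # queryWeight line
    field_weight = tf * idf * field_norm   # fieldWeight line
    return query_weight * field_weight * coord


# Values copied from the explain tree of result 16 (term "web", doc 2697).
score = classic_term_score(freq=4.0, doc_freq=4597, max_docs=44218,
                           query_norm=0.04003532, field_norm=0.046875,
                           coord=1 / 6)
print(score)  # should be close to the 0.0066624405 shown in the listing
```

Small differences in the last digits are expected, because Lucene computes these values in single precision.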
  17. Lu, C.; Bu, Y.; Wang, J.; Ding, Y.; Torvik, V.; Schnaars, M.; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model (2019) 0.01
    0.0065349266 = product of:
      0.03920956 = sum of:
        0.03920956 = weight(_text_:world in 5219) [ClassicSimilarity], result of:
          0.03920956 = score(doc=5219,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.25480178 = fieldWeight in 5219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.046875 = fieldNorm(doc=5219)
      0.16666667 = coord(1/6)
    
    Abstract
    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (a) syntactic complexity, including measurements of sentence length and sentence complexity; and (b) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity.
  18. Karlova-Bourbonus, N.: Automatic detection of contradictions in texts (2018) 0.01
    0.0065349266 = product of:
      0.03920956 = sum of:
        0.03920956 = weight(_text_:world in 5976) [ClassicSimilarity], result of:
          0.03920956 = score(doc=5976,freq=8.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.25480178 = fieldWeight in 5976, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0234375 = fieldNorm(doc=5976)
      0.16666667 = coord(1/6)
    
    Abstract
    Natural language contradictions are of complex nature. As will be shown in Chapter 5, the realization of contradictions is not limited to the examples such as Socrates is a man and Socrates is not a man (under the condition that Socrates refers to the same object in the real world), which is discussed by Aristotle (Section 3.1.1). Empirical evidence (see Chapter 5 for more details) shows that only a few contradictions occurring in the real life are of that explicit (prototypical) kind. Rather, contradictions make use of a variety of natural language devices such as, e.g., paraphrasing, synonyms and antonyms, passive and active voice, diversity of negation expression, and figurative linguistic means such as idioms, irony, and metaphors. Additionally, the most sophisticated kind of contradictions, the so-called implicit contradictions, can be found only when applying world knowledge and after conducting a sequence of logical operations such as e.g. in: (1.1) The first prize was given to the experienced grandmaster L. Stein who, in total, collected ten points (7 wins and 3 draws). Those familiar with the chess rules know that a chess player gets one point for winning and zero points for losing the game. In case of a draw, each player gets a half point. Built on this idea and by conducting some simple mathematical operations, we can infer that in the case of 7 wins and 3 draws (the second part of the sentence), a player can only collect 8.5 points and not 10 points. Hence, we observe that there is a contradiction between the first and the second parts of the sentence.
    Implicit contradictions will only partially be the subject of the present study, aiming primarily at identifying the realization mechanism and cues (Chapter 5) as well as finding the parts of contradictions by applying the state of the art algorithms for natural language processing without conducting deep meaning processing. Further in focus are the explicit and implicit contradictions that can be detected by means of explicit linguistic, structural, lexical cues, and by conducting some additional processing operations (e.g., counting the sum in order to detect contradictions arising from numerical divergencies). One should note that an additional complexity in finding contradictions can arise in case parts of the contradictions occur on different levels of realization. Thus, a contradiction can be observed on the word- and phrase-level, such as in a married bachelor (for variations of contradictions on lexical level, see Ganeev 2004), on the sentence level - between parts of a sentence or between two or more sentences, or on the text level - between the portions of a text or between the whole texts such as a contradiction between the Bible and the Quran, for example. Only contradictions arising at the level of single sentences occurring in one or more texts, as well as parts of a sentence, will be considered for the purpose of this study. Though the focus of interest will be on single sentences, it will make use of text particularities such as coreference resolution without establishing the referents in the real world. Finally, another aspect to be considered is that parts of the contradictions are not necessarily to appear at the same time. They can be separated by many years and centuries with or without time expression making their recognition by human and detection by machine challenging. 
According to Aristotle's ontological version of the LNC (Section 3.1.1), however, the same time reference is required in order for two statements to be judged as a contradiction. Taking this into account, we set the borders for the study by limiting the analyzed textual data thematically (only nine world events) and temporally (three days after the reported event had happened) (Section 5.1). No sophisticated time processing will thus be conducted.
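The chess example in the abstract above (7 wins and 3 draws against a claimed total of ten points) can be checked with a few lines of arithmetic. This is only an illustrative sketch of the "counting the sum" idea under the scoring rules stated in the abstract; the function names are hypothetical and do not come from the study.

```python
from fractions import Fraction

# Points per game result, as stated in the abstract.
WIN, DRAW, LOSS = Fraction(1), Fraction(1, 2), Fraction(0)


def expected_points(wins: int, draws: int, losses: int = 0) -> Fraction:
    """Total points implied by the reported game results."""
    return wins * WIN + draws * DRAW + losses * LOSS


def is_numeric_contradiction(claimed_points, wins: int, draws: int,
                             losses: int = 0) -> bool:
    """True when the claimed total disagrees with the implied total."""
    return Fraction(claimed_points) != expected_points(wins, draws, losses)


total = expected_points(wins=7, draws=3)
print(float(total))                                   # 8.5
print(is_numeric_contradiction(10, wins=7, draws=3))  # True
```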
  19. Sünkler, S.; Kerkmann, F.; Schultheiß, S.: Ok Google . the end of search as we know it : sprachgesteuerte Websuche im Test (2018) 0.01
    0.0054962332 = product of:
      0.0329774 = sum of:
        0.0329774 = weight(_text_:web in 5626) [ClassicSimilarity], result of:
          0.0329774 = score(doc=5626,freq=2.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.25239927 = fieldWeight in 5626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5626)
      0.16666667 = coord(1/6)
    
    Abstract
    Voice control systems that assist users on spoken command are becoming increasingly popular with the spread of smartphones and speaker systems such as Amazon Echo or Google Home. One of the central applications is searching in web search engines. But how does "googling" work when users speak their query instead of typing it? A project team at HAW Hamburg investigated this question and, on behalf of Deutsche Telekom, examined how effectively, efficiently, and satisfactorily Google Now, Apple Siri, Microsoft Cortana, and Amazon Fire OS perform. The study identified the strengths and weaknesses of the systems as well as success criteria for high usability. These findings fed into the prototype of an optimal Voice Web Search.
  20. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.01
    0.00543986 = product of:
      0.032639157 = sum of:
        0.032639157 = weight(_text_:web in 337) [ClassicSimilarity], result of:
          0.032639157 = score(doc=337,freq=6.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.24981049 = fieldWeight in 337, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
      0.16666667 = coord(1/6)
    
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. 
For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. See also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.

Languages

  • e 31
  • d 6

Types

  • a 28
  • el 9
  • x 2
  • m 1

Classifications