Search (31 results, page 1 of 2)

  • language_ss:"e"
  • theme_ss:"Computerlinguistik"
  • year_i:[2010 TO 2020}
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.13
    0.12907062 = product of:
      0.25814125 = sum of:
        0.19076009 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.19076009 = score(doc=563,freq=2.0), product of:
            0.33941987 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.04003532 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.056532677 = weight(_text_:web in 563) [ClassicSimilarity], result of:
          0.056532677 = score(doc=563,freq=8.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.43268442 = fieldWeight in 563, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.010848465 = product of:
          0.032545395 = sum of:
            0.032545395 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.032545395 = score(doc=563,freq=2.0), product of:
                0.14019686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04003532 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.33333334 = coord(1/3)
      0.5 = coord(3/6)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
    Content
    A thesis presented to The University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
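    The relevance value attached to each hit above is raw Lucene/Solr "explain" output from the ClassicSimilarity (TF-IDF) ranking model: each matched query clause contributes tf · idf² · queryNorm · fieldNorm, and the sum is scaled by a coordination factor for the fraction of query clauses that matched. The following minimal Python sketch recomputes the score of result 1 from the factors printed in its breakdown; the values are copied verbatim from that breakdown, and the helper name is ours, not part of Solr.

      from math import sqrt

      def term_score(freq, idf, query_norm, field_norm, boost=1.0):
          """One clause of a Lucene ClassicSimilarity score:
          queryWeight = idf * boost * queryNorm
          fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
          query_weight = idf * boost * query_norm
          field_weight = sqrt(freq) * idf * field_norm
          return query_weight * field_weight

      # Factors copied from the breakdown of result 1 (doc 563).
      query_norm, field_norm = 0.04003532, 0.046875
      clauses = [
          term_score(2.0, 8.478011, query_norm, field_norm),            # _text_:2f
          term_score(8.0, 3.2635105, query_norm, field_norm),           # _text_:web
          term_score(2.0, 3.5018296, query_norm, field_norm) * (1 / 3), # _text_:22, inner coord(1/3)
      ]
      score = sum(clauses) * 3 / 6  # outer coord(3/6): 3 of 6 query clauses matched
      print(round(score, 8))        # ~0.12907062, matching the displayed score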
  2. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.02
    0.022324583 = product of:
      0.066973746 = sum of:
        0.043418463 = weight(_text_:wide in 1338) [ClassicSimilarity], result of:
          0.043418463 = score(doc=1338,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.24476713 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.023555283 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
          0.023555283 = score(doc=1338,freq=2.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.18028519 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.33333334 = coord(2/6)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
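    The abstract above distinguishes syntagmatic associations (terms that co-occur more often than chance) from paradigmatic ones (terms that substitute for each other). As a rough illustration of the syntagmatic side only, the sketch below ranks candidate expansion terms by windowed co-occurrence with the query; it is not the authors' formal corpus-based model, and the corpus, window size and function names are assumptions made for the example.

      from collections import Counter

      def cooccurrence_counts(docs, window=5):
          """Count how often term pairs co-occur within a small window -- a crude
          stand-in for the syntagmatic associations described in the abstract."""
          pairs = Counter()
          for doc in docs:
              tokens = doc.lower().split()
              for i, w in enumerate(tokens):
                  for v in tokens[i + 1:i + 1 + window]:
                      if w != v:
                          pairs[tuple(sorted((w, v)))] += 1
          return pairs

      def expand_query(query_terms, pairs, k=3):
          """Rank candidate expansion terms by summed co-occurrence with the query."""
          scores = Counter()
          for (a, b), c in pairs.items():
              if a in query_terms and b not in query_terms:
                  scores[b] += c
              elif b in query_terms and a not in query_terms:
                  scores[a] += c
          return [t for t, _ in scores.most_common(k)]

      corpus = ["automatic query expansion reformulates the original query",
                "query expansion can improve retrieval effectiveness"]
      print(expand_query({"query", "expansion"}, cooccurrence_counts(corpus)))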
  3. Stoykova, V.; Petkova, E.: Automatic extraction of mathematical terms for precalculus (2012) 0.02
    0.019505307 = product of:
      0.058515918 = sum of:
        0.045744486 = weight(_text_:world in 156) [ClassicSimilarity], result of:
          0.045744486 = score(doc=156,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.29726875 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
        0.012771431 = product of:
          0.038314294 = sum of:
            0.038314294 = weight(_text_:29 in 156) [ClassicSimilarity], result of:
              0.038314294 = score(doc=156,freq=2.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.27205724 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Content
    Contribution to: First World Conference on Innovation and Computer Sciences (INSODE 2011). Cf.: http://www.sciencedirect.com/science/article/pii/S221201731200103X.
    Date
    29. 5.2012 10:17:08
  4. Clark, M.; Kim, Y.; Kruschwitz, U.; Song, D.; Albakour, D.; Dignum, S.; Beresi, U.C.; Fasli, M.; De Roeck, A.: Automatically structuring domain knowledge from text : an overview of current research (2012) 0.02
    0.018485319 = product of:
      0.055455957 = sum of:
        0.03997464 = weight(_text_:web in 2738) [ClassicSimilarity], result of:
          0.03997464 = score(doc=2738,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.3059541 = fieldWeight in 2738, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2738)
        0.015481315 = product of:
          0.046443943 = sum of:
            0.046443943 = weight(_text_:29 in 2738) [ClassicSimilarity], result of:
              0.046443943 = score(doc=2738,freq=4.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.3297832 = fieldWeight in 2738, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2738)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.
    Content
    Contribution to a special issue "Soft Approaches to IA on the Web". Cf.: doi:10.1016/j.ipm.2011.07.002.
    Date
    29. 1.2016 18:29:51
  5. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.02
    0.018443786 = product of:
      0.055331357 = sum of:
        0.046208907 = weight(_text_:world in 3682) [ClassicSimilarity], result of:
          0.046208907 = score(doc=3682,freq=4.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.30028677 = fieldWeight in 3682, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
        0.009122452 = product of:
          0.027367353 = sum of:
            0.027367353 = weight(_text_:29 in 3682) [ClassicSimilarity], result of:
              0.027367353 = score(doc=3682,freq=2.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.19432661 = fieldWeight in 3682, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3682)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Date
    16.11.2017 14:00:29
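    The workflow above combines keyword-in-context (KWIC) views and frequency information with topic modeling as entry points for human analysis. As a small illustration of the KWIC step only (not the authors' pipeline; tokenization and window size are our own simplifications), a minimal extractor might look like this:

      import re

      def kwic(text, keyword, width=4):
          """Return keyword-in-context lines: `width` tokens either side of each hit."""
          tokens = re.findall(r"\w+", text.lower())
          hits = []
          for i, tok in enumerate(tokens):
              if tok == keyword.lower():
                  left = " ".join(tokens[max(0, i - width):i])
                  right = " ".join(tokens[i + 1:i + 1 + width])
                  hits.append(f"{left} [{tok}] {right}")
          return hits

      sample = ("Impact case studies describe the impact of research "
                "beyond academia; each impact claim is backed by evidence.")
      for line in kwic(sample, "impact"):
          print(line)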
  6. Babik, W.: Keywords as linguistic tools in information and knowledge organization (2017) 0.02
    0.01524961 = product of:
      0.04574883 = sum of:
        0.0329774 = weight(_text_:web in 3510) [ClassicSimilarity], result of:
          0.0329774 = score(doc=3510,freq=2.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.25239927 = fieldWeight in 3510, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3510)
        0.012771431 = product of:
          0.038314294 = sum of:
            0.038314294 = weight(_text_:29 in 3510) [ClassicSimilarity], result of:
              0.038314294 = score(doc=3510,freq=2.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.27205724 = fieldWeight in 3510, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3510)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Ed. by W. Babik, H.P. Ohly and K. Weber
  7. Fóris, A.: Network theory and terminology (2013) 0.01
    0.013905007 = product of:
      0.041715022 = sum of:
        0.032674633 = weight(_text_:world in 1365) [ClassicSimilarity], result of:
          0.032674633 = score(doc=1365,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.21233483 = fieldWeight in 1365, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1365)
        0.009040388 = product of:
          0.027121164 = sum of:
            0.027121164 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
              0.027121164 = score(doc=1365,freq=2.0), product of:
                0.14019686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04003532 = queryNorm
                0.19345059 = fieldWeight in 1365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    The paper aims to present the relations of network theory and terminology. The model of scale-free networks, which has been recently developed and widely applied since, can be effectively used in terminology research as well. Operation based on the principle of networks is a universal characteristic of complex systems. Networks are governed by general laws. The model of scale-free networks can be viewed as a statistical-probability model, and it can be described with mathematical tools. Its main feature is that "everything is connected to everything else," that is, every node is reachable (in a few steps) starting from any other node; this phenomenon is called "the small world phenomenon." The existence of a linguistic network and the general laws of the operation of networks enable us to place issues of language use in the complex system of relations that reveal the deeper connections between phenomena with the help of networks embedded in each other. The realization of the metaphor that language also has a network structure is the basis of the classification methods of the terminological system, and likewise of the ways of creating terminology databases, which serve the purpose of providing easy and versatile accessibility to specialised knowledge.
    Date
    2. 9.2014 21:22:48
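    The scale-free model and the small-world property described in the abstract above can be illustrated empirically with a preferential-attachment generator. The sketch below uses networkx's Barabási-Albert model; it demonstrates the network model itself, not the paper's terminological application, and the parameter values are arbitrary.

      import networkx as nx

      # Grow a scale-free network by preferential attachment (Barabasi-Albert model):
      # each new node attaches to m existing nodes with probability proportional
      # to their degree, producing a heavy-tailed degree distribution with hubs.
      G = nx.barabasi_albert_graph(n=2000, m=2, seed=42)

      avg_path = nx.average_shortest_path_length(G)  # "small world": grows roughly with log(n)
      max_degree = max(dict(G.degree()).values())    # a few highly connected hub nodes

      print(f"nodes={G.number_of_nodes()} avg shortest path={avg_path:.2f} "
            f"max degree={max_degree}")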
  8. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.01
    0.013599649 = product of:
      0.081597894 = sum of:
        0.081597894 = weight(_text_:web in 2027) [ClassicSimilarity], result of:
          0.081597894 = score(doc=2027,freq=6.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.6245262 = fieldWeight in 2027, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
      0.16666667 = coord(1/6)
    
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; vol. 9088
    Source
    The Semantic Web: latest advances and new domains. 12th European Semantic Web Conference, ESWC 2015 Portoroz, Slovenia, May 31 -- June 4, 2015. Proceedings. Eds.: F. Gandon et al.
  9. Vasalou, A.; Gill, A.J.; Mazanderani, F.; Papoutsi, C.; Joinson, A.: Privacy dictionary : a new resource for the automated content analysis of privacy (2011) 0.01
    0.008683693 = product of:
      0.052102152 = sum of:
        0.052102152 = weight(_text_:wide in 4915) [ClassicSimilarity], result of:
          0.052102152 = score(doc=4915,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.29372054 = fieldWeight in 4915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4915)
      0.16666667 = coord(1/6)
    
    Abstract
    This article presents the privacy dictionary, a new linguistic resource for automated content analysis on privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
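    The privacy dictionary supports LIWC-style automated content analysis: texts are scored by how often words from validated categories occur. The sketch below shows only that generic counting step; the category names and word lists are invented placeholders, not the eight published dictionary categories.

      import re
      from collections import Counter

      # Hypothetical toy categories; the real privacy dictionary has eight
      # empirically validated categories with much larger word lists.
      DICTIONARY = {
          "restriction": {"private", "confidential", "secret", "hidden"},
          "openness": {"share", "public", "disclose", "post"},
      }

      def category_frequencies(text):
          """Relative frequency of each dictionary category in a text."""
          tokens = re.findall(r"[a-z']+", text.lower())
          counts = Counter()
          for tok in tokens:
              for cat, words in DICTIONARY.items():
                  if tok in words:
                      counts[cat] += 1
          total = len(tokens) or 1
          return {cat: counts[cat] / total for cat in DICTIONARY}

      print(category_frequencies("I keep my diary private and never share or post it."))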
  10. Korman, D.Z.; Mack, E.; Jett, J.; Renear, A.H.: Defining textual entailment (2018) 0.01
    0.008683693 = product of:
      0.052102152 = sum of:
        0.052102152 = weight(_text_:wide in 4284) [ClassicSimilarity], result of:
          0.052102152 = score(doc=4284,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.29372054 = fieldWeight in 4284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
      0.16666667 = coord(1/6)
    
    Abstract
    Textual entailment is a relationship that obtains between fragments of text when one fragment in some sense implies the other fragment. The automation of textual entailment recognition supports a wide variety of text-based tasks, including information retrieval, information extraction, question answering, text summarization, and machine translation. Much ingenuity has been devoted to developing algorithms for identifying textual entailments, but relatively little to saying what textual entailment actually is. This article is a review of the logical and philosophical issues involved in providing an adequate definition of textual entailment. We show that many natural definitions of textual entailment are refuted by counterexamples, including the most widely cited definition of Dagan et al. We then articulate and defend the following revised definition: T textually entails H =df typically, a human reading T would be justified in inferring the proposition expressed by H from the proposition expressed by T. We also show that textual entailment is context-sensitive, nontransitive, and nonmonotonic.
  11. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.01
    0.007851761 = product of:
      0.047110565 = sum of:
        0.047110565 = weight(_text_:web in 2861) [ClassicSimilarity], result of:
          0.047110565 = score(doc=2861,freq=8.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.36057037 = fieldWeight in 2861, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
      0.16666667 = coord(1/6)
    
    Abstract
    Today's conventional search engines hardly provide content that is truly relevant to the user's search query, because the context and semantics of the user's request are not analyzed to the full extent. Hence the need for semantic web search arises. Semantic web search (SWS) is an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as a knowledge base for the information retrieval process. It is not a mere keyword search; it operates one layer above what Google or other search engines retrieve by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion, and the level of accuracy is enhanced because the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links; for ranking, an algorithm is applied that fetches more apt results for the user query.
  12. Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.01
    0.007851761 = product of:
      0.047110565 = sum of:
        0.047110565 = weight(_text_:web in 3488) [ClassicSimilarity], result of:
          0.047110565 = score(doc=3488,freq=8.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.36057037 = fieldWeight in 3488, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3488)
      0.16666667 = coord(1/6)
    
    Abstract
    There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in the field. Therefore, a more convenient way of querying would be of great value. One direction could be to allow the user to use natural language. To make this task easier we have proposed a method for translating a natural language query into a SPARQL query. It is based on sentence structure, utilizing dependencies between the words in user queries. The dependencies are used to map the query to the semantic web structure, which is in the next step translated into a SPARQL query. According to our first experiments we are able to answer a significant group of user queries.
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; 10151
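    The abstract above describes mapping dependency-parsed questions onto SPARQL. As a highly simplified illustration of the final mapping step only, the sketch below uses a single hard-coded question pattern instead of a parser; the DBpedia predicate, prefixes and pattern are assumptions for the example and do not reproduce the authors' system.

      import re

      # One hand-written pattern standing in for the dependency-based analysis
      # ("Who wrote <work>?" is mapped to a subject lookup via dbo:author).
      PATTERN = re.compile(r"who wrote (?P<work>.+)\?", re.IGNORECASE)

      def to_sparql(question):
          match = PATTERN.match(question.strip())
          if not match:
              raise ValueError("unsupported question shape in this toy sketch")
          work = match.group("work").strip().replace(" ", "_")
          return (
              "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
              "PREFIX dbr: <http://dbpedia.org/resource/>\n"
              "SELECT ?author WHERE { dbr:%s dbo:author ?author }" % work
          )

      print(to_sparql("Who wrote The Hobbit?"))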
  13. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.01
    0.007772847 = product of:
      0.04663708 = sum of:
        0.04663708 = weight(_text_:web in 4733) [ClassicSimilarity], result of:
          0.04663708 = score(doc=4733,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.35694647 = fieldWeight in 4733, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4733)
      0.16666667 = coord(1/6)
    
    Abstract
    Ontologies are often viewed as the answer to the need for inter-operable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web coupled with the increasing demand for ontologies to power the Semantic Web have made (semi-)automatic ontology learning from text a very promising research area. This together with the advanced state in related areas such as natural language processing have fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium, and discusses the remaining challenges that will define the research directions in this area in the near future.
  14. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: ¬A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.01
    0.007236411 = product of:
      0.043418463 = sum of:
        0.043418463 = weight(_text_:wide in 3426) [ClassicSimilarity], result of:
          0.043418463 = score(doc=3426,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.24476713 = fieldWeight in 3426, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
      0.16666667 = coord(1/6)
    
    Abstract
    Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other important fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed-length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two similarity measures are used: the dissimilarity measure called the Manhattan distance, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qur'an and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qur'an contains most of the ancient Arabic words while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to the original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
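    The core comparison in the approach above is between the character bigrams of a word and those of candidate roots. The sketch below shows the Dice side of that comparison on transliterated Latin-script placeholders (rendering Arabic here would distract); the toy root list and the choice of "best match" selection are illustrative only, not the published algorithm.

      def bigrams(word):
          """Character bigrams of a word, e.g. 'ktb' -> {'kt', 'tb'}."""
          return {word[i:i + 2] for i in range(len(word) - 1)}

      def dice(a, b):
          """Dice's coefficient between two bigram sets: 2*|A & B| / (|A| + |B|)."""
          x, y = bigrams(a), bigrams(b)
          if not x and not y:
              return 0.0
          return 2 * len(x & y) / (len(x) + len(y))

      def best_root(word, roots):
          """Pick the candidate root whose bigrams overlap most with the word."""
          return max(roots, key=lambda r: dice(word, r))

      # Transliterated toy example: 'maktab' (office) should map to the root 'ktb'.
      print(best_root("maktab", ["ktb", "drs", "slm"]))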
  15. Muresan, S.; Klavans, J.L.: Inducing terminologies from text : a case study for the consumer health domain (2013) 0.01
    0.0066624405 = product of:
      0.03997464 = sum of:
        0.03997464 = weight(_text_:web in 682) [ClassicSimilarity], result of:
          0.03997464 = score(doc=682,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.3059541 = fieldWeight in 682, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=682)
      0.16666667 = coord(1/6)
    
    Abstract
    Specialized medical ontologies and terminologies, such as SNOMED CT and the Unified Medical Language System (UMLS), have been successfully leveraged in medical information systems to provide a standard web-accessible medium for interoperability, access, and reuse. However, these clinically oriented terminologies and ontologies cannot provide sufficient support when integrated into consumer-oriented applications, because these applications must "understand" both technical and lay vocabulary. The latter is not part of these specialized terminologies and ontologies. In this article, we propose a two-step approach for building consumer health terminologies from text: 1) automatic extraction of definitions from consumer-oriented articles and web documents, which reflects language in use, rather than relying solely on dictionaries, and 2) learning to map definitions expressed in natural language to terminological knowledge by inducing a syntactic-semantic grammar rather than using hand-written patterns or grammars. We present quantitative and qualitative evaluations of our two-step approach, which show that our framework could be used to induce consumer health terminologies from text.
  16. Perovsek, M.; Kranjca, J.; Erjaveca, T.; Cestnika, B.; Lavraca, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.01
    0.0066624405 = product of:
      0.03997464 = sum of:
        0.03997464 = weight(_text_:web in 2697) [ClassicSimilarity], result of:
          0.03997464 = score(doc=2697,freq=4.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.3059541 = fieldWeight in 2697, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2697)
      0.16666667 = coord(1/6)
    
    Abstract
    Text mining and natural language processing are fast growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
  17. Lu, C.; Bu, Y.; Wang, J.; Ding, Y.; Torvik, V.; Schnaars, M.; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model (2019) 0.01
    0.0065349266 = product of:
      0.03920956 = sum of:
        0.03920956 = weight(_text_:world in 5219) [ClassicSimilarity], result of:
          0.03920956 = score(doc=5219,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.25480178 = fieldWeight in 5219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.046875 = fieldNorm(doc=5219)
      0.16666667 = coord(1/6)
    
    Abstract
    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (a) syntactic complexity, including measurements of sentence length and sentence complexity; and (b) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity.
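    The study above quantifies syntactic complexity (e.g., sentence length) and lexical complexity (diversity, density, sophistication). The sketch below computes two of the simpler measures on a toy paragraph; the stopword set standing in for "function words" is a deliberate simplification, not the authors' instrumentation.

      import re

      FUNCTION_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are",
                        "that", "this", "we", "it", "for", "on", "with", "as"}

      def lexical_measures(text):
          sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
          tokens = re.findall(r"[a-z']+", text.lower())
          content = [t for t in tokens if t not in FUNCTION_WORDS]
          return {
              "mean_sentence_length": len(tokens) / len(sentences),  # syntactic complexity
              "lexical_diversity": len(set(tokens)) / len(tokens),   # type/token ratio
              "lexical_density": len(content) / len(tokens),         # share of content words
          }

      print(lexical_measures("We collected the articles. We measured the complexity "
                             "of the writing in the articles."))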
  18. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.01
    0.00543986 = product of:
      0.032639157 = sum of:
        0.032639157 = weight(_text_:web in 337) [ClassicSimilarity], result of:
          0.032639157 = score(doc=337,freq=6.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.24981049 = fieldWeight in 337, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
      0.16666667 = coord(1/6)
    
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. See also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
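    The released resource consists of (text, url, count) triples harvested from anchor text. The sketch below shows the kind of lookup those counts support, turning raw counts into P(concept | surface string) for the "football" example mentioned above; the numeric counts here are invented placeholders, not figures from the data set.

      from collections import defaultdict

      # (text, url, count) triples in the spirit of the released dictionary;
      # the counts below are made up for illustration.
      TRIPLES = [
          ("football", "https://en.wikipedia.org/wiki/Association_football", 190_000),
          ("football", "https://en.wikipedia.org/wiki/American_football", 100_000),
          ("football", "https://en.wikipedia.org/wiki/Football", 40_000),
      ]

      def concept_distribution(triples):
          """Turn raw anchor counts into P(concept | surface string)."""
          by_text = defaultdict(dict)
          for text, url, count in triples:
              by_text[text][url] = count
          return {
              text: {url: c / sum(urls.values()) for url, c in urls.items()}
              for text, urls in by_text.items()
          }

      for url, p in sorted(concept_distribution(TRIPLES)["football"].items(),
                           key=lambda kv: -kv[1]):
          print(f"{p:.2f}  {url}")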
  19. Bowker, L.; Ciro, J.B.: Machine translation and global research : towards improved machine translation literacy in the scholarly community (2019) 0.00
    0.004356618 = product of:
      0.026139706 = sum of:
        0.026139706 = weight(_text_:world in 5970) [ClassicSimilarity], result of:
          0.026139706 = score(doc=5970,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.16986786 = fieldWeight in 5970, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.03125 = fieldNorm(doc=5970)
      0.16666667 = coord(1/6)
    
    Abstract
    In the global research community, English has become the main language of scholarly publishing in many disciplines. At the same time, online machine translation systems have become increasingly easy to access and use. Is this a researcher's match made in heaven, or the road to publication perdition? Here Lynne Bowker and Jairo Buitrago Ciro introduce the concept of machine translation literacy, a new kind of literacy for scholars and librarians in the digital age. For scholars, they explain how machine translation works, how it is (or could be) used for scholarly communication, and how both native and non-native English-speakers can write in a translation-friendly way in order to harness its potential. Native English speakers can continue to write in English, but expand the global reach of their research by making it easier for their peers around the world to access and understand their works, while non-native English speakers can write in their mother tongues, but leverage machine translation technology to help them produce draft publications in English. For academic librarians, the authors provide a framework for supporting researchers in all disciplines as they grapple with producing translation-friendly texts and using machine translation for scholarly communication - a form of support that will only become more important as campuses become increasingly international and as universities continue to strive to excel on the global stage. Machine Translation and Global Research is a must-read for scientists, researchers, students, and librarians eager to maximize the global reach and impact of any form of scholarly work.
  20. Levin, M.; Krawczyk, S.; Bethard, S.; Jurafsky, D.: Citation-based bootstrapping for large-scale author disambiguation (2012) 0.00
    0.0039258804 = product of:
      0.023555283 = sum of:
        0.023555283 = weight(_text_:web in 246) [ClassicSimilarity], result of:
          0.023555283 = score(doc=246,freq=2.0), product of:
            0.13065568 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04003532 = queryNorm
            0.18028519 = fieldWeight in 246, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=246)
      0.16666667 = coord(1/6)
    
    Abstract
    We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first "bootstrap" stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with B3 F1 of .807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from .748 to .767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1.
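    The quality figure reported above is B-cubed (B3) F1 over author clusters. As a self-contained illustration of that metric only (independent of the paper's features and data; the toy clustering is invented), B-cubed precision and recall average, over all mentions, the overlap between each mention's predicted cluster and its gold cluster:

      def b_cubed(predicted, gold):
          """B-cubed precision/recall/F1 for clusterings given as lists of sets."""
          def cluster_of(item, clustering):
              return next(c for c in clustering if item in c)

          items = {i for c in gold for i in c}
          precision = recall = 0.0
          for i in items:
              p_cluster, g_cluster = cluster_of(i, predicted), cluster_of(i, gold)
              overlap = len(p_cluster & g_cluster)
              precision += overlap / len(p_cluster)
              recall += overlap / len(g_cluster)
          precision /= len(items)
          recall /= len(items)
          f1 = 2 * precision * recall / (precision + recall)
          return precision, recall, f1

      # Toy example: five mentions of "J. Smith"; gold says {1,2,3} and {4,5} are
      # two different people, while the system split the first author in two.
      gold = [{1, 2, 3}, {4, 5}]
      predicted = [{1, 2}, {3}, {4, 5}]
      print(b_cubed(predicted, gold))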

Types

  • a 24
  • el 7
  • m 1
  • x 1

Classifications