Search (102 results, page 1 of 6)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10
    0.10263703 = sum of:
      0.081723005 = product of:
        0.24516901 = sum of:
          0.24516901 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.24516901 = score(doc=562,freq=2.0), product of:
              0.4362298 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.05145426 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.020914026 = product of:
        0.04182805 = sum of:
          0.04182805 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04182805 = score(doc=562,freq=2.0), product of:
              0.18018405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05145426 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
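The indented breakdown above is Lucene's "explain" output for the ClassicSimilarity (TF-IDF) ranking model: each leaf multiplies queryWeight (idf × queryNorm) by fieldWeight (sqrt(tf) × idf × fieldNorm), and coord() down-weights boolean queries that matched only some clauses. The following is a minimal Python sketch reproducing that arithmetic for result 1; the helper function and variable names are illustrative, and the numeric inputs are copied from the breakdown.

```python
from math import sqrt

def classic_similarity(tf_raw, idf, query_norm, field_norm):
    """One leaf of a Lucene ClassicSimilarity explain tree."""
    query_weight = idf * query_norm                    # queryWeight = idf * queryNorm
    field_weight = sqrt(tf_raw) * idf * field_norm     # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight                 # leaf score, before coord()

# Numbers copied from the explain output for result 1 (doc 562).
term_3a = classic_similarity(2.0, 8.478011, 0.05145426, 0.046875)   # ~0.24516901
term_22 = classic_similarity(2.0, 3.5018296, 0.05145426, 0.046875)  # ~0.04182805

# coord(matched/total) down-weights partially matched boolean clauses.
score = term_3a * (1 / 3) + term_22 * (1 / 2)
print(round(score, 8))   # ~0.10263703, the document score shown above
```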
  2. Basili, R.; Pazienza, M.T.; Velardi, P.: ¬An empirical symbolic approach to natural language processing (1996) 0.07
    0.07321742 = product of:
      0.14643484 = sum of:
        0.14643484 = sum of:
          0.0906641 = weight(_text_:learning in 6753) [ClassicSimilarity], result of:
            0.0906641 = score(doc=6753,freq=2.0), product of:
              0.22973695 = queryWeight, product of:
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.05145426 = queryNorm
              0.3946431 = fieldWeight in 6753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.0625 = fieldNorm(doc=6753)
          0.055770736 = weight(_text_:22 in 6753) [ClassicSimilarity], result of:
            0.055770736 = score(doc=6753,freq=2.0), product of:
              0.18018405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05145426 = queryNorm
              0.30952093 = fieldWeight in 6753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=6753)
      0.5 = coord(1/2)
    
    Abstract
    Describes and evaluates the results of a large-scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge-based methods for the acquisition of selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing, but are also useful for a comparative analysis of sublanguages.
    Date
    6. 3.1997 16:22:15
  3. Morris, V.: Automated language identification of bibliographic resources (2020) 0.07
    0.07321742 = product of:
      0.14643484 = sum of:
        0.14643484 = sum of:
          0.0906641 = weight(_text_:learning in 5749) [ClassicSimilarity], result of:
            0.0906641 = score(doc=5749,freq=2.0), product of:
              0.22973695 = queryWeight, product of:
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.05145426 = queryNorm
              0.3946431 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
          0.055770736 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
            0.055770736 = score(doc=5749,freq=2.0), product of:
              0.18018405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05145426 = queryNorm
              0.30952093 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
      0.5 = coord(1/2)
    
    Abstract
    This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
    Date
    2. 3.2020 19:04:22
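The Morris record above describes assigning language codes to catalog records with machine learning, but does not publish the model itself. The following is only a hedged sketch of that kind of task: a character n-gram classifier over record text, assuming scikit-learn is available and using invented toy titles and MARC-style language codes rather than real British Library data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training strings standing in for catalog record titles (hypothetical).
titles = ["the history of the english language",
          "eine einführung in die computerlinguistik",
          "la traduction automatique des textes",
          "machine learning for natural language processing",
          "grundlagen der automatischen sprachverarbeitung"]
codes = ["eng", "ger", "fre", "eng", "ger"]   # MARC-style language codes

# Character n-grams cope well with short, noisy bibliographic strings.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(),
)
model.fit(titles, codes)

print(model.predict(["wörterbuch der deutschen sprache"]))   # expected ['ger'] on this toy data
probs = model.predict_proba(["dictionnaire de la langue"])
print(dict(zip(model.classes_, probs[0].round(2))))          # per-code confidence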
  4. Kuo, J.-S.; Li, H.; Yang, Y.-K.: Active learning for constructing transliteration lexicons from the Web (2008) 0.05
    0.052472588 = product of:
      0.104945175 = sum of:
        0.104945175 = product of:
          0.20989035 = sum of:
            0.20989035 = weight(_text_:learning in 1345) [ClassicSimilarity], result of:
              0.20989035 = score(doc=1345,freq=14.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.91361165 = fieldWeight in 1345, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1345)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimum prior knowledge about machine transliteration and acquires knowledge iteratively from the Web. We study the unsupervised learning and the active learning strategies that minimize human supervision in terms of data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. We evaluate the proposed PSM and its learning algorithm through a series of systematic experiments, which show that the proposed framework is reliably effective on two independent databases.
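The abstract above mentions active learning to minimize human labeling effort. The sketch below shows a generic pool-based uncertainty-sampling loop on synthetic data; it is not the authors' Phonetic Similarity Modeling, and all names, features, and labels are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_pool, y_oracle, n_seed=20, n_rounds=5, batch=10):
    """Pool-based uncertainty sampling: only the least certain candidates get labeled."""
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(X_pool), n_seed, replace=False))
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        clf.fit(X_pool[labeled], y_oracle[labeled])
        probs = clf.predict_proba(X_pool)[:, 1]
        ranked = np.argsort(np.abs(probs - 0.5))          # closest to 0.5 = least certain
        seen = set(labeled)
        labeled += [i for i in ranked if i not in seen][:batch]   # query the "oracle"
    return clf, labeled

# Synthetic feature vectors standing in for candidate transliteration pairs.
X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)                   # hypothetical oracle labels
model, queried = active_learning_loop(X, y)
print(f"{len(queried)} of {len(X)} candidates labeled")
```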
  5. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.040861502 = product of:
      0.081723005 = sum of:
        0.081723005 = product of:
          0.24516901 = sum of:
            0.24516901 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24516901 = score(doc=862,freq=2.0), product of:
                0.4362298 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05145426 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  6. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.03
    0.034351375 = product of:
      0.06870275 = sum of:
        0.06870275 = product of:
          0.1374055 = sum of:
            0.1374055 = weight(_text_:learning in 1595) [ClassicSimilarity], result of:
              0.1374055 = score(doc=1595,freq=6.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.59809923 = fieldWeight in 1595, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
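As a rough illustration of the divide-and-conquer idea described in the abstract (one classifier per node of the category hierarchy), here is a hedged sketch using small backpropagation networks from scikit-learn; the two-level "MeSH-like" hierarchy and training snippets are invented, not the authors' MEDLINE setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical two-level hierarchy standing in for a slice of MeSH.
hierarchy = {"diseases": ["cardiovascular", "neoplasms"],
             "chemicals": ["enzymes", "hormones"]}

train = [("myocardial infarction risk", "diseases", "cardiovascular"),
         ("tumor growth in lung tissue", "diseases", "neoplasms"),
         ("kinase activity and catalysis", "chemicals", "enzymes"),
         ("insulin secretion pathways", "chemicals", "hormones")]

vec = TfidfVectorizer().fit([t for t, _, _ in train])

def node_classifier(pairs):
    """One small backprop network per node, trained only on its subtree."""
    X = vec.transform([t for t, y in pairs])
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(X, [y for t, y in pairs])
    return clf

top = node_classifier([(t, c1) for t, c1, _ in train])
children = {c1: node_classifier([(t, c2) for t, b, c2 in train if b == c1])
            for c1 in hierarchy}

def classify(text):
    """Divide and conquer: decide the branch first, then the leaf within it."""
    x = vec.transform([text])
    branch = top.predict(x)[0]
    return branch, children[branch].predict(x)[0]

print(classify("enzyme catalysis in cells"))
```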
  7. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.03
    0.034351375 = product of:
      0.06870275 = sum of:
        0.06870275 = product of:
          0.1374055 = sum of:
            0.1374055 = weight(_text_:learning in 4733) [ClassicSimilarity], result of:
              0.1374055 = score(doc=4733,freq=6.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.59809923 = fieldWeight in 4733, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4733)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Ontologies are often viewed as the answer to the need for interoperable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web, coupled with the increasing demand for ontologies to power the Semantic Web, has made (semi-)automatic ontology learning from text a very promising research area. This, together with the advanced state of related areas such as natural language processing, has fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium and discusses the remaining challenges that will define the research directions in this area in the near future.
  8. Sebastiani, F.: Machine learning in automated text categorization (2002) 0.03
    0.03399904 = product of:
      0.06799808 = sum of:
        0.06799808 = product of:
          0.13599616 = sum of:
            0.13599616 = weight(_text_:learning in 3389) [ClassicSimilarity], result of:
              0.13599616 = score(doc=3389,freq=8.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.59196466 = fieldWeight in 3389, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3389)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert labor, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
  9. Perera, P.; Witte, R.: ¬A self-learning context-aware lemmatizer for German (2005) 0.03
    0.032054603 = product of:
      0.064109206 = sum of:
        0.064109206 = product of:
          0.12821841 = sum of:
            0.12821841 = weight(_text_:learning in 4638) [ClassicSimilarity], result of:
              0.12821841 = score(doc=4638,freq=4.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.55810964 = fieldWeight in 4638, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4638)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.
  10. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.03
    0.03203262 = product of:
      0.06406524 = sum of:
        0.06406524 = sum of:
          0.039665546 = weight(_text_:learning in 1616) [ClassicSimilarity], result of:
            0.039665546 = score(doc=1616,freq=2.0), product of:
              0.22973695 = queryWeight, product of:
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.05145426 = queryNorm
              0.17265636 = fieldWeight in 1616, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.02734375 = fieldNorm(doc=1616)
          0.024399696 = weight(_text_:22 in 1616) [ClassicSimilarity], result of:
            0.024399696 = score(doc=1616,freq=2.0), product of:
              0.18018405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05145426 = queryNorm
              0.1354154 = fieldWeight in 1616, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=1616)
      0.5 = coord(1/2)
    
    Abstract
    The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US's Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatias.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs of the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50.; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.). However, research on crossing language boundaries, especially between European languages and Oriental languages, is still in its initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. In the problem of searching across language boundaries, a cross-lingual thesaurus, which is generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in courts and the government. In this paper, we develop an automatic thesaurus using the Hopfield network, based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language different from that of the input term. The direct translation of the input term can also be retrieved in most cases.
    Footnote
    Part of a special issue: "Web retrieval and mining: A machine learning perspective"
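The abstract above describes suggesting related terms via co-occurrence analysis and a Hopfield network. The sketch below shows the general spreading-activation idea on an invented toy term list and co-occurrence matrix; it is not the authors' implementation, and the weights do not come from their English/Chinese legal corpus.

```python
import numpy as np

# Toy bilingual term list and symmetric co-occurrence weights (hypothetical values);
# in the paper these would be derived from an English/Chinese parallel legal corpus.
terms = ["court", "judge", "法院", "法官", "contract"]
W = np.array([[0.0, 0.6, 0.8, 0.3, 0.1],
              [0.6, 0.0, 0.3, 0.9, 0.0],
              [0.8, 0.3, 0.0, 0.5, 0.1],
              [0.3, 0.9, 0.5, 0.0, 0.0],
              [0.1, 0.0, 0.1, 0.0, 0.0]])

def hopfield_expand(seed, n_iter=50, eps=1e-4):
    """Spreading activation: clamp the seed term and iterate until activations stabilize."""
    act = np.zeros(len(terms))
    act[terms.index(seed)] = 1.0
    for _ in range(n_iter):
        nxt = 1.0 / (1.0 + np.exp(-(W @ act - 0.5) * 5))   # sigmoid transfer function
        nxt[terms.index(seed)] = 1.0                        # keep the seed term active
        if np.abs(nxt - act).sum() < eps:                   # converged
            break
        act = nxt
    return sorted(zip(terms, act.round(3)), key=lambda p: -p[1])

print(hopfield_expand("court"))   # strongly co-occurring terms (here 法院) should rank high
```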
  11. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.03
    0.031676736 = product of:
      0.06335347 = sum of:
        0.06335347 = product of:
          0.12670694 = sum of:
            0.12670694 = weight(_text_:learning in 392) [ClassicSimilarity], result of:
              0.12670694 = score(doc=392,freq=10.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.55153054 = fieldWeight in 392, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=392)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks, including sentiment analysis. However, deep learning models demand more training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or by relying on external knowledge bases, in order to address the scarcity of annotated data that hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies, such as semantic-based substitution methods and sampling methods, are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one thesaurus-based and the other based on lexicon manipulation. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared to the two previous lexical substitution methods, averaged over five benchmarks. Introducing POS constraints and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
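The augmentation step described above (choose substitutable words by POS, replace them with semantically related words to create new training instances) can be illustrated with a simplified WordNet-based substitution. This sketch assumes NLTK's WordNet corpus is installed, hard-codes the POS tags that a tagger would normally supply, and omits the paper's similarity thresholds and sampling strategies.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

# POS tags for the example sentence; a real system would run a POS tagger here
# (the paper restricts substitution to selected parts of speech).
tagged = [("the", None), ("movie", wn.NOUN), ("was", None),
          ("surprisingly", wn.ADV), ("good", wn.ADJ), ("and", None),
          ("the", None), ("acting", wn.NOUN), ("felt", None), ("honest", wn.ADJ)]

def substitutes(word, pos):
    """Candidate replacements sharing the word's part of speech (WordNet synonyms)."""
    names = {l.name().replace("_", " ")
             for s in wn.synsets(word, pos=pos) for l in s.lemmas()}
    return sorted(n for n in names if n.lower() != word.lower())

augmented = [(substitutes(w, p)[0] if p and substitutes(w, p) else w)
             for w, p in tagged]
print(" ".join(augmented))   # a new instance with content words swapped for synonyms
```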
  12. Zhai, X.: ChatGPT user experience : implications for education (2022) 0.03
    0.031676736 = product of:
      0.06335347 = sum of:
        0.06335347 = product of:
          0.12670694 = sum of:
            0.12670694 = weight(_text_:learning in 849) [ClassicSimilarity], result of:
              0.12670694 = score(doc=849,freq=10.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.55153054 = fieldWeight in 849, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=849)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    ChatGPT, a general-purpose conversation chatbot released on November 30, 2022, by OpenAI, is expected to impact every aspect of society. However, the potential impacts of this NLP tool on education remain unknown. Such impact can be enormous, as the capacity of ChatGPT may drive changes to educational learning goals, learning activities, and assessment and evaluation practices. This study was conducted by piloting ChatGPT to write an academic paper, titled Artificial Intelligence for Education (see Appendix A). The piloting result suggests that ChatGPT is able to help researchers write a paper that is coherent, (partially) accurate, informative, and systematic. The writing is extremely efficient (2-3 hours) and involves very limited professional knowledge from the author. Drawing upon the user experience, I reflect on the potential impacts of ChatGPT, as well as similar AI tools, on education. The paper concludes by suggesting adjusting learning goals: students should be able to use AI tools to conduct subject-domain tasks, and education should focus on improving students' creativity and critical thinking rather than general skills. To accomplish the learning goals, researchers should design AI-involved learning tasks to engage students in solving real-world problems. ChatGPT also raises concerns that students may outsource assessment tasks. This paper concludes that new formats of assessment are needed to focus on creativity and critical thinking that AI cannot substitute.
  13. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.03
    0.029444033 = product of:
      0.058888067 = sum of:
        0.058888067 = product of:
          0.11777613 = sum of:
            0.11777613 = weight(_text_:learning in 3390) [ClassicSimilarity], result of:
              0.11777613 = score(doc=3390,freq=6.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.51265645 = fieldWeight in 3390, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3390)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are very good effectiveness, considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.
  14. Cimiano, P.; Völker, J.; Studer, R.: Ontologies on demand? : a description of the state-of-the-art, applications, challenges and trends for ontology learning from text (2006) 0.03
    0.029444033 = product of:
      0.058888067 = sum of:
        0.058888067 = product of:
          0.11777613 = sum of:
            0.11777613 = weight(_text_:learning in 6014) [ClassicSimilarity], result of:
              0.11777613 = score(doc=6014,freq=6.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.51265645 = fieldWeight in 6014, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6014)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Ontologies are nowadays used for many applications requiring data, services and resources in general to be interoperable and machine-understandable. Examples of such applications are web service discovery and composition, information integration across databases, and intelligent search. The general idea is that data and services are semantically described with respect to ontologies, which are formal specifications of a domain of interest, and can thus be shared and reused in a way such that the shared meaning specified by the ontology remains formally the same across different parties and applications. As the cost of creating ontologies is relatively high, different proposals have emerged for learning ontologies from structured and unstructured resources. In this article we examine the maturity of techniques for ontology learning from textual resources, addressing the question of whether the state of the art is mature enough to produce ontologies 'on demand'.
  15. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.03
    0.028332531 = product of:
      0.056665063 = sum of:
        0.056665063 = product of:
          0.113330126 = sum of:
            0.113330126 = weight(_text_:learning in 2027) [ClassicSimilarity], result of:
              0.113330126 = score(doc=2027,freq=2.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.49330387 = fieldWeight in 2027, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2027)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  16. Tao, J.; Zhou, L.; Hickey, K.: Making sense of the black-boxes : toward interpretable text classification using deep learning models (2023) 0.03
    0.028332531 = product of:
      0.056665063 = sum of:
        0.056665063 = product of:
          0.113330126 = sum of:
            0.113330126 = weight(_text_:learning in 990) [ClassicSimilarity], result of:
              0.113330126 = score(doc=990,freq=8.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.49330387 = fieldWeight in 990, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=990)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Text classification is a common task in data science. Despite the superior performance of deep learning-based models in various text classification tasks, their black-box nature poses significant challenges for wide adoption. The knowledge-to-action framework emphasizes several principles concerning the application and use of knowledge, such as ease-of-use, customization, and feedback. With the guidance of the above principles and the properties of interpretable machine learning, we identify the design requirements for, and propose, an interpretable deep learning (IDeL) based framework for text classification models. IDeL comprises three main components: feature penetration, instance aggregation, and feature perturbation. We evaluate our implementation of the framework with two distinct case studies: fake news detection and social question categorization. The experimental results provide evidence for the efficacy of IDeL components in enhancing the interpretability of text classification models. Moreover, the findings are generalizable across binary and multi-label, multi-class classification problems. The proposed IDeL framework introduces a unique iField perspective for building trusted models in data science by improving the transparency of and access to advanced black-box models.
  17. Gomez, F.: Learning word syntactic subcategorizations interactively (1995) 0.03
    0.028047776 = product of:
      0.05609555 = sum of:
        0.05609555 = product of:
          0.1121911 = sum of:
            0.1121911 = weight(_text_:learning in 3130) [ClassicSimilarity], result of:
              0.1121911 = score(doc=3130,freq=4.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.48834592 = fieldWeight in 3130, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3130)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Describes learning algorithms that acquire syntactic knowledge for a parser from sample sentences entered by users who have no knowledge of the parser or of English syntax. It is shown how the subcategorization of verbs, nouns and adjectives can be inferred from sample sentences entered by end users. Then, if the parser fails to parse a sentence, e.g. "Peter knows how to read books", because it has limited or no knowledge about 'know', an interface which incorporates the acquisition algorithms can be activated, and 'know' can be defined by entering some sample sentences, one of which may be the sentence that the parser failed to parse.
  18. Warner, A.J.: Natural language processing (1987) 0.03
    0.027885368 = product of:
      0.055770736 = sum of:
        0.055770736 = product of:
          0.11154147 = sum of:
            0.11154147 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.11154147 = score(doc=337,freq=2.0), product of:
                0.18018405 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05145426 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  19. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.03
    0.026236294 = product of:
      0.052472588 = sum of:
        0.052472588 = product of:
          0.104945175 = sum of:
            0.104945175 = weight(_text_:learning in 1536) [ClassicSimilarity], result of:
              0.104945175 = score(doc=1536,freq=14.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.45680583 = fieldWeight in 1536, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1536)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword named entities may require a similar approach for their automatic detection as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.
    In this thesis, we focused on the automatic detection of multiword expressions in natural language texts. On the basis of the main contributions, we can argue that:
    - Supervised machine learning methods can be successfully applied for the automatic detection of different types of multiword expressions in natural language texts.
    - Machine learning-based multiword expression detection can be successfully carried out for English as well as for Hungarian.
    - Our supervised machine learning-based model was successfully applied to the automatic detection of nominal compounds from English raw texts.
    - We developed a Wikipedia-based dictionary labeling method to automatically detect English nominal compounds.
    - Prior knowledge of nominal compounds can enhance Named Entity Recognition, while previously identified named entities can assist the nominal compound identification process.
    - The machine learning-based method can also provide acceptable results when trained on an automatically generated silver standard corpus.
    - As named entities form one semantic unit, may consist of more than one word, and function as a noun, we can treat them in a similar way to nominal compounds.
    - Our sequence labelling-based tool can be successfully applied for identifying verbal light verb constructions in two typologically different languages, namely English and Hungarian.
    - Domain adaptation techniques may help diminish the distance between domains in the automatic detection of light verb constructions.
    - Our syntax-based method can be successfully applied for the full-coverage identification of light verb constructions. As a first step, a data-driven candidate extraction method can be utilized. Afterwards, a machine learning approach that makes use of an extended and rich feature set selects LVCs among the extracted candidates.
    - When a precise syntactic parser is available for the domain at hand, full-coverage identification performs better. In other cases, use of the sequence labeling method is recommended.
  20. Cruz Díaz, N.P.; Maña López, M.J.; Mata Vázquez, J.; Pachón Álvarez, V.: ¬A machine-learning approach to negation and speculation detection in clinical texts (2012) 0.02
    0.024536695 = product of:
      0.04907339 = sum of:
        0.04907339 = product of:
          0.09814678 = sum of:
            0.09814678 = weight(_text_:learning in 283) [ClassicSimilarity], result of:
              0.09814678 = score(doc=283,freq=6.0), product of:
                0.22973695 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.05145426 = queryNorm
                0.42721373 = fieldWeight in 283, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=283)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Detecting negative and speculative information is essential in most biomedical text-mining tasks where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research is focused on developing a system based on machine-learning techniques that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal or not. Then another classifier determines, at sentence level, which tokens are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results obtained by our system were compared with those of two different systems, one based on regular expressions and the other based on machine learning. Our system's results outperformed those obtained by these two systems. In the signal detection task, the F-score was 97.3% in negation and 94.9% in speculation. In the scope-finding task, a token was correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal showed an F-score of 93.2% in negation and 80.9% in speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, obtaining F-scores of 90.9% in negation and 71.9% in speculation.
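The two consecutive phases described above (first classify each token as a signal or not, then classify tokens as inside or outside a signal's scope) can be sketched as two per-token classifiers. The features, toy sentences, and models below are illustrative stand-ins, not the authors' BioScope-trained system; a faithful phase 2 would also condition on the signals actually found in phase 1.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-labeled sentences standing in for BioScope-style training data.
# Each token: (word, is_negation_signal, in_scope_of_a_signal).
sents = [[("no", 1, 1), ("evidence", 0, 1), ("of", 0, 1), ("infection", 0, 1)],
         [("the", 0, 0), ("lungs", 0, 0), ("are", 0, 0), ("clear", 0, 0)],
         [("without", 1, 1), ("acute", 0, 1), ("disease", 0, 1)]]

def feats(sent, i):
    """Simple lexical/context features for one token."""
    return {"w": sent[i][0].lower(),
            "prev": sent[i - 1][0].lower() if i else "<s>",
            "next": sent[i + 1][0].lower() if i < len(sent) - 1 else "</s>"}

X = [feats(s, i) for s in sents for i in range(len(s))]
y_signal = [t[1] for s in sents for t in s]
y_scope = [t[2] for s in sents for t in s]

# Phase 1: is this token a negation/speculation signal?
signal_clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000)).fit(X, y_signal)
# Phase 2: is this token inside the scope of a detected signal?
scope_clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000)).fit(X, y_scope)

test = [("no", 0, 0), ("sign", 0, 0), ("of", 0, 0), ("fracture", 0, 0)]
Xt = [feats(test, i) for i in range(len(test))]
print(signal_clf.predict(Xt), scope_clf.predict(Xt))
```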

Languages

  • e 80
  • d 23
  • m 1

Types

  • a 76
  • el 17
  • m 11
  • s 7
  • p 3
  • x 3
  • d 1