Search (56 results, page 1 of 3)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10

0.100861 = sum of:
  0.08030887 = product of:
    0.24092661 = sum of:
      0.24092661 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
        0.24092661 = score(doc=562,freq=2.0), product of:
          0.42868128 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.050563898 = queryNorm
          0.56201804 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.33333334 = coord(1/3)
  0.02055213 = product of:
    0.04110426 = sum of:
      0.04110426 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
        0.04110426 = score(doc=562,freq=2.0), product of:
          0.17706616 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.050563898 = queryNorm
          0.23214069 = fieldWeight in 562, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
    0.5 = coord(1/2)
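
The explain tree above can be checked by hand. The following is a minimal Python sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the printed values. The function and variable names are illustrative assumptions, not part of any Lucene API.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq) = sqrt(termFreq)
    query_weight = idf * query_norm        # "queryWeight" node in the tree
    field_weight = tf * idf * field_norm   # "fieldWeight" node in the tree
    return query_weight * field_weight

# Values copied from the explain tree for doc 562.
QUERY_NORM = 0.050563898
MAX_DOCS = 44218
FIELD_NORM = 0.046875

# Each clause is scaled by its coord factor (matched clauses / total clauses).
score_3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 3)
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)
total = score_3a + score_22

print(f"3a clause: {score_3a:.6f}")
print(f"22 clause: {score_22:.6f}")
print(f"total:     {total:.6f}")
```

Lucene computes these values in 32-bit floats, so the last digits of a double-precision recomputation may differ slightly from the printed tree.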

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04

0.040154435 = product of:
  0.08030887 = sum of:
    0.08030887 = product of:
      0.24092661 = sum of:
        0.24092661 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.24092661 = score(doc=862,freq=2.0), product of:
            0.42868128 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.050563898 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)

Source: https://arxiv.org/abs/2212.06721

Savoy, J.: Searching strategies for the Hungarian language (2008) 0.04
```
0.039527763 = product of:
  0.079055525 = sum of:
    0.079055525 = product of:
      0.15811105 = sum of:
        0.15811105 = weight(_text_:light in 2037) [ClassicSimilarity], result of:
          0.15811105 = score(doc=2037,freq=4.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.5414352 = fieldWeight in 2037, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.046875 = fieldNorm(doc=2037)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better MAP. When compared to an IR scheme without stemming or one based on only a light stemmer, we find the differences to be statistically significant. When compared with probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is found to be about 35% better than the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%), compared to word-based indexing strategies.

Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.04
```
0.03645768 = product of:
  0.07291536 = sum of:
    0.07291536 = product of:
      0.14583072 = sum of:
        0.14583072 = weight(_text_:light in 1536) [ClassicSimilarity], result of:
          0.14583072 = score(doc=1536,freq=10.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.49938247 = fieldWeight in 1536, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1536)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
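
This document contains "light" ten times, yet it scores below the previous result (doc 2037), which contains it only four times. The sketch below, which assumes the classic formula fieldWeight = sqrt(freq) * idf * fieldNorm, shows why: the smaller fieldNorm outweighs the higher term frequency. In ClassicSimilarity the fieldNorm is understood to reflect field length, so longer fields get smaller values; the helper name is an illustrative assumption.

```python
import math

def field_weight(freq: float, idf: float, field_norm: float) -> float:
    """fieldWeight node of the explain tree: tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm

# idf of the term "light", copied from the explain trees (docFreq=372, maxDocs=44218).
IDF_LIGHT = 5.7753086

# doc 2037: 4 occurrences, larger fieldNorm (shorter field).
w_2037 = field_weight(4.0, IDF_LIGHT, 0.046875)
# doc 1536: 10 occurrences, smaller fieldNorm (longer field).
w_1536 = field_weight(10.0, IDF_LIGHT, 0.02734375)

print(f"doc 2037 fieldWeight: {w_2037:.6f}")
print(f"doc 1536 fieldWeight: {w_1536:.6f}")
print("doc 2037 ranks higher:", w_2037 > w_1536)
```

Because tf grows only with the square root of the frequency, 2.5 times as many occurrences raise tf by a factor of about 1.58, which does not make up for a fieldNorm that is smaller by a factor of about 1.71.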
Abstract

Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword named entities may require a similar approach for their automatic detection as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.
In this thesis, we focused on the automatic detection of multiword expressions in natural language texts. On the basis of the main contributions, we can argue that:

- Supervised machine learning methods can be successfully applied for the automatic detection of different types of multiword expressions in natural language texts.
- Machine learning-based multiword expression detection can be successfully carried out for English as well as for Hungarian.
- Our supervised machine learning-based model was successfully applied to the automatic detection of nominal compounds from English raw texts.
- We developed a Wikipedia-based dictionary labeling method to automatically detect English nominal compounds.
- A prior knowledge of nominal compounds can enhance Named Entity Recognition, while previously identified named entities can assist the nominal compound identification process.
- The machine learning-based method can also provide acceptable results when it is trained on an automatically generated silver standard corpus.
- As named entities form one semantic unit and may consist of more than one word and function as a noun, we can treat them in a similar way to nominal compounds.
- Our sequence labelling-based tool can be successfully applied for identifying verbal light verb constructions in two typologically different languages, namely English and Hungarian.
- Domain adaptation techniques may help diminish the distance between domains in the automatic detection of light verb constructions.
- Our syntax-based method can be successfully applied for the full-coverage identification of light verb constructions. As a first step, a data-driven candidate extraction method can be utilized. Afterwards, a machine learning approach that makes use of an extended and rich feature set selects LVCs among the extracted candidates.
- When a precise syntactic parser is available for the actual domain, the full-coverage identification can be performed better. In other cases, the usage of the sequence labeling method is recommended.

Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.03
```
0.03260874 = product of:
  0.06521748 = sum of:
    0.06521748 = product of:
      0.13043496 = sum of:
        0.13043496 = weight(_text_:light in 2918) [ClassicSimilarity], result of:
          0.13043496 = score(doc=2918,freq=2.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.44666123 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2918)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Over the past two decades or so, Multi-Word Expressions (MWEs; also called Multi-word Units) have been an increasingly important concern for Computational Linguistics and Natural Language Processing (NLP). The term MWE has been used to refer to various types of linguistic units and expressions, including idioms, noun compounds, phrasal verbs, light verbs and other habitual collocations. However, while there is no universally agreed definition for MWE as yet, most researchers use the term to refer to those frequently occurring phrasal units which are subject to a certain level of semantic opaqueness, or non-compositionality. Non-compositional MWEs pose tough challenges for automatic analysis because their interpretation cannot be achieved by directly combining the semantics of their constituents, thereby causing the "pain in the neck of NLP".

Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.03
```
0.03260874 = product of:
  0.06521748 = sum of:
    0.06521748 = product of:
      0.13043496 = sum of:
        0.13043496 = weight(_text_:light in 1139) [ClassicSimilarity], result of:
          0.13043496 = score(doc=1139,freq=2.0), product of:
            0.2920221 = queryWeight, product of:
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.050563898 = queryNorm
            0.44666123 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.7753086 = idf(docFreq=372, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.

Warner, A.J.: Natural language processing (1987) 0.03

0.027402842 = product of:
  0.054805685 = sum of:
    0.054805685 = product of:
      0.10961137 = sum of:
        0.10961137 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
          0.10961137 = score(doc=337,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.61904186 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=337)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
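
Most trees on this page nest two or more coord factors around a single term weight. As a small sketch, assuming coord(m/n) means that m of n optional query clauses matched at that nesting level, the final score is the term weight multiplied by every enclosing coord factor. The helper name `apply_coords` is hypothetical.

```python
from math import prod

def apply_coords(term_weight: float, coords: list[tuple[int, int]]) -> float:
    """Multiply a term weight by each enclosing coord(matched/total) factor."""
    return term_weight * prod(matched / total for matched, total in coords)

# doc 337: weight of term "22" wrapped in coord(1/2) twice.
score_337 = apply_coords(0.10961137, [(1, 2), (1, 2)])

# doc 862: weight of term "3a" wrapped in coord(1/3), then coord(1/2).
score_862 = apply_coords(0.24092661, [(1, 3), (1, 2)])

print(f"doc 337: {score_337:.9f}")
print(f"doc 862: {score_862:.9f}")
```

This is why a document matching only one term of a multi-clause query ends up with a fraction of its raw term weight.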

Source: Annual review of information science and technology. 22(1987), S.79-108

McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.02

0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
          0.095909946 = score(doc=3164,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 3164, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3164)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Computational linguistics. 22(1996) no.2, S.217-248

Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02

0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
          0.095909946 = score(doc=4506,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 4506, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=4506)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 8.10.2000 11:52:22

Somers, H.: Example-based machine translation : Review article (1999) 0.02

0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
          0.095909946 = score(doc=6672,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 6672, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6672)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 31. 7.1996 9:22:19

New tools for human translators (1997) 0.02

0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
          0.095909946 = score(doc=1179,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 1179, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=1179)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 31. 7.1996 9:22:19

Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.02

0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
          0.095909946 = score(doc=3117,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 3117, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3117)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 28. 2.1999 10:48:22

¬Der Student aus dem Computer (2023) 0.02

0.023977486 = product of:
  0.047954973 = sum of:
    0.047954973 = product of:
      0.095909946 = sum of:
        0.095909946 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
          0.095909946 = score(doc=1079,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.5416616 = fieldWeight in 1079, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=1079)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 27. 1.2023 16:22:55

Byrne, C.C.; McCracken, S.A.: ¬An adaptive thesaurus employing semantic distance, relational inheritance and nominal compound interpretation for linguistic support of information retrieval (1999) 0.02

0.02055213 = product of:
  0.04110426 = sum of:
    0.04110426 = product of:
      0.08220852 = sum of:
        0.08220852 = weight(_text_:22 in 4483) [ClassicSimilarity], result of:
          0.08220852 = score(doc=4483,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.46428138 = fieldWeight in 4483, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4483)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 15. 3.2000 10:22:37

Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02

0.02055213 = product of:
  0.04110426 = sum of:
    0.04110426 = product of:
      0.08220852 = sum of:
        0.08220852 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
          0.08220852 = score(doc=4888,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.46428138 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 1. 3.2013 14:56:22

Monnerjahn, P.: Vorsprung ohne Technik : Übersetzen: Computer und Qualität (2000) 0.02

0.02055213 = product of:
  0.04110426 = sum of:
    0.04110426 = product of:
      0.08220852 = sum of:
        0.08220852 = weight(_text_:22 in 5429) [ClassicSimilarity], result of:
          0.08220852 = score(doc=5429,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.46428138 = fieldWeight in 5429, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=5429)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: c't. 2000, H.22, S.230-231

Hutchins, J.: From first conception to first demonstration : the nascent years of machine translation, 1947-1954. A chronology (1997) 0.02

0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 1463) [ClassicSimilarity], result of:
          0.068507105 = score(doc=1463,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 1463, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1463)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 31. 7.1996 9:22:19

Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test (2000) 0.02

0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
          0.068507105 = score(doc=5428,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 5428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=5428)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: c't. 2000, H.22, S.220-229

Lezius, W.; Rapp, R.; Wettler, M.: ¬A morphology-system and part-of-speech tagger for German (1996) 0.02

0.017126776 = product of:
  0.034253553 = sum of:
    0.034253553 = product of:
      0.068507105 = sum of:
        0.068507105 = weight(_text_:22 in 1693) [ClassicSimilarity], result of:
          0.068507105 = score(doc=1693,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.38690117 = fieldWeight in 1693, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1693)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 22. 3.2015 9:37:18

Wanner, L.: Lexical choice in text generation and machine translation (1996) 0.01

0.013701421 = product of:
  0.027402842 = sum of:
    0.027402842 = product of:
      0.054805685 = sum of:
        0.054805685 = weight(_text_:22 in 8521) [ClassicSimilarity], result of:
          0.054805685 = score(doc=8521,freq=2.0), product of:
            0.17706616 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050563898 = queryNorm
            0.30952093 = fieldWeight in 8521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=8521)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 31. 7.1996 9:22:19
