Search (16 results, page 1 of 1)

  • theme_ss:"Computerlinguistik"
  • type_ss:"el"
  1. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: ¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.01
    0.008536155 = product of:
      0.042680774 = sum of:
        0.042680774 = weight(_text_:bibliographic in 2804) [ClassicSimilarity], result of:
          0.042680774 = score(doc=2804,freq=4.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.24331525 = fieldWeight in 2804, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
      0.2 = coord(1/5)
    
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
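    A note on the score breakdowns shown with each result: they are Lucene "explain" output for the ClassicSimilarity (tf-idf) ranking model. As a minimal sketch, assuming ClassicSimilarity's standard formulas, the numbers for result 1 multiply out as follows (Python):

      import math

      # Values copied from the breakdown of result 1 above.
      freq, doc_freq, max_docs = 4.0, 2449, 44218
      query_norm, field_norm = 0.04505818, 0.03125

      idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 3.893044, matches the idf shown above
      tf = math.sqrt(freq)                              # 2.0 = sqrt(termFreq)
      query_weight = idf * query_norm                   # 0.17541347
      field_weight = tf * idf * field_norm              # 0.24331525
      term_score = query_weight * field_weight          # 0.042680774
      final_score = term_score * (1 / 5)                # coord(1/5) -> 0.008536155
      print(round(final_score, 9))

    The same structure applies to the remaining results; coord(1/5) reflects that one of the five query terms matched the document.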
  2. Hausser, R.: Language and nonlanguage cognition (2021) 0.01
    0.0079016285 = product of:
      0.03950814 = sum of:
        0.03950814 = product of:
          0.07901628 = sum of:
            0.07901628 = weight(_text_:data in 255) [ClassicSimilarity], result of:
              0.07901628 = score(doc=255,freq=14.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.55459267 = fieldWeight in 255, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=255)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language-data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage-data as input. In either case, the output is a content which is stored in the agent's onboard short-term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of placeholder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.
  3. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.01
    0.0073257135 = product of:
      0.036628567 = sum of:
        0.036628567 = product of:
          0.07325713 = sum of:
            0.07325713 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
              0.07325713 = score(doc=4888,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.46428138 = fieldWeight in 4888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4888)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    1. 3.2013 14:56:22
  4. Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche (2013) 0.00
    0.004883809 = product of:
      0.024419045 = sum of:
        0.024419045 = product of:
          0.04883809 = sum of:
            0.04883809 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
              0.04883809 = score(doc=1490,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.30952093 = fieldWeight in 1490, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1490)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 3.2015 9:30:24
  5. Bager, J.: ¬Die Text-KI ChatGPT schreibt Fachtexte, Prosa, Gedichte und Programmcode (2023) 0.00
    0.004883809 = product of:
      0.024419045 = sum of:
        0.024419045 = product of:
          0.04883809 = sum of:
            0.04883809 = weight(_text_:22 in 835) [ClassicSimilarity], result of:
              0.04883809 = score(doc=835,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.30952093 = fieldWeight in 835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=835)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    29.12.2022 18:22:55
  6. Rieger, F.: Lügende Computer (2023) 0.00
    0.004883809 = product of:
      0.024419045 = sum of:
        0.024419045 = product of:
          0.04883809 = sum of:
            0.04883809 = weight(_text_:22 in 912) [ClassicSimilarity], result of:
              0.04883809 = score(doc=912,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.30952093 = fieldWeight in 912, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=912)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    16. 3.2023 19:22:55
  7. Griffiths, T.L.; Steyvers, M.: ¬A probabilistic approach to semantic representation (2002) 0.00
    0.003982046 = product of:
      0.01991023 = sum of:
        0.01991023 = product of:
          0.03982046 = sum of:
            0.03982046 = weight(_text_:data in 3671) [ClassicSimilarity], result of:
              0.03982046 = score(doc=3671,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2794884 = fieldWeight in 3671, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3671)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Semantic networks produced from human data have statistical properties that cannot be easily captured by spatial representations. We explore a probabilistic approach to semantic representation that explicitly models the probability with which words occur in different contexts, and hence captures the probabilistic relationships between words. We show that this representation has statistical properties consistent with the large-scale structure of semantic networks constructed by humans, and trace the origins of these properties.
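    For orientation, the probabilistic representation described here can be read as a latent-topic formulation, in which the probability of a word is obtained by summing over latent contexts (topics) z. Schematically, and as an interpretation of the abstract rather than a formula quoted from the paper:

      P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)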
  8. Franke-Maier, M.: Computerlinguistik und Bibliotheken : Editorial (2016) 0.00
    0.0035196648 = product of:
      0.017598324 = sum of:
        0.017598324 = product of:
          0.035196647 = sum of:
            0.035196647 = weight(_text_:data in 3206) [ClassicSimilarity], result of:
              0.035196647 = score(doc=3206,freq=4.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.24703519 = fieldWeight in 3206, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3206)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Fifty years ago, in February 1966, Floyd M. Cammack pointed to the connection between "Linguistics and Libraries". His starting point was the entry for "Linguistics" in the 1957 Library of Congress Subject Headings (LCSH), which contained the cross-reference "See Language and Languages; Philology; Philology, Comparative". Eight years later, additions such as "language data processing", "automatic indexing", "machine translation" and "psycholinguistics" appeared under the heading "Language and Languages". For Cammack, this reveals a network of complex interrelations that should be brought together under the term "Linguistics", a system with an important influence on everyone involved in collecting, organizing, storing and retrieving information (Cammack 1966:73). Here, in a figurative sense, you have before you an issue devoted to computational linguistic methods in libraries. Ultimately, the aim is to bring objectivity to the discussion about the role of subject indexing and to recalibrate its standing in times of mega-indexes and Big Data. The current contradiction between the desire for relevant result sets in search interfaces and the actual experience of relevance ranking has to be resolved, as does, explicitly, the question of how often the latter has disappointed us and what needs to be done to bring the relationship between recall and precision back into an appropriate balance. Our users will thank us.
  9. Altmann, E.G.; Cristadoro, G.; Esposti, M.D.: On the origin of long-range correlations in texts (2012) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 330) [ClassicSimilarity], result of:
              0.029865343 = score(doc=330,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 330, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=330)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. An example is the robust, yet still largely unexplained, observation of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
  10. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 2920) [ClassicSimilarity], result of:
              0.029865343 = score(doc=2920,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 2920, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2920)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Content
    See: http://takelab.fer.hr/data/cromwesc/. The dataset is available from here: TakeLab-CroMWEsc.tar.gz. The archive contains a single file listing 200 Croatian multiword expressions annotated with semantic compositionality scores. Twenty expressions were annotated by 24 annotators (denoted by "*") and the rest by 6 annotators. Besides the median, we provide the mode, mean, and standard deviation for each expression. Consult the above-mentioned paper for details.
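    As a small illustration of the per-expression statistics mentioned above (median, mode, mean, standard deviation); the annotator scores here are made up for illustration, not taken from the dataset (Python):

      import statistics

      # Hypothetical compositionality judgments from 6 annotators for one expression.
      scores = [3, 4, 4, 2, 5, 4]
      summary = {
          "median": statistics.median(scores),
          "mode":   statistics.mode(scores),
          "mean":   statistics.mean(scores),
          "stdev":  statistics.stdev(scores),  # sample standard deviation
      }
      print(summary)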
  11. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I.: Improving language understanding by Generative Pre-Training 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 870) [ClassicSimilarity], result of:
              0.029865343 = score(doc=870,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 870, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=870)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
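    Sketched roughly in the paper's notation, the two stages the abstract describes are a generative pre-training objective over unlabeled tokens u_i and a discriminative fine-tuning objective over labeled examples (x, y), with the language-modeling loss optionally kept as an auxiliary term weighted by \lambda:

      L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)
      L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)
      L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})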
  12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I.: Attention Is all you need (2017) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 970) [ClassicSimilarity], result of:
              0.029865343 = score(doc=970,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 970, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=970)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
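    For reference, the attention mechanism at the core of the Transformer is scaled dot-product attention over queries Q, keys K and values V, with key dimension d_k:

      \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V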
  13. Jha, A.: Why GPT-4 isn't all it's cracked up to be (2023) 0.00
    0.0024637652 = product of:
      0.012318826 = sum of:
        0.012318826 = product of:
          0.024637653 = sum of:
            0.024637653 = weight(_text_:data in 923) [ClassicSimilarity], result of:
              0.024637653 = score(doc=923,freq=4.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17292464 = fieldWeight in 923, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=923)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    They might appear intelligent, but LLMs are nothing of the sort. They don't understand the meanings of the words they are using, nor the concepts expressed within the sentences they create. When asked how to bring a cow back to life, earlier versions of ChatGPT, for example, which ran on a souped-up version of GPT-3, would confidently provide a list of instructions. So-called hallucinations like this happen because language models have no concept of what a "cow" is or that "death" is a non-reversible state of being. LLMs do not have minds that can think about objects in the world and how they relate to each other. All they "know" is how likely it is that some sets of words will follow other sets of words, having calculated those probabilities from their training data. To make sense of all this, I spoke with Gary Marcus, an emeritus professor of psychology and neural science at New York University, for "Babbage", our science and technology podcast. Last year, as the world was transfixed by the sudden appearance of ChatGPT, he made some fascinating predictions about GPT-4.
    People use symbols to think about the world: if I say the words "cat", "house" or "aeroplane", you know instantly what I mean. Symbols can also be used to describe the way things are behaving (running, falling, flying) or they can represent how things should behave in relation to each other (a "+" means add the numbers before and after). Symbolic AI is a way to embed this human knowledge and reasoning into computer systems. Though the idea has been around for decades, it fell by the wayside a few years ago as deep learning, buoyed by the sudden easy availability of lots of training data and cheap computing power, became more fashionable. In the near future at least, there's no doubt people will find LLMs useful. But whether they represent a critical step on the path towards AGI, or rather just an intriguing detour, remains to be seen.
  14. Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache (2018) 0.00
    0.0024419045 = product of:
      0.012209523 = sum of:
        0.012209523 = product of:
          0.024419045 = sum of:
            0.024419045 = weight(_text_:22 in 4217) [ClassicSimilarity], result of:
              0.024419045 = score(doc=4217,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.15476047 = fieldWeight in 4217, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4217)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 1.2018 11:32:44
  15. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.00
    0.001991023 = product of:
      0.009955115 = sum of:
        0.009955115 = product of:
          0.01991023 = sum of:
            0.01991023 = weight(_text_:data in 337) [ClassicSimilarity], result of:
              0.01991023 = score(doc=337,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.1397442 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.03125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. See also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
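    A minimal sketch of how such (text, url, count) triples could be aggregated into the association weights described above; the tab-separated layout and the file name are assumptions for illustration, not the released format (Python):

      import csv
      from collections import defaultdict

      def load_triples(path):
          # Assumed layout: one "text<TAB>url<TAB>count" triple per line.
          assoc = defaultdict(lambda: defaultdict(int))   # text -> url -> count
          with open(path, encoding="utf-8") as f:
              for text, url, count in csv.reader(f, delimiter="\t"):
                  assoc[text][url] += int(count)
          return assoc

      def top_concepts(assoc, text, k=2):
          # Rank candidate concepts for a string and normalize counts into weights.
          entries = assoc.get(text, {})
          total = sum(entries.values()) or 1
          ranked = sorted(entries.items(), key=lambda kv: kv[1], reverse=True)
          return [(url, count / total) for url, count in ranked[:k]]

      # e.g. top_concepts(load_triples("dictionary.tsv"), "football")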
  16. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.00
    0.001742145 = product of:
      0.008710725 = sum of:
        0.008710725 = product of:
          0.01742145 = sum of:
            0.01742145 = weight(_text_:data in 1536) [ClassicSimilarity], result of:
              0.01742145 = score(doc=1536,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.12227618 = fieldWeight in 1536, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1536)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    In this thesis, we focused on the automatic detection of multiword expressions in natural language texts. On the basis of the main contributions, we can argue that:
    - Supervised machine learning methods can be successfully applied for the automatic detection of different types of multiword expressions in natural language texts.
    - Machine learning-based multiword expression detection can be successfully carried out for English as well as for Hungarian.
    - Our supervised machine learning-based model was successfully applied to the automatic detection of nominal compounds from English raw texts.
    - We developed a Wikipedia-based dictionary labeling method to automatically detect English nominal compounds.
    - Prior knowledge of nominal compounds can enhance Named Entity Recognition, while previously identified named entities can assist the nominal compound identification process.
    - The machine learning-based method can also provide acceptable results when trained on an automatically generated silver standard corpus.
    - As named entities form one semantic unit, may consist of more than one word, and function as a noun, we can treat them in a similar way to nominal compounds.
    - Our sequence labeling-based tool can be successfully applied for identifying verbal light verb constructions in two typologically different languages, namely English and Hungarian.
    - Domain adaptation techniques may help diminish the distance between domains in the automatic detection of light verb constructions.
    - Our syntax-based method can be successfully applied for the full-coverage identification of light verb constructions. As a first step, a data-driven candidate extraction method can be utilized. Afterwards, a machine learning approach that makes use of an extended and rich feature set selects LVCs among the extracted candidates.
    - When a precise syntactic parser is available for the actual domain, full-coverage identification can be performed better. In other cases, the usage of the sequence labeling method is recommended.