Search (105 results, page 6 of 6)

  • × theme_ss:"Computerlinguistik"
  1. Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.00
    7.746627E-4 = product of:
      0.006971964 = sum of:
        0.006971964 = product of:
          0.013943928 = sum of:
            0.013943928 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
              0.013943928 = score(doc=3807,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.1354154 = fieldWeight in 3807, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3807)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Date
    20. 1.2015 18:30:22
  2. Yang, C.C.; Li, K.W.: Automatic construction of English/Chinese parallel corpora (2003) 0.00
    7.6892605E-4 = product of:
      0.0069203344 = sum of:
        0.0069203344 = product of:
          0.013840669 = sum of:
            0.013840669 = weight(_text_:web in 1683) [ClassicSimilarity], result of:
              0.013840669 = score(doc=1683,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.14422815 = fieldWeight in 1683, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1683)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    As the demand for global information increases significantly, multilingual corpora has become a valuable linguistic resource for applications to cross-lingual information retrieval and natural language processing. In order to cross the boundaries that exist between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive in both genre and domain. It is also impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesauri for large applications. Corpusbased approaches, which do not have the limitation of dictionaries, provide a statistical translation model with which to cross the language boundary. There are many domain-specific parallel or comparable corpora that are employed in machine translation and cross-lingual information retrieval. Most of these are corpora between Indo-European languages, such as English/French and English/Spanish. The Asian/Indo-European corpus, especially English/Chinese corpus, is relatively sparse. The objective of the present research is to construct English/ Chinese parallel corpus automatically from the World Wide Web. In this paper, an alignment method is presented which is based an dynamic programming to identify the one-to-one Chinese and English title pairs. The method includes alignment at title level, word level and character level. The longest common subsequence (LCS) is applied to find the most reliabie Chinese translation of an English word. As one word for a language may translate into two or more words repetitively in another language, the edit operation, deletion, is used to resolve redundancy. A score function is then proposed to determine the optimal title pairs. Experiments have been conducted to investigate the performance of the proposed method using the daily press release articles by the Hong Kong SAR government as the test bed. The precision of the result is 0.998 while the recall is 0.806. The release articles and speech articles, published by Hongkong & Shanghai Banking Corporation Limited, are also used to test our method, the precision is 1.00, and the recall is 0.948.
  3. Belbachir, F.; Boughanem, M.: Using language models to improve opinion detection (2018) 0.00
    7.6892605E-4 = product of:
      0.0069203344 = sum of:
        0.0069203344 = product of:
          0.013840669 = sum of:
            0.013840669 = weight(_text_:web in 5044) [ClassicSimilarity], result of:
              0.013840669 = score(doc=5044,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.14422815 = fieldWeight in 5044, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5044)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second state which itself focusses on detecting opinionated documents . We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analysis document and reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and Internet Movie Data Bases (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collection. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different combinations of opinion and retrieval scores. We then use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
  4. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D.: Language models are few-shot learners (2020) 0.00
    7.6892605E-4 = product of:
      0.0069203344 = sum of:
        0.0069203344 = product of:
          0.013840669 = sum of:
            0.013840669 = weight(_text_:web in 872) [ClassicSimilarity], result of:
              0.013840669 = score(doc=872,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.14422815 = fieldWeight in 872, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=872)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
  5. Working with conceptual structures : contributions to ICCS 2000. 8th International Conference on Conceptual Structures: Logical, Linguistic, and Computational Issues. Darmstadt, August 14-18, 2000 (2000) 0.00
    6.728103E-4 = product of:
      0.0060552927 = sum of:
        0.0060552927 = product of:
          0.012110585 = sum of:
            0.012110585 = weight(_text_:web in 5089) [ClassicSimilarity], result of:
              0.012110585 = score(doc=5089,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.12619963 = fieldWeight in 5089, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5089)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Content
    Concepts & Language: Knowledge organization by procedures of natural language processing. A case study using the method GABEK (J. Zelger, J. Gadner) - Computer aided narrative analysis using conceptual graphs (H. Schärfe, P. 0hrstrom) - Pragmatic representation of argumentative text: a challenge for the conceptual graph approach (H. Irandoust, B. Moulin) - Conceptual graphs as a knowledge representation core in a complex language learning environment (G. Angelova, A. Nenkova, S. Boycheva, T. Nikolov) - Conceptual Modeling and Ontologies: Relationships and actions in conceptual categories (Ch. Landauer, K.L. Bellman) - Concept approximations for formal concept analysis (J. Saquer, J.S. Deogun) - Faceted information representation (U. Priß) - Simple concept graphs with universal quantifiers (J. Tappe) - A framework for comparing methods for using or reusing multiple ontologies in an application (J. van ZyI, D. Corbett) - Designing task/method knowledge-based systems with conceptual graphs (M. Leclère, F.Trichet, Ch. Choquet) - A logical ontology (J. Farkas, J. Sarbo) - Algorithms and Tools: Fast concept analysis (Ch. Lindig) - A framework for conceptual graph unification (D. Corbett) - Visual CP representation of knowledge (H.D. Pfeiffer, R.T. Hartley) - Maximal isojoin for representing software textual specifications and detecting semantic anomalies (Th. Charnois) - Troika: using grids, lattices and graphs in knowledge acquisition (H.S. Delugach, B.E. Lampkin) - Open world theorem prover for conceptual graphs (J.E. Heaton, P. Kocura) - NetCare: a practical conceptual graphs software tool (S. Polovina, D. Strang) - CGWorld - a web based workbench for conceptual graphs management and applications (P. Dobrev, K. Toutanova) - Position papers: The edition project: Peirce's existential graphs (R. Mülller) - Mining association rules using formal concept analysis (N. Pasquier) - Contextual logic summary (R Wille) - Information channels and conceptual scaling (K.E. Wolff) - Spatial concepts - a rule exploration (S. Rudolph) - The TEXT-TO-ONTO learning environment (A. Mädche, St. Staab) - Controlling the semantics of metadata on audio-visual documents using ontologies (Th. Dechilly, B. Bachimont) - Building the ontological foundations of a terminology from natural language to conceptual graphs with Ribosome, a knowledge extraction system (Ch. Jacquelinet, A. Burgun) - CharGer: some lessons learned and new directions (H.S. Delugach) - Knowledge management using conceptual graphs (W.K. Pun)

Years

Languages

  • e 74
  • d 31
  • m 2
  • More… Less…

Types

  • a 81
  • el 15
  • m 12
  • s 7
  • x 3
  • p 2
  • d 1
  • More… Less…

Classifications