Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D.: Language models are few-shot learners (2020)
- Abstract
- Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
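  The abstract notes that tasks and few-shot demonstrations are "specified purely via text interaction with the model," with no gradient updates. The sketch below illustrates that in-context prompting format in Python; the word-unscrambling task is one the abstract mentions, but the helper function, example pairs, and prompt layout are invented here for illustration and are not taken from the paper.

  ```python
  # Minimal sketch of few-shot (in-context) prompting: an instruction,
  # a handful of worked examples, and a new query are concatenated into
  # one text prompt. The model sees only this text; no fine-tuning or
  # gradient updates are involved. All names and examples are hypothetical.

  def build_few_shot_prompt(instruction, demonstrations, query):
      """Assemble an instruction, K demonstration pairs, and a query into a single prompt string."""
      lines = [instruction, ""]
      for source, target in demonstrations:
          lines.append(f"Input: {source}")
          lines.append(f"Output: {target}")
          lines.append("")
      lines.append(f"Input: {query}")
      lines.append("Output:")
      return "\n".join(lines)

  if __name__ == "__main__":
      prompt = build_few_shot_prompt(
          instruction="Unscramble the letters to form an English word.",
          demonstrations=[("lpael", "apple"), ("ousemh", "house")],  # hypothetical demonstrations
          query="rdenga",
      )
      print(prompt)  # this text would be sent to the language model as-is
  ```

  The same pattern covers the zero-shot and one-shot settings discussed in the paper by passing zero or one demonstration pair.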
- Type
- a