Search (2 results, page 1 of 1)

  • Filter: author_ss:"Hesse, C."
  1. Hesse, C.: Das kleine Einmaleins des klaren Denkens : 22 Denkwerkzeuge für ein besseres Leben (2009) 0.02
    0.021275483 = product of:
      0.042550966 = sum of:
        0.042550966 = sum of:
          0.0051048263 = weight(_text_:s in 2137) [ClassicSimilarity], result of:
            0.0051048263 = score(doc=2137,freq=4.0), product of:
              0.05008241 = queryWeight, product of:
                1.0872376 = idf(docFreq=40523, maxDocs=44218)
                0.046063907 = queryNorm
              0.101928525 = fieldWeight in 2137, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.0872376 = idf(docFreq=40523, maxDocs=44218)
                0.046875 = fieldNorm(doc=2137)
          0.03744614 = weight(_text_:22 in 2137) [ClassicSimilarity], result of:
            0.03744614 = score(doc=2137,freq=2.0), product of:
              0.16130796 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046063907 = queryNorm
              0.23214069 = fieldWeight in 2137, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2137)
      0.5 = coord(1/2)
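
The tree above is Lucene ClassicSimilarity (TF-IDF) explain output: each term weight is queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and the outer coord(1/2) halves the sum because only one of two query clauses matched. A minimal Python sketch reproducing those numbers (queryNorm is copied verbatim from the output, since it depends on the full query):

import math

# Minimal sketch of Lucene ClassicSimilarity (TF-IDF) scoring,
# recomputing the per-term weights shown in the explain output above.

def idf(doc_freq, max_docs):
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm; fieldWeight = sqrt(freq) * idf * fieldNorm
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.046063907  # taken directly from the explain output

# weight(_text_:s in 2137): freq=4, docFreq=40523, fieldNorm=0.046875
s_score = term_score(4.0, 40523, 44218, QUERY_NORM, 0.046875)    # ~0.0051048
# weight(_text_:22 in 2137): freq=2, docFreq=3622, fieldNorm=0.046875
t22_score = term_score(2.0, 3622, 44218, QUERY_NORM, 0.046875)   # ~0.0374461

# Outer coord(1/2): only one of two query clauses matched.
total = 0.5 * (s_score + t22_score)                              # ~0.0212755
print(s_score, t22_score, total)

Running this yields roughly 0.0051, 0.0374 and 0.0213, matching the term weights and the document score 0.021275483 shown above.
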
    
    Footnote
    Review in: Spektrum der Wissenschaft 2010, H.7, S.100-101 (R. Pilous): "This book will not instantly give you 'a better life' - but it does offer a good introduction to classical problem-solving methods."
    Pages
    352 p.
  2. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D.: Language models are few-shot learners (2020) 0.00
    0.001203219 = product of:
      0.002406438 = sum of:
        0.002406438 = product of:
          0.004812876 = sum of:
            0.004812876 = weight(_text_:s in 872) [ClassicSimilarity], result of:
              0.004812876 = score(doc=872,freq=8.0), product of:
                0.05008241 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.046063907 = queryNorm
                0.09609913 = fieldWeight in 872, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.03125 = fieldNorm(doc=872)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
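
The second record follows the same arithmetic, reusing term_score and QUERY_NORM from the sketch above: only the low-idf term _text_:s matches (freq=8), the fieldNorm is smaller (0.03125, i.e. a longer field), and two nested coord(1/2) factors quarter the result:

# Record 872: only "_text_:s" matched; two coord(1/2) factors apply.
s_score_872 = term_score(8.0, 40523, 44218, QUERY_NORM, 0.03125)  # ~0.0048129
total_872 = 0.5 * 0.5 * s_score_872                               # ~0.0012032
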
    
    Abstract
    Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
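
The abstract's key mechanism - specifying a task "purely via text interaction with the model", with no gradient updates - can be sketched as a few-shot prompt. The example below (3-digit arithmetic, one of the tasks mentioned) is only an illustration; the model handle and complete() call are hypothetical placeholders, not an interface described in the paper.

# Hypothetical illustration of few-shot prompting as described in the abstract:
# the task (3-digit arithmetic) is specified only through demonstrations in the
# prompt text; the model's weights are never updated.
few_shot_prompt = """\
Q: What is 248 + 375?
A: 623
Q: What is 914 - 582?
A: 332
Q: What is 167 + 255?
A:"""

# Placeholder call - the actual serving interface is not part of the paper.
# completion = some_language_model.complete(few_shot_prompt, max_tokens=5)
print(few_shot_prompt)
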