Search (178 results, page 1 of 9)

  • Active filter: year_i:[2020 TO 2030}
  1. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.11
    0.10862944 = sum of:
      0.082218945 = product of:
        0.24665684 = sum of:
          0.24665684 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
            0.24665684 = score(doc=862,freq=2.0), product of:
              0.43887708 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.051766515 = queryNorm
              0.56201804 = fieldWeight in 862, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=862)
        0.33333334 = coord(1/3)
      0.0264105 = product of:
        0.052821 = sum of:
          0.052821 = weight(_text_:language in 862) [ClassicSimilarity], result of:
            0.052821 = score(doc=862,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.26008 = fieldWeight in 862, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.046875 = fieldNorm(doc=862)
        0.5 = coord(1/2)
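    The tree above is Lucene ClassicSimilarity "explain" output, and its numbers compose mechanically. As a minimal re-derivation in Python (a sketch of the arithmetic, not the engine's code), the "_text_:3a" clause works out as follows:

      import math

      # Re-derivation of the "_text_:3a in 862" clause from the explain tree.
      def idf(doc_freq, max_docs):
          # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def tf(freq):
          # ClassicSimilarity tf: square root of the raw term frequency
          return math.sqrt(freq)

      query_norm = 0.051766515
      idf_3a = idf(24, 44218)                       # 8.478011
      query_weight = idf_3a * query_norm            # 0.43887708
      field_weight = tf(2.0) * idf_3a * 0.046875    # tf * idf * fieldNorm = 0.56201804
      clause_score = query_weight * field_weight    # 0.24665684
      print(clause_score * (1 / 3))                 # coord(1/3) -> 0.082218945

    Adding the "language" clause's 0.0264105 yields the document total of 0.10862944.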
    
    Abstract
    This research revisits the classic Turing test and compares recent large language models such as ChatGPT for their abilities to reproduce human-level comprehension and compelling text generation. Two task challenges, summarization and question answering, prompt ChatGPT to produce original content (98-99%) from a single text entry and from the sequential questions originally posed by Turing in 1950. We score the original and generated content against the OpenAI GPT-2 Output Detector from 2019, and establish multiple cases where the generated content proves original and undetectable (98%). The question of a machine fooling a human judge recedes in this work relative to the question of "how would one prove it?" The original contribution of the work presents a metric and a simple grammatical set for understanding the writing mechanics of chatbots, evaluating their readability and statistical clarity, engagement, delivery, overall quality, and plagiarism risks. While Turing's original prose scores at least 14% below the machine-generated output, whether an algorithm displays hints of Turing's true initial thoughts (the "Lovelace 2.0" test) remains unanswerable.
    Source
    https://arxiv.org/abs/2212.06721
  2. Morris, V.: Automated language identification of bibliographic resources (2020) 0.11
    0.106795505 = product of:
      0.21359101 = sum of:
        0.21359101 = sum of:
          0.15748182 = weight(_text_:language in 5749) [ClassicSimilarity], result of:
            0.15748182 = score(doc=5749,freq=10.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.77540886 = fieldWeight in 5749, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
          0.056109186 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
            0.056109186 = score(doc=5749,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.30952093 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
      0.5 = coord(1/2)
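    Every explain tree on this page instantiates the same ClassicSimilarity formula; written out once (standard Lucene scoring, stated here as background rather than taken from this page):

      \mathrm{score}(q,d) = \mathrm{coord}(q,d) \sum_{t \in q}
        \underbrace{\mathrm{idf}(t)\cdot\mathrm{queryNorm}}_{\mathrm{queryWeight}}
        \cdot
        \underbrace{\sqrt{\mathrm{tf}(t,d)}\cdot\mathrm{idf}(t)\cdot\mathrm{fieldNorm}(d)}_{\mathrm{fieldWeight}}

    For this record: coord(1/2) × (0.15748182 + 0.056109186) = 0.106795505, matching the total shown above.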
    
    Abstract
    This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
    Date
    2. 3.2020 19:04:22
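    The Morris abstract above describes threshold-based assignment of language codes to catalog records. A minimal sketch of that kind of pipeline, assuming the open-source langdetect library as a stand-in for the British Library's actual tooling:

      # pip install langdetect
      from langdetect import detect_langs

      def assign_language_code(record_text, threshold=0.997):
          # Most probable language first, e.g. [en:0.9986, ...]
          best = detect_langs(record_text)[0]
          # Only write a code into the record when confidence clears the bar
          # (the 0.997 default merely mirrors the 99.7% figure above)
          return best.lang if best.prob >= threshold else None

      print(assign_language_code("Catalogue of early printed maps of Scotland"))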
  3. Luo, L.; Ju, J.; Li, Y.-F.; Haffari, G.; Xiong, B.; Pan, S.: ChatRule: mining logical rules with large language models for knowledge graph reasoning (2023) 0.06
    0.061551623 = product of:
      0.123103246 = sum of:
        0.123103246 = sum of:
          0.088035 = weight(_text_:language in 1171) [ClassicSimilarity], result of:
            0.088035 = score(doc=1171,freq=8.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.4334667 = fieldWeight in 1171, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1171)
          0.03506824 = weight(_text_:22 in 1171) [ClassicSimilarity], result of:
            0.03506824 = score(doc=1171,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.19345059 = fieldWeight in 1171, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1171)
      0.5 = coord(1/2)
    
    Abstract
    Logical rules are essential for uncovering the logical connections between relations, which can improve reasoning performance and provide interpretable results on knowledge graphs (KGs). Although there have been many efforts to mine meaningful logical rules over KGs, existing methods suffer from computationally intensive searches over the rule space and a lack of scalability to large-scale KGs. Moreover, they often ignore the semantics of relations, which is crucial for uncovering logical connections. Recently, large language models (LLMs) have shown impressive performance in natural language processing and various applications, owing to their emergent abilities and generalizability. In this paper, we propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs to prompt LLMs to generate logical rules. To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs. Finally, a rule validator harnesses the reasoning ability of LLMs to validate the logical correctness of the ranked rules through chain-of-thought reasoning. ChatRule is evaluated on four large-scale KGs with respect to different rule quality metrics and downstream tasks, showing the effectiveness and scalability of our method.
    Date
    23.11.2023 19:07:22
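    The ChatRule abstract above describes a three-stage loop: an LLM proposes rules, a ranking module scores them against known facts, and the LLM validates the survivors via chain-of-thought prompting. A hedged sketch of that control flow, where call_llm is a hypothetical stand-in for any LLM client and the support heuristic is illustrative only:

      def call_llm(prompt: str) -> str:
          raise NotImplementedError("plug in an LLM client here")

      def generate_rules(kg_sample: str) -> list[str]:
          out = call_llm(f"Propose logical rules for these KG relations:\n{kg_sample}")
          return [r.strip() for r in out.splitlines() if r.strip()]

      def rank_rules(rules: list[str], facts: set) -> list[str]:
          # Toy quality estimate: how many known (head, relation, tail) facts
          # mention a relation that appears in the rule text
          support = lambda rule: sum(1 for _, rel, _ in facts if rel in rule)
          return sorted(rules, key=support, reverse=True)

      def validate(rule: str) -> bool:
          verdict = call_llm("Think step by step: is this rule logically sound? "
                             f"Answer yes or no.\n{rule}")
          return verdict.strip().lower().startswith("yes")

      def chatrule(kg_sample: str, facts: set) -> list[str]:
          return [r for r in rank_rules(generate_rules(kg_sample), facts) if validate(r)]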
  4. Ma, Y.: Relatedness and compatibility : the concept of privacy in Mandarin Chinese and American English corpora (2023) 0.05
    0.047451444 = product of:
      0.09490289 = sum of:
        0.09490289 = sum of:
          0.052821 = weight(_text_:language in 887) [ClassicSimilarity], result of:
            0.052821 = score(doc=887,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.26008 = fieldWeight in 887, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.046875 = fieldNorm(doc=887)
          0.04208189 = weight(_text_:22 in 887) [ClassicSimilarity], result of:
            0.04208189 = score(doc=887,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.23214069 = fieldWeight in 887, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=887)
      0.5 = coord(1/2)
    
    Abstract
    This study investigates how privacy as an ethical concept exists in two languages: Mandarin Chinese and American English. The exploration relies on two genres of corpora spanning ten years (2010-2019): social media posts and news articles. A mixed-methods approach combining structural topic modeling (STM) and human interpretation was used to work with the data. Findings show various privacy-related topics across the two languages. Moreover, some of these topics reveal fundamental incompatibilities in how privacy is understood across the two languages. In other words, some of the variation in topics does not just reflect contextual differences; it reveals how the two languages value privacy in different ways that relate back to each society's ethical tradition. This study is one of the first empirically grounded intercultural explorations of the concept of privacy. It shows that natural language is a promising way to operationalize intercultural and comparative privacy research, and it provides an examination of the concept as it is understood in these two languages.
    Date
    22. 1.2023 18:59:40
  5. Das, S.; Paik, J.H.: Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach (2023) 0.05
    0.047451444 = product of:
      0.09490289 = sum of:
        0.09490289 = sum of:
          0.052821 = weight(_text_:language in 941) [ClassicSimilarity], result of:
            0.052821 = score(doc=941,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.26008 = fieldWeight in 941, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.046875 = fieldNorm(doc=941)
          0.04208189 = weight(_text_:22 in 941) [ClassicSimilarity], result of:
            0.04208189 = score(doc=941,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.23214069 = fieldWeight in 941, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=941)
      0.5 = coord(1/2)
    
    Abstract
    Inferring the gender of named entities present in a text has several practical applications in information sciences. Existing approaches toward name gender identification rely exclusively on using the gender distributions from labeled data. In the absence of such labeled data, these methods fail. In this article, we propose a two-stage model that is able to infer the gender of names present in text without requiring explicit name-gender labels. We use coreference resolution as the backbone for our proposed model. To aid coreference resolution where the existing contextual information does not suffice, we use a retrieval-assisted context aggregation framework. We demonstrate that state-of-the-art name gender inference is possible without supervision. Our proposed method matches or outperforms several supervised approaches and commercially used methods on five English language datasets from different domains.
    Date
    22. 3.2023 12:00:14
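    Das and Paik's model above leans on coreference resolution to link names with gendered context. As a deliberately crude illustration of the underlying signal (a pronoun-counting heuristic, nowhere near the paper's retrieval-assisted aggregation):

      import re

      def infer_gender(name, text):
          # Keep only sentences that mention the name (a toy stand-in
          # for proper coreference resolution)
          ctx = " ".join(s for s in re.split(r"(?<=[.!?])\s+", text) if name in s).lower()
          he = len(re.findall(r"\b(?:he|him|his)\b", ctx))
          she = len(re.findall(r"\b(?:she|her|hers)\b", ctx))
          return "male" if he > she else "female" if she > he else "unknown"

      print(infer_gender("Alice", "Alice joined in 2020. She now leads the lab."))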
  6. Thelwall, M.; Thelwall, S.: A thematic analysis of highly retweeted early COVID-19 tweets : consensus, information, dissent and lockdown life (2020) 0.04
    0.03954287 = product of:
      0.07908574 = sum of:
        0.07908574 = sum of:
          0.0440175 = weight(_text_:language in 178) [ClassicSimilarity], result of:
            0.0440175 = score(doc=178,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.21673335 = fieldWeight in 178, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0390625 = fieldNorm(doc=178)
          0.03506824 = weight(_text_:22 in 178) [ClassicSimilarity], result of:
            0.03506824 = score(doc=178,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.19345059 = fieldWeight in 178, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=178)
      0.5 = coord(1/2)
    
    Abstract
    Purpose: Public attitudes towards COVID-19 and social distancing are critical in reducing its spread. It is therefore important to understand public reactions and information dissemination in all major forms, including on social media. This article investigates important issues reflected on Twitter in the early stages of the public reaction to COVID-19.
    Design/methodology/approach: A thematic analysis of the most retweeted English-language tweets mentioning COVID-19 during March 10-29, 2020.
    Findings: The main themes identified for the 87 qualifying tweets accounting for 14 million retweets were: lockdown life; attitude towards social restrictions; politics; safety messages; people with COVID-19; support for key workers; work; and COVID-19 facts/news.
    Research limitations/implications: Twitter played many positive roles, mainly through unofficial tweets. Users shared social distancing information, helped build support for social distancing, criticised government responses, expressed support for key workers and helped each other cope with social isolation. A few popular tweets not supporting social distancing show that government messages sometimes failed.
    Practical implications: Public health campaigns in future may consider encouraging grassroots social web activity to support campaign goals. At a methodological level, analysing retweet counts emphasised politics and ignored practical implementation issues.
    Originality/value: This is the first qualitative analysis of general COVID-19-related retweeting.
    Date
    20. 1.2015 18:30:22
  7. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.04
    0.03954287 = product of:
      0.07908574 = sum of:
        0.07908574 = sum of:
          0.0440175 = weight(_text_:language in 1012) [ClassicSimilarity], result of:
            0.0440175 = score(doc=1012,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.21673335 = fieldWeight in 1012, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1012)
          0.03506824 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
            0.03506824 = score(doc=1012,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.19345059 = fieldWeight in 1012, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1012)
      0.5 = coord(1/2)
    
    Abstract
    With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has emerged as an active research area. However, these statistically important phrases contribute increasingly less to the related tasks because the end-to-end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers seeking to quickly grasp a paper's main idea because the relationship between a keyphrase and the paper is not made explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers, in order to bridge the semantic gap between them and the information producers, and we verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (CKPG), which uses the keyphrase function as a control code to generate categorized keyphrases, is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avg scores on the Paper with Code dataset reach up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
    Date
    22. 6.2023 14:55:20
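    The CKPG abstract above conditions a seq2seq model on a control code naming the desired keyphrase function. A minimal sketch of that conditioning interface (model choice and control-code format are assumptions, not the authors' setup, and an untuned model will not emit useful keyphrases):

      # pip install transformers torch
      from transformers import BartForConditionalGeneration, BartTokenizer

      tok = BartTokenizer.from_pretrained("facebook/bart-base")
      model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

      def generate_keyphrases(text, function_code):
          # Prepend the control code so a fine-tuned model can condition on it
          inputs = tok(f"<{function_code}> {text}", return_tensors="pt", truncation=True)
          out = model.generate(**inputs, num_beams=4, max_new_tokens=32)
          return tok.decode(out[0], skip_special_tokens=True)

      print(generate_keyphrases("We present a controllable keyphrase framework ...", "method"))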
  8. Steichen, B.; Lowe, R.: How do multilingual users search? : An investigation of query and result list language choices (2021) 0.04
    0.036497384 = product of:
      0.07299477 = sum of:
        0.07299477 = product of:
          0.14598954 = sum of:
            0.14598954 = weight(_text_:language in 246) [ClassicSimilarity], result of:
              0.14598954 = score(doc=246,freq=22.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.7188232 = fieldWeight in 246, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=246)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many users of search systems are multilingual, that is, they are proficient in two or more languages. In order to better understand and support the language preferences and behaviors of such multilingual users, this paper presents a series of five large-scale studies that specifically elicit language choices regarding search queries and result lists. Overall, the results from the studies indicate that users frequently make use of different languages (i.e., not just their primary language), especially when they are provided with choices (e.g., when provided with a secondary language query or result list choice). In particular, when presented with a mixed-language list choice, participants choose this option to an almost equal extent compared to primary-language-only lists. Important factors leading to language choices are user-, task- and system-related, including proficiency, task topic, and result layout. Moreover, participants' subjective reasons for making particular choices indicate that their primary language is considered more comfortable, that the secondary language often has more relevant and trustworthy results, and that mixed-language lists provide a better overview. These results provide crucial insights into multilingual user preferences and behaviors, and may help in the design of systems that can better support the querying and result exploration of multilingual users.
  9. Hausser, R.: Language and nonlanguage cognition (2021) 0.03
    0.03493781 = product of:
      0.06987562 = sum of:
        0.06987562 = product of:
          0.13975124 = sum of:
            0.13975124 = weight(_text_:language in 255) [ClassicSimilarity], result of:
              0.13975124 = score(doc=255,freq=14.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.6881071 = fieldWeight in 255, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=255)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language-data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage-data as input. In either case, the output is a content which is stored in the agent's onboard short term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of place holder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.
  10. Dietz, K.: en.wikipedia.org > 6 Mio. Artikel (2020) 0.03
    0.034257896 = product of:
      0.06851579 = sum of:
        0.06851579 = product of:
          0.20554736 = sum of:
            0.20554736 = weight(_text_:3a in 5669) [ClassicSimilarity], result of:
              0.20554736 = score(doc=5669,freq=2.0), product of:
                0.43887708 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051766515 = queryNorm
                0.46834838 = fieldWeight in 5669, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5669)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    "Die Englischsprachige Wikipedia verfügt jetzt über mehr als 6 Millionen Artikel. An zweiter Stelle kommt die deutschsprachige Wikipedia mit 2.3 Millionen Artikeln, an dritter Stelle steht die französischsprachige Wikipedia mit 2.1 Millionen Artikeln (via Researchbuzz: Firehose <https://rbfirehose.com/2020/01/24/techcrunch-wikipedia-now-has-more-than-6-million-articles-in-english/> und Techcrunch <https://techcrunch.com/2020/01/23/wikipedia-english-six-million-articles/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&guccounter=1&guce_referrer=aHR0cHM6Ly9yYmZpcmVob3NlLmNvbS8yMDIwLzAxLzI0L3RlY2hjcnVuY2gtd2lraXBlZGlhLW5vdy1oYXMtbW9yZS10aGFuLTYtbWlsbGlvbi1hcnRpY2xlcy1pbi1lbmdsaXNoLw&guce_referrer_sig=AQAAAK0zHfjdDZ_spFZBF_z-zDjtL5iWvuKDumFTzm4HvQzkUfE2pLXQzGS6FGB_y-VISdMEsUSvkNsg2U_NWQ4lwWSvOo3jvXo1I3GtgHpP8exukVxYAnn5mJspqX50VHIWFADHhs5AerkRn3hMRtf_R3F1qmEbo8EROZXp328HMC-o>). 250120 via digithek ch = #fineBlog s.a.: Angesichts der Veröffentlichung des 6-millionsten Artikels vergangene Woche in der englischsprachigen Wikipedia hat die Community-Zeitungsseite "Wikipedia Signpost" ein Moratorium bei der Veröffentlichung von Unternehmensartikeln gefordert. Das sei kein Vorwurf gegen die Wikimedia Foundation, aber die derzeitigen Maßnahmen, um die Enzyklopädie gegen missbräuchliches undeklariertes Paid Editing zu schützen, funktionierten ganz klar nicht. *"Da die ehrenamtlichen Autoren derzeit von Werbung in Gestalt von Wikipedia-Artikeln überwältigt werden, und da die WMF nicht in der Lage zu sein scheint, dem irgendetwas entgegenzusetzen, wäre der einzige gangbare Weg für die Autoren, fürs erste die Neuanlage von Artikeln über Unternehmen zu untersagen"*, schreibt der Benutzer Smallbones in seinem Editorial <https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-01-27/From_the_editor> zur heutigen Ausgabe."
  11. Gabler, S.: Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus (2021) 0.03
    0.034257896 = product of:
      0.06851579 = sum of:
        0.06851579 = product of:
          0.20554736 = sum of:
            0.20554736 = weight(_text_:3a in 1000) [ClassicSimilarity], result of:
              0.20554736 = score(doc=1000,freq=2.0), product of:
                0.43887708 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051766515 = queryNorm
                0.46834838 = fieldWeight in 1000, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1000)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    Master thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. See: https://www.researchgate.net/publication/371680244_Vergabe_von_DDC-Sachgruppen_mittels_eines_Schlagwort-Thesaurus. DOI: 10.25365/thesis.70030. For the accompanying presentation, see: https://wiki.dnb.de/download/attachments/252121510/DA3%20Workshop-Gabler.pdf?version=1&modificationDate=1671093170000&api=v2.
  12. Shree, P.: The journey of Open AI GPT models (2020) 0.03
    0.03234613 = product of:
      0.06469226 = sum of:
        0.06469226 = product of:
          0.12938452 = sum of:
            0.12938452 = weight(_text_:language in 869) [ClassicSimilarity], result of:
              0.12938452 = score(doc=869,freq=12.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.6370634 = fieldWeight in 869, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=869)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Generative Pre-trained Transformer (GPT) models by OpenAI have taken the natural language processing (NLP) community by storm by introducing very powerful language models. These models can perform various NLP tasks, such as question answering, textual entailment, and text summarisation, without any supervised training. The models need few to no examples to understand a task and perform as well as or better than state-of-the-art models trained in a supervised fashion. This article covers the journey of these models and how they have evolved over a period of two years: 1. Discussion of the GPT-1 paper (Improving Language Understanding by Generative Pre-training). 2. Discussion of the GPT-2 paper (Language Models are Unsupervised Multitask Learners) and its improvements over GPT-1. 3. Discussion of the GPT-3 paper (Language Models are Few-Shot Learners) and the improvements which have made it one of the most powerful models NLP has seen to date. This article assumes familiarity with the basics of NLP terminology and the transformer architecture.
  13. Lund, B.D.; Wang, T.; Mannuru, N.R.; Nie, B.; Shimray, S.; Wang, Z.: ChatGPT and a new academic reality : artificial Intelligence-written research papers and the ethics of the large language models in scholarly publishing (2023) 0.03
    0.0264105 = product of:
      0.052821 = sum of:
        0.052821 = product of:
          0.105642 = sum of:
            0.105642 = weight(_text_:language in 943) [ClassicSimilarity], result of:
              0.105642 = score(doc=943,freq=8.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.52016 = fieldWeight in 943, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=943)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article discusses OpenAI's ChatGPT, a generative pre-trained transformer, which uses natural language processing to fulfill text-based user requests (i.e., a "chatbot"). The history and principles behind ChatGPT and similar models are discussed. This technology is then discussed in relation to its potential impact on academia and scholarly research and publishing. ChatGPT is seen as a potential model for the automated preparation of essays and other types of scholarly manuscripts. Potential ethical issues that could arise with the emergence of large language models like GPT-3, the underlying technology behind ChatGPT, and its usage by academics and researchers, are discussed and situated within the context of broader advancements in artificial intelligence, machine learning, and natural language processing for research and scholarly publishing.
  14. Escolano, C.; Costa-Jussà, M.R.; Fonollosa, J.A.: From bilingual to multilingual neural-based machine translation by incremental training (2021) 0.02
    0.024606533 = product of:
      0.049213067 = sum of:
        0.049213067 = product of:
          0.09842613 = sum of:
            0.09842613 = weight(_text_:language in 97) [ClassicSimilarity], result of:
              0.09842613 = score(doc=97,freq=10.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.48463053 = fieldWeight in 97, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=97)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A common intermediate language representation in neural machine translation can be used to extend bilingual systems by incremental training. We propose a new architecture based on introducing an interlingual loss as an additional training objective. By adding and enforcing this interlingual loss, we can train multiple encoders and decoders for each language, sharing a common intermediate representation among them. Translation results on the low-resource tasks (Turkish-English and Kazakh-English) show a BLEU improvement of up to 2.8 points. However, results on a larger dataset (Russian-English and Kazakh-English) show BLEU losses of a similar amount. While our system improves translation quality only for the low-resource tasks, it can quickly deploy new language pairs without retraining the rest of the system, which may be a game changer in some situations. Specifically, what is most relevant about our architecture is that it can: reduce the number of production systems, with respect to the number of languages, from quadratic to linear; incrementally add a new language to the system without retraining the languages already there; and allow translation from the new language to all the others present in the system.
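    A compressed caricature of the interlingual loss described above (an assumption about the mechanism, not the authors' code): each language gets its own encoder, and training pulls the encoders' sentence representations toward a shared intermediate space alongside the usual translation loss.

      import torch
      from torch import nn

      class Encoder(nn.Module):
          def __init__(self, vocab_size, dim=128):
              super().__init__()
              self.emb = nn.Embedding(vocab_size, dim)
              self.rnn = nn.GRU(dim, dim, batch_first=True)

          def forward(self, token_ids):
              states, _ = self.rnn(self.emb(token_ids))
              return states.mean(dim=1)  # one vector per sentence

      enc_a, enc_b = Encoder(8000), Encoder(8000)

      def interlingual_loss(src_ids, tgt_ids):
          # Distance between the two languages' encodings of parallel sentences;
          # added to the translation cross-entropy during training
          return nn.functional.mse_loss(enc_a(src_ids), enc_b(tgt_ids))

      loss = interlingual_loss(torch.randint(0, 8000, (4, 10)),
                               torch.randint(0, 8000, (4, 12)))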
  15. Der Student aus dem Computer (2023) 0.02
    0.024547769 = product of:
      0.049095538 = sum of:
        0.049095538 = product of:
          0.098191075 = sum of:
            0.098191075 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.098191075 = score(doc=1079,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 1.2023 16:22:55
  16. Roy, D.; Bhatia, S.; Jain, P.: Information asymmetry in Wikipedia across different languages : a statistical analysis (2022) 0.02
    0.022872169 = product of:
      0.045744337 = sum of:
        0.045744337 = product of:
          0.091488674 = sum of:
            0.091488674 = weight(_text_:language in 494) [ClassicSimilarity], result of:
              0.091488674 = score(doc=494,freq=6.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.45047188 = fieldWeight in 494, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=494)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Wikipedia is the largest web-based open encyclopedia, covering more than 300 languages. Different language editions of Wikipedia differ significantly in terms of their information coverage. In this article, we compare the information coverage in English Wikipedia (the most exhaustive) and the Wikipedias in 8 other widely spoken languages, namely Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish, and Turkish. We analyze variations across these language editions in terms of the number of topics covered as well as the amount of information provided about each topic. Further, as a step towards bridging the information gap, we present WikiCompare, a browser plugin that gives Wikipedia readers a comprehensive overview of a topic by incorporating missing information from Wikipedia pages in other languages.
  17. MacKrill, K.; Silvester, C.; Pennebaker, J.W.; Petrie, K.J.: What makes an idea worth spreading? : language markers of popularity in TED talks by academics and other speakers (2021) 0.02
    0.02200875 = product of:
      0.0440175 = sum of:
        0.0440175 = product of:
          0.088035 = sum of:
            0.088035 = weight(_text_:language in 312) [ClassicSimilarity], result of:
              0.088035 = score(doc=312,freq=8.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.4334667 = fieldWeight in 312, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=312)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    TED talks are a popular internet forum where new ideas and research are presented by a wide variety of speakers. In this study, we investigated how the language used in TED talks influenced popularity and viewer ratings. We also investigated the differences in linguistic style and ratings of talks given by academics and non-academics. The transcripts of 1866 talks were analyzed using the Linguistic Inquiry and Word Count program and eight language variables were correlated with number of views and viewer ratings. We found that talks with more analytic language received fewer views, while a greater use of the pronoun "I," positive emotion and social words was associated with more views. Talks with these linguistic characteristics received more emotional viewer ratings such as inspiring or courageous. When comparing talks by academics and non-academics, there was no difference in the overall popularity but viewers rated talks by academics as more fascinating, informative, and persuasive while non-academics received higher emotional ratings. The implications for understanding social influence processes are discussed.
  18. Tuider, B.: Plansprachen und Sprachplanung : zum Stand interlinguistischer Forschungen (2021) 0.02
    0.02200875 = product of:
      0.0440175 = sum of:
        0.0440175 = product of:
          0.088035 = sum of:
            0.088035 = weight(_text_:language in 317) [ClassicSimilarity], result of:
              0.088035 = score(doc=317,freq=2.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.4334667 = fieldWeight in 317, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.078125 = fieldNorm(doc=317)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article reports on the symposium "Planned Languages and Language Planning", organized by the Österreichische Nationalbibliothek on 24 and 25 October 2019. The report on the sixteen presentations gives an overview of current research topics in the field of interlinguistics.
  19. Park, J.S.; O'Brien, J.C.; Cai, C.J.; Ringel Morris, M.; Liang, P.; Bernstein, M.S.: Generative agents : interactive simulacra of human behavior (2023) 0.02
    0.02200875 = product of:
      0.0440175 = sum of:
        0.0440175 = product of:
          0.088035 = sum of:
            0.088035 = weight(_text_:language in 972) [ClassicSimilarity], result of:
              0.088035 = score(doc=972,freq=8.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.4334667 = fieldWeight in 972, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=972)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture (observation, planning, and reflection) each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
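    The architecture summarized above stores experiences and retrieves them dynamically to plan behavior. A small sketch of one plausible retrieval score combining recency, importance, and relevance (weights, decay, and normalization are assumptions, not the paper's exact formulation):

      import math, time

      class Memory:
          def __init__(self, text, importance, embedding):
              self.text, self.importance, self.embedding = text, importance, embedding
              self.created = time.time()

      def cosine(a, b):
          dot = sum(x * y for x, y in zip(a, b))
          na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
          return dot / (na * nb) if na and nb else 0.0

      def retrieve(memories, query_emb, k=3, decay=0.995):
          # Score each memory by recency (exponential hourly decay),
          # importance, and embedding similarity to the query
          def score(m):
              recency = decay ** ((time.time() - m.created) / 3600)
              return recency + m.importance + cosine(query_emb, m.embedding)
          return sorted(memories, key=score, reverse=True)[:k]

      mems = [Memory("planned a Valentine's Day party", 0.9, [1.0, 0.0]),
              Memory("watered the plants", 0.2, [0.0, 1.0])]
      print([m.text for m in retrieve(mems, [1.0, 0.0], k=1)])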
  20. Hahn, U.: Automatische Sprachverarbeitung (2023) 0.02
    0.021787554 = product of:
      0.043575108 = sum of:
        0.043575108 = product of:
          0.087150216 = sum of:
            0.087150216 = weight(_text_:language in 790) [ClassicSimilarity], result of:
              0.087150216 = score(doc=790,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.42911017 = fieldWeight in 790, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=790)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This chapter gives an overview of the machine processing of natural languages (such as German or English; natural language, NL) by computers. Fundamental concepts of automatic language processing (natural language processing, NLP) come from linguistics (see Section 2) and have been combined, in an increasingly independent way, with formal methods and the technical foundations of computer science into a discipline of its own, computational linguistics (CL; see Sections 3 and 4). Natural-language systems (NatS) with application-oriented functional requirements form the core of information-science-oriented NLP, which is frequently called language technology or, in German, also (by now dated) Informationslinguistik (see Section 5).

Languages

  • e 139
  • d 38
  • pt 1

Types

  • a 161
  • el 36
  • p 7
  • m 4
  • x 3
  • s 1