Search (267 results, page 1 of 14)

  • year_i:[2020 TO 2030}
  1. Gartner, R.: Metadata in the digital library : building an integrated strategy with XML (2021) 0.05
    0.045969315 = product of:
      0.09193863 = sum of:
        0.060214873 = weight(_text_:markup in 732) [ClassicSimilarity], result of:
          0.060214873 = score(doc=732,freq=2.0), product of:
            0.27638784 = queryWeight, product of:
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.042049456 = queryNorm
            0.21786368 = fieldWeight in 732, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.0234375 = fieldNorm(doc=732)
        0.03172376 = product of:
          0.047585636 = sum of:
            0.03033913 = weight(_text_:language in 732) [ClassicSimilarity], result of:
              0.03033913 = score(doc=732,freq=4.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.18390435 = fieldWeight in 732, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=732)
            0.017246505 = weight(_text_:29 in 732) [ClassicSimilarity], result of:
              0.017246505 = score(doc=732,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.11659596 = fieldWeight in 732, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=732)
          0.6666667 = coord(2/3)
      0.5 = coord(2/4)
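
    These explain trees are Lucene ClassicSimilarity (TF-IDF) breakdowns: for each matching term, tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the leaf score is queryWeight * fieldWeight; coord(m/n) scales a partial sum by the fraction of query clauses matched. A minimal Python sketch reproducing the tree above from its printed values (the constants are copied from the explanation, not recomputed from the index):

      import math

      # Leaf for the term "markup" in doc 732 (result 1 above)
      max_docs, doc_freq = 44218, 167
      freq, field_norm, query_norm = 2.0, 0.0234375, 0.042049456

      idf = 1 + math.log(max_docs / (doc_freq + 1))   # 6.572923
      tf = math.sqrt(freq)                            # 1.4142135
      query_weight = idf * query_norm                 # 0.27638784
      field_weight = tf * idf * field_norm            # 0.21786368
      markup_leaf = query_weight * field_weight       # 0.060214873

      # Aggregation for result 1: the "language" and "29" leaves are summed,
      # scaled by coord(2/3), added to the markup leaf, and the total is
      # scaled by coord(2/4).
      language_leaf, day_leaf = 0.03033913, 0.017246505
      total = (markup_leaf + (language_leaf + day_leaf) * 2 / 3) * 2 / 4
      print(round(total, 9))                          # ~0.045969315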
    
    Abstract
    Metadata in the Digital Library is a complete guide to building a digital library metadata strategy from scratch, using established metadata standards bound together by the markup language XML. The book introduces the reader to the theory of metadata and shows how it can be applied in practice. It lays out the basic principles that should underlie any metadata strategy, including their relation to such fundamentals as the digital curation lifecycle, and demonstrates how they should be put into effect. It introduces the XML language and the key standards for each type of metadata, including Dublin Core and MODS for descriptive metadata and PREMIS for administrative and preservation metadata. Finally, the book shows how these can all be integrated using the packaging standard METS. Two case studies from the Warburg Institute in London show how the strategy can be implemented in a working environment. The strategy laid out in this book will ensure that a digital library's metadata supports all of its operations, is fully interoperable with other libraries, and enables long-term preservation. It assumes no prior knowledge of metadata, XML, or any of the standards covered, providing both an introduction to best practices in digital library metadata and a manual for their practical implementation.
    Date
    29. 9.2022 17:57:57
  2. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.040543843 = product of:
      0.081087686 = sum of:
        0.066785686 = product of:
          0.20035705 = sum of:
            0.20035705 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.20035705 = score(doc=862,freq=2.0), product of:
                0.35649577 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.042049456 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.014302002 = product of:
          0.042906005 = sum of:
            0.042906005 = weight(_text_:language in 862) [ClassicSimilarity], result of:
              0.042906005 = score(doc=862,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.26008 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    This research revisits the classic Turing test and compares recent large language models such as ChatGPT on their ability to reproduce human-level comprehension and compelling text generation. Two task challenges (summarization and question answering) prompt ChatGPT to produce original content (98-99%) from a single text entry and from sequential questions originally posed by Turing in 1950. We score the original and generated content against the OpenAI GPT-2 Output Detector from 2019 and establish multiple cases where the generated content proves original and undetectable (98%). The question of a machine fooling a human judge recedes in this work relative to the question of "how would one prove it?" The original contribution of the work presents a metric and a simple grammatical set for understanding the writing mechanics of chatbots, evaluating their readability and statistical clarity, engagement, delivery, overall quality, and plagiarism risks. While Turing's original prose scores at least 14% below the machine-generated output, whether an algorithm displays hints of Turing's true initial thoughts (the "Lovelace 2.0" test) remains unanswerable.
    Source
    https://arxiv.org/abs/2212.06721
  3. Dietz, K.: en.wikipedia.org > 6 Mio. Artikel (2020) 0.03
    0.03261807 = product of:
      0.06523614 = sum of:
        0.05565474 = product of:
          0.16696422 = sum of:
            0.16696422 = weight(_text_:3a in 5669) [ClassicSimilarity], result of:
              0.16696422 = score(doc=5669,freq=2.0), product of:
                0.35649577 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.042049456 = queryNorm
                0.46834838 = fieldWeight in 5669, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5669)
          0.33333334 = coord(1/3)
        0.009581393 = product of:
          0.028744178 = sum of:
            0.028744178 = weight(_text_:29 in 5669) [ClassicSimilarity], result of:
              0.028744178 = score(doc=5669,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19432661 = fieldWeight in 5669, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5669)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Content
    "Die Englischsprachige Wikipedia verfügt jetzt über mehr als 6 Millionen Artikel. An zweiter Stelle kommt die deutschsprachige Wikipedia mit 2.3 Millionen Artikeln, an dritter Stelle steht die französischsprachige Wikipedia mit 2.1 Millionen Artikeln (via Researchbuzz: Firehose <https://rbfirehose.com/2020/01/24/techcrunch-wikipedia-now-has-more-than-6-million-articles-in-english/> und Techcrunch <https://techcrunch.com/2020/01/23/wikipedia-english-six-million-articles/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&guccounter=1&guce_referrer=aHR0cHM6Ly9yYmZpcmVob3NlLmNvbS8yMDIwLzAxLzI0L3RlY2hjcnVuY2gtd2lraXBlZGlhLW5vdy1oYXMtbW9yZS10aGFuLTYtbWlsbGlvbi1hcnRpY2xlcy1pbi1lbmdsaXNoLw&guce_referrer_sig=AQAAAK0zHfjdDZ_spFZBF_z-zDjtL5iWvuKDumFTzm4HvQzkUfE2pLXQzGS6FGB_y-VISdMEsUSvkNsg2U_NWQ4lwWSvOo3jvXo1I3GtgHpP8exukVxYAnn5mJspqX50VHIWFADHhs5AerkRn3hMRtf_R3F1qmEbo8EROZXp328HMC-o>). 250120 via digithek ch = #fineBlog s.a.: Angesichts der Veröffentlichung des 6-millionsten Artikels vergangene Woche in der englischsprachigen Wikipedia hat die Community-Zeitungsseite "Wikipedia Signpost" ein Moratorium bei der Veröffentlichung von Unternehmensartikeln gefordert. Das sei kein Vorwurf gegen die Wikimedia Foundation, aber die derzeitigen Maßnahmen, um die Enzyklopädie gegen missbräuchliches undeklariertes Paid Editing zu schützen, funktionierten ganz klar nicht. *"Da die ehrenamtlichen Autoren derzeit von Werbung in Gestalt von Wikipedia-Artikeln überwältigt werden, und da die WMF nicht in der Lage zu sein scheint, dem irgendetwas entgegenzusetzen, wäre der einzige gangbare Weg für die Autoren, fürs erste die Neuanlage von Artikeln über Unternehmen zu untersagen"*, schreibt der Benutzer Smallbones in seinem Editorial <https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-01-27/From_the_editor> zur heutigen Ausgabe."
  4. Morris, V.: Automated language identification of bibliographic resources (2020) 0.03
    0.02891633 = product of:
      0.11566532 = sum of:
        0.11566532 = product of:
          0.17349797 = sum of:
            0.12792102 = weight(_text_:language in 5749) [ClassicSimilarity], result of:
              0.12792102 = score(doc=5749,freq=10.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.77540886 = fieldWeight in 5749, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5749)
            0.045576964 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
              0.045576964 = score(doc=5749,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.30952093 = fieldWeight in 5749, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5749)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
    Date
    2. 3.2020 19:04:22
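    The confidence-thresholded assignment described above maps naturally onto off-the-shelf language-identification tooling. A minimal sketch, assuming the langdetect library and an illustrative 0.997 threshold; the British Library's actual pipeline and models are not described in this abstract:

      # pip install langdetect
      from langdetect import DetectorFactory, detect_langs

      DetectorFactory.seed = 0  # langdetect is probabilistic; fix the seed

      def assign_language_code(title, threshold=0.997):
          # Return an ISO 639-1 code if the top guess clears the
          # threshold, else None (leaving the record for manual review).
          best = detect_langs(title)[0]   # candidates sorted by probability
          return best.lang if best.prob >= threshold else None

      print(assign_language_code("Metadata in the digital library"))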
  5. Dachwitz, I.: ¬Das sind 650.000 Kategorien, in die uns die Online-Werbeindustrie einsortiert : Microsofts Datenmarktplatz Xandr (2023) 0.03
    0.028385563 = product of:
      0.11354225 = sum of:
        0.11354225 = weight(_text_:markup in 982) [ClassicSimilarity], result of:
          0.11354225 = score(doc=982,freq=4.0), product of:
            0.27638784 = queryWeight, product of:
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.042049456 = queryNorm
            0.4108077 = fieldWeight in 982, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.03125 = fieldNorm(doc=982)
      0.25 = coord(1/4)
    
    Content
    "Was auch immer wir im Internet tun, wird aufgezeichnet und ausgewertet, um uns zielgerichtet Werbung anzuzeigen. Das ist eine Realität, an die viele Menschen sich inzwischen gewöhnt haben - im Gegenzug sind schließlich viele Internetangebote kostenlos. Wo genau unsere Daten landen, wenn wir Websites aufrufen oder Apps nutzen, das können die wenigsten nachvollziehen. Auch daran haben wir uns gewöhnt. Die Wege des Targeted Advertising sind unergründlich. Die Werbeindustrie tut viel dafür, damit das so bleibt: Die Netzwerke der Datensammler sind selbst für Branchenkenner:innen kaum zu überschauen. Jetzt präsentieren netzpolitik.org und das US-Medium The Markup einen einmaligen Einblick in das Geschäft mit unseren Daten. Wir haben die Angebotsliste von Xandr ausgewertet, einem der größten Datenmarktplätze der Werbewelt. Sie enthält mehr als 650.000 unterschiedliche Kategorien, in die die Industrie Menschen einsortiert, um sie mit gezielter Werbung erreichen zu können.
    Umfang und Detailtiefe dieser Datensammlung sind erschreckend. Es gibt kaum eine menschliche Eigenschaft, die Werbetreibende nicht für Werbung ausnutzen wollen. Sie wollen Menschen aus Dänemark erreichen, die einen Toyota gekauft haben? Kein Problem. Sie wollen Menschen erreichen, die gerade finanzielle Probleme haben? Oder keine Krankenversicherung? Kein Problem. Minderjährige? Schwangere? Homosexuelle? Depressive? Politiker:innen? Alles kein Problem. "Diese Liste ist das gewaltigste Dokument über den globalen Datenhandel, das ich je gesehen habe", sagt der Wiener Tracking-Forscher Wolfie Christl. Er hat die Datei aufgestöbert und mit netzpolitik.org sowie The Markup geteilt. Das US-Medium berichtet heute unter anderem über die zahlreichen sensiblen Daten und macht sie mit einem interaktiven Tool einfach durchsuchbar. Xandr hat auf mehrere Presseanfragen nicht reagiert. Die Liste ist auf Mai 2021 datiert, sie stand bis zu unserer Anfrage auf einer Dokumentationsseite von Xandr offen im Netz. Heute ist sie nicht mehr erreichbar, aber beim Internet Archive gibt es eine archivierte Version der Seite und der Datei [23 MB]. Laut von uns befragten Jurist:innen zeige die Liste, dass das derzeitige Werbegeschäft strukturell unvereinbar mit Datenschutzanforderungen ist."
  6. Ostani, M.M.; Sohrabi, M.C.; Taheri, S.M.; Asemi, A.: Localization of Schema.org for manuscript description in the Iranian-Islamic information context (2021) 0.03
    0.02508953 = product of:
      0.10035812 = sum of:
        0.10035812 = weight(_text_:markup in 585) [ClassicSimilarity], result of:
          0.10035812 = score(doc=585,freq=2.0), product of:
            0.27638784 = queryWeight, product of:
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.042049456 = queryNorm
            0.36310613 = fieldWeight in 585, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.0390625 = fieldNorm(doc=585)
      0.25 = coord(1/4)
    
    Abstract
    This study aims to assess the localization of Schema.org for manuscript description in the Iranian-Islamic information context, using documentary and qualitative content analysis. Schema.org defines schemas for different types of Web content so that structured data can be generated. Since the structure of Schema.org is ontological, the study investigated the inheritance of manuscript types from the properties of their parent types, as well as the localization and description of properties specific to manuscripts in the Iranian-Islamic information context, in order to improve their indexability and semantic visibility in Web search engines. The proposed properties specific to the manuscript type, and the six properties proposed for addition to the "CreativeWork" type, are found to be consistent with other schema properties. In turn, these properties localize the existing schema so that the manuscript type is compatible with the Iranian-Islamic information context. The schema is also applicable to centers with published records on the Web; if these are marked up with the proposed properties, their indexability and semantic visibility in Web search engines increase accordingly. Generating structured data on the Web through this schema is expected to promote the Semantic Web and to make data and knowledge retrieval easier.
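    As an illustration of the kind of markup the abstract describes, the sketch below emits a Schema.org-style JSON-LD record for a manuscript. The property names "scribe" and "scriptStyle" are hypothetical stand-ins: the abstract does not enumerate the proposed properties, only that they extend the "CreativeWork" type:

      import json

      manuscript = {
          "@context": "https://schema.org",
          "@type": "CreativeWork",       # parent type the proposal extends
          "name": "Example Persian manuscript",
          "inLanguage": "fa",
          # hypothetical localized properties for the manuscript type:
          "scribe": "Unknown scribe",
          "scriptStyle": "nastaliq",
      }
      print(json.dumps(manuscript, ensure_ascii=False, indent=2))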
  7. Thelwall, M.; Thelwall, S.: ¬A thematic analysis of highly retweeted early COVID-19 tweets : consensus, information, dissent and lockdown life (2020) 0.02
    0.023246197 = product of:
      0.09298479 = sum of:
        0.09298479 = sum of:
          0.03575501 = weight(_text_:language in 178) [ClassicSimilarity], result of:
            0.03575501 = score(doc=178,freq=2.0), product of:
              0.16497234 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.042049456 = queryNorm
              0.21673335 = fieldWeight in 178, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0390625 = fieldNorm(doc=178)
          0.028744178 = weight(_text_:29 in 178) [ClassicSimilarity], result of:
            0.028744178 = score(doc=178,freq=2.0), product of:
              0.14791684 = queryWeight, product of:
                3.5176873 = idf(docFreq=3565, maxDocs=44218)
                0.042049456 = queryNorm
              0.19432661 = fieldWeight in 178, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5176873 = idf(docFreq=3565, maxDocs=44218)
                0.0390625 = fieldNorm(doc=178)
          0.028485604 = weight(_text_:22 in 178) [ClassicSimilarity], result of:
            0.028485604 = score(doc=178,freq=2.0), product of:
              0.14725003 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.042049456 = queryNorm
              0.19345059 = fieldWeight in 178, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=178)
      0.25 = coord(1/4)
    
    Abstract
    Purpose: Public attitudes towards COVID-19 and social distancing are critical in reducing its spread. It is therefore important to understand public reactions and information dissemination in all major forms, including on social media. This article investigates important issues reflected on Twitter in the early stages of the public reaction to COVID-19.
    Design/methodology/approach: A thematic analysis of the most retweeted English-language tweets mentioning COVID-19 during March 10-29, 2020.
    Findings: The main themes identified for the 87 qualifying tweets accounting for 14 million retweets were: lockdown life; attitude towards social restrictions; politics; safety messages; people with COVID-19; support for key workers; work; and COVID-19 facts/news.
    Research limitations/implications: Twitter played many positive roles, mainly through unofficial tweets. Users shared social distancing information, helped build support for social distancing, criticised government responses, expressed support for key workers and helped each other cope with social isolation. A few popular tweets not supporting social distancing show that government messages sometimes failed.
    Practical implications: Public health campaigns in future may consider encouraging grass roots social web activity to support campaign goals. At a methodological level, analysing retweet counts emphasised politics and ignored practical implementation issues.
    Originality/value: This is the first qualitative analysis of general COVID-19-related retweeting.
    Date
    20. 1.2015 18:30:22
  8. Lund, B.D.; Wang, T.; Mannuru, N.R.; Nie, B.; Shimray, S.; Wang, Z.: ChatGPT and a new academic reality : artificial Intelligence-written research papers and the ethics of the large language models in scholarly publishing (2023) 0.02
    0.020050837 = product of:
      0.08020335 = sum of:
        0.08020335 = product of:
          0.12030502 = sum of:
            0.08581201 = weight(_text_:language in 943) [ClassicSimilarity], result of:
              0.08581201 = score(doc=943,freq=8.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.52016 = fieldWeight in 943, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=943)
            0.03449301 = weight(_text_:29 in 943) [ClassicSimilarity], result of:
              0.03449301 = score(doc=943,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23319192 = fieldWeight in 943, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=943)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    This article discusses OpenAI's ChatGPT, a generative pre-trained transformer, which uses natural language processing to fulfill text-based user requests (i.e., a "chatbot"). The history and principles behind ChatGPT and similar models are discussed. This technology is then discussed in relation to its potential impact on academia and scholarly research and publishing. ChatGPT is seen as a potential model for the automated preparation of essays and other types of scholarly manuscripts. Potential ethical issues that could arise with the emergence of large language models like GPT-3, the underlying technology behind ChatGPT, and its usage by academics and researchers, are discussed and situated within the context of broader advancements in artificial intelligence, machine learning, and natural language processing for research and scholarly publishing.
    Date
    19. 4.2023 19:29:44
  9. Luo, L.; Ju, J.; Li, Y.-F.; Haffari, G.; Xiong, B.; Pan, S.: ChatRule: mining logical rules with large language models for knowledge graph reasoning (2023) 0.02
    0.016665937 = product of:
      0.06666375 = sum of:
        0.06666375 = product of:
          0.09999562 = sum of:
            0.07151002 = weight(_text_:language in 1171) [ClassicSimilarity], result of:
              0.07151002 = score(doc=1171,freq=8.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.4334667 = fieldWeight in 1171, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1171)
            0.028485604 = weight(_text_:22 in 1171) [ClassicSimilarity], result of:
              0.028485604 = score(doc=1171,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19345059 = fieldWeight in 1171, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1171)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    Logical rules are essential for uncovering the logical connections between relations, which can improve reasoning performance and provide interpretable results on knowledge graphs (KGs). Although there have been many efforts to mine meaningful logical rules over KGs, existing methods suffer from computationally intensive searches over the rule space and a lack of scalability to large-scale KGs. Moreover, they often ignore the semantics of relations, which is crucial for uncovering logical connections. Recently, large language models (LLMs) have shown impressive performance in natural language processing and various applications, owing to their emergent abilities and generalizability. In this paper, we propose a novel framework, ChatRule, which unleashes the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework starts with an LLM-based rule generator that leverages both the semantic and structural information of KGs to prompt LLMs to generate logical rules. To refine the generated rules, a rule ranking module estimates rule quality by incorporating facts from existing KGs. Finally, a rule validator harnesses the reasoning ability of LLMs to validate the logical correctness of the ranked rules through chain-of-thought reasoning. ChatRule is evaluated on four large-scale KGs with respect to different rule quality metrics and downstream tasks, showing the effectiveness and scalability of the method.
    Date
    23.11.2023 19:07:22
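    The three-stage pipeline the abstract describes (generate, rank, validate) can be sketched as follows. This is a schematic reading of the abstract, not the authors' code: call_llm is a placeholder for any LLM client, and the support score is a crude stand-in for the paper's rule quality metrics:

      from typing import Callable

      def chatrule(sample_paths: list,
                   kg_facts: set,              # (subject, relation, object) triples
                   call_llm: Callable[[str], str],
                   top_k: int = 10) -> list:
          # 1. Rule generator: prompt the LLM with sampled KG paths.
          prompt = ("Propose logical rules of the form head :- body that are "
                    "consistent with these knowledge-graph paths:\n"
                    + "\n".join(sample_paths))
          candidates = [r for r in call_llm(prompt).splitlines() if " :- " in r]

          # 2. Rule ranking: crude support = how many existing facts use the
          #    rule's head relation.
          def support(rule: str) -> int:
              head_relation = rule.split(" :- ")[0].split("(")[0].strip()
              return sum(rel == head_relation for _, rel, _ in kg_facts)

          ranked = sorted(candidates, key=support, reverse=True)[:top_k]

          # 3. Rule validator: chain-of-thought yes/no check by the LLM.
          keep = []
          for rule in ranked:
              verdict = call_llm("Think step by step: is the rule "
                                 f"'{rule}' logically sound? End with yes or no.")
              if verdict.strip().lower().endswith("yes"):
                  keep.append(rule)
          return keep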
  10. Dedrick, D.: Colour classification in natural languages (2021) 0.02
    0.01504981 = product of:
      0.06019924 = sum of:
        0.06019924 = product of:
          0.090298854 = sum of:
            0.05005701 = weight(_text_:language in 454) [ClassicSimilarity], result of:
              0.05005701 = score(doc=454,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.30342668 = fieldWeight in 454, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=454)
            0.040241845 = weight(_text_:29 in 454) [ClassicSimilarity], result of:
              0.040241845 = score(doc=454,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.27205724 = fieldWeight in 454, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=454)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    Names for colours or colour-related properties are ubiquitous among natural languages, and this has made linguistic colour classification a topic of interest: are colour classifications in natural languages language-specific, or is there a more general set of principles by which such classificatory terms are organized? This article focuses on a debate between cultural-linguistic, relativistic approaches, and universalistic approaches in this domain of research. It characterizes the central contemporary debates about colour naming, and the main research strategies currently in use, as well as a novel, hybrid strategy.
    Date
    27. 5.2022 18:21:29
  11. Gabler, S.: Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus (2021) 0.01
    0.013913685 = product of:
      0.05565474 = sum of:
        0.05565474 = product of:
          0.16696422 = sum of:
            0.16696422 = weight(_text_:3a in 1000) [ClassicSimilarity], result of:
              0.16696422 = score(doc=1000,freq=2.0), product of:
                0.35649577 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.042049456 = queryNorm
                0.46834838 = fieldWeight in 1000, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1000)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Content
    Master thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. Cf.: https://www.researchgate.net/publication/371680244_Vergabe_von_DDC-Sachgruppen_mittels_eines_Schlagwort-Thesaurus. DOI: 10.25365/thesis.70030. See also the accompanying presentation at: https://wiki.dnb.de/download/attachments/252121510/DA3%20Workshop-Gabler.pdf?version=1&modificationDate=1671093170000&api=v2.
  12. Ma, Y.: Relatedness and compatibility : the concept of privacy in Mandarin Chinese and American English corpora (2023) 0.01
    0.012848122 = product of:
      0.05139249 = sum of:
        0.05139249 = product of:
          0.07708873 = sum of:
            0.042906005 = weight(_text_:language in 887) [ClassicSimilarity], result of:
              0.042906005 = score(doc=887,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.26008 = fieldWeight in 887, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=887)
            0.034182724 = weight(_text_:22 in 887) [ClassicSimilarity], result of:
              0.034182724 = score(doc=887,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23214069 = fieldWeight in 887, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=887)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    This study investigates how privacy exists as an ethical concept in two languages: Mandarin Chinese and American English. The exploration relies on two genres of corpora spanning ten years (2010-2019): social media posts and news articles. A mixed-methods approach combining structural topic modeling (STM) and human interpretation was used to work with the data. Findings show various privacy-related topics across the two languages. Moreover, some of these topics reveal fundamental incompatibilities in how privacy is understood across the two languages. In other words, some of the variation in topics does not just reflect contextual differences; it reveals how the two languages value privacy in different ways that relate back to each society's ethical tradition. This study is one of the first empirically grounded intercultural explorations of the concept of privacy. It shows that natural language is a promising way to operationalize intercultural and comparative privacy research, and it provides an examination of the concept as it is understood in these two languages.
    Date
    22. 1.2023 18:59:40
  13. Das, S.; Paik, J.H.: Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach (2023) 0.01
    0.012848122 = product of:
      0.05139249 = sum of:
        0.05139249 = product of:
          0.07708873 = sum of:
            0.042906005 = weight(_text_:language in 941) [ClassicSimilarity], result of:
              0.042906005 = score(doc=941,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.26008 = fieldWeight in 941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=941)
            0.034182724 = weight(_text_:22 in 941) [ClassicSimilarity], result of:
              0.034182724 = score(doc=941,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23214069 = fieldWeight in 941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=941)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    Inferring the gender of named entities present in a text has several practical applications in information sciences. Existing approaches toward name gender identification rely exclusively on using the gender distributions from labeled data. In the absence of such labeled data, these methods fail. In this article, we propose a two-stage model that is able to infer the gender of names present in text without requiring explicit name-gender labels. We use coreference resolution as the backbone for our proposed model. To aid coreference resolution where the existing contextual information does not suffice, we use a retrieval-assisted context aggregation framework. We demonstrate that state-of-the-art name gender inference is possible without supervision. Our proposed method matches or outperforms several supervised approaches and commercially used methods on five English language datasets from different domains.
    Date
    22. 3.2023 12:00:14
  14. Bärnreuther, K.: Informationskompetenz-Vermittlung für Schulklassen mit Wikipedia und dem Framework Informationskompetenz in der Hochschulbildung (2021) 0.01
    0.011445956 = product of:
      0.045783825 = sum of:
        0.045783825 = product of:
          0.068675734 = sum of:
            0.03449301 = weight(_text_:29 in 299) [ClassicSimilarity], result of:
              0.03449301 = score(doc=299,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23319192 = fieldWeight in 299, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=299)
            0.034182724 = weight(_text_:22 in 299) [ClassicSimilarity], result of:
              0.034182724 = score(doc=299,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23214069 = fieldWeight in 299, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=299)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Date
    30. 6.2021 16:29:52
    Source
    o-bib: Das offene Bibliotheksjournal. 8(2021) Nr.2, S.1-22
  15. Hertzum, M.: Information seeking by experimentation : trying something out to discover what happens (2023) 0.01
    0.011445956 = product of:
      0.045783825 = sum of:
        0.045783825 = product of:
          0.068675734 = sum of:
            0.03449301 = weight(_text_:29 in 915) [ClassicSimilarity], result of:
              0.03449301 = score(doc=915,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23319192 = fieldWeight in 915, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=915)
            0.034182724 = weight(_text_:22 in 915) [ClassicSimilarity], result of:
              0.034182724 = score(doc=915,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23214069 = fieldWeight in 915, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=915)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Date
    21. 3.2023 19:22:29
  16. Laparra, E.; Binford-Walsh, A.; Emerson, K.; Miller, M.L.; López-Hoffman, L.; Currim, F.; Bethard, S.: Addressing structural hurdles for metadata extraction from environmental impact statements (2023) 0.01
    0.010749864 = product of:
      0.042999458 = sum of:
        0.042999458 = product of:
          0.064499184 = sum of:
            0.03575501 = weight(_text_:language in 1042) [ClassicSimilarity], result of:
              0.03575501 = score(doc=1042,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.21673335 = fieldWeight in 1042, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1042)
            0.028744178 = weight(_text_:29 in 1042) [ClassicSimilarity], result of:
              0.028744178 = score(doc=1042,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19432661 = fieldWeight in 1042, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1042)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may not depend solely on the linguistics, but also on structural problems such as extremely large documents, unordered multi-file documents, and inconsistency in manually labeled metadata. In this work, we start from two standard machine learning solutions to extract pieces of metadata from Environmental Impact Statements, environmental policy documents that are regularly produced under the US National Environmental Policy Act of 1969. We present a series of experiments where we evaluate how these standard approaches are affected by different issues derived from real-world data. We find that metadata extraction can be strongly influenced by nonlinguistic factors such as document length and volume ordering and that the standard machine learning solutions often do not scale well to long documents. We demonstrate how such solutions can be better adapted to these scenarios, and conclude with suggestions for other NLP practitioners cataloging large document collections.
    Date
    29. 8.2023 19:21:01
  17. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.01
    0.010706769 = product of:
      0.042827077 = sum of:
        0.042827077 = product of:
          0.06424061 = sum of:
            0.03575501 = weight(_text_:language in 1012) [ClassicSimilarity], result of:
              0.03575501 = score(doc=1012,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.21673335 = fieldWeight in 1012, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1012)
            0.028485604 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
              0.028485604 = score(doc=1012,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19345059 = fieldWeight in 1012, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1012)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has attracted growing attention. However, these statistically important phrases contribute increasingly less to the related tasks, because end-to-end learning enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp a paper's main idea, because the relationship between a keyphrase and the paper is not explicit to readers. We therefore propose to generate keyphrases with specific functions for readers, bridging the semantic gap between them and the information producers, and we verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avgs of , , and on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
    Date
    22. 6.2023 14:55:20
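    The control-code mechanism is straightforward to sketch with an off-the-shelf sequence-to-sequence model. A minimal sketch, assuming a "<function>: <text>" input format (the authors' exact input scheme is not given in the abstract); note that the model would first need fine-tuning on labeled (function, text, keyphrase) data, since a stock t5-small will not emit useful keyphrases on its own:

      # pip install transformers sentencepiece torch
      from transformers import T5ForConditionalGeneration, T5Tokenizer

      tok = T5Tokenizer.from_pretrained("t5-small")
      model = T5ForConditionalGeneration.from_pretrained("t5-small")

      def generate_keyphrases(text, function_code):
          # Prepend the keyphrase-function control code to steer generation.
          inputs = tok(f"{function_code}: {text}", return_tensors="pt",
                       truncation=True, max_length=512)
          output = model.generate(**inputs, max_new_tokens=32, num_beams=4)
          return tok.decode(output[0], skip_special_tokens=True)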
  18. Steichen, B.; Lowe, R.: How do multilingual users search? : An investigation of query and result list language choices (2021) 0.01
    0.009882162 = product of:
      0.03952865 = sum of:
        0.03952865 = product of:
          0.118585944 = sum of:
            0.118585944 = weight(_text_:language in 246) [ClassicSimilarity], result of:
              0.118585944 = score(doc=246,freq=22.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.7188232 = fieldWeight in 246, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=246)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Many users of search systems are multilingual, that is, they are proficient in two or more languages. In order to better understand and support the language preferences and behaviors of such multilingual users, this paper presents a series of five large-scale studies that specifically elicit language choices regarding search queries and result lists. Overall, the results from the studies indicate that users frequently make use of different languages (i.e., not just their primary language), especially when they are provided with choices (e.g., when provided with a secondary language query or result list choice). In particular, when presented with a mixed-language list choice, participants choose this option to an almost equal extent compared to primary-language-only lists. Important factors leading to language choices are user-, task- and system-related, including proficiency, task topic, and result layout. Moreover, participants' subjective reasons for making particular choices indicate that their primary language is considered more comfortable, that the secondary language often has more relevant and trustworthy results, and that mixed-language lists provide a better overview. These results provide crucial insights into multilingual user preferences and behaviors, and may help in the design of systems that can better support the querying and result exploration of multilingual users.
  19. Barité, M.; Parentelli, V.; Rodríguez Casaballe, N.; Suárez, M.V.: Interdisciplinarity and postgraduate teaching of knowledge organization (KO) : elements for a necessary dialogue (2023) 0.01
    0.009538297 = product of:
      0.038153186 = sum of:
        0.038153186 = product of:
          0.05722978 = sum of:
            0.028744178 = weight(_text_:29 in 1125) [ClassicSimilarity], result of:
              0.028744178 = score(doc=1125,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19432661 = fieldWeight in 1125, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1125)
            0.028485604 = weight(_text_:22 in 1125) [ClassicSimilarity], result of:
              0.028485604 = score(doc=1125,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19345059 = fieldWeight in 1125, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1125)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    Interdisciplinarity implies the previous existence of disciplinary fields, not their dissolution. As a general objective, we propose an initial approach to the emphasis given to interdisciplinarity in the teaching of KO, through the teaching staff responsible for postgraduate courses focused on, or related to, KO in Ibero-American universities. To conduct the research, a survey addressed to teachers was framed and distributed along four lines of action: 1. How teachers handle the concept of interdisciplinarity. 2. The place teachers give to interdisciplinarity in KO. 3. Assessment of the interdisciplinary content teachers incorporate into their postgraduate courses. 4. The set of teaching strategies and resources teachers use to include interdisciplinarity in the teaching of KO. The study analyzed 22 responses. Preliminary results show that KO teachers recognize the influence of other disciplines on concepts, theories, methods, and applications, but no consensus has been reached on which disciplines and authors build the interdisciplinary bridges. Among other conclusions, the study strongly suggests that environmental and social tensions are reflected in subject representation, especially in the construction of friendly knowledge organization systems with interdisciplinary visions, and in the expressions through which information is sought.
    Date
    20.11.2023 17:29:13
  20. Hausser, R.: Language and nonlanguage cognition (2021) 0.01
    0.009459887 = product of:
      0.037839547 = sum of:
        0.037839547 = product of:
          0.11351863 = sum of:
            0.11351863 = weight(_text_:language in 255) [ClassicSimilarity], result of:
              0.11351863 = score(doc=255,freq=14.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.6881071 = fieldWeight in 255, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=255)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    A basic distinction in agent-based data-driven Database Semantics (DBS) is between language and nonlanguage cognition. Language cognition transfers content between agents by means of raw data. Nonlanguage cognition maps between content and raw data inside the focus agent. Recognition applies a concept type to raw data, resulting in a concept token. In language recognition, the focus agent (hearer) takes raw language data (surfaces) produced by another agent (speaker) as input, while nonlanguage recognition takes raw nonlanguage data as input. In either case, the output is a content which is stored in the agent's onboard short-term memory. Action adapts a concept type to a purpose, resulting in a token. In language action, the focus agent (speaker) produces language-dependent surfaces for another agent (hearer), while nonlanguage action produces intentions for a nonlanguage purpose. In either case, the output is raw action data. As long as the procedural implementation of placeholder values works properly, it is compatible with the DBS requirement of input-output equivalence between the natural prototype and the artificial reconstruction.
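    The type/token mechanics described above can be rendered as a toy data structure; this is purely illustrative and not Hausser's DBS implementation:

      from dataclasses import dataclass

      @dataclass(frozen=True)
      class ConceptType:
          name: str

      @dataclass(frozen=True)
      class ConceptToken:
          concept_type: ConceptType
          raw_data: bytes   # surfaces (language) or sensor input (nonlanguage)

      def recognize(concept_type: ConceptType, raw_data: bytes) -> ConceptToken:
          # Recognition: apply a concept type to raw data, yielding a token
          # stored in the agent's short-term memory.
          return ConceptToken(concept_type, raw_data)

      def act(concept_type: ConceptType, purpose: str) -> bytes:
          # Action: adapt a concept type to a purpose, producing raw output data.
          return f"{concept_type.name}->{purpose}".encode()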

Languages

  • e 187
  • d 78
  • pt 1

Types

  • a 238
  • el 58
  • p 9
  • m 8
  • x 3
  • s 1