Search (9 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × type_ss:"a"
  • × year_i:[2020 TO 2030}
  1. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.14
    0.13541423 = product of:
      0.1805523 = sum of:
        0.085297674 = weight(_text_:digital in 1139) [ClassicSimilarity], result of:
          0.085297674 = score(doc=1139,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.43143538 = fieldWeight in 1139, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.04641878 = weight(_text_:library in 1139) [ClassicSimilarity], result of:
          0.04641878 = score(doc=1139,freq=6.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.3522223 = fieldWeight in 1139, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.048835836 = product of:
          0.09767167 = sum of:
            0.09767167 = weight(_text_:project in 1139) [ClassicSimilarity], result of:
              0.09767167 = score(doc=1139,freq=4.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.4616698 = fieldWeight in 1139, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1139)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
  2. Morris, V.: Automated language identification of bibliographic resources (2020) 0.08
    0.0819426 = product of:
      0.1638852 = sum of:
        0.030628446 = weight(_text_:library in 5749) [ClassicSimilarity], result of:
          0.030628446 = score(doc=5749,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.23240642 = fieldWeight in 5749, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0625 = fieldNorm(doc=5749)
        0.13325676 = sum of:
          0.07893063 = weight(_text_:project in 5749) [ClassicSimilarity], result of:
            0.07893063 = score(doc=5749,freq=2.0), product of:
              0.21156175 = queryWeight, product of:
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.050121464 = queryNorm
              0.37308553 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.220981 = idf(docFreq=1764, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
          0.054326132 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
            0.054326132 = score(doc=5749,freq=2.0), product of:
              0.17551683 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050121464 = queryNorm
              0.30952093 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5749)
      0.5 = coord(2/4)
    
    Abstract
    This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
    Date
    2. 3.2020 19:04:22
  3. Andrushchenko, M.; Sandberg, K.; Turunen, R.; Marjanen, J.; Hatavara, M.; Kurunmäki, J.; Nummenmaa, T.; Hyvärinen, M.; Teräs, K.; Peltonen, J.; Nummenmaa, J.: Using parsed and annotated corpora to analyze parliamentarians' talk in Finland (2022) 0.02
    0.024083475 = product of:
      0.0963339 = sum of:
        0.0963339 = weight(_text_:digital in 471) [ClassicSimilarity], result of:
          0.0963339 = score(doc=471,freq=10.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.4872566 = fieldWeight in 471, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=471)
      0.25 = coord(1/4)
    
    Abstract
    We present a search system for grammatically analyzed corpora of Finnish parliamentary records and interviews with former parliamentarians, annotated with metadata of talk structure and involved parliamentarians, and discuss their use through carefully chosen digital humanities case studies. We first introduce the construction, contents, and principles of use of the corpora. Then we discuss the application of the search system and the corpora to study how politicians talk about power, how ideological terms are used in political speech, and how to identify narratives in the data. All case studies stem from questions in the humanities and the social sciences, but rely on the grammatically parsed corpora in both identifying and quantifying passages of interest. Finally, the paper discusses the role of natural language processing methods for questions in the (digital) humanities. It makes the claim that a digital humanities inquiry of parliamentary speech and interviews with politicians cannot only rely on computational humanities modeling, but needs to accommodate a range of perspectives starting with simple searches, quantitative exploration, and ending with modeling. Furthermore, the digital humanities need a more thorough discussion about how the utilization of tools from information science and technologies alters the research questions posed in the humanities.
    Series
    JASIST special issue on digital humanities (DH): C. Methodological innovations, challenges, and new interest in DH
  4. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.02
    0.019901544 = product of:
      0.079606175 = sum of:
        0.079606175 = product of:
          0.23881851 = sum of:
            0.23881851 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.23881851 = score(doc=862,freq=2.0), product of:
                0.42493033 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050121464 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    https://arxiv.org/abs/2212.06721
  5. Suissa, O.; Elmalech, A.; Zhitomirsky-Geffet, M.: Text analysis using deep neural networks in digital humanities and information science (2022) 0.02
    0.01865498 = product of:
      0.07461992 = sum of:
        0.07461992 = weight(_text_:digital in 491) [ClassicSimilarity], result of:
          0.07461992 = score(doc=491,freq=6.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.37742734 = fieldWeight in 491, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0390625 = fieldNorm(doc=491)
      0.25 = coord(1/4)
    
    Abstract
    Combining computational technologies and humanities is an ongoing effort aimed at making resources such as texts, images, audio, video, and other artifacts digitally available, searchable, and analyzable. In recent years, deep neural networks (DNN) dominate the field of automatic text analysis and natural language processing (NLP), in some cases presenting a super-human performance. DNNs are the state-of-the-art machine learning algorithms solving many NLP tasks that are relevant for Digital Humanities (DH) research, such as spell checking, language detection, entity extraction, author detection, question answering, and other tasks. These supervised algorithms learn patterns from a large number of "right" and "wrong" examples and apply them to new examples. However, using DNNs for analyzing the text resources in DH research presents two main challenges: (un)availability of training data and a need for domain adaptation. This paper explores these challenges by analyzing multiple use-cases of DH studies in recent literature and their possible solutions and lays out a practical decision model for DH experts for when and how to choose the appropriate deep learning approaches for their research. Moreover, in this paper, we aim to raise awareness of the benefits of utilizing deep learning models in the DH community.
    Series
    JASIST special issue on digital humanities (DH): C. Methodological innovations, challenges, and new interest in DH
  6. ¬Der Student aus dem Computer (2023) 0.01
    0.011883841 = product of:
      0.047535364 = sum of:
        0.047535364 = product of:
          0.09507073 = sum of:
            0.09507073 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.09507073 = score(doc=1079,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    27. 1.2023 16:22:55
  7. Zaitseva, E.M.: Developing linguistic tools of thematic search in library information systems (2023) 0.01
    0.00957139 = product of:
      0.03828556 = sum of:
        0.03828556 = weight(_text_:library in 1187) [ClassicSimilarity], result of:
          0.03828556 = score(doc=1187,freq=8.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.29050803 = fieldWeight in 1187, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1187)
      0.25 = coord(1/4)
    
    Abstract
    Within the R&D program "Information support of research by scientists and specialists on the basis of RNPLS&T Open Archive - the system of scientific knowledge aggregation", the RNPLS&T analyzes the use of linguistic tools of thematic search in the modern library information systems and the prospects for their development. The author defines the key common characteristics of e-catalogs of the largest Russian libraries revealed at the first stage of the analysis. Based on the specified common characteristics and detailed comparison analysis, the author outlines and substantiates the vectors for enhancing search interfaces of e-catalogs. The focus is made on linguistic tools of thematic search in library information systems; the key vectors are suggested: use of thematic search at different search levels with the clear-cut level differentiation; use of combined functionality within thematic search system; implementation of classification search in all e-catalogs; hierarchical representation of classifications; use of the matching systems for classification information retrieval languages, and in the long term classification and verbal information retrieval languages, and various verbal information retrieval languages. The author formulates practical recommendations to improve thematic search in library information systems.
  8. Bager, J.: ¬Die Text-KI ChatGPT schreibt Fachtexte, Prosa, Gedichte und Programmcode (2023) 0.01
    0.0067907665 = product of:
      0.027163066 = sum of:
        0.027163066 = product of:
          0.054326132 = sum of:
            0.054326132 = weight(_text_:22 in 835) [ClassicSimilarity], result of:
              0.054326132 = score(doc=835,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.30952093 = fieldWeight in 835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=835)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    29.12.2022 18:22:55
  9. Rieger, F.: Lügende Computer (2023) 0.01
    0.0067907665 = product of:
      0.027163066 = sum of:
        0.027163066 = product of:
          0.054326132 = sum of:
            0.054326132 = weight(_text_:22 in 912) [ClassicSimilarity], result of:
              0.054326132 = score(doc=912,freq=2.0), product of:
                0.17551683 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050121464 = queryNorm
                0.30952093 = fieldWeight in 912, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=912)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    16. 3.2023 19:22:55
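
The score explanations above all follow the same ClassicSimilarity pattern: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is idf times queryNorm, fieldWeight is tf times idf times fieldNorm, and the per-term scores are summed and scaled by the coord factors. The following is a minimal sketch that recomputes the score of result 1 from the values printed in its explanation. The function and variable names are illustrative assumptions, not part of the search system, and the arithmetic uses double precision, so the last digits may differ slightly from the single-precision figures shown above.

```python
import math

# Values copied from the explain output above.
QUERY_NORM = 0.050121464
MAX_DOCS = 44218


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Result 1 (doc=1139): three of four query clauses matched.
digital = term_score(freq=4.0, doc_freq=2326, field_norm=0.0546875)
library = term_score(freq=6.0, doc_freq=8668, field_norm=0.0546875)
# "project" sits in a nested clause where 1 of 2 sub-clauses matched.
project = term_score(freq=4.0, doc_freq=1764, field_norm=0.0546875) * 0.5

total = (digital + library + project) * 0.75  # coord(3/4)
print(f"digital={digital:.6f} library={library:.6f} "
      f"project={project:.6f} total={total:.6f}")
```

The result should agree with the 0.13541423 reported for result 1 to within rounding. The same functions can be applied to any other result by substituting its freq, docFreq, fieldNorm, and coord values.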