Search (79 results, page 1 of 4)

  • × language_ss:"e"
  • × year_i:[2020 TO 2030}
  1. Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.06
    0.05560436 = product of:
      0.11120872 = sum of:
        0.11120872 = product of:
          0.22241744 = sum of:
            0.22241744 = weight(_text_:toolkit in 658) [ClassicSimilarity], result of:
              0.22241744 = score(doc=658,freq=4.0), product of:
                0.3736465 = queryWeight, product of:
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.049039155 = queryNorm
                0.5952617 = fieldWeight in 658, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=658)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments have been performed using the open source Annif toolkit for automated subject indexing and classification, but should generalize also to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform baseline methods in text classification particularly for Finnish and Swedish text, but not English, where baseline methods are most effective. The differences between lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
  2. Dunsire, G.; Fritz, D.; Fritz, R.: Instructions, interfaces, and interoperable data : the RIMMF experience with RDA revisited (2020) 0.06
    0.055045508 = product of:
      0.110091016 = sum of:
        0.110091016 = product of:
          0.22018203 = sum of:
            0.22018203 = weight(_text_:toolkit in 5751) [ClassicSimilarity], result of:
              0.22018203 = score(doc=5751,freq=2.0), product of:
                0.3736465 = queryWeight, product of:
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.049039155 = queryNorm
                0.589279 = fieldWeight in 5751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5751)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article presents a case study of RIMMF, a software tool developed to improve the orientation and training of catalogers who use Resource Description and Access (RDA) to maintain bibliographic data. The cataloging guidance and instructions of RDA are based on the Functional Requirements conceptual models that are now consolidated in the IFLA Library Reference Model, but many catalogers are applying RDA in systems that have evolved from inventory and text-processing applications developed from older metadata paradigms. The article describes how RIMMF interacts with the RDA Toolkit and RDA Registry to offer cataloger-friendly multilingual data input and editing interfaces.
  3. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.05
    0.047181867 = product of:
      0.094363734 = sum of:
        0.094363734 = product of:
          0.18872747 = sum of:
            0.18872747 = weight(_text_:toolkit in 720) [ClassicSimilarity], result of:
              0.18872747 = score(doc=720,freq=2.0), product of:
                0.3736465 = queryWeight, product of:
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.049039155 = queryNorm
                0.5050963 = fieldWeight in 720, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.046875 = fieldNorm(doc=720)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This project brought together undergraduate students in Computer Science with librarians to mine abstracts of articles from the Texas A&M University Libraries' institutional repository, OAKTrust, in order to probe the creation of new metadata to improve discovery and use. The mining operation task consisted simply of classifying the articles into two categories of research type: basic research ("for understanding," "curiosity-based," or "knowledge-based") and applied research ("use-based"). These categories are fundamental especially for funders but are also important to researchers. The mining-to-classification steps took several iterations, but ultimately, we achieved good results with the toolkit BERT (Bidirectional Encoder Representations from Transformers). The project and its workflows represent a preview of what may lie ahead in the future of crafting metadata using text mining techniques to enhance discoverability.
  4. Oliver, C.: Introducing RDA : a guide to the basics after 3R (2021) 0.04
    0.039318223 = product of:
      0.078636445 = sum of:
        0.078636445 = product of:
          0.15727289 = sum of:
            0.15727289 = weight(_text_:toolkit in 716) [ClassicSimilarity], result of:
              0.15727289 = score(doc=716,freq=2.0), product of:
                0.3736465 = queryWeight, product of:
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.049039155 = queryNorm
                0.42091358 = fieldWeight in 716, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=716)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Since Oliver's guide was first published in 2010, thousands of LIS students, records managers, and catalogers and other library professionals have relied on its clear, plainspoken explanation of RDA: Resource Description and Access as their first step towards becoming acquainted with the cataloging standard. Now, reflecting the changes to RDA after the completion of the 3R Project, Oliver brings her Special Report up to date. This essential primer concisely explains what RDA is, its basic features, and the main factors in its development; describes RDA's relationship to the international standards and models that continue to influence its evolution; provides an overview of the latest developments, focusing on the impact of the 3R Project, the results of aligning RDA with IFLA's Library Reference Model (LRM), and the outcomes of internationalization; illustrates how information is organized in the post-3R Toolkit and explains how to navigate through this new structure; and discusses how RDA continues to enable improved resource discovery both in traditional and new applications, including the linked data environment.
  5. Grabus, S.; Logan, P.M.; Greenberg, J.: Temporal concept drift and alignment : an empirical approach to comparing knowledge organization systems over time (2022) 0.04
    0.039318223 = product of:
      0.078636445 = sum of:
        0.078636445 = product of:
          0.15727289 = sum of:
            0.15727289 = weight(_text_:toolkit in 1100) [ClassicSimilarity], result of:
              0.15727289 = score(doc=1100,freq=2.0), product of:
                0.3736465 = queryWeight, product of:
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.049039155 = queryNorm
                0.42091358 = fieldWeight in 1100, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  7.61935 = idf(docFreq=58, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1100)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This research explores temporal concept drift and temporal alignment in knowledge organization systems (KOS). A comparative analysis is pursued using the 1910 Library of Congress Subject Headings, 2020 FAST Topical, and automatic indexing. The use case involves a sample of 90 nineteenth-century Encyclopedia Britannica entries. The entries were indexed using two approaches: 1) full-text indexing; 2) Named Entity Recognition was performed upon the entries with Stanza, Stanford's NLP toolkit, and entities were automatically indexed with the Helping Interdisciplinary Vocabulary application (HIVE), using both 1910 LCSH and FAST Topical. The analysis focused on three goals: 1) identifying results that were exclusive to the 1910 LCSH output; 2) identifying terms in the exclusive set that have been deprecated from the contemporary LCSH, demonstrating temporal concept drift; and 3) exploring the historical significance of these deprecated terms. Results confirm that historical vocabularies can be used to generate anachronistic subject headings representing conceptual drift across time in KOS and historical resources. A methodological contribution is made demonstrating how to study changes in KOS over time and improve the contextualization of historical humanities resources.
  6. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.03894359 = product of:
      0.07788718 = sum of:
        0.07788718 = product of:
          0.23366153 = sum of:
            0.23366153 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.23366153 = score(doc=862,freq=2.0), product of:
                0.4157545 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.049039155 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  7. Fugmann, R.: What is information? : an information veteran looks back (2022) 0.02
    0.01661032 = product of:
      0.03322064 = sum of:
        0.03322064 = product of:
          0.06644128 = sum of:
            0.06644128 = weight(_text_:22 in 1085) [ClassicSimilarity], result of:
              0.06644128 = score(doc=1085,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.38690117 = fieldWeight in 1085, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1085)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    18. 8.2022 19:22:57
  8. Morris, V.: Automated language identification of bibliographic resources (2020) 0.01
    0.013288257 = product of:
      0.026576513 = sum of:
        0.026576513 = product of:
          0.053153027 = sum of:
            0.053153027 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
              0.053153027 = score(doc=5749,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.30952093 = fieldWeight in 5749, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5749)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    2. 3.2020 19:04:22
  9. Tay, A.: ¬The next generation discovery citation indexes : a review of the landscape in 2020 (2020) 0.01
    0.011627224 = product of:
      0.023254449 = sum of:
        0.023254449 = product of:
          0.046508897 = sum of:
            0.046508897 = weight(_text_:22 in 40) [ClassicSimilarity], result of:
              0.046508897 = score(doc=40,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.2708308 = fieldWeight in 40, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=40)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    17.11.2020 12:22:59
  10. Manley, S.: Letters to the editor and the race for publication metrics (2022) 0.01
    0.011627224 = product of:
      0.023254449 = sum of:
        0.023254449 = product of:
          0.046508897 = sum of:
            0.046508897 = weight(_text_:22 in 547) [ClassicSimilarity], result of:
              0.046508897 = score(doc=547,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.2708308 = fieldWeight in 547, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=547)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    6. 4.2022 19:22:26
  11. Wu, P.F.: Veni, vidi, vici? : On the rise of scrape-and-report scholarship in online reviews research (2023) 0.01
    0.011627224 = product of:
      0.023254449 = sum of:
        0.023254449 = product of:
          0.046508897 = sum of:
            0.046508897 = weight(_text_:22 in 896) [ClassicSimilarity], result of:
              0.046508897 = score(doc=896,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.2708308 = fieldWeight in 896, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=896)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2023 18:33:53
  12. Candela, G.: ¬An automatic data quality approach to assess semantic data from cultural heritage institutions (2023) 0.01
    0.011627224 = product of:
      0.023254449 = sum of:
        0.023254449 = product of:
          0.046508897 = sum of:
            0.046508897 = weight(_text_:22 in 997) [ClassicSimilarity], result of:
              0.046508897 = score(doc=997,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.2708308 = fieldWeight in 997, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=997)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 6.2023 18:23:31
  13. Geras, A.; Siudem, G.; Gagolewski, M.: Should we introduce a dislike button for academic articles? (2020) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 5620) [ClassicSimilarity], result of:
              0.03986477 = score(doc=5620,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 5620, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5620)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    6. 1.2020 18:10:22
  14. Bullard, J.; Dierking, A.; Grundner, A.: Centring LGBT2QIA+ subjects in knowledge organization systems (2020) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 5996) [ClassicSimilarity], result of:
              0.03986477 = score(doc=5996,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 5996, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5996)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    6.10.2020 21:22:33
  15. Lorentzen, D.G.: Bridging polarised Twitter discussions : the interactions of the users in the middle (2021) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 182) [ClassicSimilarity], result of:
              0.03986477 = score(doc=182,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 182, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=182)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20. 1.2015 18:30:22
  16. Park, Y.J.: ¬A socio-technological model of search information divide in US cities (2021) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 184) [ClassicSimilarity], result of:
              0.03986477 = score(doc=184,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=184)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20. 1.2015 18:30:22
  17. Cooke, N.A.; Kitzie, V.L.: Outsiders-within-Library and Information Science : reprioritizing the marginalized in critical sociocultural work (2021) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 351) [ClassicSimilarity], result of:
              0.03986477 = score(doc=351,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 351, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=351)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    18. 9.2021 13:22:27
  18. Zheng, X.; Chen, J.; Yan, E.; Ni, C.: Gender and country biases in Wikipedia citations to scholarly publications (2023) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 886) [ClassicSimilarity], result of:
              0.03986477 = score(doc=886,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 886, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=886)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2023 18:53:32
  19. Ma, Y.: Relatedness and compatibility : the concept of privacy in Mandarin Chinese and American English corpora (2023) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 887) [ClassicSimilarity], result of:
              0.03986477 = score(doc=887,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 887, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=887)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2023 18:59:40
  20. Ma, L.: Information, platformized (2023) 0.01
    0.009966193 = product of:
      0.019932386 = sum of:
        0.019932386 = product of:
          0.03986477 = sum of:
            0.03986477 = weight(_text_:22 in 888) [ClassicSimilarity], result of:
              0.03986477 = score(doc=888,freq=2.0), product of:
                0.17172676 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049039155 = queryNorm
                0.23214069 = fieldWeight in 888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=888)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2023 19:01:47
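
Note on the score breakdowns: each explanation tree above multiplies a query weight (idf × queryNorm) by a field weight (tf × idf × fieldNorm) and then applies the listed coord factors. The following is a minimal sketch of that arithmetic, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical, and queryNorm, fieldNorm, and the coord factors are simply copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency, assuming idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """Term frequency factor, assuming tf = sqrt(freq)."""
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute one single-term score from the values shown in an explanation tree.

    query_norm, field_norm, and coords are taken from the listing as given.
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # "queryWeight" line
    field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight" line
    score = query_weight * field_weight
    for coord in coords:                                 # one factor per "coord(...)" line
        score *= coord
    return score


# Result 1 (doc=658): term "toolkit", freq=4.0, docFreq=58, two coord(1/2) factors.
score_658 = explain_score(
    freq=4.0, doc_freq=58, max_docs=44218,
    query_norm=0.049039155, field_norm=0.0390625,
    coords=(0.5, 0.5),
)

# Result 6 (doc=862): term "3a", freq=2.0, docFreq=24, coord(1/3) and coord(1/2).
score_862 = explain_score(
    freq=2.0, doc_freq=24, max_docs=44218,
    query_norm=0.049039155, field_norm=0.046875,
    coords=(1 / 3, 0.5),
)

print(f"doc 658: {score_658:.6f}")
print(f"doc 862: {score_862:.6f}")
```

The recomputed values should agree with the listed scores for results 1 and 6 (0.05560436 and 0.03894359) up to rounding, since the listing reports single-precision floats. Because the idf term enters both weights, a rare term such as "toolkit" (docFreq=58) outranks a common one such as "22" (docFreq=3622) even at equal term frequency.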

Types

  • a 75
  • el 3
  • m 3
  • p 2