Search (126 results, page 1 of 7)

  • year_i:[2020 TO 2030}
  1. Rae, A.R.; Mork, J.G.; Demner-Fushman, D.: ¬The National Library of Medicine indexer assignment dataset : a new large-scale dataset for reviewer assignment research (2023) 0.05
    0.05061885 = product of:
      0.1012377 = sum of:
        0.1012377 = sum of:
          0.06654532 = weight(_text_:headings in 885) [ClassicSimilarity], result of:
            0.06654532 = score(doc=885,freq=2.0), product of:
              0.24837378 = queryWeight, product of:
                4.849944 = idf(docFreq=940, maxDocs=44218)
                0.051211677 = queryNorm
              0.2679241 = fieldWeight in 885, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.849944 = idf(docFreq=940, maxDocs=44218)
                0.0390625 = fieldNorm(doc=885)
          0.034692377 = weight(_text_:22 in 885) [ClassicSimilarity], result of:
            0.034692377 = score(doc=885,freq=2.0), product of:
              0.17933457 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051211677 = queryNorm
              0.19345059 = fieldWeight in 885, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=885)
      0.5 = coord(1/2)
    
    Abstract
    MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year in order to keep MEDLINE up to date. An important part of the MEDLINE indexing process is the assignment of articles to indexers. High quality and timely indexing is only possible when articles are assigned to indexers with suitable expertise. This article introduces the NLM indexer assignment dataset: a large dataset of 4.2 million indexer article assignments for articles indexed between 2011 and 2019. The dataset is shown to be a valuable testbed for expert matching and assignment algorithms, and indexer article assignment is also found to be useful domain-adaptive pre-training for the closely related task of reviewer assignment.
    Date
    22. 1.2023 18:49:49
  2. Lorenzo, L.; Mak, L.; Smeltekop, N.: FAST Headings in MODS : Michigan State University libraries digital repository case study (2023) 0.05
    0.046581723 = product of:
      0.093163446 = sum of:
        0.093163446 = product of:
          0.18632689 = sum of:
            0.18632689 = weight(_text_:headings in 1177) [ClassicSimilarity], result of:
              0.18632689 = score(doc=1177,freq=8.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.75018746 = fieldWeight in 1177, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1177)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Michigan State University Libraries (MSUL) digital repository contains numerous collections of openly available material. Since 2016, the digital repository has been using Faceted Application of Subject Terminology (FAST) subject headings as its primary subject vocabulary in order to streamline faceting, display, and search. The MSUL FAST use case presents some challenges that are not addressed by existing MARC-focused FAST tools. This paper will outline the MSUL digital repository team's justification for including FAST headings in the digital repository as well as workflows for adding FAST headings to Metadata Object Description Schema (MODS) metadata, their maintenance, and utilization for discovery.
  3. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.04066886 = product of:
      0.08133772 = sum of:
        0.08133772 = product of:
          0.24401315 = sum of:
            0.24401315 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24401315 = score(doc=862,freq=2.0), product of:
                0.43417317 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051211677 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  4. Zimmerman, N.: User study: implementation of OCLC FAST subject headings in the Lafayette digital repository (2023) 0.04
    0.040340956 = product of:
      0.08068191 = sum of:
        0.08068191 = product of:
          0.16136383 = sum of:
            0.16136383 = weight(_text_:headings in 1176) [ClassicSimilarity], result of:
              0.16136383 = score(doc=1176,freq=6.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.6496814 = fieldWeight in 1176, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1176)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Digital repository migrations present a periodic opportunity to assess metadata quality and to perform strategic enhancements. Lafayette College Libraries implemented OCLC FAST (Faceted Application of Subject Terminology) for its digital image collections as part of a migration from multiple repositories to a single one built on the Samvera Hyrax open-source framework. Application of FAST has normalized subject headings across dissimilar collections in a way that tremendously improves descriptive consistency for staff and discoverability for end users. However, the process of applying FAST headings was complicated by several features of in-scope metadata as well as gaps in available controlled subject authorities.
  5. Hutchinson, J.; Nakatomi, J.: Improving subject description of an LGBTQ+ collection (2024) 0.04
    0.037643716 = product of:
      0.07528743 = sum of:
        0.07528743 = product of:
          0.15057486 = sum of:
            0.15057486 = weight(_text_:headings in 1157) [ClassicSimilarity], result of:
              0.15057486 = score(doc=1157,freq=4.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.606243 = fieldWeight in 1157, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1157)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article summarizes the work done as part of a project to improve subject description of an LGBTQ+ collection in the ONE Archives, part of the University of Southern California (USC) Libraries. The project involved adding local subject headings to augment existing Library of Congress Subject Headings. The article describes the steps that the project team took, along with the methods that were rejected. The paper discusses reasons why the team chose their course of action.
  6. Smith, A.: Physics Subject Headings (PhySH) (2020) 0.03
    0.034577962 = product of:
      0.069155924 = sum of:
        0.069155924 = product of:
          0.13831185 = sum of:
            0.13831185 = weight(_text_:headings in 5884) [ClassicSimilarity], result of:
              0.13831185 = score(doc=5884,freq=6.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.55686975 = fieldWeight in 5884, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5884)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    PhySH (Physics Subject Headings) was developed by the American Physical Society and first used in 2016 as a faceted hierarchical controlled vocabulary for physics, with some basic terms from related fields. It was developed mainly for the purpose of associating subjects with papers submitted to and published in the Physical Review family of journals. The scheme is organized at the top level with a two-dimensional classification, with one dimension (labeled "disciplines") representing professional divisions within physics, and the other dimension (labeled "facets") providing a conceptual partitioning of terms. PhySH was preceded in use by PACS ("Physics and Astronomy Classification Scheme"), which was in turn preceded by more ad hoc approaches, and this history and related vocabularies or categorizations will also be briefly discussed.
    Object
    Physics Subject Headings
  7. Dietz, K.: en.wikipedia.org > 6 Mio. Artikel (2020) 0.03
    0.033890717 = product of:
      0.06778143 = sum of:
        0.06778143 = product of:
          0.2033443 = sum of:
            0.2033443 = weight(_text_:3a in 5669) [ClassicSimilarity], result of:
              0.2033443 = score(doc=5669,freq=2.0), product of:
                0.43417317 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051211677 = queryNorm
                0.46834838 = fieldWeight in 5669, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5669)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    "The English-language Wikipedia now has more than 6 million articles. In second place comes the German-language Wikipedia with 2.3 million articles, and in third place the French-language Wikipedia with 2.1 million articles (via Researchbuzz: Firehose <https://rbfirehose.com/2020/01/24/techcrunch-wikipedia-now-has-more-than-6-million-articles-in-english/> and Techcrunch <https://techcrunch.com/2020/01/23/wikipedia-english-six-million-articles/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&guccounter=1&guce_referrer=aHR0cHM6Ly9yYmZpcmVob3NlLmNvbS8yMDIwLzAxLzI0L3RlY2hjcnVuY2gtd2lraXBlZGlhLW5vdy1oYXMtbW9yZS10aGFuLTYtbWlsbGlvbi1hcnRpY2xlcy1pbi1lbmdsaXNoLw&guce_referrer_sig=AQAAAK0zHfjdDZ_spFZBF_z-zDjtL5iWvuKDumFTzm4HvQzkUfE2pLXQzGS6FGB_y-VISdMEsUSvkNsg2U_NWQ4lwWSvOo3jvXo1I3GtgHpP8exukVxYAnn5mJspqX50VHIWFADHhs5AerkRn3hMRtf_R3F1qmEbo8EROZXp328HMC-o>). 250120 via digithek ch = #fineBlog See also: Following the publication of the 6 millionth article in the English-language Wikipedia last week, the community newspaper "Wikipedia Signpost" has called for a moratorium on the publication of articles about companies. This is not meant as an accusation against the Wikimedia Foundation, but the current measures for protecting the encyclopedia against abusive undeclared paid editing are clearly not working. *"Since the volunteer authors are currently being overwhelmed by advertising in the guise of Wikipedia articles, and since the WMF does not seem able to do anything to counter it, the only viable course for the authors would be to prohibit the creation of new articles about companies for the time being"*, writes the user Smallbones in his editorial <https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-01-27/From_the_editor> for today's issue."
  8. Gabler, S.: Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus (2021) 0.03
    0.033890717 = product of:
      0.06778143 = sum of:
        0.06778143 = product of:
          0.2033443 = sum of:
            0.2033443 = weight(_text_:3a in 1000) [ClassicSimilarity], result of:
              0.2033443 = score(doc=1000,freq=2.0), product of:
                0.43417317 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051211677 = queryNorm
                0.46834838 = fieldWeight in 1000, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1000)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    Master's thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. Cf.: https://www.researchgate.net/publication/371680244_Vergabe_von_DDC-Sachgruppen_mittels_eines_Schlagwort-Thesaurus. DOI: 10.25365/thesis.70030. See also the presentation at: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=web&cd=&ved=0CAIQw7AJahcKEwjwoZzzytz_AhUAAAAAHQAAAAAQAg&url=https%3A%2F%2Fwiki.dnb.de%2Fdownload%2Fattachments%2F252121510%2FDA3%2520Workshop-Gabler.pdf%3Fversion%3D1%26modificationDate%3D1671093170000%26api%3Dv2&psig=AOvVaw0szwENK1or3HevgvIDOfjx&ust=1687719410889597&opi=89978449.
  9. Bullard, J.; Watson, B.; Purdome, C.: Misrepresentation in the surrogate : author critiques of "Indians of North America" subject headings (2022) 0.03
    0.03293825 = product of:
      0.0658765 = sum of:
        0.0658765 = product of:
          0.131753 = sum of:
            0.131753 = weight(_text_:headings in 1143) [ClassicSimilarity], result of:
              0.131753 = score(doc=1143,freq=4.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.5304626 = fieldWeight in 1143, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1143)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The surrogate record for a book in the library catalog contains subject headings applied on the basis of literary warrant. To assess the extent to which terms like "Indians of North America" are accurate to the content of the items with that label, we invited the items' creators to critique their surrogate records. In interviews with 38 creators we found consensus against the term "Indians of North America" and identified a periphery of related terms that misrepresent the content of the work, are out of alignment with their scholarly communities, and reproduce settler colonial biases in our library systems.
  10. Cooey, N.; Phillips, A.: Library of Congress Subject Headings : a post-coordinated future (2023) 0.03
    0.03293825 = product of:
      0.0658765 = sum of:
        0.0658765 = product of:
          0.131753 = sum of:
            0.131753 = weight(_text_:headings in 1163) [ClassicSimilarity], result of:
              0.131753 = score(doc=1163,freq=4.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.5304626 = fieldWeight in 1163, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1163)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper is the result of a request from Library of Congress leadership to assess pre-coordinated versus post-coordinated subject cataloging. It argues that the disadvantages of pre-coordinated subject strings are perennial and continue to hinder progress, while the advantages of post-coordinated subject cataloging have expanded, resulting in new opportunities to serve the needs of catalogers and end users alike. The consequences of retaining pre-coordinated headings will have long-term impacts that heavily out-weigh the short-term challenges of transitioning to new cataloging practices. By implementing post-coordinated, faceted vocabularies, the Library of Congress will be investing in the future of libraries.
  11. Walker, J.M.: Faceted vocabularies in catalog searches : provenance evidence vocabulary as search terms or limiters for a personal library collection (2023) 0.03
    0.03293825 = product of:
      0.0658765 = sum of:
        0.0658765 = product of:
          0.131753 = sum of:
            0.131753 = weight(_text_:headings in 1173) [ClassicSimilarity], result of:
              0.131753 = score(doc=1173,freq=4.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.5304626 = fieldWeight in 1173, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1173)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Genre/Form headings are an important means by which librarians provide users with contextual or descriptive information. To facilitate the discovery of resources with important provenance characteristics, the Marion E. Wade Center added terms from a controlled vocabulary to bibliographic records representing items in the C. S. Lewis personal library collection. The selected terms focus on features that have historically been of interest to visitors. The addition of these headings in the bibliographic records allows users to use these keywords to conduct a search or narrow their results, resulting in more flexibility to locate and select the resources that best meet their needs.
  12. Wlodarczyk, B.: KABA Subject Headings and the National Library of Poland Descriptors in light of Wojciech Wrzosek's theory of historiographical metaphors and different historiographical traditions (2020) 0.03
    0.02881497 = product of:
      0.05762994 = sum of:
        0.05762994 = product of:
          0.11525988 = sum of:
            0.11525988 = weight(_text_:headings in 5733) [ClassicSimilarity], result of:
              0.11525988 = score(doc=5733,freq=6.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.46405816 = fieldWeight in 5733, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5733)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The aims of this article are, first, to provide a necessary background to investigate the discipline of history from the knowledge organization (KO) perspective, and secondly, to present, on selected examples, a way of analyzing knowledge organization systems (KOSs) from the point of view of the theory of history. The study includes a literature review and epistemological analysis. It provides a preliminary analysis of history in two selected universal Polish KOSs: KABA subject headings and the National Library of Poland Descriptors. The research is restricted to the high-level concept of historiographical metaphors coined by Wojciech Wrzosek and how they can be utilized in analyzing KOSs. The analysis of the structure of the KOSs and indexing practices of selected history books is performed. A particular emphasis is placed upon the requirements of classical and non-classical historiography in the context of KO. Although the knowledge about historiographical metaphors given by Wrzosek can be helpful for the analysis and improvement of KOSs, it seems that their broad character can provide the creators only with some general guidelines. Historical research is multidimensional, which is why the general remarks presented in this article need to be supplemented with in-depth theoretical and empirical analyses of historiography.
    Object
    KABA Subject Headings
  13. ¬Der Student aus dem Computer (2023) 0.02
    0.024284663 = product of:
      0.048569325 = sum of:
        0.048569325 = product of:
          0.09713865 = sum of:
            0.09713865 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.09713865 = score(doc=1079,freq=2.0), product of:
                0.17933457 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051211677 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 1.2023 16:22:55
  14. Peponakis, M.; Mastora, A.; Kapidakis, S.; Doerr, M.: Expressiveness and machine processability of Knowledge Organization Systems (KOS) : an analysis of concepts and relations (2020) 0.02
    0.023527324 = product of:
      0.04705465 = sum of:
        0.04705465 = product of:
          0.0941093 = sum of:
            0.0941093 = weight(_text_:headings in 5787) [ClassicSimilarity], result of:
              0.0941093 = score(doc=5787,freq=4.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.3789019 = fieldWeight in 5787, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5787)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This study considers the expressiveness (that is the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the Semantic Web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD) and the Simple Knowledge Organization System (SKOS); natural language processing techniques are also implemented. Applying a comparative analysis, the dataset comprises a thesaurus (Eurovoc), a subject headings system (LCSH) and a classification scheme (DDC). These are compared with an ontology (CIDOC-CRM) by focusing on how they define and handle concepts and relations. It was observed that LCSH and DDC focus on the formalism of character strings (nomens) rather than on the modelling of semantics; their definition of what constitutes a concept is quite fuzzy, and they comprise a large number of complex concepts. By contrast, thesauri have a coherent definition of what constitutes a concept, and apply a systematic approach to the modelling of relations. Ontologies explicitly define diverse types of relations, and are by their nature machine-processable. The paper concludes that the potential of both the expressiveness and machine processability of each KOS is extensively regulated by its structural rules. It is harder to represent subject headings and classification schemes as semantic networks with nodes and arcs, while thesauri are more suitable for such a representation. In addition, a paradigm shift is revealed which focuses on the modelling of relations between concepts, rather than the concepts themselves.
  15. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.02
    0.023527324 = product of:
      0.04705465 = sum of:
        0.04705465 = product of:
          0.0941093 = sum of:
            0.0941093 = weight(_text_:headings in 977) [ClassicSimilarity], result of:
              0.0941093 = score(doc=977,freq=4.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.3789019 = fieldWeight in 977, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=977)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
  16. Grabus, S.; Logan, P.M.; Greenberg, J.: Temporal concept drift and alignment : an empirical approach to comparing knowledge organization systems over time (2022) 0.02
    0.023527324 = product of:
      0.04705465 = sum of:
        0.04705465 = product of:
          0.0941093 = sum of:
            0.0941093 = weight(_text_:headings in 1100) [ClassicSimilarity], result of:
              0.0941093 = score(doc=1100,freq=4.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.3789019 = fieldWeight in 1100, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1100)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This research explores temporal concept drift and temporal alignment in knowledge organization systems (KOS). A comparative analysis is pursued using the 1910 Library of Congress Subject Headings, 2020 FAST Topical, and automatic indexing. The use case involves a sample of 90 nineteenth-century Encyclopedia Britannica entries. The entries were indexed using two approaches: 1) full-text indexing; 2) Named Entity Recognition was performed upon the entries with Stanza, Stanford's NLP toolkit, and entities were automatically indexed with the Helping Interdisciplinary Vocabulary application (HIVE), using both 1910 LCSH and FAST Topical. The analysis focused on three goals: 1) identifying results that were exclusive to the 1910 LCSH output; 2) identifying terms in the exclusive set that have been deprecated from the contemporary LCSH, demonstrating temporal concept drift; and 3) exploring the historical significance of these deprecated terms. Results confirm that historical vocabularies can be used to generate anachronistic subject headings representing conceptual drift across time in KOS and historical resources. A methodological contribution is made demonstrating how to study changes in KOS over time and improve the contextualization of historical humanities resources.
  17. Smith, C.: Controlled vocabularies : past, present and future of subject access (2021) 0.02
    0.023290861 = product of:
      0.046581723 = sum of:
        0.046581723 = product of:
          0.093163446 = sum of:
            0.093163446 = weight(_text_:headings in 704) [ClassicSimilarity], result of:
              0.093163446 = score(doc=704,freq=2.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.37509373 = fieldWeight in 704, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=704)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Controlled vocabularies are a foundational concept in library science and provide a framework for consistency in cataloging practices. Subject headings provide valuable access points to library resources during search and discovery for patrons. Many librarians will be familiar with the more widely used controlled vocabularies, like those maintained by national libraries or major professional organizations. More recently, there has been an increasing shift toward specialized vocabularies maintained by independent entities intended for much narrower use. While there is valid criticism of the nature or content of controlled vocabularies, they will likely continue to be an important feature in information organization.
  18. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: ¬An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.02
    0.023290861 = product of:
      0.046581723 = sum of:
        0.046581723 = product of:
          0.093163446 = sum of:
            0.093163446 = weight(_text_:headings in 710) [ClassicSimilarity], result of:
              0.093163446 = score(doc=710,freq=2.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.37509373 = fieldWeight in 710, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=710)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Subject headings and genre terms are notoriously difficult to apply, yet are important for fiction. The current project functions as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing results to the range of aboutness and isness information in library cataloging records. The methodology is likewise presented, and how future work might expand on the current project to enhance catalog records through text-mining is explored.
  19. Hahn, J.: Semi-automated methods for BIBFRAME work entity description (2021) 0.02
    0.023290861 = product of:
      0.046581723 = sum of:
        0.046581723 = product of:
          0.093163446 = sum of:
            0.093163446 = weight(_text_:headings in 725) [ClassicSimilarity], result of:
              0.093163446 = score(doc=725,freq=2.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.37509373 = fieldWeight in 725, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=725)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper reports an investigation of machine learning methods for the semi-automated creation of a BIBFRAME Work entity description within the RDF linked data editor Sinopia (https://sinopia.io). The automated subject indexing software Annif was configured with the Library of Congress Subject Headings (LCSH) vocabulary from the Linked Data Service at https://id.loc.gov/. The training corpus comprised 9.3 million titles and LCSH linked data references from the IvyPlus POD project (https://pod.stanford.edu/) and from Share-VDE (https://wiki.share-vde.org). Semi-automated processes were explored to support and extend, not replace, professional expertise.
  20. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.02
    0.023290861 = product of:
      0.046581723 = sum of:
        0.046581723 = product of:
          0.093163446 = sum of:
            0.093163446 = weight(_text_:headings in 1139) [ClassicSimilarity], result of:
              0.093163446 = score(doc=1139,freq=2.0), product of:
                0.24837378 = queryWeight, product of:
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.051211677 = queryNorm
                0.37509373 = fieldWeight in 1139, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.849944 = idf(docFreq=940, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1139)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
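
The score explanations in the result list above all follow the same arithmetic, so a short sketch may help when reading them. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which do reproduce the tf and idf values printed in the listing. The function names are illustrative only and are not part of any Lucene or Solr API. The queryNorm, fieldNorm, and coord values are copied from the listing rather than derived.

```python
import math

# Constants copied from the explanations in the listing.
MAX_DOCS = 44218
QUERY_NORM = 0.051211677


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """weight(term in doc) = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM            # idf * queryNorm
    field_weight = tf(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Result 1 (doc 885): two matching terms summed, then one coord(1/2).
headings_885 = term_score(freq=2.0, doc_freq=940, field_norm=0.0390625)
term22_885 = term_score(freq=2.0, doc_freq=3622, field_norm=0.0390625)
result_1 = (headings_885 + term22_885) * 0.5

# Result 2 (doc 1177): one matching term, inner and outer coord(1/2).
headings_1177 = term_score(freq=8.0, doc_freq=940, field_norm=0.0546875)
result_2 = headings_1177 * 0.5 * 0.5

print(f"result 1: {result_1:.6f}  (listing shows 0.05061885)")
print(f"result 2: {result_2:.6f}  (listing shows 0.046581723)")
```

The engine computes in single precision, so the recomputed values can be expected to agree with the listing only to about six or seven significant digits. The sketch also shows why results with the same term frequency still rank differently: fieldNorm shrinks as the indexed field gets longer, which is why doc 1157 (fieldNorm 0.0625) outranks doc 1143 (fieldNorm 0.0546875) although both have freq=4.0.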

Languages

  • e 95
  • d 31

Types

  • a 118
  • el 20
  • p 3
  • m 2
  • x 1