Search (441 results, page 1 of 23)

  • year_i:[2020 TO 2030}
  1. Gabler, S.: Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus (2021) 0.28
    0.28202426 = product of:
      0.50764364 = sum of:
        0.048730727 = product of:
          0.14619218 = sum of:
            0.14619218 = weight(_text_:3a in 1000) [ClassicSimilarity], result of:
              0.14619218 = score(doc=1000,freq=2.0), product of:
                0.31214407 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.036818076 = queryNorm
                0.46834838 = fieldWeight in 1000, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1000)
          0.33333334 = coord(1/3)
        0.14619218 = weight(_text_:2f in 1000) [ClassicSimilarity], result of:
          0.14619218 = score(doc=1000,freq=2.0), product of:
            0.31214407 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036818076 = queryNorm
            0.46834838 = fieldWeight in 1000, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1000)
        0.020336384 = weight(_text_:data in 1000) [ClassicSimilarity], result of:
          0.020336384 = score(doc=1000,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 1000, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1000)
        0.14619218 = weight(_text_:2f in 1000) [ClassicSimilarity], result of:
          0.14619218 = score(doc=1000,freq=2.0), product of:
            0.31214407 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036818076 = queryNorm
            0.46834838 = fieldWeight in 1000, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1000)
        0.14619218 = weight(_text_:2f in 1000) [ClassicSimilarity], result of:
          0.14619218 = score(doc=1000,freq=2.0), product of:
            0.31214407 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036818076 = queryNorm
            0.46834838 = fieldWeight in 1000, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1000)
      0.5555556 = coord(5/9)
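    The score breakdown above is Lucene's ClassicSimilarity (TF-IDF) explanation: each weight(...) line is queryWeight × fieldWeight, with queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the clause sums are then scaled by the coordination factors coord(m/n). A minimal sketch that reproduces the figures of the _text_:3a clause above (the numbers are taken from the explanation itself; the helper names are ours, not Lucene's):

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def field_weight(term_freq, doc_freq, max_docs, field_norm):
          # tf = sqrt(termFreq); fieldWeight = tf * idf * fieldNorm
          return math.sqrt(term_freq) * idf(doc_freq, max_docs) * field_norm

      query_norm = 0.036818076                    # queryNorm as reported above
      qw = idf(24, 44218) * query_norm            # queryWeight  -> ~0.3121
      fw = field_weight(2, 24, 44218, 0.0390625)  # fieldWeight  -> ~0.4683
      print(qw * fw)                              # clause score -> ~0.1462, as in the output above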
    
    Abstract
    This thesis presents the construction of a thematically ordered thesaurus based on the subject headings of the Integrated Authority File (GND), using the DDC notations contained in it. The top level of the thesaurus is formed by the DDC subject groups of the German National Library (DNB). The thesaurus is built in a rule-based manner, applying Linked Data principles in a SPARQL processor. It serves the automated extraction of metadata from scholarly publications by means of a computational-linguistic extractor that processes digital full texts. The extractor identifies subject headings by matching character strings against the labels in the thesaurus, ranks the matches by their relevance within the text, and returns the assigned subject groups in ranked order. The underlying assumption is that the sought subject group appears among the top ranks. The performance of the approach is validated in a three-stage procedure. First, a gold standard is compiled from documents available in the DNB online catalogue, based on metadata and the findings of a brief examination of the items. The documents are spread over 14 of the subject groups, with a batch size of 50 documents each. All documents are processed with the extractor and the categorisation results are recorded. Finally, the resulting retrieval performance is assessed both for a hard (binary) categorisation and for a ranked return of the subject groups.
    Content
    Master's thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. Cf.: https://www.researchgate.net/publication/371680244_Vergabe_von_DDC-Sachgruppen_mittels_eines_Schlagwort-Thesaurus. DOI: 10.25365/thesis.70030. Cf. also the accompanying presentation: https://wiki.dnb.de/download/attachments/252121510/DA3%20Workshop-Gabler.pdf?version=1&modificationDate=1671093170000&api=v2.
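    The extractor described in the abstract above amounts to matching thesaurus labels against a full text and ranking the attached DDC subject groups by how often their labels occur. A minimal sketch of that idea, with an invented three-entry label-to-subject-group mapping standing in for the GND-based thesaurus (illustrative only, not the thesis's actual implementation):

      from collections import Counter

      # Toy thesaurus: preferred label -> DNB DDC subject group (illustrative entries only)
      thesaurus = {
          "thesaurus": "020 Bibliotheks- und Informationswissenschaft",
          "sacherschliessung": "020 Bibliotheks- und Informationswissenschaft",
          "linguistik": "400 Sprache",
      }

      def rank_subject_groups(fulltext):
          """Count label matches in the text and return subject groups ranked by frequency."""
          text = fulltext.lower()
          counts = Counter()
          for label, group in thesaurus.items():
              hits = text.count(label)
              if hits:
                  counts[group] += hits
          return counts.most_common()  # best-matching DDC subject group first

      print(rank_subject_groups("Ein Thesaurus unterstützt die Sacherschliessung wissenschaftlicher Texte."))

    The validation step then simply checks whether the correct subject group appears among the top-ranked groups returned for each gold-standard document.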
  2. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.26
    0.25989717 = product of:
      0.58476865 = sum of:
        0.058476865 = product of:
          0.1754306 = sum of:
            0.1754306 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.1754306 = score(doc=862,freq=2.0), product of:
                0.31214407 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.036818076 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.1754306 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.1754306 = score(doc=862,freq=2.0), product of:
            0.31214407 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036818076 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.1754306 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.1754306 = score(doc=862,freq=2.0), product of:
            0.31214407 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036818076 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.1754306 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.1754306 = score(doc=862,freq=2.0), product of:
            0.31214407 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.036818076 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.44444445 = coord(4/9)
    
    Source
    https://arxiv.org/abs/2212.06721
  3. Ilhan, A.; Fietkiewicz, K.J.: Data privacy-related behavior and concerns of activity tracking technology users from Germany and the USA (2021) 0.07
    0.07381636 = product of:
      0.22144906 = sum of:
        0.06430929 = weight(_text_:data in 180) [ClassicSimilarity], result of:
          0.06430929 = score(doc=180,freq=20.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.5523875 = fieldWeight in 180, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=180)
        0.14466892 = weight(_text_:germany in 180) [ClassicSimilarity], result of:
          0.14466892 = score(doc=180,freq=8.0), product of:
            0.21956629 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.036818076 = queryNorm
            0.65888494 = fieldWeight in 180, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.0390625 = fieldNorm(doc=180)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 180) [ClassicSimilarity], result of:
              0.024941705 = score(doc=180,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 180, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=180)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    Purpose This investigation aims to examine the differences and similarities between activity tracking technology users from two regions (the USA and Germany) in their intended privacy-related behavior. The focus lies on data handling after hypothetical discontinuance of use, data protection and privacy policy seeking, and privacy concerns. Design/methodology/approach The data was collected through an online survey in 2019. In order to identify significant differences between participants from Germany and the USA, the chi-squared test and the Mann-Whitney U test were applied. Findings The intensity of several privacy-related concerns was significantly different between the two groups. The majority of the participants did not inform themselves about the respective data privacy policies or terms and conditions before installing an activity tracking application. The majority of the German participants knew that they could request the deletion of all their collected data. In contrast, only 35% of the 68 participants from the US knew about this option. Research limitations/implications This study intends to raise awareness about managing the collected health and fitness data after users stop using activity tracking technologies. Furthermore, to reduce privacy and security concerns, the involvement of the government, companies and users is necessary to handle and share data more considerately and in a sustainable way. Originality/value This study sheds light on users of activity tracking technologies from a broad perspective (here, participants from the USA and Germany). It incorporates not only concerns and the privacy paradox but (intended) user behavior, including seeking information on data protection and privacy policy and handling data after hypothetical discontinuance of use of the technology.
    Date
    20. 1.2015 18:30:22
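    The study above compares the two user groups with a chi-squared test (for categorical answers) and a Mann-Whitney U test (for ordinal concern ratings). A minimal sketch of both tests with SciPy, on invented counts and ratings rather than the survey's data:

      from scipy.stats import chi2_contingency, mannwhitneyu

      # Hypothetical 2x2 table: participants who did / did not read the privacy policy, per country
      table = [[40, 60],   # Germany
               [25, 75]]   # USA
      chi2, p_chi, dof, expected = chi2_contingency(table)

      # Hypothetical privacy-concern ratings (1 = low ... 5 = high) per group
      germany = [4, 5, 3, 4, 5, 2, 4]
      usa = [3, 2, 4, 3, 2, 3, 1]
      u_stat, p_u = mannwhitneyu(germany, usa)

      print(f"chi-squared: chi2={chi2:.2f}, p={p_chi:.3f}; Mann-Whitney: U={u_stat:.1f}, p={p_u:.3f}")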
  4. Daquino, M.; Peroni, S.; Shotton, D.; Colavizza, G.; Ghavimi, B.; Lauscher, A.; Mayr, P.; Romanello, M.; Zumstein, P.: ¬The OpenCitations Data Model (2020) 0.07
    0.069670334 = product of:
      0.20901099 = sum of:
        0.0921318 = weight(_text_:readable in 38) [ClassicSimilarity], result of:
          0.0921318 = score(doc=38,freq=2.0), product of:
            0.2262076 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.036818076 = queryNorm
            0.4072887 = fieldWeight in 38, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.046875 = fieldNorm(doc=38)
        0.052313168 = weight(_text_:bibliographic in 38) [ClassicSimilarity], result of:
          0.052313168 = score(doc=38,freq=4.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.3649729 = fieldWeight in 38, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=38)
        0.064566016 = weight(_text_:data in 38) [ClassicSimilarity], result of:
          0.064566016 = score(doc=38,freq=14.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.55459267 = fieldWeight in 38, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=38)
      0.33333334 = coord(3/9)
    
    Abstract
    A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or context application. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective reusability of OCDM according to ontology evaluation practices, mention existing users of OCDM, and discuss the use and impact of OCDM in the wider open science community.
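    OCDM describes bibliographic entities and citations with Semantic Web vocabularies; as an illustration of what such machine-readable citation data looks like, here is a minimal rdflib sketch using the CiTO property cito:cites (the two DOIs are placeholders, and this shows only the flavour of such data, not the full OCDM):

      from rdflib import Graph, Namespace, URIRef

      CITO = Namespace("http://purl.org/spar/cito/")

      g = Graph()
      g.bind("cito", CITO)

      citing = URIRef("https://doi.org/10.0000/example.citing")  # placeholder identifier
      cited = URIRef("https://doi.org/10.0000/example.cited")    # placeholder identifier
      g.add((citing, CITO.cites, cited))  # "the citing entity cites the cited entity"

      print(g.serialize(format="turtle"))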
  5. Fernanda de Jesus, A.; Ferreira de Castro, F.: Proposal for the publication of linked open bibliographic data (2024) 0.03
    0.030788446 = product of:
      0.138548 = sum of:
        0.073981985 = weight(_text_:bibliographic in 1161) [ClassicSimilarity], result of:
          0.073981985 = score(doc=1161,freq=8.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.5161496 = fieldWeight in 1161, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=1161)
        0.064566016 = weight(_text_:data in 1161) [ClassicSimilarity], result of:
          0.064566016 = score(doc=1161,freq=14.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.55459267 = fieldWeight in 1161, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1161)
      0.22222222 = coord(2/9)
    
    Abstract
    Linked Open Data (LOD) are a set of principles for publishing structured, connected data available for reuse under an open license. The objective of this paper is to analyze the publishing of bibliographic data as LOD, having as a product the elaboration of theoretical-methodological recommendations for the publication of these data, in an approach based on the ten best practices for publishing LOD from the World Wide Web Consortium. The starting point was a Systematic Review of Literature, in which initiatives to publish bibliographic data as LOD were identified. An empirical study of these institutions was also conducted. As a result, theoretical-methodological recommendations were obtained for the process of publishing bibliographic data as LOD.
  6. Zhu, Y.; Quan, L.; Chen, P.-Y.; Kim, M.C.; Che, C.: Predicting coauthorship using bibliographic network embedding (2023) 0.03
    0.028413637 = product of:
      0.12786137 = sum of:
        0.0871886 = weight(_text_:bibliographic in 917) [ClassicSimilarity], result of:
          0.0871886 = score(doc=917,freq=16.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.6082881 = fieldWeight in 917, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=917)
        0.040672768 = weight(_text_:data in 917) [ClassicSimilarity], result of:
          0.040672768 = score(doc=917,freq=8.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.34936053 = fieldWeight in 917, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=917)
      0.22222222 = coord(2/9)
    
    Abstract
    Coauthorship prediction applies predictive analytics to bibliographic data to predict authors who are highly likely to be coauthors. In this study, we propose an approach for coauthorship prediction based on bibliographic network embedding through a graph-based bibliographic data model that can be used to model common bibliographic data, including papers, terms, sources, authors, departments, research interests, universities, and countries. A real-world dataset released by AMiner that includes more than 2 million papers, 8 million citations, and 1.7 million authors were integrated into a large bibliographic network using the proposed bibliographic data model. Translation-based methods were applied to the entities and relationships to generate their low-dimensional embeddings while preserving their connectivity information in the original bibliographic network. We applied machine learning algorithms to embeddings that represent the coauthorship relationships of the two authors and achieved high prediction results. The reference model, which is the combination of a network embedding size of 100, the most basic translation-based method, and a gradient boosting method achieved an F1 score of 0.9 and even higher scores are obtainable with different embedding sizes and more advanced embedding methods. Thus, the strengths of the proposed approach lie in its customizable components under a unified framework.
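    In the approach above, a candidate author pair is represented by the two authors' network embeddings and a classifier predicts whether they will co-author. A minimal sketch of that final step with scikit-learn, using random vectors and labels in place of the translation-based embeddings and the AMiner data (everything here is synthetic; concatenation stands in for however the paper encodes the pair):

      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.metrics import f1_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n_pairs, dim = 1000, 100                 # embedding size 100, as in the reference model above
      emb_a = rng.normal(size=(n_pairs, dim))  # stand-in for author A embeddings
      emb_b = rng.normal(size=(n_pairs, dim))  # stand-in for author B embeddings
      X = np.hstack([emb_a, emb_b])            # pair representation
      y = rng.integers(0, 2, size=n_pairs)     # 1 = the pair co-authored (synthetic labels)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      clf = GradientBoostingClassifier().fit(X_tr, y_tr)
      print("F1:", f1_score(y_te, clf.predict(X_te)))  # near 0.5 here, since the labels are random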
  7. Hobert, A.; Jahn, N.; Mayr, P.; Schmidt, B.; Taubert, N.: Open access uptake in Germany 2010-2018 : adoption in a diverse research landscape (2021) 0.03
    0.027386125 = product of:
      0.123237565 = sum of:
        0.02300799 = weight(_text_:data in 250) [ClassicSimilarity], result of:
          0.02300799 = score(doc=250,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.19762816 = fieldWeight in 250, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=250)
        0.100229576 = weight(_text_:germany in 250) [ClassicSimilarity], result of:
          0.100229576 = score(doc=250,freq=6.0), product of:
            0.21956629 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.036818076 = queryNorm
            0.4564889 = fieldWeight in 250, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.03125 = fieldNorm(doc=250)
      0.22222222 = coord(2/9)
    
    Content
    This study investigates the development of open access (OA) to journal articles from authors affiliated with German universities and non-university research institutions in the period 2010-2018. Beyond determining the overall share of openly available articles, a systematic classification of distinct categories of OA publishing allowed us to identify different patterns of adoption of OA. Taking into account the particularities of the German research landscape, variations in terms of productivity, OA uptake and approaches to OA are examined at the meso-level and possible explanations are discussed. The development of the OA uptake is analysed for the different research sectors in Germany (universities, non-university research institutes of the Helmholtz Association, Fraunhofer Society, Max Planck Society, Leibniz Association, and government research agencies). Combining several data sources (incl. Web of Science, Unpaywall, an authority file of standardised German affiliation information, the ISSN-Gold-OA 3.0 list, and OpenDOAR), the study confirms the growth of the OA share mirroring the international trend reported in related studies. We found that 45% of all considered articles during the observed period were openly available at the time of analysis. Our findings show that subject-specific repositories are the most prevalent type of OA. However, the percentages for publication in fully OA journals and OA via institutional repositories show similarly steep increases. Enabling data-driven decision-making regarding the implementation of OA in Germany at the institutional level, the results of this study can furthermore serve as a baseline to assess the impact recent transformative agreements with major publishers will likely have on scholarly communication.
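    Underneath the figures above is a simple aggregation: classify each article into an OA category and compute shares per sector and year. A minimal sketch of that step with pandas, on a handful of invented records (column names and values are illustrative, not those of the combined Web of Science/Unpaywall dataset):

      import pandas as pd

      articles = pd.DataFrame({
          "year":    [2010, 2010, 2018, 2018, 2018],
          "sector":  ["university", "university", "university", "Leibniz Association", "Leibniz Association"],
          "oa_type": ["closed", "repository", "gold", "repository", "closed"],
      })

      articles["is_oa"] = articles["oa_type"] != "closed"
      oa_share = articles.groupby(["sector", "year"])["is_oa"].mean()
      print(oa_share)  # share of openly available articles per sector and year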
  8. Wu, S.: Implementing bibliographic enhancement data in academic library catalogs : an empirical study (2024) 0.02
    0.024521142 = product of:
      0.11034514 = sum of:
        0.061032027 = weight(_text_:bibliographic in 1159) [ClassicSimilarity], result of:
          0.061032027 = score(doc=1159,freq=4.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.4258017 = fieldWeight in 1159, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1159)
        0.04931311 = weight(_text_:data in 1159) [ClassicSimilarity], result of:
          0.04931311 = score(doc=1159,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.42357713 = fieldWeight in 1159, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1159)
      0.22222222 = coord(2/9)
    
    Abstract
    This study examines users' needs for bibliographic enhancement data (BIBED) in academic library catalogs. Qualitative data were collected through 30 academic users' activity logs and follow-up interviews. These 30 participants were recruited from a public university in the United States that has over 19,000 students enrolled and over 600 full-time faculty members. This study identified 19 types of BIBED useful for supporting the five user tasks proposed in the IFLA Library Reference Model and in seven other contexts, such as enhancing one's understanding, offering search instructions, and providing readers' advisory. Findings suggest that adopting BIBFRAME and Semantic Web technologies may enable academic library catalogs to provide BIBED to better meet user needs in various contexts.
  9. Zhao, D.; Strotmann, A.: Mapping knowledge domains on Wikipedia : an author bibliographic coupling analysis of traditional Chinese medicine (2022) 0.02
    0.02076109 = product of:
      0.0934249 = sum of:
        0.06524598 = weight(_text_:bibliographic in 608) [ClassicSimilarity], result of:
          0.06524598 = score(doc=608,freq=14.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.45520115 = fieldWeight in 608, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03125 = fieldNorm(doc=608)
        0.02817892 = weight(_text_:data in 608) [ClassicSimilarity], result of:
          0.02817892 = score(doc=608,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.24204408 = fieldWeight in 608, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=608)
      0.22222222 = coord(2/9)
    
    Abstract
    Purpose Wikipedia has the lofty goal of compiling all human knowledge. The purpose of the present study is to map the structure of the Traditional Chinese Medicine (TCM) knowledge domain on Wikipedia, to identify patterns of knowledge representation on Wikipedia and to test the applicability of author bibliographic coupling analysis, an effective method for mapping knowledge domains represented in published scholarly documents, for Wikipedia data. Design/methodology/approach We adapted and followed the well-established procedures and techniques for author bibliographic coupling analysis (ABCA). Instead of bibliographic data from a citation database, we used all articles on TCM downloaded from the English version of Wikipedia as our dataset. An author bibliographic coupling network was calculated and then factor analyzed using SPSS. Factor analysis results were visualized. Factors were labeled upon manual examination of articles that authors who load primarily in each factor have significantly contributed references to. Clear factors were interpreted as topics. Findings Seven TCM topic areas are represented on Wikipedia, among which Acupuncture-related practices, Falun Gong and Herbal Medicine attracted the most significant contributors to TCM. Acupuncture and Qi Gong have the most connections to the TCM knowledge domain and also serve as bridges for other topics to connect to the domain. Herbal medicine is weakly linked to and non-herbal medicine is isolated from the rest of the TCM knowledge domain. It appears that specific topics are represented well on Wikipedia but their conceptual connections are not. ABCA is effective for mapping knowledge domains on Wikipedia but document-based bibliographic coupling analysis is not. Originality/value Given the prominent position of Wikipedia for both information users and for researchers on knowledge organization and information retrieval, it is important to study how well knowledge is represented and structured on Wikipedia. Such studies appear largely missing although studies from different perspectives both about Wikipedia and using Wikipedia as data are abundant. Author bibliographic coupling analysis is effective for mapping knowledge domains represented in published scholarly documents but has never been applied to mapping knowledge domains represented on Wikipedia.
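    Author bibliographic coupling, as applied above, counts how many cited sources two contributors have in common; the resulting author-by-author matrix is then factor analyzed to find topic areas. A minimal sketch of the coupling-count step (toy reference sets, not the Wikipedia TCM data):

      from itertools import combinations

      # Toy data: contributor -> set of references they contributed to articles
      refs = {
          "author_a": {"r1", "r2", "r3"},
          "author_b": {"r2", "r3", "r4"},
          "author_c": {"r5"},
      }

      # Coupling strength of a pair = size of the overlap of their reference sets
      coupling = {
          (a, b): len(refs[a] & refs[b])
          for a, b in combinations(sorted(refs), 2)
      }
      print(coupling)  # {('author_a', 'author_b'): 2, ('author_a', 'author_c'): 0, ('author_b', 'author_c'): 0}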
  10. Dunsire, G.; Fritz, D.; Fritz, R.: Instructions, interfaces, and interoperable data : the RIMMF experience with RDA revisited (2020) 0.02
    0.020548726 = product of:
      0.09246927 = sum of:
        0.04315616 = weight(_text_:bibliographic in 5751) [ClassicSimilarity], result of:
          0.04315616 = score(doc=5751,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.30108726 = fieldWeight in 5751, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5751)
        0.04931311 = weight(_text_:data in 5751) [ClassicSimilarity], result of:
          0.04931311 = score(doc=5751,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.42357713 = fieldWeight in 5751, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5751)
      0.22222222 = coord(2/9)
    
    Abstract
    This article presents a case study of RIMMF, a software tool developed to improve the orientation and training of catalogers who use Resource Description and Access (RDA) to maintain bibliographic data. The cataloging guidance and instructions of RDA are based on the Functional Requirements conceptual models that are now consolidated in the IFLA Library Reference Model, but many catalogers are applying RDA in systems that have evolved from inventory and text-processing applications developed from older metadata paradigms. The article describes how RIMMF interacts with the RDA Toolkit and RDA Registry to offer cataloger-friendly multilingual data input and editing interfaces.
  11. Samples, J.; Bigelow, I.: MARC to BIBFRAME : converting the PCC to Linked Data (2020) 0.02
    0.020548726 = product of:
      0.09246927 = sum of:
        0.04315616 = weight(_text_:bibliographic in 119) [ClassicSimilarity], result of:
          0.04315616 = score(doc=119,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.30108726 = fieldWeight in 119, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=119)
        0.04931311 = weight(_text_:data in 119) [ClassicSimilarity], result of:
          0.04931311 = score(doc=119,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.42357713 = fieldWeight in 119, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=119)
      0.22222222 = coord(2/9)
    
    Abstract
    The Program for Cooperative Cataloging (PCC) has formal relationships with the Library of Congress (LC), Share-VDE, and Linked Data for Production Phase 2 (LD4P2) for work on Bibliographic Framework (BIBFRAME), and PCC institutions have been very active in the exploration of MARC to BIBFRAME conversion processes. This article will review the involvement of PCC in the development of BIBFRAME and examine the work of LC, Share-VDE, and LD4P2 on MARC to BIBFRAME conversion. It will conclude with a discussion of areas for further exploration by the PCC leading up to the creation of PCC conversion specifications and PCC BIBFRAME data.
  12. Dobreski, B.: Common usage as warrant in bibliographic description (2020) 0.02
    0.019836674 = product of:
      0.089265026 = sum of:
        0.068928644 = weight(_text_:bibliographic in 5708) [ClassicSimilarity], result of:
          0.068928644 = score(doc=5708,freq=10.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.480894 = fieldWeight in 5708, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5708)
        0.020336384 = weight(_text_:data in 5708) [ClassicSimilarity], result of:
          0.020336384 = score(doc=5708,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 5708, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5708)
      0.22222222 = coord(2/9)
    
    Abstract
    Purpose Within standards for bibliographic description, common usage has served as a prominent design principle, guiding the choice and form of certain names and titles. In practice, however, the determination of common usage is difficult and lends itself to varying interpretations. The purpose of this paper is to explore the presence and role of common usage in bibliographic description through an examination of previously unexplored connections between common usage and the concept of warrant. Design/methodology/approach A brief historical review of the concept of common usage was conducted, followed by a case study of the current bibliographic standard Resource Description and Access (RDA) employing qualitative content analysis to examine the appearances, delineations and functions of common usage. Findings were then compared to the existing literature on warrant in knowledge organization. Findings Multiple interpretations of common usage coexist within RDA and its predecessors, and the current prioritization of these interpretations tends to render user perspectives secondary to those of creators, scholars and publishers. These varying common usages and their overall reliance on concrete sources of evidence reveal a mixture of underlying warrants, with literary warrant playing a more prominent role in comparison to the also present scientific/philosophical, use and autonomous warrants. Originality/value This paper offers new understanding of the concept of common usage, and adds to the body of work examining warrant in knowledge organization practices beyond classification. It sheds light on the design of the influential standard RDA while revealing the implications of naming and labeling in widely shared bibliographic data.
  13. Candela, G.: ¬An automatic data quality approach to assess semantic data from cultural heritage institutions (2023) 0.02
    0.019377435 = product of:
      0.08719845 = sum of:
        0.06973926 = weight(_text_:data in 997) [ClassicSimilarity], result of:
          0.06973926 = score(doc=997,freq=12.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.59902847 = fieldWeight in 997, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=997)
        0.017459193 = product of:
          0.034918386 = sum of:
            0.034918386 = weight(_text_:22 in 997) [ClassicSimilarity], result of:
              0.034918386 = score(doc=997,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.2708308 = fieldWeight in 997, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=997)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    In recent years, cultural heritage institutions have been exploring the benefits of applying Linked Open Data to their catalogs and digital materials. Innovative and creative methods have emerged to publish and reuse digital contents to promote computational access, such as the concepts of Labs and Collections as Data. Data quality has become a requirement for researchers and training methods based on artificial intelligence and machine learning. This article explores how the quality of Linked Open Data made available by cultural heritage institutions can be automatically assessed. The results obtained can be useful for other institutions who wish to publish and assess their collections.
    Date
    22. 6.2023 18:23:31
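    One simple automatic check in the spirit of the article above is completeness, for example whether every resource in a published dataset carries a human-readable label. A minimal sketch of such a check with rdflib (the sample triples are invented; a real assessment would run against the institution's published Linked Open Data):

      from rdflib import Graph
      from rdflib.namespace import RDFS

      g = Graph()
      g.parse(format="turtle", data="""
          @prefix ex:   <http://example.org/> .
          @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
          ex:item1 rdfs:label "A labelled item" .
          ex:item2 ex:relatedTo ex:item1 .
      """)

      subjects = set(g.subjects())
      labelled = set(g.subjects(RDFS.label, None))
      print(f"{len(labelled)}/{len(subjects)} subjects have an rdfs:label")  # prints 1/2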
  14. Yu, L.; Fan, Z.; Li, A.: ¬A hierarchical typology of scholarly information units : based on a deduction-verification study (2020) 0.02
    0.01921511 = product of:
      0.057645332 = sum of:
        0.024660662 = weight(_text_:bibliographic in 5655) [ClassicSimilarity], result of:
          0.024660662 = score(doc=5655,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.17204987 = fieldWeight in 5655, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03125 = fieldNorm(doc=5655)
        0.02300799 = weight(_text_:data in 5655) [ClassicSimilarity], result of:
          0.02300799 = score(doc=5655,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.19762816 = fieldWeight in 5655, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=5655)
        0.009976682 = product of:
          0.019953365 = sum of:
            0.019953365 = weight(_text_:22 in 5655) [ClassicSimilarity], result of:
              0.019953365 = score(doc=5655,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.15476047 = fieldWeight in 5655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5655)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    Purpose The purpose of this paper is to lay a theoretical foundation for identifying operational information units for library and information professional activities in the context of scholarly communication. Design/methodology/approach The study adopts a deduction-verification approach to formulate a typology of units for scholarly information. It first deduces possible units from an existing conceptualization of information, which defines information as the combined product of data and meaning, and then tests the usefulness of these units via two empirical investigations, one with a group of scholarly papers and the other with a sample of scholarly information users. Findings The results show that, on defining an information unit as a piece of information that is complete in both data and meaning, to such an extent that it remains meaningful to its target audience when retrieved and displayed independently in a database, it is then possible to formulate a hierarchical typology of units for scholarly information. The typology proposed in this study consists of three levels, which, in turn, consist of 1, 5 and 44 units, respectively. Research limitations/implications The result of this study has theoretical implications on both the philosophical and conceptual levels: on the philosophical level, it hinges on, and reinforces, the objective view of information; on the conceptual level, it challenges the conceptualization of work by IFLA's Functional Requirements for Bibliographic Records and Library Reference Model but endorses that by Library of Congress's BIBFRAME 2.0 model. Practical implications It calls for reconsideration of existing operational units in a variety of library and information activities. Originality/value The study strengthens the conceptual foundation of operational information units and brings to light the primacy of "one work" as an information unit and the possibility for it to be supplemented by smaller units.
    Date
    14. 1.2020 11:15:22
  15. Organisciak, P.; Schmidt, B.M.; Downie, J.S.: Giving shape to large digital libraries through exploratory data analysis (2022) 0.02
    0.01906629 = product of:
      0.08579831 = sum of:
        0.036990993 = weight(_text_:bibliographic in 473) [ClassicSimilarity], result of:
          0.036990993 = score(doc=473,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.2580748 = fieldWeight in 473, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=473)
        0.048807316 = weight(_text_:data in 473) [ClassicSimilarity], result of:
          0.048807316 = score(doc=473,freq=8.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.4192326 = fieldWeight in 473, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=473)
      0.22222222 = coord(2/9)
    
    Abstract
    The emergence of large multi-institutional digital libraries has opened the door to aggregate-level examinations of the published word. Such large-scale analysis offers a new way to pursue traditional problems in the humanities and social sciences, using digital methods to ask routine questions of large corpora. However, inquiry into multiple centuries of books is constrained by the burdens of scale, where statistical inference is technically complex and limited by hurdles to access and flexibility. This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets. We present one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion work HathiTrust Digital Library, and center it in the broader space of scholarly tools for exploratory data analysis.
    Theme
    Data Mining
  16. Jia, J.: From data to knowledge : the relationships between vocabularies, linked data and knowledge graphs (2021) 0.02
    0.019065496 = product of:
      0.08579473 = sum of:
        0.073323876 = weight(_text_:data in 106) [ClassicSimilarity], result of:
          0.073323876 = score(doc=106,freq=26.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.6298187 = fieldWeight in 106, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=106)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 106) [ClassicSimilarity], result of:
              0.024941705 = score(doc=106,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 106, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=106)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Purpose The purpose of this paper is to identify the concepts, component parts and relationships between vocabularies, linked data and knowledge graphs (KGs) from the perspectives of data and knowledge transitions. Design/methodology/approach This paper uses conceptual analysis methods. This study focuses on distinguishing concepts and analyzing composition and intercorrelations to explore data and knowledge transitions. Findings Vocabularies are the cornerstone for accurately building understanding of the meaning of data. Vocabularies provide for a data-sharing model and play an important role in supporting the semantic expression of linked data and defining the schema layer; they are also used for entity recognition, alignment and linkage for KGs. KGs, which consist of a schema layer and a data layer, are presented as cubes that organically combine vocabularies, linked data and big data. Originality/value This paper first describes the composition of vocabularies, linked data and KGs. More importantly, this paper innovatively analyzes and summarizes the interrelatedness of these factors, which comes from frequent interactions between data and knowledge. The three factors empower each other and can ultimately empower the Semantic Web.
    Date
    22. 1.2021 14:24:32
  17. Hottenrott, H.; Rose, M.E.; Lawson, C.: ¬The rise of multiple institutional affiliations in academia (2021) 0.02
    0.018845625 = product of:
      0.08480531 = sum of:
        0.07233446 = weight(_text_:germany in 313) [ClassicSimilarity], result of:
          0.07233446 = score(doc=313,freq=2.0), product of:
            0.21956629 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.036818076 = queryNorm
            0.32944247 = fieldWeight in 313, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.0390625 = fieldNorm(doc=313)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 313) [ClassicSimilarity], result of:
              0.024941705 = score(doc=313,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 313, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=313)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    This study provides the first systematic, international, large-scale evidence on the extent and nature of multiple institutional affiliations on journal publications. Studying more than 15 million authors and 22 million articles from 40 countries, we document that: In 2019, almost one in three articles was (co-)authored by authors with multiple affiliations and the share of authors with multiple affiliations increased from around 10% to 16% since 1996. The growth of multiple affiliations is prevalent in all fields and it is stronger in high impact journals. About 60% of multiple affiliations are between institutions from within the academic sector. International co-affiliations, which account for about a quarter of multiple affiliations, most often involve institutions from the United States, China, Germany and the United Kingdom, suggesting a core-periphery network. Network analysis also reveals a number of communities of countries that are more likely to share affiliations. We discuss potential causes and show that the timing of the rise in multiple affiliations can be linked to the introduction of more competitive funding structures such as "excellence initiatives" in a number of countries. We discuss implications for science and science policy.
  18. Palsdottir, A.: Data literacy and management of research data : a prerequisite for the sharing of research data (2021) 0.02
    0.018784689 = product of:
      0.0845311 = sum of:
        0.07455441 = weight(_text_:data in 183) [ClassicSimilarity], result of:
          0.07455441 = score(doc=183,freq=42.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.6403884 = fieldWeight in 183, product of:
              6.4807405 = tf(freq=42.0), with freq of:
                42.0 = termFreq=42.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=183)
        0.009976682 = product of:
          0.019953365 = sum of:
            0.019953365 = weight(_text_:22 in 183) [ClassicSimilarity], result of:
              0.019953365 = score(doc=183,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.15476047 = fieldWeight in 183, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=183)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Purpose The purpose of this paper is to investigate the knowledge and attitude about research data management, the use of data management methods and the perceived need for support, in relation to participants' field of research. Design/methodology/approach This is a quantitative study. Data were collected through an email survey sent to 792 academic researchers and doctoral students. The total response rate was 18% (N = 139). The measurement instrument consisted of six sets of questions: about data management plans, the assignment of additional information to research data, metadata, standard file naming systems, training in data management methods and the storing of research data. Findings The main finding is that knowledge about the procedures of data management is limited, and data management is not a normal practice in the researcher's work. They were, however, in general, of the opinion that the university should take the lead by recommending and offering access to the necessary tools of data management. Taken together, the results indicate that there is an urgent need to increase the researcher's understanding of the importance of data management that is based on professional knowledge and to provide them with resources and training that enables them to make effective and productive use of data management methods. Research limitations/implications The survey was sent to all members of the population but not a sample of it. Because of the response rate, the results cannot be generalized to all researchers at the university. Nevertheless, the findings may provide an important understanding about their research data procedures, in particular what characterizes their knowledge about data management and attitude towards it. Practical implications Awareness of these issues is essential for information specialists at academic libraries, together with other units within the universities, to be able to design infrastructures and develop services that suit the needs of the research community. The findings can be used to develop data policies and services, based on professional knowledge of best practices and recognized standards that assist the research community in data management. Originality/value The study contributes to the existing literature about research data management by examining the results by participants' field of research. Recognition of the issues is critical in order for information specialists in collaboration with universities to design relevant infrastructures and services for academics and doctoral students that can promote their research data management.
    Date
    20. 1.2015 18:30:22
  19. Serra, L.G.; Schneider, J.A.; Santarém Segundo, J.E.: Person identifiers in MARC 21 records in a semantic environment (2020) 0.02
    0.01853781 = product of:
      0.08342014 = sum of:
        0.04315616 = weight(_text_:bibliographic in 127) [ClassicSimilarity], result of:
          0.04315616 = score(doc=127,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.30108726 = fieldWeight in 127, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=127)
        0.040263984 = weight(_text_:data in 127) [ClassicSimilarity], result of:
          0.040263984 = score(doc=127,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.34584928 = fieldWeight in 127, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=127)
      0.22222222 = coord(2/9)
    
    Abstract
    This article discusses how libraries can include person identifiers in the MARC format. It suggests using URIs in fields and subfields to help transition the data to an RDF model and to help prepare the catalog for a Linked Data environment. It analyzes the selection of URIs and Real-World Objects, and the use of tag 024 to describe person identifiers in authority records. When a creator or collaborator is identified in a work, the identifiers are transferred from the authority record to the bibliographic record. The article concludes that URI-based descriptions can provide a better experience for users, offering other methods of discovery.
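    As an illustration of the kind of record discussed above, the following sketch adds a person identifier URI in tag 024 of a minimal authority-style record with pymarc. It assumes the pymarc 4.x list-style subfields (newer pymarc releases use Subfield objects), and the ISNI value and subfield coding are placeholders:

      from pymarc import Field, Record

      record = Record()
      record.add_field(
          Field(tag="100", indicators=["1", " "], subfields=["a", "Doe, Jane"]),
          Field(
              tag="024",
              indicators=["7", " "],  # first indicator 7 = source of the identifier given in $2
              subfields=["a", "https://isni.org/isni/0000000000000000", "2", "uri"],
          ),
      )
      print(record)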
  20. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.02
    0.018255975 = product of:
      0.08215189 = sum of:
        0.053391904 = weight(_text_:bibliographic in 977) [ClassicSimilarity], result of:
          0.053391904 = score(doc=977,freq=6.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.3724989 = fieldWeight in 977, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=977)
        0.028759988 = weight(_text_:data in 977) [ClassicSimilarity], result of:
          0.028759988 = score(doc=977,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.24703519 = fieldWeight in 977, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=977)
      0.22222222 = coord(2/9)
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
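    The TF-IDF backend evaluated above essentially ranks subject headings by how similar a new document is to the training documents already indexed with each heading. A minimal stand-in for that idea with scikit-learn (this is not Annif itself; the two training texts and headings are invented):

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      # Training texts with the LCSH-style heading a cataloguer assigned to them (toy examples)
      train_texts = [
          "library catalogues and subject indexing with controlled vocabularies",
          "machine learning models for text classification and neural networks",
      ]
      headings = ["Subject cataloging", "Machine learning"]

      vec = TfidfVectorizer()
      X = vec.fit_transform(train_texts)

      new_doc = vec.transform(["automatic subject indexing of library records"])
      scores = cosine_similarity(new_doc, X)[0]
      ranked = sorted(zip(headings, scores), key=lambda t: t[1], reverse=True)
      print(ranked)  # suggested headings, best match first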

Languages

  • e 357
  • d 83
  • pt 2
  • m 1

Types

  • a 403
  • el 79
  • m 11
  • p 5
  • s 3
  • x 2