This database contains more than 40,000 documents on topics in the fields of descriptive cataloging, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 24 June 2018)
1. Gursoy, A. ; Wickett, K. ; Feinberg, M.: Understanding tag functions in a moderated, user-generated metadata ecosystem.
In: Journal of documentation. 74(2018) no.3, pp.490-508.
Abstract: Purpose: The purpose of this paper is to investigate tag use in a metadata ecosystem that supports a fan work repository to identify functions of tags and explore the system as a co-constructed communicative context. Design/methodology/approach: Using modified techniques from grounded theory (Charmaz, 2007), this paper integrates humanistic and social science methods to identify kinds of tag use in a rich setting. Findings: Three primary roles of tags emerge out of detailed study of the metadata ecosystem: tags can identify elements in the fan work, tags can reflect on how those elements are used or adapted in the fan work, and finally, tags can express the fan author's sense of her role in the discursive context of the fan work repository. Attending to each of the tag roles shifts focus away from just what tags say to include how they say it. Practical implications: Instead of building metadata systems designed solely for retrieval or description, this research suggests that it may be fruitful to build systems that recognize various metadata functions and allow for expressivity. This research also suggests that attending to metadata previously considered unusable in systems may reflect the participants' sense of the system and their role within it. Originality/value: In addition to accommodating a wider range of tag functions, this research implies consideration of metadata ecosystems, where different kinds of tags do different things and work together to create a multifaceted artifact.
Content: Cf.: https://www.emeraldinsight.com/doi/full/10.1108/JD-09-2017-0134.
2. Mayernik, M.S. ; Acker, A.: Tracing the traces : the critical role of metadata within networked communications.
In: Journal of the Association for Information Science and Technology. 69(2018) no.1, pp.177-180.
Abstract: The information sciences have traditionally been at the center of metadata-focused research. The US National Security Agency (NSA) intelligence documents revealed by Edward Snowden in June of 2013 brought the term "metadata" into the public consciousness. Surprisingly little discussion in the information sciences has since occurred on the nature and importance of metadata within networked communication systems. The collection of digital metadata impacts the ways that people experience social and technical communication. Without such metadata, networked communication cannot exist. The NSA leaks, and numerous recent hacks of corporate and government communications, point to metadata as objects of new scholarly inquiry. If we are to engage in meaningful discussions about our digital traces, or make informed decisions about new policies and technologies, it is essential to develop theoretical and empirical frameworks that account for digital metadata. This opinion paper presents 5 key sociotechnical characteristics of metadata within digital networks that would benefit from stronger engagement by the information sciences.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23927/full.
3. Li, C. ; Sugimoto, S.: Provenance description of metadata application profiles for long-term maintenance of metadata schemas.
In: Journal of documentation. 74(2018) no.1, pp.36-61.
Abstract: Purpose: Provenance information is crucial for consistent maintenance of metadata schemas over time. The purpose of this paper is to propose a provenance model named DSP-PROV to keep track of structural changes of metadata schemas. Design/methodology/approach: The DSP-PROV model is developed through applying the general provenance description standard PROV of the World Wide Web Consortium to the Dublin Core Application Profile. Metadata Application Profile of Digital Public Library of America is selected as a case study to apply the DSP-PROV model. Finally, this paper evaluates the proposed model by comparison between formal provenance description in DSP-PROV and semi-formal change log description in English. Findings: Formal provenance description in the DSP-PROV model has advantages over semi-formal provenance description in English to keep metadata schemas consistent over time. Research limitations/implications: The DSP-PROV model is applicable to keep track of the structural changes of metadata schema over time. Provenance description of other features of metadata schema, such as vocabulary and encoding syntax, is not covered. Originality/value: This study proposes a simple model for provenance description of structural features of metadata schemas based on a few standards widely accepted on the Web and shows the advantage of the proposed model over conventional semi-formal provenance description.
Content: Cf.: http://www.emeraldinsight.com/doi/full/10.1108/JD-03-2017-0042.
4. Haynes, D.: Metadata for information management and retrieval : understanding metadata and its use. 2nd ed.
London : Facet Publishing, 2018. XIV, 267 pp.
Abstract: This new and updated second edition of a classic text provides a thought-provoking introduction to metadata for all library and information students and professionals. Metadata for Information Management and Retrieval has been fully revised by David Haynes to bring it up to date with new technology and standards. The new edition, containing new chapters on Metadata Standards and Encoding Schemes, assesses the current theory and practice of metadata and examines key developments in terms of both policy and technology. Coverage includes: an introduction to the concept of metadata; a description of the main components of metadata systems and standards; an overview of the scope of metadata and its applications; a description of typical information retrieval issues in corporate and research environments; a demonstration of ways in which metadata is used to improve retrieval; a look at ways in which metadata is used to manage information; consideration of the role of metadata in information governance.
RSWK: Informationsmanagement / Information Retrieval / Metadatenmodell
RVK: AN 95000 ; ST 270
5. Gracy, K.F.: Enriching and enhancing moving images with Linked Data : an exploration in the alignment of metadata models.
In: Journal of documentation. 74(2018) no.2, pp.354-371.
Abstract: Purpose: The purpose of this paper is to examine the current state of Linked Data (LD) in archival moving image description, and propose ways in which current metadata records can be enriched and enhanced by interlinking such metadata with relevant information found in other data sets. Design/methodology/approach: Several possible metadata models for moving image production and archiving are considered, including models from records management, digital curation, and the recent BIBFRAME AV Modeling Study. This research also explores how mappings between archival moving image records and relevant external data sources might be drawn, and what gaps exist between current vocabularies and what is needed to record and make accessible the full lifecycle of archiving through production, use, and reuse. Findings: The author notes several major impediments to implementation of LD for archival moving images. The various pieces of information about creators, places, and events found in moving image records are not easily connected to relevant information in other sources because they are often not semantically defined within the record and can be hidden in unstructured fields. Libraries, archives, and museums must work on aligning the various vocabularies and schemas of potential value for archival moving image description to enable interlinking between vocabularies currently in use and those which are used by external data sets. Alignment of vocabularies is often complicated by mismatches in granularity between vocabularies. Research limitations/implications: The focus is on how these models inform functional requirements for access and other archival activities, and how the field might benefit from having a common metadata model for critical archival descriptive activities. Practical implications: By having a shared model, archivists may more easily align current vocabularies and develop new vocabularies and schemas to address the needs of moving image data creators and scholars. Originality/value: Moving image archives, like other cultural institutions with significant heritage holdings, can benefit tremendously from investing in the semantic definition of information found in their information databases. While commercial entities such as search engines and data providers have already embraced the opportunities that semantic search provides for resource discovery, most non-commercial entities are just beginning to do so. Thus, this research addresses the benefits and challenges of enriching and enhancing archival moving image records with semantically defined information via LD.
Content: Cf.: https://www.emeraldinsight.com/doi/full/10.1108/JD-07-2017-0106.
Topic area: Metadata ; Semantic interoperability
6. Cho, H. ; Donovan, A. ; Lee, J.H.: Art in an algorithm : a taxonomy for describing video game visual styles.
In: Journal of the Association for Information Science and Technology. 69(2018) no.5, pp.633-646.
Abstract: The discovery and retrieval of video games in library and information systems is, by and large, dependent on a limited set of descriptive metadata. Noticeably missing from this metadata are classifications of visual style, despite the overwhelmingly visual nature of most video games and the interest in visual style among video game users. One explanation for this paucity is the difficulty in eliciting consistent judgements about visual style, likely due to subjective interpretations of terminology and a lack of demonstrable testing for coinciding judgements. This study presents a taxonomy of video game visual styles constructed from the findings of a 22-participant cataloging user study of visual styles. A detailed description of the study, and its value and shortcomings, are presented along with reflections about the challenges of cultivating consensus about visual style in video games. The high degree of overall agreement in the user study demonstrates the potential value of a descriptor like visual style and the use of a cataloging study in developing visual style taxonomies. The resulting visual style taxonomy, the methods and analysis described herein may help improve the organization and retrieval of video games and possibly other visual materials like graphic designs, illustrations, and animations.
Content: Cf.: https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.23988.
Form covered: Games
7. Çelebi, A. ; Özgür, A.: Segmenting hashtags and analyzing their grammatical structure.
In: Journal of the Association for Information Science and Technology. 69(2018) no.5, pp.675-686.
Abstract: Originated as a label to mark specific tweets, hashtags are increasingly used to convey messages that people like to see in the trending hashtags list. Complex noun phrases and even sentences can be turned into a hashtag. Breaking hashtags into their words is a challenging task due to the irregular and compact nature of the language used in Twitter. In this study, we investigate feature-based machine learning and language model (LM)-based approaches for hashtag segmentation. Our results show that LM alone is not successful at segmenting nontrivial hashtags. However, when the N-best LM-based segmentations are incorporated as features into the feature-based approach, along with context-based features proposed in this study, state-of-the-art results in hashtag segmentation are achieved. In addition, we provide an analysis of over two million distinct hashtags, autosegmented by using our best configuration. The analysis reveals that half of all 60 million hashtag occurrences contain multiple words and 80% of sentiment is trapped inside multiword hashtags, justifying the need for hashtag segmentation. Furthermore, we analyze the grammatical structure of hashtags by parsing them and observe that 77% of the hashtags are noun-based, whereas 11.9% are verb-based.
Content: Cf.: https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.23989.
Topic area: Metadata ; Computational linguistics
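Illustration (not taken from the paper): the language-model-only baseline mentioned in the abstract of entry 7 can be pictured as a dynamic-programming search for the most probable word split. The sketch below assumes a tiny unigram model whose words and probabilities are invented for the example; it does not reproduce the authors' feature-based method or their context features.

```python
import math

# Toy unigram probabilities; hypothetical values for illustration only.
UNIGRAM = {
    "metadata": 0.02, "meta": 0.01, "data": 0.03, "matters": 0.01,
    "matter": 0.01, "s": 0.001, "open": 0.02, "science": 0.02,
}
# Assumed penalty (log-probability) for each character of an unknown word.
UNKNOWN_LOGP = math.log(1e-9)


def segment(hashtag):
    """Return the most probable word split of a hashtag under the toy model."""
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] holds (log-probability, words) for the best split of text[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in UNIGRAM:
                logp = math.log(UNIGRAM[word])
            else:
                logp = UNKNOWN_LOGP * len(word)
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1]


print(segment("#MetadataMatters"))
```

A real system would estimate the probabilities from a large corpus; as the abstract notes, such a model alone tends to fail on nontrivial hashtags, which is why the authors feed its N-best outputs into a feature-based learner.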
8. Tallerås, C. ; Dahl, J.H.B. ; Pharo, N.: User conceptualizations of derivative relationships in the bibliographic universe.
In: Journal of documentation. 74(2018) no.4, pp.894-916.
Abstract: Purpose: Considerable effort is devoted to developing new models for organizing bibliographic metadata. However, such models have been repeatedly criticized for their lack of proper user testing. The purpose of this paper is to present a study on how non-experts in bibliographic systems map the bibliographic universe and, in particular, how they conceptualize relationships between independent but strongly related entities. Design/methodology/approach: The study is based on an open concept-mapping task performed to externalize the conceptualizations of 98 novice students. The conceptualizations of the resulting concept maps are identified and analyzed statistically. Findings: The study shows that the participants' conceptualizations have great variety, differing in detail and granularity. These conceptualizations can be categorized into two main groups according to derivative relationships: those that apply a single-entity model directly relating document entities and those (the majority) that apply a multi-entity model relating documents through a high-level collocating node. These high-level nodes seem to be most adequately interpreted either as superwork devices collocating documents belonging to the same bibliographic family or as devices collocating documents belonging to a shared fictional world. Originality/value: The findings can guide the work to develop bibliographic standards. Based on the diversity of the conceptualizations, the findings also emphasize the need for more user testing of both conceptual models and the bibliographic end-user systems implementing those models.
Content: Cf.: https://www.emeraldinsight.com/doi/full/10.1108/JD-10-2017-0139.
Topic area: Descriptive cataloging ; Metadata
9. Maron, D. ; Feinberg, M.: What does it mean to adopt a metadata standard? : a case study of Omeka and the Dublin Core.
In: Journal of documentation. 74(2018) no.4, pp.674-691.
Abstract: Purpose: The purpose of this paper is to employ a case study of the Omeka content management system to demonstrate how the adoption and implementation of a metadata standard (in this case, Dublin Core) can result in contrasting rhetorical arguments regarding metadata utility, quality, and reliability. In the Omeka example, the authors illustrate a conceptual disconnect in how two metadata stakeholders - standards creators and standards users - operationalize metadata quality. For standards creators such as the Dublin Core community, metadata quality involves implementing a standard properly, according to established usage principles; in contrast, for standards users like Omeka, metadata quality involves mere adoption of the standard, with little consideration of proper usage and accompanying principles. Design/methodology/approach: The paper uses an approach based on rhetorical criticism. The paper aims to establish whether Omeka's given ends (the position that Omeka claims to take regarding Dublin Core) align with Omeka's guiding ends (Omeka's actual argument regarding Dublin Core). To make this assessment, the paper examines both textual evidence (what Omeka says) and material-discursive evidence (what Omeka does). Findings: The evidence shows that, while Omeka appears to argue that adopting the Dublin Core is an integral part of Omeka's mission, the platform's lack of support for Dublin Core implementation makes an opposing argument. Ultimately, Omeka argues that the appearance of adopting a standard is more important than its careful implementation. Originality/value: This study contributes to our understanding of how metadata standards are understood and used in practice. The misalignment between Omeka's position and the goals of the Dublin Core community suggests that Omeka, and some portion of its users, do not value metadata interoperability and aggregation in the same way that the Dublin Core community does. This indicates that, although certain values regarding standards adoption may be pervasive in the metadata community, these values are not equally shared amongst all stakeholders in a digital library ecosystem. The way that standards creators (Dublin Core) understand what it means to "adopt a standard" is different from the way that standards users (Omeka) understand what it means to "adopt a standard."
Content: Cf.: https://www.emeraldinsight.com/doi/full/10.1108/JD-06-2017-0095.
Object: Dublin Core ; Omeka
10. Fidler, B. ; Acker, A.: Metadata, infrastructure, and computer-mediated communication in historical perspective.
In: Journal of the Association for Information Science and Technology. 68(2017) no.2, pp.412-422.
Abstract: In this paper we describe the creation and use of metadata on the early Arpanet as part of normal network function. By using the Arpanet Host-Host Protocol and its sockets as an entry point for studying the generation of metadata, we show that the development and function of key Arpanet infrastructure can be studied by examining the creation and stabilization of metadata. More specifically, we use the Host-Host Protocol's sockets as an example of something that, at the level of the network, functions as both network infrastructure and metadata simultaneously. By presenting the function of sockets in tandem with an overview of the Host-Host Protocol, we argue for the further integrated study of infrastructure and metadata. Finally, we reintroduce the concept of infradata to refer specifically to data that locate data throughout an infrastructure and are required by the infrastructure to function, separating them from established and stabilized standards. We argue for the future application of infradata as a concept for the study of histories and political economies of networks, bridging the largely library and information science (LIS) study of metadata with the largely science and technology studies (STS) domain of infrastructure.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23660/full.
11. Wallis, R. ; Isaac, A. ; Charles, V. ; Manguinhas, H.: Recommendations for the application of Schema.org to aggregated cultural heritage metadata to increase relevance and visibility to search engines : the case of Europeana.
In: Code4Lib journal. Issue 36(2017), [http://journal.code4lib.org].
Abstract: Europeana provides access to more than 54 million cultural heritage objects through its portal Europeana Collections. It is crucial for Europeana to be recognized by search engines as a trusted authoritative repository of cultural heritage objects. Indeed, even though its portal is the main entry point, most Europeana users come to it via search engines. Europeana Collections is fuelled by metadata describing cultural objects, represented in the Europeana Data Model (EDM). This paper presents the research and consequent recommendations for publishing Europeana metadata using the Schema.org vocabulary and best practices. Schema.org metadata is embedded in HTML pages to be consumed by search engines to power rich services (such as Google Knowledge Graph). Schema.org is an open and widely adopted initiative (used by over 12 million domains) backed by Google, Bing, Yahoo!, and Yandex, for sharing metadata across the web. It underpins the emergence of new web techniques, such as so-called Semantic SEO. Our research addressed the representation of the embedded metadata as part of the Europeana HTML pages and sitemaps so that the re-use of this data can be optimized. The practical objective of our work is to produce a Schema.org representation of Europeana resources described in EDM that is as rich as possible and tailored to Europeana's realities and user needs as well as to the search engines and their users.
Content: Cf.: http://journal.code4lib.org/articles/12330.
Object: Schema.org ; Europeana
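Illustration (not taken from the paper): embedding Schema.org metadata in an HTML page, as the abstract of entry 11 describes, is commonly done with a JSON-LD script element. The sketch below assumes a hypothetical record dictionary keyed by a few EDM-style property names and a deliberately simple mapping to `CreativeWork`; the mapping actually recommended for Europeana is given in the article itself and is not reproduced here.

```python
import json


def to_jsonld_script(record):
    """Render a Schema.org JSON-LD <script> element for one record.

    The input keys (dc:title, dc:creator, edm:isShownAt) and the mapping to
    Schema.org properties are assumptions made for this example.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": record["dc:title"],
        "creator": {"@type": "Person", "name": record["dc:creator"]},
        "url": record["edm:isShownAt"],
    }
    payload = json.dumps(doc, ensure_ascii=False, indent=2)
    # Escape "<" so a value containing "</script>" cannot end the element early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Made-up sample record.
print(to_jsonld_script({
    "dc:title": "View of Delft",
    "dc:creator": "Johannes Vermeer",
    "edm:isShownAt": "https://example.org/item/1",
}))
```

The resulting element would be placed in the page's HTML so that crawlers can read the description without parsing the visible markup.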
12. Edmunds, J.: Roadmap to nowhere : BIBFLOW, BIBFRAME, and linked data for libraries. [21.04.2017].
Abstract: On December 12, 2016, Carl Stahmer and MacKenzie Smith presented at the CNI Members Fall Meeting about the BIBFLOW project, self-described on Twitter as "a two-year project of the UC Davis University Library and Zepheira investigating the future of library technical services." In her opening remarks, Ms. Smith, University Librarian at UC Davis, stated that one of the goals of the project was to devise a roadmap "to get from where we are today, which is kind of the 1970s with a little lipstick on it, to 2020, which is where we're going to be very soon." The notion that where libraries are today is somehow behind the times is one of the commonly heard rationales behind a move to linked data. Stated more precisely:
- Libraries devote considerable time and resources to producing high-quality bibliographic metadata
- This metadata is stored in unconnected silos
- This metadata is in a format (MARC) that is incompatible with technologies of the emerging Semantic Web
- The visibility of library metadata is diminished as a result of the two points above
Are these assertions true? If yes, is linked data the solution?
Topic area: Descriptive cataloging ; Metadata
Object: BIBFLOW ; BIBFRAME
13. Belém, F.M. ; Almeida, J.M. ; Gonçalves, M.A.: A survey on tag recommendation methods : a review.
In: Journal of the Association for Information Science and Technology. 68(2017) no.4, pp.830-844.
Abstract: Tags (keywords freely assigned by users to describe web content) have become highly popular on Web 2.0 applications, because of the strong stimuli and easiness for users to create and describe their own content. This increase in tag popularity has led to a vast literature on tag recommendation methods. These methods aim at assisting users in the tagging process, possibly increasing the quality of the generated tags and, consequently, improving the quality of the information retrieval (IR) services that rely on tags as data sources. Regardless of the numerous and diversified previous studies on tag recommendation, to our knowledge, no previous work has summarized and organized them into a single survey article. In this article, we propose a taxonomy for tag recommendation methods, classifying them according to the target of the recommendations, their objectives, exploited data sources, and underlying techniques. Moreover, we provide a critical overview of these methods, pointing out their advantages and disadvantages. Finally, we describe the main open challenges related to the field, such as tag ambiguity, cold start, and evaluation issues.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23736/full.
14. Stiller, J. ; Király, P.: Multilinguality of metadata : measuring the multilingual degree of Europeana's metadata.
In: Everything changes, everything stays the same? - Understanding information spaces : Proceedings of the 15th International Symposium of Information Science (ISI 2017), Berlin/Germany, 13th - 15th March 2017. Eds.: M. Gäde, V. Trkulja and V. Petras. vwh-Verlag : Glückstadt, 2017. pp.164-176.
(Schriften zur Informationswissenschaft; vol. 70)
Content: Cf.: http://www.vwh-verlag.de/vwh/wp-content/uploads/2017/03/titelei_isi17.pdf.
Topic area: Metadata ; Multilingual problems
15. Niininen, S. ; Nykyri, S. ; Suominen, O.: The future of metadata : open, linked, and multilingual - the YSO case.
In: Journal of documentation. 73(2017) no.3, pp.451-465.
Abstract: Purpose: The purpose of this paper is threefold: to focus on the process of multilingual concept scheme construction and the challenges involved; to address concrete challenges faced in the construction process, especially those related to equivalence between terms and concepts; and to briefly outline the translation strategies developed during the process of concept scheme construction. Design/methodology/approach: The analysis is based on experience acquired during the establishment of the Finnish thesaurus and ontology service Finto as well as the trilingual General Finnish Ontology YSO, both of which are being maintained and further developed at the National Library of Finland. Findings: Although uniform resource identifiers can be considered language-independent, they do not render concept schemes and their construction free of language-related challenges. The fundamental issue with all the challenges faced is how to maintain consistency and predictability when the nature of language requires each concept to be treated individually. The key to such challenges is to recognise the function of the vocabulary and the needs of its intended users. Social implications: Open science increases the transparency of not only research products, but also metadata tools. Gaining a deeper understanding of the challenges involved in their construction is important for a great variety of users - e.g. indexers, vocabulary builders and information seekers. Today, multilingualism is an essential aspect at both the national and international information society level. Originality/value: This paper draws on the practical challenges faced in concept scheme construction in a trilingual environment, with a focus on "concept scheme" as a translation and mapping unit.
Content: Cf.: http://www.emeraldinsight.com/doi/full/10.1108/JD-06-2016-0084.
Topic area: Metadata ; Multilingual problems
16. Suominen, O. ; Hyvönen, N.: From MARC silos to Linked Data silos?
In: o-bib: Das offene Bibliotheksjournal. 4(2017) no.2, pp.1-13.
Abstract: For some time now, libraries have increasingly been making their bibliographic metadata openly available as Linked Data. However, quite different models are used to structure the bibliographic data. Some libraries use a FRBR-based model with several layers of entities, while others use flat, record-oriented models. This proliferation of data models makes the bibliographic data harder to reuse. As a result, libraries have merely exchanged their former MARC silos for mutually incompatible Linked Data silos, and it is often difficult to combine and reuse data sets. Minor differences in data modelling can be handled with schema mappings, but it seems questionable whether interoperability has increased overall. The article presents the results of a study of various published sets of bibliographic data. It also examines the different models for representing bibliographic data as RDF, as well as tools for generating such data from the MARC format. Finally, it discusses the approach taken by the National Library of Finland.
Content: https://www.o-bib.de/article/view/2017H2S1-13. DOI: https://doi.org/10.5282/o-bib/2017H2S1-13.
Topic area: Data formats ; Metadata ; Semantic interoperability
Object: MARC ; RDA
17. Hook, P.A. ; Gantchev, A.: Using combined metadata sources to visualize a small library (OBL's English Language Books).
In: http://www.iskocus.org/NASKO2017papers/NASKO2017_paper_24.pdf [NASKO 2017, June 15-16, 2017, Champaign, IL, USA].
Abstract: Data from multiple knowledge organization systems are combined to provide a global overview of the content holdings of a small personal library. Subject headings and classification data are used to effectively map the combined book and topic space of the library. While harvested and manipulated by hand, the work reveals issues and potential solutions when using automated techniques to produce topic maps of much larger libraries. The small library visualized consists of the thirty-nine, digital, English language books found in the Osama Bin Laden (OBL) compound in Abbottabad, Pakistan upon his death. As this list of books has garnered considerable media attention, it is worth providing a visual overview of the subject content of these books - some of which is not readily apparent from the titles. Metadata from subject headings and classification numbers was combined to create book-subject maps. Tree maps of the classification data were also produced. The books contain 328 subject headings. In order to enhance the base map with meaningful thematic overlay, library holding count data was also harvested (and aggregated from duplicates). This additional data revealed the relative scarcity or popularity of individual books.
Content: Paper presented at: NASKO 2017: Visualizing Knowledge Organization: Bringing Focus to Abstract Realities. The sixth North American Symposium on Knowledge Organization (NASKO 2017), June 15-16, 2017, in Champaign, IL, USA.
Topic area: Metadata ; Semantic interoperability ; Visualization
18. Bartczak, J. ; Glendon, I.: Python, Google Sheets, and the Thesaurus for Graphic Materials for efficient metadata project workflows.
In: Code4Lib journal. Issue 35(2017), [http://journal.code4lib.org].
Abstract: In 2017, the University of Virginia (U.Va.) will launch a two year initiative to celebrate the bicentennial anniversary of the University's founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated with generating digital content and accompanying description within the context of limited resources. This paper describes how technology and new approaches to metadata design have enabled the University of Virginia's Metadata Analysis and Design Department to rapidly and successfully generate accurate description for these digital objects. Python's pandas module improves efficiency by cleaning and repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata creation and data quality control.
Content: Cf.: http://journal.code4lib.org/articles/12182.
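Illustration (not taken from the paper): the abstract of entry 18 describes building MODS XML programmatically from CSV tables. The sketch below shows the general idea using only the Python standard library (`csv` and `xml.etree.ElementTree`) as a stand-in for the pandas and lxml modules the authors name. The column names `title`, `date`, and `subjects`, the `|` separator, and the sample row are assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a MODS element name with its namespace."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row):
    """Build one mods:mods element from a CSV row (hypothetical columns)."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["title"].strip()
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateCreated")).text = row["date"].strip()
    # Multiple subject terms are assumed to be separated by "|".
    for topic in row["subjects"].split("|"):
        if topic.strip():
            subject = ET.SubElement(mods, q("subject"))
            ET.SubElement(subject, q("topic")).text = topic.strip()
    return mods


def csv_to_mods(csv_text):
    """Convert CSV text with a header row into a list of MODS elements."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_mods(row) for row in reader]


# Made-up sample data, including stray whitespace to be cleaned.
SAMPLE = "title,date,subjects\n Students on the Lawn ,1968,Students|Universities & colleges\n"
records = csv_to_mods(SAMPLE)
print(ET.tostring(records[0], encoding="unicode"))
```

Keeping the row-to-element mapping in one small function makes it easy to review against the spreadsheet columns, which is the kind of quality control the abstract emphasizes.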
19. Neumann, M. ; Steinberg, J. ; Schaer, P.: Web-scraping for non-programmers : introducing OXPath for digital library metadata harvesting.
In: Code4Lib journal. Issue 38(2017), [http://journal.code4lib.org].
Abstract: Building up new collections for digital libraries is a demanding task. Available data sets have to be extracted which is usually done with the help of software developers as it involves custom data handlers or conversion scripts. In cases where the desired data is only available on the data provider's website custom web scrapers are needed. This may be the case for small to medium-size publishers, research institutes or funding agencies. As data curation is a typical task that is done by people with a library and information science background, these people are usually proficient with XML technologies but are not full-stack programmers. Therefore we would like to present a web scraping tool that does not demand the digital library curators to program custom web scrapers from scratch. We present the open-source tool OXPath, an extension of XPath, that allows the user to define data to be extracted from websites in a declarative way. By taking one of our own use cases as an example, we guide you in more detail through the process of creating an OXPath wrapper for metadata harvesting. We also point out some practical things to consider when creating a web scraper (with OXPath). On top of that, we also present a syntax highlighting plugin for the popular text editor Atom that we developed to further support OXPath users and to simplify the authoring process.
Content: Cf.: http://journal.code4lib.org/articles/13007.
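Illustration (not taken from the paper): OXPath, introduced in entry 19, extends XPath so that the data to be extracted is declared by path expressions instead of being collected by hand-written procedural code. The sketch below is not OXPath and does not use OXPath syntax. It only shows the underlying idea with the limited XPath subset supported by Python's standard `xml.etree.ElementTree`, applied to a made-up, well-formed page snippet. The class names `pub`, `title`, and `year` are assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page fragment standing in for a publisher's website.
PAGE = """<html><body>
<div class="pub"><span class="title">Paper A</span><span class="year">2016</span></div>
<div class="pub"><span class="title">Paper B</span><span class="year">2017</span></div>
<div class="ad"><span class="title">Not a publication</span></div>
</body></html>"""


def harvest(page):
    """Extract one metadata record per publication block via path expressions."""
    root = ET.fromstring(page)
    records = []
    # Select only the div elements marked as publications.
    for pub in root.findall(".//div[@class='pub']"):
        records.append({
            "title": pub.findtext("span[@class='title']"),
            "year": pub.findtext("span[@class='year']"),
        })
    return records


for record in harvest(PAGE):
    print(record)
```

Real web pages are rarely well-formed XML and often need interaction such as clicking or paging, which is the gap a dedicated tool like the one presented in the article is meant to fill.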
20. Hardesty, J.L. ; Young, J.B.: The semantics of metadata : Avalon Media System and the move to RDF.
In: Code4Lib journal. Issue 37(2017), [http://journal.code4lib.org].
Abstract: The Avalon Media System (Avalon) provides access and management for digital audio and video collections in libraries and archives. The open source project is led by the libraries of Indiana University Bloomington and Northwestern University and is funded in part by grants from The Andrew W. Mellon Foundation and Institute of Museum and Library Services. Avalon is based on the Samvera Community (formerly Hydra Project) software stack and uses Fedora as the digital repository back end. The Avalon project team is in the process of migrating digital repositories from Fedora 3 to Fedora 4 and incorporating metadata statements using the Resource Description Framework (RDF) instead of XML files accompanying the digital objects in the repository. The Avalon team has worked on the migration path for technical metadata and is now working on the migration paths for structural metadata (PCDM) and descriptive metadata (from MODS XML to RDF). This paper covers the decisions made to begin using RDF for software development and offers a window into how Semantic Web technology functions in the real world.
Content: Cf.: http://journal.code4lib.org/articles/12668.
Object: Avalon ; RDF
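Illustration (not taken from the paper): the move from MODS XML to RDF statements described in the abstract of entry 20 amounts to turning nested XML elements into subject-predicate-object triples. The sketch below emits N-Triples lines using Dublin Core element predicates. The choice of predicates, the subject URI, and the sample record are assumptions made for the example and do not represent the mapping chosen by the Avalon team.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"
DC = "http://purl.org/dc/elements/1.1/"

# Made-up sample record.
SAMPLE_MODS = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Campus tour</title></titleInfo>"
    "<subject><topic>Universities</topic></subject>"
    "</mods>"
)


def literal(text):
    """Format a plain string as an N-Triples literal with basic escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def mods_to_ntriples(mods_xml, subject_uri):
    """Map the title and topics of a MODS record to N-Triples statements."""
    root = ET.fromstring(mods_xml)
    triples = []
    title = root.findtext(f"{MODS}titleInfo/{MODS}title")
    if title:
        triples.append(f"<{subject_uri}> <{DC}title> {literal(title)} .")
    for topic in root.iterfind(f"{MODS}subject/{MODS}topic"):
        if topic.text:
            triples.append(f"<{subject_uri}> <{DC}subject> {literal(topic.text)} .")
    return triples


for line in mods_to_ntriples(SAMPLE_MODS, "http://example.org/media/1"):
    print(line)
```

Each statement stands on its own, which is one reason RDF statements can be stored in the repository itself instead of in XML files that accompany the digital objects.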