Literatur zur Informationserschließung
Diese Datenbank enthält über 40.000 Dokumente zu Themen aus den Bereichen Formalerschließung – Inhaltserschließung – Information Retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft
/
Powered by litecat, BIS Oldenburg
(Stand: 28. April 2022)
Suche
Suchergebnisse
Treffer 1–2 von 2
sortiert nach:
-
1Hodges, D.W. ; Schlottmann, K.: better archival migration outcomes with Python and the Google Sheets API : Reporting from the archives.
In: Code4Lib journal. Issue 46(2019), [http://journal.code4lib.org].
Abstract: Columbia University Libraries recently embarked on a multi-phase project to migrate nearly 4,000 records describing over 70,000 linear feet of archival material from disparate sources and formats into ArchivesSpace. This paper discusses tools and methods brought to bear in Phase 2 of this project, which required us to look closely at how to integrate a large number of legacy finding aids into the new system and merge descriptive data that had diverged in myriad ways. Using Python, XSLT, and a widely available if underappreciated resource-the Google Sheets API-archival and technical library staff devised ways to efficiently report data from different sources, and present it in an accessible, user-friendly way,. Responses were then fed back into automated data remediation processes to keep the migration project on track and minimize manual intervention. The scripts and processes developed proved very effective, and moreover, show promise well beyond the ArchivesSpace migration. This paper describes the Python/XSLT/Sheets API processes developed and how they opened a path to move beyond CSV-based reporting with flexible, ad-hoc data interfaces easily adaptable to meet a variety of purposes.
Inhalt: Vgl.: https://journal.code4lib.org/articles/14871.
Themenfeld: Metadaten
Objekt: Google Sheets API ; Python
-
2Klic, L. ; Miller, M. ; Nelson, J.K. ; Germann, J.E.: Approaching the largest 'API' : extracting information from the Internet with Python.
In: Code4Lib journal. Issue 39(2018), [http://journal.code4lib.org].
Abstract: This article explores the need for libraries to algorithmically access and manipulate the world's largest API: the Internet. The billions of pages on the 'Internet API' (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for all sized projects. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.
Inhalt: Vgl.: http://journal.code4lib.org/articles/13197.
Themenfeld: Internet
Objekt: Python