Search (46 results, page 1 of 3)

  • theme_ss:"Metadaten"
  • type_ss:"a"
  • year_i:[2010 TO 2020}
  1. Wallis, R.; Isaac, A.; Charles, V.; Manguinhas, H.: Recommendations for the application of Schema.org to aggregated cultural heritage metadata to increase relevance and visibility to search engines : the case of Europeana (2017) 0.06
    0.057456337 = product of:
      0.114912674 = sum of:
        0.041137107 = weight(_text_:web in 3372) [ClassicSimilarity], result of:
          0.041137107 = score(doc=3372,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 3372, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3372)
        0.07377557 = weight(_text_:search in 3372) [ClassicSimilarity], result of:
          0.07377557 = score(doc=3372,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.4293381 = fieldWeight in 3372, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3372)
      0.5 = coord(2/4)
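    The explanation above is standard Lucene ClassicSimilarity (TF-IDF) output; as a reading aid, the contribution of the term "web" can be restated in formula form using exactly the numbers shown (no new values are introduced):

      \mathrm{weight}(t,d) \;=\; \underbrace{\mathrm{idf}(t)\cdot\mathrm{queryNorm}}_{\mathrm{queryWeight}} \;\times\; \underbrace{\sqrt{\mathrm{freq}(t,d)}\cdot\mathrm{idf}(t)\cdot\mathrm{fieldNorm}(d)}_{\mathrm{fieldWeight}}

      \mathrm{weight}(\text{web},3372) = (3.2635105\cdot 0.049439456)\cdot(\sqrt{4}\cdot 3.2635105\cdot 0.0390625) \approx 0.16134618\cdot 0.25496176 \approx 0.041137107

      \mathrm{score}(3372) = \mathrm{coord}(2/4)\cdot(0.041137107 + 0.07377557) = 0.5\cdot 0.114912674 \approx 0.057456337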
    
    Abstract
    Europeana provides access to more than 54 million cultural heritage objects through its portal Europeana Collections. It is crucial for Europeana to be recognized by search engines as a trusted authoritative repository of cultural heritage objects. Indeed, even though its portal is the main entry point, most Europeana users come to it via search engines. Europeana Collections is fuelled by metadata describing cultural objects, represented in the Europeana Data Model (EDM). This paper presents the research and consequent recommendations for publishing Europeana metadata using the Schema.org vocabulary and best practices. Schema.org metadata embedded in HTML can be consumed by search engines to power rich services (such as the Google Knowledge Graph). Schema.org is an open and widely adopted initiative (used by over 12 million domains) backed by Google, Bing, Yahoo!, and Yandex, for sharing metadata across the web. It underpins the emergence of new web techniques, such as so-called Semantic SEO. Our research addressed the representation of the embedded metadata as part of the Europeana HTML pages and sitemaps so that the re-use of this data can be optimized. The practical objective of our work is to produce a Schema.org representation of Europeana resources described in EDM that is as rich as possible and tailored to Europeana's realities and user needs, as well as to the search engines and their users.
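    As a rough illustration of the kind of embedding discussed here (not the actual Europeana/EDM-to-Schema.org mapping, which is what the paper's recommendations specify; all values below are invented), Schema.org metadata is commonly serialized as JSON-LD inside a script element of the object page. A minimal Python sketch:

      import json

      # Hypothetical cultural heritage object; the real mapping from EDM to
      # Schema.org is what the paper's recommendations define.
      record = {
          "@context": "https://schema.org",
          "@type": "CreativeWork",
          "name": "Example painting",
          "creator": {"@type": "Person", "name": "Example Artist"},
          "dateCreated": "1650",
          "provider": {"@type": "Organization", "name": "Example Museum"},
          "url": "https://example.org/object/123",
      }

      # Embed the JSON-LD in the object's HTML page so crawlers can pick it up.
      html_snippet = ('<script type="application/ld+json">\n'
                      + json.dumps(record, indent=2)
                      + '\n</script>')
      print(html_snippet)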
  2. Roux, M.: Metadata for search engines : what can be learned from e-Sciences? (2012) 0.06
    0.057045117 = product of:
      0.114090234 = sum of:
        0.03490599 = weight(_text_:web in 96) [ClassicSimilarity], result of:
          0.03490599 = score(doc=96,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.21634221 = fieldWeight in 96, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=96)
        0.07918424 = weight(_text_:search in 96) [ClassicSimilarity], result of:
          0.07918424 = score(doc=96,freq=8.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.460814 = fieldWeight in 96, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=96)
      0.5 = coord(2/4)
    
    Abstract
    E-sciences are data-intensive sciences that make extensive use of the Web to share, collect, and process data. In this context, primary scientific data is becoming a new and challenging issue, as data must be extensively described (1) to account for the empirical conditions and results that allow interpretation and/or analysis, and (2) to be understandable by the computers used for data storage and information retrieval. In this respect, metadata is a focal point, whether considered from the point of view of the user, who visualizes and exploits data, or from that of the search tools, which find and retrieve information. Numerous disciplines are concerned with the issues of describing complex observations and addressing pertinent knowledge. In this paper, similarities and differences in data description and exploration strategies among disciplines in the e-sciences are examined.
    Footnote
    Vgl.: http://www.igi-global.com/book/next-generation-search-engines/64420.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
  3. Peters, I.; Stock, W.G.: Power tags in information retrieval (2010) 0.05
    0.051431946 = product of:
      0.10286389 = sum of:
        0.029088326 = weight(_text_:web in 865) [ClassicSimilarity], result of:
          0.029088326 = score(doc=865,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 865, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=865)
        0.07377557 = weight(_text_:search in 865) [ClassicSimilarity], result of:
          0.07377557 = score(doc=865,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.4293381 = fieldWeight in 865, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=865)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - Many Web 2.0 services (including Library 2.0 catalogs) make use of folksonomies. The purpose of this paper is to cut off all tags in the long tail of a document-specific tag distribution. The remaining tags at the beginning of a tag distribution are considered power tags and form a new, additional search option in information retrieval systems. Design/methodology/approach - In a theoretical approach the paper discusses document-specific tag distributions (power law and inverse-logistic shape), the development of such distributions (Yule-Simon process and shuffling theory) and introduces search tags (besides the well-known index tags) as a possibility for generating tag distributions. Findings - Search tags are compatible with broad and narrow folksonomies and with all knowledge organization systems (e.g. classification systems and thesauri), while index tags are only applicable in broad folksonomies. Based on these findings, the paper presents a sketch of an algorithm for mining and processing power tags in information retrieval systems. Research limitations/implications - This conceptual approach is in need of empirical evaluation in a concrete retrieval system. Practical implications - Power tags are a new search option for retrieval systems to limit the number of hits. Originality/value - The paper introduces power tags as a means for enhancing the precision of search results in information retrieval systems that apply folksonomies, e.g. catalogs in Library 2.0 environments.
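    The paper only sketches its algorithm; the toy example below (a deliberate oversimplification with invented data, not the authors' procedure) shows the basic idea of cutting off the long tail of a document-specific tag distribution and keeping the head as power tags:

      from collections import Counter

      # Hypothetical document-specific tag assignments.
      tag_assignments = [
          "metadata", "metadata", "metadata", "folksonomy", "folksonomy",
          "tagging", "library2.0", "web2.0", "opac", "cataloging",
      ]

      def power_tags(tags, threshold=0.5):
          """Keep tags whose frequency is at least `threshold` times that of
          the most frequent tag; the paper instead derives the cut-off from
          the (power-law or inverse-logistic) shape of the distribution."""
          counts = Counter(tags)
          top = max(counts.values())
          return [t for t, c in counts.most_common() if c >= threshold * top]

      print(power_tags(tag_assignments))  # ['metadata', 'folksonomy']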
  4. Sturmane, A.; Eglite, E.; Jankevica-Balode, M.: Subject metadata development for digital resources in Latvia (2014) 0.04
    0.043457236 = product of:
      0.08691447 = sum of:
        0.04072366 = weight(_text_:web in 1963) [ClassicSimilarity], result of:
          0.04072366 = score(doc=1963,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25239927 = fieldWeight in 1963, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1963)
        0.046190813 = weight(_text_:search in 1963) [ClassicSimilarity], result of:
          0.046190813 = score(doc=1963,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.2688082 = fieldWeight in 1963, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1963)
      0.5 = coord(2/4)
    
    Abstract
    The National Library of Latvia (NLL) decided to use the Library of Congress Subject Headings (LCSH) in 2000. At present the NLL Subject Headings Database in Latvian holds approximately 34,000 subject headings and is used for subject cataloging of textual resources, including articles from serials. For digital objects the NLL uses a system similar to the Faceted Application of Subject Terminology (FAST). We successfully use it in the project "In Search of Lost Latvia," one of the milestones in the development of the subject cataloging of digital resources in Latvia.
    Footnote
    Contribution in a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web", containing papers from the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.
  5. Ilik, V.; Storlien, J.; Olivarez, J.: Metadata makeover (2014) 0.03
    0.032083966 = product of:
      0.06416793 = sum of:
        0.04072366 = weight(_text_:web in 2606) [ClassicSimilarity], result of:
          0.04072366 = score(doc=2606,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25239927 = fieldWeight in 2606, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2606)
        0.023444273 = product of:
          0.046888545 = sum of:
            0.046888545 = weight(_text_:22 in 2606) [ClassicSimilarity], result of:
              0.046888545 = score(doc=2606,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.2708308 = fieldWeight in 2606, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2606)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Catalogers have become fluent in information technologies such as web design, HyperText Markup Language (HTML), Cascading Stylesheets (CSS), eXtensible Markup Language (XML), and programming languages. The knowledge gained from learning information technology can be used to experiment with methods of transforming one metadata schema into another using various software solutions. This paper will discuss the use of eXtensible Stylesheet Language Transformations (XSLT) for repurposing, editing, and reformatting metadata. Catalogers have the requisite skills for working with any metadata schema, and if they are excluded from metadata work, libraries are wasting a valuable human resource.
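    As a minimal sketch of the XSLT-based repurposing described above (the record structure and the stylesheet are invented for illustration; a real crosswalk is schema-specific), Python's lxml can apply a stylesheet that maps one metadata schema onto another:

      from lxml import etree

      # Toy source record and a toy stylesheet mapping <title> to dc:title.
      source = etree.XML("<record><title>Metadata makeover</title></record>")
      stylesheet = etree.XML("""
      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
        <xsl:template match="/record">
          <dc:title><xsl:value-of select="title"/></dc:title>
        </xsl:template>
      </xsl:stylesheet>
      """)

      transform = etree.XSLT(stylesheet)
      print(etree.tostring(transform(source), pretty_print=True).decode())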
    Date
    10. 9.2000 17:38:22
  6. White, M.: ¬The value of taxonomies, thesauri and metadata in enterprise search (2016) 0.03
    0.029739885 = product of:
      0.11895954 = sum of:
        0.11895954 = weight(_text_:search in 2964) [ClassicSimilarity], result of:
          0.11895954 = score(doc=2964,freq=26.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.69228697 = fieldWeight in 2964, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2964)
      0.25 = coord(1/4)
    
    Abstract
    Although the technical, mathematical and linguistic principles of search date back to the early 1960s and enterprise search applications have been commercially available since the 1980s, it is only since the launch of Microsoft SharePoint 2010 and the integration of the Apache Lucene and Solr projects in 2010 that there has been a wider adoption of enterprise search applications. Surveys carried out over the last five years indicate that although enterprises accept that search applications are essential in locating information, there has not been any significant investment in search teams to support these applications. Where taxonomies, thesauri and metadata have been used to improve the search user interface and enhance the search experience, the indications are that levels of search satisfaction are significantly higher. The challenges faced by search managers in developing and maintaining these tools include a lack of published research on the use of these tools and difficulty in recruiting search team members with the requisite skills and experience. There would seem to be an important and immediate opportunity to bring together the research, knowledge organization and enterprise search communities to explore how good practice in the use of taxonomies, thesauri and metadata in enterprise search can be established, enhanced and promoted.
  7. Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: ¬A survey on tag recommendation methods : a review (2017) 0.03
    0.028941508 = product of:
      0.057883017 = sum of:
        0.041137107 = weight(_text_:web in 3524) [ClassicSimilarity], result of:
          0.041137107 = score(doc=3524,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 3524, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3524)
        0.01674591 = product of:
          0.03349182 = sum of:
            0.03349182 = weight(_text_:22 in 3524) [ClassicSimilarity], result of:
              0.03349182 = score(doc=3524,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.19345059 = fieldWeight in 3524, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3524)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Tags (keywords freely assigned by users to describe web content) have become highly popular in Web 2.0 applications because of the strong incentives and the ease with which users can create and describe their own content. This increase in tag popularity has led to a vast literature on tag recommendation methods. These methods aim at assisting users in the tagging process, possibly increasing the quality of the generated tags and, consequently, improving the quality of the information retrieval (IR) services that rely on tags as data sources. Despite the numerous and diverse previous studies on tag recommendation, to our knowledge no previous work has summarized and organized them into a single survey article. In this article, we propose a taxonomy for tag recommendation methods, classifying them according to the target of the recommendations, their objectives, exploited data sources, and underlying techniques. Moreover, we provide a critical overview of these methods, pointing out their advantages and disadvantages. Finally, we describe the main open challenges related to the field, such as tag ambiguity, cold start, and evaluation issues.
    Date
    16.11.2017 13:30:22
  8. Roy, W.; Gray, C.: Preparing existing metadata for repository batch import : a recipe for a fickle food (2018) 0.03
    0.028941508 = product of:
      0.057883017 = sum of:
        0.041137107 = weight(_text_:web in 4550) [ClassicSimilarity], result of:
          0.041137107 = score(doc=4550,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 4550, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4550)
        0.01674591 = product of:
          0.03349182 = sum of:
            0.03349182 = weight(_text_:22 in 4550) [ClassicSimilarity], result of:
              0.03349182 = score(doc=4550,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.19345059 = fieldWeight in 4550, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4550)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In 2016, the University of Waterloo began offering a mediated copyright review and deposit service to support the growth of our institutional repository UWSpace. This resulted in the need to batch import large lists of published works into the institutional repository quickly and accurately. A range of methods have been proposed for harvesting publications metadata en masse, but many technological solutions can easily become detached from a workflow that is both reproducible for support staff and applicable to a range of situations. Many repositories offer the capacity for batch upload via CSV, so our method provides a template Python script that leverages the Habanero library for populating CSV files with existing metadata retrieved from the CrossRef API. In our case, we have combined this with useful metadata contained in a TSV file downloaded from Web of Science in order to enrich our metadata as well. The appeal of this 'low-maintenance' method is that it provides more robust options for gathering metadata semi-automatically, and only requires the user's ability to access Web of Science and the Python program, while still remaining flexible enough for local customizations.
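    A stripped-down sketch of the kind of script the abstract describes (not the Waterloo template itself; the DOIs and CSV columns are invented, and it assumes the habanero CrossRef client):

      import csv
      from habanero import Crossref

      cr = Crossref()
      dois = ["10.1000/example1", "10.1000/example2"]  # placeholder DOIs

      with open("batch_import.csv", "w", newline="") as fh:
          writer = csv.writer(fh)
          writer.writerow(["doi", "title", "year"])
          for doi in dois:
              msg = cr.works(ids=doi)["message"]  # CrossRef metadata record
              title = (msg.get("title") or [""])[0]
              year = msg.get("issued", {}).get("date-parts", [[None]])[0][0]
              writer.writerow([doi, title, year])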
    Date
    10.11.2018 16:27:22
  9. Bogaard, T.; Hollink, L.; Wielemaker, J.; Ossenbruggen, J. van; Hardman, L.: Metadata categorization for identifying search patterns in a digital library (2019) 0.03
    0.028573154 = product of:
      0.114292614 = sum of:
        0.114292614 = weight(_text_:search in 5281) [ClassicSimilarity], result of:
          0.114292614 = score(doc=5281,freq=24.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.66512775 = fieldWeight in 5281, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5281)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - For digital libraries, it is useful to understand how users search in a collection. Investigating search patterns can help them to improve the user interface, collection management and search algorithms. However, search patterns may vary widely in different parts of a collection. The purpose of this paper is to demonstrate how to identify these search patterns within a well-curated historical newspaper collection using the existing metadata. Design/methodology/approach - The authors analyzed search logs combined with metadata records describing the content of the collection, using this metadata to create subsets in the logs corresponding to different parts of the collection. Findings - The study shows that faceted search is more prevalent than non-faceted search in terms of number of unique queries, time spent, clicks and downloads. Distinct search patterns are observed in different parts of the collection, corresponding to historical periods, geographical regions or subject matter. Originality/value - First, this study provides deeper insights into search behavior at a fine granularity in a historical newspaper collection, by the inclusion of the metadata in the analysis. Second, it demonstrates how to use metadata categorization as a way to analyze distinct search patterns in a collection.
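    A simplified illustration of the approach (the log and metadata fields below are hypothetical, not the data model of the newspaper collection): metadata is joined to the log entries to split them into collection parts before per-part statistics are computed.

      from collections import defaultdict

      # Hypothetical metadata: document id -> part of the collection.
      metadata = {"d1": "WWII", "d2": "WWII", "d3": "19th century"}

      # Hypothetical search log entries: (clicked document id, facets used?).
      log = [("d1", True), ("d2", False), ("d2", True), ("d3", True)]

      stats = defaultdict(lambda: {"faceted": 0, "non_faceted": 0})
      for doc_id, faceted in log:
          part = metadata.get(doc_id, "unknown")  # subset of the collection
          stats[part]["faceted" if faceted else "non_faceted"] += 1

      for part, counts in stats.items():
          print(part, counts)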
  10. Wartburg, K. von; Sibille, C.; Aliverti, C.: Metadata collaboration between the Swiss National Library and research institutions in the field of Swiss historiography (2019) 0.03
    0.02750054 = product of:
      0.05500108 = sum of:
        0.03490599 = weight(_text_:web in 5272) [ClassicSimilarity], result of:
          0.03490599 = score(doc=5272,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.21634221 = fieldWeight in 5272, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5272)
        0.02009509 = product of:
          0.04019018 = sum of:
            0.04019018 = weight(_text_:22 in 5272) [ClassicSimilarity], result of:
              0.04019018 = score(doc=5272,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.23214069 = fieldWeight in 5272, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5272)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This article presents examples of metadata collaborations between the Swiss National Library (NL) and research institutions in the field of Swiss historiography. The NL publishes the Bibliography on Swiss History (BSH). In order to meet the demands of its research community, the NL has improved the accessibility and interoperability of the BSH database. Moreover, the BSH takes part in metadata projects such as Metagrid, a web service linking different historical databases. Other metadata collaborations with partners in the historical field such as the Law Sources Foundation (LSF) will position the BSH as an indispensable literature hub for publications on Swiss history.
    Date
    30. 5.2019 19:22:49
  11. Khoo, M.J.; Ahn, J.-w.; Binding, C.; Jones, H.J.; Lin, X.; Massam, D.; Tudhope, D.: Augmenting Dublin Core digital library metadata with Dewey Decimal Classification (2015) 0.02
    0.024832705 = product of:
      0.04966541 = sum of:
        0.023270661 = weight(_text_:web in 2320) [ClassicSimilarity], result of:
          0.023270661 = score(doc=2320,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.14422815 = fieldWeight in 2320, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=2320)
        0.026394749 = weight(_text_:search in 2320) [ClassicSimilarity], result of:
          0.026394749 = score(doc=2320,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.15360467 = fieldWeight in 2320, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=2320)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - The purpose of this paper is to describe a new approach to a well-known problem for digital libraries: how to search across multiple unrelated libraries with a single query. Design/methodology/approach - The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records. Findings - The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies. Research limitations/implications - The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity. Practical implications - The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, to enhance recall or precision. Social implications - The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries. Originality/value - The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.
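    A very rough sketch of the aggregation idea (the key-term weights, the term-to-DDC mapping and the two-level hierarchy are invented; the paper's pipeline uses several filtering and aggregation stages over real DDC data):

      from collections import defaultdict

      # Hypothetical weighted key terms from title, description and subject.
      key_terms = {"metadata": 0.9, "cataloging": 0.6, "digital libraries": 0.8}

      # Hypothetical mapping from key terms to specific DDC classes.
      term_to_ddc = {"metadata": "025.3", "cataloging": "025.3",
                     "digital libraries": "025.04"}

      scores = defaultdict(float)
      for term, weight in key_terms.items():
          ddc = term_to_ddc.get(term)
          if ddc:
              scores[ddc] += weight       # score the specific class
              scores[ddc[:3]] += weight   # aggregate to the broader parent

      ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
      print(ranked)  # broader classes accumulate the weight of their children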
  12. Pope, J.T.; Holley, R.P.: Google Book Search and metadata (2011) 0.02
    0.02000121 = product of:
      0.08000484 = sum of:
        0.08000484 = weight(_text_:search in 1887) [ClassicSimilarity], result of:
          0.08000484 = score(doc=1887,freq=6.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.46558946 = fieldWeight in 1887, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1887)
      0.25 = coord(1/4)
    
    Abstract
    This article summarizes published documents on metadata provided by Google for books scanned as part of the Google Book Search (GBS) project and provides suggestions for improvement. The faulty, misleading, and confusing metadata in current Google records can pose potentially serious problems for users of GBS. Google admits that it took data, which proved to be inaccurate, from many sources and is attempting to correct errors. Some argue that metadata is not needed with keyword searching, but optical character recognition (OCR) errors, synonym control, and materials in foreign languages make reliable metadata a requirement for academic researchers. The authors recommend that users should be able to submit error reports to Google to correct faulty metadata.
    Object
    Google Book Search
  13. Neumann, M.; Steinberg, J.; Schaer, P.: Web scraping for non-programmers : introducing OXPath for digital library metadata harvesting (2017) 0.02
    0.01626087 = product of:
      0.06504348 = sum of:
        0.06504348 = weight(_text_:web in 3895) [ClassicSimilarity], result of:
          0.06504348 = score(doc=3895,freq=10.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.40312994 = fieldWeight in 3895, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3895)
      0.25 = coord(1/4)
    
    Abstract
    Building up new collections for digital libraries is a demanding task. Available data sets have to be extracted, which is usually done with the help of software developers, as it involves custom data handlers or conversion scripts. In cases where the desired data is only available on the data provider's website, custom web scrapers are needed. This may be the case for small to medium-size publishers, research institutes or funding agencies. As data curation is a typical task that is done by people with a library and information science background, these people are usually proficient with XML technologies but are not full-stack programmers. Therefore we would like to present a web scraping tool that does not require digital library curators to program custom web scrapers from scratch. We present the open-source tool OXPath, an extension of XPath, that allows the user to define data to be extracted from websites in a declarative way. By taking one of our own use cases as an example, we guide you in more detail through the process of creating an OXPath wrapper for metadata harvesting. We also point out some practical things to consider when creating a web scraper (with OXPath). On top of that, we also present a syntax highlighting plugin for the popular text editor Atom that we developed to further support OXPath users and to simplify the authoring process.
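    OXPath is a Java-based extension of XPath with its own syntax for interaction and extraction; as a much simpler stand-in (plain XPath via lxml rather than OXPath, against an invented page structure), the declarative flavour of such a wrapper looks roughly like this:

      from lxml import html

      # Hypothetical publication list page of a small publisher.
      page = html.fromstring("""
      <ul class="publications">
        <li><span class="title">Paper A</span> <span class="year">2016</span></li>
        <li><span class="title">Paper B</span> <span class="year">2017</span></li>
      </ul>
      """)

      # One declarative XPath expression per metadata field, per list entry.
      records = [
          {
              "title": li.xpath('string(span[@class="title"])'),
              "year": li.xpath('string(span[@class="year"])'),
          }
          for li in page.xpath('//li')
      ]
      print(records)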
  14. Assumpção, F.S.; Santarem Segundo, J.E.; Ventura Amorim da Costa Santos, P.L.: RDA element sets and RDA value vocabularies : vocabularies for resource description in the Semantic Web (2015) 0.02
    0.015114739 = product of:
      0.060458954 = sum of:
        0.060458954 = weight(_text_:web in 2389) [ClassicSimilarity], result of:
          0.060458954 = score(doc=2389,freq=6.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.37471575 = fieldWeight in 2389, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2389)
      0.25 = coord(1/4)
    
    Abstract
    Considering the need for metadata standards suitable for the Semantic Web, this paper describes the RDA Element Sets and the RDA Value Vocabularies that were created from attributes and relationships defined in Resource Description and Access (RDA). First, we present the vocabularies included in RDA Element Sets: the vocabularies of classes, of properties and of properties unconstrained by FRBR entities; and then we present the RDA Value Vocabularies, which are under development. As a conclusion, we highlight that these vocabularies can be used to meet the needs of different contexts due to the unconstrained properties and to the independence of the vocabularies of properties from the vocabularies of values and vice versa.
    Theme
    Semantic Web
  15. Al-Eryani, S.; Bucher, G.; Rühle, S.: ¬Ein Metadatenmodell für gemischte Sammlungen (2018) 0.01
    0.014544163 = product of:
      0.05817665 = sum of:
        0.05817665 = weight(_text_:web in 5110) [ClassicSimilarity], result of:
          0.05817665 = score(doc=5110,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.36057037 = fieldWeight in 5110, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=5110)
      0.25 = coord(1/4)
    
    Abstract
    As part of the DFG-funded project "Entwicklung von interoperablen Standards für die Kontextualisierung heterogener Objekte am Beispiel der Provenienz Asch", a Semantic Web and Linked Open Data capable metadata model was developed that makes it possible to contextualize cultural heritage objects and their provenance across institutions.
  16. Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.01
    0.014286577 = product of:
      0.057146307 = sum of:
        0.057146307 = weight(_text_:search in 3667) [ClassicSimilarity], result of:
          0.057146307 = score(doc=3667,freq=6.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.33256388 = fieldWeight in 3667, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.25 = coord(1/4)
    
    Abstract
    Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences within them. In this paper, the TIB / AV-Portal is presented as a use case where methods concerning the automatic generation of metadata, a semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in a better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the video are automatically indexed by specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.
  17. Suranofsky, M.; McColl, L.: ¬A Google Sheets add-on that uses the WorldCat Search API : MatchMarc (2019) 0.01
    0.013997929 = product of:
      0.055991717 = sum of:
        0.055991717 = weight(_text_:search in 5442) [ClassicSimilarity], result of:
          0.055991717 = score(doc=5442,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.3258447 = fieldWeight in 5442, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=5442)
      0.25 = coord(1/4)
    
    Abstract
    Lehigh University Libraries has developed a new tool for querying WorldCat using the WorldCat Search API. The tool is a Google Sheets add-on and is available now via the Google Sheets Add-ons menu under the name "MatchMarc." The add-on is easily customizable, with no knowledge of coding needed. The tool will return a single "best" OCLC record number and its bibliographic information for a given ISBN or LCCN, allowing the user to set up and define "best." Because all of the information, the input, the criteria, and the results exist in the Google Sheets environment, efficient workflows can be developed from this flexible starting point. This article will discuss the development of the add-on, how it works, and future plans for development.
  18. Söhler, M.: Schluss mit Schema F (2011) 0.01
    0.013008695 = product of:
      0.05203478 = sum of:
        0.05203478 = weight(_text_:web in 4439) [ClassicSimilarity], result of:
          0.05203478 = score(doc=4439,freq=10.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.32250395 = fieldWeight in 4439, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=4439)
      0.25 = coord(1/4)
    
    Abstract
    With Schema.org and the Semantic Web, search engines are supposed to learn to understand
    Content
    "Wörter haben oft mehrere Bedeutungen. Einige kennen den "Kanal" als künstliche Wasserstraße, andere vom Fernsehen. Die Waage kann zum Erfassen des Gewichts nützlich sein oder zur Orientierung auf der Horoskopseite. Casablanca ist eine Stadt und ein Film zugleich. Wo Menschen mit der Zeit Bedeutungen unterscheiden und verarbeiten lernen, können dies Suchmaschinen von selbst nicht. Stets listen sie dumpf hintereinander weg alles auf, was sie zu einem Thema finden. Damit das nicht so bleibt, haben sich nun Google, Yahoo und die zu Microsoft gehörende Suchmaschine Bing zusammengetan, um der Suche im Netz mehr Verständnis zu verpassen. Man spricht dabei auch von einer "semantischen Suche". Das Ergebnis heißt Schema.org. Wer die Webseite einmal besucht, sich ein wenig in die Unterstrukturen hereinklickt und weder Vorkenntnisse im Programmieren noch im Bereich des semantischen Webs hat, wird sich überfordert und gelangweilt wieder abwenden. Doch was hier entstehen könnte, hat das Zeug dazu, Teile des Netzes und speziell die Funktionen von Suchmaschinen mittel- oder langfristig zu verändern. "Große Player sind dabei, sich auf Standards zu einigen", sagt Daniel Bahls, Spezialist für Semantische Technologien beim ZBW Leibniz-Informationszentrum Wirtschaft in Hamburg. "Die semantischen Technologien stehen schon seit Jahren im Raum und wurden bisher nur im kleineren Kontext verwendet." Denn Schema.org lädt Entwickler, Forscher, die Semantic-Web-Community und am Ende auch alle Betreiber von Websites dazu ein, an der Umgestaltung der Suche im Netz mitzuwirken. Inhalte von Websites sollen mit einem speziellen, aber einheitlichen Vokabular für die Crawler - die Analyseprogramme der Suchmaschinen - gekennzeichnet und aufbereitet werden.
    Indem Schlagworte, sogenannte Tags, in den für Normal-User nicht sichtbaren Teil des Codes von Websites eingebettet werden, sind Suchmachinen nicht mehr so sehr auf die Analyse der natürlichen Sprache angewiesen, um Texte inhaltlich zu erfassen. Im Blog ZBW Mediatalk wird dies als "Semantic Web light" bezeichnet - ein semantisches Web auf niedrigster Ebene. Aber selbst das werde "schon viel bewirken", meint Bahls. "Das semantische Web wird sich über die nächsten Jahrzehnte evolutionär weiterentwickeln." Einen "Abschluss" werde es nie geben, "da eine einheitliche Formalisierung von Begrifflichkeiten auf feiner Stufe kaum möglich ist". Die Ergebnisse aus Schema.org würden "zeitnah" in die Suchmaschine integriert, "denn einen Zeitplan" gebe es nicht, so Stefan Keuchel, Pressesprecher von Google Deutschland. Bis das so weit ist, hilft der Verweis von Daniel Bahns auf die bereits existierende semantische Suchmaschine Sig.ma. Geschwindigkeit und Menge der Ergebnisse nach einer Suchanfrage spielen hier keine Rolle. Sig.ma sammelt seine Informationen allein im Bereich des semantischen Webs und listet nach einer Anfrage alles Bekannte strukturiert auf.
  19. Stiller, J.; Olensky, M.; Petras, V.: ¬A framework for the evaluation of automatic metadata enrichments (2014) 0.01
    0.011547703 = product of:
      0.046190813 = sum of:
        0.046190813 = weight(_text_:search in 1587) [ClassicSimilarity], result of:
          0.046190813 = score(doc=1587,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.2688082 = fieldWeight in 1587, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1587)
      0.25 = coord(1/4)
    
    Abstract
    Automatic enrichment of collections connects data to vocabularies, which supports the contextualization of content and adds searchable text to metadata. The paper introduces a framework of four dimensions (frequency, coverage, relevance and error rate) that measures both the suitability of the enrichment for the object and the enrichments' contribution to search success. To verify the framework, it is applied to the evaluation of automatic enrichments in the digital library Europeana. The analysis of 100 result sets and their corresponding queries (1,121 documents in total) shows that the framework is a valuable tool for guiding enrichments and determining the value of enrichment efforts.
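    The four dimensions are defined precisely in the paper; the sketch below uses deliberately naive stand-in definitions (one enrichment list per document plus manual judgements, all invented) only to show how such measures could be tallied over a result set:

      # Hypothetical enrichments per document with manual judgements:
      # "ok" = correct and relevant, "irrelevant", "wrong".
      enrichments = {
          "doc1": [("Paris", "ok"), ("France", "ok")],
          "doc2": [("Waage", "irrelevant")],
          "doc3": [],
      }

      n_docs = len(enrichments)
      pairs = [p for judged in enrichments.values() for p in judged]

      coverage = sum(1 for e in enrichments.values() if e) / n_docs
      frequency = len(pairs) / n_docs
      relevance = sum(1 for _, j in pairs if j == "ok") / max(len(pairs), 1)
      error_rate = sum(1 for _, j in pairs if j == "wrong") / max(len(pairs), 1)

      print(coverage, frequency, relevance, error_rate)  # 0.67 1.0 0.67 0.0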
  20. Bohne-Lang, A.: Semantische Metadaten für den Webauftritt einer Bibliothek (2016) 0.01
    0.010284277 = product of:
      0.041137107 = sum of:
        0.041137107 = weight(_text_:web in 3337) [ClassicSimilarity], result of:
          0.041137107 = score(doc=3337,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 3337, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3337)
      0.25 = coord(1/4)
    
    Abstract
    The Semantic Web has attracted a great deal of attention for more than 10 years and, with the availability of the Resource Description Framework (RDF) and the corresponding ontologies, has made a big leap into practice. Representatives of small libraries and librarians with little affinity for technology, however, face major hurdles in everyday work, for example with the question of how to embed this technology concretely in their own website: one feels like Don Quijote trying to defeat the windmills. RDF with its ontologies is almost incomprehensibly complex for non-computer scientists and thus not directly usable for broad practical deployment on library websites. With Schema.org, the three largest search engines in the world, Google, Bing and Yahoo, originally developed a simple and effective semantic description of entities. Schema.org is currently sponsored further by Google, Microsoft, Yahoo and Yandex and is understood by many other search engines. Against this background, the library of the Medical Faculty Mannheim has embedded various machine-readable semantic metadata on its homepage (http://www.umm.uni-heidelberg.de/bibl/). Very interesting and forward-looking is the most recent development of Schema.org, with which a 'Library' (https://schema.org/Library) can be modelled with opening hours and much more. In addition, we have embedded semantic metadata in the Open Graph and Dublin Core formats in order to make legacy standards and Facebook-compliant information available in machine-readable form.
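    A hedged sketch of what such an embedded Library description could look like (all values invented, not the Mannheim library's actual markup), again built as JSON-LD:

      import json

      library = {
          "@context": "https://schema.org",
          "@type": "Library",
          "name": "Example Faculty Library",
          "url": "https://library.example.org/",
          "openingHours": "Mo-Fr 08:00-20:00",
      }

      print('<script type="application/ld+json">')
      print(json.dumps(library, indent=2))
      print("</script>")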
    Theme
    Semantic Web

Languages

  • e 41
  • d 5