Search (1 results, page 1 of 1)

  • × author_ss:"Wang, S."
  • × theme_ss:"Normdateien"
  1. Wang, S.; Koopman, R.: Second life for authority records (2015) 0.00
    0.0020608194 = product of:
      0.008243278 = sum of:
        0.008243278 = weight(_text_:information in 2303) [ClassicSimilarity], result of:
          0.008243278 = score(doc=2303,freq=6.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.1343758 = fieldWeight in 2303, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2303)
      0.25 = coord(1/4)
    
    Abstract
    Authority control is a standard practice in the library community that provides consistent, unique, and unambiguous reference to entities such as persons, places, concepts, etc. The ideal way of referring to authority records through unique identifiers is in line with the current linked data principle. When presenting a bibliographic record, the linked authority records are expanded with the authoritative information. This way, any update in the authority records will not affect the indexing of the bibliographic records. The structural information in the authority files can also be leveraged to expand the user's query to retrieve bibliographic records associated with all the variations, narrower terms or related terms. However, in many digital libraries, especially largescale aggregations such as WorldCat and Europeana, name strings are often used instead of authority record identifiers. This is also partly due to the lack of global authority records that are valid across countries and cultural heritage domains. But even when there are global authority systems, they are not applied at scale. For example, in WorldCat, only 15% of the records have DDC and 3% have UDC codes; less than 40% of the records have one or more topical terms catalogued in the 650 MARC field, many of which are too general (such as "sports" or "literature") to be useful for retrieving bibliographic records. Therefore, when a user query is based on a Dewey code, the results usually have high precision but the recall is much lower than it should be; and, a search on a general topical term returns millions of hits without being even complete. All these practices make it difficult to leverage the key benefits of authority files. This is also true for authority files that have been transformed into linked data and enriched with mapping information. There are practical reasons for using name strings instead of identifiers. One is the indexing and query response. The future infrastructure design should take the performance into account while embracing the benefit of linking instead of copying, without introducing extra complexity to users. Notwithstanding all the restrictions, we argue that largescale aggregations also bring new opportunities for better exploiting the benefits of authority records. It is possible to use machine learning techniques to automatically link bibliographic records to authority records based on the manual input of cataloguers. Text mining and visualization techniques can offer a contextual view of authority records, which in return can be used to retrieve missing or mis-catalogued records. In this talk, we will describe such opportunities in more detail.