Search (3 results, page 1 of 1)

Crane, G.: What do you do with a million books? (2006) 0.02
```
0.023647794 = product of:
  0.11823897 = sum of:
    0.11823897 = weight(_text_:books in 1180) [ClassicSimilarity], result of:
      0.11823897 = score(doc=1180,freq=10.0), product of:
        0.24756333 = queryWeight, product of:
          4.8330836 = idf(docFreq=956, maxDocs=44218)
          0.051222645 = queryNorm
        0.477611 = fieldWeight in 1180, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.8330836 = idf(docFreq=956, maxDocs=44218)
          0.03125 = fieldNorm(doc=1180)
  0.2 = coord(1/5)
```
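The score breakdown above can be recomputed from its leaf values. The following is a minimal sketch, assuming the tf and idf formulas that reproduce the numbers shown (`tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`); the function names are illustrative and are not part of any search-engine API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency, in the form that matches the explain output above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term count.
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    # queryWeight = idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    # final score = queryWeight * fieldWeight * coord
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight * coord


# Leaf values copied from the breakdown for doc 1180.
score_1180 = explain_score(
    freq=10.0,
    doc_freq=956,
    max_docs=44218,
    query_norm=0.051222645,
    field_norm=0.03125,
    coord=1 / 5,
)
print(score_1180)
```

Only `freq` and `fieldNorm` differ between the three results; the idf, queryNorm, and coord factors are shared, which is why the ranking follows term frequency and field length. The engine works in single precision, so the last digits may differ slightly from the listing.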
Abstract

The Greek historian Herodotus has the Athenian sage Solon estimate the lifetime of a human being at c. 26,250 days (Herodotus, The Histories, 1.32). If we could read a book on each of those days, it would take almost forty lifetimes to work through every volume in a single million book library. The continuous tradition of written European literature that began with the Iliad and Odyssey in the eighth century BCE is itself little more than a million days old. While libraries that contain more than one million items are not unusual, print libraries never possessed a million books of use to any one reader. The great libraries that took shape in the nineteenth and twentieth centuries were meta-structures, whose catalogues and finding aids allowed readers to create their own customized collections, building on the fixed classification schemes and disciplinary structures that took shape in the nineteenth century. The digital libraries of the early twenty-first century can be searched and their contents transmitted around the world. They can contain time-based media, images, quantitative data, and a far richer array of content than print, with visualization technologies blurring the boundaries between library and museum. But our digital libraries remain filled with digital incunabula - digital objects whose form remains firmly rooted in traditions of print, with HTML and PDF largely mimicking the limitations of their print predecessors. Vast collections based on image books - raw digital pictures of books with searchable but uncorrected text from OCR - could arguably retard our long-term progress, reinforcing the hegemony of structures that evolved to minimize the challenges of a world where paper was the only medium of distribution and where humans alone could read. Already the books in a digital library are beginning to read one another and to confer among themselves before creating a new synthetic document for review by their human readers.
Mimno, D.; Crane, G.; Jones, A.: Hierarchical catalog records : implementing a FRBR catalog (2005) 0.01
```
0.010575616 = product of:
  0.052878078 = sum of:
    0.052878078 = weight(_text_:books in 1183) [ClassicSimilarity], result of:
      0.052878078 = score(doc=1183,freq=2.0), product of:
        0.24756333 = queryWeight, product of:
          4.8330836 = idf(docFreq=956, maxDocs=44218)
          0.051222645 = queryNorm
        0.21359414 = fieldWeight in 1183, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.8330836 = idf(docFreq=956, maxDocs=44218)
          0.03125 = fieldNorm(doc=1183)
  0.2 = coord(1/5)
```
Abstract

IFLA's Functional Requirements for Bibliographic Records (FRBR) lay the foundation for a new generation of cataloging systems that recognize the difference between a particular work (e.g., Moby Dick), diverse expressions of that work (e.g., translations into German, Japanese and other languages), different versions of the same basic text (e.g., the Modern Library Classics vs. Penguin editions), and particular items (a copy of Moby Dick on the shelf). Much work has gone into finding ways to infer FRBR relationships between existing catalog records and modifying catalog interfaces to display those relationships. Relatively little work, however, has gone into exploring the creation of catalog records that are inherently based on the FRBR hierarchy of works, expressions, manifestations, and items. The Perseus Digital Library has created a new catalog that implements such a system for a small collection that includes many works with multiple versions. We have used this catalog to explore some of the implications of hierarchical catalog records for searching and browsing. Current online library catalog interfaces present many problems for searching. One commonly cited failure is the inability to find and collocate all versions of a distinct intellectual work that exist in a collection and the inability to take into account known variations in titles and personal names (Yee 2005). The IFLA Functional Requirements for Bibliographic Records (FRBR) attempts to address some of these failings by introducing the concept of multiple interrelated bibliographic entities (IFLA 1998). 
In particular, relationships between abstract intellectual works and the various published instances of those works are divided into a four-level hierarchy of works (such as the Aeneid), expressions (Robert Fitzgerald's translation of the Aeneid), manifestations (a particular paperback edition of Robert Fitzgerald's translation of the Aeneid), and items (my copy of a particular paperback edition of Robert Fitzgerald's translation of the Aeneid). In this formulation, each level in the hierarchy "inherits" information from the preceding level. Much of the work on FRBRized catalogs so far has focused on organizing existing records that describe individual physical books. Relatively little work has gone into rethinking what information should be in catalog records, or how the records should relate to each other. It is clear, however, that a more "native" FRBR catalog would include separate records for works, expressions, manifestations, and items. In this way, all information about a work would be centralized in one record. Records for subsequent expressions of that work would add only the information specific to each expression: Samuel Butler's translation of the Iliad does not need to repeat the fact that the work was written by Homer. This approach has certain inherent advantages for collections with many versions of the same works: new publications can be cataloged more quickly, and records can be stored and updated more efficiently.
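The inheritance between levels described in this abstract can be sketched as a chain of records, each holding only its own fields and a pointer to its parent. This is an illustrative sketch, not the Perseus catalog's actual data model: the `Record` class, the field names, and the manifestation and item values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    level: str                      # "work", "expression", "manifestation", or "item"
    fields: dict                    # only the information specific to this level
    parent: Optional["Record"] = None

    def resolved(self) -> dict:
        # Merge fields from the top of the hierarchy down, so a lower level
        # inherits everything it does not state itself.
        inherited = self.parent.resolved() if self.parent else {}
        return {**inherited, **self.fields}


# The work record centralizes the facts shared by every version.
work = Record("work", {"title": "Iliad", "author": "Homer"})
# The expression adds only what is specific to this translation.
expression = Record("expression", {"translator": "Samuel Butler"}, parent=work)
# Hypothetical manifestation and item values, for illustration only.
manifestation = Record("manifestation", {"format": "paperback"}, parent=expression)
item = Record("item", {"shelf": "my copy"}, parent=manifestation)

print(item.resolved())
```

The expression record never repeats the author, yet a lookup on the item still returns it, which is the storage and update advantage the abstract attributes to a "native" FRBR catalog.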
Crane, G.; Jones, A.: Text, information, knowledge and the evolving record of humanity (2006) 0.01
```
0.00660976 = product of:
  0.0330488 = sum of:
    0.0330488 = weight(_text_:books in 1182) [ClassicSimilarity], result of:
      0.0330488 = score(doc=1182,freq=2.0), product of:
        0.24756333 = queryWeight, product of:
          4.8330836 = idf(docFreq=956, maxDocs=44218)
          0.051222645 = queryNorm
        0.13349634 = fieldWeight in 1182, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.8330836 = idf(docFreq=956, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1182)
  0.2 = coord(1/5)
```
Abstract

Consider a sentence such as "the current price of tea in China is 35 cents per pound." In a library with millions of books we might find many statements of the above form that we could capture today with relatively simple rules: rather than pursuing every variation of a statement, programs can wait, like predators at a water hole, for their informational prey to reappear in a standard linguistic pattern. We can make inferences from sentences such as "NAME1 born at NAME2 in DATE" that NAME1 more likely than not represents a person and NAME2 a place, and then convert the statement into a proposition about a person born at a given place and time. The changing price of tea in China, pedestrian birth and death dates, or other basic statements may not be truth and beauty in the Phaedrus, but a digital library that could plot the prices of various commodities in different markets over time, plot the various lifetimes of individuals, or extract and classify many events would be very useful. Services such as the Syllabus Finder and H-Bot (which Dan Cohen describes elsewhere in this issue of D-Lib) represent examples of information extraction already in use. H-Bot, in particular, builds on our evolving ability to extract information from very large corpora such as the billions of web pages available through the Google API. Aside from identifying higher order statements, however, users also want to search and browse named entities: they want to read about "C. P. E. Bach" rather than his father "Johann Sebastian" or about "Cambridge, Maryland", without hearing about "Cambridge, Massachusetts", Cambridge in the UK or any of the other Cambridges scattered around the world. Named entity identification is a well-established area with an ongoing literature.
The Natural Language Processing Research Group at the University of Sheffield has developed its open source Generalized Architecture for Text Engineering (GATE) for years, while IBM's Unstructured Information Analysis and Search (UIMA) is "available as open source software to provide a common foundation for industry and academia." Powerful tools are thus freely available and more demanding users can draw upon published literature to develop their own systems. Major search engines such as Google and Yahoo also integrate increasingly sophisticated tools to categorize and identify places. The software resources are rich and expanding. The reference works on which these systems depend, however, are ill-suited for historical analysis. First, simple gazetteers and similar authority lists quickly grow too big for useful information extraction. They provide us with potential entities against which to match textual references, but existing electronic reference works assume that human readers can use their knowledge of geography and of the immediate context to pick the right Boston from the Bostons in the Getty Thesaurus of Geographic Names (TGN). With the crucial exception of geographic location, however, the TGN records do not provide any machine-readable clues: we cannot tell which Bostons are large or small. If we are analyzing a document published in 1818, we cannot filter out those places that did not yet exist or that had different names: "Jefferson Davis" is not the name of a parish in Louisiana (tgn,2000880) or a county in Mississippi (tgn,2001118) until after the Civil War.
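The pattern-based extraction this abstract describes ("NAME1 born at NAME2 in DATE") can be sketched with a single regular expression. This is a minimal illustration, assuming names are runs of capitalized tokens and the date is a four-digit year; the pattern, the function name, and the sample sentence are invented for the example and do not come from the article or from any of the tools it mentions.

```python
import re

# A name is one or more capitalized tokens (periods allowed, e.g. initials).
NAME = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"

# Waits for the fixed linguistic frame "NAME1 born at NAME2 in DATE".
BIRTH_PATTERN = re.compile(
    rf"(?P<person>{NAME}) born at (?P<place>{NAME}) in (?P<year>\d{{4}})"
)


def extract_births(text: str) -> list:
    # Turn each matching sentence fragment into a person/place/year proposition.
    return [m.groupdict() for m in BIRTH_PATTERN.finditer(text)]


# Invented sample text, for illustration only.
sample = "Jane Doe born at Cambridge in 1790. The price of tea rose that year."
print(extract_births(sample))
```

The pattern records "Cambridge" as a bare string; deciding which Cambridge is meant is the separate named-entity problem the abstract goes on to discuss.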
