Search (65 results, page 1 of 4)

  • type_ss:"a"
  • type_ss:"el"
  • year_i:[2000 TO 2010}
  1. Nicholson, D.: Help us make HILT's terminology services useful in your information service (2008) 0.03
    0.027166676 = product of:
      0.12225004 = sum of:
        0.076776505 = weight(_text_:readable in 3654) [ClassicSimilarity], result of:
          0.076776505 = score(doc=3654,freq=2.0), product of:
            0.2262076 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.036818076 = queryNorm
            0.33940727 = fieldWeight in 3654, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3654)
        0.04547354 = weight(_text_:data in 3654) [ClassicSimilarity], result of:
          0.04547354 = score(doc=3654,freq=10.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.39059696 = fieldWeight in 3654, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3654)
      0.22222222 = coord(2/9)
    
    Abstract
    The JISC-funded HILT project is looking to make contact with staff in information services or projects interested in helping it test and refine its developing terminology services. The project is currently working to create pilot web services that will deliver machine-readable terminology and cross-terminology mappings data likely to be useful to information services wishing to extend or enhance the efficacy of their subject search or browse services. Based on SRW/U, SOAP, and SKOS, the HILT facilities, when fully operational, will permit such services to improve their own subject search and browse mechanisms by using HILT data in a fashion transparent to their users. On request, HILT will serve up machine-processable data on individual subject schemes (broader terms, narrower terms, hierarchy information, preferred and non-preferred terms, and so on) and interoperability data (usually intellectual or automated mappings between schemes, but the architecture allows for the use of other methods) - data that can be used to enhance user services. The project is also developing an associated toolkit that will help service technical staff to embed HILT-related functionality into their services. The primary aim is to serve JISC funded information services or services at JISC institutions, but information services outside the JISC domain may also find the proposed services useful and wish to participate in the test and refine process.
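     The score breakdown shown for each hit is Lucene/Solr "explain" output for ClassicSimilarity (TF-IDF). As a minimal sketch, the snippet below reproduces entry 1's score of 0.027166676 from the factors quoted above, assuming the standard ClassicSimilarity formulas: tf = sqrt(freq), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final coord factor for the fraction of query clauses that matched.
     ```python
     from math import sqrt

     def term_weight(freq, idf, query_norm, field_norm):
         """One weight(_text_:term) node: queryWeight * fieldWeight."""
         query_weight = idf * query_norm               # e.g. 6.1439276 * 0.036818076 = 0.2262076
         field_weight = sqrt(freq) * idf * field_norm  # tf(freq) * idf * fieldNorm
         return query_weight * field_weight

     QUERY_NORM = 0.036818076
     readable = term_weight(freq=2.0,  idf=6.1439276, query_norm=QUERY_NORM, field_norm=0.0390625)
     data     = term_weight(freq=10.0, idf=3.1620505, query_norm=QUERY_NORM, field_norm=0.0390625)

     score = (readable + data) * (2 / 9)   # coord(2/9): 2 of 9 query clauses matched
     print(round(score, 9))                # ~0.027166676, matching the explain tree above
     ```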
  2. Dobratz, S.; Neuroth, H.: nestor: Network of Expertise in long-term STOrage of digital Resources : a digital preservation initiative for Germany (2004) 0.03
    0.026335854 = product of:
      0.11851134 = sum of:
        0.012201829 = weight(_text_:data in 1195) [ClassicSimilarity], result of:
          0.012201829 = score(doc=1195,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.10480815 = fieldWeight in 1195, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1195)
        0.10630951 = weight(_text_:germany in 1195) [ClassicSimilarity], result of:
          0.10630951 = score(doc=1195,freq=12.0), product of:
            0.21956629 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.036818076 = queryNorm
            0.4841796 = fieldWeight in 1195, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1195)
      0.22222222 = coord(2/9)
    
    Abstract
     Sponsored by the German Ministry of Education and Research with funding of 800,000 euros, the German Network of Expertise in long-term storage of digital resources (nestor) began in June 2003 as a cooperative effort of six partners representing different players within the field of long-term preservation. The partners include:
     * The German National Library (Die Deutsche Bibliothek) as the lead institution for the project
     * The State and University Library of Lower Saxony Göttingen (Staats- und Universitätsbibliothek Göttingen)
     * The Computer and Media Service and the University Library of Humboldt-University Berlin (Humboldt-Universität zu Berlin)
     * The Bavarian State Library in Munich (Bayerische Staatsbibliothek)
     * The Institute for Museum Information in Berlin (Institut für Museumskunde)
     * General Directorate of the Bavarian State Archives (GDAB)
     As in other countries, long-term preservation of digital resources has become an important issue in Germany in recent years. Nevertheless, reaching agreement with institutions throughout the country to cooperate on the tasks of a long-term preservation effort has taken a great deal of effort. Although considerable attention has been paid to the preservation of physical media like CD-ROMs, technologies for the long-term preservation of digital publications such as e-books, digital dissertations, websites, etc. are still lacking. Considering the importance of the task within the federal structure of Germany, where each federal state is responsible for its own science and culture activities, it is obvious that a successful solution of these issues in Germany must be a cooperative one. Since 2000, there have been discussions about strategies and techniques for long-term archiving of digital information, particularly within the distributed structure of Germany's library and archival institutions. A key part of all the previous activities was a focus on using existing standards and analyzing the context in which those standards would be applied. One such activity, the Digital Library Forum Planning Project, was carried out on behalf of the German Ministry of Education and Research in 2002; it developed and described in detail the vision of a digital library in 2010 that can meet the changing and increasing needs of users, including the infrastructure required, how the digital library would work technically, what it would contain and how it would be organized. The outcome was a strategic plan for certain selected specialist areas in which, amongst other topics, a future call for action for long-term preservation was defined, described and explained against the background of practical experience.
     As a follow-up, in 2002 the nestor long-term archiving working group provided an initial spark towards planning and organising coordinated activities concerning the long-term preservation and long-term availability of digital documents in Germany. This resulted in a workshop, held 29-30 October 2002, where major tasks were discussed. Influenced by the demands and progress of the nestor network, the participants agreed to start work on application-oriented projects and to address the following topics:
     * Overlapping problems
       - Collection and preservation of digital objects (selection criteria, preservation policy)
       - Definition of criteria for trusted repositories
       - Creation of models of cooperation, etc.
     * Digital objects production process
       - Analysis of potential conflicts between production and long-term preservation
       - Documentation of existing document models and recommendations for standard models to be used for long-term preservation
       - Identification systems for digital objects, etc.
     * Transfer of digital objects
       - Object data and metadata
       - Transfer protocols and interoperability
       - Handling of different document types, e.g. dynamic publications, etc.
     * Long-term preservation of digital objects
       - Design and prototype implementation of depot systems for digital objects (OAIS was chosen as the best functional model)
       - Authenticity
       - Functional requirements on user interfaces of a depot system
       - Identification systems for digital objects, etc.
     At the end of the workshop, participants decided to establish a permanent distributed infrastructure for long-term preservation and long-term accessibility of digital resources in Germany, comparable, e.g., to the Digital Preservation Coalition in the UK. The initial phase, nestor, is now being set up by the above-mentioned three-year funding project.
  3. Hjoerland, B.: Arguments for 'the bibliographical paradigm' : some thoughts inspired by the new English edition of the UDC (2007) 0.02
    0.019660873 = product of:
      0.08847393 = sum of:
        0.06407028 = weight(_text_:bibliographic in 552) [ClassicSimilarity], result of:
          0.06407028 = score(doc=552,freq=6.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.44699866 = fieldWeight in 552, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=552)
        0.024403658 = weight(_text_:data in 552) [ClassicSimilarity], result of:
          0.024403658 = score(doc=552,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.2096163 = fieldWeight in 552, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=552)
      0.22222222 = coord(2/9)
    
    Abstract
     The term 'the bibliographic paradigm' is used in the literature of library and information science, but only seldom, and it is almost always described negatively. This paper reconsiders this concept. Method. The method is mainly 'analytical'. Empirical data concerning the current state of the UDC classification system are also presented in order to illuminate the connection between theory and practice. Analysis. The bibliographic paradigm is understood as a perspective in library and information science focusing on documents and information resources, their description, organization, mediation and use. This perspective is examined as one among other metatheories of library and information science, and its philosophical assumptions and implications are outlined. Results. The neglect and misunderstanding of 'the bibliographic paradigm', as well as the quality of the new UDC classification, indicate that both the metatheoretical discourses on library and information science and its concrete practice seem to be in a state of crisis.
  4. Brahms, E.: Digital library initiatives of the Deutsche Forschungsgemeinschaft (2001) 0.02
    0.01928919 = product of:
      0.1736027 = sum of:
        0.1736027 = weight(_text_:germany in 1190) [ClassicSimilarity], result of:
          0.1736027 = score(doc=1190,freq=8.0), product of:
            0.21956629 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.036818076 = queryNorm
            0.79066193 = fieldWeight in 1190, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.046875 = fieldNorm(doc=1190)
      0.11111111 = coord(1/9)
    
    Abstract
    The Deutsche Forschungsgemeinschaft (DFG) is the central public funding organization for academic research in Germany. It is thus comparable to a research council or a national research foundation. According to its statutes, DFG's mandate is to serve science and the arts in all fields by supporting research projects carried out at universities and public research institutions in Germany, to promote cooperation between researchers, and to forge and support links between German academic science, industry and partners in foreign countries. In the fulfillment of its tasks, the DFG pays special attention to the education and support of young scientists and scholars. DFG's mandate and operations follow the principle of territoriality. This means that its funding activities are restricted, with very few exceptions, to individuals and institutions with permanent addresses in Germany. Fellowships are granted for work in other countries, but most fellowship programs are restricted to German citizens, with a few exceptions for permanent residents of Germany holding foreign passports.
  5. Godby, C.J.; Young, J.A.; Childress, E.: A repository of metadata crosswalks (2004) 0.02
    0.016889969 = product of:
      0.15200971 = sum of:
        0.15200971 = weight(_text_:readable in 1155) [ClassicSimilarity], result of:
          0.15200971 = score(doc=1155,freq=4.0), product of:
            0.2262076 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.036818076 = queryNorm
            0.67199206 = fieldWeight in 1155, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1155)
      0.11111111 = coord(1/9)
    
    Abstract
    This paper proposes a model for metadata crosswalks that associates three pieces of information: the crosswalk, the source metadata standard, and the target metadata standard, each of which may have a machine-readable encoding and human-readable description. The crosswalks are encoded as METS records that are made available to a repository for processing by search engines, OAI harvesters, and custom-designed Web services. The METS object brings together all of the information required to access and interpret crosswalks and represents a significant improvement over previously available formats. But it raises questions about how best to describe these complex objects and exposes gaps that must eventually be filled in by the digital library community.
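     A rough sketch of the three-part model described above, for illustration only; the field names are assumptions, not the METS encoding the paper actually specifies.
     ```python
     from dataclasses import dataclass
     from typing import Optional

     @dataclass
     class DescribedObject:
         """A standard or crosswalk, with optional machine-readable encoding and human-readable description."""
         name: str
         machine_readable: Optional[str] = None   # e.g. a URL to an XSLT or schema file
         human_readable: Optional[str] = None     # e.g. prose documentation

     @dataclass
     class Crosswalk:
         source: DescribedObject   # source metadata standard
         target: DescribedObject   # target metadata standard
         mapping: DescribedObject  # the crosswalk itself

     dc_to_marc = Crosswalk(
         source=DescribedObject("Dublin Core", human_readable="DCMI element set"),
         target=DescribedObject("MARC21"),
         mapping=DescribedObject("DC-to-MARC crosswalk", machine_readable="dc2marc.xsl"),
     )
     ```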
  6. Kaiser, M.; Lieder, H.J.; Majcen, K.; Vallant, H.: New ways of sharing and using authority information : the LEAF project (2003) 0.01
    0.013089778 = product of:
      0.058904 = sum of:
        0.02273677 = weight(_text_:data in 1166) [ClassicSimilarity], result of:
          0.02273677 = score(doc=1166,freq=10.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.19529848 = fieldWeight in 1166, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1166)
        0.03616723 = weight(_text_:germany in 1166) [ClassicSimilarity], result of:
          0.03616723 = score(doc=1166,freq=2.0), product of:
            0.21956629 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.036818076 = queryNorm
            0.16472124 = fieldWeight in 1166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1166)
      0.22222222 = coord(2/9)
    
    Abstract
    This article presents an overview of the LEAF project (Linking and Exploring Authority Files)1, which has set out to provide a framework for international, collaborative work in the sector of authority data with respect to authority control. Elaborating the virtues of authority control in today's Web environment is an almost futile exercise, since so much has been said and written about it in the last few years.2 The World Wide Web is generally understood to be poorly structured-both with regard to content and to locating required information. Highly structured databases might be viewed as small islands of precision within this chaotic environment. Though the Web in general or any particular structured database would greatly benefit from increased authority control, it should be noted that our following considerations only refer to authority control with regard to databases of "memory institutions" (i.e., libraries, archives, and museums). Moreover, when talking about authority records, we exclusively refer to personal name authority records that describe a specific person. Although different types of authority records could indeed be used in similar ways to the ones presented in this article, discussing those different types is outside the scope of both the LEAF project and this article. Personal name authority records-as are all other "authorities"-are maintained as separate records and linked to various kinds of descriptive records. Name authority records are usually either kept in independent databases or in separate tables in the database containing the descriptive records. This practice points at a crucial benefit: by linking any number of descriptive records to an authorized name record, the records related to this entity are collocated in the database. Variant forms of the authorized name are referenced in the authority records and thus ensure the consistency of the database while enabling search and retrieval operations that produce accurate results. On one hand, authority control may be viewed as a positive prerequisite of a consistent catalogue; on the other, the creation of new authority records is a very time consuming and expensive undertaking. As a consequence, various models of providing access to existing authority records have emerged: the Library of Congress and the French National Library (Bibliothèque nationale de France), for example, make their authority records available to all via a web-based search service.3 In Germany, the Personal Name Authority File (PND, Personennamendatei4) maintained by the German National Library (Die Deutsche Bibliothek, Frankfurt/Main) offers a different approach to shared access: within a closed network, participating institutions have online access to their pooled data. The number of recent projects and initiatives that have addressed the issue of authority control in one way or another is considerable.5 Two important current initiatives should be mentioned here: The Name Authority Cooperative (NACO) and Virtual International Authority File (VIAF).
     NACO was established in 1976 and is hosted by the Library of Congress. At the beginning of 2003, nearly 400 institutions were involved in this undertaking, including 43 institutions from outside the United States.6 Despite the enormous success of NACO and the impressive annual growth of the initiative, there are requirements for participation that form an obstacle for many institutions: they have to follow the Anglo-American Cataloguing Rules (AACR2) and employ the MARC21 data format. Participating institutions also have to belong to either OCLC (Online Computer Library Center) or RLG (Research Libraries Group) in order to be able to contribute records, and they have to provide a specified minimum number of authority records per year. A recent proof of concept project of the Library of Congress, OCLC and the German National Library-Virtual International Authority File (VIAF)8-will, in its first phase, test automatic linking of the records of the Library of Congress Name Authority File (LCNAF) and the German Personal Name Authority File by using matching algorithms and software developed by OCLC. The results are expected to form the basis of a "Virtual International Authority File". The project will then test the maintenance of the virtual authority file by employing the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)9 to harvest the metadata for new, updated, and deleted records. When using the "Virtual International Authority File" a cataloguer will be able to check the system to see whether the authority record he wants to establish already exists. The final phase of the project will test possibilities for displaying records in the preferred language and script of the end user. Currently, there are still some clear limitations associated with the ways in which authority records are used by memory institutions. One of the main problems has to do with limited access: generally only large institutions or those that are part of a library network have unlimited online access to permanently updated authority records. Smaller institutions outside these networks usually have to fall back on less efficient ways of obtaining authority data, or have no access at all. Cross-domain sharing of authority data between libraries, archives, museums and other memory institutions simply does not happen at present. Public users are, by and large, not even aware that such things as name authority records exist and are excluded from access to these information resources.
  7. Crane, G.; Jones, A.: Text, information, knowledge and the evolving record of humanity (2006) 0.01
    0.013049919 = product of:
      0.058724634 = sum of:
        0.038388252 = weight(_text_:readable in 1182) [ClassicSimilarity], result of:
          0.038388252 = score(doc=1182,freq=2.0), product of:
            0.2262076 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.036818076 = queryNorm
            0.16970363 = fieldWeight in 1182, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1182)
        0.020336384 = weight(_text_:data in 1182) [ClassicSimilarity], result of:
          0.020336384 = score(doc=1182,freq=8.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 1182, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1182)
      0.22222222 = coord(2/9)
    
    Abstract
    Consider a sentence such as "the current price of tea in China is 35 cents per pound." In a library with millions of books we might find many statements of the above form that we could capture today with relatively simple rules: rather than pursuing every variation of a statement, programs can wait, like predators at a water hole, for their informational prey to reappear in a standard linguistic pattern. We can make inferences from sentences such as "NAME1 born at NAME2 in DATE" that NAME more likely than not represents a person and NAME a place and then convert the statement into a proposition about a person born at a given place and time. The changing price of tea in China, pedestrian birth and death dates, or other basic statements may not be truth and beauty in the Phaedrus, but a digital library that could plot the prices of various commodities in different markets over time, plot the various lifetimes of individuals, or extract and classify many events would be very useful. Services such as the Syllabus Finder1 and H-Bot2 (which Dan Cohen describes elsewhere in this issue of D-Lib) represent examples of information extraction already in use. H-Bot, in particular, builds on our evolving ability to extract information from very large corpora such as the billions of web pages available through the Google API. Aside from identifying higher order statements, however, users also want to search and browse named entities: they want to read about "C. P. E. Bach" rather than his father "Johann Sebastian" or about "Cambridge, Maryland", without hearing about "Cambridge, Massachusetts", Cambridge in the UK or any of the other Cambridges scattered around the world. Named entity identification is a well-established area with an ongoing literature. The Natural Language Processing Research Group at the University of Sheffield has developed its open source Generalized Architecture for Text Engineering (GATE) for years, while IBM's Unstructured Information Analysis and Search (UIMA) is "available as open source software to provide a common foundation for industry and academia." Powerful tools are thus freely available and more demanding users can draw upon published literature to develop their own systems. Major search engines such as Google and Yahoo also integrate increasingly sophisticated tools to categorize and identify places. The software resources are rich and expanding. The reference works on which these systems depend, however, are ill-suited for historical analysis. First, simple gazetteers and similar authority lists quickly grow too big for useful information extraction. They provide us with potential entities against which to match textual references, but existing electronic reference works assume that human readers can use their knowledge of geography and of the immediate context to pick the right Boston from the Bostons in the Getty Thesaurus of Geographic Names (TGN), but, with the crucial exception of geographic location, the TGN records do not provide any machine readable clues: we cannot tell which Bostons are large or small. If we are analyzing a document published in 1818, we cannot filter out those places that did not yet exist or that had different names: "Jefferson Davis" is not the name of a parish in Louisiana (tgn,2000880) or a county in Mississippi (tgn,2001118) until after the Civil War.
    Although the Alexandria Digital Library provides far richer data than the TGN (5.9 vs. 1.3 million names), its added size lowers, rather than increases, the accuracy of most geographic name identification systems for historical documents: most of the extra 4.6 million names cover low frequency entities that rarely occur in any particular corpus. The TGN is sufficiently comprehensive to provide quite enough noise: we find place names that are used over and over (there are almost one hundred Washingtons) and semantically ambiguous (e.g., is Washington a person or a place?). Comprehensive knowledge sources emphasize recall but lower precision. We need data with which to determine which "Tribune" or "John Brown" a particular passage denotes. Secondly and paradoxically, our reference works may not be comprehensive enough. Human actors come and go over time. Organizations appear and vanish. Even places can change their names or vanish. The TGN does associate the obsolete name Siam with the nation of Thailand (tgn,1000142) - but also with towns named Siam in Iowa (tgn,2035651), Tennessee (tgn,2101519), and Ohio (tgn,2662003). Prussia appears but as a general region (tgn,7016786), with no indication when or if it was a sovereign nation. And if places do point to the same object over time, that object may have very different significance over time: in the foundational works of Western historiography, Herodotus reminds us that the great cities of the past may be small today, and the small cities of today great tomorrow (Hdt. 1.5), while Thucydides stresses that we cannot estimate the past significance of a place by its appearance today (Thuc. 1.10). In other words, we need to know the population figures for the various Washingtons in 1870 if we are analyzing documents from 1870. The foundations have been laid for reference works that provide machine actionable information about entities at particular times in history. The Alexandria Digital Library Gazetteer Content Standard8 represents a sophisticated framework with which to create such resources: places can be associated with temporal information about their foundation (e.g., Washington, DC, founded on 16 July 1790), changes in names for the same location (e.g., Saint Petersburg to Leningrad and back again), population figures at various times and similar historically contingent data. But if we have the software and the data structures, we do not yet have substantial amounts of historical content such as plentiful digital gazetteers, encyclopedias, lexica, grammars and other reference works to illustrate many periods and, even if we do, those resources may not be in a useful form: raw OCR output of a complex lexicon or gazetteer may have so many errors and have captured so little of the underlying structure that the digital resource is useless as a knowledge base. Put another way, human beings are still much better at reading and interpreting the contents of page images than machines. While people, places, and dates are probably the most important core entities, we will find a growing set of objects that we need to identify and track across collections, and each of these categories of objects will require its own knowledge sources. The following section enumerates and briefly describes some existing categories of documents that we need to mine for knowledge. This brief survey focuses on the format of print sources (e.g., highly structured textual "database" vs. 
unstructured text) to illustrate some of the challenges involved in converting our published knowledge into semantically annotated, machine actionable form.
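     As a toy illustration of the pattern-based extraction the abstract describes, the sketch below turns a "NAME1 born at NAME2 in DATE" sentence into a (person, place, date) proposition. The regex is hypothetical, not the Syllabus Finder or H-Bot code.
     ```python
     import re

     # "NAME1 born at NAME2 in DATE" -> a (person, place, date) proposition
     PATTERN = re.compile(r"(?P<person>[A-Z][\w.]+(?: [A-Z][\w.]+)*) born at "
                          r"(?P<place>[A-Z][\w.]+(?: [A-Z][\w.]+)*) in (?P<date>\d{3,4})")

     text = "Johann Sebastian Bach born at Eisenach in 1685."
     m = PATTERN.search(text)
     if m:
         proposition = (m.group("person"), m.group("place"), m.group("date"))
         print(proposition)   # ('Johann Sebastian Bach', 'Eisenach', '1685')
     ```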
  8. Heflin, J.; Hendler, J.: Semantic interoperability on the Web (2000) 0.01
    0.012827373 = product of:
      0.05772318 = sum of:
        0.040263984 = weight(_text_:data in 759) [ClassicSimilarity], result of:
          0.040263984 = score(doc=759,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.34584928 = fieldWeight in 759, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=759)
        0.017459193 = product of:
          0.034918386 = sum of:
            0.034918386 = weight(_text_:22 in 759) [ClassicSimilarity], result of:
              0.034918386 = score(doc=759,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.2708308 = fieldWeight in 759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=759)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    XML will have a profound impact on the way data is exchanged on the Internet. An important feature of this language is the separation of content from presentation, which makes it easier to select and/or reformat the data. However, due to the likelihood of numerous industry and domain specific DTDs, those who wish to integrate information will still be faced with the problem of semantic interoperability. In this paper we discuss why this problem is not solved by XML, and then discuss why the Resource Description Framework is only a partial solution. We then present the SHOE language, which we feel has many of the features necessary to enable a semantic web, and describe an existing set of tools that make it easy to use the language.
    Date
    11. 5.2013 19:22:18
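     A small illustration of the interoperability gap the abstract points to: the same article encoded in two ad hoc XML vocabularies, which only an explicit mapping (the role a shared ontology, RDF vocabulary or SHOE plays) can reconcile. The element names are invented for illustration.
     ```python
     import xml.etree.ElementTree as ET

     # The same article described with two different, DTD-specific vocabularies (invented element names).
     a = ET.fromstring("<paper><creator>Heflin, J.</creator><name>Semantic interoperability on the Web</name></paper>")
     b = ET.fromstring("<article><author>Heflin, J.</author><title>Semantic interoperability on the Web</title></article>")

     # XML alone does not say that <creator> means the same as <author>; that semantics
     # has to come from somewhere else, e.g. a shared ontology or RDF/SHOE vocabulary.
     MAPPING = {"creator": "author", "name": "title", "author": "author", "title": "title"}

     def normalize(elem):
         return {MAPPING[child.tag]: child.text for child in elem}

     print(normalize(a) == normalize(b))   # True, but only because we supplied the mapping ourselves
     ```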
  9. Hickey, T.B.; O'Neill, E.T.; Toves, J.: Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR) (2002) 0.01
    0.009491893 = product of:
      0.08542704 = sum of:
        0.08542704 = weight(_text_:bibliographic in 1660) [ClassicSimilarity], result of:
          0.08542704 = score(doc=1660,freq=6.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.5959982 = fieldWeight in 1660, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=1660)
      0.11111111 = coord(1/9)
    
    Abstract
     OCLC is investigating how best to implement IFLA's Functional Requirements for Bibliographic Records (FRBR). As part of that work, we have undertaken a series of experiments with algorithms to group existing bibliographic records into works and expressions. Working with both subsets of records and the whole WorldCat database, we found that the algorithm we developed achieved reasonable success in identifying all manifestations of a work.
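     A toy sketch of the general idea of grouping bibliographic records into work sets by a normalized author/title key; this is illustrative only, not OCLC's actual algorithm, which is considerably more sophisticated.
     ```python
     from collections import defaultdict
     import re

     def work_key(author: str, title: str) -> str:
         """Collapse case, punctuation and spacing into a crude author/title match key."""
         norm = lambda s: re.sub(r"[^a-z0-9]", "", s.lower())
         return norm(author) + "|" + norm(title)

     records = [
         {"id": 1, "author": "Tolkien, J.R.R.",   "title": "The Hobbit"},
         {"id": 2, "author": "Tolkien, J. R. R.", "title": "The hobbit!"},
         {"id": 3, "author": "Tolkien, J.R.R.",   "title": "The Lord of the Rings"},
     ]

     works = defaultdict(list)
     for rec in records:
         works[work_key(rec["author"], rec["title"])].append(rec["id"])

     print(dict(works))   # records 1 and 2 fall into the same work set; record 3 is separate
     ```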
  10. Decimal Classification Editorial Policy Committee (2002) 0.01
    0.0084384065 = product of:
      0.03797283 = sum of:
        0.020336384 = weight(_text_:data in 236) [ClassicSimilarity], result of:
          0.020336384 = score(doc=236,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=236)
        0.017636448 = product of:
          0.035272896 = sum of:
            0.035272896 = weight(_text_:22 in 236) [ClassicSimilarity], result of:
              0.035272896 = score(doc=236,freq=4.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.27358043 = fieldWeight in 236, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=236)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    The Decimal Classification Editorial Policy Committee (EPC) held its Meeting 117 at the Library Dec. 3-5, 2001, with chair Andrea Stamm (Northwestern University) presiding. Through its actions at this meeting, significant progress was made toward publication of DDC unabridged Edition 22 in mid-2003 and Abridged Edition 14 in early 2004. For Edition 22, the committee approved the revisions to two major segments of the classification: Table 2 through 55 Iran (the first half of the geographic area table) and 900 History and geography. EPC approved updates to several parts of the classification it had already considered: 004-006 Data processing, Computer science; 340 Law; 370 Education; 510 Mathematics; 610 Medicine; Table 3 issues concerning treatment of scientific and technical themes, with folklore, arts, and printing ramifications at 398.2 - 398.3, 704.94, and 758; Table 5 and Table 6 Ethnic Groups and Languages (portions concerning American native peoples and languages); and tourism issues at 647.9 and 790. Reports on the results of testing the approved 200 Religion and 305-306 Social groups schedules were received, as was a progress report on revision work for the manual being done by Ross Trotter (British Library, retired). Revisions for Abridged Edition 14 that received committee approval included 010 Bibliography; 070 Journalism; 150 Psychology; 370 Education; 380 Commerce, communications, and transportation; 621 Applied physics; 624 Civil engineering; and 629.8 Automatic control engineering. At the meeting the committee received print versions of _DC&_ numbers 4 and 5. Primarily for the use of Dewey translators, these cumulations list changes, substantive and cosmetic, to DDC Edition 21 and Abridged Edition 13 for the period October 1999 - December 2001. EPC will hold its Meeting 118 at the Library May 15-17, 2002.
  11. Gorman, M.: From card catalogues to WebPACs : celebrating cataloguing in the 20th century (2000) 0.01
    0.008220221 = product of:
      0.073981985 = sum of:
        0.073981985 = weight(_text_:bibliographic in 6857) [ClassicSimilarity], result of:
          0.073981985 = score(doc=6857,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.5161496 = fieldWeight in 6857, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.09375 = fieldNorm(doc=6857)
      0.11111111 = coord(1/9)
    
    Source
    Conference on Bibliographic Control for the New Millennium held in Washington, DC at the Library of Congress, November 2000
  12. Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for linkbased ranking algorithms (2006) 0.01
    0.0072904974 = product of:
      0.03280724 = sum of:
        0.020336384 = weight(_text_:data in 2565) [ClassicSimilarity], result of:
          0.020336384 = score(doc=2565,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
        0.012470853 = product of:
          0.024941705 = sum of:
            0.024941705 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
              0.024941705 = score(doc=2565,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.19345059 = fieldWeight in 2565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2565)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
    Date
    16. 1.2016 10:22:28
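     A minimal numpy sketch of the "damping function" view of link-based ranking described in the abstract: rank is a damped sum of random-walk contributions over path lengths, and an exponential damping function recovers a PageRank-style ranking. The toy graph and the linear damping normalization are assumptions for illustration, not the paper's experimental setup.
     ```python
     import numpy as np

     def functional_rank(adjacency, damping, max_len=100):
         """Approximate sum over t of damping(t) * v @ P^t, with P the row-normalized link matrix."""
         P = adjacency / adjacency.sum(axis=1, keepdims=True)
         n = adjacency.shape[0]
         walk = np.full(n, 1.0 / n)      # uniform start distribution v
         rank = np.zeros(n)
         for t in range(max_len):
             rank += damping(t) * walk   # contribution of paths of length t, weighted by damping(t)
             walk = walk @ P
         return rank

     A = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 1, 0]], dtype=float)   # small strongly connected toy graph

     alpha = 0.85
     pagerank_like = functional_rank(A, lambda t: (1 - alpha) * alpha ** t)          # exponential decay
     L = 10
     linear_decay  = functional_rank(A, lambda t: max(0.0, 1 - t / L) * 2 / (L + 1)) # linear decay, weights sum to 1
     print(pagerank_like, linear_decay)
     ```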
  13. Maaten, L. van den; Hinton, G.: Visualizing data using t-SNE (2008) 0.01
    0.006778794 = product of:
      0.061009146 = sum of:
        0.061009146 = weight(_text_:data in 3888) [ClassicSimilarity], result of:
          0.061009146 = score(doc=3888,freq=18.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.52404076 = fieldWeight in 3888, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3888)
      0.11111111 = coord(1/9)
    
    Abstract
    We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.
    Theme
    Data Mining
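     A minimal usage sketch of the technique, using scikit-learn's TSNE implementation rather than the authors' original code; the parameter values are illustrative.
     ```python
     from sklearn.datasets import load_digits
     from sklearn.manifold import TSNE

     X, y = load_digits(return_X_y=True)              # 1797 samples in 64 dimensions
     embedding = TSNE(n_components=2, perplexity=30.0, init="pca",
                      random_state=0).fit_transform(X)
     print(embedding.shape)                           # (1797, 2): one 2-D location per datapoint
     # Plotting `embedding` coloured by `y` shows the digit classes as separated clusters.
     ```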
  14. Dextre Clarke, S.G.; Will, L.D.; Cochard, N.: The BS8723 thesaurus data model and exchange format, and its relationship to SKOS (2008) 0.01
    0.0063268747 = product of:
      0.05694187 = sum of:
        0.05694187 = weight(_text_:data in 6051) [ClassicSimilarity], result of:
          0.05694187 = score(doc=6051,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.48910472 = fieldWeight in 6051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.109375 = fieldNorm(doc=6051)
      0.11111111 = coord(1/9)
    
  15. Aitken, S.; Reid, S.: Evaluation of an ontology-based information retrieval tool (2000) 0.01
    0.006261982 = product of:
      0.05635784 = sum of:
        0.05635784 = weight(_text_:data in 2862) [ClassicSimilarity], result of:
          0.05635784 = score(doc=2862,freq=6.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.48408815 = fieldWeight in 2862, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=2862)
      0.11111111 = coord(1/9)
    
    Abstract
    This paper evaluates the use of an explicit domain ontology in an information retrieval tool. The evaluation compares the performance of ontology-enhanced retrieval with keyword retrieval for a fixed set of queries across several data sets. The robustness of the IR approach is assessed by comparing the performance of the tool on the original data set with that on previously unseen data.
  16. Baker, T.: A grammar of Dublin Core (2000) 0.01
    0.0058323974 = product of:
      0.026245788 = sum of:
        0.016269106 = weight(_text_:data in 1236) [ClassicSimilarity], result of:
          0.016269106 = score(doc=1236,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.1397442 = fieldWeight in 1236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=1236)
        0.009976682 = product of:
          0.019953365 = sum of:
            0.019953365 = weight(_text_:22 in 1236) [ClassicSimilarity], result of:
              0.019953365 = score(doc=1236,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.15476047 = fieldWeight in 1236, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1236)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Dublin Core is often presented as a modern form of catalog card -- a set of elements (and now qualifiers) that describe resources in a complete package. Sometimes it is proposed as an exchange format for sharing records among multiple collections. The founding principle that "every element is optional and repeatable" reinforces the notion that a Dublin Core description is to be taken as a whole. This paper, in contrast, is based on a much different premise: Dublin Core is a language. More precisely, it is a small language for making a particular class of statements about resources. Like natural languages, it has a vocabulary of word-like terms, the two classes of which -- elements and qualifiers -- function within statements like nouns and adjectives; and it has a syntax for arranging elements and qualifiers into statements according to a simple pattern. Whenever tourists order a meal or ask directions in an unfamiliar language, considerate native speakers will spontaneously limit themselves to basic words and simple sentence patterns along the lines of "I am so-and-so" or "This is such-and-such". Linguists call this pidginization. In such situations, a small phrase book or translated menu can be most helpful. By analogy, today's Web has been called an Internet Commons where users and information providers from a wide range of scientific, commercial, and social domains present their information in a variety of incompatible data models and description languages. In this context, Dublin Core presents itself as a metadata pidgin for digital tourists who must find their way in this linguistically diverse landscape. Its vocabulary is small enough to learn quickly, and its basic pattern is easily grasped. It is well-suited to serve as an auxiliary language for digital libraries. This grammar starts by defining terms. It then follows a 200-year-old tradition of English grammar teaching by focusing on the structure of single statements. It concludes by looking at the growing dictionary of Dublin Core vocabulary terms -- its registry, and at how statements can be used to build the metadata equivalent of paragraphs and compositions -- the application profile.
    Date
    26.12.2011 14:01:22
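     A small sketch of the "statement" view of Dublin Core described above: each statement pairs an element (optionally refined by a qualifier) with a value, and a description is simply a set of such statements. The record values are invented for illustration.
     ```python
     # (element, qualifier, value) statements about one resource
     statements = [
         ("title",       None,       "A grammar of Dublin Core"),
         ("creator",     None,       "Baker, T."),
         ("date",        "issued",   "2000"),
         ("subject",     None,       "metadata"),
         ("description", "abstract", "Dublin Core treated as a small language."),
     ]

     for element, qualifier, value in statements:
         term = element if qualifier is None else f"{element}.{qualifier}"
         print(f"{term}: {value}")   # e.g. "date.issued: 2000"
     ```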
  17. Bradford, R.B.: Relationship discovery in large text collections using Latent Semantic Indexing (2006) 0.01
    0.0058323974 = product of:
      0.026245788 = sum of:
        0.016269106 = weight(_text_:data in 1163) [ClassicSimilarity], result of:
          0.016269106 = score(doc=1163,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.1397442 = fieldWeight in 1163, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=1163)
        0.009976682 = product of:
          0.019953365 = sum of:
            0.019953365 = weight(_text_:22 in 1163) [ClassicSimilarity], result of:
              0.019953365 = score(doc=1163,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.15476047 = fieldWeight in 1163, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1163)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Source
    Proceedings of the Fourth Workshop on Link Analysis, Counterterrorism, and Security, SIAM Data Mining Conference, Bethesda, MD, 20-22 April, 2006. [http://www.siam.org/meetings/sdm06/workproceed/Link%20Analysis/15.pdf]
  18. Lavoie, B.; Connaway, L.S.; Dempsey, L.: Anatomy of aggregate collections : the example of Google print for libraries (2005) 0.01
    0.005772891 = product of:
      0.025978008 = sum of:
        0.018495496 = weight(_text_:bibliographic in 1184) [ClassicSimilarity], result of:
          0.018495496 = score(doc=1184,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.1290374 = fieldWeight in 1184, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1184)
        0.0074825115 = product of:
          0.014965023 = sum of:
            0.014965023 = weight(_text_:22 in 1184) [ClassicSimilarity], result of:
              0.014965023 = score(doc=1184,freq=2.0), product of:
                0.12893063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036818076 = queryNorm
                0.116070345 = fieldWeight in 1184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=1184)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
     This article offers some perspectives on GPLP in light of what is known about library print book collections in general, and those of the Google 5 in particular, from information in OCLC's WorldCat bibliographic database and holdings file. Questions addressed include:
     * Coverage: What proportion of the system-wide print book collection will GPLP potentially cover? What is the degree of holdings overlap across the print book collections of the five participating libraries?
     * Language: What is the distribution of languages associated with the print books held by the GPLP libraries? Which languages are predominant?
     * Copyright: What proportion of the GPLP libraries' print book holdings are out of copyright?
     * Works: How many distinct works are represented in the holdings of the GPLP libraries? How does a focus on works impact coverage and holdings overlap?
     * Convergence: What are the effects on coverage of using a different set of five libraries? What are the effects of adding the holdings of additional libraries to those of the GPLP libraries, and how do these effects vary by library type?
     These questions certainly do not exhaust the analytical possibilities presented by GPLP. More in-depth analysis might look at Google 5 coverage in particular subject areas; it also would be interesting to see how many books covered by the GPLP have already been digitized in other contexts. However, these questions are left to future studies. The purpose here is to explore a few basic questions raised by GPLP, and in doing so, provide an empirical context for the debate that is sure to continue for some time to come. A secondary objective is to lay some groundwork for a general set of questions that could be used to explore the implications of any mass digitization initiative. A suggested list of questions is provided in the conclusion of the article.
    Date
    26.12.2011 14:08:22
  19. Paskin, N.: DOI: a 2003 progress report (2003) 0.01
    0.005626014 = product of:
      0.050634123 = sum of:
        0.050634123 = weight(_text_:germany in 1203) [ClassicSimilarity], result of:
          0.050634123 = score(doc=1203,freq=2.0), product of:
            0.21956629 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.036818076 = queryNorm
            0.23060973 = fieldWeight in 1203, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1203)
      0.11111111 = coord(1/9)
    
    Abstract
    The International DOI Foundation (IDF) recently published the third edition of its DOI Handbook, which sets the scene for DOI's expansion into much wider applications. Edition 3 is not simply an updated user guide. A great deal has happened in the underlying technologies and in the practical deployment and development of DOIs (Digital Object Identifiers) since the last edition was published a year ago. Much of the program of technical work foreseen at the inception of DOIs has now been completed. The initial simple implementation of DOI as a persistent name linked to redirection continues to grow, with approaching ten million DOIs assigned from several hundred organisations through a number of Registration Agencies in USA, Europe, and Australasia, supporting large scale business uses. Implementations of more sophisticated applications (offering associated services) have been developing well but on a smaller scale: a framework for building these has been completed as part of the latest release and promises to stimulate a new wave of growth. From its original starting point in text publishing, there has been gradual embrace by a number of communities: these include national libraries (a consortium of national libraries recently joined the IDF); government documentation (with the appointment of TSO The Stationery Office in the UK as a DOI agency and the announced intention of the EC Office of Publications to use DOIs); non-English language markets (France, Germany, Spain, Italy, Korea). However implementations in non-text sectors have been far slower to develop, though several are now under discussion. The DOI community can point to several significant achievements over the past few years: * A practical successful open implementation of naming objects, treating content as information objects, not simply packets of bits; * The IDF's role in co-sponsoring, championing, and now implementing the <indecs>T framework as a semantic tool for structured metadata - an essential step for treating content as information in Semantic-Web-like applications; * A template for building advanced applications, connecting resolution and metadata technologies, and offering hooks to web services and similar applications; * The development of a policy framework that allows multiple communities autonomy; * The practical implementation of DOIs with emerging related standards such as the OpenURL framework in contextual linking.
  20. Apps, A.; MacIntyre, R.: Why OpenURL? (2006) 0.01
    0.005480147 = product of:
      0.049321324 = sum of:
        0.049321324 = weight(_text_:bibliographic in 4081) [ClassicSimilarity], result of:
          0.049321324 = score(doc=4081,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.34409973 = fieldWeight in 4081, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=4081)
      0.11111111 = coord(1/9)
    
    Abstract
     The improvement of access to scholarly literature caused by electronic journal publishing quickly led to the wish for seamless linking to referenced articles. This article looks at the evolution of linking technologies with a particular focus on OpenURL, now a NISO standard. The implications for stakeholders in the supply chain are explored, including publishers, intermediaries, libraries and readers. The benefits, expectations and business drivers are examined. The article also highlights some novel uses, both existing and potential future ones, including increased user empowerment and possibilities beyond referencing traditional bibliographic material.
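     For illustration, a hedged sketch of building an OpenURL 1.0 (Z39.88-2004) request in key/encoded-value form; the resolver base URL is a placeholder for an institution's own link resolver, and the key set shown is only a subset of the journal format.
     ```python
     from urllib.parse import urlencode

     RESOLVER = "https://resolver.example.edu/openurl"   # placeholder link-resolver base URL

     citation = {
         "url_ver": "Z39.88-2004",
         "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
         "rft.genre": "article",
         "rft.atitle": "Why OpenURL?",
         "rft.aulast": "Apps",
         "rft.aufirst": "A.",
         "rft.date": "2006",
     }

     openurl = RESOLVER + "?" + urlencode(citation)
     print(openurl)   # the resolver decides which copy of the cited article to offer the reader
     ```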

Languages

  • e 62
  • d 2