Search (46 results, page 2 of 3)

  • type_ss:"a"
  • type_ss:"el"
  • year_i:[1990 TO 2000}
  1. Daniel Jr., R.; Lagoze, C.: Extending the Warwick framework : from metadata containers to active digital objects (1997) 0.00
    0.0020161984 = product of:
      0.01209719 = sum of:
        0.01209719 = weight(_text_:in in 1264) [ClassicSimilarity], result of:
          0.01209719 = score(doc=1264,freq=30.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.20372246 = fieldWeight in 1264, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1264)
      0.16666667 = coord(1/6)
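    The tree above is Lucene's ClassicSimilarity "explain" output: tf is the square root of the term frequency, idf is 1 + ln(maxDocs/(docFreq+1)), and the final score is the product of queryWeight, fieldWeight, and a coordination factor for the fraction of query clauses that matched. A minimal Python sketch reproducing this entry's score from the constants shown (queryNorm and fieldNorm are taken as given rather than re-derived):

      import math

      # Constants copied from the explain tree for doc 1264 above.
      freq = 30.0               # occurrences of "in" in the field
      doc_freq = 30841          # documents containing "in"
      max_docs = 44218          # documents in the index
      query_norm = 0.043654136  # query normalization, taken as given
      field_norm = 0.02734375   # field-length norm, taken as given

      tf = math.sqrt(freq)                             # 5.477226
      idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 1.3602545
      query_weight = idf * query_norm                  # 0.059380736
      field_weight = tf * idf * field_norm             # 0.20372246
      coord = 1.0 / 6.0                                # 1 of 6 clauses matched

      print(query_weight * field_weight * coord)       # ~0.0020161984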
    
    Abstract
    Defining metadata as "data about data" provokes more questions than it answers. What are the forms of the data and metadata? Can we be more specific about the manner in which the metadata is "about" the data? Are data and metadata distinguished only in the context of their relationship? Is the nature of the relationship between the datasets declarative or procedural? Can the metadata itself be described by other data? Over the past several years, we have been engaged in a number of efforts examining the role, format, composition, and architecture of metadata for networked resources. During this time, we have noticed the tendency to be led astray by comfortable, but somewhat inappropriate, models drawn from the non-digital information environment. Rather than pursuing familiar models, we need a new model that fully exploits the unique combination of computation and connectivity that characterizes the digital library. In this paper, we describe an extension of the Warwick Framework that we call Distributed Active Relationships (DARs). DARs provide a powerful model for representing data and metadata in digital library objects. They explicitly express the relationships between networked resources, and even allow those relationships to be dynamically downloadable and executable. The DAR model is based on the following principles, which our examination of the "data about data" definition has led us to regard as axiomatic:
    * There is no essential distinction between data and metadata. We can only make such a distinction in terms of a particular "about" relationship. As a result, what is metadata in the context of one "about" relationship may be data in another.
    * There is no single "about" relationship. There are many different and important relationships between data resources.
    * Resources can be related without regard for their location. The connectivity in networked information architectures makes it possible to have data in one repository describe data in another repository.
    * The computational power of the networked information environment makes it possible to consider active or dynamic relationships between data sets. This adds considerable power to the "data about data" definition. First, data about another data set may not physically exist, but may be automatically derived. Second, the "about" relationship may be an executable object -- in a sense, interpretable metadata. As will be shown, this provides useful mechanisms for handling complex metadata problems such as rights management of digital objects.
    The remainder of this paper describes the development and consequences of the DAR model. Section 2 reviews the Warwick Framework, which is the basis for the model described in this paper. Section 3 examines the concept of the Warwick Framework Catalog, which provides a mechanism for expressing the relationships between the packages in a Warwick Framework container. With that background established, section 4 generalizes the Warwick Framework by removing the restriction that it only contains "metadata". This allows us to consider digital library objects that are aggregations of (possibly distributed) data sets, with the relationships between the data sets expressed using a Warwick Framework Catalog. Section 5 further extends the model by describing Distributed Active Relationships (DARs). DARs are the explicit relationships that have the potential to be executable, as alluded to earlier. Finally, section 6 describes two possible implementations of these concepts.
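    As a toy illustration of the DAR idea (a sketch, not the authors' implementation), a relationship can be modeled as a labeled, directed edge between resource identifiers whose label may itself be executable, so that the "metadata" is derived on demand:

      from dataclasses import dataclass
      from typing import Callable, Union

      @dataclass
      class DAR:
          subject: str                    # URI of the describing resource
          relation: Union[str, Callable]  # static label or executable behavior
          obj: str                        # URI of the described resource

      def derive_terms(obj_uri: str) -> str:
          # Dynamically computed "metadata": nothing is stored; the
          # relationship derives it on demand (cf. rights management).
          return f"usage terms for {obj_uri}, generated at access time"

      static = DAR("hdl:meta/1", "describes", "hdl:data/1")
      active = DAR("hdl:rights/1", derive_terms, "hdl:data/1")

      for dar in (static, active):
          label = dar.relation if isinstance(dar.relation, str) else dar.relation(dar.obj)
          print(dar.subject, "--", label, "-->", dar.obj)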
  2. Payette, S.; Blanchi, C.; Lagoze, C.; Overly, E.A.: Interoperability for digital objects and repositories : the Cornell/CNRI experiments (1999) 0.00
    0.0019732218 = product of:
      0.01183933 = sum of:
        0.01183933 = weight(_text_:in in 1248) [ClassicSimilarity], result of:
          0.01183933 = score(doc=1248,freq=22.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19937998 = fieldWeight in 1248, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=1248)
      0.16666667 = coord(1/6)
    
    Abstract
    For several years the Digital Library Research Group at Cornell University and the Corporation for National Research Initiatives (CNRI) have been engaged in research focused on the design and development of infrastructures for open architecture, confederated digital libraries. The goal of this effort is to achieve interoperability and extensibility of digital library systems through the definition of key digital library services and their open interfaces, allowing flexible interaction of existing services and augmentation of the infrastructure with new services. Some aspects of this research have included the development and deployment of the Dienst software, the Handle System®, and the architecture of digital objects and repositories. In this paper, we describe the joint effort by Cornell and CNRI to prototype a rich and deployable architecture for interoperable digital objects and repositories. This effort has challenged us to move theories of interoperability closer to practice. The Cornell/CNRI collaboration builds on two existing projects focusing on the development of interoperable digital libraries. Details relating to the technology of these projects are described elsewhere. Both projects were strongly influenced by the fundamental abstractions of repositories and digital objects as articulated by Kahn and Wilensky in A Framework for Distributed Digital Object Services. Furthermore, both programs were influenced by the container architecture described in the Warwick Framework, and by the notions of distributed dynamic objects presented by Lagoze and Daniel in their Distributed Active Relationship work. With these common roots, one would expect that the CNRI and Cornell repositories would be at least theoretically interoperable. However, the actual test would be the extent to which our independently developed repositories were practically interoperable. This paper focuses on the definition of interoperability in the joint Cornell/CNRI work and the set of experiments conducted to formally test it. Our motivation for this work is the eventual deployment of formally tested reference implementations of the repository architecture for experimentation and development by fellow digital library researchers. In Section 2, we summarize the digital object and repository approach that was the focus of our interoperability experiments. In Section 3, we describe the set of experiments that progressively tested interoperability at increasing levels of functionality. In Section 4, we discuss general conclusions, and in Section 5, we give a preview of our future work, including our plans to evolve our experimentation to the point of defining a set of formal metrics for measuring interoperability for repositories and digital objects. This is still a work in progress that is expected to undergo additional refinements during its development.
  3. Van de Sompel, H.; Hochstenbach, P.: Reference linking in a hybrid library environment : part 2: SFX, a generic linking solution (1999) 0.00
    0.0019676082 = product of:
      0.011805649 = sum of:
        0.011805649 = weight(_text_:in in 1241) [ClassicSimilarity], result of:
          0.011805649 = score(doc=1241,freq=14.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19881277 = fieldWeight in 1241, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1241)
      0.16666667 = coord(1/6)
    
    Abstract
    This is the second part of two articles about reference linking in hybrid digital libraries. The first part, Frameworks for Linking, described the current state-of-the-art and contrasted various approaches to the problem. It identified static and dynamic linking solutions, as well as open and closed linking frameworks. It also included an extensive bibliography. The second part describes our work at the University of Ghent to address these issues. SFX is a generic linking system that we have developed for our own needs, but its underlying concepts can be applied in a wide range of digital libraries. This is a description of the approach to the creation of extended services in a hybrid library environment taken by the Library Automation team at the University of Ghent. The ongoing research has been grouped under the working title Special Effects (SFX). In order to explain the SFX concepts in a comprehensive way, the discussion will start with a brief description of pre-SFX experiments. Thereafter, the basics of the SFX approach are explained briefly, in combination with the concrete implementation choices taken for the Elektron SFX linking experiment. Elektron was the name of a modest digital library collaboration between the Universities of Ghent, Louvain and Antwerp.
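    The abstract does not spell out SFX internals, but the generic idea of dynamic linking can be sketched: extended-service links are computed from a citation's metadata at request time rather than stored statically. The service names and URL patterns below are invented for illustration:

      # Each service has a name, an applicability test, and a URL builder.
      citation = {"genre": "article", "issn": "1082-9873", "year": "1999",
                  "title": "Reference linking in a hybrid library environment"}

      services = [
          ("Full text", lambda c: c["genre"] == "article",
           lambda c: f"https://ejournals.example.org/issn/{c['issn']}/{c['year']}"),
          ("Library catalogue", lambda c: True,
           lambda c: "https://opac.example.org/search?q=" + c["title"].replace(" ", "+")),
      ]

      # Build the menu of extended services offered for this citation.
      for name, applies, url in services:
          if applies(citation):
              print(f"{name}: {url(citation)}")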
  4. Van de Sompel, H.; Hochstenbach, P.: Reference linking in a hybrid library environment : part 1: frameworks for linking (1999) 0.00
    0.0018813931 = product of:
      0.011288359 = sum of:
        0.011288359 = weight(_text_:in in 1244) [ClassicSimilarity], result of:
          0.011288359 = score(doc=1244,freq=20.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19010136 = fieldWeight in 1244, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=1244)
      0.16666667 = coord(1/6)
    
    Abstract
    The creation of services linking related information entities is an area that is attracting an ever increasing interest in the ongoing development of the World Wide Web in general, and of research-related information systems in particular. Although most writings on electronic scientific communication have touted other benefits, such as the increase in communication speed, the possibility to exchange multimedia content and the absence of limitations on the length of research papers, currently both practice and theory point at linking services as a major opportunity for improved communication of content. Publishers, subscription agents, researchers and libraries are all looking into ways to create added value by linking related information entities, thereby presenting the information within a broader context estimated to be relevant to the users of the information. This is the first of two articles in D-Lib Magazine on this topic. This first part describes the current state-of-the-art and contrasts various approaches to the problem. It identifies static and dynamic linking solutions as well as open and closed linking frameworks. It also includes an extensive bibliography. The second part, SFX, a Generic Linking Solution, describes a system that we have developed for linking in a hybrid working environment.
  5. Brin, S.; Page, L.: The anatomy of a large-scale hypertextual Web search engine (1998) 0.00
    0.001821651 = product of:
      0.010929906 = sum of:
        0.010929906 = weight(_text_:in in 947) [ClassicSimilarity], result of:
          0.010929906 = score(doc=947,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18406484 = fieldWeight in 947, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=947)
      0.16666667 = coord(1/6)
    
    Abstract
    In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype, with a full-text and hyperlink database of at least 24 million pages, is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
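    The prototype's best-known use of hypertext structure is PageRank, which is defined in the full paper rather than in this abstract. A toy power-iteration sketch on a three-page web, using the paper's damping factor of 0.85:

      # links maps each page to the pages it links to.
      links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
      pages = list(links)
      rank = {p: 1.0 / len(pages) for p in pages}

      for _ in range(50):  # power iteration until (approximate) convergence
          new = {p: (1 - 0.85) / len(pages) for p in pages}
          for src, outs in links.items():
              for dst in outs:
                  new[dst] += 0.85 * rank[src] / len(outs)
          rank = new

      print(rank)  # "c" ranks highest: it collects links from both "a" and "b"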
  6. Bearman, D.; Miller, E.; Rust, G.; Trant, J.; Weibel, S.: A common model to support interoperable metadata : progress report on reconciling metadata requirements from the Dublin Core and INDECS/DOI communities (1999) 0.00
    0.001821651 = product of:
      0.010929906 = sum of:
        0.010929906 = weight(_text_:in in 1249) [ClassicSimilarity], result of:
          0.010929906 = score(doc=1249,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18406484 = fieldWeight in 1249, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1249)
      0.16666667 = coord(1/6)
    
    Abstract
    The Dublin Core metadata community and the INDECS/DOI community of authors, rights holders, and publishers are seeking common ground in the expression of metadata for information resources. Recent meetings at the 6th Dublin Core Workshop in Washington DC sketched out common models for semantics (informed by the requirements articulated in the IFLA Functional Requirements for the Bibliographic Record) and conventions for knowledge representation (based on the Resource Description Framework under development by the W3C). Further development of detailed requirements is planned by both communities in the coming months with the aim of fully representing the metadata needs of each. An open "Schema Harmonization" working group has been established to identify a common framework to support interoperability among these communities. The present document represents a starting point identifying historical developments and common requirements of these perspectives on metadata and charts a path for harmonizing their respective conceptual models. It is hoped that collaboration over the coming year will result in agreed semantic and syntactic conventions that will support a high degree of interoperability among these communities, ideally expressed in a single data model and using common, standard tools.
  7. Kirriemuir, J.; Brickley, D.; Welsh, S.; Knight, J.; Hamilton, M.: Cross-searching subject gateways : the query routing and forward knowledge approach (1998) 0.00
    0.001821651 = product of:
      0.010929906 = sum of:
        0.010929906 = weight(_text_:in in 1252) [ClassicSimilarity], result of:
          0.010929906 = score(doc=1252,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18406484 = fieldWeight in 1252, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1252)
      0.16666667 = coord(1/6)
    
    Abstract
    A subject gateway, in the context of network-based resource access, can be defined as some facility that allows easier access to network-based resources in a defined subject area. The simplest types of subject gateways are sets of Web pages containing lists of links to resources. Some gateways index their lists of links and provide a simple search facility. More advanced gateways offer a much enhanced service via a system consisting of a resource database and various indexes, which can be searched and/or browsed through a Web-based interface. Each entry in the database contains information about a network-based resource, such as a Web page, Web site, mailing list or document. Entries are usually created by a cataloguer manually identifying a suitable resource, describing the resource using a template, and submitting the template to the database for indexing. Subject gateways are also known as subject-based information gateways (SBIGs), subject-based gateways, subject index gateways, virtual libraries, clearing houses, subject trees, pathfinders and other variations thereof. This paper describes the characteristics of some of the subject gateways currently accessible through the Web, and compares them to automatic "vacuum cleaner" type search engines, such as AltaVista. The application of WHOIS++, centroids, query routing, and forward knowledge to searching several of these subject gateways simultaneously is outlined. The paper concludes by looking at some of the issues facing subject gateway development in the near future. The paper touches on many of the issues mentioned in a previous paper in D-Lib Magazine, especially regarding resource-discovery related initiatives and services.
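    The simplest kind of gateway described above (a hand-catalogued resource database with a basic search facility) can be sketched as follows; all field names are invented for illustration:

      # One manually catalogued entry, as filled in from a template.
      records = [
          {"url": "http://www.example.ac.uk/metadata-guide",
           "title": "Introduction to metadata",
           "description": "Manually written summary by a cataloguer.",
           "resource_type": "Web page",   # or Web site, mailing list, document
           "keywords": ["metadata", "digital libraries"]},
      ]

      def search(query: str):
          # The "simple search facility": substring match over indexed fields.
          q = query.lower()
          return [r for r in records
                  if q in r["title"].lower()
                  or q in r["description"].lower()
                  or any(q in k for k in r["keywords"])]

      print(search("metadata"))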
  8. Weibel, S.: The State of the Dublin Core Metadata Initiative, April 1999 (1999) 0.00
    0.0018033426 = product of:
      0.010820055 = sum of:
        0.010820055 = weight(_text_:in in 6066) [ClassicSimilarity], result of:
          0.010820055 = score(doc=6066,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1822149 = fieldWeight in 6066, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6066)
      0.16666667 = coord(1/6)
    
    Abstract
    One hundred and one experts in resource description convened in Washington, D.C., November 2 through November 4, 1998, for the sixth Dublin Core Metadata Workshop. The registrants represented 16 countries on 4 continents, and many disciplines. As with previous workshops, many new issues were opened, and vigorous debate was a hallmark of the event. Unlike previous workshops, the focus of DC-6 was not to resolve questions in plenary meetings, but rather to identify unresolved issues and assign them to formal working groups for resolution. The result of this process was an ambitious workplan for 1999. This report summarizes that workplan, highlights the progress that has been made on the workplan, and identifies a few significant projects that exemplify this progress.
  9. Pitti, D.V.: Encoded Archival Description : an introduction and overview (1999) 0.00
    0.0017848461 = product of:
      0.010709076 = sum of:
        0.010709076 = weight(_text_:in in 1152) [ClassicSimilarity], result of:
          0.010709076 = score(doc=1152,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 1152, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1152)
      0.16666667 = coord(1/6)
    
    Abstract
    Encoded Archival Description (EAD) is an emerging standard used internationally in an increasing number of archives and manuscripts libraries to encode data describing corporate records and personal papers. The individual descriptions are variously called finding aids, guides, handlists, or catalogs. While archival description shares many objectives with bibliographic description, it differs from it in several essential ways. From its inception, EAD was based on SGML, and, with the release of EAD version 1.0 in 1998, it is also compliant with XML. EAD was, and continues to be, developed by the archival community. While development was initiated in the United States, international interest and contribution are increasing. EAD is currently administered and maintained jointly by the Society of American Archivists and the United States Library of Congress. Developers are currently exploring ways to internationalize the administration and maintenance of EAD to reflect and represent the expanding base of users.
  10. Chen, H.: Semantic research for digital libraries (1999) 0.00
    0.0017848461 = product of:
      0.010709076 = sum of:
        0.010709076 = weight(_text_:in in 1247) [ClassicSimilarity], result of:
          0.010709076 = score(doc=1247,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 1247, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1247)
      0.16666667 = coord(1/6)
    
    Abstract
    In this era of the Internet and distributed, multimedia computing, new and emerging classes of information systems applications have swept into the lives of office workers and people in general. From digital libraries, multimedia systems, geographic information systems, and collaborative computing to electronic commerce, virtual reality, and electronic video arts and games, these applications have created tremendous opportunities for information and computer science researchers and practitioners. As applications become more pervasive, pressing, and diverse, several well-known information retrieval (IR) problems have become even more urgent. Information overload, a result of the ease of information creation and transmission via the Internet and WWW, has become more troublesome (e.g., even stockbrokers and elementary school students, heavily exposed to various WWW search engines, are versed in such IR terminology as recall and precision). Significant variations in database formats and structures, the richness of information media (text, audio, and video), and an abundance of multilingual information content also have created severe information interoperability problems -- structural interoperability, media interoperability, and multilingual interoperability.
  11. Peters, C.; Picchi, E.: Across languages, across cultures : issues in multilinguality and digital libraries (1997) 0.00
    0.001682769 = product of:
      0.010096614 = sum of:
        0.010096614 = weight(_text_:in in 1233) [ClassicSimilarity], result of:
          0.010096614 = score(doc=1233,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.17003182 = fieldWeight in 1233, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=1233)
      0.16666667 = coord(1/6)
    
    Abstract
    With the recent rapid diffusion over the international computer networks of world-wide distributed document bases, the question of multilingual access and multilingual information retrieval is becoming increasingly relevant. We briefly discuss just some of the issues that must be addressed in order to implement a multilingual interface for a Digital Library system and describe our own approach to this problem.
  12. Lynch, C.A.: The Z39.50 information retrieval standard : part I: a strategic view of its past, present and future (1997) 0.00
    0.0016695707 = product of:
      0.010017424 = sum of:
        0.010017424 = weight(_text_:in in 1262) [ClassicSimilarity], result of:
          0.010017424 = score(doc=1262,freq=28.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1686982 = fieldWeight in 1262, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1262)
      0.16666667 = coord(1/6)
    
    Abstract
    The Z39.50 standard for information retrieval is important from a number of perspectives. While still not widely known within the computer networking community, it is a mature standard that represents the culmination of two decades of thinking and debate about how information retrieval functions can be modeled, standardized, and implemented in a distributed systems environment. And -- importantly -- it has been tested through substantial deployment experience. Z39.50 is one of the few examples we have to date of a protocol that actually goes beyond codifying mechanism and moves into the area of standardizing shared semantic knowledge. The extent to which this should be a goal of the protocol has been an ongoing source of controversy and tension within the developer community, and differing views on this issue can be seen both in the standard itself and in the way that it is used in practice. Given the growing emphasis on issues such as "semantic interoperability" as part of the research agenda for digital libraries (see Clifford A. Lynch and Hector Garcia-Molina, Interoperability, Scaling, and the Digital Libraries Research Agenda, Report on the May 18-19, 1995 IITA Libraries Workshop, <http://www-diglib.stanford.edu/diglib/pub/reports/iita-dlw/main.html>), the insights gained by the Z39.50 community into the complex interactions among various definitions of semantics and interoperability are particularly relevant. The development process for the Z39.50 standard is also of interest in its own right. Its history, dating back to the 1970s, spans a period that saw the eclipse of formal standards-making agencies by groups such as the Internet Engineering Task Force (IETF) and informal standards development consortia. Moreover, in order to achieve meaningful implementation, Z39.50 had to move beyond its origins in the OSI debacle of the 1980s. Z39.50 has also been, to some extent, a victim of its own success -- or at least promise. Recent versions of the standard are highly extensible, and the consensus process of standards development has made it hospitable to an ever-growing set of new communities and requirements. As this process of extension has proceeded, it has become ever less clear what the appropriate scope and boundaries of the protocol should be, and what expectations one should have of practical interoperability among implementations of the standard. Z39.50 thus offers an excellent case study of the problems involved in managing the evolution of a standard over time. It may well offer useful lessons for the future of other standards such as HTTP and HTML, which seem to be facing some of the same issues.
    This paper, which will appear in two parts, starting with this issue of D-Lib, looks at several strategic issues surrounding Z39.50. After a relatively brief overview of the function and history of the protocol, I will examine some of the competing visions of the protocol's role, with emphasis on issues of interoperability and the incorporation of semantics. The second installment of the paper will look at questions related to the management of the standard and the standards development process, with emphasis on the scope of the protocol and how that relates back again to interoperability questions. The paper concludes with a discussion of the adoption and deployment of the standard, its relationship to other standards, and some speculations on future directions for the protocol. This paper is not intended to be a tutorial on the details of how current or past versions of Z39.50 work. These technical details are covered not only in the standard itself (which can admittedly be rather difficult reading) but also in an array of tutorial and review papers (see <http://lcweb.loc.gov/z3950/agency> for bibliographies and pointers to on-line information on Z39.50). Instead, the paper's focus is on how and why Z39.50 developed the way it did, and the conceptual debates that have influenced its evolution and use. While a detailed technical knowledge of the operation of Z39.50 is certainly helpful, it should not be necessary in order to follow most of the material here. Some disclaimers are in order. I have been actively involved in the development of Z39.50 since the early 1980s and have been a participant in -- and on occasion, even an instigator of -- some of the activities described here. This paper is an attempt to make a critical assessment of the current state of Z39.50 and a review of its development with the full benefit of hindsight. It recounts a number of debates that occurred within the developer community over the past years. In many of these, I advocated specific positions or approaches, sometimes successfully and sometimes unsuccessfully. What is presented here is one person's perspective (mine), which is sometimes at odds with the current consensus within the developer community; I've tried to represent opposing views fairly, and to differentiate my opinions from fact or consensus. However, others will undoubtedly disagree with many of the comments here.
  13. Brüggemann-Klein, A.; Klein, R.; Landgraf, B.: BibRelEx : Exploring bibliographic databases by visualization of annotated content-based relations (1999) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 1157) [ClassicSimilarity], result of:
          0.009274333 = score(doc=1157,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 1157, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1157)
      0.16666667 = coord(1/6)
    
    Abstract
    Traditional searching and browsing functions for bibliographic databases no longer enable users to deal efficiently with the rapidly growing number of scientific publications. The main goal of our project BibRelEx is to develop a new method based on the visualization of content-based relations between documents, such as "cites", "succeeds", or "improves with respect to". BibRelEx will therefore use these relationships for effective exploration. In addition, BibRelEx will take advantage of the additional insights into the area that can result from the aggregation of expert knowledge, which complements the specialized knowledge represented in the documents themselves. We are preparing to test this approach using a bibliographic database in a specific area of computer science.
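    A toy sketch of the annotated relation graph that BibRelEx visualizes; the relation names follow the abstract, while the representation itself is an assumption:

      # Each edge is (source document, relation label, target document).
      edges = [
          ("paperB", "cites", "paperA"),
          ("paperC", "succeeds", "paperB"),
          ("paperD", "improves with respect to", "paperB"),
      ]

      def related(doc: str):
          """All documents one hop from `doc`, with the relation label."""
          out = [(rel, dst) for src, rel, dst in edges if src == doc]
          out += [(f"inverse of {rel}", src) for src, rel, dst in edges if dst == doc]
          return out

      print(related("paperB"))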
  14. Plotkin, R.C.; Schwartz, M.S.: Data modeling for news clip archive : a prototype solution (1997) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 1259) [ClassicSimilarity], result of:
          0.009274333 = score(doc=1259,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 1259, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1259)
      0.16666667 = coord(1/6)
    
    Abstract
    Film, videotape and multimedia archive systems must address the issues of editing, authoring and searching at the media (i.e. tape) or sub-media (i.e. scene) level in addition to the traditional inventory management capabilities associated with the physical media. This paper describes a prototype of a database design for the storage, search and retrieval of multimedia and its related information. It also provides a process by which legacy data can be imported to this schema. The prototype is called the Continuous Media Index, or Comix. An implementation of such a digital library solution incorporates multimedia objects, hierarchical relationships and timecode in addition to traditional attribute data. Present video and multimedia archive systems are easily migrated to this architecture. Comix was implemented for a videotape archiving system. It was written for, and implemented using, IBM Digital Library version 1.0. A derivative of Comix is currently in development for customer-specific applications. Principles of the Comix design as well as the importation methods are not specific to the underlying systems used.
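    A hypothetical sketch of the hierarchy the abstract describes: physical media at the top, timecoded sub-media units (scenes) below. The field names are illustrative, not Comix's actual schema:

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class Scene:
          timecode_in: str   # e.g. "00:01:23:10" (HH:MM:SS:FF)
          timecode_out: str
          description: str

      @dataclass
      class Tape:
          barcode: str       # traditional inventory-management attribute
          title: str
          scenes: List[Scene] = field(default_factory=list)

      tape = Tape("A0001", "Evening news, 1997-03-02",
                  [Scene("00:00:00:00", "00:02:10:05", "flood coverage")])
      print(tape.scenes[0].description)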
  15. Chowdhury, A.; McCabe, M.C.: Improving information retrieval systems using part of speech tagging (1993) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 1061) [ClassicSimilarity], result of:
          0.009274333 = score(doc=1061,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 1061, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1061)
      0.16666667 = coord(1/6)
    
    Abstract
    The object of information retrieval is to retrieve all relevant documents for a user query, and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of part-of-speech tagging to improve the index storage overhead and general speed of the system with only a minimal reduction in precision/recall measurements. We tagged 500 MB of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant parts of speech to index. We show that 90% of precision/recall is achieved with 40% of the document collection's terms. We also show that this is an improvement in overhead with only a 1% reduction in precision/recall.
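    The approach can be sketched as follows: tag every token, then admit to the index only the parts of speech judged content-bearing. Keeping nouns and adjectives below is an assumption; the paper experimented to find which parts of speech to index:

      import nltk

      # Tagger/tokenizer resource names vary across NLTK versions; unknown
      # names fail quietly, so both old and new names are requested.
      for pkg in ("punkt", "punkt_tab",
                  "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
          nltk.download(pkg, quiet=True)

      KEEP = {"NN", "NNS", "NNP", "NNPS", "JJ"}  # Penn Treebank noun/adjective tags

      def index_terms(text: str):
          tokens = nltk.word_tokenize(text)
          return [tok.lower() for tok, tag in nltk.pos_tag(tokens) if tag in KEEP]

      print(index_terms("The tagger discards most function words, shrinking the index."))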
  16. Oard, D.W.: Serving users in many languages : cross-language information retrieval for digital libraries (1997) 0.00
    0.0014873719 = product of:
      0.008924231 = sum of:
        0.008924231 = weight(_text_:in in 1261) [ClassicSimilarity], result of:
          0.008924231 = score(doc=1261,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.15028831 = fieldWeight in 1261, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1261)
      0.16666667 = coord(1/6)
    
    Abstract
    We are rapidly constructing an extensive network infrastructure for moving information across national boundaries, but much remains to be done before linguistic barriers can be surmounted as effectively as geographic ones. Users seeking information from a digital library could benefit from the ability to query large collections once using a single language, even when more than one language is present in the collection. If the information they locate is not available in a language that they can read, some form of translation will be needed. At present, multilingual thesauri such as EUROVOC help to address this challenge by facilitating controlled vocabulary search using terms from several languages, and services such as INSPEC produce English abstracts for documents in other languages. On the other hand, support for free text searching across languages is not yet widely deployed, and fully automatic machine translation is presently neither sufficiently fast nor sufficiently accurate to adequately support interactive cross-language information seeking. An active and rapidly growing research community has coalesced around these and other related issues, applying techniques drawn from several fields - notably information retrieval and natural language processing - to provide access to large multilingual collections.
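    Controlled-vocabulary cross-language search of the kind EUROVOC supports can be illustrated with a toy thesaurus mapping one concept to its labels in several languages (the entries are invented):

      # concept ID -> preferred label per language
      thesaurus = {
          "C123": {"en": "environmental policy",
                   "fr": "politique de l'environnement",
                   "de": "Umweltpolitik"},
      }

      def expand(term: str):
          """Return every language variant of the concept matching `term`."""
          for labels in thesaurus.values():
              if term in labels.values():
                  return sorted(labels.values())
          return [term]  # unknown terms pass through unchanged

      print(expand("Umweltpolitik"))  # all three variants, searchable in one query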
  17. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
    0.0012620769 = product of:
      0.0075724614 = sum of:
        0.0075724614 = weight(_text_:in in 316) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=316,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 316, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=316)
      0.16666667 = coord(1/6)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
  18. Fowler, R.H.; Wilson, B.A.; Fowler, W.A.L.: Information navigator : an information system using associative networks for display and retrieval (1992) 0.00
    0.0012620769 = product of:
      0.0075724614 = sum of:
        0.0075724614 = weight(_text_:in in 919) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=919,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 919, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=919)
      0.16666667 = coord(1/6)
    
    Abstract
    Document retrieval is a highly interactive process dealing with large amounts of information. Visual representations can provide both a means for managing the complexity of large information structures and an interface style well suited to interactive manipulation. The system we have designed utilizes visually displayed graphic structures and a direct manipulation interface style to supply an integrated environment for retrieval. A common visually displayed network structure is used for query, document content, and term relations. A query can be modified through direct manipulation of its visual form by incorporating terms from any other information structure the system displays. An associative thesaurus of terms and an inter-document network provide information about a document collection that can complement other retrieval aids. Visualization of these large data structures makes use of fisheye views and overview diagrams to help overcome some of the inherent difficulties of orientation and navigation in large information structures.
    Theme
    Semantic environment in indexing and retrieval
  19. Crane, G.: The Perseus Project and beyond : how building a digital library challenges the humanities and technology (1998) 0.00
    0.0012620769 = product of:
      0.0075724614 = sum of:
        0.0075724614 = weight(_text_:in in 1251) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=1251,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 1251, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1251)
      0.16666667 = coord(1/6)
    
    Abstract
    For more than ten years, the Perseus Project has been developing a digital library in the humanities. Initial work concentrated exclusively on ancient Greek culture, using this domain as a case study for a compact, densely hypertextual library on a single, but interdisciplinary, subject. Since it has achieved its initial goals with the Greek materials, however, Perseus is using the existing library to study the new possibilities (and limitations) of the electronic medium and to serve as the foundation for work in new cultural domains: Perseus has begun coverage of Roman and now Renaissance materials, with plans for expansion into other areas of the humanities as well. Our goal is not only to help traditional scholars conduct their research more effectively but, more importantly, to help humanists use the technology to redefine the relationship between their work and the broader intellectual community.
  20. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
    0.0012620769 = product of:
      0.0075724614 = sum of:
        0.0075724614 = weight(_text_:in in 1253) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=1253,freq=16.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 1253, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.16666667 = coord(1/6)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to automatically extract the requisite collection metadata that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
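    Under stated assumptions, the Pharos-style pipeline sketched above can be approximated with scikit-learn: labeled catalog records form the training set, LSI (truncated SVD over tf-idf) defines the space, and a query is routed to the category of its nearest training document. The tiny corpus and category labels are invented:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD
      from sklearn.metrics.pairwise import cosine_similarity

      train_texts = ["prostate cancer treatment clinical oncology medicine",
                     "cancer screening diagnosis patient medicine",
                     "stocks bonds investment banking finance markets",
                     "portfolio investment interest rates finance",
                     "satellite imagery remote sensing geography maps",
                     "terrain mapping remote sensing land cover geography"]
      train_labels = ["RC Medicine", "RC Medicine", "HG Finance",
                      "HG Finance", "G Geography", "G Geography"]

      vec = TfidfVectorizer()
      lsi = TruncatedSVD(n_components=3, random_state=0)
      Z = lsi.fit_transform(vec.fit_transform(train_texts))  # LSI space

      q = lsi.transform(vec.transform(["prostate cancer"]))
      sims = cosine_similarity(q, Z)[0]
      print(train_labels[sims.argmax()])  # expected: "RC Medicine"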

Languages

  • e (English) 40
  • d (German) 6