Search (9 results, page 1 of 1)

Dunning, A.: Do we still need search engines? (1999) 0.02

0.015707046 = product of:
  0.047121134 = sum of:
    0.047121134 = product of:
      0.09424227 = sum of:
        0.09424227 = weight(_text_:22 in 6021) [ClassicSimilarity], result of:
          0.09424227 = score(doc=6021,freq=2.0), product of:
            0.17398734 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049684696 = queryNorm
            0.5416616 = fieldWeight in 6021, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6021)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Ariadne. 1999, no.22

Roszkowski, M.; Lukas, C.: ¬A distributed architecture for resource discovery using metadata (1998) 0.01
```
0.01072458 = product of:
  0.032173738 = sum of:
    0.032173738 = product of:
      0.064347476 = sum of:
        0.064347476 = weight(_text_:indexing in 1256) [ClassicSimilarity], result of:
          0.064347476 = score(doc=1256,freq=8.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.3383389 = fieldWeight in 1256, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.03125 = fieldNorm(doc=1256)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

This article describes an approach for linking geographically distributed collections of metadata so that they are searchable as a single collection. We describe the infrastructure, which uses standard Internet protocols such as the Lightweight Directory Access Protocol (LDAP) and the Common Indexing Protocol (CIP), to distribute queries, return results, and exchange index information. We discuss the advantages of using linked collections of authoritative metadata as an alternative to using a keyword indexing search-engine for resource discovery. We examine other architectures that use metadata for resource discovery, such as Dienst/NCSTRL, the AHDS HTTP/Z39.50 Gateway, and the ROADS initiative. Finally, we discuss research issues and future directions of the project. The Internet Scout Project, which is funded by the National Science Foundation and is located in the Computer Sciences Department at the University of Wisconsin-Madison, is charged with assisting the higher education community in resource discovery on the Internet. To that end, the Scout Report and subsequent subject-specific Scout Reports were developed to guide the U.S. higher education community to research-quality resources. The Scout Report Signpost utilizes the content from the Scout Reports as the basis of a metadata collection. Signpost consists of more than 2000 cataloged Internet sites using established standards such as Library of Congress subject headings and abbreviated call letters, and emerging standards such as the Dublin Core (DC). This searchable and browseable collection is free and freely accessible, as are all of the Internet Scout Project's services.
As well developed as both the Scout Reports and Signpost are, they cannot capture the wealth of high-quality content that is available on the Internet. An obvious next step toward increasing the usefulness of our own collection and its value to our customer base is to partner with other high-quality content providers who have developed similar collections and to develop a single, virtual collection. Project Isaac (working title) is the Internet Scout Project's latest resource discovery effort. Project Isaac involves the development of a research testbed that allows experimentation with protocols and algorithms for creating, maintaining, indexing and searching distributed collections of metadata. Project Isaac's infrastructure uses standard Internet protocols, such as the Lightweight Directory Access Protocol (LDAP) and the Common Indexing Protocol (CIP) to distribute queries, return results, and exchange index or centroid information. The overall goal is to support a single-search interface to geographically distributed and independently maintained metadata collections.

Landauer, T.K.; Foltz, P.W.; Laham, D.: ¬An introduction to Latent Semantic Analysis (1998) 0.01

0.0080434345 = product of:
  0.024130303 = sum of:
    0.024130303 = product of:
      0.048260607 = sum of:
        0.048260607 = weight(_text_:indexing in 1162) [ClassicSimilarity], result of:
          0.048260607 = score(doc=1162,freq=2.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.2537542 = fieldWeight in 1162, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.046875 = fieldNorm(doc=1162)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Object: Latent Semantic Indexing

Priss, U.: Faceted knowledge representation (1999) 0.01

0.007853523 = product of:
  0.023560567 = sum of:
    0.023560567 = product of:
      0.047121134 = sum of:
        0.047121134 = weight(_text_:22 in 2654) [ClassicSimilarity], result of:
          0.047121134 = score(doc=2654,freq=2.0), product of:
            0.17398734 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049684696 = queryNorm
            0.2708308 = fieldWeight in 2654, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2654)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 22. 1.2016 17:30:31

Atkins, H.: ¬The ISI® Web of Science® - links and electronic journals : how links work today in the Web of Science, and the challenges posed by electronic journals (1999) 0.01
```
0.0075834226 = product of:
  0.022750268 = sum of:
    0.022750268 = product of:
      0.045500536 = sum of:
        0.045500536 = weight(_text_:indexing in 1246) [ClassicSimilarity], result of:
          0.045500536 = score(doc=1246,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.23924173 = fieldWeight in 1246, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.03125 = fieldNorm(doc=1246)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Since their inception in the early 1960s the strength and unique aspect of the ISI citation indexes has been their ability to illustrate the conceptual relationships between scholarly documents. When authors create reference lists for their papers, they make explicit links between their own, current work and the prior work of others. The exact nature of these links may not be expressed in the references themselves, and the motivation behind them may vary (this has been the subject of much discussion over the years), but the links embodied in references do exist. Over the past 30+ years, technology has allowed ISI to make the presentation of citation searching increasingly accessible to users of our products. Citation searching and link tracking moved from being rather cumbersome in print, to being direct and efficient (albeit non-intuitive) online, to being somewhat more user-friendly in CD format. But it is the confluence of the hypertext link and development of Web browsers that has enabled us to present to users a new form of citation product -- the Web of Science -- that is intuitive and makes citation indexing conceptually accessible. A cited reference search begins with a known, important (or at least relevant) document used as the search term. The search allows one to identify subsequent articles that have cited that document. This feature adds the dimension of prospective searching to the usual retrospective searching that all bibliographic indexes provide. Citation indexing is a prime example of a concept before its time - important enough to be used in the meantime by those sufficiently motivated, but just waiting for the right technology to come along to expand its use. While it was possible to follow citation links in earlier citation index formats, this required a level of effort on the part of users that was often just too much to ask of the casual user. In the citation indexes as presented in the Web of Science, the relationship between citing and cited documents is evident to users, and a click of the mouse is all it takes to follow a citation link. Citation connections are established between the published papers being indexed from the 8,000+ journals ISI covers and the items their reference lists contain during the data capture process. It is the standardized capture of each of the references included with these documents that enables us to provide the citation searching feature in all the citation index formats, as well as both internal and external links in the Web of Science.

Priss, U.: Description logic and faceted knowledge representation (1999) 0.01

0.0067315903 = product of:
  0.02019477 = sum of:
    0.02019477 = product of:
      0.04038954 = sum of:
        0.04038954 = weight(_text_:22 in 2655) [ClassicSimilarity], result of:
          0.04038954 = score(doc=2655,freq=2.0), product of:
            0.17398734 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049684696 = queryNorm
            0.23214069 = fieldWeight in 2655, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2655)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 22. 1.2016 17:30:31

Kirriemuir, J.; Brickley, D.; Welsh, S.; Knight, J.; Hamilton, M.: Cross-searching subject gateways : the query routing and forward knowledge approach (1998) 0.01
```
0.0067028617 = product of:
  0.020108584 = sum of:
    0.020108584 = product of:
      0.04021717 = sum of:
        0.04021717 = weight(_text_:indexing in 1252) [ClassicSimilarity], result of:
          0.04021717 = score(doc=1252,freq=2.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.21146181 = fieldWeight in 1252, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1252)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

A subject gateway, in the context of network-based resource access, can be defined as some facility that allows easier access to network-based resources in a defined subject area. The simplest types of subject gateways are sets of Web pages containing lists of links to resources. Some gateways index their lists of links and provide a simple search facility. More advanced gateways offer a much enhanced service via a system consisting of a resource database and various indexes, which can be searched and/or browsed through a Web-based interface. Each entry in the database contains information about a network-based resource, such as a Web page, Web site, mailing list or document. Entries are usually created by a cataloguer manually identifying a suitable resource, describing the resource using a template, and submitting the template to the database for indexing. Subject gateways are also known as subject-based information gateways (SBIGs), subject-based gateways, subject index gateways, virtual libraries, clearing houses, subject trees, pathfinders and other variations thereof. This paper describes the characteristics of some of the subject gateways currently accessible through the Web, and compares them to automatic "vacuum cleaner" type search engines, such as AltaVista. The application of WHOIS++, centroids, query routing, and forward knowledge to searching several of these subject gateways simultaneously is outlined. The paper concludes with looking at some of the issues facing subject gateway development in the near future. The paper touches on many of the issues mentioned in a previous paper in D-Lib Magazine, especially regarding resource-discovery related initiatives and services.
Baker, T.: Languages for Dublin Core (1998) 0.01
```
0.0066354945 = product of:
  0.019906484 = sum of:
    0.019906484 = product of:
      0.039812967 = sum of:
        0.039812967 = weight(_text_:indexing in 1257) [ClassicSimilarity], result of:
          0.039812967 = score(doc=1257,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.20933652 = fieldWeight in 1257, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1257)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Over the past three years, the Dublin Core Metadata Initiative has achieved a broad international consensus on the semantics of a simple element set for describing electronic resources. Since the first workshop in March 1995, which was reported in the very first issue of D-Lib Magazine, Dublin Core has been the topic of perhaps a dozen articles here. Originally intended to be simple and intuitive enough for authors to tag Web pages without special training, Dublin Core is being adapted now for more specialized uses, from government information and legal deposit to museum informatics and electronic commerce. To meet such specialized requirements, Dublin Core can be customized with additional elements or qualifiers. However, these refinements can compromise interoperability across applications. There are tradeoffs between using specific terms that precisely meet local needs versus general terms that are understood more widely. We can better understand this inevitable tension between simplicity and complexity if we recognize that metadata is a form of human language. With Dublin Core, as with a natural language, people are inclined to stretch definitions, make general terms more specific, specific terms more general, misunderstand intended meanings, and coin new terms. One goal of this paper, therefore, will be to examine the experience of some related ways to seek semantic interoperability through simplicity: planned languages, interlingua constructs, and pidgins. The problem of semantic interoperability is compounded when we consider Dublin Core in translation. All of the workshops, documents, mailing lists, user guides, and working group outputs of the Dublin Core Initiative have been in English. But in many countries and for many applications, people need a metadata standard in their own language. In principle, the broad elements of Dublin Core can be defined equally well in Bulgarian or Hindi. Since Dublin Core is a controlled standard, however, any parallel definitions need to be kept in sync as the standard evolves. Another goal of the paper, then, will be to define the conceptual and organizational problem of maintaining a metadata standard in multiple languages. In addition to a name and definition, which are meant for human consumption, each Dublin Core element has a label, or indexing token, meant for harvesting by search engines. For practical reasons, these machine-readable tokens are English-looking strings such as Creator and Subject (just as HTML tags are called HEAD, BODY, or TITLE). These tokens, which are shared by Dublin Cores in every language, ensure that metadata fields created in any particular language are indexed together across repositories. As symbols of underlying universal semantics, these tokens form the basis of semantic interoperability among the multiple Dublin Cores. As long as we limit ourselves to sharing these indexing tokens among exact translations of a simple set of fifteen broad elements, the definitions of which fit easily onto two pages, the problem of Dublin Core in multiple languages is straightforward. But nothing having to do with human language is ever so simple. Just as speakers of various languages must learn the language of Dublin Core in their own tongues, we must find the right words to talk about a metadata language that is expressable in many discipline-specific jargons and natural languages and that inevitably will evolve and change over time.
Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
```
0.0056875674 = product of:
  0.017062701 = sum of:
    0.017062701 = product of:
      0.034125403 = sum of:
        0.034125403 = weight(_text_:indexing in 1253) [ClassicSimilarity], result of:
          0.034125403 = score(doc=1253,freq=4.0), product of:
            0.19018644 = queryWeight, product of:
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.049684696 = queryNorm
            0.1794313 = fieldWeight in 1253, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.8278677 = idf(docFreq=2614, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1.000.000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.

Search (9 results, page 1 of 1)

Authors

Themes