Search (53 results, page 1 of 3)

  • type_ss:"a"
  • type_ss:"el"
  • year_i:[1990 TO 2000}
  1. Dunning, A.: Do we still need search engines? (1999) 0.03
    0.033366717 = product of:
      0.066733435 = sum of:
        0.066733435 = product of:
          0.100100145 = sum of:
            0.013307645 = weight(_text_:a in 6021) [ClassicSimilarity], result of:
              0.013307645 = score(doc=6021,freq=4.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.25222903 = fieldWeight in 6021, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6021)
            0.0867925 = weight(_text_:22 in 6021) [ClassicSimilarity], result of:
              0.0867925 = score(doc=6021,freq=2.0), product of:
                0.1602338 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045757167 = queryNorm
                0.5416616 = fieldWeight in 6021, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6021)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Source
    Ariadne. 1999, no.22
    Type
    a
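
The nested breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output: each term weight is queryWeight x fieldWeight, with tf = sqrt(termFreq), queryWeight = idf x queryNorm, and fieldWeight = tf x idf x fieldNorm, and the result is scaled by the coord factors. As a minimal sketch (not Lucene's own code), the listed score for result 1 can be recomputed from those factors:

```python
import math

def term_score(freq, idf, query_norm, field_norm):
    """One term's contribution: queryWeight * fieldWeight, with
    tf = sqrt(termFreq), queryWeight = idf * queryNorm,
    fieldWeight = tf * idf * fieldNorm."""
    tf = math.sqrt(freq)
    return (idf * query_norm) * (tf * idf * field_norm)

# Factors copied from the explain tree for doc 6021 above.
w_a  = term_score(4.0, 1.153047,  0.045757167, 0.109375)   # weight(_text_:a)
w_22 = term_score(2.0, 3.5018296, 0.045757167, 0.109375)   # weight(_text_:22)

score = (w_a + w_22) * (2.0 / 3.0) * 0.5    # coord(2/3), then coord(1/2)
print(round(score, 9))                      # ~0.033366717, as listed above
```
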
  2. Priss, U.: Faceted knowledge representation (1999) 0.02
    0.01760206 = product of:
      0.03520412 = sum of:
        0.03520412 = product of:
          0.052806176 = sum of:
            0.009409925 = weight(_text_:a in 2654) [ClassicSimilarity], result of:
              0.009409925 = score(doc=2654,freq=8.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.17835285 = fieldWeight in 2654, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2654)
            0.04339625 = weight(_text_:22 in 2654) [ClassicSimilarity], result of:
              0.04339625 = score(doc=2654,freq=2.0), product of:
                0.1602338 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045757167 = queryNorm
                0.2708308 = fieldWeight in 2654, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2654)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Faceted Knowledge Representation provides a formalism for implementing knowledge systems. The basic notions of faceted knowledge representation are "unit", "relation", "facet" and "interpretation". Units are atomic elements and can be abstract elements or refer to external objects in an application. Relations are sequences or matrices of 0's and 1's (binary matrices). Facets are relational structures that combine units and relations. Each facet represents an aspect or viewpoint of a knowledge system. Interpretations are mappings that can be used to translate between different representations. This paper introduces the basic notions of faceted knowledge representation. The formalism is applied here to an abstract modeling of a faceted thesaurus as used in information retrieval.
    Date
    22. 1.2016 17:30:31
    Type
    a
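
To illustrate the basic notions named in the abstract above (units, relations as binary matrices, and facets that combine them), here is a small, hypothetical sketch; the class and the example data are chosen for illustration and are not part of Priss's formalism:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Facet:
    """A simplified facet: a set of units plus named binary relations
    over them, each stored as a 0/1 matrix (relation[i][j] in {0, 1})."""
    name: str
    units: List[str]
    relations: Dict[str, List[List[int]]]

    def related(self, relation: str, a: str, b: str) -> bool:
        i, j = self.units.index(a), self.units.index(b)
        return self.relations[relation][i][j] == 1

# Toy facet of a faceted thesaurus: broader-term links among three units.
thesaurus_facet = Facet(
    name="hierarchy",
    units=["poodle", "dog", "animal"],
    relations={"broader": [[0, 1, 0],   # poodle -> dog
                           [0, 0, 1],   # dog -> animal
                           [0, 0, 0]]},
)
print(thesaurus_facet.related("broader", "dog", "animal"))   # True
```
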
  3. Priss, U.: Description logic and faceted knowledge representation (1999) 0.02
    0.0166499 = product of:
      0.0332998 = sum of:
        0.0332998 = product of:
          0.0499497 = sum of:
            0.012752913 = weight(_text_:a in 2655) [ClassicSimilarity], result of:
              0.012752913 = score(doc=2655,freq=20.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.24171482 = fieldWeight in 2655, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2655)
            0.037196785 = weight(_text_:22 in 2655) [ClassicSimilarity], result of:
              0.037196785 = score(doc=2655,freq=2.0), product of:
                0.1602338 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045757167 = queryNorm
                0.23214069 = fieldWeight in 2655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2655)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    The term "facet" was introduced into the field of library classification systems by Ranganathan in the 1930's [Ranganathan, 1962]. A facet is a viewpoint or aspect. In contrast to traditional classification systems, faceted systems are modular in that a domain is analyzed in terms of baseline facets which are then synthesized. In this paper, the term "facet" is used in a broader meaning. Facets can describe different aspects on the same level of abstraction or the same aspect on different levels of abstraction. The notion of facets is related to database views, multicontexts and conceptual scaling in formal concept analysis [Ganter and Wille, 1999], polymorphism in object-oriented design, aspect-oriented programming, views and contexts in description logic and semantic networks. This paper presents a definition of facets in terms of faceted knowledge representation that incorporates the traditional narrower notion of facets and potentially facilitates translation between different knowledge representation formalisms. A goal of this approach is a modular, machine-aided knowledge base design mechanism. A possible application is faceted thesaurus construction for information retrieval and data mining. Reasoning complexity depends on the size of the modules (facets). A more general analysis of complexity will be left for future research.
    Date
    22. 1.2016 17:30:31
    Type
    a
  4. Kirriemuir, J.; Brickley, D.; Welsh, S.; Knight, J.; Hamilton, M.: Cross-searching subject gateways : the query routing and forward knowledge approach (1998) 0.01
    0.009409107 = product of:
      0.018818215 = sum of:
        0.018818215 = product of:
          0.028227322 = sum of:
            0.015652781 = weight(_text_:m in 1252) [ClassicSimilarity], result of:
              0.015652781 = score(doc=1252,freq=2.0), product of:
                0.11386436 = queryWeight, product of:
                  2.4884486 = idf(docFreq=9980, maxDocs=44218)
                  0.045757167 = queryNorm
                0.13746867 = fieldWeight in 1252, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4884486 = idf(docFreq=9980, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1252)
            0.012574541 = weight(_text_:a in 1252) [ClassicSimilarity], result of:
              0.012574541 = score(doc=1252,freq=28.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.23833402 = fieldWeight in 1252, product of:
                  5.2915025 = tf(freq=28.0), with freq of:
                    28.0 = termFreq=28.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1252)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    A subject gateway, in the context of network-based resource access, can be defined as some facility that allows easier access to network-based resources in a defined subject area. The simplest types of subject gateways are sets of Web pages containing lists of links to resources. Some gateways index their lists of links and provide a simple search facility. More advanced gateways offer a much enhanced service via a system consisting of a resource database and various indexes, which can be searched and/or browsed through a Web-based interface. Each entry in the database contains information about a network-based resource, such as a Web page, Web site, mailing list or document. Entries are usually created by a cataloguer manually identifying a suitable resource, describing the resource using a template, and submitting the template to the database for indexing. Subject gateways are also known as subject-based information gateways (SBIGs), subject-based gateways, subject index gateways, virtual libraries, clearing houses, subject trees, pathfinders and other variations thereof. This paper describes the characteristics of some of the subject gateways currently accessible through the Web, and compares them to automatic "vacuum cleaner" type search engines, such as AltaVista. The application of WHOIS++, centroids, query routing, and forward knowledge to searching several of these subject gateways simultaneously is outlined. The paper concludes by looking at some of the issues facing subject gateway development in the near future. The paper touches on many of the issues mentioned in a previous paper in D-Lib Magazine, especially regarding resource-discovery related initiatives and services.
    Type
    a
  5. Buckland, M.; Chen, A.; Chen, H.M.; Kim, Y.; Lam, B.; Larson, R.; Norgard, B.; Purat, J.; Gey, F.: Mapping entry vocabulary to unfamiliar metadata vocabularies (1999) 0.01
    0.008589465 = product of:
      0.01717893 = sum of:
        0.01717893 = product of:
          0.025768396 = sum of:
            0.018783338 = weight(_text_:m in 1238) [ClassicSimilarity], result of:
              0.018783338 = score(doc=1238,freq=2.0), product of:
                0.11386436 = queryWeight, product of:
                  2.4884486 = idf(docFreq=9980, maxDocs=44218)
                  0.045757167 = queryNorm
                0.1649624 = fieldWeight in 1238, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4884486 = idf(docFreq=9980, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1238)
            0.006985058 = weight(_text_:a in 1238) [ClassicSimilarity], result of:
              0.006985058 = score(doc=1238,freq=6.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.13239266 = fieldWeight in 1238, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1238)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    The emerging network environment brings access to an increasing population of heterogeneous repositories. Inevitably, these have quite diverse metadata vocabularies (categorization codes, classification numbers, index and thesaurus terms). So, necessarily, the number of metadata vocabularies that are accessible but unfamiliar for any individual searcher is increasing steeply. When an unfamiliar metadata vocabulary is encountered, how is a searcher to know which codes or terms will lead to what is wanted? This paper reports work at the University of California, Berkeley, on the design and development of English language indexes to metadata vocabularies. Further details and the current status of the work can be found at the project website http://www.sims.berkeley.edu/research/metadata/
    Type
    a
  6. Roszkowski, M.; Lukas, C.: A distributed architecture for resource discovery using metadata (1998) 0.01
    0.006708864 = product of:
      0.013417728 = sum of:
        0.013417728 = product of:
          0.020126592 = sum of:
            0.012522225 = weight(_text_:m in 1256) [ClassicSimilarity], result of:
              0.012522225 = score(doc=1256,freq=2.0), product of:
                0.11386436 = queryWeight, product of:
                  2.4884486 = idf(docFreq=9980, maxDocs=44218)
                  0.045757167 = queryNorm
                0.10997493 = fieldWeight in 1256, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4884486 = idf(docFreq=9980, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1256)
            0.007604368 = weight(_text_:a in 1256) [ClassicSimilarity], result of:
              0.007604368 = score(doc=1256,freq=16.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.14413087 = fieldWeight in 1256, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1256)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    This article describes an approach for linking geographically distributed collections of metadata so that they are searchable as a single collection. We describe the infrastructure, which uses standard Internet protocols such as the Lightweight Directory Access Protocol (LDAP) and the Common Indexing Protocol (CIP), to distribute queries, return results, and exchange index information. We discuss the advantages of using linked collections of authoritative metadata as an alternative to using a keyword indexing search-engine for resource discovery. We examine other architectures that use metadata for resource discovery, such as Dienst/NCSTRL, the AHDS HTTP/Z39.50 Gateway, and the ROADS initiative. Finally, we discuss research issues and future directions of the project. The Internet Scout Project, which is funded by the National Science Foundation and is located in the Computer Sciences Department at the University of Wisconsin-Madison, is charged with assisting the higher education community in resource discovery on the Internet. To that end, the Scout Report and subsequent subject-specific Scout Reports were developed to guide the U.S. higher education community to research-quality resources. The Scout Report Signpost utilizes the content from the Scout Reports as the basis of a metadata collection. Signpost consists of more than 2000 cataloged Internet sites using established standards such as Library of Congress subject headings and abbreviated call letters, and emerging standards such as the Dublin Core (DC). This searchable and browseable collection is free and freely accessible, as are all of the Internet Scout Project's services.
    As well developed as both the Scout Reports and Signpost are, they cannot capture the wealth of high-quality content that is available on the Internet. An obvious next step toward increasing the usefulness of our own collection and its value to our customer base is to partner with other high-quality content providers who have developed similar collections and to develop a single, virtual collection. Project Isaac (working title) is the Internet Scout Project's latest resource discovery effort. Project Isaac involves the development of a research testbed that allows experimentation with protocols and algorithms for creating, maintaining, indexing and searching distributed collections of metadata. Project Isaac's infrastructure uses standard Internet protocols, such as the Lightweight Directory Access Protocol (LDAP) and the Common Indexing Protocol (CIP) to distribute queries, return results, and exchange index or centroid information. The overall goal is to support a single-search interface to geographically distributed and independently maintained metadata collections.
    Type
    a
  7. Page, A.: The search is over : the search-engines secrets of the pros (1996) 0.00
    0.0025049087 = product of:
      0.0050098174 = sum of:
        0.0050098174 = product of:
          0.015029452 = sum of:
            0.015029452 = weight(_text_:a in 5670) [ClassicSimilarity], result of:
              0.015029452 = score(doc=5670,freq=10.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.28486365 = fieldWeight in 5670, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5670)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Covers 8 of the most popular search engines. Gives a summary of each and has a nice table of features that also briefly lists the pros and cons. Includes a short explanation of Boolean operators too.
    Type
    a
  8. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
    0.0020164126 = product of:
      0.004032825 = sum of:
        0.004032825 = product of:
          0.012098475 = sum of:
            0.012098475 = weight(_text_:a in 316) [ClassicSimilarity], result of:
              0.012098475 = score(doc=316,freq=18.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.22931081 = fieldWeight in 316, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=316)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
    Type
    a
  9. Hill, L.L.; Frew, J.; Zheng, Q.: Geographic names : the implementation of a gazetteer in a georeferenced digital library (1999) 0.00
    0.002003927 = product of:
      0.004007854 = sum of:
        0.004007854 = product of:
          0.012023562 = sum of:
            0.012023562 = weight(_text_:a in 1240) [ClassicSimilarity], result of:
              0.012023562 = score(doc=1240,freq=40.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.22789092 = fieldWeight in 1240, product of:
                  6.3245554 = tf(freq=40.0), with freq of:
                    40.0 = termFreq=40.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1240)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    The Alexandria Digital Library (ADL) Project has developed a content standard for gazetteer objects and a hierarchical type scheme for geographic features. Both of these developments are based on ADL experience with an earlier gazetteer component for the Library, based on two gazetteers maintained by the U.S. federal government. We define the minimum components of a gazetteer entry as (1) a geographic name, (2) a geographic location represented by coordinates, and (3) a type designation. With these attributes, a gazetteer can function as a tool for indirect spatial location identification through names and types. The ADL Gazetteer Content Standard supports contribution and sharing of gazetteer entries with rich descriptions beyond the minimum requirements. This paper describes the content standard, the feature type thesaurus, and the implementation and research issues. A gazetteer is a list of geographic names, together with their geographic locations and other descriptive information. A geographic name is a proper name for a geographic place and feature, such as Santa Barbara County, Mount Washington, St. Francis Hospital, and Southern California. There are many types of printed gazetteers. For example, the New York Times Atlas has a gazetteer section that can be used to look up a geographic name and find the page(s) and grid reference(s) where the corresponding feature is shown. Some gazetteers provide information about places and features; for example, a history of the locale, population data, physical data such as elevation, or the pronunciation of the name. Some lists of geographic names are available as hierarchical term sets (thesauri) designed for information retrieval; these are used to describe bibliographic or museum materials. Examples include the authority files of the U.S. Library of Congress and the GeoRef Thesaurus produced by the American Geological Institute. The Getty Museum has recently made their Thesaurus of Geographic Names available online. This is a major project to develop a controlled vocabulary of current and historical names to describe (i.e., catalog) art and architecture literature. U.S. federal government mapping agencies maintain gazetteers containing the official names of places and/or the names that appear on map series. Examples include the U.S. Geological Survey's Geographic Names Information System (GNIS) and the National Imagery and Mapping Agency's Geographic Names Processing System (GNPS). Both of these are maintained in cooperation with the U.S. Board of Geographic Names (BGN). Many other examples could be cited -- for local areas, for other countries, and for special purposes. There is remarkable diversity in approaches to the description of geographic places and no standardization beyond authoritative sources for the geographic names themselves.
    Type
    a
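
The abstract above fixes the minimum components of a gazetteer entry as a geographic name, a coordinate location, and a type designation. A minimal sketch of such entries and of name/type lookup might look as follows; the field names and sample coordinates are illustrative only:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class GazetteerEntry:
    """Minimum components per the content standard described above:
    a geographic name, a coordinate location, and a type designation."""
    name: str
    location: Tuple[float, float]   # (latitude, longitude)
    feature_type: str

def lookup(entries: List[GazetteerEntry], name: str,
           feature_type: Optional[str] = None) -> List[GazetteerEntry]:
    """Indirect spatial location identification through names and types."""
    return [e for e in entries
            if name.lower() in e.name.lower()
            and (feature_type is None or e.feature_type == feature_type)]

gazetteer = [
    GazetteerEntry("Santa Barbara County", (34.7, -120.0), "administrative area"),
    GazetteerEntry("Mount Washington", (44.27, -71.30), "mountain"),
]
print(lookup(gazetteer, "washington", feature_type="mountain"))
```
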
  10. Powell, J.; Fox, E.A.: Multilingual federated searching across heterogeneous collections (1998) 0.00
    0.002003927 = product of:
      0.004007854 = sum of:
        0.004007854 = product of:
          0.012023562 = sum of:
            0.012023562 = weight(_text_:a in 1250) [ClassicSimilarity], result of:
              0.012023562 = score(doc=1250,freq=10.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.22789092 = fieldWeight in 1250, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1250)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    This article describes a scalable system for searching heterogeneous multilingual collections on the World Wide Web. It details a markup language for describing the characteristics of a search engine and its interface, and a protocol for requesting word translations between languages.
    Type
    a
  11. Schmid, H.: Improvements in Part-of-Speech tagging with an application to German (1995) 0.00
    0.002003927 = product of:
      0.004007854 = sum of:
        0.004007854 = product of:
          0.012023562 = sum of:
            0.012023562 = weight(_text_:a in 124) [ClassicSimilarity], result of:
              0.012023562 = score(doc=124,freq=10.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.22789092 = fieldWeight in 124, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=124)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents a couple of extensions to a basic Markov Model tagger (called TreeTagger) which improve its accuracy when trained on small corpora. The basic tagger was originally developed for English [Schmid, 1994]. The extensions together reduced error rates on a German test corpus by more than a third.
    Type
    a
  12. Plotkin, R.C.; Schwartz, M.S.: Data modeling for news clip archive : a prototype solution (1997) 0.00
    0.001901092 = product of:
      0.003802184 = sum of:
        0.003802184 = product of:
          0.011406552 = sum of:
            0.011406552 = weight(_text_:a in 1259) [ClassicSimilarity], result of:
              0.011406552 = score(doc=1259,freq=16.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.2161963 = fieldWeight in 1259, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1259)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Film, videotape and multimedia archive systems must address the issues of editing, authoring and searching at the media (i.e. tape) or sub media (i.e. scene) level in addition to the traditional inventory management capabilities associated with the physical media. This paper describes a prototype of a database design for the storage, search and retrieval of multimedia and its related information. It also provides a process by which legacy data can be imported to this schema. The prototype is named the Continuous Media Index, or Comix. An implementation of such a digital library solution incorporates multimedia objects, hierarchical relationships and timecode in addition to traditional attribute data. Present video and multimedia archive systems are easily migrated to this architecture. Comix was implemented for a videotape archiving system. It was written for, and implemented using, IBM Digital Library version 1.0. A derivative of Comix is currently in development for customer specific applications. Principles of the Comix design as well as the importation methods are not specific to the underlying systems used.
    Type
    a
  13. Shneiderman, B.; Byrd, D.; Croft, W.B.: Clarifying search : a user-interface framework for text searches (1997) 0.00
    0.0017923666 = product of:
      0.0035847332 = sum of:
        0.0035847332 = product of:
          0.0107542 = sum of:
            0.0107542 = weight(_text_:a in 1258) [ClassicSimilarity], result of:
              0.0107542 = score(doc=1258,freq=8.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.20383182 = fieldWeight in 1258, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1258)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Current user interfaces for textual database searching leave much to be desired: individually, they are often confusing, and as a group, they are seriously inconsistent. We propose a four- phase framework for user-interface design: the framework provides common structure and terminology for searching while preserving the distinct features of individual collections and search mechanisms. Users will benefit from faster learning, increased comprehension, and better control, leading to more effective searches and higher satisfaction.
    Type
    a
  14. Fowler, R.H.; Wilson, B.A.; Fowler, W.A.L.: Information navigator : an information system using associative networks for display and retrieval (1992) 0.00
    0.0017783088 = product of:
      0.0035566175 = sum of:
        0.0035566175 = product of:
          0.010669853 = sum of:
            0.010669853 = weight(_text_:a in 919) [ClassicSimilarity], result of:
              0.010669853 = score(doc=919,freq=14.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.20223314 = fieldWeight in 919, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=919)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Document retrieval is a highly interactive process dealing with large amounts of information. Visual representations can provide both a means for managing the complexity of large information structures and an interface style well suited to interactive manipulation. The system we have designed utilizes visually displayed graphic structures and a direct manipulation interface style to supply an integrated environment for retrieval. A common visually displayed network structure is used for query, document content, and term relations. A query can be modified through direct manipulation of its visual form by incorporating terms from any other information structure the system displays. An associative thesaurus of terms and an inter-document network provide information about a document collection that can complement other retrieval aids. Visualization of these large data structures makes use of fisheye views and overview diagrams to help overcome some of the inherent difficulties of orientation and navigation in large information structures.
    Type
    a
  15. Brin, S.; Page, L.: The anatomy of a large-scale hypertextual Web search engine (1998) 0.00
    0.0017712379 = product of:
      0.0035424759 = sum of:
        0.0035424759 = product of:
          0.010627427 = sum of:
            0.010627427 = weight(_text_:a in 947) [ClassicSimilarity], result of:
              0.010627427 = score(doc=947,freq=20.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.20142901 = fieldWeight in 947, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=947)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want
    Type
    a
  16. Van de Sompel, H.; Hochstenbach, P.: Reference linking in a hybrid library environment : part 2: SFX, a generic linking solution (1999) 0.00
    0.0017712379 = product of:
      0.0035424759 = sum of:
        0.0035424759 = product of:
          0.010627427 = sum of:
            0.010627427 = weight(_text_:a in 1241) [ClassicSimilarity], result of:
              0.010627427 = score(doc=1241,freq=20.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.20142901 = fieldWeight in 1241, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1241)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    This is the second part of two articles about reference linking in hybrid digital libraries. The first part, Frameworks for Linking described the current state-of-the-art and contrasted various approaches to the problem. It identified static and dynamic linking solutions, as well as open and closed linking frameworks. It also included an extensive bibliography. The second part describes our work at the University of Ghent to address these issues. SFX is a generic linking system that we have developed for our own needs, but its underlying concepts can be applied in a wide range of digital libraries. This is a description of the approach to the creation of extended services in a hybrid library environment that has been taken by the Library Automation team at the University of Ghent. The ongoing research has been grouped under the working title Special Effects (SFX). In order to explain the SFX-concepts in a comprehensive way, the discussion will start with a brief description of pre-SFX experiments. Thereafter, the basics of the SFX-approach are explained briefly, in combination with concrete implementation choices taken for the Elektron SFX-linking experiment. Elektron was the name of a modest digital library collaboration between the Universities of Ghent, Louvain and Antwerp.
    Type
    a
  17. Crane, G.: The Perseus Project and beyond : how building a digital library challenges the humanities and technology (1998) 0.00
    0.0016463939 = product of:
      0.0032927878 = sum of:
        0.0032927878 = product of:
          0.009878363 = sum of:
            0.009878363 = weight(_text_:a in 1251) [ClassicSimilarity], result of:
              0.009878363 = score(doc=1251,freq=12.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.18723148 = fieldWeight in 1251, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1251)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    For more than ten years, the Perseus Project has been developing a digital library in the humanities. Initial work concentrated exclusively on ancient Greek culture, using this domain as a case study for a compact, densely hypertextual library on a single, but interdisciplinary, subject. Since it has achieved its initial goals with the Greek materials, however, Perseus is using the existing library to study the new possibilities (and limitations) of the electronic medium and to serve as the foundation for work in new cultural domains: Perseus has begun coverage of Roman and now Renaissance materials, with plans for expansion into other areas of the humanities as well. Our goal is not only to help traditional scholars conduct their research more effectively but, more importantly, to help humanists use the technology to redefine the relationship between their work and the broader intellectual community.
    Type
    a
  18. Chowdhury, A.; Mccabe, M.C.: Improving information retrieval systems using part of speech tagging (1993) 0.00
    0.0016463939 = product of:
      0.0032927878 = sum of:
        0.0032927878 = product of:
          0.009878363 = sum of:
            0.009878363 = weight(_text_:a in 1061) [ClassicSimilarity], result of:
              0.009878363 = score(doc=1061,freq=12.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.18723148 = fieldWeight in 1061, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1061)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    The object of Information Retrieval is to retrieve all relevant documents for a user query and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In the paper we evaluate the use of Part of Speech Tagging to improve the index storage overhead and general speed of the system with only a minimal reduction in precision recall measurements. We tagged 500Mbs of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90% of precision recall is achieved with 40% of the document collection's terms. We also show that this is an improvement in overhead with only a 1% reduction in precision recall.
    Type
    a
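
The abstract above reports indexing only the most relevant part of speech in order to shrink the index. A rough sketch of that idea, using NLTK's stock tagger rather than the authors' setup (the noun-only filter and the example sentence are assumptions for illustration):

```python
import nltk
from nltk import pos_tag, word_tokenize

# One-time model downloads for the tokenizer and tagger.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def index_terms(text, keep_tags=("NN", "NNS", "NNP", "NNPS")):
    """Keep only tokens whose POS tag is in keep_tags (nouns here),
    so the index stores a fraction of the collection's terms."""
    return [tok.lower() for tok, tag in pos_tag(word_tokenize(text))
            if tag in keep_tags]

doc = "Part of speech tagging can reduce index storage overhead for retrieval systems."
print(index_terms(doc))   # e.g. ['part', 'speech', 'index', 'storage', 'overhead', 'retrieval', 'systems']
```
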
  19. Landauer, T.K.; Foltz, P.W.; Laham, D.: ¬An introduction to Latent Semantic Analysis (1998) 0.00
    0.0016463939 = product of:
      0.0032927878 = sum of:
        0.0032927878 = product of:
          0.009878363 = sum of:
            0.009878363 = weight(_text_:a in 1162) [ClassicSimilarity], result of:
              0.009878363 = score(doc=1162,freq=12.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.18723148 = fieldWeight in 1162, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1162)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. The adequacy of LSA's reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data; and as reported in 3 following articles in this issue, it accurately estimates passage coherence, learnability of passages by individual students, and the quality and quantity of knowledge contained in an essay.
    Type
    a
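
As a minimal sketch of the statistical computation behind LSA (a truncated SVD of a term-by-document matrix, with similarity measured in the reduced space), using a toy matrix chosen purely for illustration:

```python
import numpy as np

# Toy term-by-document count matrix (rows = terms, columns = documents).
terms = ["semantic", "analysis", "corpus", "meaning"]
X = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 2, 1],
              [1, 0, 2]], dtype=float)

# LSA keeps only the k largest singular values/vectors of X.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
term_vecs = U[:, :k] * s[:k]        # terms in the latent semantic space
doc_vecs = Vt[:k, :].T * s[:k]      # documents in the same space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Word-word and passage-word comparisons, as described in the abstract above.
print(cosine(term_vecs[terms.index("semantic")], term_vecs[terms.index("meaning")]))
print(cosine(doc_vecs[0], term_vecs[terms.index("corpus")]))
```
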
  20. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
    0.0015763023 = product of:
      0.0031526047 = sum of:
        0.0031526047 = product of:
          0.009457814 = sum of:
            0.009457814 = weight(_text_:a in 1253) [ClassicSimilarity], result of:
              0.009457814 = score(doc=1253,freq=44.0), product of:
                0.05276016 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045757167 = queryNorm
                0.1792605 = fieldWeight in 1253, product of:
                  6.6332498 = tf(freq=44.0), with freq of:
                    44.0 = termFreq=44.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=1253)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1.000.000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
    Type
    a

Languages

  • e 46
  • d 7