Search (163 results, page 2 of 9)

Chen, H.: Semantic research for digital libraries (1999) 0.01
```
0.007995334 = product of:
  0.019988336 = sum of:
    0.005779455 = weight(_text_:a in 1247) [ClassicSimilarity], result of:
      0.005779455 = score(doc=1247,freq=4.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.10809815 = fieldWeight in 1247, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=1247)
    0.014208881 = product of:
      0.028417762 = sum of:
        0.028417762 = weight(_text_:information in 1247) [ClassicSimilarity], result of:
          0.028417762 = score(doc=1247,freq=18.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.34911853 = fieldWeight in 1247, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1247)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
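The explain tree above can be retraced by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas that the printed values are consistent with (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical and are not part of any search-engine API. The constants are copied from the listing for doc 1247.

```python
import math

# Values copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.046368346


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed classic formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)                        # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm        # idf * queryNorm
    field_weight = tf * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


def score_doc_1247():
    # Term "a": freq=4, docFreq=37942, fieldNorm=0.046875.
    a_part = term_score(4.0, 37942, 0.046875)
    # Term "information": freq=18, docFreq=20772, scaled by the inner coord(1/2).
    info_part = term_score(18.0, 20772, 0.046875) * 0.5
    # The outer coord(2/5) scales the sum of both clauses.
    return (a_part + info_part) * 0.4


if __name__ == "__main__":
    print(score_doc_1247())
```

The result should agree with the listed total of 0.007995334 up to floating-point rounding, since the engine's own figures are printed in single precision while this sketch uses double precision.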
Abstract

In this era of the Internet and distributed, multimedia computing, new and emerging classes of information systems applications have swept into the lives of office workers and people in general. From digital libraries, multimedia systems, geographic information systems, and collaborative computing to electronic commerce, virtual reality, and electronic video arts and games, these applications have created tremendous opportunities for information and computer science researchers and practitioners. As applications become more pervasive, pressing, and diverse, several well-known information retrieval (IR) problems have become even more urgent. Information overload, a result of the ease of information creation and transmission via the Internet and WWW, has become more troublesome (e.g., even stockbrokers and elementary school students, heavily exposed to various WWW search engines, are versed in such IR terminology as recall and precision). Significant variations in database formats and structures, the richness of information media (text, audio, and video), and an abundance of multilingual information content also have created severe information interoperability problems -- structural interoperability, media interoperability, and multilingual interoperability.

Type

a
Kirriemuir, J.; Brickley, D.; Welsh, S.; Knight, J.; Hamilton, M.: Cross-searching subject gateways : the query routing and forward knowledge approach (1998) 0.01
```
0.007831501 = product of:
  0.019578751 = sum of:
    0.0127425 = weight(_text_:a in 1252) [ClassicSimilarity], result of:
      0.0127425 = score(doc=1252,freq=28.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.23833402 = fieldWeight in 1252, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1252)
    0.006836252 = product of:
      0.013672504 = sum of:
        0.013672504 = weight(_text_:information in 1252) [ClassicSimilarity], result of:
          0.013672504 = score(doc=1252,freq=6.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.16796975 = fieldWeight in 1252, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1252)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

A subject gateway, in the context of network-based resource access, can be defined as some facility that allows easier access to network-based resources in a defined subject area. The simplest types of subject gateways are sets of Web pages containing lists of links to resources. Some gateways index their lists of links and provide a simple search facility. More advanced gateways offer a much enhanced service via a system consisting of a resource database and various indexes, which can be searched and/or browsed through a Web-based interface. Each entry in the database contains information about a network-based resource, such as a Web page, Web site, mailing list or document. Entries are usually created by a cataloguer manually identifying a suitable resource, describing the resource using a template, and submitting the template to the database for indexing. Subject gateways are also known as subject-based information gateways (SBIGs), subject-based gateways, subject index gateways, virtual libraries, clearing houses, subject trees, pathfinders and other variations thereof. This paper describes the characteristics of some of the subject gateways currently accessible through the Web, and compares them to automatic "vacuum cleaner" type search engines, such as AltaVista. The application of WHOIS++, centroids, query routing, and forward knowledge to searching several of these subject gateways simultaneously is outlined. The paper concludes with looking at some of the issues facing subject gateway development in the near future. The paper touches on many of the issues mentioned in a previous paper in D-Lib Magazine, especially regarding resource-discovery related initiatives and services.

Theme

Information Gateway

Type

a

Strobel, S.: ¬The complete Linux kit : fully configured LINUX system kernel (1997) 0.01

0.0075387247 = product of:
  0.037693623 = sum of:
    0.037693623 = product of:
      0.07538725 = sum of:
        0.07538725 = weight(_text_:22 in 8959) [ClassicSimilarity], result of:
          0.07538725 = score(doc=8959,freq=2.0), product of:
            0.16237405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046368346 = queryNorm
            0.46428138 = fieldWeight in 8959, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=8959)
      0.5 = coord(1/2)
  0.2 = coord(1/5)

Date: 16. 7.2002 20:22:55

Spink, A.; Wilson, T.; Ellis, D.; Ford, N.: Modeling users' successive searches in digital environments : a National Science Foundation/British Library funded study (1998) 0.01
```
0.0074227354 = product of:
  0.018556839 = sum of:
    0.008595286 = weight(_text_:a in 1255) [ClassicSimilarity], result of:
      0.008595286 = score(doc=1255,freq=26.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.16076508 = fieldWeight in 1255, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.02734375 = fieldNorm(doc=1255)
    0.009961553 = product of:
      0.019923106 = sum of:
        0.019923106 = weight(_text_:information in 1255) [ClassicSimilarity], result of:
          0.019923106 = score(doc=1255,freq=26.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.2447598 = fieldWeight in 1255, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1255)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

As digital libraries become a major source of information for many people, we need to know more about how people seek and retrieve information in digital environments. Quite commonly, users with a problem-at-hand and associated question-in-mind repeatedly search a literature for answers, and seek information in stages over extended periods from a variety of digital information resources. The process of repeatedly searching over time in relation to a specific, but possibly evolving, information problem (including changes or shifts in a variety of variables), is called the successive search phenomenon. The study outlined in this paper is currently investigating this new and little explored line of inquiry for information retrieval, Web searching, and digital libraries. The purpose of the research project is to investigate the nature, manifestations, and behavior of successive searching by users in digital environments, and to derive criteria for use in the design of information retrieval interfaces and systems supporting successive searching behavior. This study includes two related projects. The first project is based in the School of Library and Information Sciences at the University of North Texas and is funded by a National Science Foundation POWRE Grant <http://www.nsf.gov/cgi-bin/show?award=9753277>. The second project is based at the Department of Information Studies at the University of Sheffield (UK) and is funded by a grant from the British Library <http://www.shef.ac.uk/~is/research/imrg/uncerty.html> Research and Innovation Center. The broad objectives of each project are to examine the nature and extent of successive search episodes in digital environments by real users over time.
The specific aim of the current project is twofold:
* To characterize progressive changes and shifts that occur in: user situational context; user information problem; uncertainty reduction; user cognitive styles; cognitive and affective states of the user, and consequently in their queries; and
* To characterize related changes over time in the type and use of information resources and search strategies particularly related to given capabilities of IR systems, and IR search engines, and examine changes in users' relevance judgments and criteria, and characterize their differences.

The study is an observational, longitudinal data collection in the U.S. and U.K. Three questionnaires are used to collect data: reference, client post search and searcher post search questionnaires. Each successive search episode with a search intermediary for textual materials on the DIALOG Information Service is audiotaped and search transaction logs are recorded. Quantitative analysis includes statistical analysis using Likert scale data from the questionnaires and log-linear analysis of sequential data. Qualitative methods include: content analysis, structuring taxonomies; and diagrams to describe shifts and transitions within and between each search episode. Outcomes of the study are the development of appropriate model(s) for IR interactions in successive search episodes and the derivation of a set of design criteria for interfaces and systems supporting successive searching.

Theme

Information Gateway

Type

a
Hill, L.L.; Frew, J.; Zheng, Q.: Geographic names : the implementation of a gazetteer in a georeferenced digital library (1999) 0.01
```
0.007399688 = product of:
  0.01849922 = sum of:
    0.012184162 = weight(_text_:a in 1240) [ClassicSimilarity], result of:
      0.012184162 = score(doc=1240,freq=40.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.22789092 = fieldWeight in 1240, product of:
          6.3245554 = tf(freq=40.0), with freq of:
            40.0 = termFreq=40.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03125 = fieldNorm(doc=1240)
    0.006315058 = product of:
      0.012630116 = sum of:
        0.012630116 = weight(_text_:information in 1240) [ClassicSimilarity], result of:
          0.012630116 = score(doc=1240,freq=8.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.1551638 = fieldWeight in 1240, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1240)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

The Alexandria Digital Library (ADL) Project has developed a content standard for gazetteer objects and a hierarchical type scheme for geographic features. Both of these developments are based on ADL experience with an earlier gazetteer component for the Library, based on two gazetteers maintained by the U.S. federal government. We define the minimum components of a gazetteer entry as (1) a geographic name, (2) a geographic location represented by coordinates, and (3) a type designation. With these attributes, a gazetteer can function as a tool for indirect spatial location identification through names and types. The ADL Gazetteer Content Standard supports contribution and sharing of gazetteer entries with rich descriptions beyond the minimum requirements. This paper describes the content standard, the feature type thesaurus, and the implementation and research issues. A gazetteer is a list of geographic names, together with their geographic locations and other descriptive information. A geographic name is a proper name for a geographic place and feature, such as Santa Barbara County, Mount Washington, St. Francis Hospital, and Southern California. There are many types of printed gazetteers. For example, the New York Times Atlas has a gazetteer section that can be used to look up a geographic name and find the page(s) and grid reference(s) where the corresponding feature is shown. Some gazetteers provide information about places and features; for example, a history of the locale, population data, physical data such as elevation, or the pronunciation of the name. Some lists of geographic names are available as hierarchical term sets (thesauri) designed for information retrieval; these are used to describe bibliographic or museum materials. Examples include the authority files of the U.S. Library of Congress and the GeoRef Thesaurus produced by the American Geological Institute. The Getty Museum has recently made their Thesaurus of Geographic Names available online.
This is a major project to develop a controlled vocabulary of current and historical names to describe (i.e., catalog) art and architecture literature. U.S. federal government mapping agencies maintain gazetteers containing the official names of places and/or the names that appear on map series. Examples include the U.S. Geological Survey's Geographic Names Information System (GNIS) and the National Imagery and Mapping Agency's Geographic Names Processing System (GNPS). Both of these are maintained in cooperation with the U.S. Board of Geographic Names (BGN). Many other examples could be cited -- for local areas, for other countries, and for special purposes. There is remarkable diversity in approaches to the description of geographic places and no standardization beyond authoritative sources for the geographic names themselves.

Type

a

Peters, C.; Picchi, E.: Across languages, across cultures : issues in multilinguality and digital libraries (1997) 0.01

0.0073474604 = product of:
  0.01836865 = sum of:
    0.009437811 = weight(_text_:a in 1233) [ClassicSimilarity], result of:
      0.009437811 = score(doc=1233,freq=6.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.17652355 = fieldWeight in 1233, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=1233)
    0.0089308405 = product of:
      0.017861681 = sum of:
        0.017861681 = weight(_text_:information in 1233) [ClassicSimilarity], result of:
          0.017861681 = score(doc=1233,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.21943474 = fieldWeight in 1233, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=1233)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: With the recent rapid diffusion over the international computer networks of world-wide distributed document bases, the question of multilingual access and multilingual information retrieval is becoming increasingly relevant. We briefly discuss just some of the issues that must be addressed in order to implement a multilingual interface for a Digital Library system and describe our own approach to this problem.
Theme: Information Gateway
Type: a

Van de Sompel, H.; Hochstenbach, P.: Reference linking in a hybrid library environment : part 1: frameworks for linking (1999) 0.01
```
0.0072633475 = product of:
  0.018158369 = sum of:
    0.008173384 = weight(_text_:a in 1244) [ClassicSimilarity], result of:
      0.008173384 = score(doc=1244,freq=18.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15287387 = fieldWeight in 1244, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03125 = fieldNorm(doc=1244)
    0.009984984 = product of:
      0.019969968 = sum of:
        0.019969968 = weight(_text_:information in 1244) [ClassicSimilarity], result of:
          0.019969968 = score(doc=1244,freq=20.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.2453355 = fieldWeight in 1244, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1244)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

The creation of services linking related information entities is an area that is attracting an ever increasing interest in the ongoing development of the World Wide Web in general, and of research-related information systems in particular. Currently, both practice and theory point at linking services as being a major domain for innovation enabled by digital communication of content. Publishers, subscription agents, researchers and libraries are all looking into ways to create added value by linking related information entities, as such presenting the information within a broader context estimated to be relevant to the users of the information. This is the first of two articles in D-Lib Magazine on this topic. This first part describes the current state-of-the-art and contrasts various approaches to the problem. It identifies static and dynamic linking solutions as well as open and closed linking frameworks. It also includes an extensive bibliography. The second part, SFX, a Generic Linking Solution describes a system that we have developed for linking in a hybrid working environment. The creation of services linking related information entities is an area that is attracting an ever increasing interest in the ongoing development of the World Wide Web in general, and of research-related information systems in particular. Although most writings on electronic scientific communication have touted other benefits, such as the increase in communication speed, the possibility to exchange multimedia content and the absence of limitations on the length of research papers, currently both practice and theory point at linking services as being a major opportunity for improved communication of content. Publishers, subscription agents, researchers and libraries are all looking into ways to create added-value by linking related information entities, as such presenting the information within a broader context estimated to be relevant to the users of the information.

Type

a

Dillon, A.: What is the shape of information? : human factors in the development and use of digital libraries (1995) 0.01

0.007131535 = product of:
  0.017828837 = sum of:
    0.008258085 = weight(_text_:a in 3314) [ClassicSimilarity], result of:
      0.008258085 = score(doc=3314,freq=6.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.1544581 = fieldWeight in 3314, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3314)
    0.009570752 = product of:
      0.019141505 = sum of:
        0.019141505 = weight(_text_:information in 3314) [ClassicSimilarity], result of:
          0.019141505 = score(doc=3314,freq=6.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.23515764 = fieldWeight in 3314, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3314)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: At Indiana, we are currently investigating several aspects of electronic document usage that relate to the organization of information in digital environments. This work is collectively referred to under the heading: the perception of shape in information. The aim of this research is to identify aspects of presentation that afford users a sense of location and order in electronic space, and to transfer these findings to developers of digital library applications. As well as empirical research, SLIS is involved in the development of a number of practical projects involving the campus libraries utilising sociotechnical approaches to design, but it is the research component of our efforts that will be emphasized here.

Rindflesch, T.C.; Aronson, A.R.: Semantic processing in information retrieval (1993) 0.01

0.00711762 = product of:
  0.01779405 = sum of:
    0.0067426977 = weight(_text_:a in 4121) [ClassicSimilarity], result of:
      0.0067426977 = score(doc=4121,freq=4.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.12611452 = fieldWeight in 4121, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4121)
    0.011051352 = product of:
      0.022102704 = sum of:
        0.022102704 = weight(_text_:information in 4121) [ClassicSimilarity], result of:
          0.022102704 = score(doc=4121,freq=8.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.27153665 = fieldWeight in 4121, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4121)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Intuition suggests that one way to enhance the information retrieval process would be the use of phrases to characterize the contents of text. A number of researchers, however, have noted that phrases alone do not improve retrieval effectiveness. In this paper we briefly review the use of phrases in information retrieval and then suggest extensions to this paradigm using semantic information. We claim that semantic processing, which can be viewed as expressing relations between the concepts represented by phrases, will in fact enhance retrieval effectiveness. The availability of the UMLS® domain model, which we exploit extensively, significantly contributes to the feasibility of this processing.
Type: a

Mortimer, M.; Lockhead, K.; Hyland, M.: CatSkill : a multimedia course on AACR2 and MARC (1994) 0.01

0.007058388 = product of:
  0.01764597 = sum of:
    0.008173384 = weight(_text_:a in 7865) [ClassicSimilarity], result of:
      0.008173384 = score(doc=7865,freq=2.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15287387 = fieldWeight in 7865, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.09375 = fieldNorm(doc=7865)
    0.009472587 = product of:
      0.018945174 = sum of:
        0.018945174 = weight(_text_:information in 7865) [ClassicSimilarity], result of:
          0.018945174 = score(doc=7865,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.23274569 = fieldWeight in 7865, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=7865)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Footnote: Review in: Journal of librarianship and information science 29(1997) no.1, pp. 54-56 (J.H. Bowman)

Shafer, K.E.: Evaluating Scorpion results (1998) 0.01

0.0070104985 = product of:
  0.017526247 = sum of:
    0.009632425 = weight(_text_:a in 1569) [ClassicSimilarity], result of:
      0.009632425 = score(doc=1569,freq=4.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.18016359 = fieldWeight in 1569, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.078125 = fieldNorm(doc=1569)
    0.007893822 = product of:
      0.015787644 = sum of:
        0.015787644 = weight(_text_:information in 1569) [ClassicSimilarity], result of:
          0.015787644 = score(doc=1569,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.19395474 = fieldWeight in 1569, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1569)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Scorpion is a research project at OCLC that builds tools for automatic subject assignment by combining library science and information retrieval techniques. A thesis of Scorpion is that the Dewey Decimal Classification (Dewey) can be used to perform automatic subject assignment for electronic items.

Oard, D.W.: Serving users in many languages : cross-language information retrieval for digital libraries (1997) 0.01
```
0.00690148 = product of:
  0.017253699 = sum of:
    0.0068111527 = weight(_text_:a in 1261) [ClassicSimilarity], result of:
      0.0068111527 = score(doc=1261,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.12739488 = fieldWeight in 1261, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1261)
    0.010442546 = product of:
      0.020885091 = sum of:
        0.020885091 = weight(_text_:information in 1261) [ClassicSimilarity], result of:
          0.020885091 = score(doc=1261,freq=14.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.256578 = fieldWeight in 1261, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1261)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

We are rapidly constructing an extensive network infrastructure for moving information across national boundaries, but much remains to be done before linguistic barriers can be surmounted as effectively as geographic ones. Users seeking information from a digital library could benefit from the ability to query large collections once using a single language, even when more than one language is present in the collection. If the information they locate is not available in a language that they can read, some form of translation will be needed. At present, multilingual thesauri such as EUROVOC help to address this challenge by facilitating controlled vocabulary search using terms from several languages, and services such as INSPEC produce English abstracts for documents in other languages. On the other hand, support for free text searching across languages is not yet widely deployed, and fully automatic machine translation is presently neither sufficiently fast nor sufficiently accurate to adequately support interactive cross-language information seeking. An active and rapidly growing research community has coalesced around these and other related issues, applying techniques drawn from several fields - notably information retrieval and natural language processing - to provide access to large multilingual collections.

Theme

Information Gateway

Type

a
Multilingual information management : current levels and future abilities. A report Commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency, April 1999 (1999) 0.01
```
0.006871411 = product of:
  0.017178528 = sum of:
    0.00770594 = weight(_text_:a in 6068) [ClassicSimilarity], result of:
      0.00770594 = score(doc=6068,freq=16.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.14413087 = fieldWeight in 6068, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.03125 = fieldNorm(doc=6068)
    0.009472587 = product of:
      0.018945174 = sum of:
        0.018945174 = weight(_text_:information in 6068) [ClassicSimilarity], result of:
          0.018945174 = score(doc=6068,freq=18.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.23274568 = fieldWeight in 6068, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=6068)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

Over the past 50 years, a variety of language-related capabilities has been developed in machine translation, information retrieval, speech recognition, text summarization, and so on. These applications rest upon a set of core techniques such as language modeling, information extraction, parsing, generation, and multimedia planning and integration; and they involve methods using statistics, rules, grammars, lexicons, ontologies, training techniques, and so on. It is a puzzling fact that although all of this work deals with language in some form or other, the major applications have each developed a separate research field. For example, there is no reason why speech recognition techniques involving n-grams and hidden Markov models could not have been used in machine translation 15 years earlier than they were, or why some of the lexical and semantic insights from the subarea called Computational Linguistics are still not used in information retrieval.
This picture will rapidly change. The twin challenges of massive information overload via the web and ubiquitous computers present us with an unavoidable task: developing techniques to handle multilingual and multi-modal information robustly and efficiently, with as high quality performance as possible. The most effective way for us to address such a mammoth task, and to ensure that our various techniques and applications fit together, is to start talking across the artificial research boundaries. Extending the current technologies will require integrating the various capabilities into multi-functional and multi-lingual natural language systems. However, at this time there is no clear vision of how these technologies could or should be assembled into a coherent framework. What would be involved in connecting a speech recognition system to an information retrieval engine, and then using machine translation and summarization software to process the retrieved text? How can traditional parsing and generation be enhanced with statistical techniques? What would be the effect of carefully crafted lexicons on traditional information retrieval? At which points should machine translation be interleaved within information retrieval systems to enable multilingual processing?

Chowdhury, A.; Mccabe, M.C.: Improving information retrieval systems using part of speech tagging (1993) 0.01

```
0.0066833766 = product of:
  0.016708441 = sum of:
    0.0100103095 = weight(_text_:a in 1061) [ClassicSimilarity], result of:
      0.0100103095 = score(doc=1061,freq=12.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.18723148 = fieldWeight in 1061, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=1061)
    0.0066981306 = product of:
      0.013396261 = sum of:
        0.013396261 = weight(_text_:information in 1061) [ClassicSimilarity], result of:
          0.013396261 = score(doc=1061,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.16457605 = fieldWeight in 1061, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1061)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```

Abstract: The object of Information Retrieval is to retrieve all relevant documents for a user query and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In the paper we evaluate the use of Part of Speech Tagging to improve the index storage overhead and general speed of the system with only a minimal reduction in precision/recall measurements. We tagged 500 MB of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90% of precision/recall is achieved with 40% of the document collection's terms. We also show that this is an improvement in overhead with only a 1% reduction in precision/recall.
Type: a

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
```
0.006675433 = product of:
  0.016688582 = sum of:
    0.009584142 = weight(_text_:a in 1253) [ClassicSimilarity], result of:
      0.009584142 = score(doc=1253,freq=44.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.1792605 = fieldWeight in 1253, product of:
          6.6332498 = tf(freq=44.0), with freq of:
            44.0 = termFreq=44.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
    0.0071044406 = product of:
      0.014208881 = sum of:
        0.014208881 = weight(_text_:information in 1253) [ClassicSimilarity], result of:
          0.014208881 = score(doc=1253,freq=18.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.17455927 = fieldWeight in 1253, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
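The arithmetic in the explain tree above can be retraced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The queryNorm value is copied from the listing because it depends on the whole query and cannot be derived from a single entry. The function names are illustrative only, and since Python uses double precision while the listing shows single-precision values, the last digits may differ slightly.

```python
import math

MAX_DOCS = 44218          # maxDocs, as shown in the explain output
QUERY_NORM = 0.046368346  # queryNorm, copied from the explain output (not derived here)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming the ClassicSimilarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                         # tf(freq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm         # queryWeight
    field_weight = tf * term_idf * field_norm    # fieldWeight
    return query_weight * field_weight


def explain_score_1253():
    """Retrace the tree for doc 1253 (hypothetical helper name)."""
    score_a = term_score(freq=44.0, doc_freq=37942, field_norm=0.0234375)
    score_information = term_score(freq=18.0, doc_freq=20772, field_norm=0.0234375)
    inner = score_information * 0.5      # coord(1/2) on the nested clause
    return (score_a + inner) * 0.4       # coord(2/5) on the top-level query


if __name__ == "__main__":
    print(explain_score_1253())
```

The same two functions apply to every other entry on this page; only freq and fieldNorm change from document to document, which is why long abstracts with many matches can still rank below short ones.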
Abstract

Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source.
That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to automatically extract the requisite collection metadata that must be distributed.
We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/.) The prototype uses electronic library catalog records as a 'training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First, supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices.
The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.

Type

a

Sowards, S.W.: ¬A typology for ready reference Web sites in libraries (1996) 0.01

```
0.006654713 = product of:
  0.016636781 = sum of:
    0.00770594 = weight(_text_:a in 944) [ClassicSimilarity], result of:
      0.00770594 = score(doc=944,freq=4.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.14413087 = fieldWeight in 944, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=944)
    0.0089308405 = product of:
      0.017861681 = sum of:
        0.017861681 = weight(_text_:information in 944) [ClassicSimilarity], result of:
          0.017861681 = score(doc=944,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.21943474 = fieldWeight in 944, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=944)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```

Abstract: Many libraries manage Web sites intended to provide their users with online resources suitable for answering reference questions. Most of these sites can be analyzed in terms of their depth, and their organizing and searching features. Composing a typology based on these factors sheds light on the critical design decisions that influence whether users of these sites succeed or fail to find information easily, rapidly and accurately. The same analysis highlights some larger design issues, both for Web sites and for information management at large.

Siripan, P.: Metadata and trends of cataloging in Thai libraries (1999) 0.01

```
0.006654713 = product of:
  0.016636781 = sum of:
    0.00770594 = weight(_text_:a in 4183) [ClassicSimilarity], result of:
      0.00770594 = score(doc=4183,freq=4.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.14413087 = fieldWeight in 4183, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=4183)
    0.0089308405 = product of:
      0.017861681 = sum of:
        0.017861681 = weight(_text_:information in 4183) [ClassicSimilarity], result of:
          0.017861681 = score(doc=4183,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.21943474 = fieldWeight in 4183, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=4183)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```

Abstract: The status of cataloging in Thailand shows a movement toward the use of information technology. The international standards for cataloging are being used and modified to effectively organize information resources. The expanded scope of resources needing cataloging now covers Web resources. The paper mentions Thailand's participation in the international working group on the use of metadata for libraries.

¬Third International World Wide Web Conference, Darmstadt 1995 : [Table of contents] (1995) 0.01
```
0.006550755 = product of:
  0.016376887 = sum of:
    0.008173384 = weight(_text_:a in 3458) [ClassicSimilarity], result of:
      0.008173384 = score(doc=3458,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15287387 = fieldWeight in 3458, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=3458)
    0.008203502 = product of:
      0.016407004 = sum of:
        0.016407004 = weight(_text_:information in 3458) [ClassicSimilarity], result of:
          0.016407004 = score(doc=3458,freq=6.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.20156369 = fieldWeight in 3458, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3458)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

ANDREW, K. and F. KAPPE: Serving information to the Web with Hyper-G; BARBIERI, K., H.M. DOERR and D. DWYER: Creating a virtual classroom for interactive education on the Web; CAMPBELL, J.K., S.B. JONES, N.M. STEPHENS and S. HURLEY: Constructing educational courseware using NCSA Mosaic and the World Wide Web; CATLEDGE, L.L. and J.E. PITKOW: Characterizing browsing strategies in the World-Wide Web; CLAUSNITZER, A. and P. VOGEL: A WWW interface to the OMNIS/Myriad literature retrieval engine; FISCHER, R. and L. PERROCHON: IDLE: Unified W3-access to interactive information servers; FOLEY, J.D.: Visualizing the World-Wide Web with the navigational view builder; FRANKLIN, S.D. and B. IBRAHIM: Advanced educational uses of the World-Wide Web; FUHR, N., U. PFEIFER and T. HUYNH: Searching structured documents with the enhanced retrieval functionality of free WAIS-sf and SFgate; FIORITO, M., J. OKSANEN and D.R. IOIVANE: An educational environment using WWW; KENT, R.E. and C. NEUSS: Conceptual analysis of resource meta-information; SHELDON, M.A. and R. WEISS: Discover: a resource discovery system based on content routing; WINOGRAD, T.: Beyond browsing: shared comments, SOAPs, Trails, and On-line communities

Carrière, J.; Kazman, R.: WebQuery : searching and visualizing the Web through connectivity (1996) 0.01

```
0.006550755 = product of:
  0.016376887 = sum of:
    0.008173384 = weight(_text_:a in 2676) [ClassicSimilarity], result of:
      0.008173384 = score(doc=2676,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15287387 = fieldWeight in 2676, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=2676)
    0.008203502 = product of:
      0.016407004 = sum of:
        0.016407004 = weight(_text_:information in 2676) [ClassicSimilarity], result of:
          0.016407004 = score(doc=2676,freq=6.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.20156369 = fieldWeight in 2676, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2676)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```

Abstract: Finding information located somewhere on the WWW is an error-prone and frustrating task. The WebQuery system offers a powerful new method for searching the Web based on connectivity and content. We do this by examining links among the nodes returned in a keyword-based query. We then rank the nodes, giving the highest rank to the most highly connected nodes. By doing so, we are finding 'hot spots' on the Web that contain information germane to a user's query. WebQuery not only ranks and filters the results of a Web query, it also extends the result set beyond what the search engine retrieves, by finding 'interesting' sites that are highly connected to those sites returned by the original query. Even with WebQuery filtering and ranking query results, the result sets can be enormous. So we need to visualize the returned information. We explore several techniques for visualizing this information - including cone trees, 2D graphs, 3D graphs, lists, and bullseyes - and discuss the criteria for using each of the techniques.

Brin, S.; Page, L.: ¬The anatomy of a large-scale hypertextual Web search engine (1998) 0.01
```
0.006540462 = product of:
  0.016351154 = sum of:
    0.010769378 = weight(_text_:a in 947) [ClassicSimilarity], result of:
      0.010769378 = score(doc=947,freq=20.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.20142901 = fieldWeight in 947, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=947)
    0.0055817757 = product of:
      0.011163551 = sum of:
        0.011163551 = weight(_text_:information in 947) [ClassicSimilarity], result of:
          0.011163551 = score(doc=947,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13714671 = fieldWeight in 947, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=947)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want

Type

a
