Search (90 results, page 5 of 5)

  • theme_ss:"Literaturübersicht"
  • year_i:[2000 TO 2010}
  1. Dumais, S.T.: Latent semantic analysis (2003) 0.00
    0.0040711993 = product of:
      0.016284797 = sum of:
        0.016284797 = weight(_text_:information in 2462) [ClassicSimilarity], result of:
          0.016284797 = score(doc=2462,freq=20.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.18400162 = fieldWeight in 2462, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0234375 = fieldNorm(doc=2462)
      0.25 = coord(1/4)
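    The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explanation. As a rough check, the figures can be reproduced with the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); a minimal Python sketch, with the constants copied from the explanation:

      import math

      # Constants copied from the scoring explanation above (doc 2462).
      freq, doc_freq, max_docs = 20.0, 20772, 44218
      query_norm, field_norm, coord = 0.050415643, 0.0234375, 0.25

      tf = math.sqrt(freq)                            # 4.472136
      idf = 1 + math.log(max_docs / (doc_freq + 1))   # 1.7554779
      query_weight = idf * query_norm                 # 0.08850355  (queryWeight)
      field_weight = tf * idf * field_norm            # 0.18400162  (fieldWeight)
      print(coord * query_weight * field_weight)      # 0.0040711993 (final score)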
    
    Abstract
    Latent Semantic Analysis (LSA) was first introduced in Dumais, Furnas, Landauer, and Deerwester (1988) and Deerwester, Dumais, Furnas, Landauer, and Harshman (1990) as a technique for improving information retrieval. The key insight in LSA was to reduce the dimensionality of the information retrieval problem. Most approaches to retrieving information depend on a lexical match between words in the user's query and those in documents. Indeed, this lexical matching is the way that the popular Web and enterprise search engines work. Such systems are, however, far from ideal. We are all aware of the tremendous amount of irrelevant information that is retrieved when searching. We also fail to find much of the existing relevant material. LSA was designed to address these retrieval problems, using dimension reduction techniques. Fundamental characteristics of human word usage underlie these retrieval failures. People use a wide variety of words to describe the same object or concept (synonymy). Furnas, Landauer, Gomez, and Dumais (1987) showed that people generate the same keyword to describe well-known objects only 20 percent of the time. Poor agreement was also observed in studies of inter-indexer consistency (e.g., Chan, 1989; Tarr & Borko, 1974), in the generation of search terms (e.g., Fidel, 1985; Bates, 1986), and in the generation of hypertext links (Furner, Ellis, & Willett, 1999). Because searchers and authors often use different words, relevant materials are missed. Someone looking for documents on "human-computer interaction" will not find articles that use only the phrase "man-machine studies" or "human factors." People also use the same word to refer to different things (polysemy). Words like "saturn," "jaguar," or "chip" have several different meanings. A short query like "saturn" will thus return many irrelevant documents. The query "Saturn car" will return fewer irrelevant items, but it will miss some documents that use only the term "Saturn automobile." In searching, there is a constant tension between being overly specific and missing relevant information, and being more general and returning irrelevant information.
    A number of approaches have been developed in information retrieval to address the problems caused by the variability in word usage. Stemming is a popular technique used to normalize some kinds of surface-level variability by converting words to their morphological root. For example, the words "retrieve," "retrieval," "retrieved," and "retrieving" would all be converted to their root form, "retrieve." The root form is used for both document and query processing. Stemming sometimes helps retrieval, although not much (Harman, 1991; Hull, 1996). And it does not address cases where related words are not morphologically related (e.g., physician and doctor). Controlled vocabularies have also been used to limit variability by requiring that query and index terms belong to a pre-defined set of terms. Documents are indexed by a specified or authorized list of subject headings or index terms, called the controlled vocabulary. Library of Congress Subject Headings, Medical Subject Headings, Association for Computing Machinery (ACM) keywords, and Yellow Pages headings are examples of controlled vocabularies. If searchers can find the right controlled vocabulary terms, they do not have to think of all the morphologically related or synonymous terms that authors might have used. However, assigning controlled vocabulary terms in a consistent and thorough manner is a time-consuming and usually manual process. A good deal of research has been published about the effectiveness of controlled vocabulary indexing compared to full text indexing (e.g., Bates, 1998; Lancaster, 1986; Svenonius, 1986). The combination of both full text and controlled vocabularies is often better than either alone, although the size of the advantage is variable (Lancaster, 1986; Markey, Atherton, & Newton, 1982; Srinivasan, 1996). Richer thesauri have also been used to provide synonyms, generalizations, and specializations of users' search terms (see Srinivasan, 1992, for a review). Controlled vocabularies and thesaurus entries can be generated either manually or by the automatic analysis of large collections of texts.
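    To make the stemming step described above concrete, a minimal sketch using NLTK's PorterStemmer (assuming the nltk package is installed); note that the Porter algorithm actually maps these forms to the truncated root "retriev" rather than the dictionary form "retrieve":

      from nltk.stem import PorterStemmer

      stemmer = PorterStemmer()
      for word in ["retrieve", "retrieval", "retrieved", "retrieving"]:
          # All four surface forms collapse to the same root, "retriev".
          print(word, "->", stemmer.stem(word))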
    With the advent of large-scale collections of full text, statistical approaches are being used more and more to analyze the relationships among terms and documents. LSA takes this approach. LSA induces knowledge about the meanings of documents and words by analyzing large collections of texts. The approach simultaneously models the relationships among documents based on their constituent words, and the relationships between words based on their occurrence in documents. By using fewer dimensions for representation than there are unique words, LSA induces similarities among terms that are useful in solving the information retrieval problems described earlier. LSA is a fully automatic statistical approach to extracting relations among words by means of their contexts of use in documents, passages, or sentences. It makes no use of natural language processing techniques for analyzing morphological, syntactic, or semantic relations. Nor does it use humanly constructed resources like dictionaries, thesauri, lexical reference systems (e.g., WordNet), semantic networks, or other knowledge representations. Its only input is large amounts of text. LSA is an unsupervised learning technique. It starts with a large collection of texts, builds a term-document matrix, and tries to uncover some similarity structures that are useful for information retrieval and related text-analysis problems. Several recent ARIST chapters have focused on text mining and discovery (Benoit, 2002; Solomon, 2002; Trybula, 2000). These chapters provide complementary coverage of the field of text analysis.
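    A minimal sketch of the pipeline described above, assuming scikit-learn and a tiny illustrative corpus: build a term-document matrix, then reduce its dimensionality with a truncated SVD so that documents using related vocabulary end up near each other in the reduced space:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import TruncatedSVD

      # Toy corpus for illustration only; LSA is meant for large text collections.
      docs = [
          "human computer interaction and interface design",
          "man machine studies of user interface design",
          "survey of graph theory and trees",
      ]

      term_doc = CountVectorizer().fit_transform(docs)   # documents x terms matrix
      lsa = TruncatedSVD(n_components=2)                 # keep far fewer dimensions than terms
      doc_vectors = lsa.fit_transform(term_doc)          # documents in the reduced space
      print(doc_vectors)   # nearby rows indicate documents with related vocabulary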
    Source
    Annual review of information science and technology. 38(2004), S.189-230
  2. Saracevic, T.: Relevance: a review of the literature and a framework for thinking on the notion in information science. Part II : nature and manifestations of relevance (2007) 0.00
    0.0038383633 = product of:
      0.015353453 = sum of:
        0.015353453 = weight(_text_:information in 612) [ClassicSimilarity], result of:
          0.015353453 = score(doc=612,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1734784 = fieldWeight in 612, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=612)
      0.25 = coord(1/4)
    
    Abstract
    Relevance is a, if not even the, key notion in information science in general and information retrieval in particular. This two-part critical review traces and synthesizes the scholarship on relevance over the past 30 years and provides an updated framework within which the still widely dissonant ideas and works about relevance might be interpreted and related. It is a continuation and update of a similar review that appeared in 1975 under the same title, considered here as being Part I. The present review is organized into two parts: Part II addresses the questions related to nature and manifestations of relevance, and Part III addresses questions related to relevance behavior and effects. In Part II, the nature of relevance is discussed in terms of meaning ascribed to relevance, theories used or proposed, and models that have been developed. The manifestations of relevance are classified as to several kinds of relevance that form an interdependent system of relevances. In Part III, relevance behavior and effects are synthesized using experimental and observational works that incorporate data. In both parts, each section concludes with a summary that in effect provides an interpretation and synthesis of contemporary thinking on the topic treated or suggests hypotheses for future research. Analyses of some of the major trends that shape relevance work are offered in conclusions.
    Content
    Relevant: Having significant and demonstrable bearing on the matter at hand. Relevance: The ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user. - Merriam-Webster Dictionary, 2005. [Note: A version of this article was published in 2006 as a chapter in E.G. Abels & D.A. Nitecki (Eds.), Advances in Librarianship (Vol. 30, pp. 3-71). San Diego: Academic Press (Saracevic, 2006).]
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.13, S.1915-1933
  3. Legg, C.: Ontologies on the Semantic Web (2007) 0.00
    0.0038383633 = product of:
      0.015353453 = sum of:
        0.015353453 = weight(_text_:information in 1979) [ClassicSimilarity], result of:
          0.015353453 = score(doc=1979,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1734784 = fieldWeight in 1979, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1979)
      0.25 = coord(1/4)
    
    Abstract
    As an informational technology, the World Wide Web has enjoyed spectacular success. In just ten years it has transformed the way information is produced, stored, and shared in arenas as diverse as shopping, family photo albums, and high-level academic research. The "Semantic Web" is touted by its developers as equally revolutionary, although it has not yet achieved anything like the Web's exponential uptake. It seeks to transcend a current limitation of the Web - that it largely requires indexing to be accomplished merely on specific character strings. Thus, a person searching for information about "turkey" (the bird) receives from current search engines many irrelevant pages about "Turkey" (the country) and nothing about the Spanish "pavo" even if he or she is a Spanish-speaker able to understand such pages. The Semantic Web vision is to develop technology to facilitate retrieval of information via meanings, not just spellings. For this to be possible, most commentators believe, Semantic Web applications will have to draw on some kind of shared, structured, machine-readable conceptual scheme. Thus, there has been a convergence between the Semantic Web research community and an older tradition with roots in classical Artificial Intelligence (AI) research (sometimes referred to as "knowledge representation") whose goal is to develop a formal ontology. A formal ontology is a machine-readable theory of the most fundamental concepts or "categories" required in order to understand information pertaining to any knowledge domain. A review of the attempts that have been made to realize this goal provides an opportunity to reflect in interestingly concrete ways on various research questions such as the following: - How explicit a machine-understandable theory of meaning is it possible or practical to construct? - How universal a machine-understandable theory of meaning is it possible or practical to construct? - How much (and what kind of) inference support is required to realize a machine-understandable theory of meaning? - What is it for a theory of meaning to be machine-understandable anyway?
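    As a toy illustration of the kind of machine-readable conceptual scheme described above, a minimal sketch using the rdflib package and a hypothetical example.org namespace: the bird and the country become distinct resources with typed relations and multilingual labels, so retrieval can proceed by concept rather than by character string:

      from rdflib import Graph, Literal, Namespace, RDF, RDFS

      EX = Namespace("http://example.org/")   # hypothetical namespace, for illustration only
      g = Graph()

      # Two distinct resources that happen to share the English surface string "turkey"/"Turkey".
      g.add((EX.Turkey_bird, RDF.type, EX.Bird))
      g.add((EX.Turkey_bird, RDFS.label, Literal("turkey", lang="en")))
      g.add((EX.Turkey_bird, RDFS.label, Literal("pavo", lang="es")))
      g.add((EX.Turkey_country, RDF.type, EX.Country))
      g.add((EX.Turkey_country, RDFS.label, Literal("Turkey", lang="en")))

      # Query by meaning (the class Bird), not by spelling.
      for bird in g.subjects(RDF.type, EX.Bird):
          print(bird, [str(label) for label in g.objects(bird, RDFS.label)])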
    Source
    Annual review of information science and technology. 41(2007), S.407-451
  4. Smeaton, A.F.: Indexing, browsing, and searching of digital video (2003) 0.00
    0.0037164795 = product of:
      0.014865918 = sum of:
        0.014865918 = weight(_text_:information in 4274) [ClassicSimilarity], result of:
          0.014865918 = score(doc=4274,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16796975 = fieldWeight in 4274, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4274)
      0.25 = coord(1/4)
    
    Abstract
    Video is a communications medium that normally brings together moving pictures with a synchronized audio track into a discrete piece or pieces of information. A "piece" of video is variously referred to as a frame, a shot, a scene, a clip, a program, or an episode; these pieces are distinguished by their length and by their composition. We shall return to the definition of each of these in the section on automatically structuring and indexing digital video. In modern society, video is commonplace and is usually equated with television, movies, or home video produced by a video camera or camcorder. We also accept video recorded from closed circuit TVs for security and surveillance as part of our daily lives. In short, video is ubiquitous. Digital video is, as the name suggests, the creation or capture of video information in digital format. Most video produced today, whether commercial, surveillance, or domestic, is produced in digital form, although the medium of video predates the development of digital computing by several decades. The essential nature of video has not changed with the advent of digital computing. It is still moving pictures and synchronized audio. However, the production methods and the end product have gone through significant evolution, in the last decade especially.
    Source
    Annual review of information science and technology. 38(2004), S.371-409
  5. Börner, K.; Chen, C.; Boyack, K.W.: Visualizing knowledge domains (2002) 0.00
    0.003358568 = product of:
      0.013434272 = sum of:
        0.013434272 = weight(_text_:information in 4286) [ClassicSimilarity], result of:
          0.013434272 = score(doc=4286,freq=10.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1517936 = fieldWeight in 4286, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4286)
      0.25 = coord(1/4)
    
    Abstract
    This chapter reviews visualization techniques that can be used to map the ever-growing domain structure of scientific disciplines and to support information retrieval and classification. In contrast to the comprehensive surveys conducted in traditional fashion by Howard White and Katherine McCain (1997, 1998), this survey not only reviews emerging techniques in interactive data analysis and information visualization, but also depicts the bibliographical structure of the field itself. The chapter starts by reviewing the history of knowledge domain visualization. We then present a general process flow for the visualization of knowledge domains and explain commonly used techniques. In order to visualize the domain reviewed by this chapter, we introduce a bibliographic data set of considerable size, which includes articles from the citation analysis, bibliometrics, semantics, and visualization literatures. Using tutorial style, we then apply various algorithms to demonstrate the visualization effects produced by different approaches and compare the results. The domain visualizations reveal the relationships within and between the four fields that together constitute the focus of this chapter. We conclude with a general discussion of research possibilities. Painting a "big picture" of scientific knowledge has long been desirable for a variety of reasons. Traditional approaches are brute force: scholars must sort through mountains of literature to perceive the outlines of their field. Obviously, this is time-consuming, difficult to replicate, and entails subjective judgments. The task is enormously complex. Sifting through recently published documents to find those that will later be recognized as important is labor intensive. Traditional approaches struggle to keep up with the pace of information growth. In multidisciplinary fields of study it is especially difficult to maintain an overview of literature dynamics. Painting the big picture of an ever-evolving scientific discipline is akin to the situation described in the widely known Indian legend about the blind men and the elephant. As the story goes, six blind men were trying to find out what an elephant looked like. They touched different parts of the elephant and quickly jumped to their conclusions. The one touching the body said it must be like a wall; the one touching the tail said it was like a snake; the one touching the legs said it was like a tree trunk, and so forth. But science does not stand still; the steady stream of new scientific literature creates a continuously changing structure. The resulting disappearance, fusion, and emergence of research areas add another twist to the tale: it is as if the elephant is running and dynamically changing its shape. Domain visualization, an emerging field of study, is in a similar situation. Relevant literature is spread across disciplines that have traditionally had few connections. Researchers examining the domain from a particular discipline cannot possibly have an adequate understanding of the whole. As noted by White and McCain (1997), the new generation of information scientists is technically driven in its efforts to visualize scientific disciplines. However, limited progress has been made in terms of connecting pioneers' theories and practices with the potentialities of today's enabling technologies. If the difference between past and present generations lies in the power of available technologies, what they have in common is the ultimate goal: to reveal the development of scientific knowledge.
    Source
    Annual review of information science and technology. 37(2003), S.179-258
  6. Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 4279) [ClassicSimilarity], result of:
          0.01213797 = score(doc=4279,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 4279, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
      0.25 = coord(1/4)
    
    Abstract
    Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.
    Source
    Annual review of information science and technology. 39(2005), S.81-138
  7. Haythornthwaite, C.; Hagar, C.: The social worlds of the Web (2004) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 4282) [ClassicSimilarity], result of:
          0.01213797 = score(doc=4282,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 4282, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4282)
      0.25 = coord(1/4)
    
    Abstract
    We know this Web world. We live in it, particularly those of us in developed countries. Even if we do not go online daily, we live with it: our culture is imprinted with online activity and vocabulary: e-mailing colleagues, surfing the Web, posting Web pages, blogging, gender-bending in cyberspace, texting and instant messaging friends, engaging in ecommerce, entering an online chat room, or morphing in an online world. We use it to conduct business, find information, talk with friends and colleagues. We know it is something separate, yet we incorporate it into our daily lives. We identify with it, bringing to it behaviors and expectations we hold for the world in general. We approach it as explorers and entrepreneurs, ready to move into unknown opportunities and territory; creators and engineers, eager to build new structures; utopians for whom "the world of the Web" represents the chance to start again and "get it right" this time; utilitarians, ready to get what we can out of the new structures; and dystopians, for whom this is just more evidence that there is no way to "get it right." The word "world" has many connotations. The Oxford English Dictionary (http://dictionary.oed.com) gives 27 definitions for the noun "world" including: - The sphere within which one's interests are bound up or one's activities find scope; (one's) sphere of action or thought; the "realm" within which one moves or lives. - A group or system of things or beings associated by common characteristics (denoted by a qualifying word or phrase), or considered as constituting a unity. - Human society considered in relation to its activities, difficulties, temptations, and the like; hence, contextually, the ways, practices, or customs of the people among whom one lives; the occupations and interests of society at large.
    Source
    Annual review of information science and technology. 39(2005), S.311-346
  8. Gilliland-Swetland, A.: Electronic records management (2004) 0.00
    0.0029731835 = product of:
      0.011892734 = sum of:
        0.011892734 = weight(_text_:information in 4280) [ClassicSimilarity], result of:
          0.011892734 = score(doc=4280,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1343758 = fieldWeight in 4280, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=4280)
      0.25 = coord(1/4)
    
    Abstract
    What is an electronic record, how should it best be preserved and made available, and to what extent do traditional, paradigmatic archival precepts such as provenance, original order, and archival custody hold when managing it? Over more than four decades of work in the area of electronic records (formerly known as machine-readable records), theorists and researchers have offered answers to these questions, or at least devised approaches for trying to answer them. However, a set of fundamental questions about the nature of the record and the applicability of traditional archival theory still confronts researchers seeking to advance knowledge and development in this increasingly active, but contested, area of research. For example, which characteristics differentiate a record from other types of information objects (such as publications or raw research data)? Are these characteristics consistently present regardless of the medium of the record? Does the record always have to have a tangible form? How does the record manifest itself within different technological and procedural contexts, and in particular, how do we determine the parameters of electronic records created in relational, distributed, or dynamic environments that bear little resemblance on the surface to traditional paper-based environments? At the heart of electronic records research lies a dual concern with the nature of the record as a specific type of information object and the nature of legal and historical evidence in a digital world. Electronic records research is relevant to the agendas of many communities in addition to that of archivists. Its emphasis on accountability and on establishing trust in records, for example, addresses concerns that are central to both digital government and e-commerce. Research relating to electronic records is still relatively homogeneous in terms of scope, in that most major research initiatives have addressed various combinations of the following: theory building in terms of identifying the nature of the electronic record, developing alternative conceptual models, establishing the determinants of reliability and authenticity in active and preserved electronic records, identifying functional and metadata requirements for record keeping, developing and testing preservation
    Source
    Annual review of information science and technology. 39(2005), S.219-256
  9. Kling, R.: ¬The Internet and unrefereed scholarly publishing (2003) 0.00
    0.0025748524 = product of:
      0.01029941 = sum of:
        0.01029941 = weight(_text_:information in 4272) [ClassicSimilarity], result of:
          0.01029941 = score(doc=4272,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 4272, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4272)
      0.25 = coord(1/4)
    
    Source
    Annual review of information science and technology. 38(2004), S.591-632
  10. Rogers, Y.: New theoretical approaches for human-computer interaction (2003) 0.00
    0.0015019972 = product of:
      0.006007989 = sum of:
        0.006007989 = weight(_text_:information in 4270) [ClassicSimilarity], result of:
          0.006007989 = score(doc=4270,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.06788416 = fieldWeight in 4270, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4270)
      0.25 = coord(1/4)
    
    Source
    Annual review of information science and technology. 38(2004), S.87-144

Languages

  • e 89
  • d 1

Types

  • a 86
  • b 7
  • m 3
  • el 1
  • s 1