Search (6 results, page 1 of 1)

Maaten, L. van den: Learning a parametric embedding by preserving local structure (2009) 0.00
```
0.0026473717 = product of:
  0.0052947435 = sum of:
    0.0052947435 = product of:
      0.010589487 = sum of:
        0.010589487 = weight(_text_:a in 3883) [ClassicSimilarity], result of:
          0.010589487 = score(doc=3883,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.19940455 = fieldWeight in 3883, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3883)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space. Parametric t-SNE learns the parametric mapping in such a way that the local structure of the data is preserved as well as possible in the latent space. We evaluate the performance of parametric t-SNE in experiments on three datasets, in which we compare it to the performance of two other unsupervised parametric dimensionality reduction techniques. The results of experiments illustrate the strong performance of parametric t-SNE, in particular, in learning settings in which the dimensionality of the latent space is relatively low.

Type

a
Maaten, L. van den; Hinton, G.: Visualizing data using t-SNE (2008) 0.00
```
0.0023919214 = product of:
  0.0047838427 = sum of:
    0.0047838427 = product of:
      0.009567685 = sum of:
        0.009567685 = weight(_text_:a in 3888) [ClassicSimilarity], result of:
          0.009567685 = score(doc=3888,freq=16.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18016359 = fieldWeight in 3888, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3888)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.

Type

a
Beagle, D.: Visualizing keyword distribution across multidisciplinary c-space (2003) 0.00
```
0.002325213 = product of:
  0.004650426 = sum of:
    0.004650426 = product of:
      0.009300852 = sum of:
        0.009300852 = weight(_text_:a in 1202) [ClassicSimilarity], result of:
          0.009300852 = score(doc=1202,freq=42.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17513901 = fieldWeight in 1202, product of:
              6.4807405 = tf(freq=42.0), with freq of:
                42.0 = termFreq=42.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1202)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The concept of c-space is proposed as a visualization schema relating containers of content to cataloging surrogates and classification structures. Possible applications of keyword vector clusters within c-space could include improved retrieval rates through the use of captioning within visual hierarchies, tracings of semantic bleeding among subclasses, and access to buried knowledge within subject-neutral publication containers. The Scholastica Project is described as one example, following a tradition of research dating back to the 1980's. Preliminary focus group assessment indicates that this type of classification rendering may offer digital library searchers enriched entry strategies and an expanded range of re-entry vocabularies. Those of us who work in traditional libraries typically assume that our systems of classification: Library of Congress Classification (LCC) and Dewey Decimal Classification (DDC), are descriptive rather than prescriptive. In other words, LCC classes and subclasses approximate natural groupings of texts that reflect an underlying order of knowledge, rather than arbitrary categories prescribed by librarians to facilitate efficient shelving. Philosophical support for this assumption has traditionally been found in a number of places, from the archetypal tree of knowledge, to Aristotelian categories, to the concept of discursive formations proposed by Michel Foucault. Gary P. Radford has elegantly described an encounter with Foucault's discursive formations in the traditional library setting: "Just by looking at the titles on the spines, you can see how the books cluster together...You can identify those books that seem to form the heart of the discursive formation and those books that reside on the margins. Moving along the shelves, you see those books that tend to bleed over into other classifications and that straddle multiple discursive formations. You can physically and sensually experience...those points that feel like state borders or national boundaries, those points where one subject ends and another begins, or those magical places where one subject has morphed into another..."
But what happens to this awareness in a digital library? Can discursive formations be represented in cyberspace, perhaps through diagrams in a visualization interface? And would such a schema be helpful to a digital library user? To approach this question, it is worth taking a moment to reconsider what Radford is looking at. First, he looks at titles to see how the books cluster. To illustrate, I scanned one hundred books on the shelves of a college library under subclass HT 101-395, defined by the LCC subclass caption as Urban groups. The City. Urban sociology. Of the first 100 titles in this sequence, fifty included the word "urban" or variants (e.g. "urbanization"). Another thirty-five used the word "city" or variants. These keywords appear to mark their titles as the heart of this discursive formation. The scattering of titles not using "urban" or "city" used related terms such as "town," "community," or in one case "skyscrapers." So we immediately see some empirical correlation between keywords and classification. But we also see a problem with the commonly used search technique of title-keyword. A student interested in urban studies will want to know about this entire subclass, and may wish to browse every title available therein. A title-keyword search on "urban" will retrieve only half of the titles, while a search on "city" will retrieve just over a third. There will be no overlap, since no titles in this sample contain both words. The only place where both words appear in a common string is in the LCC subclass caption, but captions are not typically indexed in library Online Public Access Catalogs (OPACs). In a traditional library, this problem is mitigated when the student goes to the shelf looking for any one of the books and suddenly discovers a much wider selection than the keyword search had led him to expect. But in a digital library, the issue of non-retrieval can be more problematic, as studies have indicated. Micco and Popp reported that, in a study funded partly by the U.S. Department of Education, 65 of 73 unskilled users searching for material on U.S./Soviet foreign relations found some material but never realized they had missed a large percentage of what was in the database.

Type

a
Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.00
```
0.0021560488 = product of:
  0.0043120976 = sum of:
    0.0043120976 = product of:
      0.008624195 = sum of:
        0.008624195 = weight(_text_:a in 1211) [ClassicSimilarity], result of:
          0.008624195 = score(doc=1211,freq=52.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.16239727 = fieldWeight in 1211, product of:
              7.2111025 = tf(freq=52.0), with freq of:
                52.0 = termFreq=52.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1211)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this article we present a method for retrieving documents from a digital library through a visual interface based on automatically generated concepts. We used a vocabulary generation algorithm to generate a set of concepts for the digital library and a technique called the max-min distance technique to cluster them. Additionally, the concepts were visualized in a spring embedding graph layout to depict the semantic relationship among them. The resulting graph layout serves as an aid to users for retrieving documents. An online archive containing the contents of D-Lib Magazine from July 1995 to May 2002 was used to test the utility of an implemented retrieval and visualization system. We believe that the method developed and tested can be applied to many different domains to help users get a better understanding of online document collections and to minimize users' cognitive load during execution of search tasks. Over the past few years, the volume of information available through the World Wide Web has been expanding exponentially. Never has so much information been so readily available and shared among so many people. Unfortunately, the unstructured nature and huge volume of information accessible over networks have made it hard for users to sift through and find relevant information. To deal with this problem, information retrieval (IR) techniques have gained more intensive attention from both industrial and academic researchers. Numerous IR techniques have been developed to help deal with the information overload problem. These techniques concentrate on mathematical models and algorithms for retrieval. Popular IR models such as the Boolean model, the vector-space model, the probabilistic model and their variants are well established.
From the user's perspective, however, it is still difficult to use current information retrieval systems. Users frequently have problems expressing their information needs and translating those needs into queries. This is partly due to the fact that information needs cannot be expressed appropriately in systems terms. It is not unusual for users to input search terms that are different from the index terms information systems use. Various methods have been proposed to help users choose search terms and articulate queries. One widely used approach is to incorporate into the information system a thesaurus-like component that represents both the important concepts in a particular subject area and the semantic relationships among those concepts. Unfortunately, the development and use of thesauri is not without its own problems. The thesaurus employed in a specific information system has often been developed for a general subject area and needs significant enhancement to be tailored to the information system where it is to be used. This thesaurus development process, if done manually, is both time consuming and labor intensive. Usage of a thesaurus in searching is complex and may raise barriers for the user. For illustration purposes, let us consider two scenarios of thesaurus usage. In the first scenario the user inputs a search term and the thesaurus then displays a matching set of related terms. Without an overview of the thesaurus - and without the ability to see the matching terms in the context of other terms - it may be difficult to assess the quality of the related terms in order to select the correct term. In the second scenario the user browses the whole thesaurus, which is organized as in an alphabetically ordered list. The problem with this approach is that the list may be long, and neither does it show users the global semantic relationship among all the listed terms.
Nevertheless, because thesaurus use has shown to improve retrieval, for our method we integrate functions in the search interface that permit users to explore built-in search vocabularies to improve retrieval from digital libraries. Our method automatically generates the terms and their semantic relationships representing relevant topics covered in a digital library. We call these generated terms the "concepts", and the generated terms and their semantic relationships we call the "concept space". Additionally, we used a visualization technique to display the concept space and allow users to interact with this space. The automatically generated term set is considered to be more representative of subject area in a corpus than an "externally" imposed thesaurus, and our method has the potential of saving a significant amount of time and labor for those who have been manually creating thesauri as well. Information visualization is an emerging discipline and developed very quickly in the last decade. With growing volumes of documents and associated complexities, information visualization has become increasingly important. Researchers have found information visualization to be an effective way to use and understand information while minimizing a user's cognitive load. Our work was based on an algorithmic approach of concept discovery and association. Concepts are discovered using an algorithm based on an automated thesaurus generation procedure. Subsequently, similarities among terms are computed using the cosine measure, and the associations among terms are established using a method known as max-min distance clustering. The concept space is then visualized in a spring embedding graph, which roughly shows the semantic relationships among concepts in a 2-D visual representation. The semantic space of the visualization is used as a medium for users to retrieve the desired documents. In the remainder of this article, we present our algorithmic approach of concept generation and clustering, followed by description of the visualization technique and interactive interface. The paper ends with key conclusions and discussions on future work.

Content

The JAVA applet is available at <http://ella.slis.indiana.edu/~junzhang/dlib/IV.html>. A prototype of this interface has been developed and is available at <http://ella.slis.indiana.edu/~junzhang/dlib/IV.html>. The D-Lib search interface is available at <http://www.dlib.org/Architext/AT-dlib2query.html>.

Type

a
Dushay, N.: Visualizing bibliographic metadata : a virtual (book) spine viewer (2004) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 1197) [ClassicSimilarity], result of:
          0.008118451 = score(doc=1197,freq=32.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 1197, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1197)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

User interfaces for digital information discovery often require users to click around and read a lot of text in order to find the text they want to read-a process that is often frustrating and tedious. This is exacerbated because of the limited amount of text that can be displayed on a computer screen. To improve the user experience of computer mediated information discovery, information visualization techniques are applied to the digital library context, while retaining traditional information organization concepts. In this article, the "virtual (book) spine" and the virtual spine viewer are introduced. The virtual spine viewer is an application which allows users to visually explore large information spaces or collections while also allowing users to hone in on individual resources of interest. The virtual spine viewer introduced here is an alpha prototype, presented to promote discussion and further work. Information discovery changed radically with the introduction of computerized library access catalogs, the World Wide Web and its search engines, and online bookstores. Yet few instances of these technologies provide a user experience analogous to walking among well-organized, well-stocked bookshelves-which many people find useful as well as pleasurable. To put it another way, many of us have heard or voiced complaints about the paucity of "online browsing"-but what does this really mean? In traditional information spaces such as libraries, often we can move freely among the books and other resources. When we walk among organized, labeled bookshelves, we get a sense of the information space-we take in clues, perhaps unconsciously, as to the scope of the collection, the currency of resources, the frequency of their use, etc. We also enjoy unexpected discoveries such as finding an interesting resource because library staff deliberately located it near similar resources, or because it was miss-shelved, or because we saw it on a bookshelf on the way to the water fountain.
When our experience of information discovery is mediated by a computer, we neither move ourselves nor the monitor. We have only the computer's monitor to view, and the keyboard and/or mouse to manipulate what is displayed there. Computer interfaces often reduce our ability to get a sense of the contents of a library: we don't perceive the scope of the library: its breadth, (the quantity of materials/information), its density (how full the shelves are, how thorough the collection is for individual topics), or the general audience for the materials (e.g., whether the materials are appropriate for middle school students, college professors, etc.). Additionally, many computer interfaces for information discovery require users to scroll through long lists, to click numerous navigational links and to read a lot of text to find the exact text they want to read. Text features of resources are almost always presented alphabetically, and the number of items in these alphabetical lists sometimes can be very long. Alphabetical ordering is certainly an improvement over no ordering, but it generally has no bearing on features with an inherent non-alphabetical ordering (e.g., dates of historical events), nor does it necessarily group similar items together. Alphabetical ordering of resources is analogous to one of the most familiar complaints about dictionaries: sometimes you need to know how to spell a word in order to look up its correct spelling in the dictionary. Some have used technology to replicate the appearance of physical libraries, presenting rooms of bookcases and shelves of book spines in virtual 3D environments. This approach presents a problem, as few book spines can be displayed legibly on a monitor screen. This article examines the role of book spines, call numbers, and other traditional organizational and information discovery concepts, and integrates this knowledge with information visualization techniques to show how computers and monitors can meet or exceed similar information discovery methods. The goal is to tap the unique potentials of current information visualization approaches in order to improve information discovery, offer new services, and most important of all, improve user satisfaction. We need to capitalize on what computers do well while bearing in mind their limitations. The intent is to design GUIs to optimize utility and provide a positive experience for the user.

Type

a
Linden, E.J. van der; Vliegen, R.; Wijk, J.J. van: Visual Universal Decimal Classification (2007) 0.00
```
0.0014647468 = product of:
  0.0029294936 = sum of:
    0.0029294936 = product of:
      0.005858987 = sum of:
        0.005858987 = weight(_text_:a in 548) [ClassicSimilarity], result of:
          0.005858987 = score(doc=548,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.11032722 = fieldWeight in 548, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=548)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

UDC aims to be a consistent and complete classification system, that enables practitioners to classify documents swiftly and smoothly. The eventual goal of UDC is to enable the public at large to retrieve documents from large collections of documents that are classified with UDC. The large size of the UDC Master Reference File, MRF with over 66.000 records, makes it difficult to obtain an overview and to understand its structure. Moreover, finding the right classification in MRF turns out to be difficult in practice. Last but not least, retrieval of documents requires insight and understanding of the coding system. Visualization is an effective means to support the development of UDC as well as its use by practitioners. Moreover, visualization offers possibilities to use the classification without use of the coding system as such. MagnaView has developed an application which demonstrates the use of interactive visualization to face these challenges. In our presentation, we discuss these challenges, and we give a demonstration of the way the application helps face these. Examples of visualizations can be found below.

Type

a

Search (6 results, page 1 of 1)

Authors

Themes