Search (73 results, page 1 of 4)

  • theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  • type_ss:"a"
  • year_i:[2000 TO 2010}
  1. Bradford, R.B.: Relationship discovery in large text collections using Latent Semantic Indexing (2006) 0.02
    0.022275176 = product of:
      0.037125293 = sum of:
        0.0066520358 = product of:
          0.033260178 = sum of:
            0.033260178 = weight(_text_:problem in 1163) [ClassicSimilarity], result of:
              0.033260178 = score(doc=1163,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.1875815 = fieldWeight in 1163, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1163)
          0.2 = coord(1/5)
        0.019153563 = weight(_text_:of in 1163) [ClassicSimilarity], result of:
          0.019153563 = score(doc=1163,freq=36.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.2932045 = fieldWeight in 1163, product of:
              6.0 = tf(freq=36.0), with freq of:
                36.0 = termFreq=36.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=1163)
        0.011319693 = product of:
          0.022639386 = sum of:
            0.022639386 = weight(_text_:22 in 1163) [ClassicSimilarity], result of:
              0.022639386 = score(doc=1163,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.15476047 = fieldWeight in 1163, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1163)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
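     The indented score breakdowns attached to each hit are Lucene/Solr "explain" output for the ClassicSimilarity (TF-IDF) ranking model. Each tree can be recomputed from its own printed constants; a minimal sketch in Python for the "problem" term above, assuming Lucene's classic formulas idf = 1 + ln(maxDocs/(docFreq+1)) and tf = sqrt(freq):

       import math

       max_docs, doc_freq, freq = 44218, 1723, 2.0    # constants printed in the tree
       idf = 1 + math.log(max_docs / (doc_freq + 1))  # 4.244485, as shown
       tf = math.sqrt(freq)                           # 1.4142135, as shown
       query_norm, field_norm = 0.04177434, 0.03125

       query_weight = idf * query_norm                # 0.17731056 = queryWeight
       field_weight = tf * idf * field_norm           # 0.1875815  = fieldWeight
       print(query_weight * field_weight)             # 0.033260178, raw term score
       print(query_weight * field_weight / 5)         # 0.0066520358 after coord(1/5)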
    
    Abstract
    This paper addresses the problem of information discovery in large collections of text. For users, one of the key problems in working with such collections is determining where to focus their attention. In selecting documents for examination, users must be able to formulate reasonably precise queries. Queries that are too broad will greatly reduce the efficiency of information discovery efforts by overwhelming the users with peripheral information. In order to formulate efficient queries, a mechanism is needed to automatically alert users regarding potentially interesting information contained within the collection. This paper presents the results of an experiment designed to test one approach to generation of such alerts. The technique of latent semantic indexing (LSI) is used to identify relationships among entities of interest. Entity extraction software is used to pre-process the text of the collection so that the LSI space contains representation vectors for named entities in addition to those for individual terms. In the LSI space, the cosine of the angle between the representation vectors for two entities captures important information regarding the degree of association of those two entities. For appropriate choices of entities, determining the entity pairs with the highest mutual cosine values yields valuable information regarding the contents of the text collection. The test database used for the experiment consists of 150,000 news articles. The proposed approach for alert generation is tested using a counterterrorism analysis example. The approach is shown to have significant potential for aiding users in rapidly focusing on information of potential importance in large text collections. The approach also has value in identifying possible use of aliases.
    Source
    Proceedings of the Fourth Workshop on Link Analysis, Counterterrorism, and Security, SIAM Data Mining Conference, Bethesda, MD, 20-22 April, 2006. [http://www.siam.org/meetings/sdm06/workproceed/Link%20Analysis/15.pdf]
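     The association-discovery step described in the abstract above reduces to linear algebra: entity vectors are rows of a truncated SVD of an entity-by-document matrix, and entity pairs are ranked by the cosine of the angle between their vectors. A minimal sketch, with an invented toy matrix and entity names standing in for the extracted named entities:

       import numpy as np

       entities = ["entity_A", "entity_B", "entity_C", "entity_D"]
       X = np.array([[2, 0, 1, 0, 3],         # rows: entities, columns: documents
                     [1, 0, 2, 0, 2],
                     [0, 3, 0, 2, 0],
                     [0, 2, 0, 3, 1]], dtype=float)

       U, s, Vt = np.linalg.svd(X, full_matrices=False)
       E = U[:, :2] * s[:2]                   # entity vectors in a rank-2 LSI space

       def cosine(a, b):
           return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

       pairs = sorted(((cosine(E[i], E[j]), entities[i], entities[j])
                       for i in range(len(entities))
                       for j in range(i + 1, len(entities))), reverse=True)
       for score, a, b in pairs:              # highest-cosine pairs are the candidate alerts
           print(f"{a} -- {b}: {score:.3f}")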
  2. Sacco, G.M.: Dynamic taxonomies and guided searches (2006) 0.02
    0.01667951 = product of:
      0.041698776 = sum of:
        0.013683967 = weight(_text_:of in 5295) [ClassicSimilarity], result of:
          0.013683967 = score(doc=5295,freq=6.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.20947541 = fieldWeight in 5295, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5295)
        0.02801481 = product of:
          0.05602962 = sum of:
            0.05602962 = weight(_text_:22 in 5295) [ClassicSimilarity], result of:
              0.05602962 = score(doc=5295,freq=4.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.38301262 = fieldWeight in 5295, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5295)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    A new search paradigm, in which the primary user activity is the guided exploration of a complex information space rather than the retrieval of items based on precise specifications, is proposed. The author claims that this paradigm is the norm in most practical applications, and that solutions based on traditional search methods are not effective in this context. He then presents a solution based on dynamic taxonomies, a knowledge management model that effectively guides users to reach their goal while giving them total freedom in exploring the information base. Applications, benefits, and current research are discussed.
    Date
    22. 7.2006 17:56:22
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.792-796
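     The zoom operation at the heart of dynamic taxonomies can be sketched compactly: selecting a concept restricts the document set, and the taxonomy is then reduced to the concepts that still classify at least one document in that focus set, each with an updated count. The documents and concepts below are invented toy data:

       from collections import Counter

       doc_concepts = {                       # document -> concepts classifying it
           "d1": {"art", "painting", "europe"},
           "d2": {"art", "sculpture", "asia"},
           "d3": {"history", "europe"},
       }

       def zoom(selected):
           focus = {d for d, cs in doc_concepts.items() if selected in cs}
           counts = Counter(c for d in focus for c in doc_concepts[d] if c != selected)
           return focus, counts

       focus, reduced = zoom("art")
       print(sorted(focus))                   # ['d1', 'd2']
       print(reduced)                         # surviving concepts with updated counts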
  3. Faaborg, A.; Lagoze, C.: Semantic browsing (2003) 0.01
    0.014244139 = product of:
      0.035610348 = sum of:
        0.015800884 = weight(_text_:of in 1026) [ClassicSimilarity], result of:
          0.015800884 = score(doc=1026,freq=8.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.24188137 = fieldWeight in 1026, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1026)
        0.019809462 = product of:
          0.039618924 = sum of:
            0.039618924 = weight(_text_:22 in 1026) [ClassicSimilarity], result of:
              0.039618924 = score(doc=1026,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.2708308 = fieldWeight in 1026, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1026)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     We have created software applications that allow users to both author and use Semantic Web metadata. To create and use a layer of semantic content on top of the existing Web, we have (1) implemented a user interface that expedites the task of attributing metadata to resources on the Web, and (2) augmented a Web browser to leverage this semantic metadata to provide relevant information and tasks to the user. This project provides a framework for annotating and reorganizing existing files, pages, and sites on the Web that is similar to Vannevar Bush's original concepts of trail blazing and associative indexing.
    Source
    Research and advanced technology for digital libraries : 7th European Conference, proceedings / ECDL 2003, Trondheim, Norway, August 17-22, 2003
  4. Tudhope, D.; Alani, H.; Jones, C.: Augmenting thesaurus relationships : possibilities for retrieval (2001) 0.01
    0.013670124 = product of:
      0.03417531 = sum of:
        0.008315044 = product of:
          0.041575223 = sum of:
            0.041575223 = weight(_text_:problem in 1520) [ClassicSimilarity], result of:
              0.041575223 = score(doc=1520,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.23447686 = fieldWeight in 1520, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1520)
          0.2 = coord(1/5)
        0.025860265 = weight(_text_:of in 1520) [ClassicSimilarity], result of:
          0.025860265 = score(doc=1520,freq=42.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.39587128 = fieldWeight in 1520, product of:
              6.4807405 = tf(freq=42.0), with freq of:
                42.0 = termFreq=42.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1520)
      0.4 = coord(2/5)
    
    Abstract
     This paper discusses issues concerning the augmentation of thesaurus relationships, in light of new application possibilities for retrieval. We first discuss a case study that explored the retrieval potential of an augmented set of thesaurus relationships by specialising standard relationships into richer subtypes, in particular hierarchical geographical containment and the associative relationship. We then locate this work in a broader context by reviewing various attempts to build taxonomies of thesaurus relationships, and conclude by discussing the feasibility of hierarchically augmenting the core set of thesaurus relationships, particularly the associative relationship. We discuss the possibility of enriching the specification and semantics of Related Term (RT) relationships, while maintaining compatibility with traditional thesauri via a limited hierarchical extension of the associative (and hierarchical) relationships. This would be facilitated by distinguishing the type of term from the (sub)type of relationship and explicitly specifying semantic categories for terms following a faceted approach. We first illustrate how hierarchical spatial relationships can be used to provide more flexible retrieval for queries incorporating place names in applications employing online gazetteers and geographical thesauri. We then employ a set of experimental scenarios to investigate key issues affecting use of the associative (RT) thesaurus relationships in semantic distance measures. Previous work has noted the potential of RTs in thesaurus search aids but also the problem of uncontrolled expansion of query term sets. Results presented in this paper suggest the potential for taking account of the hierarchical context of an RT link and specialisations of the RT relationship.
    Source
    Journal of digital information. 1(2001) no.8
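     A semantic distance measure over typed thesaurus links of the kind investigated above can be sketched as a shortest-path search in which each relationship type carries its own traversal cost, so that an RT hop is "longer" than a BT/NT hop. The graph, the terms and the cost values are illustrative assumptions, not the authors' calibration:

       import heapq

       COST = {"BT": 1.0, "NT": 1.0, "RT": 2.0}     # assumed per-type traversal costs
       edges = {
           "moat":          [("castle", "RT")],
           "castle":        [("fortification", "BT"), ("moat", "RT")],
           "fortification": [("castle", "NT"), ("defence", "RT")],
           "defence":       [("fortification", "RT")],
       }

       def semantic_distance(src, dst):
           dist, heap = {src: 0.0}, [(0.0, src)]
           while heap:                               # Dijkstra over the typed graph
               d, term = heapq.heappop(heap)
               if term == dst:
                   return d
               for nbr, rel in edges.get(term, []):
                   nd = d + COST[rel]
                   if nd < dist.get(nbr, float("inf")):
                       dist[nbr] = nd
                       heapq.heappush(heap, (nd, nbr))
           return float("inf")

       print(semantic_distance("moat", "defence"))  # 5.0: moat -RT-> castle -BT-> fortification -RT-> defence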
  5. Hovy, E.: Comparing sets of semantic relations in ontologies (2002) 0.01
    0.013374514 = product of:
      0.033436283 = sum of:
        0.009978054 = product of:
          0.04989027 = sum of:
            0.04989027 = weight(_text_:problem in 2178) [ClassicSimilarity], result of:
              0.04989027 = score(doc=2178,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.28137225 = fieldWeight in 2178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2178)
          0.2 = coord(1/5)
        0.02345823 = weight(_text_:of in 2178) [ClassicSimilarity], result of:
          0.02345823 = score(doc=2178,freq=24.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.3591007 = fieldWeight in 2178, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2178)
      0.4 = coord(2/5)
    
    Abstract
     A set of semantic relations is created every time a domain modeler wants to solve some complex problem computationally. These relations are usually organized into ontologies. But there is little standardization of ontologies today, and almost no discussion of ways of comparing relations, of determining a general approach to creating relations, or of modeling in general. This chapter outlines an approach to establishing a general methodology for comparing and justifying sets of relations (and ontologies in general). It first provides several dozen characteristics of ontologies, organized into three taxonomies of increasingly detailed features, by which many essential characteristics of ontologies can be described. These features enable one to compare ontologies at a general level, without studying every concept they contain. But sometimes it is necessary to make detailed comparisons of content. The chapter then illustrates one method for determining salient points for comparison, using algorithms that semi-automatically identify similarities and differences between ontologies.
    Source
    The semantics of relationships: an interdisciplinary perspective. Eds: Green, R., C.A. Bean u. S.H. Myaeng
  6. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.01
    0.012797958 = product of:
      0.031994894 = sum of:
        0.017845279 = weight(_text_:of in 1428) [ClassicSimilarity], result of:
          0.017845279 = score(doc=1428,freq=20.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.27317715 = fieldWeight in 1428, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.0141496165 = product of:
          0.028299233 = sum of:
            0.028299233 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.028299233 = score(doc=1428,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches were presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.321-334
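     The information-flow computation sketched in the abstract can be approximated with an inclusion-style measure over term vectors: term i "flows to" term j to the degree that i's prominent dimensions are also present in j. This is a toy stand-in for the authors' formulation over an automatically constructed semantic space; the vectors, dimension labels and threshold are invented:

       import numpy as np

       vec = {                                # dims: [sport, ball, team, court, music]
           "basketball": np.array([0.9, 0.8, 0.7, 0.6, 0.0]),
           "tennis":     np.array([0.8, 0.7, 0.1, 0.9, 0.0]),
           "guitar":     np.array([0.0, 0.0, 0.1, 0.0, 0.9]),
       }

       def information_flow(i, j, threshold=0.3):
           """Degree to which the prominent dimensions of i are shared by j."""
           prominent = vec[i] > threshold
           if not prominent.any():
               return 0.0
           return float(np.minimum(vec[i], vec[j])[prominent].sum() / vec[i][prominent].sum())

       print(information_flow("basketball", "tennis"))  # high: shared sport dimensions
       print(information_flow("basketball", "guitar"))  # near zero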
  7. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.01
    0.012797958 = product of:
      0.031994894 = sum of:
        0.017845279 = weight(_text_:of in 56) [ClassicSimilarity], result of:
          0.017845279 = score(doc=56,freq=20.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.27317715 = fieldWeight in 56, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.0141496165 = product of:
          0.028299233 = sum of:
            0.028299233 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.028299233 = score(doc=56,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.462-478
  8. Knorz, G.; Rein, B.: Semantische Suche in einer Hochschulontologie (2005) 0.01
    0.01258021 = product of:
      0.031450525 = sum of:
        0.011641062 = product of:
          0.05820531 = sum of:
            0.05820531 = weight(_text_:problem in 1852) [ClassicSimilarity], result of:
              0.05820531 = score(doc=1852,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.3282676 = fieldWeight in 1852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1852)
          0.2 = coord(1/5)
        0.019809462 = product of:
          0.039618924 = sum of:
            0.039618924 = weight(_text_:22 in 1852) [ClassicSimilarity], result of:
              0.039618924 = score(doc=1852,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.2708308 = fieldWeight in 1852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1852)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Ontologies are deployed so that, through semantic grounding, document retrieval in particular can rest on a fundamentally better basis than the current state of the art provides. We present an ontology developed and deployed at the FH Darmstadt that is intended both to cover the subject area of higher education broadly and to describe it in differentiated semantic detail. The problem of semantic search is that it must be as easy for information seekers to use as common search engines, while at the same time delivering high-quality results on the basis of the elaborate information model. We describe the facilities the K-Infinity software provides and the concept by which these facilities are applied to a semantic search for documents and other information units (people, events, projects, etc.).
    Date
    11. 2.2011 18:22:58
  9. Zazo, A.F.; Figuerola, C.G.; Berrocal, J.L.A.; Rodriguez, E.: Reformulation of queries using similarity-thesauri (2005) 0.01
    0.0116526475 = product of:
      0.029131617 = sum of:
        0.009978054 = product of:
          0.04989027 = sum of:
            0.04989027 = weight(_text_:problem in 1043) [ClassicSimilarity], result of:
              0.04989027 = score(doc=1043,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.28137225 = fieldWeight in 1043, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1043)
          0.2 = coord(1/5)
        0.019153563 = weight(_text_:of in 1043) [ClassicSimilarity], result of:
          0.019153563 = score(doc=1043,freq=16.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.2932045 = fieldWeight in 1043, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1043)
      0.4 = coord(2/5)
    
    Abstract
     One of the major problems in information retrieval is the formulation of queries on the part of the user. This entails specifying a set of words or terms that express the user's information need. However, it is well-known that two people can assign different terms to refer to the same concepts. The techniques that attempt to reduce this problem as much as possible generally start from a first search, and then study how the initial query can be modified to obtain better results. In general, the construction of the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Depending on the technique used to formulate the new query, several strategies are distinguished. These strategies are based on the idea that if two terms are similar (with respect to any criterion), the documents in which both terms appear frequently will also be related. The technique we used in this study is known as query expansion using similarity thesauri.
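     The expansion step named at the end of the abstract can be sketched directly: term-term similarities are computed from a term-by-document matrix (the similarity thesaurus), and the query is expanded with its nearest neighbours. The vocabulary and matrix below are toy assumptions:

       import numpy as np

       terms = ["car", "automobile", "engine", "recipe"]
       T = np.array([[3, 2, 0, 1],            # rows: terms, columns: documents
                     [2, 3, 0, 0],
                     [1, 2, 0, 0],
                     [0, 0, 4, 0]], dtype=float)

       Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
       sim = Tn @ Tn.T                        # cosine similarity thesaurus

       def expand(query_terms, k=2):
           idx = [terms.index(t) for t in query_terms]
           scores = sim[idx].sum(axis=0)      # similarity of every term to the query
           ranked = np.argsort(-scores)
           extra = [terms[i] for i in ranked if terms[i] not in query_terms][:k]
           return query_terms + extra

       print(expand(["car"]))                 # ['car', 'automobile', 'engine'] on this toy data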
  10. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.01
    0.011483461 = product of:
      0.028708652 = sum of:
        0.011729115 = weight(_text_:of in 2419) [ClassicSimilarity], result of:
          0.011729115 = score(doc=2419,freq=6.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.17955035 = fieldWeight in 2419, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.016979538 = product of:
          0.033959076 = sum of:
            0.033959076 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.033959076 = score(doc=2419,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  11. Niemi, T.; Jämsen, J.: ¬A query language for discovering semantic associations, part I : approach and formal definition of query primitives (2007) 0.01
    0.011145428 = product of:
      0.02786357 = sum of:
        0.008315044 = product of:
          0.041575223 = sum of:
            0.041575223 = weight(_text_:problem in 591) [ClassicSimilarity], result of:
              0.041575223 = score(doc=591,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.23447686 = fieldWeight in 591, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=591)
          0.2 = coord(1/5)
        0.019548526 = weight(_text_:of in 591) [ClassicSimilarity], result of:
          0.019548526 = score(doc=591,freq=24.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.2992506 = fieldWeight in 591, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=591)
      0.4 = coord(2/5)
    
    Abstract
     In contemporary query languages, the user is responsible for navigation among semantically related data. Because of the huge amount of data and the complex structural relationships among data in modern applications, it is unrealistic to suppose that the user could know completely the content and structure of the available information. There are several query languages whose purpose is to facilitate navigation in unknown structures of databases. However, the background assumption of these languages is that the user knows how data are related to each other semantically in the structure at hand. So far, little attention has been paid to how unknown semantic associations among available data can be discovered. We address this problem in this article. A semantic association between two entities can be constructed if a sequence of relationships expressed explicitly in a database can be found that connects these entities to each other. This sequence may contain several other entities through which the original entities are connected to each other indirectly. We introduce an expressive and declarative query language for discovering semantic associations. Our query language is able, for example, to discover semantic associations between entities for which only some of the characteristics are known. Further, it integrates the manipulation of semantic associations with the manipulation of documents that may contain information on entities in semantic associations.
    Content
    Part II: Journal of the American Society for Information Science and Technology. 58(2007) no.11, S.1686-1700.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.11, S.1559-1568
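     Stripped of its query-language syntax, the discovery primitive described above is a path search: a semantic association between two entities is a chain of explicitly stored relationships connecting them, possibly through intermediate entities. A minimal breadth-first sketch over an invented mini-database:

       from collections import deque

       relations = [                          # (subject, relationship, object)
           ("alice", "works_for", "acme"),
           ("acme", "located_in", "oslo"),
           ("bob", "lives_in", "oslo"),
       ]
       adj = {}
       for s, r, o in relations:              # traverse relationships in both directions
           adj.setdefault(s, []).append((r, o))
           adj.setdefault(o, []).append((f"inv({r})", s))

       def association(a, b):
           seen, queue = {a}, deque([(a, [])])
           while queue:
               node, path = queue.popleft()
               if node == b:
                   return path
               for rel, nxt in adj.get(node, []):
                   if nxt not in seen:
                       seen.add(nxt)
                       queue.append((nxt, path + [(node, rel, nxt)]))
           return None

       print(association("alice", "bob"))
       # [('alice', 'works_for', 'acme'), ('acme', 'located_in', 'oslo'), ('oslo', 'inv(lives_in)', 'bob')]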
  12. Tudhope, D.; Binding, C.; Blocks, D.; Cunliffe, D.: Compound descriptors in context : a matching function for classifications and thesauri (2002) 0.01
    0.010464129 = product of:
      0.026160322 = sum of:
        0.008315044 = product of:
          0.041575223 = sum of:
            0.041575223 = weight(_text_:problem in 3179) [ClassicSimilarity], result of:
              0.041575223 = score(doc=3179,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.23447686 = fieldWeight in 3179, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3179)
          0.2 = coord(1/5)
        0.017845279 = weight(_text_:of in 3179) [ClassicSimilarity], result of:
          0.017845279 = score(doc=3179,freq=20.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.27317715 = fieldWeight in 3179, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3179)
      0.4 = coord(2/5)
    
    Abstract
     There are many advantages for Digital Libraries in indexing with classifications or thesauri, but a current disincentive is the lack of flexible retrieval tools that deal with compound descriptors. This paper discusses a matching function for compound descriptors, or multi-concept subject headings, that does not rely on exact matching but incorporates term expansion via thesaurus semantic relationships to produce ranked results that take account of missing and partially matching terms. The matching function is based on a measure of semantic closeness between terms, which has the potential to help with recall problems. The work reported is part of the ongoing FACET project in collaboration with the National Museum of Science and Industry and its collections database. The architecture of the prototype system and its interface are outlined. The matching problem for compound descriptors is reviewed and the FACET implementation described. Results are discussed from scenarios using the faceted Getty Art and Architecture Thesaurus. We argue that automatic traversal of thesaurus relationships can augment the user's browsing possibilities. The techniques can be applied both to unstructured multi-concept subject headings and potentially to more syntactically structured strings. The notion of a focus term is used by the matching function to model AAT modified descriptors (noun phrases). The relevance of the approach to precoordinated indexing and matching faceted strings is discussed.
    Source
    Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries : JCDL 2002 ; July 14 - 18, 2002, Portland, Oregon, USA. Ed. by Gary Marchionini
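     The matching behaviour described above, where exact matching is not required and partially matching terms still contribute, can be approximated by pairing each query term with its semantically closest term in the indexed heading and averaging. The closeness table is an invented stand-in for the thesaurus-derived measure:

       closeness = {                          # semantic closeness in [0, 1], invented values
           ("steel", "iron"): 0.7,
           ("bridge", "viaduct"): 0.8,
       }

       def close(a, b):
           if a == b:
               return 1.0
           return closeness.get((a, b)) or closeness.get((b, a)) or 0.0

       def match(query_terms, heading_terms):
           # best achievable closeness per query term; missing terms contribute 0 rather than vetoing
           return sum(max(close(q, h) for h in heading_terms) for q in query_terms) / len(query_terms)

       print(match(["steel", "bridge"], ["iron", "viaduct"]))  # 0.75: ranked partial match
       print(match(["steel", "bridge"], ["recipe"]))           # 0.0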
  13. Lehtokangas, R.; Järvelin, K.: Consistency of textual expression in newspaper articles : an argument for semantically based query expansion (2001) 0.01
    0.010464129 = product of:
      0.026160322 = sum of:
        0.008315044 = product of:
          0.041575223 = sum of:
            0.041575223 = weight(_text_:problem in 4485) [ClassicSimilarity], result of:
              0.041575223 = score(doc=4485,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.23447686 = fieldWeight in 4485, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4485)
          0.2 = coord(1/5)
        0.017845279 = weight(_text_:of in 4485) [ClassicSimilarity], result of:
          0.017845279 = score(doc=4485,freq=20.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.27317715 = fieldWeight in 4485, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4485)
      0.4 = coord(2/5)
    
    Abstract
    This article investigates how consistent different newspapers are in their choice of words when writing about the same news events. News articles on the same news events were taken from three Finnish newspapers and compared in regard to their central concepts and words representing the concepts in the news texts. Consistency figures were calculated for each set of three articles (the total number of sets was sixty). Inconsistency in words and concepts was found between news articles from different newspapers. The mean value of consistency calculated on the basis of words was 65 per cent; this however depended on the article length. For short news wires consistency was 83 per cent while for long articles it was only 47 per cent. At the concept level, consistency was considerably higher, ranging from 92 per cent to 97 per cent between short and long articles. The articles also represented three categories of topic (event, process and opinion). Statistically significant differences in consistency were found in regard to length but not in regard to the categories of topic. We argue that the expression inconsistency is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free-text sources.
    Source
    Journal of documentation. 57(2001) no.4, S.535-548
  14. Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.01
    0.008635346 = product of:
      0.021588365 = sum of:
        0.0072010416 = product of:
          0.036005206 = sum of:
            0.036005206 = weight(_text_:problem in 1211) [ClassicSimilarity], result of:
              0.036005206 = score(doc=1211,freq=6.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.20306295 = fieldWeight in 1211, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=1211)
          0.2 = coord(1/5)
        0.0143873235 = weight(_text_:of in 1211) [ClassicSimilarity], result of:
          0.0143873235 = score(doc=1211,freq=52.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.22024246 = fieldWeight in 1211, product of:
              7.2111025 = tf(freq=52.0), with freq of:
                52.0 = termFreq=52.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1211)
      0.4 = coord(2/5)
    
    Abstract
    In this article we present a method for retrieving documents from a digital library through a visual interface based on automatically generated concepts. We used a vocabulary generation algorithm to generate a set of concepts for the digital library and a technique called the max-min distance technique to cluster them. Additionally, the concepts were visualized in a spring embedding graph layout to depict the semantic relationship among them. The resulting graph layout serves as an aid to users for retrieving documents. An online archive containing the contents of D-Lib Magazine from July 1995 to May 2002 was used to test the utility of an implemented retrieval and visualization system. We believe that the method developed and tested can be applied to many different domains to help users get a better understanding of online document collections and to minimize users' cognitive load during execution of search tasks. Over the past few years, the volume of information available through the World Wide Web has been expanding exponentially. Never has so much information been so readily available and shared among so many people. Unfortunately, the unstructured nature and huge volume of information accessible over networks have made it hard for users to sift through and find relevant information. To deal with this problem, information retrieval (IR) techniques have gained more intensive attention from both industrial and academic researchers. Numerous IR techniques have been developed to help deal with the information overload problem. These techniques concentrate on mathematical models and algorithms for retrieval. Popular IR models such as the Boolean model, the vector-space model, the probabilistic model and their variants are well established.
     From the user's perspective, however, it is still difficult to use current information retrieval systems. Users frequently have problems expressing their information needs and translating those needs into queries. This is partly due to the fact that information needs cannot be expressed appropriately in systems terms. It is not unusual for users to input search terms that are different from the index terms information systems use. Various methods have been proposed to help users choose search terms and articulate queries. One widely used approach is to incorporate into the information system a thesaurus-like component that represents both the important concepts in a particular subject area and the semantic relationships among those concepts. Unfortunately, the development and use of thesauri is not without its own problems. The thesaurus employed in a specific information system has often been developed for a general subject area and needs significant enhancement to be tailored to the information system where it is to be used. This thesaurus development process, if done manually, is both time consuming and labor intensive. Usage of a thesaurus in searching is complex and may raise barriers for the user. For illustration purposes, let us consider two scenarios of thesaurus usage. In the first scenario the user inputs a search term and the thesaurus then displays a matching set of related terms. Without an overview of the thesaurus - and without the ability to see the matching terms in the context of other terms - it may be difficult to assess the quality of the related terms in order to select the correct term. In the second scenario the user browses the whole thesaurus, which is organized as an alphabetically ordered list. The problem with this approach is that the list may be long, nor does it show users the global semantic relationship among all the listed terms.
     Nevertheless, because thesaurus use has been shown to improve retrieval, for our method we integrate functions in the search interface that permit users to explore built-in search vocabularies to improve retrieval from digital libraries. Our method automatically generates the terms and their semantic relationships representing relevant topics covered in a digital library. We call these generated terms the "concepts", and the generated terms and their semantic relationships we call the "concept space". Additionally, we used a visualization technique to display the concept space and allow users to interact with this space. The automatically generated term set is considered to be more representative of the subject area of a corpus than an "externally" imposed thesaurus, and our method has the potential of saving a significant amount of time and labor for those who have been manually creating thesauri as well. Information visualization is an emerging discipline that has developed very quickly over the last decade. With growing volumes of documents and associated complexities, information visualization has become increasingly important. Researchers have found information visualization to be an effective way to use and understand information while minimizing a user's cognitive load. Our work was based on an algorithmic approach of concept discovery and association. Concepts are discovered using an algorithm based on an automated thesaurus generation procedure. Subsequently, similarities among terms are computed using the cosine measure, and the associations among terms are established using a method known as max-min distance clustering. The concept space is then visualized in a spring embedding graph, which roughly shows the semantic relationships among concepts in a 2-D visual representation. The semantic space of the visualization is used as a medium for users to retrieve the desired documents. In the remainder of this article, we present our algorithmic approach of concept generation and clustering, followed by description of the visualization technique and interactive interface. The paper ends with key conclusions and discussions on future work.
    Content
    The JAVA applet is available at <http://ella.slis.indiana.edu/~junzhang/dlib/IV.html>. A prototype of this interface has been developed and is available at <http://ella.slis.indiana.edu/~junzhang/dlib/IV.html>. The D-Lib search interface is available at <http://www.dlib.org/Architext/AT-dlib2query.html>.
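     Two steps named in the abstract are easy to sketch: cosine similarity among term vectors, and max-min distance clustering, which repeatedly picks as the next cluster centre the term farthest from all centres chosen so far. The term vectors below are invented toy data:

       import numpy as np

       terms = ["library", "archive", "metadata", "network", "protocol"]
       V = np.array([[0.9, 0.1, 0.2],
                     [0.8, 0.2, 0.1],
                     [0.6, 0.5, 0.2],
                     [0.1, 0.9, 0.7],
                     [0.2, 0.8, 0.8]])
       Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
       dist = 1.0 - Vn @ Vn.T                 # cosine distance matrix

       def max_min_centres(k):
           centres = [0]                      # seed with an arbitrary first centre
           while len(centres) < k:
               nearest = dist[:, centres].min(axis=1)   # distance to the closest centre
               centres.append(int(nearest.argmax()))    # pick the farthest term next
           return [terms[i] for i in centres]

       print(max_min_centres(2))              # ['library', 'network'] on this toy data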
  15. Jun, W.: ¬A knowledge network constructed by integrating classification, thesaurus and metadata in a digital library (2003) 0.01
    0.00807826 = product of:
      0.02019565 = sum of:
        0.0066520358 = product of:
          0.033260178 = sum of:
            0.033260178 = weight(_text_:problem in 1254) [ClassicSimilarity], result of:
              0.033260178 = score(doc=1254,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.1875815 = fieldWeight in 1254, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1254)
          0.2 = coord(1/5)
        0.013543614 = weight(_text_:of in 1254) [ClassicSimilarity], result of:
          0.013543614 = score(doc=1254,freq=18.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.20732687 = fieldWeight in 1254, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=1254)
      0.4 = coord(2/5)
    
    Abstract
     Knowledge management in digital libraries is a universal problem. Keyword-based searching is applied everywhere, whether the resources are indexed databases or full-text Web pages. In keyword matching, the valuable content description and indexing of the metadata, such as the subject descriptors and the classification notations, are merely treated as common keywords to be matched with the user query. Without the support of vocabulary control tools, such as classification systems and thesauri, the intelligent labor of content analysis, description and indexing in metadata production is seriously wasted. New retrieval paradigms are needed to exploit the potential of the metadata resources. Could classification and thesauri, which contain the condensed intelligence of generations of librarians, be used in a digital library to organize the networked information, especially metadata, to facilitate their usability and change the digital library into a knowledge management environment? To examine that question, we designed and implemented a new paradigm that incorporates a classification system, a thesaurus and metadata. The classification and the thesaurus are merged into a concept network, and the metadata are distributed into the nodes of the concept network according to their subjects. The abstract concept node instantiated with the related metadata records becomes a knowledge node. A coherent and consistent knowledge network is thus formed. It is not only a framework for resource organization but also a structure for knowledge navigation, retrieval and learning. We have built an experimental system based on the Chinese Classification and Thesaurus, which is the most comprehensive and authoritative in China, and we have incorporated more than 5000 bibliographic records in the computing domain from the Peking University Library. The result is encouraging. In this article, we review the tools, the architecture and the implementation of our experimental system, which is called Vision.
    Source
    Bulletin of the American Society for Information Science. 29(2003) no.2, S.24-28
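     The construction described above, merging a classification and a thesaurus into a concept network and distributing metadata records onto its nodes, can be sketched as a small data structure. The concepts and records are invented:

       concept_net = {                        # merged classification/thesaurus nodes
           "computing":             {"broader": [], "records": []},
           "information retrieval": {"broader": ["computing"], "records": []},
       }
       metadata = [
           {"title": "A toy record on IR", "subject": "information retrieval"},
           {"title": "A general computing survey", "subject": "computing"},
       ]

       for record in metadata:                # instantiate concept nodes as knowledge nodes
           concept_net[record["subject"]]["records"].append(record)

       node = concept_net["information retrieval"]
       print([r["title"] for r in node["records"]], "broader:", node["broader"])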
  16. Boyack, K.W.; Wylie, B.N.; Davidson, G.S.: Information Visualization, Human-Computer Interaction, and Cognitive Psychology : Domain Visualizations (2002) 0.01
    0.008004232 = product of:
      0.04002116 = sum of:
        0.04002116 = product of:
          0.08004232 = sum of:
            0.08004232 = weight(_text_:22 in 1352) [ClassicSimilarity], result of:
              0.08004232 = score(doc=1352,freq=4.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.54716086 = fieldWeight in 1352, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1352)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 2.2003 17:25:39
    22. 2.2003 18:17:40
  17. Khan, M.S.; Khor, S.: Enhanced Web document retrieval using automatic query expansion (2004) 0.01
    0.0050675566 = product of:
      0.025337784 = sum of:
        0.025337784 = weight(_text_:of in 2091) [ClassicSimilarity], result of:
          0.025337784 = score(doc=2091,freq=28.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.38787308 = fieldWeight in 2091, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2091)
      0.2 = coord(1/5)
    
    Abstract
     The ever-growing popularity of the Internet as a source of information, coupled with the accompanying growth in the number of documents made available through the World Wide Web, is leading to an increasing demand for more efficient and accurate information retrieval tools. Numerous techniques have been proposed and tried for improving the effectiveness of searching the World Wide Web for documents relevant to a given topic of interest. The specification of appropriate keywords and phrases by the user is crucial for the successful execution of a query as measured by the relevance of documents retrieved. Users' lack of knowledge of the search topic and their changing information needs often make it difficult for them to find suitable keywords or phrases for a query. This results in searches that fail to cover all likely aspects of the topic of interest. We describe a scheme that attempts to remedy this situation by automatically expanding the user query through the analysis of initially retrieved documents. Experimental results to demonstrate the effectiveness of the query expansion scheme are presented.
    Source
    Journal of the American Society for Information Science and technology. 55(2004) no.1, S.29-40
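     The scheme described above is a form of pseudo-relevance feedback: terms that are frequent in the initially retrieved top documents, and not already in the query, are appended to it. A minimal sketch with invented documents and a trivial stop list:

       from collections import Counter

       top_docs = [                           # initially retrieved documents (invented)
           "query expansion improves web retrieval effectiveness",
           "automatic expansion of web queries using retrieved documents",
       ]
       query = ["web", "retrieval"]
       stop = {"of", "using", "the", "and"}

       counts = Counter(w for doc in top_docs for w in doc.split()
                        if w not in stop and w not in query)
       expanded = query + [w for w, _ in counts.most_common(2)]
       print(expanded)                        # ['web', 'retrieval', 'expansion', ...]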
  18. Sanderson, M.; Lawrie, D.: Building, testing, and applying concept hierarchies (2000) 0.00
    0.00488322 = product of:
      0.024416098 = sum of:
        0.024416098 = weight(_text_:of in 37) [ClassicSimilarity], result of:
          0.024416098 = score(doc=37,freq=26.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.37376386 = fieldWeight in 37, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=37)
      0.2 = coord(1/5)
    
    Abstract
    A means of automatically deriving a hierarchical organization of concepts from a set of documents without use of training data or standard clustering techniques is presented. Using a process that extracts salient words and phrases from the documents, these terms are organized hierarchically using a type of co-occurrence known as subsumption. The resulting structure is displayed as a series of hierarchical menus. When generated from a set of retrieved documents, a user browsing the menus gains an overview of their content in a manner distinct from existing techniques. The methods used to build the structure are simple and appear to be effective. The formation and presentation of the hierarchy is described along with a study of some of its properties, including a preliminary experiment, which indicates that users may find the hierarchy a more efficient means of locating relevant documents than the classic method of scanning a ranked document list
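     The subsumption test relied on above can be stated in two conditional probabilities: x subsumes y when documents containing y (almost) always also contain x, but not the reverse. The 0.8 threshold and the toy document sets below are assumptions for illustration:

       docs_with = {                          # term -> set of documents containing it
           "animal": {1, 2, 3, 4, 5, 6},
           "cat":    {1, 2, 3},
           "dog":    {4, 5},
       }

       def subsumes(x, y, threshold=0.8):
           both = docs_with[x] & docs_with[y]
           p_x_given_y = len(both) / len(docs_with[y])
           p_y_given_x = len(both) / len(docs_with[x])
           return p_x_given_y >= threshold and p_y_given_x < 1.0

       for parent in docs_with:
           for child in docs_with:
               if parent != child and subsumes(parent, child):
                   print(parent, "->", child)  # animal -> cat, animal -> dog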
  19. Hoang, H.H.; Tjoa, A.M.: ¬The state of the art of ontology-based query systems : a comparison of existing approaches (2006) 0.00
    0.0047777384 = product of:
      0.023888692 = sum of:
        0.023888692 = weight(_text_:of in 792) [ClassicSimilarity], result of:
          0.023888692 = score(doc=792,freq=14.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.36569026 = fieldWeight in 792, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=792)
      0.2 = coord(1/5)
    
    Abstract
     Based on an in-depth analysis of existing approaches to building ontology-based query systems, we discuss and compare the methods and approaches used in current query systems that employ ontologies or Semantic Web techniques. This paper identifies various relevant research directions in ontology-based querying. Based on the results of our investigation, we summarise the state of the art in ontology-based query/search systems and name areas for further research.
  20. Shiri, A.A.; Revie, C.: End-user interaction with thesauri : an evaluation of cognitive overlap in search term selection (2004) 0.00
    0.0044919094 = product of:
      0.022459546 = sum of:
        0.022459546 = weight(_text_:of in 2658) [ClassicSimilarity], result of:
          0.022459546 = score(doc=2658,freq=22.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.34381276 = fieldWeight in 2658, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2658)
      0.2 = coord(1/5)
    
    Abstract
     The use of thesaurus-enhanced search tools is on the increase. This paper provides an insight into end-users' interaction with and perceptions of such tools. In particular, the overlap between users' initial query formulation and thesaurus structures is investigated. This investigation involved the performance of genuine search tasks on the CAB Abstracts database by academic users in the domain of veterinary medicine. The perception of these users regarding the nature and usefulness of the terms suggested from the thesaurus during the search interaction is reported. The results indicated that around 80% of terms entered were matched either exactly or partially to thesaurus terms. Users found over 90% of the terms suggested to be close to their search topics, and where terms were selected they indicated that around 50% were to support a 'narrowing down' activity. These findings have implications for the design of thesaurus-enhanced interfaces.
    Source
    Knowledge organization and the global information society: Proceedings of the 8th International ISKO Conference 13-16 July 2004, London, UK. Ed.: I.C. McIlwaine

Languages

  • e 70
  • d 3