Search (24 results, page 1 of 2)

  • theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  • type_ss:"a"
  • year_i:[2000 TO 2010}
  1. Jun, W.: A knowledge network constructed by integrating classification, thesaurus and metadata in a digital library (2003) 0.03
    0.030942764 = product of:
      0.10829967 = sum of:
        0.016974261 = weight(_text_:subject in 1254) [ClassicSimilarity], result of:
          0.016974261 = score(doc=1254,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.15806471 = fieldWeight in 1254, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03125 = fieldNorm(doc=1254)
        0.035607297 = weight(_text_:classification in 1254) [ClassicSimilarity], result of:
          0.035607297 = score(doc=1254,freq=14.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.37237754 = fieldWeight in 1254, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03125 = fieldNorm(doc=1254)
        0.020110816 = weight(_text_:bibliographic in 1254) [ClassicSimilarity], result of:
          0.020110816 = score(doc=1254,freq=2.0), product of:
            0.11688946 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03002521 = queryNorm
            0.17204987 = fieldWeight in 1254, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03125 = fieldNorm(doc=1254)
        0.035607297 = weight(_text_:classification in 1254) [ClassicSimilarity], result of:
          0.035607297 = score(doc=1254,freq=14.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.37237754 = fieldWeight in 1254, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03125 = fieldNorm(doc=1254)
      0.2857143 = coord(4/14)
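The explain tree above follows Lucene's ClassicSimilarity (TF-IDF) scoring, so its numbers can be recomputed by hand. The Python sketch below does that for doc 1254, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). queryNorm and fieldNorm are copied from the output rather than derived, because queryNorm depends on all 14 query clauses, which are not shown.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def clause_score(freq: float, doc_freq: int, max_docs: int,
                 query_norm: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight, product of"
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.03002521  # copied from the explain output; depends on all 14 clauses
FIELD_NORM = 0.03125     # fieldNorm(doc=1254), also copied

# (freq, docFreq) for each matching clause of doc 1254, in explain order;
# "classification" matches two separate query clauses, so it appears twice.
matching = [(2.0, 3361), (14.0, 4974), (2.0, 2449), (14.0, 4974)]
scores = [clause_score(f, df, MAX_DOCS, QUERY_NORM, FIELD_NORM) for f, df in matching]
coord = len(matching) / 14  # coord(4/14): matched clauses / total clauses
final = sum(scores) * coord
print([round(s, 6) for s in scores], round(final, 6))
```

Lucene computes these weights in single-precision floats, so a double-precision recomputation can differ from the printed values in the last digits.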
    
    Abstract
    Knowledge management in digital libraries is a universal problem. Keyword-based searching is applied everywhere, regardless of whether the resources are indexed databases or full-text Web pages. In keyword matching, the valuable content description and indexing of the metadata, such as the subject descriptors and the classification notations, are merely treated as common keywords to be matched with the user query. Without the support of vocabulary control tools, such as classification systems and thesauri, the intelligent labor of content analysis, description and indexing in metadata production is seriously wasted. New retrieval paradigms are needed to exploit the potential of the metadata resources. Could classification systems and thesauri, which contain the condensed intelligence of generations of librarians, be used in a digital library to organize the networked information, especially metadata, to improve its usability and turn the digital library into a knowledge management environment? To examine that question, we designed and implemented a new paradigm that incorporates a classification system, a thesaurus and metadata. The classification and the thesaurus are merged into a concept network, and the metadata are distributed into the nodes of the concept network according to their subjects. The abstract concept node instantiated with the related metadata records becomes a knowledge node. A coherent and consistent knowledge network is thus formed. It is not only a framework for resource organization but also a structure for knowledge navigation, retrieval and learning. We have built an experimental system based on the Chinese Classification and Thesaurus, which is the most comprehensive and authoritative in China, and we have incorporated more than 5000 bibliographic records in the computing domain from the Peking University Library. The result is encouraging. 
In this article, we review the tools, the architecture and the implementation of our experimental system, which is called Vision.
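The abstract describes merging a classification and a thesaurus into one concept network and distributing metadata records to its nodes by subject. A minimal Python sketch of that data flow follows. The class and function names, the toy notations `C`, `C1`, `C2`, and the rule that a record attaches to whichever node its subject descriptor is mapped to are all assumptions for illustration, not the design of the Vision system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class KnowledgeNode:
    notation: str                 # class notation from the classification
    caption: str                  # class caption
    parent: Optional[str]         # broader class; None for a top-level class
    descriptors: List[str] = field(default_factory=list)  # merged thesaurus terms
    records: List[str] = field(default_factory=list)      # attached metadata record ids


def build_knowledge_network(
    classes: Dict[str, Tuple[str, Optional[str]]],
    thesaurus: Dict[str, str],
    records: List[Tuple[str, List[str]]],
) -> Dict[str, KnowledgeNode]:
    # 1. One concept node per class, keeping the class hierarchy.
    nodes = {n: KnowledgeNode(n, caption, parent) for n, (caption, parent) in classes.items()}
    # 2. Merge the thesaurus: each descriptor joins the class it is mapped to.
    lookup: Dict[str, str] = {}
    for descriptor, notation in thesaurus.items():
        nodes[notation].descriptors.append(descriptor)
        lookup[descriptor.casefold()] = notation
    # 3. Distribute metadata records to nodes according to their subject terms.
    for record_id, subjects in records:
        for subject in subjects:
            notation = lookup.get(subject.casefold())
            if notation is not None and record_id not in nodes[notation].records:
                nodes[notation].records.append(record_id)
    return nodes


def broader_path(nodes: Dict[str, KnowledgeNode], notation: str) -> List[str]:
    # Navigate upward from a knowledge node to its top-level class.
    path: List[str] = []
    current: Optional[str] = notation
    while current is not None:
        path.append(current)
        current = nodes[current].parent
    return path


classes = {"C": ("Computing", None), "C1": ("Databases", "C"), "C2": ("Networks", "C")}
thesaurus = {"Relational databases": "C1", "SQL": "C1", "Routing": "C2"}
records = [("r1", ["SQL"]), ("r2", ["routing", "sql"]), ("r3", ["Poetry"])]
net = build_knowledge_network(classes, thesaurus, records)
print(net["C1"].records, broader_path(net, "C1"))  # ['r1', 'r2'] ['C1', 'C']
```

A record whose subjects match no descriptor (here `r3`) is simply left unattached; a real system would presumably route it to a review queue.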
  2. Prieto-Díaz, R.: A faceted approach to building ontologies (2002) 0.01
    0.014107771 = product of:
      0.065836266 = sum of:
        0.02546139 = weight(_text_:subject in 2259) [ClassicSimilarity], result of:
          0.02546139 = score(doc=2259,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.23709705 = fieldWeight in 2259, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.046875 = fieldNorm(doc=2259)
        0.02018744 = weight(_text_:classification in 2259) [ClassicSimilarity], result of:
          0.02018744 = score(doc=2259,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 2259, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2259)
        0.02018744 = weight(_text_:classification in 2259) [ClassicSimilarity], result of:
          0.02018744 = score(doc=2259,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 2259, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2259)
      0.21428572 = coord(3/14)
    
    Abstract
    An ontology is "an explicit conceptualization of a domain of discourse, and thus provides a shared and common understanding of the domain." We have been producing ontologies for millennia to understand and explain our rationale and environment. From Plato's philosophical framework to modern day classification systems, ontologies are, in most cases, the product of extensive analysis and categorization. Only recently has the process of building ontologies become a research topic of interest. Today, ontologies are built very much ad hoc. A terminology is first developed providing a controlled vocabulary for the subject area or domain of interest, then it is organized into a taxonomy where key concepts are identified, and finally these concepts are defined and related to create an ontology. The intent of this paper is to show that domain analysis methods can be used for building ontologies. Domain analysis aims at generic models that represent groups of similar systems within an application domain. In this sense, it deals with categorization of common objects and operations, with clear, unambiguous definitions of them and with defining their relationships.
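The three-step route the abstract describes (a controlled terminology, then a taxonomy, then defined relationships) can be sketched with a small faceted scheme. In this Python sketch the facet names `function` and `object`, the vocabulary, and the grouping rule in `relate` are illustrative assumptions, not the paper's actual domain-analysis method.

```python
from typing import Dict, List, Tuple

# Step 1, terminology: a controlled vocabulary per facet that maps variant
# terms to a preferred term. Facet names and terms are hypothetical.
VOCABULARY: Dict[str, Dict[str, str]] = {
    "function": {"add": "add", "append": "add", "insert": "add",
                 "delete": "remove", "remove": "remove"},
    "object": {"file": "file", "document": "file",
               "record": "record", "row": "record"},
}
FACETS: List[str] = list(VOCABULARY)


def normalize(facet: str, term: str) -> str:
    try:
        return VOCABULARY[facet][term.lower()]
    except KeyError:
        raise ValueError(f"{term!r} is not in the {facet!r} vocabulary") from None


def classify(description: Dict[str, str]) -> Tuple[str, ...]:
    # Step 2, taxonomy: one preferred term per facet, in fixed facet order.
    return tuple(normalize(facet, description[facet]) for facet in FACETS)


def relate(items: Dict[str, Tuple[str, ...]], facet: str) -> Dict[str, List[str]]:
    # Step 3, relationships: group items that share a value in one facet.
    position = FACETS.index(facet)
    groups: Dict[str, List[str]] = {}
    for name, descriptor in items.items():
        groups.setdefault(descriptor[position], []).append(name)
    return groups


items = {
    "append_line": classify({"function": "append", "object": "document"}),
    "drop_row": classify({"function": "delete", "object": "row"}),
    "insert_row": classify({"function": "Insert", "object": "record"}),
}
print(relate(items, "object"))  # {'file': ['append_line'], 'record': ['drop_row', 'insert_row']}
```

Rejecting unknown terms in `normalize` is a deliberate choice here: it keeps the vocabulary controlled instead of letting free-text variants leak into the taxonomy.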
  3. Morato, J.; Llorens, J.; Genova, G.; Moreiro, J.A.: Experiments in discourse analysis impact on information classification and retrieval algorithms (2003) 0.01
    0.010747734 = product of:
      0.07523414 = sum of:
        0.03761707 = weight(_text_:classification in 1083) [ClassicSimilarity], result of:
          0.03761707 = score(doc=1083,freq=10.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.39339557 = fieldWeight in 1083, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1083)
        0.03761707 = weight(_text_:classification in 1083) [ClassicSimilarity], result of:
          0.03761707 = score(doc=1083,freq=10.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.39339557 = fieldWeight in 1083, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1083)
      0.14285715 = coord(2/14)
    
    Abstract
    Researchers in indexing and retrieval systems have been advocating the inclusion of more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to carry out text analysis by means of linguistic and extra-linguistic knowledge. Since the mid-1980s, research has tended to pay more attention to context, giving discourse analysis a more central role. The research presented in this paper aims to check whether discourse variables have an impact on modern information retrieval and classification algorithms. To evaluate this hypothesis, a functional framework for information analysis in an automated environment was proposed, in which n-gram filtering and the k-means and Chen's classification algorithms were tested against sub-collections of documents based on the following discourse variables: "Genre", "Register", "Domain terminology", and "Document structure". The results obtained with the algorithms for the different sub-collections were compared to the MeSH information structure. The comparison shows that n-gram filtering does not appear to depend clearly on the discourse variables; the k-means classification algorithm does, but only on domain terminology and document structure; and Chen's algorithm depends clearly on all of the discourse variables. This information could be used to design better classification algorithms that take discourse variables into account. Other minor conclusions drawn from these results are also presented.
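The abstract names k-means as one of the tested classification algorithms. A minimal, dependency-free k-means over term-frequency vectors is sketched below in Python. The deterministic initialisation (first k points), the squared Euclidean distance and the toy vectors are assumptions; neither the n-gram filter nor Chen's algorithm is shown.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, max_iter: int = 20) -> List[int]:
    # Deterministic initialisation: the first k points become the centroids.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy term-frequency vectors over the terms ("gene", "protein", "court").
docs = [[3, 2, 0], [4, 3, 0], [0, 0, 5], [1, 0, 4]]
print(kmeans(docs, k=2))
```

Cluster ids are arbitrary; what matters is which documents end up together, here the two biology-like vectors and the two legal-like vectors.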
  4. Caro Castro, C.; Travieso Rodríguez, C.: Ariadne's thread : knowledge structures for browsing in OPAC's (2003) 0.01
    0.008752408 = product of:
      0.061266854 = sum of:
        0.03638099 = weight(_text_:subject in 2768) [ClassicSimilarity], result of:
          0.03638099 = score(doc=2768,freq=12.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.33878064 = fieldWeight in 2768, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2768)
        0.024885865 = weight(_text_:bibliographic in 2768) [ClassicSimilarity], result of:
          0.024885865 = score(doc=2768,freq=4.0), product of:
            0.11688946 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03002521 = queryNorm
            0.21290085 = fieldWeight in 2768, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2768)
      0.14285715 = coord(2/14)
    
    Abstract
    Subject searching is the most common but also the most problematic type of search for end users. The aim of this paper is to check how users' expressions match subject headings and to determine whether the knowledge structures used in online catalogs enhance search effectiveness. A literature review of the difficulties in subject access and of methods proposed to improve it is also presented. For the empirical analysis, transaction logs from the online catalogs of two university libraries (CISNE and FAMA) were collected. Results show that more than a quarter of user queries are effective thanks to an alphabetical subject index approach and browsing through hypertext links. 1. Introduction Since the 1980s, online public access catalogs (OPACs) have become the usual way to access bibliographic information. During the last two decades, technological development has helped to extend their use, making access feasible for an ever larger and more heterogeneous population of users, and also to incorporate information resources in electronic formats and to interconnect systems. However, technology seems to have developed faster than both our knowledge of the tasks to which it has been applied and our capacity to adapt to it. The conceptual model of the OPAC has hardly changed recently, and to interact with OPACs users still need to combine the same skills and basic knowledge as when they were first introduced (Borgman, 1986, 2000): a) conceptual knowledge to translate the information need into an appropriate query, based on a well-designed mental model of the system; b) semantic and syntactic knowledge to implement that query (access fields, search type, Boolean logic, etc.); and c) basic technical computing skills. At present, many users have the essential technical skills to use a computer with more or less expertise. 
This number drops substantially when it comes to the conceptual, semantic and syntactic knowledge needed for a moderately satisfactory search. An additional difficulty arises in subject searching, as users must express information needs they cannot yet fully define in terms that the information retrieval system can understand. Much research has focused on unskilled searchers' difficulties in entering an effective query. The influence of mental models, that is, users' assumptions about the characteristics, structure, contents and operation of the system they interact with, has been analysed (Dillon, 2000; Dimitroff, 2000). Another source of difficulty is vocabulary: how to find the right terms to formulate a query and to modify it as needed. The characteristics of the terminology and expressions used in searching (Bates, 1993), the match between user terms and the subject headings in the catalog (Carlyle, 1989; Drabenstott, 1996; Drabenstott & Vizine-Goetz, 1994), the incidence of spelling errors (Drabenstott and Weller, 1996; Ferl and Millsap, 1996; Walker and Jones, 1987), users' problems
  5. Wang, Y.-H.; Jhuo, P.-S.: A semantic faceted search with rule-based inference (2009) 0.01
    0.008156957 = product of:
      0.057098698 = sum of:
        0.028549349 = weight(_text_:classification in 540) [ClassicSimilarity], result of:
          0.028549349 = score(doc=540,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.29856625 = fieldWeight in 540, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=540)
        0.028549349 = weight(_text_:classification in 540) [ClassicSimilarity], result of:
          0.028549349 = score(doc=540,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.29856625 = fieldWeight in 540, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=540)
      0.14285715 = coord(2/14)
    
    Abstract
    Semantic search has become an active research area within the Semantic Web in recent years. Classification plays a critical role at the beginning of the search process in separating out irrelevant information. However, folksonomy-based applications face many obstacles. This study attempts to eliminate the problems resulting from folksonomies by using existing semantic technology. We also focus on how to effectively integrate heterogeneous ontologies over the Internet to obtain complete domain knowledge. A faceted logic layer is abstracted to strengthen the category framework and to organize available ontologies according to a series of steps based on the methodology of faceted classification and ontology construction. The results show that our approach can facilitate the integration of inconsistent or even heterogeneous ontologies. This paper also generalizes principles for choosing appropriate facets, with which our facet browser fully complies, so that better semantic search results can be obtained.
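A faceted browser of the kind described here combines conjunctive facet filtering with inference over class hierarchies. The Python sketch below is a minimal in-memory version under stated assumptions: a single `subClassOf`-style rule applied transitively, and a hypothetical toy dataset. It does not model ontology integration or the authors' facet-selection principles.

```python
from collections import Counter
from typing import Dict, List, Mapping

Item = Mapping[str, str]


def ancestors(cls: str, subclass_of: Mapping[str, str]) -> List[str]:
    # Rule: if x has type C and C subClassOf D, then x also has type D (chained).
    chain = [cls]
    while cls in subclass_of and subclass_of[cls] not in chain:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def filter_items(items: List[Item], selected: Mapping[str, str],
                 subclass_of: Mapping[str, str]) -> List[Item]:
    # Conjunctive facet selection; values also match inferred superclasses.
    return [
        item for item in items
        if all(facet in item and value in ancestors(item[facet], subclass_of)
               for facet, value in selected.items())
    ]


def facet_counts(items: List[Item], facets: List[str]) -> Dict[str, Counter]:
    # Counts shown next to each remaining facet value in a facet browser.
    return {f: Counter(item[f] for item in items if f in item) for f in facets}


subclass_of = {"hotel": "accommodation", "hostel": "accommodation",
               "accommodation": "place", "museum": "place"}
items = [
    {"name": "Grand", "type": "hotel", "city": "Taipei"},
    {"name": "Backpack", "type": "hostel", "city": "Tainan"},
    {"name": "Palace Museum", "type": "museum", "city": "Taipei"},
]
stays = filter_items(items, {"type": "accommodation"}, subclass_of)
print(facet_counts(stays, ["city"]))
```

Selecting the superclass `accommodation` returns both the hotel and the hostel even though neither is tagged with it directly; that is the inference step a plain keyword facet would miss.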
  6. Pahlevi, S.M.; Kitagawa, H.: Conveying taxonomy context for topic-focused Web search (2005) 0.01
    0.00576784 = product of:
      0.04037488 = sum of:
        0.02018744 = weight(_text_:classification in 3310) [ClassicSimilarity], result of:
          0.02018744 = score(doc=3310,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 3310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3310)
        0.02018744 = weight(_text_:classification in 3310) [ClassicSimilarity], result of:
          0.02018744 = score(doc=3310,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 3310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3310)
      0.14285715 = coord(2/14)
    
    Abstract
    Introducing context into a user query is an effective way to improve search effectiveness. In this article we propose a method that employs taxonomy-based search services, such as Web directories, to facilitate searches in any Web search interface that supports Boolean queries. The proposed method enables one to convey the current search context on the taxonomy of a taxonomy-based search service to searches conducted with the Web search interfaces. The basic idea is to learn the search context in the form of a Boolean condition that is commonly accepted by many Web search interfaces, and to use the condition to modify the user query before forwarding it to the Web search interfaces. To guarantee that the modified query can always be processed by the Web search interfaces and to make the method adaptive to different user requirements on search result effectiveness, we have developed new fast classification learning algorithms.
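The core idea, learning a Boolean condition from a taxonomy category and ANDing it onto the user's query, can be sketched as follows. This Python sketch swaps in a simple document-frequency heuristic for the authors' classification learning algorithms, which the abstract does not specify; the thresholds `min_inside` and `max_outside` and the toy documents are hypothetical.

```python
from typing import List, Sequence, Set


def doc_fraction(term: str, docs: Sequence[Set[str]]) -> float:
    # Fraction of documents that contain the term.
    return sum(term in d for d in docs) / len(docs) if docs else 0.0


def learn_condition(inside: Sequence[Set[str]], outside: Sequence[Set[str]],
                    max_terms: int = 3, min_inside: float = 0.5,
                    max_outside: float = 0.2) -> List[str]:
    # Keep terms common in the taxonomy category and rare outside it.
    vocabulary = set().union(*inside)
    candidates = [
        t for t in vocabulary
        if doc_fraction(t, inside) >= min_inside
        and doc_fraction(t, outside) <= max_outside
    ]
    # Prefer the largest inside-minus-outside gap; break ties alphabetically.
    candidates.sort(key=lambda t: (doc_fraction(t, outside) - doc_fraction(t, inside), t))
    return candidates[:max_terms]


def modify_query(user_query: str, condition_terms: List[str]) -> str:
    # AND the learned disjunction onto the user's query before forwarding it.
    if not condition_terms:
        return user_query
    return f"({user_query}) AND ({' OR '.join(condition_terms)})"


category_docs = [{"jaguar", "engine", "car"}, {"engine", "car", "speed"}, {"car", "dealer"}]
other_docs = [{"jaguar", "cat", "jungle"}, {"cat", "zoo"}, {"engine", "search"}]
condition = learn_condition(category_docs, other_docs)
print(modify_query("jaguar", condition))  # (jaguar) AND (car)
```

Emitting only plain `AND`/`OR` syntax reflects the abstract's requirement that the modified query be accepted by many different Web search interfaces.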
  7. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.01
    0.0050137215 = product of:
      0.0701921 = sum of:
        0.0701921 = sum of:
          0.04985209 = weight(_text_:texts in 1428) [ClassicSimilarity], result of:
            0.04985209 = score(doc=1428,freq=2.0), product of:
              0.16460659 = queryWeight, product of:
                5.4822793 = idf(docFreq=499, maxDocs=44218)
                0.03002521 = queryNorm
              0.302856 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4822793 = idf(docFreq=499, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
          0.020340007 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
            0.020340007 = score(doc=1428,freq=2.0), product of:
              0.10514317 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03002521 = queryNorm
              0.19345059 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
      0.071428575 = coord(1/14)
    
    Abstract
    Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
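The information-flow idea can be illustrated on a small HAL-style semantic space built from a text corpus. The Python sketch below assumes a simplified symmetric HAL window, defines a concept's "quality properties" as dimensions weighted at or above the mean, and measures flow from a source to a target concept as the share of the source's quality-property weight that the target also treats as a quality property. The `combine` function is only loosely modelled on the concept combination heuristic; thresholds, the boost factor and the toy text are assumptions, not the article's parameters.

```python
from collections import defaultdict
from typing import Dict, List, Set

Vector = Dict[str, float]


def hal_vectors(tokens: List[str], window: int = 3) -> Dict[str, Vector]:
    # Simplified, symmetric HAL: neighbours within `window` tokens get weight
    # window - distance + 1, so closer words contribute more.
    vectors: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            weight = float(window - d + 1)
            vectors[word][tokens[i + d]] += weight
            vectors[tokens[i + d]][word] += weight
    return {w: dict(v) for w, v in vectors.items()}


def quality_properties(vec: Vector) -> Set[str]:
    # Dimensions weighted at least the mean weight (threshold is an assumption).
    if not vec:
        return set()
    mean = sum(vec.values()) / len(vec)
    return {dim for dim, w in vec.items() if w >= mean}


def flow_degree(source: Vector, target: Vector) -> float:
    # Share of the source's quality-property weight that the target also has.
    qp_source = quality_properties(source)
    total = sum(source[d] for d in qp_source)
    if total == 0:
        return 0.0
    shared = qp_source & quality_properties(target)
    return sum(source[d] for d in shared) / total


def combine(dominant: Vector, other: Vector, boost: float = 2.0) -> Vector:
    # Emphasise the dominant concept's dimensions that are quality properties
    # of the other concept, then add the other concept's weights.
    qp_other = quality_properties(other)
    combined = {d: w * (boost if d in qp_other else 1.0) for d, w in dominant.items()}
    for d, w in other.items():
        combined[d] = combined.get(d, 0.0) + w
    return combined


text = "the penguin is a bird that cannot fly the eagle is a bird that can fly"
vecs = hal_vectors(text.split())
print(round(flow_degree(vecs["penguin"], vecs["bird"]), 3))
```

The degree is asymmetric by construction: flow from a narrow concept to a broad one can be high while the reverse is low, which is the intuition behind treating it as inference rather than plain similarity.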
  8. Graham, R.Y.: Subject no-hits in an academic library online catalog : an exploration of two potential ameliorations (2004) 0.00
    0.0031500303 = product of:
      0.044100422 = sum of:
        0.044100422 = weight(_text_:subject in 178) [ClassicSimilarity], result of:
          0.044100422 = score(doc=178,freq=6.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.41066417 = fieldWeight in 178, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.046875 = fieldNorm(doc=178)
      0.071428575 = coord(1/14)
    
    Abstract
    This paper describes a study that explored ways in which users' subject-searching problems in a local online catalog might be reduced. On a weekly basis, the author reviewed catalog transaction logs to identify topics of subject searches retrieving no records for which appropriate information resources may actually be represented in the catalog. For topics thus identified, the author explored two potential ameliorations of the no-hits search results through the use of authority record cross-references and pathfinder records providing brief instructions on search refinement. This paper describes the study findings, discusses possible concerns regarding the amelioration methods used, outlines additional steps needed to determine whether the potential ameliorations make a difference to users' searching experiences, and suggests related areas for further research.
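One of the two ameliorations, using authority-record cross-references to rescue a no-hits subject search, can be sketched as a fallback lookup. In this Python sketch the data structures (a heading-to-records map and a variant-to-heading "see" map) and the sample headings are hypothetical; pathfinder records are not modelled.

```python
from typing import Dict, List, Optional, Tuple


def subject_search(query: str, headings: Dict[str, List[str]],
                   see_refs: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    # headings: authorized subject heading -> record ids in the catalog
    # see_refs: variant term -> authorized heading (authority-record cross-reference)
    key = query.strip().casefold()
    authorized = {h.casefold(): h for h in headings}
    if key in authorized:
        heading = authorized[key]
        return heading, headings[heading]
    # No direct hit: try the cross-references before reporting "no hits".
    variants = {v.casefold(): h for v, h in see_refs.items()}
    heading = variants.get(key)
    if heading is not None and heading in headings:
        return heading, headings[heading]
    return None, []


headings = {"Motion pictures": ["b101", "b205"], "Cookery": ["b310"]}
see_refs = {"Movies": "Motion pictures", "Films": "Motion pictures", "Cooking": "Cookery"}
print(subject_search("movies", headings, see_refs))  # ('Motion pictures', ['b101', 'b205'])
print(subject_search("cinema", headings, see_refs))  # (None, [])
```

Returning the authorized heading alongside the records lets the interface tell the user which term actually matched, which is the point of the cross-reference approach.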
  9. Tudhope, D.; Binding, C.; Blocks, D.; Cunliffe, D.: Compound descriptors in context : a matching function for classifications and thesauri (2002) 0.00
    0.0021433241 = product of:
      0.030006537 = sum of:
        0.030006537 = weight(_text_:subject in 3179) [ClassicSimilarity], result of:
          0.030006537 = score(doc=3179,freq=4.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.27942157 = fieldWeight in 3179, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3179)
      0.071428575 = coord(1/14)
    
    Abstract
    Indexing with classifications or thesauri offers many advantages for Digital Libraries, but a current disincentive is the lack of flexible retrieval tools that deal with compound descriptors. This paper discusses a matching function for compound descriptors, or multi-concept subject headings, that does not rely on exact matching but incorporates term expansion via thesaurus semantic relationships to produce ranked results that take account of missing and partially matching terms. The matching function is based on a measure of semantic closeness between terms, which has the potential to help with recall problems. The work reported is part of the ongoing FACET project in collaboration with the National Museum of Science and Industry and its collections database. The architecture of the prototype system and its interface are outlined. The matching problem for compound descriptors is reviewed and the FACET implementation described. Results are discussed from scenarios using the faceted Getty Art and Architecture Thesaurus. We argue that automatic traversal of thesaurus relationships can augment the user's browsing possibilities. The techniques can be applied both to unstructured multi-concept subject headings and potentially to more syntactically structured strings. The notion of a focus term is used by the matching function to model AAT modified descriptors (noun phrases). The relevance of the approach to precoordinated indexing and matching faceted strings is discussed.
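A matching function of this kind can be sketched as follows: semantic closeness decays with the number of thesaurus hops between two terms, and a compound descriptor is scored by averaging, over the query terms, the closeness of the best-matching descriptor term, so missing terms contribute zero. The Python sketch treats all thesaurus relationships as unweighted undirected edges and uses a decay of 0.5 and a 3-hop limit; these choices and the toy terms are assumptions, and FACET's actual weighting may differ.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]


def add_relation(graph: Graph, a: str, b: str) -> None:
    # Thesaurus links (BT/NT/RT) treated as undirected, unweighted edges.
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def hops(graph: Graph, start: str, goal: str, max_hops: int = 3) -> Optional[int]:
    # Breadth-first search; None if goal is further than max_hops away.
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in graph.get(term, ()):
            if nxt == goal:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def closeness(graph: Graph, a: str, b: str, decay: float = 0.5) -> float:
    d = hops(graph, a, b)
    return 0.0 if d is None else decay ** d


def match_compound(graph: Graph, query: List[str], descriptor: Iterable[str]) -> float:
    # Each query term scores its closest descriptor term; missing terms score 0.
    terms = list(descriptor)
    if not query:
        return 0.0
    best = [max((closeness(graph, q, t) for t in terms), default=0.0) for q in query]
    return sum(best) / len(query)


g: Graph = {}
for a, b in [("armchairs", "chairs"), ("chairs", "seating furniture"),
             ("seating furniture", "furniture"), ("tables", "furniture"),
             ("oak", "wood"), ("pine", "wood")]:
    add_relation(g, a, b)

query = ["armchairs", "oak"]
print(match_compound(g, query, ["chairs", "oak"]))   # 0.75
print(match_compound(g, query, ["tables", "pine"]))  # 0.125
```

The partial match `chairs` for `armchairs` still earns credit, which is how such a function produces a ranking instead of the all-or-nothing result of exact matching.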
  10. Boyack, K.W.; Wylie, B.N.; Davidson, G.S.: Information Visualization, Human-Computer Interaction, and Cognitive Psychology : Domain Visualizations (2002) 0.00
    0.002054651 = product of:
      0.028765112 = sum of:
        0.028765112 = product of:
          0.057530224 = sum of:
            0.057530224 = weight(_text_:22 in 1352) [ClassicSimilarity], result of:
              0.057530224 = score(doc=1352,freq=4.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.54716086 = fieldWeight in 1352, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1352)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    22. 2.2003 17:25:39
    22. 2.2003 18:17:40
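Result 10's explain tree nests an extra product with coord(1/2). Presumably the `22` term sits inside a sub-query of two clauses, only one of which matched, and that sub-score is scaled by the fraction of matched clauses before the outer coord(1/14) is applied. The Python sketch below recomputes the tree from its leaf; the tuple-based node format is an assumption made only for this illustration.

```python
from math import isclose, prod  # math.prod needs Python 3.8+
from typing import List, Tuple, Union

Node = Tuple[str, Union[float, List["Node"]]]


def evaluate(node: Node) -> float:
    # Recompute an explain tree: leaves are values, inner nodes are sums or products.
    kind, payload = node
    if kind == "value":
        return float(payload)
    if kind == "sum":
        return sum(evaluate(child) for child in payload)
    if kind == "product":
        return prod(evaluate(child) for child in payload)
    raise ValueError(f"unknown node kind: {kind}")


# Explain tree of result 10 (doc 1352), leaf copied from the output above.
tree: Node = ("product", [
    ("sum", [
        ("product", [
            ("sum", [("value", 0.057530224)]),  # weight(_text_:22 in 1352)
            ("value", 1 / 2),                    # inner coord(1/2)
        ]),
    ]),
    ("value", 1 / 14),                           # outer coord(1/14)
])
print(f"{evaluate(tree):.9f}")  # 0.002054651
```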
  11. Cool, C.; Spink, A.: Issues of context in information retrieval (IR) : an introduction to the special issue (2002) 0.00
    0.0018186709 = product of:
      0.02546139 = sum of:
        0.02546139 = weight(_text_:subject in 2587) [ClassicSimilarity], result of:
          0.02546139 = score(doc=2587,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.23709705 = fieldWeight in 2587, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.046875 = fieldNorm(doc=2587)
      0.071428575 = coord(1/14)
    
    Abstract
    The subject of context has received a great deal of attention in the information retrieval (IR) literature over the past decade, primarily in studies of information seeking and IR interactions. Recently, attention to context in IR has expanded to address new problems in new environments. In this paper we outline five overlapping dimensions of context which we believe to be important constituent elements and we discuss how they are related to different issues in IR research. The papers in this special issue are summarized with respect to how they represent work that is being conducted within these dimensions of context. We conclude with future areas of research which are needed in order to fully understand the multidimensional nature of context in IR.
  12. Wolfram, D.; Xie, H.I.: Traditional IR for web users : a context for general audience digital libraries (2002) 0.00
    0.0017956087 = product of:
      0.02513852 = sum of:
        0.02513852 = weight(_text_:bibliographic in 2589) [ClassicSimilarity], result of:
          0.02513852 = score(doc=2589,freq=2.0), product of:
            0.11688946 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03002521 = queryNorm
            0.21506234 = fieldWeight in 2589, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2589)
      0.071428575 = coord(1/14)
    
    Abstract
    The emergence of general audience digital libraries (GADLs) defines a context that represents a hybrid of both "traditional" IR, using primarily bibliographic resources provided by database vendors, and "popular" IR, exemplified by public search systems available on the World Wide Web. Findings of a study investigating end-user searching and response to a GADL are reported. Data collected from a Web-based end-user survey and data logs of resource usage for a Web-based GADL were analyzed for user characteristics, patterns of access and use, and user feedback. Cross-tabulations using respondent demographics revealed several key differences in how the system was used and valued by users of different age groups. Older users valued the service more than younger users and engaged in different searching and viewing behaviors. The GADL more closely resembles traditional retrieval systems in terms of content and purpose of use, but is more similar to popular IR systems in terms of user behavior and accessibility. A model that defines the dual context of the GADL environment is derived from the data analysis and existing IR models in general and other specific contexts. The authors demonstrate the distinguishing characteristics of this IR context, and discuss implications for the development and evaluation of future GADLs to accommodate a variety of user needs and expectations.
  13. Lehtokangas, R.; Järvelin, K.: Consistency of textual expression in newspaper articles : an argument for semantically based query expansion (2001) 0.00
    0.001780432 = product of:
      0.024926046 = sum of:
        0.024926046 = product of:
          0.04985209 = sum of:
            0.04985209 = weight(_text_:texts in 4485) [ClassicSimilarity], result of:
              0.04985209 = score(doc=4485,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.302856 = fieldWeight in 4485, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4485)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    This article investigates how consistent different newspapers are in their choice of words when writing about the same news events. News articles on the same news events were taken from three Finnish newspapers and compared with regard to their central concepts and the words representing those concepts in the news texts. Consistency figures were calculated for each set of three articles (sixty sets in total). Inconsistency in words and concepts was found between news articles from different newspapers. The mean consistency calculated on the basis of words was 65 per cent; however, this depended on article length. For short news wires consistency was 83 per cent, while for long articles it was only 47 per cent. At the concept level, consistency was considerably higher, ranging from 92 per cent to 97 per cent between short and long articles. The articles also represented three categories of topic (event, process and opinion). Statistically significant differences in consistency were found with regard to length but not with regard to topic category. We argue that this inconsistency of expression is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free-text sources.
  14. Sihvonen, A.; Vakkari, P.: Subject knowledge improves interactive query expansion assisted by a thesaurus (2004) 0.00
    0.0015155592 = product of:
      0.021217827 = sum of:
        0.021217827 = weight(_text_:subject in 4417) [ClassicSimilarity], result of:
          0.021217827 = score(doc=4417,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.19758089 = fieldWeight in 4417, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4417)
      0.071428575 = coord(1/14)
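
The score lines in these entries follow Lucene's ClassicSimilarity (TF-IDF) explain format. The Python sketch below recomputes the scores of entries 14 and 13 from the printed factors, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). queryNorm and fieldNorm are taken as printed, since the first depends on the whole query and the second on Lucene's lossy length-norm encoding. The helper names are illustrative only, not Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, idf: float, query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    tf = math.sqrt(freq)                   # tf(freq) = sqrt(freq)
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


def coord_sum(scores: list[float], matched: int, total: int) -> float:
    """Sum of clause scores scaled by coord(matched/total)."""
    return sum(scores) * matched / total


# Entry 14: a single clause ("subject", freq=2), scaled by coord(1/14).
idf_subject = classic_idf(3361, 44218)                      # ~3.576596
s14 = term_score(2.0, idf_subject, 0.03002521, 0.0390625)   # ~0.021217827
print(round(coord_sum([s14], 1, 14), 10))

# Entry 13: nested clauses ("texts", freq=2), coord(1/2) then coord(1/14).
idf_texts = classic_idf(499, 44218)                         # ~5.4822793
s13 = term_score(2.0, idf_texts, 0.03002521, 0.0390625)     # ~0.04985209
print(round(coord_sum([coord_sum([s13], 1, 2)], 1, 14), 10))
```

The coord factor explains why every entry here scores so low: each document matches only one of the fourteen top-level query clauses, so its summed weight is divided by 14.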
    
  15. Shiri, A.: Topic familiarity and its effects on term selection and browsing in a thesaurus-enhanced search environment (2005) 0.00
    0.0015155592 = product of:
      0.021217827 = sum of:
        0.021217827 = weight(_text_:subject in 613) [ClassicSimilarity], result of:
          0.021217827 = score(doc=613,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.19758089 = fieldWeight in 613, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=613)
      0.071428575 = coord(1/14)
    
    Abstract
    Purpose - To evaluate the extent to which familiarity with search topics affects the ways in which users select and browse search terms in a thesaurus-enhanced search setting. Design/methodology/approach - An experimental methodology was adopted to study users' search behaviour in an operational information retrieval environment. Findings - Topic familiarity and subject knowledge influence some search and interaction behaviours. Searches involving moderately and very familiar topics were associated with browsing around twice as many thesaurus terms as was the case for unfamiliar topics. Research limitations/implications - Some search behaviours such as thesaurus browsing and term selection could be used as an indication of user levels of topic familiarity. Practical implications - The results of this study provide design implications as to how to develop personalized search interfaces where users with varying levels of familiarity with search topics can carry out searches. Originality/value - This paper establishes the importance of topic familiarity characteristics and the effects of those characteristics on users' interaction with search interfaces enhanced with semantic tools such as thesauri.
  16. Prasad, A.R.D.; Madalli, D.P.: Faceted infrastructure for semantic digital libraries (2008) 0.00
    0.0015155592 = product of:
      0.021217827 = sum of:
        0.021217827 = weight(_text_:subject in 1905) [ClassicSimilarity], result of:
          0.021217827 = score(doc=1905,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.19758089 = fieldWeight in 1905, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1905)
      0.071428575 = coord(1/14)
    
    Abstract
    Purpose - The paper aims to argue that digital library retrieval should be based on semantic representations and to propose a semantic infrastructure for digital libraries. Design/methodology/approach - The approach taken is a formal model based on subject representation for digital libraries. Findings - Search engines and search techniques have fallen short of user expectations as they do not provide context-based retrieval. Deploying semantic web technologies would lead to more efficient and precise representation of digital library content and hence better retrieval. Though digital libraries often have metadata of information resources which can be accessed through OAI-PMH, much remains to be accomplished in making digital libraries semantic web compliant. This paper presents a semantic infrastructure for digital libraries that will go a long way towards providing them and web-based information services with products highly customised to users' needs. Research limitations/implications - Here only a model for semantic infrastructure is proposed. This model is proposed after studying current user-centric, top-down models adopted in digital library service architectures. Originality/value - This paper gives a generic model for building semantic infrastructure for digital libraries. Faceted ontologies for digital libraries are just one approach, but the same model may be adopted by groups working with different approaches to building ontologies, to realise efficient retrieval in digital libraries.
  17. Sacco, G.M.: Dynamic taxonomies and guided searches (2006) 0.00
    0.0014382558 = product of:
      0.02013558 = sum of:
        0.02013558 = product of:
          0.04027116 = sum of:
            0.04027116 = weight(_text_:22 in 5295) [ClassicSimilarity], result of:
              0.04027116 = score(doc=5295,freq=4.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.38301262 = fieldWeight in 5295, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5295)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    22. 7.2006 17:56:22
  18. Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.00
    0.0013125126 = product of:
      0.018375175 = sum of:
        0.018375175 = weight(_text_:subject in 1211) [ClassicSimilarity], result of:
          0.018375175 = score(doc=1211,freq=6.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.17111006 = fieldWeight in 1211, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1211)
      0.071428575 = coord(1/14)
    
    Abstract
    From the user's perspective, however, it is still difficult to use current information retrieval systems. Users frequently have problems expressing their information needs and translating those needs into queries. This is partly because information needs cannot be expressed appropriately in system terms. It is not unusual for users to input search terms that differ from the index terms the information system uses. Various methods have been proposed to help users choose search terms and articulate queries. One widely used approach is to incorporate into the information system a thesaurus-like component that represents both the important concepts in a particular subject area and the semantic relationships among those concepts. Unfortunately, the development and use of thesauri is not without its own problems. The thesaurus employed in a specific information system has often been developed for a general subject area and needs significant enhancement to be tailored to the information system where it is to be used. This thesaurus development process, if done manually, is both time-consuming and labor-intensive. Using a thesaurus in searching is complex and may raise barriers for the user. For illustration purposes, let us consider two scenarios of thesaurus usage. In the first scenario the user inputs a search term and the thesaurus then displays a matching set of related terms. Without an overview of the thesaurus - and without the ability to see the matching terms in the context of other terms - it may be difficult to assess the quality of the related terms in order to select the correct term. In the second scenario the user browses the whole thesaurus, which is organized as an alphabetically ordered list. The problem with this approach is that the list may be long, and it does not show users the global semantic relationships among all the listed terms.
    Nevertheless, because thesaurus use has been shown to improve retrieval, our method integrates functions into the search interface that permit users to explore built-in search vocabularies to improve retrieval from digital libraries. Our method automatically generates the terms and their semantic relationships representing relevant topics covered in a digital library. We call these generated terms the "concepts", and the generated terms together with their semantic relationships the "concept space". Additionally, we used a visualization technique to display the concept space and allow users to interact with this space. The automatically generated term set is considered to be more representative of the subject area in a corpus than an "externally" imposed thesaurus, and our method also has the potential to save a significant amount of time and labor for those who have been manually creating thesauri. Information visualization is an emerging discipline that has developed very quickly in the last decade. With growing volumes of documents and associated complexities, information visualization has become increasingly important. Researchers have found information visualization to be an effective way to use and understand information while minimizing a user's cognitive load. Our work was based on an algorithmic approach of concept discovery and association. Concepts are discovered using an algorithm based on an automated thesaurus generation procedure. Subsequently, similarities among terms are computed using the cosine measure, and the associations among terms are established using a method known as max-min distance clustering. The concept space is then visualized in a spring embedding graph, which roughly shows the semantic relationships among concepts in a 2-D visual representation. The semantic space of the visualization is used as a medium for users to retrieve the desired documents.
In the remainder of this article, we present our algorithmic approach to concept generation and clustering, followed by a description of the visualization technique and interactive interface. The paper ends with key conclusions and a discussion of future work.
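
The term-association step described in this abstract (cosine similarity between terms, then max-min distance clustering) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes each term is represented by a vector of its frequencies across documents, uses 1 - cosine as the distance, and applies the textbook maximin-distance heuristic: a new cluster center is opened whenever the farthest remaining term lies more than a hypothetical fraction `theta` of the mean center-to-center distance away from every existing center. The paper's actual vectors and thresholds are not given here, and the spring-embedding layout is not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def maxmin_cluster(vectors, theta=0.5):
    """Maximin-distance clustering with distance = 1 - cosine.

    Returns (centers, assignment): indices of the chosen centers and,
    for every vector, the index of its nearest center.
    """
    n = len(vectors)
    dist = [[1.0 - cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

    centers = [0]                                # first center: arbitrary (term 0)
    if n > 1:
        second = max(range(n), key=lambda j: dist[0][j])
        if dist[0][second] > 1e-12:              # second center: farthest from the first
            centers.append(second)

    while len(centers) >= 2:
        rest = [i for i in range(n) if i not in centers]
        if not rest:
            break
        # Candidate: the term whose nearest center is farthest away.
        far = max(rest, key=lambda i: min(dist[i][c] for c in centers))
        far_d = min(dist[far][c] for c in centers)
        pairs = [(a, b) for k, a in enumerate(centers) for b in centers[k + 1:]]
        mean_center_d = sum(dist[a][b] for a, b in pairs) / len(pairs)
        if far_d <= theta * mean_center_d:       # close enough to an existing cluster
            break
        centers.append(far)

    assignment = [min(centers, key=lambda c: dist[i][c]) for i in range(n)]
    return centers, assignment


# Hypothetical terms with hypothetical frequencies across four documents.
TERMS = ["thesaurus", "vocabulary", "visualization", "graph"]
VECTORS = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 1], [0, 0, 2, 2]]
centers, assignment = maxmin_cluster(VECTORS, theta=0.5)
for term, c in zip(TERMS, assignment):
    print(term, "->", TERMS[c])
```

With these toy vectors the first two terms and the last two terms fall into separate clusters; lowering `theta` makes the heuristic open more centers.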
  19. Tudhope, D.; Binding, C.; Blocks, D.; Cunliffe, D.: FACET: thesaurus retrieval with semantic term expansion (2002) 0.00
    0.0012124473 = product of:
      0.016974261 = sum of:
        0.016974261 = weight(_text_:subject in 175) [ClassicSimilarity], result of:
          0.016974261 = score(doc=175,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.15806471 = fieldWeight in 175, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03125 = fieldNorm(doc=175)
      0.071428575 = coord(1/14)
    
    Abstract
    There are many advantages for Digital Libraries in indexing with classifications or thesauri, but a current disincentive is the lack of flexible retrieval tools that deal with compound descriptors. This demonstration of a research prototype illustrates a matching function for compound descriptors, or multi-concept subject headings, that does not rely on exact matching but incorporates term expansion via thesaurus semantic relationships to produce ranked results that take account of missing and partially matching terms. The matching function is based on a measure of semantic closeness between terms. The work is part of the EPSRC-funded FACET project in collaboration with the UK National Museum of Science and Industry (NMSI), which includes the National Railway Museum. An export of NMSI's Collections Database is used as the dataset for the research. The J. Paul Getty Trust's Art and Architecture Thesaurus (AAT) is the main thesaurus in the project. The AAT is a widely used thesaurus (over 120,000 terms). Descriptors are organised in 7 facets representing separate conceptual classes of terms. The FACET application is a multi-tiered architecture accessing a SQL Server database through an OLE DB connection. The thesauri are stored as relational tables in the Server's database. However, a key component of the system is a parallel representation of the underlying semantic network as an in-memory structure of thesaurus concepts (corresponding to preferred terms). The structure models the hierarchical and associative interrelationships of thesaurus concepts via weighted poly-hierarchical links. Its primary purpose is real-time semantic expansion of query terms, achieved by a spreading activation semantic closeness algorithm. Queries with associated results are stored persistently using XML format data. A Visual Basic interface combines a thesaurus browser and an initial term search facility that takes into account equivalence relationships.
Terms are dragged to a direct manipulation Query Builder which maintains the facet structure.
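
FACET's semantic expansion is described in this abstract only as "a spreading activation semantic closeness algorithm" over weighted hierarchical and associative links. The Python sketch below shows one common form of that idea, under stated assumptions: activation starts at 1.0 on the query concept, is multiplied by the link weight at each hop, each concept keeps the highest activation that reaches it, and propagation stops below a threshold. The concept names, link weights and threshold are hypothetical and are not taken from the AAT or from the FACET prototype.

```python
import heapq

# Hypothetical weighted thesaurus fragment: concept -> [(neighbour, link weight)].
# Weights in (0, 1] are illustrative only.
EXAMPLE_GRAPH = {
    "locomotives": [("steam locomotives", 0.8), ("rail vehicles", 0.8), ("railways", 0.5)],
    "steam locomotives": [("locomotives", 0.8), ("boilers", 0.4)],
    "rail vehicles": [("locomotives", 0.8), ("carriages", 0.8)],
    "railways": [("locomotives", 0.5)],
    "carriages": [("rail vehicles", 0.8)],
}


def spread_activation(graph, source, threshold=0.3):
    """Return [(concept, closeness)] ranked by decreasing closeness to `source`.

    Closeness of a concept = the maximum product of link weights over any path
    from `source`; concepts whose best closeness falls below `threshold` are dropped.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]                  # max-heap via negated activation
    while heap:
        neg_act, node = heapq.heappop(heap)
        act = -neg_act
        if act < best[node]:
            continue                         # stale entry; a stronger path was found
        for neighbour, weight in graph.get(node, []):
            new_act = act * weight           # activation decays along each link
            if new_act >= threshold and new_act > best.get(neighbour, 0.0):
                best[neighbour] = new_act
                heapq.heappush(heap, (-new_act, neighbour))
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


for concept, closeness in spread_activation(EXAMPLE_GRAPH, "locomotives"):
    print(f"{closeness:.2f}  {concept}")
```

Because every weight is at most 1, activation never increases along a path, so processing concepts in best-first (Dijkstra-style) order is safe: once a concept is popped with its recorded activation, no later path can exceed it.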
  20. Faaborg, A.; Lagoze, C.: Semantic browsing (2003) 0.00
    0.0010170004 = product of:
      0.014238005 = sum of:
        0.014238005 = product of:
          0.02847601 = sum of:
            0.02847601 = weight(_text_:22 in 1026) [ClassicSimilarity], result of:
              0.02847601 = score(doc=1026,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.2708308 = fieldWeight in 1026, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1026)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Source
    Research and advanced technology for digital libraries : 7th European Conference, proceedings / ECDL 2003, Trondheim, Norway, August 17-22, 2003