Search (93 results, page 5 of 5)

  • × theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  • × type_ss:"a"
  • × year_i:[2000 TO 2010}
  1. Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.00
    0.0016629322 = product of:
      0.009977593 = sum of:
        0.009977593 = weight(_text_:in in 1211) [ClassicSimilarity], result of:
          0.009977593 = score(doc=1211,freq=40.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.16802745 = fieldWeight in 1211, product of:
              6.3245554 = tf(freq=40.0), with freq of:
                40.0 = termFreq=40.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1211)
      0.16666667 = coord(1/6)
    
    Abstract
    In this article we present a method for retrieving documents from a digital library through a visual interface based on automatically generated concepts. We used a vocabulary generation algorithm to generate a set of concepts for the digital library and a technique called the max-min distance technique to cluster them. Additionally, the concepts were visualized in a spring embedding graph layout to depict the semantic relationship among them. The resulting graph layout serves as an aid to users for retrieving documents. An online archive containing the contents of D-Lib Magazine from July 1995 to May 2002 was used to test the utility of an implemented retrieval and visualization system. We believe that the method developed and tested can be applied to many different domains to help users get a better understanding of online document collections and to minimize users' cognitive load during execution of search tasks. Over the past few years, the volume of information available through the World Wide Web has been expanding exponentially. Never has so much information been so readily available and shared among so many people. Unfortunately, the unstructured nature and huge volume of information accessible over networks have made it hard for users to sift through and find relevant information. To deal with this problem, information retrieval (IR) techniques have gained more intensive attention from both industrial and academic researchers. Numerous IR techniques have been developed to help deal with the information overload problem. These techniques concentrate on mathematical models and algorithms for retrieval. Popular IR models such as the Boolean model, the vector-space model, the probabilistic model and their variants are well established.
    From the user's perspective, however, it is still difficult to use current information retrieval systems. Users frequently have problems expressing their information needs and translating those needs into queries. This is partly due to the fact that information needs cannot be expressed appropriately in systems terms. It is not unusual for users to input search terms that are different from the index terms information systems use. Various methods have been proposed to help users choose search terms and articulate queries. One widely used approach is to incorporate into the information system a thesaurus-like component that represents both the important concepts in a particular subject area and the semantic relationships among those concepts. Unfortunately, the development and use of thesauri is not without its own problems. The thesaurus employed in a specific information system has often been developed for a general subject area and needs significant enhancement to be tailored to the information system where it is to be used. This thesaurus development process, if done manually, is both time consuming and labor intensive. Usage of a thesaurus in searching is complex and may raise barriers for the user. For illustration purposes, let us consider two scenarios of thesaurus usage. In the first scenario the user inputs a search term and the thesaurus then displays a matching set of related terms. Without an overview of the thesaurus - and without the ability to see the matching terms in the context of other terms - it may be difficult to assess the quality of the related terms in order to select the correct term. In the second scenario the user browses the whole thesaurus, which is organized as in an alphabetically ordered list. The problem with this approach is that the list may be long, and neither does it show users the global semantic relationship among all the listed terms.
    Nevertheless, because thesaurus use has shown to improve retrieval, for our method we integrate functions in the search interface that permit users to explore built-in search vocabularies to improve retrieval from digital libraries. Our method automatically generates the terms and their semantic relationships representing relevant topics covered in a digital library. We call these generated terms the "concepts", and the generated terms and their semantic relationships we call the "concept space". Additionally, we used a visualization technique to display the concept space and allow users to interact with this space. The automatically generated term set is considered to be more representative of subject area in a corpus than an "externally" imposed thesaurus, and our method has the potential of saving a significant amount of time and labor for those who have been manually creating thesauri as well. Information visualization is an emerging discipline and developed very quickly in the last decade. With growing volumes of documents and associated complexities, information visualization has become increasingly important. Researchers have found information visualization to be an effective way to use and understand information while minimizing a user's cognitive load. Our work was based on an algorithmic approach of concept discovery and association. Concepts are discovered using an algorithm based on an automated thesaurus generation procedure. Subsequently, similarities among terms are computed using the cosine measure, and the associations among terms are established using a method known as max-min distance clustering. The concept space is then visualized in a spring embedding graph, which roughly shows the semantic relationships among concepts in a 2-D visual representation. The semantic space of the visualization is used as a medium for users to retrieve the desired documents. In the remainder of this article, we present our algorithmic approach of concept generation and clustering, followed by description of the visualization technique and interactive interface. The paper ends with key conclusions and discussions on future work.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  2. Schaefer, A.; Jordan, M.; Klas, C.-P.; Fuhr, N.: Active support for query formulation in virtual digital libraries : a case study with DAFFODIL (2005) 0.00
    0.0016629322 = product of:
      0.009977593 = sum of:
        0.009977593 = weight(_text_:in in 4296) [ClassicSimilarity], result of:
          0.009977593 = score(doc=4296,freq=10.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.16802745 = fieldWeight in 4296, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4296)
      0.16666667 = coord(1/6)
    
    Abstract
    Daffodil is a front-end to federated, heterogeneous digital libraries targeting at strategic support of users during the information seeking process. This is done by offering a variety of functions for searching, exploring and managing digital library objects. However, the distributed search increases response time and the conceptual model of the underlying search processes is inherently weaker. This makes query formulation harder and the resulting waiting times can be frustrating. In this paper, we investigate the concept of proactive support during the user's query formulation. For improving user efficiency and satisfaction, we implemented annotations, proactive support and error markers on the query form itself. These functions decrease the probability for syntactical or semantical errors in queries. Furthermore, the user is able to make better tactical decisions and feels more confident that the system handles the query properly. Evaluations with 30 subjects showed that user satisfaction is improved, whereas no conclusive results were received for efficiency.
    Series
    Lecture notes in computer science ; 3652
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  3. Efthimiadis, E.N.: Interactive query expansion : a user-based evaluation in a relevance feedback environment (2000) 0.00
    0.0015740865 = product of:
      0.009444519 = sum of:
        0.009444519 = weight(_text_:in in 5701) [ClassicSimilarity], result of:
          0.009444519 = score(doc=5701,freq=14.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.15905021 = fieldWeight in 5701, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=5701)
      0.16666667 = coord(1/6)
    
    Abstract
    A user-centered investigation of interactive query expansion within the context of a relevance feedback system is presented in this article. Data were collected from 25 searches using the INSPEC database. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results discuss issues that relate to query expansion, retrieval effectiveness, the correspondence of the on-line-to-off-line relevance judgments, and the selection of terms for query expansion by users (interactive query expansion). The main conclusions drawn from the results of the study are that: (1) one-third of the terms presented to users in a list of candidate terms for query expansion was identified by the users as potentially useful for query expansion. (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationships identified between the five best terms selected by the users for query expansion and the initial query terms were that: (a) 34% of the query expansion terms have no relationship or other type of correspondence with a query term; (b) 66% of the remaining query expansion terms have a relationship to the query terms. These relationships were: narrower term (46%), broader term (3%), related term (17%). (4) The results provide evidence for the effectiveness of interactive query expansion. The initial search produced on average three highly relevant documents; the query expansion search produced on average nine further highly relevant documents. The conclusions highlight the need for more research on: interactive query expansion, the comparative evaluation of automatic vs. interactive query expansion, the study of weighted Webbased or Web-accessible retrieval systems in operational environments, and for user studies in searching ranked retrieval systems in general
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  4. Sanderson, M.; Lawrie, D.: Building, testing, and applying concept hierarchies (2000) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 37) [ClassicSimilarity], result of:
          0.009274333 = score(doc=37,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 37, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=37)
      0.16666667 = coord(1/6)
    
    Abstract
    A means of automatically deriving a hierarchical organization of concepts from a set of documents without use of training data or standard clustering techniques is presented. Using a process that extracts salient words and phrases from the documents, these terms are organized hierarchically using a type of co-occurrence known as subsumption. The resulting structure is displayed as a series of hierarchical menus. When generated from a set of retrieved documents, a user browsing the menus gains an overview of their content in a manner distinct from existing techniques. The methods used to build the structure are simple and appear to be effective. The formation and presentation of the hierarchy is described along with a study of some of its properties, including a preliminary experiment, which indicates that users may find the hierarchy a more efficient means of locating relevant documents than the classic method of scanning a ranked document list
    Source
    Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  5. Khan, M.S.; Khor, S.: Enhanced Web document retrieval using automatic query expansion (2004) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 2091) [ClassicSimilarity], result of:
          0.009274333 = score(doc=2091,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 2091, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2091)
      0.16666667 = coord(1/6)
    
    Abstract
    The ever growing popularity of the Internet as a source of information, coupled with the accompanying growth in the number of documents made available through the World Wide Web, is leading to an increasing demand for more efficient and accurate information retrieval tools. Numerous techniques have been proposed and tried for improving the effectiveness of searching the World Wide Web for documents relevant to a given topic of interest. The specification of appropriate keywords and phrases by the user is crucial for the successful execution of a query as measured by the relevance of documents retrieved. Lack of users' knowledge an the search topic and their changing information needs often make it difficult for them to find suitable keywords or phrases for a query. This results in searches that fail to cover all likely aspects of the topic of interest. We describe a scheme that attempts to remedy this situation by automatically expanding the user query through the analysis of initially retrieved documents. Experimental results to demonstrate the effectiveness of the query expansion scheure are presented.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  6. Scholer, F.; Williams, H.E.; Turpin, A.: Query association surrogates for Web search (2004) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 2236) [ClassicSimilarity], result of:
          0.009274333 = score(doc=2236,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 2236, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2236)
      0.16666667 = coord(1/6)
    
    Abstract
    Collection sizes, query rates, and the number of users of Web search engines are increasing. Therefore, there is continued demand for innovation in providing search services that meet user information needs. In this article, we propose new techniques to add additional terms to documents with the goal of providing more accurate searches. Our techniques are based an query association, where queries are stored with documents that are highly similar statistically. We show that adding query associations to documents improves the accuracy of Web topic finding searches by up to 7%, and provides an excellent complement to existing supplement techniques for site finding. We conclude that using document surrogates derived from query association is a valuable new technique for accurate Web searching.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  7. Prieto-Díaz, R.: ¬A faceted approach to building ontologies (2002) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 2259) [ClassicSimilarity], result of:
          0.009274333 = score(doc=2259,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 2259, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2259)
      0.16666667 = coord(1/6)
    
    Abstract
    An ontology is "an explicit conceptualization of a domain of discourse, and thus provides a shared and common understanding of the domain." We have been producing ontologies for millennia to understand and explain our rationale and environment. From Plato's philosophical framework to modern day classification systems, ontologies are, in most cases, the product of extensive analysis and categorization. Only recently has the process of building ontologies become a research topic of interest. Today, ontologies are built very much ad-hoc. A terminology is first developed providing a controlled vocabulary for the subject area or domain of interest, then it is organized into a taxonomy where key concepts are identified, and finally these concepts are defined and related to create an ontology. The intent of this paper is to show that domain analysis methods can be used for building ontologies. Domain analysis aims at generic models that represent groups of similar systems within an application domain. In this sense, it deals with categorization of common objects and operations, with clear, unambiguous definitions of them and with defining their relationships.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  8. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.00
    0.0014873719 = product of:
      0.008924231 = sum of:
        0.008924231 = weight(_text_:in in 3321) [ClassicSimilarity], result of:
          0.008924231 = score(doc=3321,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.15028831 = fieldWeight in 3321, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3321)
      0.16666667 = coord(1/6)
    
    Abstract
    The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable because although the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, the retrieval performance will not be degraded significantly. Experiments performed on standard test collections show that our methods are promising.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  9. Hetzler, B.: Visual analysis and exploration of relationships (2002) 0.00
    0.0014724231 = product of:
      0.008834538 = sum of:
        0.008834538 = weight(_text_:in in 1189) [ClassicSimilarity], result of:
          0.008834538 = score(doc=1189,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.14877784 = fieldWeight in 1189, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1189)
      0.16666667 = coord(1/6)
    
    Abstract
    Relationships can provide a rich and powerful set of information and can be used to accomplish application goals, such as information retrieval and natural language processing. A growing trend in the information science community is the use of information visualization-taking advantage of people's natural visual capabilities to perceive and understand complex information. This chapter explores how visualization and visual exploration can help users gain insight from known relationships and discover evidence of new relationships not previously anticipated.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  10. Kruschwitz, U.; AI-Bakour, H.: Users want more sophisticated search assistants : results of a task-based evaluation (2005) 0.00
    0.0012881019 = product of:
      0.007728611 = sum of:
        0.007728611 = weight(_text_:in in 4575) [ClassicSimilarity], result of:
          0.007728611 = score(doc=4575,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1301535 = fieldWeight in 4575, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4575)
      0.16666667 = coord(1/6)
    
    Abstract
    The Web provides a massive knowledge source, as do intranets and other electronic document collections. However, much of that knowledge is encoded implicitly and cannot be applied directly without processing into some more appropriate structures. Searching, browsing, question answering, for example, could all benefit from domain-specific knowledge contained in the documents, and in applications such as simple search we do not actually need very "deep" knowledge structures such as ontologies, but we can get a long way with a model of the domain that consists of term hierarchies. We combine domain knowledge automatically acquired by exploiting the documents' markup structure with knowledge extracted an the fly to assist a user with ad hoc search requests. Such a search system can suggest query modification options derived from the actual data and thus guide a user through the space of documents. This article gives a detailed account of a task-based evaluation that compares a search system that uses the outlined domain knowledge with a standard search system. We found that users do use the query modification suggestions proposed by the system. The main conclusion we can draw from this evaluation, however, is that users prefer a system that can suggest query modifications over a standard search engine, which simply presents a ranked list of documents. Most interestingly, we observe this user preference despite the fact that the baseline system even performs slightly better under certain criteria.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  11. Menczer, F.: Lexical and semantic clustering by Web links (2004) 0.00
    0.0012620769 = product of:
      0.0075724614 = sum of:
        0.0075724614 = weight(_text_:in in 3090) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=3090,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 3090, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3090)
      0.16666667 = coord(1/6)
    
    Footnote
    Beitrag in einem Themenheft über Webometrics
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  12. Vallet, D.; Fernández, M.; Castells, P.: ¬An ontology-based information retrieval model (2005) 0.00
    0.0012620769 = product of:
      0.0075724614 = sum of:
        0.0075724614 = weight(_text_:in in 4708) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=4708,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 4708, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=4708)
      0.16666667 = coord(1/6)
    
    Series
    Lecture Notes in Computer Science ; 3532
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  13. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.00
    0.0010517307 = product of:
      0.006310384 = sum of:
        0.006310384 = weight(_text_:in in 1615) [ClassicSimilarity], result of:
          0.006310384 = score(doc=1615,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.10626988 = fieldWeight in 1615, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1615)
      0.16666667 = coord(1/6)
    
    Abstract
    The Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with finegrained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders an a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS-systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval

Languages

  • e 72
  • d 21