Search (66 results, page 1 of 4)

  • language_ss:"e"
  • theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  1. Caro Castro, C.; Travieso Rodríguez, C.: Ariadne's thread : knowledge structures for browsing in OPAC's (2003) 0.01
    0.011331651 = product of:
      0.079321556 = sum of:
        0.058721103 = weight(_text_:mental in 2768) [ClassicSimilarity], result of:
          0.058721103 = score(doc=2768,freq=4.0), product of:
            0.16438161 = queryWeight, product of:
              6.532101 = idf(docFreq=174, maxDocs=44218)
              0.025165197 = queryNorm
            0.3572243 = fieldWeight in 2768, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.532101 = idf(docFreq=174, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2768)
        0.020600451 = weight(_text_:representation in 2768) [ClassicSimilarity], result of:
          0.020600451 = score(doc=2768,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.17791998 = fieldWeight in 2768, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2768)
      0.14285715 = coord(2/14)
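    The tree above is Lucene's "explain" output for ClassicSimilarity (TF-IDF) scoring. As a minimal sketch of how the reported factors combine, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = ln(maxDocs/(docFreq+1)) + 1), the _text_:mental clause can be recomputed from the values listed:

```python
import math

# Factors copied from the explanation tree for doc 2768, term "mental".
freq = 4.0                # termFreq of "mental" in the field
idf = 6.532101            # ln(44218 / (174 + 1)) + 1
query_norm = 0.025165197  # queryNorm, shared by all clauses of the query
field_norm = 0.02734375   # fieldNorm, length normalisation for this field

tf = math.sqrt(freq)                  # 2.0 = tf(freq=4.0)
query_weight = idf * query_norm       # 0.16438161 = queryWeight
field_weight = tf * idf * field_norm  # 0.3572243  = fieldWeight
clause_score = query_weight * field_weight
print(clause_score)                   # ~0.058721103, as reported above

# The two clause scores are then summed and scaled by coord(2/14),
# because 2 of the 14 query clauses matched:
print((0.058721103 + 0.020600451) * 2 / 14)  # ~0.011331651, the entry score
```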
    
    Abstract
    Subject searching is the most common but also the most problematic kind of searching for end users. The aim of this paper is to check how users' expressions match subject headings and to test whether the knowledge structures used in online catalogs enhance searching effectiveness. A bibliographic review of the difficulties in subject access, and of the methods proposed to improve it, is also presented. For the empirical analysis, transaction logs from two university library online catalogs (CISNE and FAMA) were collected. Results show that more than a quarter of user queries are effective thanks to the alphabetical subject index approach and to browsing through hypertextual links.
    1. Introduction. Since the 1980s, online public access catalogs (OPACs) have become the usual way to access bibliographic information. Over the last two decades, technological development has helped to extend their use, making access feasible for an ever more extensive and heterogeneous body of users, and making it possible to incorporate information resources in electronic formats and to interconnect systems. However, technology seems to have developed faster than our knowledge of the tasks to which it is applied, and faster than our capacity to adapt to it. The conceptual model of the OPAC has hardly been modified in recent years, and to interact with one, users still need to combine the same skills and basic knowledge as at the time of its introduction (Borgman, 1986, 2000): a) conceptual knowledge to translate an information need into an appropriate query, based on a well-formed mental model of the system; b) semantic and syntactic knowledge to implement that query (access fields, search types, Boolean logic, etc.); and c) basic technical skills in computing. At present many users have the essential technical skills to use a computer with more or less expertise. That number shrinks substantially when it comes to the conceptual, semantic and syntactic knowledge needed to achieve a moderately satisfactory search. An added difficulty arises in subject searching, as users must turn information needs they cannot yet articulate into terms that the information retrieval system can understand. Much research has focused on unskilled searchers' difficulty in entering an effective query. The influence of mental models, i.e. users' assumptions about the characteristics, structure, contents and operation of the system they interact with, has been analysed (Dillon, 2000; Dimitroff, 2000). Another source of difficulty is vocabulary: how to find the right terms to formulate a query and to modify it as the case may be. Studies have examined the characteristics of the terminology and expressions used in searching (Bates, 1993), the match between user terms and the subject headings of the catalog (Carlyle, 1989; Drabenstott, 1996; Drabenstott & Vizine-Goetz, 1994), the incidence of spelling errors (Drabenstott and Weller, 1996; Ferl and Millsap, 1996; Walker and Jones, 1987), and users' problems ...
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  2. Ng, K.B.: Toward a theoretical framework for understanding the relationship between situated action and planned action models of behavior in information retrieval contexts : contributions from phenomenology (2002) 0.01
    0.009743382 = product of:
      0.13640735 = sum of:
        0.13640735 = weight(_text_:phenomenology in 2588) [ClassicSimilarity], result of:
          0.13640735 = score(doc=2588,freq=4.0), product of:
            0.20961581 = queryWeight, product of:
              8.329592 = idf(docFreq=28, maxDocs=44218)
              0.025165197 = queryNorm
            0.6507493 = fieldWeight in 2588, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.329592 = idf(docFreq=28, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2588)
      0.071428575 = coord(1/14)
    
    Abstract
    In human-computer interaction (HCI), a successful interaction sequence can take on its own momentum and drift away from what the user originally planned. However, this does not mean that planned actions play no important role in the overall performance. In this paper, the author constructs a line of argument to demonstrate that it is impossible to consider an action without an a priori plan, even under the phenomenological position taken for granted by situated action theory. Based on the phenomenological analysis of problematic situations and typification, the author argues that, just like "situated-ness", the "planned-ness" of an action should also be understood in the context of the situation. Successful plans can be developed and executed for familiar contexts. The first part of the paper treats information seeking behavior as a special type of social action and applies Alfred Schutz's phenomenological sociology to understand the importance and necessity of a plan. The second part reports the results of a quasi-experiment focusing on plan deviation within an information seeking context. It was found that when the searcher's situation changed from problematic to non-problematic, the degree of plan deviation decreased significantly. These results support the argument proposed in the first part of the paper.
  3. Järvelin, K.: ¬A deductive data model for thesaurus navigation and query expansion (1996) 0.01
    0.00803734 = product of:
      0.05626138 = sum of:
        0.04708675 = weight(_text_:representation in 5625) [ClassicSimilarity], result of:
          0.04708675 = score(doc=5625,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.40667427 = fieldWeight in 5625, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0625 = fieldNorm(doc=5625)
        0.00917463 = product of:
          0.027523888 = sum of:
            0.027523888 = weight(_text_:29 in 5625) [ClassicSimilarity], result of:
              0.027523888 = score(doc=5625,freq=2.0), product of:
                0.08852329 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.025165197 = queryNorm
                0.31092256 = fieldWeight in 5625, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5625)
          0.33333334 = coord(1/3)
      0.14285715 = coord(2/14)
    
    Abstract
    Describes a deductive data model based on 3 abstraction levels for representing vocabularies for information retrieval: the conceptual level, the expression level, and the occurrence level. The proposed data model can be used for the representation and navigation of indexing and retrieval thesauri and as a vocabulary source for concept-based query expansion in heterogeneous retrieval environments.
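    The abstract does not specify an implementation; as a rough, hypothetical sketch, the levels could be modelled as linked records, with query expansion walking from a term to its concept and back down to all related expressions (the occurrence level, terms as they occur in specific vocabularies, is omitted for brevity):

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """Conceptual level: a language-independent node with thesaurus links."""
    concept_id: str
    related: list = field(default_factory=list)  # ids of related concepts

@dataclass
class Expression:
    """Expression level: a term expressing a concept."""
    term: str
    concept_id: str

concepts = {"c1": Concept("c1", related=["c2"]), "c2": Concept("c2")}
expressions = [Expression("boat", "c1"), Expression("vessel", "c1"),
               Expression("ship", "c2")]

def expand(term):
    """Collect all expressions of the term's concept and related concepts."""
    ids = {e.concept_id for e in expressions if e.term == term}
    ids |= {r for i in ids for r in concepts[i].related}
    return sorted(e.term for e in expressions if e.concept_id in ids)

print(expand("boat"))  # ['boat', 'ship', 'vessel']
```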
    Date
    2. 3.1997 17:29:07
  4. Principles of semantic networks : explorations in the representation of knowledge (1991) 0.01
    0.0072818436 = product of:
      0.1019458 = sum of:
        0.1019458 = weight(_text_:representation in 1677) [ClassicSimilarity], result of:
          0.1019458 = score(doc=1677,freq=6.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.88047564 = fieldWeight in 1677, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.078125 = fieldNorm(doc=1677)
      0.071428575 = coord(1/14)
    
    Abstract
    Contains three thematic sections: (1) issues in knowledge representation; (2) formal analyses; (3) systems for knowledge representation
  5. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.01
    0.0067573944 = product of:
      0.047301758 = sum of:
        0.041619197 = weight(_text_:representation in 1428) [ClassicSimilarity], result of:
          0.041619197 = score(doc=1428,freq=4.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.35945266 = fieldWeight in 1428, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.0056825615 = product of:
          0.017047685 = sum of:
            0.017047685 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.017047685 = score(doc=1428,freq=2.0), product of:
                0.08812423 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.025165197 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.33333334 = coord(1/3)
      0.14285715 = coord(2/14)
    
    Abstract
    Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the high-dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and with another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
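    The geometric account can be pictured with a toy example. The sketch below is illustrative only, not the authors' actual heuristic: word vectors live in a small hand-made "semantic space", and a hypothetical combination step primes one concept's vector in the light of another's.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional semantic space; real spaces are built from a corpus.
penguin = np.array([0.1, 0.9, 0.8, 0.0])
bird = np.array([0.8, 0.6, 0.1, 0.2])

# Hypothetical combination heuristic: pull "bird" toward "penguin".
alpha = 0.5
primed = (1 - alpha) * bird + alpha * penguin

print(cosine(bird, penguin))    # similarity before priming
print(cosine(primed, penguin))  # higher after context priming
```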
    Date
    22. 3.2003 19:35:46
  6. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.01
    0.006019162 = product of:
      0.042134132 = sum of:
        0.03531506 = weight(_text_:representation in 2419) [ClassicSimilarity], result of:
          0.03531506 = score(doc=2419,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.3050057 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.006819073 = product of:
          0.02045722 = sum of:
            0.02045722 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.02045722 = score(doc=2419,freq=2.0), product of:
                0.08812423 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.025165197 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.33333334 = coord(1/3)
      0.14285715 = coord(2/14)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects, it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper, evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support thanks to a more systematic work process. These ideas were implemented in Daffodil and followed by a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  7. Bradford, R.B.: Relationship discovery in large text collections using Latent Semantic Indexing (2006) 0.01
    0.005405916 = product of:
      0.03784141 = sum of:
        0.03329536 = weight(_text_:representation in 1163) [ClassicSimilarity], result of:
          0.03329536 = score(doc=1163,freq=4.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.28756213 = fieldWeight in 1163, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.03125 = fieldNorm(doc=1163)
        0.004546049 = product of:
          0.013638147 = sum of:
            0.013638147 = weight(_text_:22 in 1163) [ClassicSimilarity], result of:
              0.013638147 = score(doc=1163,freq=2.0), product of:
                0.08812423 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.025165197 = queryNorm
                0.15476047 = fieldWeight in 1163, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1163)
          0.33333334 = coord(1/3)
      0.14285715 = coord(2/14)
    
    Abstract
    This paper addresses the problem of information discovery in large collections of text. For users, one of the key problems in working with such collections is determining where to focus their attention. In selecting documents for examination, users must be able to formulate reasonably precise queries. Queries that are too broad will greatly reduce the efficiency of information discovery efforts by overwhelming the users with peripheral information. In order to formulate efficient queries, a mechanism is needed to automatically alert users regarding potentially interesting information contained within the collection. This paper presents the results of an experiment designed to test one approach to generation of such alerts. The technique of latent semantic indexing (LSI) is used to identify relationships among entities of interest. Entity extraction software is used to pre-process the text of the collection so that the LSI space contains representation vectors for named entities in addition to those for individual terms. In the LSI space, the cosine of the angle between the representation vectors for two entities captures important information regarding the degree of association of those two entities. For appropriate choices of entities, determining the entity pairs with the highest mutual cosine values yields valuable information regarding the contents of the text collection. The test database used for the experiment consists of 150,000 news articles. The proposed approach for alert generation is tested using a counterterrorism analysis example. The approach is shown to have significant potential for aiding users in rapidly focusing on information of potential importance in large text collections. The approach also has value in identifying possible use of aliases.
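    As a minimal sketch of the LSI machinery described here (a toy matrix stands in for the 150,000-article collection), representation vectors come from a truncated SVD of the term/entity-document matrix, and association strength is the cosine between two rows:

```python
import numpy as np

# Toy matrix: rows are terms and extracted entities, columns are documents.
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                    # truncation rank of the LSI space
rep = U[:, :k] * s[:k]   # representation vectors for the row entities

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Mutual cosine of two entities: high values flag candidate relationships.
print(cosine(rep[2], rep[3]))
```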
    Source
    Proceedings of the Fourth Workshop on Link Analysis, Counterterrorism, and Security, SIAM Data Mining Conference, Bethesda, MD, 20-22 April, 2006. [http://www.siam.org/meetings/sdm06/workproceed/Link%20Analysis/15.pdf]
  8. Tudhope, D.; Blocks, D.; Cunliffe, D.; Binding, C.: Query expansion via conceptual distance in thesaurus indexed collections (2006) 0.01
    0.005023338 = product of:
      0.035163365 = sum of:
        0.02942922 = weight(_text_:representation in 2215) [ClassicSimilarity], result of:
          0.02942922 = score(doc=2215,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.25417143 = fieldWeight in 2215, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2215)
        0.005734144 = product of:
          0.017202431 = sum of:
            0.017202431 = weight(_text_:29 in 2215) [ClassicSimilarity], result of:
              0.017202431 = score(doc=2215,freq=2.0), product of:
                0.08852329 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.025165197 = queryNorm
                0.19432661 = fieldWeight in 2215, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2215)
          0.33333334 = coord(1/3)
      0.14285715 = coord(2/14)
    
    Abstract
    Purpose - The purpose of this paper is to explore query expansion via conceptual distance in thesaurus-indexed collections. Design/methodology/approach - An extract of the National Museum of Science and Industry's collections database, indexed with the Getty Art and Architecture Thesaurus (AAT), was the dataset for the research. The system architecture and the algorithms for semantic closeness and the matching function are outlined. Standalone and web interfaces are described, and formative qualitative user studies are discussed. One user session is discussed in detail, together with a scenario based on a related public inquiry. Findings are set in the context of the literature on thesaurus-based query expansion. The paper discusses the potential of query expansion techniques that use the semantic relationships in a faceted thesaurus. Findings - Thesaurus-assisted retrieval systems have potential for multi-concept descriptors, permitting very precise queries and indexing. However, indexer and searcher may differ in terminology judgments, and there may not be any exactly matching results. Integrating semantic closeness into the matching function permits ranked results for multi-concept queries in thesaurus-indexed applications. An in-memory representation of the thesaurus semantic network allows a combination of automatic and interactive control of expansion, including control of expansion on individual query terms. Originality/value - The application of semantic expansion to browsing may be useful in interface options where the thesaurus structure is hidden.
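    As a rough sketch of distance-based expansion over a thesaurus network (the thesaurus fragment, traversal costs and budget below are invented for illustration, not taken from the AAT), semantic closeness can be computed as a shortest-path search whose results become weighted expansion terms:

```python
import heapq

# Hypothetical thesaurus fragment: term -> [(related term, traversal cost)].
THESAURUS = {
    "steam engines": [("engines", 1.0), ("boilers", 1.5)],
    "engines":       [("steam engines", 1.0), ("machinery", 1.0)],
    "boilers":       [("steam engines", 1.5)],
    "machinery":     [("engines", 1.0)],
}

def expand(term, budget=2.0):
    """Collect terms within a semantic-closeness budget (Dijkstra-style)."""
    best = {term: 0.0}
    queue = [(0.0, term)]
    while queue:
        dist, t = heapq.heappop(queue)
        for nxt, cost in THESAURUS.get(t, []):
            d = dist + cost
            if d <= budget and d < best.get(nxt, float("inf")):
                best[nxt] = d
                heapq.heappush(queue, (d, nxt))
    return best  # term -> distance, usable as expansion weights

print(expand("steam engines"))
```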
    Date
    30. 7.2011 16:07:29
  9. Gnoli, C.; Santis, R. de; Pusterla, L.: Commerce, see also Rhetoric : cross-discipline relationships as authority data for enhanced retrieval (2015) 0.01
    0.005023338 = product of:
      0.035163365 = sum of:
        0.02942922 = weight(_text_:representation in 2299) [ClassicSimilarity], result of:
          0.02942922 = score(doc=2299,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.25417143 = fieldWeight in 2299, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2299)
        0.005734144 = product of:
          0.017202431 = sum of:
            0.017202431 = weight(_text_:29 in 2299) [ClassicSimilarity], result of:
              0.017202431 = score(doc=2299,freq=2.0), product of:
                0.08852329 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.025165197 = queryNorm
                0.19432661 = fieldWeight in 2299, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2299)
          0.33333334 = coord(1/3)
      0.14285715 = coord(2/14)
    
    Abstract
    Subjects in a classification scheme are often related to other subjects belonging to different hierarchies. This problem was identified as early as Hugh of Saint Victor (1096?-1141). Even with present-day bibliographic classifications, a user browsing the class of architecture under the hierarchy of arts may miss relevant items classified under building or civil engineering within the hierarchy of applied sciences. To address these limitations we have developed SciGator, a browsable interface for exploring the collections of all scientific libraries at the University of Pavia. Besides showing the subclasses of a given class, the interface points users to related classes in the Dewey Decimal Classification, or in other local schemes, and allows for expanded queries that include them. This is made possible by a special field for related classes in the database structure that models classification authority data. Ontologically, many relationships between classes in different hierarchies are cases of existential dependence. Dependence can occur between disciplines in disciplinary classifications such as Dewey (e.g. architecture existentially depends on building), or between phenomena in phenomenon-based classifications such as the Integrative Levels Classification (e.g. fishing as a human activity existentially depends on fish as a class of organisms). We provide an example of its representation in OWL and discuss some details of it.
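    A toy sketch of how a "related classes" authority field can drive expanded queries (the class notations follow the architecture/building/civil engineering example above; the record layout and query syntax are hypothetical):

```python
# Classification authority records with a field for related classes.
CLASSES = {
    "720": {"label": "Architecture", "related": ["690", "624"]},
    "690": {"label": "Building", "related": ["720"]},
    "624": {"label": "Civil engineering", "related": ["720"]},
}

def expanded_query(code):
    """Browse query that also retrieves items filed under related classes."""
    codes = [code] + CLASSES[code]["related"]
    return " OR ".join(f'class:"{c}"' for c in codes)

print(expanded_query("720"))  # class:"720" OR class:"690" OR class:"624"
```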
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
  10. Zhang, W.; Yoshida, T.; Tang, X.: ¬A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.00
    0.004369106 = product of:
      0.061167482 = sum of:
        0.061167482 = weight(_text_:representation in 1165) [ClassicSimilarity], result of:
          0.061167482 = score(doc=1165,freq=6.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.5282854 = fieldWeight in 1165, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=1165)
      0.071428575 = coord(1/14)
    
    Abstract
    One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper comparatively studies TF*IDF, LSI and multi-words for text representation. We used a Chinese and an English document collection to evaluate the three methods in information retrieval and text categorization. Experimental results demonstrate that in text categorization, LSI has better performance than the other methods in both document collections. LSI also produced the best performance in retrieving English documents. This outcome shows that LSI has both favorable semantic and statistical quality, contrary to the claim that LSI cannot produce discriminative power for indexing.
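    For the indexing-and-weighting split mentioned above, a minimal TF*IDF sketch (one common variant; the paper's exact weighting scheme is not given in the abstract):

```python
import math
from collections import Counter

docs = [["semantic", "indexing", "text"],
        ["text", "classification", "text"],
        ["latent", "semantic", "analysis"]]

N = len(docs)
df = Counter(term for doc in docs for term in set(doc))  # document frequency

def tfidf(doc):
    tf = Counter(doc)  # raw term frequency (indexing step)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}  # weighting step

print(tfidf(docs[1]))  # "classification" outweighs the frequent "text"
```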
  11. Prasad, A.R.D.; Madalli, D.P.: Faceted infrastructure for semantic digital libraries (2008) 0.00
    0.0029727998 = product of:
      0.041619197 = sum of:
        0.041619197 = weight(_text_:representation in 1905) [ClassicSimilarity], result of:
          0.041619197 = score(doc=1905,freq=4.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.35945266 = fieldWeight in 1905, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1905)
      0.071428575 = coord(1/14)
    
    Abstract
    Purpose - The paper aims to argue that digital library retrieval should be based on semantic representations and proposes a semantic infrastructure for digital libraries. Design/methodology/approach - The approach taken is a formal model based on subject representation for digital libraries. Findings - Search engines and search techniques have fallen short of user expectations, as they do not provide context-based retrieval. Deploying semantic web technologies would lead to more efficient and more precise representation of digital library content, and hence to better retrieval. Though digital libraries often have metadata for information resources that can be accessed through OAI-PMH, much remains to be done to make digital libraries semantic-web compliant. This paper presents a semantic infrastructure for digital libraries that will go a long way toward providing them, and web-based information services, with products highly customised to users' needs. Research limitations/implications - Only a model for a semantic infrastructure is proposed here. The model was developed after studying current user-centric, top-down models adopted in digital library service architectures. Originality/value - This paper gives a generic model for building a semantic infrastructure for digital libraries. Faceted ontologies for digital libraries are just one approach; the same model may be adopted by groups taking different approaches to building ontologies in order to realise efficient retrieval in digital libraries.
  12. Jiang, Y.; Zhang, X.; Tang, Y.; Nie, R.: Feature-based approaches to semantic similarity assessment of concepts using Wikipedia (2015) 0.00
    0.0029727998 = product of:
      0.041619197 = sum of:
        0.041619197 = weight(_text_:representation in 2682) [ClassicSimilarity], result of:
          0.041619197 = score(doc=2682,freq=4.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.35945266 = fieldWeight in 2682, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2682)
      0.071428575 = coord(1/14)
    
    Abstract
    Semantic similarity assessment between concepts is an important task in many language-related applications. In the past, several approaches have been proposed that assess similarity by evaluating the knowledge modeled in one or more ontologies. However, existing measures have limitations, such as their reliance on predefined ontologies and their restriction to non-dynamic domains. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing the semantic similarity of concepts, with broader coverage than usual ontologies. In this paper, we propose novel feature-based similarity assessment methods that are fully dependent on Wikipedia and can avoid most of the limitations and drawbacks introduced above. To implement feature-based similarity assessment using Wikipedia, a formal representation of Wikipedia concepts is first presented. We then give a framework for feature-based similarity built on this formal representation. Lastly, we investigate several feature-based approaches to semantic similarity measures resulting from instantiations of the framework. The evaluation, based on several widely used benchmarks and on a benchmark we developed ourselves, sustains the intuitions with respect to human judgements. Overall, several methods proposed in this paper correlate well with human judgements and constitute effective ways of determining similarity between Wikipedia concepts.
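    Feature-based measures of this kind typically balance shared features against distinctive ones. A minimal Tversky-style sketch, with hypothetical feature sets standing in for features harvested from Wikipedia (links, categories, and so on):

```python
def tversky(a, b, alpha=0.5, beta=0.5):
    """Feature similarity: common features against the two distinctive sets."""
    common = len(a & b)
    return common / (common + alpha * len(a - b) + beta * len(b - a))

# Hypothetical Wikipedia-derived feature sets for two concepts.
car = {"vehicle", "wheel", "engine", "road"}
bus = {"vehicle", "wheel", "engine", "road", "passenger"}
print(tversky(car, bus))  # 0.888...
```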
  13. Qu, R.; Fang, Y.; Bai, W.; Jiang, Y.: Computing semantic similarity based on novel models of semantic representation using Wikipedia (2018) 0.00
    0.0029727998 = product of:
      0.041619197 = sum of:
        0.041619197 = weight(_text_:representation in 5052) [ClassicSimilarity], result of:
          0.041619197 = score(doc=5052,freq=4.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.35945266 = fieldWeight in 5052, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5052)
      0.071428575 = coord(1/14)
    
    Abstract
    Computing Semantic Similarity (SS) between concepts is one of the most critical issues in many domains, such as Natural Language Processing and Artificial Intelligence. Over the years, several SS measurement methods have been proposed that exploit different knowledge resources. Wikipedia provides a large domain-independent encyclopedic repository and a semantic network for computing SS between concepts. Traditional feature-based measures rely on linear combinations of different properties and suffer from two main limitations: insufficient information and the loss of semantic information. In this paper, we propose several hybrid SS measurement approaches that use the Information Content (IC) and the features of concepts, avoiding the limitations introduced above. To integrate discrete properties into a single component, we present two models of semantic representation, called CORM and CARM. We then compute SS based on these models and take the IC of categories as a supplement to the SS measurement. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, sustains the intuitions with respect to human judgments. In summary, our approaches are more efficient in determining SS between concepts and correlate better with human judgment than previous methods such as Word2Vec and NASARI.
  14. Gao, J.; Zhang, J.: Clustered SVD strategies in latent semantic indexing (2005) 0.00
    0.0029429218 = product of:
      0.041200902 = sum of:
        0.041200902 = weight(_text_:representation in 1166) [ClassicSimilarity], result of:
          0.041200902 = score(doc=1166,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.35583997 = fieldWeight in 1166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1166)
      0.071428575 = coord(1/14)
    
    Abstract
    The text retrieval method that uses the latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term-document matrix and improves information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small, homogeneous data collections; for large, inhomogeneous datasets, the performance of SVD-based text retrieval may deteriorate. We propose to partition a large inhomogeneous dataset into several smaller ones with clustered structure, on which we apply the truncated SVD. Our experimental results show that the clustered SVD strategies may enhance retrieval accuracy and reduce computing and storage costs.
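    A minimal sketch of the clustered strategy (toy data; scikit-learn's KMeans and TruncatedSVD stand in for whatever clustering and SVD routines the authors used):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Toy document-term matrix: two clearly separated topics, 8 docs, 6 terms.
X = np.vstack([rng.poisson(3, (4, 6)) * [1, 1, 1, 0, 0, 0],
               rng.poisson(3, (4, 6)) * [0, 0, 0, 1, 1, 1]]).astype(float)

# Partition the collection into clusters with homogeneous structure.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Apply truncated SVD separately inside each cluster (clustered SVD).
for c in range(2):
    part = X[labels == c]
    lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(part)
    print(f"cluster {c}: {part.shape[0]} docs ->", lsi.shape)
```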
  15. Colace, F.; Santo, M. De; Greco, L.; Napoletano, P.: Weighted word pairs for query expansion (2015) 0.00
    0.0029429218 = product of:
      0.041200902 = sum of:
        0.041200902 = weight(_text_:representation in 2687) [ClassicSimilarity], result of:
          0.041200902 = score(doc=2687,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.35583997 = fieldWeight in 2687, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2687)
      0.071428575 = coord(1/14)
    
    Abstract
    This paper proposes a novel query expansion method to improve the accuracy of text retrieval systems. Our method uses minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. This structure is obtained from the relevance feedback through a method for selecting word pairs based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.
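    The abstract leaves the extraction details to the paper; as an illustrative stand-in, the sketch below weights word pairs by simple co-occurrence counts across the feedback documents, where the paper instead uses a Probabilistic Topic Model:

```python
from collections import Counter
from itertools import combinations

feedback_docs = [["query", "expansion", "retrieval"],
                 ["query", "retrieval", "feedback"],
                 ["topic", "model", "query"]]

pair_counts = Counter()
for doc in feedback_docs:
    for pair in combinations(sorted(set(doc)), 2):
        pair_counts[pair] += 1

total = sum(pair_counts.values())
wwp = {pair: n / total for pair, n in pair_counts.most_common(3)}
print(wwp)  # top weighted word pairs used to expand the initial query
```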
  16. Blocks, D.; Cunliffe, D.; Tudhope, D.: ¬A reference model for user-system interaction in thesaurus-based searching (2006) 0.00
    0.0025225044 = product of:
      0.03531506 = sum of:
        0.03531506 = weight(_text_:representation in 202) [ClassicSimilarity], result of:
          0.03531506 = score(doc=202,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.3050057 = fieldWeight in 202, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=202)
      0.071428575 = coord(1/14)
    
    Abstract
    The authors present a model of information searching in thesaurus-enhanced search systems, intended as a reference model for system developers. The model focuses on user-system interaction and charts the specific stages of searching an indexed collection with a thesaurus. It was developed based on literature, findings from empirical studies, and analysis of existing systems. The model describes in detail the entities, processes, and decisions when interacting with a search system augmented with a thesaurus. A basic search scenario illustrates this process through the model. Graphical and textual depictions of the model are complemented by a concise matrix representation for evaluation purposes. Potential problems at different stages of the search process are discussed, together with possibilities for system developers. The aim is to set out a framework of processes, decisions, and risks involved in thesaurus-based search, within which system developers can consider potential avenues for support.
  17. Järvelin, K.; Niemi, T.: Deductive information retrieval based on classifications (1993) 0.00
    0.0025225044 = product of:
      0.03531506 = sum of:
        0.03531506 = weight(_text_:representation in 2229) [ClassicSimilarity], result of:
          0.03531506 = score(doc=2229,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.3050057 = fieldWeight in 2229, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=2229)
      0.071428575 = coord(1/14)
    
    Abstract
    Modern fact databases contain abundant data classified through several classifications. Typically, users must consult these classifications in separate manuals or files, which makes their effective use difficult. Contemporary database systems provide little support for the deductive use of classifications. In this study we show how deductive data management techniques can be applied to the utilization of data value classifications. Computation of transitive class relationships is of primary importance here. We define a representation of classifications which supports transitive computation and present an operation-oriented deductive query language tailored for classification-based deductive information retrieval. The operations of this language are on the same abstraction level as relational algebra operations and can be integrated with them to form a powerful and flexible query language for deductive information retrieval. We define the integration of these operations and demonstrate the usefulness of the language in terms of several sample queries.
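    Since the computation of transitive class relationships is central here, a minimal sketch of deriving all indirect class links from the direct ones (a naive fixed-point iteration; a real system would use a deductive query engine):

```python
def transitive_closure(edges):
    """All (ancestor, descendant) pairs derivable from direct class links."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Direct subclass links in a toy classification.
links = {("science", "physics"), ("physics", "optics")}
print(sorted(transitive_closure(links)))
# [('physics', 'optics'), ('science', 'optics'), ('science', 'physics')]
```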
  18. Colace, F.; Santo, M. de; Greco, L.; Napoletano, P.: Improving relevance feedback-based query expansion by the use of a weighted word pairs approach (2015) 0.00
    0.0025225044 = product of:
      0.03531506 = sum of:
        0.03531506 = weight(_text_:representation in 2263) [ClassicSimilarity], result of:
          0.03531506 = score(doc=2263,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.3050057 = fieldWeight in 2263, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=2263)
      0.071428575 = coord(1/14)
    
    Abstract
    In this article, the use of a new term extraction method for query expansion (QE) in text retrieval is investigated. The new method expands the initial query with a structured representation made of weighted word pairs (WWP) extracted from a set of training documents (relevance feedback). Standard text retrieval systems can handle a WWP structure through custom Boolean weighted models. We experimented with both the explicit and the pseudo-relevance feedback schemes and compared the proposed term extraction method with others in the literature, such as KLD and RM3. Evaluations were conducted on a number of test collections (Text REtrieval Conference [TREC]-6, -7, -8, -9, and -10). Results demonstrate that the QE method based on this new structure outperforms the baseline.
  19. Wongthontham, P.; Abu-Salih, B.: Ontology-based approach for semantic data extraction from social big data : state-of-the-art and research directions (2018) 0.00
    0.0025225044 = product of:
      0.03531506 = sum of:
        0.03531506 = weight(_text_:representation in 4097) [ClassicSimilarity], result of:
          0.03531506 = score(doc=4097,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.3050057 = fieldWeight in 4097, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=4097)
      0.071428575 = coord(1/14)
    
    Abstract
    The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on the semantic analysis of textual data. We propose an ontology-based approach to extract the semantics of textual data and define the domain of the data. In other words, we semantically analyse social data at two levels, the entity level and the domain level. We have chosen Twitter as the social channel for a proof of concept. Domain knowledge is captured in ontologies, which are then used to enrich the semantics of tweets with specific semantic conceptual representations of the entities that appear in them. Case studies are used to demonstrate the approach. We experiment with and evaluate the proposed approach on a public dataset collected from Twitter in the politics domain. The ontology-based approach leverages entity extraction and concept mappings in terms of quantity and accuracy of concept identification.
  20. Quiroga, L.M.; Mostafa, J.: ¬An experiment in building profiles in information filtering : the role of context of user relevance feedback (2002) 0.00
    0.0021020873 = product of:
      0.02942922 = sum of:
        0.02942922 = weight(_text_:representation in 2579) [ClassicSimilarity], result of:
          0.02942922 = score(doc=2579,freq=2.0), product of:
            0.11578492 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.025165197 = queryNorm
            0.25417143 = fieldWeight in 2579, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2579)
      0.071428575 = coord(1/14)
    
    Abstract
    An experiment was conducted to see how relevance feedback could be used to build and adjust profiles to improve the performance of filtering systems. Data were collected during the system interaction of 18 graduate students with SIFTER (Smart Information Filtering Technology for Electronic Resources), a filtering system that ranks incoming information based on users' profiles. The data set came from a collection of 6,000 records concerning consumer health. In the first phase of the study, three different modes of profile acquisition were compared. The explicit mode allowed users to directly specify the profile; the implicit mode utilized relevance feedback to create and refine the profile; and the combined mode allowed users to initialize the profile and to continuously refine it using relevance feedback. Filtering performance, measured in terms of Normalized Precision, showed that the three approaches were significantly different (α = 0.05 and p = 0.012). The explicit mode of profile acquisition consistently produced superior results. Exclusive reliance on relevance feedback in the implicit mode resulted in inferior performance. The low performance obtained by the implicit acquisition mode motivated the second phase of the study, which aimed to clarify the role of context in relevance feedback judgments. An inductive content analysis of think-aloud protocols revealed dimensions that were highly situational, establishing the importance context plays in relevance feedback assessments. Results suggest the need for better representations of documents and profiles, and for relevance feedback mechanisms that incorporate the dimensions identified in this research.
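    SIFTER's actual profile algorithm is not given in the abstract; as a generic illustration of how relevance feedback can refine a profile vector, here is a Rocchio-style update (the vectors and learning rate are invented for the example):

```python
import numpy as np

def update_profile(profile, doc_vec, relevant, eta=0.2):
    """Nudge the profile toward relevant documents, away from others."""
    new = profile + (eta if relevant else -eta) * doc_vec
    norm = np.linalg.norm(new)
    return new / norm if norm else new

profile = np.array([0.5, 0.5, 0.0])  # initial (explicitly specified) profile
doc = np.array([0.0, 0.7, 0.7])      # incoming document representation
profile = update_profile(profile, doc, relevant=True)
print(profile.round(3))
```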

Types

  • a 64
  • el 3
  • m 2