This database contains over 40,000 documents on topics in the areas of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of December 23, 2017)
1. Sánchez, D. ; Batet, M.: C-sanitized : a privacy model for document redaction and sanitization.
In: Journal of the Association for Information Science and Technology. 67(2016) no.1, S.148-163.
Abstract: Vast amounts of information are daily exchanged and/or released. The sensitive nature of much of this information creates a serious privacy threat when documents are uncontrollably made available to untrusted third parties. In such cases, appropriate data protection measures should be undertaken by the responsible organization, especially under the umbrella of current legislation on data privacy. To do so, human experts are usually requested to redact or sanitize document contents. To relieve this burdensome task, this paper presents a privacy model for document redaction/sanitization, which offers several advantages over other models available in the literature. Based on the well-established foundations of data semantics and information theory, our model provides a framework to develop and implement automated and inherently semantic redaction/sanitization tools. Moreover, contrary to ad-hoc redaction methods, our proposal provides a priori privacy guarantees which can be intuitively defined according to current legislations on data privacy. Empirical tests performed within the context of several use cases illustrate the applicability of our model and its ability to mimic the reasoning of human sanitizers.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23363/abstract.
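The abstract above describes an information-theoretic privacy model in which terms that disclose too much about a sensitive entity are flagged for redaction. A minimal illustrative sketch of that idea (not the paper's exact criterion) is to flag any term whose pointwise mutual information with a sensitive entity reaches a fraction of that entity's information content; all counts, names, and the `alpha` parameter here are hypothetical:

```python
import math

def ic(count, total):
    """Information content of an entity from its corpus probability: IC = -log2 p."""
    return -math.log2(count / total)

def pmi(co_count, t_count, c_count, total):
    """Pointwise mutual information between a term t and a sensitive entity c."""
    if co_count == 0:
        return float("-inf")
    return math.log2((co_count * total) / (t_count * c_count))

def terms_to_redact(terms, sensitive, counts, co_counts, total, alpha=1.0):
    """Flag terms whose PMI with a sensitive entity reaches alpha * IC(entity)."""
    flagged = set()
    for c in sensitive:
        threshold = alpha * ic(counts[c], total)
        for t in terms:
            if pmi(co_counts.get((t, c), 0), counts[t], counts[c], total) >= threshold:
                flagged.add(t)
    return flagged
```

With toy counts in which "HIV" strongly co-occurs with the sensitive entity "AIDS" while "weather" does not, only the strongly correlated term is flagged, mimicking the semantic inference a human sanitizer would make.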
2. Viejo, A. ; Sánchez, D.: Profiling social networks to provide useful and privacy-preserving web search.
In: Journal of the Association for Information Science and Technology. 65(2014) no.12, S.2444-2458.
Abstract: Web search engines (WSEs) use search queries to profile users and to provide personalized services like query disambiguation or refinement. These services are valuable because users get an enhanced search experience. However, the compiled user profiles may contain sensitive information that might represent a privacy threat. This issue should be addressed in a way that also preserves the utility of the profile with regard to search services. State-of-the-art approaches tackle these issues by generating and submitting fake queries that are related to the interests of the user. This technique allows the WSE to know only general (and useful) data while the detailed (and potentially private) data are obfuscated. To build fake queries, these proposals rely on past queries to obtain user interests. However, we argue that this is not always the best strategy and, in this article, we study the use of social networks to gather more accurate user profiles that enable a better personalized service while offering a similar, or even better, level of practical privacy. These hypotheses are empirically supported by evaluations using real profiles gathered from Twitter and a set of AOL search queries.
3. Sánchez, D. ; Batet, M. ; Valls, A. ; Gibert, K.: Ontology-driven web-based semantic similarity.
In: Journal of intelligent information systems. 35(2010) no.x, S.383-413.
Abstract: Estimation of the degree of semantic similarity/distance between concepts is a very common problem in research areas such as natural language processing, knowledge acquisition, information retrieval or data mining. In the past, many similarity measures have been proposed, exploiting explicit knowledge (such as the structure of a taxonomy) or implicit knowledge (such as information distribution). In the former case, taxonomies and/or ontologies are used to introduce additional semantics; in the latter case, frequencies of term appearances in a corpus are considered. Classical measures based on those premises suffer from some problems: in the first case, their excessive dependency on the taxonomical/ontological structure; in the second case, the lack of semantics of a pure statistical analysis of occurrences and/or the ambiguity of estimating concept statistical distribution from term appearances. Measures based on Information Content (IC) of taxonomical concepts combine both approaches. However, they heavily depend on a corpus properly pre-tagged and disambiguated according to the ontological entities in order to compute accurate concept appearance probabilities. This limits the applicability of those measures to other ontologies, like specific domain ontologies, and massive corpora, like the Web. In this paper, several of the presented issues are analyzed. Modifications of classical similarity measures are also proposed. They are based on a contextualized and scalable version of IC computation in the Web by exploiting taxonomical knowledge. The goal is to avoid the measures' dependency on the corpus pre-processing to achieve reliable results and minimize language ambiguity. Our proposals are able to outperform classical approaches when using the Web for estimating concept probabilities.
Content: Cf. http://www.springerlink.com/content/p115p325222u0687/fulltext.pdf.
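The IC-based similarity family the abstract refers to can be illustrated with Lin's classic measure, where the probabilities come from appearance counts (here, hypothetical web hit counts standing in for the paper's contextualized web-scale estimation):

```python
import math

def ic_from_hits(hits, total_docs):
    """IC estimated from appearance counts: IC = -log2(p)."""
    return -math.log2(hits / total_docs)

def lin_similarity(hits_a, hits_b, hits_lcs, total_docs):
    """Lin's IC-based similarity between two taxonomy concepts, using the
    least common subsumer (LCS): sim = 2*IC(lcs) / (IC(a) + IC(b))."""
    ic_a = ic_from_hits(hits_a, total_docs)
    ic_b = ic_from_hits(hits_b, total_docs)
    ic_lcs = ic_from_hits(hits_lcs, total_docs)
    return (2 * ic_lcs) / (ic_a + ic_b)
```

As expected for this measure, two concepts subsumed by a much more frequent (less informative) ancestor score below 1, and a concept compared with itself scores exactly 1.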
4. Blanco, I. ; Martín-Bautista, M.J. ; Sánchez, D. ; Vila, A.: Fuzzy logic for measuring information retrieval effectiveness.
In: Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas. Würzburg : Ergon Verlag, 2003. S.578-585.
(Advances in knowledge organization; vol.8)
Abstract: We present a new fuzzy extension of the classical effectiveness measures of information retrieval, based on a new way of calculating the relative cardinality of a fuzzy set. Previous approaches using Zadeh's cardinality are compared with our new approach in an experimental stage. The experiments have been carried out with a genetic algorithm whose fitness function to optimize is a combination of the fuzzy recall and fuzzy precision measures. Results included at the end of the paper show the strength of our proposal.
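The baseline the abstract compares against, fuzzy precision and recall built on Zadeh's sigma-count cardinality with a min t-norm intersection, can be sketched as follows (the document names and membership values are illustrative, not from the paper):

```python
def sigma_count(mu):
    """Zadeh's cardinality of a fuzzy set: the sum of its membership degrees."""
    return sum(mu.values())

def fuzzy_intersection(a, b):
    """Min t-norm intersection of two fuzzy sets given as dicts doc -> degree."""
    return {d: min(a.get(d, 0.0), b.get(d, 0.0)) for d in set(a) | set(b)}

def fuzzy_precision(relevant, retrieved):
    """|relevant AND retrieved| / |retrieved| under sigma-count cardinality."""
    return sigma_count(fuzzy_intersection(relevant, retrieved)) / sigma_count(retrieved)

def fuzzy_recall(relevant, retrieved):
    """|relevant AND retrieved| / |relevant| under sigma-count cardinality."""
    return sigma_count(fuzzy_intersection(relevant, retrieved)) / sigma_count(relevant)
```

A genetic algorithm such as the one described could then use, e.g., a weighted sum of these two values as its fitness function.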
5. Sánchez, D. ; Chamorro-Martínez, J. ; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval.
In: Information processing and management. 39(2003) no.2, S.251-266.
Abstract: In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval, with the advantage that there is no need to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture the user's subjectivity. For that purpose, fuzzy set theory is employed to measure the user's assessments about the fulfillment of a concept by an image.
Subject field: Data Mining
Form treated: Images