This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 15 June 2019)
1. Zhang, L. ; Wang, S. ; Liu, B.: Deep learning for sentiment analysis : a survey.
Abstract: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
2. Ren, P. ; Chen, Z. ; Ma, J. ; Zhang, Z. ; Si, L. ; Wang, S.: Detecting temporal patterns of user queries.
In: Journal of the Association for Information Science and Technology. 68(2017) no.1, pp.113-128.
Abstract: Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from the perspective of queries' temporal patterns. Queries' temporal patterns are inherent time series patterns of the search volumes of queries that reflect the evolution of the popularity of a query over time. By analyzing the temporal patterns of queries, search engines can more deeply understand the users' search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries' search volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23578/full.
3. Cai, F. ; Wang, S. ; Rijke, M.de: Behavior-based personalization in web search.
In: Journal of the Association for Information Science and Technology. 68(2017) no.4, pp.855-868.
Abstract: Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: We investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance of a document for a query at 2 levels, at the query-level and at the word-level, to alleviate the problem of sparseness; and (iii) perform an experimental evaluation both for users seen during the training period and for users not seen during training. For the latter, we explore the use of information from similar users who have been seen during the training period. We use the dwell time on clicked documents to estimate a document's relevance to a query, and perform Bayesian probabilistic matrix factorization to generate a relevance distribution of a document over queries. Our experiments show that: (i) for personalized ranking, behavioral information helps to improve retrieval effectiveness; and (ii) given a query, merging information inferred from behavior of a particular user and from behaviors of other users with a user-dependent adaptive weight outperforms any combination with a fixed weight.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23735/full.
Note: A preliminary version of this paper was published in the proceedings of SIGIR '14. In this extension, we (i) extend the behavioral personalization search model introduced there to deal with queries issued by new users for whom long-term search logs are unavailable; (ii) examine the impact of sparseness on the performance of our model by considering both word-level and query-level modeling, as we find that the word-document relevance matrix is less sparse than the query-document relevance matrix; (iii) investigate the effectiveness of our behavior-based reranking model with and without assuming a uniform distribution of users as users may behave differently; (iv) include more related work and provide a detailed discussion of the experimental results.
4. Gwizdka, J. ; Hosseini, R. ; Cole, M. ; Wang, S.: Temporal dynamics of eye-tracking and EEG during reading and relevance decisions.
In: Journal of the Association for Information Science and Technology. 68(2017) no.10, pp.2299-2312.
Abstract: Assessment of text relevance is an important aspect of human-information interaction. For many search sessions it is essential to achieving the task goal. This work investigates text relevance decision dynamics in a question-answering task by direct measurement of eye movement using eye-tracking and brain activity using electroencephalography (EEG). The EEG measurements are correlated with the user's goal-directed attention allocation revealed by their eye movements. In a within-subject lab experiment (N = 24), participants read short news stories of varied relevance. Eye movement and EEG features were calculated in three epochs of reading each news story (early, middle, final) and for periods where relevant words were read. Perceived relevance classification models were learned for each epoch. The results show reading epochs where relevant words were processed could be distinguished from other epochs. The classification models show increasing divergence in processing relevant vs. irrelevant documents after the initial epoch. This suggests differences in cognitive processes used to assess texts of varied relevance levels and provides evidence for the potential to detect these differences in information search sessions using eye tracking and EEG.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23904/full.
5. Tang, K.-H. ; Tsai, L.-C. ; Hwang, S.-L.: The development and validation of a one-bit comparison for evaluating the maturity of tag distributions in a Web 2.0 environment.
In: Journal of the Association for Information Science and Technology. 67(2016) no.6, pp.1430-1445.
Abstract: Tags generated by domain experts reaching a consensus under social influence reflect the core concepts of the tagged resource. Such tags can act as navigational cues that enable users to discover meaningful and relevant information in a Web 2.0 environment. This is particularly critical for nonexperts for understanding formal academic or scientific resources, also known as hard content. The goal of this study was to develop a novel one-bit comparison (OBC) metric and to assess in what circumstances a set of tags describing a hard-content resource is mature and representative. We compared OBC with the conventional Shannon entropy approach to determine performance when distinguishing tags generated by domain experts and nonexperts in the early and later stages under social influence. The results indicated that OBC can accurately distinguish mature tags generated by a strong expert consensus from other tags, and outperform Shannon entropy. The findings support tag-based learning, and provide insights and tools for the design of applications involving tags, such as tag recommendation and tag-based organization.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23454/abstract.
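The Shannon entropy baseline that the abstract compares against can be illustrated with a minimal sketch. This is not the authors' code: the function name `tag_entropy` and the sample tags are hypothetical, and the OBC metric itself is not specified in the abstract, so it is not reproduced here.

```python
import math
from collections import Counter


def tag_entropy(tags):
    """Shannon entropy (in bits) of the empirical distribution of tags.

    Low entropy means taggers concentrate on few tags (closer to consensus);
    high entropy means tag usage is spread out.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical tag assignments for one resource.
consensus = ["entropy", "entropy", "entropy", "entropy"]
scattered = ["entropy", "physics", "math", "signal"]
print(tag_entropy(consensus), tag_entropy(scattered))
```

In this toy setting, a tag set on which all taggers agree has zero entropy, while four equally frequent tags yield two bits.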
6. Cui, C. ; Ma, J. ; Lian, T. ; Chen, Z. ; Wang, S.: Improving image annotation via ranking-oriented neighbor search and learning-based keyword propagation.
In: Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.82-98.
Abstract: Automatic image annotation plays a critical role in modern keyword-based image retrieval systems. For this task, the nearest-neighbor-based scheme works in two phases: first, it finds the most similar neighbors of a new image from the set of labeled images; then, it propagates the keywords associated with the neighbors to the new image. In this article, we propose a novel approach for image annotation, which simultaneously improves both phases of the nearest-neighbor-based scheme. In the phase of neighbor search, different from existing work discovering the nearest neighbors with the predicted distance, we introduce a ranking-oriented neighbor search mechanism (RNSM), where the ordering of labeled images is optimized directly without going through the intermediate step of distance prediction. In the phase of keyword propagation, different from existing work using simple heuristic rules to select the propagated keywords, we present a learning-based keyword propagation strategy (LKPS), where a scoring function is learned to evaluate the relevance of keywords based on their multiple relations with the nearest neighbors. Extensive experiments on the Corel 5K data set and the MIR Flickr data set demonstrate the effectiveness of our approach.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23163/abstract.
Form covered: Images
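The two-phase nearest-neighbor scheme that the abstract takes as its starting point can be sketched roughly as follows. This shows only the generic baseline (distance-based neighbor search plus a simple voting heuristic), not the RNSM or LKPS components proposed in the paper; the function name `annotate`, the feature vectors, and the keywords are all hypothetical.

```python
import math
from collections import Counter


def annotate(new_features, labeled, k=3, n_keywords=2):
    """Baseline nearest-neighbor annotation.

    labeled: list of (feature_vector, keywords) pairs for annotated images.
    Phase 1: find the k labeled images closest to the new image (Euclidean).
    Phase 2: propagate the keywords that occur most often among them.
    """
    neighbors = sorted(
        labeled, key=lambda item: math.dist(new_features, item[0])
    )[:k]
    votes = Counter(kw for _, keywords in neighbors for kw in keywords)
    return [kw for kw, _ in votes.most_common(n_keywords)]


# Hypothetical two-dimensional features and keyword sets.
labeled_images = [
    ((0.0, 0.0), ["sky", "sea"]),
    ((0.0, 1.0), ["sky", "beach"]),
    ((5.0, 5.0), ["city"]),
    ((1.0, 0.0), ["sky", "sea"]),
]
print(annotate((0.0, 0.0), labeled_images))
```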
7. Wang, S. ; Koopman, R.: Second life for authority records.
In: Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro. Würzburg : Ergon-Verlag, 2015. pp.197-200.
Abstract: Authority control is a standard practice in the library community that provides consistent, unique, and unambiguous reference to entities such as persons, places, concepts, etc. The ideal way of referring to authority records through unique identifiers is in line with the current linked data principle. When presenting a bibliographic record, the linked authority records are expanded with the authoritative information. This way, any update in the authority records will not affect the indexing of the bibliographic records. The structural information in the authority files can also be leveraged to expand the user's query to retrieve bibliographic records associated with all the variations, narrower terms or related terms. However, in many digital libraries, especially largescale aggregations such as WorldCat and Europeana, name strings are often used instead of authority record identifiers. This is also partly due to the lack of global authority records that are valid across countries and cultural heritage domains. But even when there are global authority systems, they are not applied at scale. For example, in WorldCat, only 15% of the records have DDC and 3% have UDC codes; less than 40% of the records have one or more topical terms catalogued in the 650 MARC field, many of which are too general (such as "sports" or "literature") to be useful for retrieving bibliographic records. Therefore, when a user query is based on a Dewey code, the results usually have high precision but the recall is much lower than it should be; and, a search on a general topical term returns millions of hits without being even complete. All these practices make it difficult to leverage the key benefits of authority files. This is also true for authority files that have been transformed into linked data and enriched with mapping information. There are practical reasons for using name strings instead of identifiers. One is the indexing and query response. 
The future infrastructure design should take the performance into account while embracing the benefit of linking instead of copying, without introducing extra complexity to users. Notwithstanding all the restrictions, we argue that largescale aggregations also bring new opportunities for better exploiting the benefits of authority records. It is possible to use machine learning techniques to automatically link bibliographic records to authority records based on the manual input of cataloguers. Text mining and visualization techniques can offer a contextual view of authority records, which in return can be used to retrieve missing or mis-catalogued records. In this talk, we will describe such opportunities in more detail.
Content: Presentation at: http://seminar.udcc.org/2015/images/Wang-Koopman_InternationalUDCSeminar2015.pdf.
8. Wang, S. ; Isaac, A. ; Schlobach, S. ; Meij, L. van der ; Schopman, B.: Instance-based semantic interoperability in the cultural heritage.
In: Semantic Web journal. 3(2012) no.1, pp.45-64.
Abstract: This paper gives a comprehensive overview over the problem of Semantic Interoperability in the Cultural Heritage domain, with a particular focus on solutions centered around extensional, i.e., instance-based, ontology matching methods. It presents three typical scenarios requiring interoperability, one with homogenous collections, one with heterogeneous collections, and one with multi-lingual collection. It discusses two different ways to evaluate potential alignments, one based on the application of re-indexing, one using a reference alignment. To these scenarios we apply extensional matching with different similarity measures which gives interesting insights. Finally, we firmly position our work in the Cultural Heritage context through an extensive discussion of the relevance for, and issues related to this specific field. The findings are as unspectacular as expected but nevertheless important: the provided methods can really improve interoperability in a number of important cases, but they are not universal solutions to all related problems. This paper will provide a solid foundation for any future work on Semantic Interoperability in the Cultural Heritage domain, in particular for anybody intending to apply extensional methods.
Content: Contribution to a special issue: Semantic Web and Reasoning for Cultural Heritage and Digital Libraries: http://www.semantic-web-journal.net/content/instance-based-semantic-interoperability-cultural-heritage http://www.semantic-web-journal.net/sites/default/files/swj157_1.pdf.
Subject area: Semantic interoperability
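As a rough illustration of extensional (instance-based) matching, the sketch below scores concept pairs by the overlap of the objects annotated with them. The abstract does not name the similarity measures used, so the Jaccard coefficient is an assumption here, and the function names, concept labels, threshold, and instance identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two instance sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_concepts(source, target, threshold=0.5):
    """Propose mappings between two vocabularies from shared instances.

    source, target: dicts mapping a concept to the ids of the objects
    annotated with it. Returns (source_concept, target_concept, score)
    triples with score >= threshold, best first.
    """
    pairs = []
    for s, s_instances in source.items():
        for t, t_instances in target.items():
            score = jaccard(s_instances, t_instances)
            if score >= threshold:
                pairs.append((s, t, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical vocabularies annotating a shared set of object ids.
vocab_a = {"A:painting": {1, 2, 3}, "A:sculpture": {4, 5}}
vocab_b = {"B:paintings": {1, 2, 3, 6}, "B:statues": {4, 5}}
print(match_concepts(vocab_a, vocab_b))
```

A sketch like this presupposes objects that are annotated with both vocabularies; collections without shared instances would need a different way of relating instances.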
9. Hwang, S.-Y. ; Yang, W.-S. ; Ting, K.-D.: Automatic index construction for multimedia digital libraries.
In: Information processing and management. 46(2010) no.3, pp.295-307.
Abstract: Indexing remains one of the most popular tools provided by digital libraries to help users identify and understand the characteristics of the information they need. Despite extensive studies of the problem of automatic index construction for text-based digital libraries, the construction of multimedia digital libraries continues to represent a challenge, because multimedia objects usually lack sufficient text information to ensure reliable index learning. This research attempts to tackle the problem of automatic index construction for multimedia objects by employing Web usage logs and limited keywords pertaining to multimedia objects. The tests of two proposed algorithms use two different data sets with different amounts of textual information. Web usage logs offer precious information for building indexes of multimedia digital libraries with limited textual information. The proposed methods generally yield better indexes, especially for the artwork data set.
Subject area: Multimedia ; Internet
10. Isaac, A. ; Wang, S. ; Zinn, C. ; Matthezing, H. ; Meij, L. van der ; Schlobach, S.: Evaluating thesaurus alignments for semantic interoperability in the library domain.
In: IEEE intelligent systems. 24(2009) no.2, pp.76-86.
Abstract: Thesaurus alignments play an important role in realizing efficient access to heterogeneous cultural-heritage data. Current technology, however, provides only limited value for such access because it fails to bridge the gap between theoretical study and practical application requirements. This article explores common real-world library problems and identifies solutions that focus on the application-embedded study, development, and evaluation of matching technology.
Content: See also: http://www.dit.unitn.it/~p2p/RelatedWork/Matching/wang_ieee.pdf.
Subject area: Semantic interoperability
11. Wang, S. ; Isaac, A. ; Schopman, B. ; Schlobach, S. ; Meij, L. van der: Matching multilingual subject vocabularies.
Abstract: Most libraries and other cultural heritage institutions use controlled knowledge organisation systems, such as thesauri, to describe their collections. Unfortunately, as most of these institutions use different such systems, united access to heterogeneous collections is difficult. Things are even worse in an international context when concepts have labels in different languages. In order to overcome the multilingual interoperability problem between European Libraries, extensive work has been done to manually map concepts from different knowledge organisation systems, which is a tedious and expensive process. Within the TELplus project, we developed and evaluated methods to automatically discover these mappings, using different ontology matching techniques. In experiments on major French, English and German subject heading lists Rameau, LCSH and SWD, we show that we can automatically produce mappings of surprisingly good quality, even when using relatively naive translation and matching methods.
Content: Paper for: ECDL 2009, Sept. 27 - Oct. 02, 2009, Corfu.
Subject area: Semantic interoperability ; Multilingual problems
Object: TEL ; SWD ; Rameau ; LCSH
12. Hollink, L. ; Assem, M. van ; Wang, S. ; Isaac, A. ; Schreiber, G.: Two variations on ontology alignment evaluation : methodological issues.
Abstract: Evaluation of ontology alignments is in practice done in two ways: (1) assessing individual correspondences and (2) comparing the alignment to a reference alignment. However, this type of evaluation does not guarantee that an application which uses the alignment will perform well. In this paper, we contribute to the current ontology alignment evaluation practices by proposing two alternative evaluation methods that take into account some characteristics of a usage scenario without doing a full-fledged end-to-end evaluation. We compare different evaluation approaches in three case studies, focussing on methodological issues. Each case study considers an alignment between a different pair of ontologies, ranging from rich and well-structured to small and poorly structured. This enables us to conclude on the use of different evaluation approaches in different settings.
Subject area: Knowledge representation ; Semantic interoperability
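The second conventional evaluation style named in the abstract, comparison against a reference alignment, is commonly reported as precision and recall. The sketch below shows that standard computation only; it does not implement the two alternative methods the paper proposes, and the function name `evaluate_alignment` and the correspondences are hypothetical.

```python
def evaluate_alignment(found, reference):
    """Precision and recall of a found alignment against a reference.

    Both arguments are collections of (source_concept, target_concept)
    correspondences.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical correspondences between two small ontologies.
found = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
reference = {("a", "x"), ("b", "y")}
print(evaluate_alignment(found, reference))
```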
13. Clark, D.A. ; Mitra, P.P. ; Wang, S.S.-H.: The mammalian brain : a question of scale.
In: Nature. 2001, no.411, 10 May 2001, pp.189-193.
Abstract: Comparisons of brain size and structure have traditionally been considered with reference to another scaling variable such as body size. Now Clark et al. have developed a new method of comparing the brains of different mammalian species by normalizing brain component size using the whole brain or 'telencephalon' as the reference unit. The 'cerebrotype' thus obtained corresponds well with established evolutionary relationships. Within each taxon, brain regions are scalable and tend to maintain a fixed ratio to one another independently of absolute total brain volume.
Note: See also: http://www.nature.com/nature/links/010510/010510-2.html