This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Bhattacharya, S. ; Yang, C. ; Srinivasan, P. ; Boynton, B.: Perceptions of presidential candidates' personalities in twitter.
In: Journal of the Association for Information Science and Technology. 67(2016) no.2, pp.249-267.
Abstract: Political sentiment analysis using social media, especially Twitter, has attracted wide interest in recent years. In such research, opinions about politicians are typically divided into positive, negative, or neutral. In our research, the goal is to mine political opinion from social media at a higher resolution by assessing statements of opinion related to the personality traits of politicians; this is an angle that has not yet been considered in social media research. A second goal is to contribute a novel retrieval-based approach for tracking public perception of personality using Gough and Heilbrun's Adjective Check List (ACL) of 110 terms describing key traits. This is in contrast to the typical lexical and machine-learning approaches used in sentiment analysis. High-precision search templates developed from the ACL were run on an 18-month span of Twitter posts mentioning Obama and Romney and these retrieved more than half a million tweets. For example, the results indicated that Romney was perceived as more of an achiever and Obama was perceived as somewhat more friendly. The traits were also aggregated into 14 broad personality dimensions. For example, Obama rated far higher than Romney on the Moderation dimension and lower on the Machiavellianism dimension. The temporal variability of such perceptions was explored.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23377/abstract.
2. Qiu, X.Y. ; Srinivasan, P. ; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study.
In: Journal of the Association for Information Science and Technology. 65(2014) no.2, pp.400-413.
Abstract: Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. Our article contributes a systematic study of the potential of company mandatory disclosures using a computational viewpoint in the following aspects: (a) We characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods in predicting change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and speculations on analysts having access to more information in producing earnings forecast.
Subject area: Data Mining
3. Srinivasan, P.: Text mining in biomedicine : challenges and opportunities.
In: Knowledge organization, information systems and other essays: Professor A. Neelameghan Festschrift. Ed. by K.S. Raghavan and K.N. Prasad. New Delhi : Ess Ess Publications, 2006. pp.221-236.
Abstract: Text mining is about making serendipity more likely. Serendipity, the chance discovery of interesting ideas, has been responsible for many discoveries in science. Text mining systems strive to explore large text collections and separate the potentially meaningful connections from a vast and mostly noisy background of random associations. In this paper we provide a summary of our text mining approach and also illustrate briefly some of the experiments we have conducted with this approach. In particular we use a profile-based text mining method. We have used these profiles to explore the global distribution of disease research, replicate discoveries made by others and propose new hypotheses. Text mining holds much potential that has yet to be tapped.
Subject area: Data Mining
4. Srinivasan, P.: Text mining : generating hypotheses from MEDLINE.
In: Journal of the American Society for Information Science and Technology. 55(2004) no.5, pp.396-413.
Abstract: Hypothesis generation, a crucial initial step for making scientific discoveries, relies on prior knowledge, experience, and intuition. Chance connections made between seemingly distinct subareas sometimes turn out to be fruitful. The goal in text mining is to assist in this process by automatically discovering a small set of interesting hypotheses from a suitable text collection. In this report, we present open and closed text mining algorithms that are built within the discovery framework established by Swanson and Smalheiser. Our algorithms represent topics using metadata profiles. When applied to MEDLINE, these are MeSH-based profiles. We present experiments that demonstrate the effectiveness of our algorithms. Specifically, our algorithms successfully generate ranked term lists where the key terms representing novel relationships between topics are ranked high.
Subject area: Data Mining
5. Ruiz, M.E. ; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization.
In: Advances in classification research, vol.10: proceedings of the 10th ASIS SIG/CR Classification Research Workshop. Ed.: Albrechtsen, H. and J.E. Mai. Medford, NJ : Information Today, 2001. pp.107-124.
(ASIS monograph series)
Abstract: This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks, as the machine learning algorithm, that learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with traditional Rocchio's algorithm adapted for text categorization, as well as flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
Subject area: Automatic classification ; Computational linguistics
6. Srinivasan, P.: Query expansion and MEDLINE.
In: Information processing and management. 32(1996) no.4, pp.431-443.
Abstract: Evaluates the retrieval effectiveness of query expansion strategies on a test collection of the medical database MEDLINE using Cornell University's SMART retrieval system. Tests 3 expansion strategies for their ability to identify appropriate MeSH terms for user queries. Compares retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a collection of 75 queries and 2,334 MEDLINE citations. Recommends query expansions using retrieval feedback for adding MeSH search terms to a user's initial query.
Subject area: Retrieval algorithms ; Semantic context in indexing and retrieval
Object: MEDLINE ; SMART
7. Srinivasan, P.: Optimal document-indexing vocabulary for MEDLINE.
In: Information processing and management. 32(1996) no.5, pp.503-514.
Abstract: The indexing vocabulary is an important determinant of success in text retrieval. Researchers have compared the effectiveness of indexing using free text and controlled vocabularies in a variety of text contexts. A number of studies have investigated the relative merits of free-text, MeSH and UMLS Metathesaurus indexing vocabularies for MEDLINE document indexing. Controlled vocabularies offer no advantages in retrieval performance over free text. Offers a detailed analysis of prior results and their underlying experimental designs. Offers results from a new experiment assessing 8 different retrieval strategies. Results indicate that MeSH does have an important role in text retrieval.
Object: MEDLINE ; MeSH ; UMLS
8. Srinivasan, P. ; Ruiz, M.E. ; Lam, W.: An investigation of indexing on the WWW.
In: Global complexity: information, chaos and control. Proceedings of the 59th Annual Meeting of the American Society for Information Science, ASIS'96, Baltimore, Maryland, 21-24 Oct 1996. Ed.: S. Hardin. Medford, NJ : Learned Information, 1996. pp.79-83.
Abstract: Proposes a model that assists in understanding indexing on the WWW. It specifies key features of indexing strategies that are currently being used. Presents an experiment assessing the validity of Inverse Document Frequency (IDF) as a term weighting strategy for WWW documents. The experiment indicates that IDF scores are not stable in the heterogeneous and dynamic context of the WWW. Recommends further investigation to clarify the effectiveness of alternative indexing strategies for the WWW.
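Note: The following is a minimal sketch of the IDF weighting named in the abstract above. It assumes the common log(N/df) form; the abstract does not say which variant the authors used. The function name and the two toy collection snapshots are hypothetical and serve only to show how a term's IDF can shift when the collection changes.

```python
import math

def inverse_document_frequency(documents):
    """Return {term: log(N / df)} for a list of tokenized documents.

    Assumes the plain log(N/df) variant of IDF (an assumption, not
    taken from the paper).
    """
    n_docs = len(documents)
    doc_freq = {}
    for tokens in documents:
        for term in set(tokens):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical toy snapshots of a collection that grows over time.
snapshot_a = [["web", "index"], ["web", "search"], ["web", "page"]]
snapshot_b = snapshot_a + [["index", "term"], ["index", "weight"], ["index"]]

idf_a = inverse_document_frequency(snapshot_a)
idf_b = inverse_document_frequency(snapshot_b)

for term in ("web", "index"):
    print(term, round(idf_a[term], 3), round(idf_b[term], 3))
```

In this toy case the IDF of "index" falls from log 3 to log 1.5 once three more documents containing it are added, while the IDF of "web" rises from 0 to log 2.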
9. Srinivasan, P.: Thesaurus construction.
In: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates. Englewood Cliffs, NJ : Prentice Hall, 1992. pp.161-218.
Abstract: Thesauri are valuable structures for Information Retrieval systems. A thesaurus provides a precise and controlled vocabulary which serves to coordinate document indexing and document retrieval. In both indexing and retrieval, a thesaurus may be used to select the most appropriate terms. Additionally, the thesaurus can assist the searcher in reformulating search strategies if required. Examines the important features of thesauri. This should allow the reader to differentiate between thesauri. Next, a brief overview of the manual thesaurus construction process is given. 2 major approaches for automatic thesaurus construction have been selected for detailed examination. The first is on thesaurus construction from collections of documents, and the 2nd, on thesaurus construction by merging existing thesauri. These 2 methods were selected since they rely on statistical techniques alone and are also significantly different from each other. Programs written in C accompany the discussion of these approaches.
Subject area: Design and application of the thesaurus principle
11. Srinivasan, P.: On generalizing the Two-Poisson Model.
In: Journal of the American Society for Information Science. 41(1990) no.1, pp.61-66.
Abstract: Automatic indexing is one of the important functions of a modern document retrieval system. Numerous techniques for this function have been proposed in the literature ranging from purely statistical to linguistically complex mechanisms. Most result from examining properties of terms. Examines term distribution within the framework of the Poisson models. Specifically examines the effectiveness of the Two-Poisson and the Three-Poisson model to see if generalisation results in increased effectiveness. The results show that the Two-Poisson model is only moderately effective in identifying index terms. In addition, generalisation to the Three-Poisson does not give any additional power. The only Poisson model which consistently works well is the basic One-Poisson model. Also discusses term distribution information.
Subject area: Automatic indexing
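Note: The following is a minimal sketch of the Two-Poisson model named in the record above. It assumes the usual textbook formulation, in which the number of occurrences of a term in a document is modelled as a mixture of two Poisson distributions, one for documents that are "about" the term and one for the rest. The function names, parameter names and example values are illustrative assumptions, not taken from the paper.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k occurrences under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def two_poisson_pmf(k, pi, lam_elite, lam_non_elite):
    """Mixture of two Poissons.

    pi is the assumed share of documents in the 'elite' class (those
    about the term); the two lam values are the class means.
    """
    return (pi * poisson_pmf(k, lam_elite)
            + (1 - pi) * poisson_pmf(k, lam_non_elite))

# Illustrative parameters: 30% elite documents, means 4.0 and 0.5.
for k in range(6):
    print(k, round(two_poisson_pmf(k, 0.3, 4.0, 0.5), 4))
```

Setting pi to 1.0 collapses the mixture to a single Poisson, which corresponds to the One-Poisson case mentioned in the abstract; a Three-Poisson generalisation would add a third weighted component in the same way.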
12. Srinivasan, P.: Intelligent information retrieval using rough set approximations.
In: Information processing and management. 25(1989) no.4, pp.347-361.
Abstract: The theory of rough sets was introduced in 1982. It allows the classification of objects into sets of equivalent members based on their attributes. Any combination of the same objects (or even their attributes) may be examined using the resultant classification. The theory has direct applications in the design and evaluation of classification schemes and the selection of discriminating attributes. Introductory papers discuss its application in the domain of medical diagnostic systems and the design of information retrieval systems accessing collections of documents. Advantages offered by the theory are: the implicit inclusion of Boolean logic; term weighting; and the ability to rank retrieved documents.
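Note: The following is a minimal sketch of the rough set idea summarised in the abstract above: objects with identical attribute values fall into the same equivalence class, and any target set is then described by a lower approximation (classes wholly inside it) and an upper approximation (classes that overlap it). The document ids, attribute names and function names are hypothetical; this is not the retrieval method of the paper itself.

```python
from collections import defaultdict

def equivalence_classes(objects, attributes):
    """Group object ids whose values on the given attributes are identical."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(classes, target):
    """Return (lower, upper) approximations of target under the classes."""
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target
            lower |= cls
        if cls & target:    # class shares at least one member with the target
            upper |= cls
    return lower, upper

# Hypothetical documents described by two attributes.
documents = {
    "d1": {"topic": "medicine", "type": "review"},
    "d2": {"topic": "medicine", "type": "review"},
    "d3": {"topic": "medicine", "type": "trial"},
    "d4": {"topic": "law", "type": "review"},
}
relevant = {"d1", "d3"}  # assumed target set, e.g. documents judged relevant

classes = equivalence_classes(documents, ["topic", "type"])
lower, upper = approximations(classes, relevant)
print(sorted(lower), sorted(upper))
```

Here d1 and d2 cannot be told apart by their attributes, so d1 is only possibly in the target set: the lower approximation contains d3 alone, while the upper approximation contains d1, d2 and d3.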