Literatur zur Informationserschließung
This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft
Powered by litecat, BIS Oldenburg
(As of: 4 June 2021)
Search results
Hits 1–20 of 170
-
1. Borgman, C.L. ; Wofford, M.F. ; Golshan, M.S. ; Darch, P.T.: Collaborative qualitative research at scale : reflections on 20 years of acquiring global data and making data global.
In: Journal of the Association for Information Science and Technology. 72(2021) no.6, S.667-682.
Abstract: A 5-year project to study scientific data uses in geography, starting in 1999, evolved into 20 years of research on data practices in sensor networks, environmental sciences, biology, seismology, undersea science, biomedicine, astronomy, and other fields. By emulating the "team science" approaches of the scientists studied, the UCLA Center for Knowledge Infrastructures accumulated a comprehensive collection of qualitative data about how scientists generate, manage, use, and reuse data across domains. Building upon Paul N. Edwards's model of "making global data" (collecting signals via consistent methods, technologies, and policies) to "make data global" (comparing and integrating those data), the research team has managed and exploited these data as a collaborative resource. This article reflects on the social, technical, organizational, economic, and policy challenges the team has encountered in creating new knowledge from data old and new. We reflect on continuity over generations of students and staff, transitions between grants, transfer of legacy data between software tools, research methods, and the role of professional data managers in the social sciences.
Content: Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24439.
Subject area: Data Mining
-
2. Deutsche Forschungsgemeinschaft / Ausschuss für Wissenschaftliche Bibliotheken und Informationssysteme (Hrsg.): Datentracking in der Wissenschaft : Aggregation und Verwendung bzw. Verkauf von Nutzungsdaten durch Wissenschaftsverlage. Ein Informationspapier des Ausschusses für Wissenschaftliche Bibliotheken und Informationssysteme der Deutschen Forschungsgemeinschaft. [20. Mai 2021].
In: https://www.dfg.de/download/pdf/foerderung/programme/lis/datentracking_papier_de.pdf.
Abstract: The information paper describes the digital tracking of scholarly activities. Researchers use a variety of digital information resources every day, such as literature and full-text databases. This frequently leaves usage traces that reveal which content was searched for and used, how long users stayed, and other kinds of scholarly activity. These usage traces can be recorded, aggregated, and reused or sold by the providers of the information resources. The paper describes the transformation of academic publishers into data analytics businesses, points out the resulting consequences for research and research institutions, and names the types of data collection employed. Its main purpose is to document current practices and to stimulate discussion about their consequences for scholarship. It is addressed to all researchers and to all actors in the research landscape.
Content: Cf. the mail to Inetbib of 21 May 2021 by Juliane Kant.
Subject area: Electronic publishing ; Data Mining
-
3. Jones, K.M.L. ; Rubel, A. ; LeClere, E.: A matter of trust : higher education institutions as information fiduciaries in an age of educational data mining and learning analytics.
In: Journal of the Association for Information Science and Technology. 71(2020) no.10, S.1227-1241.
Abstract: Higher education institutions are mining and analyzing student data to effect educational, political, and managerial outcomes. Done under the banner of "learning analytics," this work can, and often does, surface sensitive data and information about, inter alia, a student's demographics, academic performance, offline and online movements, physical fitness, mental wellbeing, and social network. With these data, institutions and third parties are able to describe student life, predict future behaviors, and intervene to address academic or other barriers to student success (however defined). Learning analytics, consequently, raise serious issues concerning student privacy, autonomy, and the appropriate flow of student data. We argue that issues around privacy lead to valid questions about the degree to which students should trust their institution to use learning analytics data and other artifacts (algorithms, predictive scores) with their interests in mind. We argue that higher education institutions are paradigms of information fiduciaries. As such, colleges and universities have a special responsibility to their students. In this article, we use the information fiduciary concept to analyze cases when learning analytics violate an institution's responsibility to its students.
Content: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24327.
Subject area: Data Mining
Discipline: Education
-
4. Fonseca, F. ; Marcinkowski, M. ; Davis, C.: Cyber-human systems of thought and understanding.
In: Journal of the Association for Information Science and Technology. 70(2019) no.4, S.402-411.
Abstract: The present challenge faced by scientists working with Big Data comes in the overwhelming volume and level of detail provided by current data sets. Exceeding traditional empirical approaches, Big Data opens a new perspective on scientific work in which data comes to play a role in the development of the scientific problematic to be developed. Addressing this reconfiguration of our relationship with data through readings of Wittgenstein, Macherey, and Popper, we propose a picture of science that encourages scientists to engage with the data in a direct way, using the data itself as an instrument for scientific investigation. Using GIS as a theme, we develop the concept of cyber-human systems of thought and understanding to bridge the divide between representative (theoretical) thinking and (non-theoretical) data-driven science. At the foundation of these systems, we invoke the concept of the "semantic pixel" to establish a logical and virtual space linking data and the work of scientists. It is with this discussion of the relationship between analysts in their pursuit of knowledge and the rise of Big Data that this present discussion of the philosophical foundations of Big Data addresses the central questions raised by social informatics research.
Content: Cf.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24132.
Note: Contribution to a special issue on social informatics of knowledge
Subject area: Data Mining
Discipline: Computer science ; Philosophy
-
5. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels.
In: Cataloging and classification quarterly. 57(2019) no.5, S.315-336.
Abstract: This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
Content: Cf.: https://doi.org/10.1080/01639374.2019.1653413.
Subject area: Fiction ; Automatic indexing ; Data Mining ; Content analysis
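As a purely illustrative companion to this record, here is a minimal keyword-extraction sketch in the spirit of one of the techniques Short discusses (ranking TF-IDF terms as candidate subject vocabulary). The toy texts and the cut-off of three terms are invented; this is not the author's code.
```python
# Illustrative sketch, not the article's code: rank TF-IDF terms per novel
# as candidate subject vocabulary (one of the techniques discussed).
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented stand-ins for OCR'd dime-novel texts.
novels = {
    "novel_001": "the detective trailed the outlaw across the dusty western plains",
    "novel_002": "the sea captain and his crew fought pirates in a terrible storm",
    "novel_003": "the young detective solved the mystery of the stolen jewels",
}

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(novels.values())
terms = vectorizer.get_feature_names_out()

for title, row in zip(novels, tfidf.toarray()):
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:3]
    print(title, [term for term, weight in top if weight > 0])
```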
-
6. Bella, A. La ; Fronzetti Colladon, A. ; Battistoni, E. ; Castellan, S. ; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining.
In: Journal of the Association for Information Science and Technology. 69(2018) no.1, S.21-31.
Abstract: We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as appearing to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000, out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholders' reactions.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23918/full.
Subject area: Data Mining
Object: Twitter
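A minimal sketch of the general technique named in this record, a support vector machine classifying short texts: the tweets, the two style labels, and the training set are invented stand-ins, not the Barchiesi/La Bella 10-factor model or the authors' data.
```python
# Illustrative sketch: a linear SVM over TF-IDF features classifying tweets
# into (invented) leadership-style dimensions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [                     # invented examples, not the authors' data
    "Proud of our team for delivering the new plant ahead of schedule",
    "We listen to every customer and answer every question personally",
    "Our strategy for the next decade: invest boldly in new markets",
    "Thank you all for the feedback, we will fix this together",
]
labels = ["task-oriented", "relations-oriented",
          "task-oriented", "relations-oriented"]   # hypothetical dimensions

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(tweets, labels)

print(clf.predict(["We are committed to answering every single customer"]))
```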
-
7. Wongthontham, P. ; Abu-Salih, B.: Ontology-based approach for semantic data extraction from social big data : state-of-the-art and research directions.
In: http://arxiv.org/abs/1801.01624.
Abstract: The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on semantic analysis of textual data. We propose an ontology-based approach to extract the semantics of textual data and define the domain of the data. In other words, we semantically analyse the social data at two levels, i.e. the entity level and the domain level. We have chosen Twitter as a social channel for a proof of concept. Domain knowledge is captured in ontologies, which are then used to enrich the semantics of tweets with a specific semantic conceptual representation of the entities that appear in the tweets. Case studies are used to demonstrate this approach. We experiment with and evaluate our proposed approach on a public dataset collected from Twitter in the politics domain. The ontology-based approach leverages entity extraction and concept mapping in terms of the quantity and accuracy of concept identification.
Subject area: Data Mining ; Semantic environment in indexing and retrieval
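A toy illustration of ontology-based enrichment as described in this record: a dictionary stands in for the ontology, and simple string matching stands in for entity extraction. The entities and concepts are invented; this is not the authors' pipeline.
```python
# Illustrative sketch: a dictionary stands in for a domain ontology and
# simple substring matching stands in for entity extraction.
ontology = {
    "angela merkel": {"concept": "Politician", "domain": "politics"},
    "bundestag":     {"concept": "Parliament", "domain": "politics"},
    "bayern munich": {"concept": "FootballClub", "domain": "sports"},
}

def enrich(tweet: str):
    """Return (entity, concept, domain) triples for entities known to the ontology."""
    text = tweet.lower()
    return [(entity, info["concept"], info["domain"])
            for entity, info in ontology.items() if entity in text]

print(enrich("Angela Merkel spoke in the Bundestag today"))
```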
-
8. Ebrahimi, M. ; ShafieiBavani, E. ; Wong, R. ; Chen, F.: Twitter user geolocation by filtering of highly mentioned users.
In: Journal of the Association for Information Science and Technology. 69(2018) no.7, S.879-889.
Abstract: Geolocated social media data provide a powerful source of information about places and regional human behavior. Because only a small amount of social media data have been geolocation-annotated, inference techniques play a substantial role to increase the volume of annotated data. Conventional research in this area has been based on the text content of posts from a given user or the social network of the user, with some recent crossovers between the text- and network-based approaches. This paper proposes a novel approach to categorize highly-mentioned users (celebrities) into Local and Global types, and consequently use Local celebrities as location indicators. A label propagation algorithm is then used over the refined social network for geolocation inference. Finally, we propose a hybrid approach by merging a text-based method as a back-off strategy into our network-based approach. Empirical experiments over three standard Twitter benchmark data sets demonstrate that our approach outperforms state-of-the-art user geolocation methods.
Content: Cf.: https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.24011.
Subject area: Data Mining
Object: Twitter
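A minimal sketch of label propagation over a mention graph, the core idea of this record: the graph, the seed locations, and the fixed number of sweeps are invented for illustration and do not reflect the paper's celebrity filtering or text-based back-off.
```python
# Illustrative sketch: propagate seed locations through a toy mention graph
# by majority vote of already-labelled neighbours; seeds stay fixed.
from collections import Counter

graph = {                       # undirected mention graph, invented
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
seeds = {"a": "Berlin", "b": "Berlin", "e": "Hamburg"}   # known locations

labels = dict(seeds)
for _ in range(5):                               # a few propagation sweeps
    updated = dict(labels)
    for user, neighbours in graph.items():
        if user in seeds:                        # never overwrite seeds
            continue
        votes = Counter(labels[n] for n in neighbours if n in labels)
        if votes:
            updated[user] = votes.most_common(1)[0][0]
    labels = updated

print(labels)   # unlabelled users pick up locations from their neighbours
```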
-
9. Saggi, M.K. ; Jain, S.: A survey towards an integration of big data analytics to big insights for value-creation.
In: Information processing and management. 54(2018) no.5, S.758-790.
Abstract: Big Data Analytics (BDA) is increasingly becoming a trending practice that generates an enormous amount of data and provides a new opportunity that is helpful in relevant decision-making. The developments in Big Data Analytics provide a new paradigm and solutions for big data sources, storage, and advanced analytics. BDA provides a nuanced view of big data development and insights into how it can truly create value for firms and customers. This article presents a comprehensive, well-informed examination and realistic analysis of deploying big data analytics successfully in companies. It provides an overview of the architecture of BDA, including six components, namely: (i) data generation, (ii) data acquisition, (iii) data storage, (iv) advanced data analytics, (v) data visualization, and (vi) decision-making for value-creation. In this paper, the seven V's of BDA, namely Volume, Velocity, Variety, Valence, Veracity, Variability, and Value, are explored. Various big data analytics tools, techniques, and technologies are described. Furthermore, the paper presents a methodical analysis of the use of Big Data Analytics in applications such as agriculture, healthcare, cyber security, and smart cities. It also highlights previous research, challenges, the current status, and future directions of big data analytics for various application platforms. This overview highlights three issues, namely (i) concepts, characteristics, and processing paradigms of Big Data Analytics; (ii) the state-of-the-art framework for decision-making in BDA that helps companies gain insight into value-creation; and (iii) the current challenges of Big Data Analytics as well as possible future directions.
Content: Cf.: https://doi.org/10.1016/j.ipm.2018.01.010.
Note: Contribution to a special issue: 'In (Big) Data we trust: Value creation in knowledge organizations'.
Subject area: Data Mining
-
10. Jäger, L.: Von Big Data zu Big Brother. [20. Januar 2018].
In: https://www.heise.de/tp/features/Von-Big-Data-zu-Big-Brother-3946125.html?view=print.
(Telepolis)
Abstract: In 1983 a single topic gripped the entire Federal Republic of Germany: the planned census. Every household in West Germany was to fill in questionnaires with 36 questions about their housing situation, the people living in the household, and their income. Massive resistance arose; hundreds of citizens' initiatives formed across the country against the survey. People did not want to be "registered"; privacy was sacrosanct. There was the (justified) concern that the answers on the nominally anonymized questionnaires would allow the identity of the respondents to be inferred. The Federal Constitutional Court ruled in favour of the plaintiffs against the census: the planned census violated data protection and thus the constitution, and it was stopped. Only one generation later, we casually hand over the supermarket chain's loyalty card every time we shop, to collect a few points towards a gift or a discount on the next purchase. And we know very well that the supermarket thereby learns our consumption behaviour down to the last detail. What we do not know is who else gets access to these data. Their buyers gain access not only to our purchases, but can also use them to infer our habits, personal preferences, and income. Just as carefree, we surf the Internet, google and shop, mail and chat. Google, Facebook, and Microsoft do not merely watch; they store everything we say, buy, and search for, for all time, and use it for their own purposes. They comb through our e-mails, know our personal schedules, track our current location, know our political, religious, and sexual preferences (who does not know the buttons "interested in men" or "interested in women"?), our closest friends with whom we are connected online, our relationship status, which school we attend or attended, and much more.
Content: Cf.: http://www.heise.de/-3946125.
Subject area: Vision ; Data Mining
Discipline: Computer science
-
11. Chardonnens, A. ; Hengchen, S.: Text mining for cultural heritage institutions : a 5-step method for cultural heritage institutions.
In: Everything changes, everything stays the same? - Understanding information spaces : Proceedings of the 15th International Symposium of Information Science (ISI 2017), Berlin/Germany, 13th - 15th March 2017. Eds.: M. Gäde, V. Trkulja u. V. Petras. vwh-Verlag : Glückstadt, 2017. S.177-191.
(Schriften zur Informationswissenschaft; Bd. 70)
Content: Cf.: http://www.vwh-verlag.de/vwh/wp-content/uploads/2017/03/titelei_isi17.pdf.
Subject area: Data Mining
-
12. Varathan, K.D. ; Giachanou, A. ; Crestani, F.: Comparative opinion mining : a review.
In: Journal of the Association for Information Science and Technology. 68(2017) no.4, S.811-829.
(Review)
Abstract: Opinion mining refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyze public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining that deals with identifying and extracting information expressed in a comparative form (e.g., "paper X is better than paper Y"). Comparative opinion mining plays a very important role when one tries to evaluate something, because it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review to cover specifically this topic, as all previous reviews dealt mostly with general opinion mining. The survey covers comparative opinion mining from two different angles: one from the perspective of techniques and the other from the perspective of comparative opinion elements. It also covers the preprocessing tools and data sets used by past researchers, which can be useful to future researchers in the field of comparative opinion mining.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23716/full.
Subject area: Data Mining
Discipline: Communication studies
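As a toy illustration of what "information expressed in a comparative form" means in this record, a few surface patterns can flag candidate comparative sentences; real systems covered by the review use much richer linguistic and learning-based techniques.
```python
# Illustrative sketch: flag candidate comparative sentences with a few
# surface patterns (comparative form + "than").
import re

COMPARATIVE = re.compile(r"\b(\w+er|more \w+|less \w+)\s+than\b", re.IGNORECASE)

sentences = [
    "Paper X is better than paper Y.",
    "The new interface feels more responsive than the old one.",
    "I really enjoyed this camera.",
]
for sentence in sentences:
    tag = "comparative" if COMPARATIVE.search(sentence) else "non-comparative"
    print(f"{tag:15s} {sentence}")
```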
-
13. Ayadi, H. ; Torjmen-Khemakhem, M. ; Daoud, M. ; Huang, J.X. ; Jemaa, M.B.: Mining correlations between medically dependent features and image retrieval models for query classification.
In: Journal of the Association for Information Science and Technology. 68(2017) no.5, S.1323-1334.
Abstract: The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State-of-the-art image retrieval models are classified into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models use both textual and visual features to answer queries. Nevertheless, most previous works in this field have used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the most suitable retrieval model for a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on classification performance. The results show that combining our proposed specific and generic query features is effective in query classification.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23772/full.
Subject area: Data Mining
Discipline: Medicine
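A minimal sketch of the kind of rule mining this record describes, with invented query features and a hypothetical confidence threshold; the paper's actual feature lists and the NaiveClass/SmartClass classifiers are not reproduced.
```python
# Illustrative sketch: mine one-feature rules "query feature -> best model"
# with support and confidence from an invented query history.
from collections import Counter

# (query features, retrieval model that performed best) - invented examples
history = [
    ({"modality:xray", "anatomy"}, "textual"),
    ({"modality:xray"}, "visual"),
    ({"modality:xray", "anatomy"}, "textual"),
    ({"disease", "anatomy"}, "combined"),
    ({"disease"}, "combined"),
]

feature_counts = Counter()
rule_counts = Counter()
for features, model in history:
    for feature in features:
        feature_counts[feature] += 1
        rule_counts[(feature, model)] += 1

MIN_CONFIDENCE = 0.6           # hypothetical threshold
for (feature, model), hits in sorted(rule_counts.items()):
    support = hits / len(history)
    confidence = hits / feature_counts[feature]
    if confidence >= MIN_CONFIDENCE:
        print(f"{feature} -> {model} (support={support:.2f}, confidence={confidence:.2f})")
```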
-
14. Gill, A.J. ; Hinrichs-Krapels, S. ; Blanke, T. ; Grant, J. ; Hedges, M. ; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data.
In: Journal of the Association for Information Science and Technology. 68(2017) no.7, S.1671-1686.
Abstract: Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23767/full.
Subject area: Computational linguistics ; Data Mining
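A minimal sketch of topic modelling as an empirical entry point for closer human reading, in the spirit of the workflow in this record; the toy documents and the two-topic setting are invented, not the REF corpus or the reported 60 topics.
```python
# Illustrative sketch: LDA over toy documents as an empirical entry point
# for closer human reading (the real corpus and settings are not used here).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "clinical trial improved patient treatment in hospitals",
    "new drug reduced patient mortality in the clinical study",
    "the exhibition brought museum archives to a wider public audience",
    "digitised archives increased public engagement with museum collections",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[::-1][:4]]
    print(f"topic {i}:", ", ".join(top_terms))
```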
-
15. Kipcic, O. ; Cramer, C.: Wie Zeitungsinhalte Forschung und Entwicklung befördern.
In: Open Password. 2017, Nr.237 vom 20.07.2017 [http://www.password-online.de/?wysija-page=1&controller=email&action=view&email_id=294&wysijap=subscriptions&user_id=623].
Abstract: Internally, the F.A.Z. archive is the information centre of the F.A.Z. Its foremost task there is supplying the editorial teams of F.A.Z. GmbH with information and documenting the F.A.Z. with all its sections and editions. Externally, it acts as a marketer of newspaper data, both for its own company and for third parties. Its clear mandate is to generate revenue for the F.A.Z. group through information and database services for external customers.
Content: This contribution is based on a talk given by the authors at the vfm spring conference 2017. It will shortly be published in info7 2017/3.
Subject area: Data Mining
Form covered: Newspapers
Object: FAZ
Field of application: Press archives
-
16. Nohr, H.: Big Data im Lichte der EU-Datenschutz-Grundverordnung.
In: JurPC: Internet-Zeitschrift für Rechtsinformatik und Informationsrecht, [http://www.jurpc.de/jurpc/show?id=20170111].
Abstract: This article deals with the framework conditions for analytical applications such as Big Data that arise from the new European data protection law, in particular the EU General Data Protection Regulation. It presents the key changes and examines the specific data protection provisions with regard to the use of Big Data, as well as the requirements imposed by the Regulation.
Content: Cf.: JurPC Web-Dok. 111/2017 - DOI 10.7328/jurpcb2017328111.
Subject area: Legal issues ; Data Mining
Country/Location: D ; EU
-
17. Loonus, Y.: Einsatzbereiche der KI und ihre Relevanz für Information Professionals.
In: Open Password. 2017, Nr.265 vom 11.10.2017. [http://www.password-online.de/?wysija-page=1&controller=email&action=view&email_id=340&wysijap=subscriptions&user_id=1045].
(Artificial intelligence)
Abstract: It is in human nature to want to share experiences and ideas with others in speech and writing. Every day we thus produce gigantic volumes of text that are shared and stored in digital form. The Radicati Group estimates that in 2017, 269 billion e-mails are sent and received every day. Added to this are largely unstructured data such as social media, the press, websites, and internal corporate systems, for example CRM software or PDF documents. The worldwide stock of unstructured data is growing so rapidly that its size can hardly be quantified. Any attempt to find a reliable figure inevitably leads to various articles estimating the share of unstructured text at 80% of all data. Even if it is no longer possible to trace where this number comes from, critical reflection on our daily routine leaves little doubt that these data are of great economic relevance.
Subject area: Data Mining
Field of application: Information industry
-
18. Winterhalter, C.: Licence to mine : ein Überblick über Rahmenbedingungen von Text and Data Mining und den aktuellen Stand der Diskussion.
In: 027.7 Zeitschrift für Bibliothekskultur. 4(2016), H.2.
Abstract: The article gives an overview of the possibilities for applying text and data mining (TDM) and similar techniques on the basis of existing provisions in licence agreements for paid electronic resources, of the debate about additional licences for TDM using the example of Elsevier's TDM policy, and of the state of the discussion about introducing copyright exceptions for TDM for non-commercial scholarly purposes.
Content: Contribution to a thematic focus 'Computerlinguistik und Bibliotheken'. Cf.: http://0277.ch/ojs/index.php/cdrs_0277/article/view/153/350.
Subject area: Data Mining ; Legal issues
-
19. Bauckhage, C.: Moderne Textanalyse : neues Wissen für intelligente Lösungen.
In: https://login.mailingwork.de/public/a_5668_LVrTK/file/data/1125_Textanalyse_Christian-Bauckhage.pdf.
Abstract: With the ever greater availability of data (big data) and rapid advances in data-driven machine learning, we have seen breakthroughs in artificial intelligence in recent years. This talk examines these developments, in particular with regard to the automatic analysis of text data. Using simple examples, we illustrate how modern text analysis works and show, again with examples, which practical applications arise today in sectors such as publishing, the financial industry, or consulting.
Content: Slides of the presentation given at the GENIOS Datenbankfrühstück 2016, 19 October 2016.
Subject area: Knowledge representation ; Data Mining
Discipline: Computer science
-
20. Song, J. ; Huang, Y. ; Qi, X. ; Li, Y. ; Li, F. ; Fu, K. ; Huang, T.: Discovering hierarchical topic evolution in time-stamped documents.
In: Journal of the Association for Information Science and Technology. 67(2016) no.4, S.915-927.
Abstract: The objective of this paper is to propose a hierarchical topic evolution model (HTEM) that can organize time-varying topics in a hierarchy and discover their evolutions with multiple timescales. In the proposed HTEM, topics near the root of the hierarchy are more abstract and also evolve over longer timescales than those near the leaves. To achieve this goal, the distance-dependent Chinese restaurant process (ddCRP) is extended to a new nested process that is able to simultaneously model the dependencies among data and the relationship between clusters. The HTEM is proposed based on the new process for time-stamped documents, in which the timestamp is utilized to measure the dependencies among documents. Moreover, an efficient Gibbs sampler is developed for the proposed HTEM. Our experimental results on two popular real-world data sets verify that the proposed HTEM can capture coherent topics and discover their hierarchical evolutions. It also outperforms the baseline model in terms of likelihood on held-out data.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23439/abstract.
Subject area: Data Mining
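A small sketch of the distance-dependent CRP idea underlying the HTEM in this record: documents preferentially link to documents close in time, and connected components of the links form clusters. This shows only the prior with invented timestamps; the paper's nested process, likelihood, and Gibbs sampler are not reproduced.
```python
# Illustrative sketch of a distance-dependent CRP prior over time-stamped
# documents: each document links to another with probability decaying in
# the time distance; connected components of the links are the clusters.
import math
import random

random.seed(0)
timestamps = [0.0, 0.2, 0.3, 5.0, 5.1, 5.4]   # invented document times
alpha, decay = 1.0, 1.0                        # self-link mass, time decay

def ddcrp_links(times):
    links = []
    for i, t_i in enumerate(times):
        weights = [math.exp(-abs(t_i - t_j) / decay) for t_j in times]
        weights[i] = alpha                     # weight of linking to itself
        r, acc = random.random() * sum(weights), 0.0
        for j, w in enumerate(weights):
            acc += w
            if r <= acc:
                links.append(j)
                break
    return links

def clusters(links):
    # connected components of the undirected link graph (= table assignments)
    parent = list(range(len(links)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(links)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

links = ddcrp_links(timestamps)
print("links:   ", links)
print("clusters:", clusters(links))
```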