This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Walsh, J.A. ; Cobb, P.J. ; Fremery, W. de ; Golub, K. ; Keah, H. ; Kim, J. ; Kiplang'at, J. ; Liu, Y.-H. ; Mahony, S. ; Oh, S.G. ; Sula, C.A. ; Underwood, T. ; Wang, X.: Digital humanities in the iSchool.
In: Journal of the Association for Information Science and Technology. 73(2022) no.2, pp. 188-203.
(JASIST special issue on digital humanities (DH): A. Landscapes of DH)
Abstract: The interdisciplinary field known as digital humanities (DH) is represented in various forms in the teaching and research practiced in iSchools. Building on the work of an iSchools organization committee charged with exploring digital humanities curricula, we present findings from a series of related studies exploring aspects of DH teaching, education, and research in iSchools, often in collaboration with other units and disciplines. Through a survey of iSchool programs and an online DH course registry, we investigate the various education models for DH training found in iSchools, followed by a detailed look at DH courses and curricula, explored through analysis of course syllabi and course descriptions. We take a brief look at collaborative disciplines with which iSchools cooperate on DH research projects or in offering DH education. Next, we explore DH careers through an analysis of relevant job advertisements. Finally, we offer some observations about the management and administrative challenges and opportunities related to offering a new iSchool DH program. Our results provide a snapshot of the current state of digital humanities in iSchools which may usefully inform the design and evolution of new DH programs, degrees, and related initiatives.
Content: Cf. https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24535.
Subject area: Electronic publishing
2. Kim, J. ; Kim, J. ; Owen-Smith, J.: Ethnicity-based name partitioning for author name disambiguation using supervised machine learning.
In: Journal of the Association for Information Science and Technology. 72(2021) no.8, pp. 979-994.
Abstract: In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing performance of four machine learning algorithms trained and tested on the entire data or specifically on individual name groups. Results show that ethnicity-based name partitioning can substantially improve disambiguation performance because the individual models are better suited for their respective name group. The improvements occur across all ethnic name groups with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity-specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes as well as one in which all ethnic name groups are controlled for the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation.
Content: Cf. https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24459.
3. Kim, J.(im) ; Kim, J.(enna): Effect of forename string on author name disambiguation.
In: Journal of the Association for Information Science and Technology. 71(2020) no.7, pp. 839-855.
Abstract: In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite such a crucial role of forenames, their effect on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonym) and some authors share the same forenames (homonym). The results show that increasing the ratios of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratios of full forenames increase, however, they become marginal compared to those by string matching. Using a small portion of forename strings does not much reduce the performance of either heuristic or algorithmic disambiguation methods compared to using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames into a full-string format via record linkage for improved disambiguation performance.
4. Kim, J.: Author-based analysis of conference versus journal publication in computer science.
In: Journal of the Association for Information Science and Technology. 70(2019) no.1, pp. 71-82.
Abstract: Conference publications in computer science (CS) have attracted scholarly attention due to their unique status as a main research outlet, unlike other science fields where journals are dominantly used for communicating research findings. One frequent research question has been how different conference and journal publications are, considering an article as a unit of analysis. This study takes an author-based approach to analyze the publishing patterns of 517,763 scholars who have ever published both in CS conferences and journals for the last 57 years, as recorded in DBLP. The analysis shows that the majority of CS scholars tend to make their scholarly debut, publish more articles, and collaborate with more coauthors in conferences than in journals. Importantly, conference articles seem to serve as a distinct channel of scholarly communication, not a mere preceding step to journal publications: coauthors and title words of authors across conferences and journals tend not to overlap much. This study corroborates findings of previous studies on this topic from a distinctive perspective and suggests that conference authorship in CS calls for more special attention from scholars and administrators outside CS who have focused on journal publications to mine authorship data and evaluate scholarly performance.
Content: Cf. https://onlinelibrary.wiley.com/doi/10.1002/asi.24079.
5. Kim, J.: Scale-free collaboration networks : an author name disambiguation perspective.
In: Journal of the Association for Information Science and Technology. 70(2019) no.7, pp. 685-700.
Abstract: Several studies have found that collaboration networks are scale-free, proposing that such networks can be modeled by specific network evolution mechanisms like preferential attachment. This study argues that collaboration networks can look more or less scale-free depending on the methods for resolving author name ambiguity in bibliographic data. Analyzing networks constructed from multiple datasets containing 3.4 million to 9.6 million publication records, this study shows that collaboration networks in which author names are disambiguated by the commonly used heuristic, i.e., forename-initial-based name matching, tend to produce degree distributions better fitted to power-law slopes with the typical scaling parameter (2 < α < 3) than networks disambiguated by more accurate algorithm-based methods. This tendency is observed across collaboration networks generated under various conditions such as cumulative years, 5- and 1-year sliding windows, and random sampling, and is found through simulation to arise mainly from artefactual entities created by inaccurate disambiguation. This cautionary study calls for special attention from scholars analyzing network data in which entities such as people, organizations, and genes can be merged or split by improper disambiguation.
Content: Cf. https://onlinelibrary.wiley.com/doi/10.1002/asi.24158.
6. Lee, S. ; Ha, T. ; Lee, D. ; Kim, J.H.: Understanding the majority opinion formation process in online environments : an exploratory approach to Facebook.
In: Information processing and management. 54(2018) no.6, pp. 1115-1128.
Abstract: Majority opinions are often observed in the process of social interaction in online communities, but few studies have addressed this issue with empirical data. To identify an appropriate theoretical lens for explaining majority opinions in online environments, this study investigates the skewness statistic, which indicates how many "Likes" are skewed to major comments on a Facebook post; 3489 posts are gathered from the New York Times Facebook page for 100 days. Results show that time is not an influential factor for skewness increase, but the number of comments has a logarithmic relation to skewness increase. Regression models and Chow tests show that this relationship differs depending on topic contents, but majority opinions are significant overall. These results suggest that the bandwagon effect due to social affordance can be a suitable mechanism for explaining majority opinion formation in an online environment and that majority opinions in online communities can be misperceived due to overestimation.
Content: Cf. https://doi.org/10.1016/j.ipm.2018.08.002.
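Illustration: The abstract above relies on a skewness statistic over the "Likes" that the comments on a post receive. The record does not say which estimator the authors used, so the following is only a minimal sketch assuming the common moment-based (Fisher-Pearson) coefficient; the function name and the like counts are hypothetical.

```python
def skewness(values):
    """Moment-based (Fisher-Pearson) skewness g1 = m3 / m2**1.5.

    Assumption: this estimator is chosen for illustration only; the
    catalogue record does not state the formula used in the study.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5


# Hypothetical like counts for the comments on one post:
# one comment attracts almost all of the likes.
likes_concentrated = [1, 1, 1, 1, 50]
# Likes spread symmetrically around the mean.
likes_even = [1, 2, 3]

print(skewness(likes_concentrated))  # positive: likes pile up on one comment
print(skewness(likes_even))          # zero: no dominant comment
```

A large positive value corresponds to the situation the abstract calls a majority opinion, where a few comments collect most of the likes.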
7. Banerjee, S. ; Chua, A.Y.K. ; Kim, J.-J.: Don't be deceived : using linguistic analysis to learn how to discern online review authenticity.
In: Journal of the Association for Information Science and Technology. 68(2017) no.6, pp. 1525-1538.
Abstract: This article uses linguistic analysis to help users discern the authenticity of online reviews. Two related studies were conducted using hotel reviews as the test case for investigation. The first study analyzed 1,800 authentic and fictitious reviews based on the linguistic cues of comprehensibility, specificity, exaggeration, and negligence. The analysis involved classification algorithms followed by feature selection and statistical tests. A filtered set of variables that helped discern review authenticity was identified. The second study incorporated these variables to develop a guideline that aimed to inform humans how to distinguish between authentic and fictitious reviews. The guideline was used as an intervention in an experimental setup that involved 240 participants. The intervention improved human ability to identify fictitious reviews amid authentic ones.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23784/full.
8. Kim, J. ; Diesner, J.: Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks.
In: Journal of the Association for Information Science and Technology. 67(2016) no.6, pp. 1446-1461.
Abstract: Scholars have often relied on name initials to resolve name ambiguities in large-scale coauthorship network research. This approach bears the risk of incorrectly merging or splitting author identities. The use of initial-based disambiguation has been justified by the assumption that such errors would not affect research findings too much. This paper tests that assumption by analyzing coauthorship networks from five academic fields (biology, computer science, nanoscience, neuroscience, and physics) and an interdisciplinary journal, PNAS. Name instances in data sets of this study were disambiguated based on heuristics gained from previous algorithmic disambiguation solutions. We use disambiguated data as a proxy of ground-truth to test the performance of three types of initial-based disambiguation. Our results show that initial-based disambiguation can misrepresent statistical properties of coauthorship networks: It deflates the number of unique authors, number of components, average shortest paths, clustering coefficient, and assortativity, while it inflates average productivity, density, average coauthor number per author, and largest component size. Also, on average, more than half of top 10 productive or collaborative authors drop off the lists. Asian names were found to account for the majority of misidentification by initial-based disambiguation due to their common surname and given name initials.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23489/abstract.
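Illustration: The initial-based disambiguation discussed in the abstract above treats name instances with the same surname and forename initial as one author. The following is a minimal sketch of one such heuristic (surname plus first forename initial); the record does not specify the three variants tested in the paper, and the function names and name instances here are hypothetical.

```python
from collections import defaultdict


def first_initial_key(name):
    """Reduce 'Surname, Forename' to a 'surname_f' matching key.

    Assumption: names follow the 'Surname, Forename' pattern used in
    this catalogue; other formats are not handled.
    """
    surname, _, forenames = name.partition(",")
    forenames = forenames.strip()
    initial = forenames[0].lower() if forenames else ""
    return f"{surname.strip().lower()}_{initial}"


def group_by_first_initial(names):
    """Group name instances that the heuristic treats as one author."""
    groups = defaultdict(list)
    for name in names:
        groups[first_initial_key(name)].append(name)
    return dict(groups)


# Hypothetical name instances: two distinct people plus an initialized form.
names = ["Kim, Jae", "Kim, Joon", "Kim, J.", "Lee, Dana"]
groups = group_by_first_initial(names)
for key, members in groups.items():
    print(key, members)
```

In this sketch the two distinct Kim forenames collapse into a single key, which is the kind of merging error the abstract reports as deflating the number of unique authors.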
9. Kim, J. ; Thomas, P. ; Sankaranarayana, R. ; Gedeon, T. ; Yoon, H.-J.: Understanding eye movements on mobile devices for better presentation of search results.
In: Journal of the Association for Information Science and Technology. 67(2016) no.11, pp. 2607-2619.
Abstract: Compared to the early versions of smart phones, recent mobile devices have bigger screens that can present more web search results. Several previous studies have reported differences in user interaction between conventional desktop computer and mobile device-based web searches, so it is imperative to consider the differences in user behavior for web search engine interface design on mobile devices. However, it is still unknown how the diversification of screen sizes on hand-held devices affects how users search. In this article, we investigate search performance and behavior on three different small screen sizes: early smart phones, recent smart phones, and phablets. We found no significant difference with respect to the efficiency of carrying out tasks; however, participants exhibited different search behaviors: less eye movement within top links on the larger screen, fast reading with some hesitation before choosing a link on the medium, and frequent use of scrolling on the small screen. This result suggests that the presentation of web search results for each screen needs to take into account differences in search behavior. We suggest several ideas for presentation design for each screen size.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23628/full.
10. Kim, J. ; Thomas, P. ; Sankaranarayana, R. ; Gedeon, T. ; Yoon, H.-J.: Eye-tracking analysis of user behavior and performance in web search on large and small screens.
In: Journal of the Association for Information Science and Technology. 66(2015) no.3, pp. 526-544.
Abstract: In recent years, searching the web on mobile devices has become enormously popular. Because mobile devices have relatively small screens and show fewer search results, search behavior with mobile devices may be different from that with desktops or laptops. Therefore, examining these differences may suggest better, more efficient designs for mobile search engines. In this experiment, we use eye tracking to explore user behavior and performance. We analyze web searches with 2 task types on 2 differently sized screens: one for a desktop and the other for a mobile device. In addition, we examine the relationships between search performance and several search behaviors to allow further investigation of the differences engendered by the screens. We found that users have more difficulty extracting information from search results pages on the smaller screens, although they exhibit less eye movement as a result of an infrequent use of the scroll function. However, in terms of search performance, our findings suggest that there is no significant difference between the 2 screens in time spent on search results pages and the accuracy of finding answers. This suggests several possible ideas for the presentation design of search results pages on small devices.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23187/abstract.
Subject area: User studies ; Search tactics
11. Kim, J. ; Diesner, J.: Coauthorship networks : a directed network approach considering the order and number of coauthors.
In: Journal of the Association for Information Science and Technology. 66(2015) no.12, pp. 2685-2696.
Abstract: In many scientific fields, the order of coauthors on a paper conveys information about each individual's contribution to a piece of joint work. We argue that in prior network analyses of coauthorship networks, the information on ordering has been insufficiently considered because ties between authors are typically symmetrized. This is basically the same as assuming that each coauthor has contributed equally to a paper. We introduce a solution to this problem by adopting a coauthorship credit allocation model proposed by Kim and Diesner (2014), which in its core conceptualizes coauthoring as a directed, weighted, and self-looped network. We test and validate our application of the adopted framework based on a sample data of 861 authors who have published in the journal Psychometrika. The results suggest that this novel sociometric approach can complement traditional measures based on undirected networks and expand insights into coauthoring patterns such as the hierarchy of collaboration among scholars. As another form of validation, we also show how our approach accurately detects prominent scholars in the Psychometric Society affiliated with the journal.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23361/abstract.
12. Kim, J.-A.: Understanding knowledge representation in the knowledge management environment : evaluation of ontology visualization methods.
In: Knowledge organization. 39(2012) no.3, pp. 193-203.
Abstract: The application of effective mechanisms for organizing knowledge has been of great concern to help the user discover and share knowledge. Ontology provides the foundation for knowledge organization and sharing by supporting the specification of knowledge structure. The visualization of ontology provides new possibilities for presenting knowledge representation, but the effectiveness of visualization has not been proven. This study examines user performance and perception with ontology visualization methods and provides suggestions for the design of ontology visualization. Differences in user performance based on ontology visualization methods were examined in terms of task completion time and frequency of interaction. Also user perceptions on the usability of ontology visualization methods were examined in terms of ease of use, comprehension of visualization style, comprehension of properties, and subjective satisfaction.
Content: Cf. http://www.ergon-verlag.de/isko_ko/downloads/ko_39_2012_3_e.pdf.
13. Kim, J.H. ; Barnett, G.A. ; Park, H.W.: A hyperlink and issue network analysis of the United States Senate : a rediscovery of the Web as a relational and topical medium.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.8, pp. 1598-1611.
Abstract: Politicians' Web sites have been considered a medium for organizing, mobilizing, and agenda-setting, but extant literature lacks a systematic approach to interpret the Web sites of senators - a new medium for political communication. This study classifies the role of political Web sites into relational (hyperlinking) and topical (shared-issues) aspects. The two aspects may be viewed from a social embeddedness perspective and three facets, as K. Foot and S. Schneider suggested. This study employed network analysis, a set of research procedures for identifying structures in social systems, as the basis of the relations among the system's components rather than the attributes of individuals. Hyperlink and issue data were gathered from the United States Senate Web site and Yahoo. Major findings include: (a) The hyperlinks are more targeted at Democratic senators than at Republicans and are a means of communication for senators and users; (b) the issue network found from the Web is used for discussing public agendas and is more highly utilized by Republican senators; (c) the hyperlink and issue networks are correlated; and (d) social relationships and issue ecologies can be effectively detected by these two networks. The need for further research is addressed.
14. Kim, J.: Faculty self-archiving : motivations and barriers.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.9, pp. 1909-1922.
Abstract: This study investigated factors that motivate or impede faculty participation in self-archiving practices, that is, the placement of research work in various open access (OA) venues, ranging from personal Web pages to OA archives. The author's research design involves triangulation of survey and interview data from 17 Carnegie doctorate universities with DSpace institutional repositories. The analysis of survey responses from 684 professors and 41 telephone interviews identified seven significant factors: (a) altruism, i.e., the idea of providing OA benefits for users; (b) perceived self-archiving culture; (c) copyright concerns; (d) technical skills; (e) age; (f) perception of no harmful impact of self-archiving on tenure and promotion; and (g) concerns about additional time and effort. The factors are listed in descending order of their effect size. Age, copyright concerns, and additional time and effort are negatively associated with self-archiving, whereas remaining factors are positively related to it. Faculty are motivated by OA advantages to users, disciplinary norms, and no negative influence on academic reward. However, barriers to self-archiving (concerns about copyright, extra time and effort, technical ability, and age) imply that the provision of services to assist faculty with copyright management, and with technical and logistical issues, could encourage higher rates of self-archiving.
15. Kim, J.: Describing and predicting information-seeking behavior on the Web.
In: Journal of the American Society for Information Science and Technology. 60(2009) no.4, pp. 679-693.
Abstract: This study focuses on the task as a fundamental factor in the context of information seeking. The purpose of the study is to characterize kinds of tasks and to examine how different kinds of task give rise to different kinds of information-seeking behavior on the Web. For this, a model for information-seeking behavior was used employing dimensions of information-seeking strategies (ISS), which are based on several behavioral dimensions. The analysis of strategies was based on data collected through an experiment designed to observe users' behaviors. Three tasks were assigned to 30 graduate students and data were collected using questionnaires, search logs, and interviews. The qualitative and quantitative analysis of the data identified 14 distinct information-seeking strategies. The analysis showed significant differences in the frequencies and patterns of ISS employed between three tasks. The results of the study are intended to facilitate the development of task-based information-seeking models and to further suggest Web information system designs that support the user's diverse tasks.
Subject area: Information services ; User studies
16. Son, H.-J. ; Kim, S.-H. ; Kim, J.-S.: Text image matching without language model using a Hausdorff distance.
In: Information processing and management. 44(2008) no.3, pp. 1189-1200.
Abstract: In this paper, we propose a text matching method for document image retrieval without any language model. Two word images are first normalized to an appropriate size and image features are extracted using the local crowdedness method. Similarity between the two features is then measured by calculating a Hausdorff distance. We performed three experiments. The first experiment proves the effectiveness of the proposed method for text matching, and the other two experiments verify the language independence and font size independence of the proposed method.
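Illustration: The abstract above measures similarity between two sets of image features with a Hausdorff distance. The following is a minimal sketch of the standard symmetric Hausdorff distance between two finite point sets; it does not reproduce the paper's size normalization or its local crowdedness features, and the sample coordinates are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical feature points extracted from two word images.
word_a = [(0.0, 0.0), (1.0, 0.0)]
word_b = [(0.0, 0.0), (4.0, 0.0)]

print(directed_hausdorff(word_a, word_b))  # 1.0
print(directed_hausdorff(word_b, word_a))  # 3.0
print(hausdorff(word_a, word_b))           # 3.0
```

The two directed distances differ, which is why the symmetric form takes their maximum: a smaller value means that every point of each set has a close counterpart in the other set.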
17. Kim, J.-M. ; Shin, H. ; Kim, H.-J.: Schema and constraints-based matching and merging of Topic Maps.
In: Information processing and management. 43(2007) no.4, pp. 930-945.
Abstract: In this paper, we propose a multi-strategic matching and merging approach to find correspondences between ontologies based on the syntactic or semantic characteristics and constraints of the Topic Maps. Our multi-strategic matching approach consists of a linguistic module and a Topic Map constraints-based module. A linguistic module computes similarities between concepts using morphological analysis, string normalization and tokenization and language-dependent heuristics. A Topic Map constraints-based module takes advantage of several Topic Maps-dependent techniques such as a topic property-based matching, a hierarchy-based matching, and an association-based matching. This is a composite matching procedure and need not generate a cross-pair of all topics from the ontologies because unmatched pairs of topics can be removed by characteristics and constraints of the Topic Maps. Merging between Topic Maps follows the matching operations. We set up the MERGE function to integrate two Topic Maps into a new Topic Map, which satisfies such merge requirements as entity preservation, property preservation, relation preservation, and conflict resolution. For our experiments, we used oriental philosophy ontologies, western philosophy ontologies, Yahoo western philosophy dictionary, and Wikipedia philosophy ontology as input ontologies. Our experiments show that the automatically generated matching results conform to the outputs generated manually by domain experts and can be of great benefit to the following merging operations.
Subject area: Semantic interoperability
18. Kang, I.-S. ; Na, S.-H. ; Kim, J. ; Lee, J.-H.: Cluster-based patent retrieval.
In: Information processing and management. 43(2007) no.5, pp. 1173-1182.
Abstract: Through the recent NTCIR workshops, patent retrieval has posed many challenging issues to the information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics raise the necessity to reassess existing retrieval techniques that have been mainly developed for structure-less and short documents such as newspapers. This study investigates cluster-based retrieval in the context of the invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters would provide additional evidence to match a user's information need. Thus far, cluster-based retrieval approaches have relied on automatically created clusters. Fortunately, all patents have manually assigned cluster information: international patent classification codes. International patent classification is a standard taxonomy for classifying patents, and currently has about 69,000 nodes which are organized into a five-level hierarchical system. Thus, patent documents could provide the best test bed to develop and evaluate cluster-based retrieval techniques. Experiments using the NTCIR-4 patent collection showed that the cluster-based language model could be helpful in improving the cluster-less baseline language model.
Note: Introduction to a special topic section on "patent processing"
19. Kim, J.-H. ; Choi, K.-S.: Patent document categorization based on semantic structural information.
In: Information processing and management. 43(2007) no.5, pp. 1200-1215.
Abstract: The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial to search for relevant existing patents in a certain field, patent categorization is a very important and useful field. As patent documents are structural documents with their own characteristics distinguished from general documents, these unique traits should be considered in the patent categorization process. In this paper, we categorize Japanese patent documents automatically, focusing on their characteristics: patents are structured by claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method that uses the k-NN (k-Nearest Neighbour) approach. In order to retrieve similar documents from a training document set, some specific components to denote the so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole texts. Because those specific components are identified by various user-defined tags, first all of the components are clustered into several semantic elements. Such semantically clustered structural components are the basic features of patent categorization. We can achieve a 74% improvement of categorization performance over a baseline system that does not use the structural information of the patent.
Note: Contribution within a special topic section, "special issue on patent processing"
20. Kim, J.-A.: Toward an understanding of Web-based subscription database acceptance.
In: Journal of the American Society for Information Science and Technology. 57(2006) no.13, pp. 1715-1728.
Abstract: Underutilization of Web-based subscription databases and the importance of promoting them have been recognized in previous research. To determine the factors affecting user acceptance of Web-based subscription databases, this study tests an integrated model of the antecedents and consequences of user beliefs about intended use by extending the technology acceptance model. The research employs a cross-sectional field study using a Web survey method targeting undergraduate students who have experience with Web-based subscription databases. Overall, the research model performs well in explaining user acceptance of Web-based subscription databases. The effects of the cognitive instrumental determinants of usefulness perceptions are examined. Terminology clarity and accessibility were found to be important determinants for ease of use of the databases. The results indicate that user training has no impact on either perceptions of usefulness or ease of use, and that there is a need to reexamine the effectiveness of user training in the context of Web-based subscription databases. The results suggest that user acceptance of the databases depends largely on the utility they offer. The findings also suggest that although a subjective norm does not directly affect intended use, it exerts a positive influence on user beliefs about the utility of the databases.