Diese Datenbank enthält über 40.000 Dokumente zu Themen aus den Bereichen Formalerschließung – Inhaltserschließung – Information Retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (Stand: 04. Juni 2021)
1Vegt, A. van der ; Zuccon, G. ; Koopman, B.: Do better search engines really equate to better clinical decisions? : If not, why not?.
In: Journal of the Association for Information Science and Technology. 72(2021) no.2, S.141-155.
Abstract: Previous research has found that improved search engine effectiveness-evaluated using a batch-style approach-does not always translate to significant improvements in user task performance; however, these prior studies focused on simple recall and precision-based search tasks. We investigated the same relationship, but for realistic, complex search tasks required in clinical decision making. One hundred and nine clinicians and final year medical students answered 16 clinical questions. Although the search engine did improve answer accuracy by 20 percentage points, there was no significant difference when participants used a more effective, state-of-the-art search engine. We also found that the search engine effectiveness difference, identified in the lab, was diminished by around 70% when the search engines were used with real users. Despite the aid of the search engine, half of the clinical questions were answered incorrectly. We further identified the relative contribution of search engine effectiveness to the overall end task success. We found that the ability to interpret documents correctly was a much more important factor impacting task success. If these findings are representative, information retrieval research may need to reorient its emphasis towards helping users to better understand information, rather than just finding it for them.
Inhalt: Vgl.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24398.
Themenfeld: Suchmaschinen ; Retrievalstudien
2Parapar, J. ; Losada, D.E. ; Presedo-Quindimil, M.A. ; Barreiro, A.: Using score distributions to compare statistical significance tests for information retrieval evaluation.
In: Journal of the Association for Information Science and Technology. 71(2020) no.1, S.98-113.
Abstract: Statistical significance tests can provide evidence that the observed difference in performance between 2 methods is not due to chance. In information retrieval (IR), some studies have examined the validity and suitability of such tests for comparing search systems. We argue here that current methods for assessing the reliability of statistical tests suffer from some methodological weaknesses, and we propose a novel way to study significance tests for retrieval evaluation. Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests. A key strength of this approach is that we assess statistical tests under perfect knowledge about the truth or falseness of the null hypothesis. This new method for studying the power of significance tests in IR evaluation is formal and innovative. Following this type of analysis, we found that both the sign test and Wilcoxon signed test have more power than the permutation test and the t-test. The sign test and Wilcoxon signed test also have good behavior in terms of type I errors. The bootstrap test shows few type I errors, but it has less power than the other methods tested.
Inhalt: Vgl.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24203.
3Losada, D.E. ; Parapar, J. ; Barreiro, A.: When to stop making relevance judgments? : a study of stopping methods for building information retrieval test collections.
In: Journal of the Association for Information Science and Technology. 70(2019) no.1, S.49-60.
Abstract: In information retrieval evaluation, pooling is a well-known technique to extract a sample of documents to be assessed for relevance. Given the pooled documents, a number of studies have proposed different prioritization methods to adjudicate documents for judgment. These methods follow different strategies to reduce the assessment effort. However, there is no clear guidance on how many relevance judgments are required for creating a reliable test collection. In this article we investigate and further develop methods to determine when to stop making relevance judgments. We propose a highly diversified set of stopping methods and provide a comprehensive analysis of the usefulness of the resulting test collections. Some of the stopping methods introduced here combine innovative estimates of recall with time series models used in Financial Trading. Experimental results on several representative collections show that some stopping methods can reduce up to 95% of the assessment effort and still produce a robust test collection. We demonstrate that the reduced set of judgments can be reliably employed to compare search systems using disparate effectiveness metrics such as Average Precision, NDCG, P@100, and Rank Biased Precision. With all these measures, the correlations found between full pool rankings and reduced pool rankings is very high.
Inhalt: Vgl.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24077.
4Rajagopal, P. ; Ravana, S.D. ; Koh, Y.S. ; Balakrishnan, V.: Evaluating the effectiveness of information retrieval systems using effort-based relevance judgment.
In: Aslib journal of information management. 71(2019) no.1, S.2-17.
Abstract: Purpose The effort in addition to relevance is a major factor for satisfaction and utility of the document to the actual user. The purpose of this paper is to propose a method in generating relevance judgments that incorporate effort without human judges' involvement. Then the study determines the variation in system rankings due to low effort relevance judgment in evaluating retrieval systems at different depth of evaluation. Design/methodology/approach Effort-based relevance judgments are generated using a proposed boxplot approach for simple document features, HTML features and readability features. The boxplot approach is a simple yet repeatable approach in classifying documents' effort while ensuring outlier scores do not skew the grading of the entire set of documents. Findings The retrieval systems evaluation using low effort relevance judgments has a stronger influence on shallow depth of evaluation compared to deeper depth. It is proved that difference in the system rankings is due to low effort documents and not the number of relevant documents. Originality/value Hence, it is crucial to evaluate retrieval systems at shallow depth using low effort relevance judgments.
Inhalt: Vgl.: https://doi.org/10.1108/AJIM-04-2018-0086.
5Munkelt, J. ; Schaer, P.: Towards an IR test collection for the German National Library.
Anmerkung: Vortrag, Conference: Lernen Wissen Daten Analysen 2018 at Mannheim.
6Sarigil, E. ; Sengor Altingovde, I. ; Blanco, R. ; Barla Cambazoglu, B. ; Ozcan, R. ; Ulusoy, Ö.: Characterizing, predicting, and handling web search queries that match very few or no results.
In: Journal of the Association for Information Science and Technology. 69(2018) no.2, S.256-270.
Abstract: A non-negligible fraction of user queries end up with very few or even no matching results in leading commercial web search engines. In this work, we provide a detailed characterization of such queries and show that search engines try to improve such queries by showing the results of related queries. Through a user study, we show that these query suggestions are usually perceived as relevant. Also, through a query log analysis, we show that the users are dissatisfied after submitting a query that match no results at least 88.5% of the time. As a first step towards solving these no-answer queries, we devised a large number of features that can be used to identify such queries and built machine-learning models. These models can be useful for scenarios such as the mobile- or meta-search, where identifying a query that will retrieve no results at the client device (i.e., even before submitting it to the search engine) may yield gains in terms of the bandwidth usage, power consumption, and/or monetary costs. Experiments over query logs indicate that, despite the heavy skew in class sizes, our models achieve good prediction quality, with accuracy (in terms of area under the curve) up to 0.95.
Inhalt: Vgl.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23955/full.
Themenfeld: Retrievalstudien ; Suchmaschinen
7Hider, P.: ¬The search value added by professional indexing to a bibliographic database.
In: Knowledge organization. 45(2018) no.1, S.23-32.
Abstract: Gross et al. (2015) have demonstrated that about a quarter of hits would typically be lost to keyword searchers if contemporary academic library catalogs dropped their controlled subject headings. This article reports on an investigation of the search value that subject descriptors and identifiers assigned by professional indexers add to a bibliographic database, namely the Australian Education Index (AEI). First, a similar methodology to that developed by Gross et al. (2015) was applied, with keyword searches representing a range of educational topics run on the AEI database with and without its subject indexing. The results indicated that AEI users would also lose, on average, about a quarter of hits per query. Second, an alternative research design was applied in which an experienced literature searcher was asked to find resources on a set of educational topics on an AEI database stripped of its subject indexing and then asked to search for additional resources on the same topics after the subject indexing had been reinserted. In this study, the proportion of additional resources that would have been lost had it not been for the subject indexing was again found to be about a quarter of the total resources found for each topic, on average.
Themenfeld: Retrievalstudien ; Volltextretrieval
8Munkelt, J.: Erstellung einer DNB-Retrieval-Testkollektion.
Köln : Technische Hochschule, Fakultät für Informations- und Kommunikationswissenschaften, 2018. II, 79 S.
Abstract: Seit Herbst 2017 findet in der Deutschen Nationalbibliothek die Inhaltserschließung bestimmter Medienwerke rein maschinell statt. Die Qualität dieses Verfahrens, das die Prozessorganisation von Bibliotheken maßgeblich prägen kann, wird unter Fachleuten kontrovers diskutiert. Ihre Standpunkte werden zunächst hinreichend erläutert, ehe die Notwendigkeit einer Qualitätsprüfung des Verfahrens und dessen Grundlagen dargelegt werden. Zentraler Bestandteil einer künftigen Prüfung ist eine Testkollektion. Ihre Erstellung und deren Dokumentation steht im Fokus dieser Arbeit. In diesem Zusammenhang werden auch die Entstehungsgeschichte und Anforderungen an gelungene Testkollektionen behandelt. Abschließend wird ein Retrievaltest durchgeführt, der die Einsatzfähigkeit der erarbeiteten Testkollektion belegt. Seine Ergebnisse dienen ausschließlich der Funktionsüberprüfung. Eine Qualitätsbeurteilung maschineller Inhaltserschließung im Speziellen sowie im Allgemeinen findet nicht statt und ist nicht Ziel der Ausarbeitung.
Inhalt: Bachelorarbeit, Bibliothekswissenschaften, Fakultät für Informations- und Kommunikationswissenschaften, Technische Hochschule Köln
Themenfeld: Retrievalstudien ; Automatisches Indexieren
9Munkelt, J. ; Schaer, P. ; Lepsky, K.: Towards an IR test collection for the German National Library.[Preprint].
Abstract: Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of their catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series Band H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatical since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work or in other words to which degree can the automatic indexing replace people without a signi cant drop in regards to quality?
Themenfeld: Retrievalstudien ; Automatisches Indexieren
10Angelini, M. ; Fazzini, V. ; Ferro, N. ; Santucci, G. ; Silvello, G.: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation.
In: Information processing and management. 54(2018) no.6, S.1077-1100.
Abstract: Information Retrieval (IR) develops complex systems, constituted of several components, which aim at returning and optimally ranking the most relevant documents in response to user queries. In this context, experimental evaluation plays a central role, since it allows for measuring IR systems effectiveness, increasing the understanding of their functioning, and better directing the efforts for improving them. Current evaluation methodologies are limited by two major factors: (i) IR systems are evaluated as "black boxes", since it is not possible to decompose the contributions of the different components, e.g., stop lists, stemmers, and IR models; (ii) given that it is not possible to predict the effectiveness of an IR system, both academia and industry need to explore huge numbers of systems, originated by large combinatorial compositions of their components, to understand how they perform and how these components interact together. We propose a Combinatorial visuaL Analytics system for Information Retrieval Evaluation (CLAIRE) which allows for exploring and making sense of the performances of a large amount of IR systems, in order to quickly and intuitively grasp which system configurations are preferred, what are the contributions of the different components and how these components interact together. The CLAIRE system is then validated against use cases based on several test collections using a wide set of systems, generated by a combinatorial composition of several off-the-shelf components, representing the most common denominator almost always present in English IR systems. In particular, we validate the findings enabled by CLAIRE with respect to consolidated deep statistical analyses and we show that the CLAIRE system allows the generation of new insights, which were not detectable with traditional approaches.
Inhalt: Vgl.: https://doi.org/10.1016/j.ipm.2018.04.006.
11Kutlu, M. ; Elsayed, T. ; Lease, M.: Intelligent topic selection for low-cost information retrieval evaluation : a new perspective on deep vs. shallow judging.
In: Information processing and management. 54(2018) no.1, S.37-59.
Abstract: While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today's massive document collections (e.g., ClueWeb12's 700M+ Webpages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). While prior work on intelligent topic selection has never been evaluated against shallow judging baselines, prior work on deep vs. shallow judging has largely argued for shallowed judging, but assuming random topic selection. We argue that for evaluating any topic selection method, ultimately one must ask whether it is actually useful to select topics, or should one simply perform shallow judging over many topics? In seeking a rigorous answer to this over-arching question, we conduct a comprehensive investigation over a set of relevant factors never previously studied together: 1) method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resultant quality of judgments. Experiments on NIST TREC Robust 2003 and Robust 2004 test collections show that not only can we reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom in showing that deep judging is often preferable to shallow judging when topics are selected intelligently.
Inhalt: Vgl.: https://doi.org/10.1016/j.ipm.2017.09.002.
12Behnert, C. ; Lewandowski, D.: ¬A framework for designing retrieval effectiveness studies of library information systems using human relevance assessments.
In: Journal of documentation. 73(2017) no.3, S.509-527.
Abstract: Purpose This paper demonstrates how to apply traditional information retrieval evaluation methods based on standards from the Text REtrieval Conference (TREC) and web search evaluation to all types of modern library information systems including online public access catalogs, discovery systems, and digital libraries that provide web search features to gather information from heterogeneous sources. Design/methodology/approach We apply conventional procedures from information retrieval evaluation to the library information system context considering the specific characteristics of modern library materials. Findings We introduce a framework consisting of five parts: (1) search queries, (2) search results, (3) assessors, (4) testing, and (5) data analysis. We show how to deal with comparability problems resulting from diverse document types, e.g., electronic articles vs. printed monographs and what issues need to be considered for retrieval tests in the library context. Practical implications The framework can be used as a guideline for conducting retrieval effectiveness studies in the library context. Originality/value Although a considerable amount of research has been done on information retrieval evaluation, and standards for conducting retrieval effectiveness studies do exist, to our knowledge this is the first attempt to provide a systematic framework for evaluating the retrieval effectiveness of twenty-first-century library information systems. We demonstrate which issues must be considered and what decisions must be made by researchers prior to a retrieval test.
Inhalt: Vgl.: http://www.emeraldinsight.com/doi/pdfplus/10.1108/JD-08-2016-0099.
13Leiva-Mederos, A. ; Senso, J.A. ; Hidalgo-Delgado, Y. ; Hipola, P.: Working framework of semantic interoperability for CRIS with heterogeneous data sources.
In: Journal of documentation. 73(2017) no.3, S.481-499.
Abstract: Purpose Information from Current Research Information Systems (CRIS) is stored in different formats, in platforms that are not compatible, or even in independent networks. It would be helpful to have a well-defined methodology to allow for management data processing from a single site, so as to take advantage of the capacity to link disperse data found in different systems, platforms, sources and/or formats. Based on functionalities and materials of the VLIR project, the purpose of this paper is to present a model that provides for interoperability by means of semantic alignment techniques and metadata crosswalks, and facilitates the fusion of information stored in diverse sources. Design/methodology/approach After reviewing the state of the art regarding the diverse mechanisms for achieving semantic interoperability, the paper analyzes the following: the specific coverage of the data sets (type of data, thematic coverage and geographic coverage); the technical specifications needed to retrieve and analyze a distribution of the data set (format, protocol, etc.); the conditions of re-utilization (copyright and licenses); and the "dimensions" included in the data set as well as the semantics of these dimensions (the syntax and the taxonomies of reference). The semantic interoperability framework here presented implements semantic alignment and metadata crosswalk to convert information from three different systems (ABCD, Moodle and DSpace) to integrate all the databases in a single RDF file. Findings The paper also includes an evaluation based on the comparison - by means of calculations of recall and precision - of the proposed model and identical consultations made on Open Archives Initiative and SQL, in order to estimate its efficiency. The results have been satisfactory enough, due to the fact that the semantic interoperability facilitates the exact retrieval of information. Originality/value The proposed model enhances management of the syntactic and semantic interoperability of the CRIS system designed. In a real setting of use it achieves very positive results.
Inhalt: Vgl.: http://www.emeraldinsight.com/doi/full/10.1108/JD-07-2016-0091.
Themenfeld: Semantische Interoperabilität ; Retrievalstudien
14Hider, P.: ¬The search value added by professional indexing to a bibliographic database.
In: http://www.iskocus.org/NASKO2017papers/NASKO2017_paper_33.pdf [NASKO 2017, June 15-16, 2017, Champaign, IL, USA].
Abstract: Gross et al. (2015) have demonstrated that about a quarter of hits would typically be lost to keyword searchers if contemporary academic library catalogs dropped their controlled subject headings. This paper reports on an analysis of the loss levels that would result if a bibliographic database, namely the Australian Education Index (AEI), were missing the subject descriptors and identifiers assigned by its professional indexers, employing the methodology developed by Gross and Taylor (2005), and later by Gross et al. (2015). The results indicate that AEI users would lose a similar proportion of hits per query to that experienced by library catalog users: on average, 27% of the resources found by a sample of keyword queries on the AEI database would not have been found without the subject indexing, based on the Australian Thesaurus of Education Descriptors (ATED). The paper also discusses the methodological limitations of these studies, pointing out that real-life users might still find some of the resources missed by a particular query through follow-up searches, while additional resources might also be found through iterative searching on the subject vocabulary. The paper goes on to describe a new research design, based on a before - and - after experiment, which addresses some of these limitations. It is argued that this alternative design will provide a more realistic picture of the value that professionally assigned subject indexing and controlled subject vocabularies can add to literature searching of a more scholarly and thorough kind.
Inhalt: Beitrag bei: NASKO 2017: Visualizing Knowledge Organization: Bringing Focus to Abstract Realities. The sixth North American Symposium on Knowledge Organization (NASKO 2017), June 15-16, 2017, in Champaign, IL, USA.
Themenfeld: Retrievalstudien ; Volltextretrieval
15Li, J. ; Zhang, P. ; Song, D. ; Wu, Y.: Understanding an enriched multidimensional user relevance model by analyzing query logs.
In: Journal of the Association for Information Science and Technology. 68(2017) no.12, S.2743-2754.
Abstract: Modeling multidimensional relevance in information retrieval (IR) has attracted much attention in recent years. However, most existing studies are conducted through relatively small-scale user studies, which may not reflect a real-world and natural search scenario. In this article, we propose to study the multidimensional user relevance model (MURM) on large scale query logs, which record users' various search behaviors (e.g., query reformulations, clicks and dwelling time, etc.) in natural search settings. We advance an existing MURM model (including five dimensions: topicality, novelty, reliability, understandability, and scope) by providing two additional dimensions, that is, interest and habit. The two new dimensions represent personalized relevance judgment on retrieved documents. Further, for each dimension in the enriched MURM model, a set of computable features are formulated. By conducting extensive document ranking experiments on Bing's query logs and TREC session Track data, we systematically investigated the impact of each dimension on retrieval performance and gained a series of insightful findings which may bring benefits for the design of future IR systems.
Inhalt: Vgl.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23868/full.
16Losada, D.E. ; Parapar, J. ; Barreiro, A.: Multi-armed bandits for adjudicating documents in pooling-based evaluation of information retrieval systems.
In: Information processing and management. 53(2017) no.5, S.1005-1025.
Abstract: Evaluating Information Retrieval systems is crucial to making progress in search technologies. Evaluation is often based on assembling reference collections consisting of documents, queries and relevance judgments done by humans. In large-scale environments, exhaustively judging relevance becomes infeasible. Instead, only a pool of documents is judged for relevance. By selectively choosing documents from the pool we can optimize the number of judgments required to identify a given number of relevant documents. We argue that this iterative selection process can be naturally modeled as a reinforcement learning problem and propose innovative and formal adjudication methods based on multi-armed bandits. Casting document judging as a multi-armed bandit problem is not only theoretically appealing, but also leads to highly effective adjudication methods. Under this bandit allocation framework, we consider stationary and non-stationary models and propose seven new document adjudication methods (five stationary methods and two non-stationary variants). Our paper also reports a series of experiments performed to thoroughly compare our new methods against current adjudication methods. This comparative study includes existing methods designed for pooling-based evaluation and existing methods designed for metasearch. Our experiments show that our theoretically grounded adjudication methods can substantially minimize the assessment effort.
Inhalt: Vgl.: https://doi.org/10.1016/j.ipm.2017.04.005.
17White, H.D.: Relevance theory and distributions of judgments in document retrieval.
In: Information processing and management. 53(2017) no.5, S.1080-1102.
Abstract: This article extends relevance theory (RT) from linguistic pragmatics into information retrieval. Using more than 50 retrieval experiments from the literature as examples, it applies RT to explain the frequency distributions of documents on relevance scales with three or more points. The scale points, which judges in experiments must consider in addition to queries and documents, are communications from researchers. In RT, the relevance of a communication varies directly with its cognitive effects and inversely with the effort of processing it. Researchers define and/or label the scale points to measure the cognitive effects of documents on judges. However, they apparently assume that all scale points as presented are equally easy for judges to process. Yet the notion that points cost variable effort explains fairly well the frequency distributions of judgments across them. By hypothesis, points that cost more effort are chosen by judges less frequently. Effort varies with the vagueness or strictness of scale-point labels and definitions. It is shown that vague scales tend to produce U- or V-shaped distributions, while strict scales tend to produce right-skewed distributions. These results reinforce the paper's more general argument that RT clarifies the concept of relevance in the dialogues of retrieval evaluation.
Inhalt: Vgl.: https://doi.org/10.1016/j.ipm.2017.02.010.
18Schultz Jr., W.N. ; Braddy, L.: ¬A librarian-centered study of perceptions of subject terms and controlled vocabulary.
In: Cataloging and classification quarterly. 55(2017) no.7/8, S.456-466.
Abstract: Controlled vocabulary and subject headings in OPAC records have proven to be useful in improving search results. The authors used a survey to gather information about librarian opinions and professional use of controlled vocabulary. Data from a range of backgrounds and expertise were examined, including academic and public libraries, and technical services as well as public services professionals. Responses overall demonstrated positive opinions of the value of controlled vocabulary, including in reference interactions as well as during bibliographic instruction sessions. Results are also examined based upon factors such as age and type of librarian.
Inhalt: Vgl.: https://doi.org/10.1080/01639374.2017.1356781.
Themenfeld: OPAC ; Retrievalstudien
19Dang, E.K.F. ; Luk, R.W.P. ; Allan, J.: ¬A context-dependent relevance model.
In: Journal of the Association for Information Science and Technology. 67(2016) no.3, S.582-593.
Abstract: Numerous past studies have demonstrated the effectiveness of the relevance model (RM) for information retrieval (IR). This approach enables relevance or pseudo-relevance feedback to be incorporated within the language modeling framework of IR. In the traditional RM, the feedback information is used to improve the estimate of the query language model. In this article, we introduce an extension of RM in the setting of relevance feedback. Our method provides an additional way to incorporate feedback via the improvement of the document language models. Specifically, we make use of the context information of known relevant and nonrelevant documents to obtain weighted counts of query terms for estimating the document language models. The context information is based on the words (unigrams or bigrams) appearing within a text window centered on query terms. Experiments on several Text REtrieval Conference (TREC) collections show that our context-dependent relevance model can improve retrieval performance over the baseline RM. Together with previous studies within the BM25 framework, our current study demonstrates that the effectiveness of our method for using context information in IR is quite general and not limited to any specific retrieval model.
Inhalt: Vgl.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23419/abstract.
20Borlund, P.: ¬A study of the use of simulated work task situations in interactive information retrieval evaluations : a meta-evaluation.
In: Journal of documentation. 72(2016) no.3, S.394-413.
Abstract: Purpose - The purpose of this paper is to report a study of how the test instrument of a simulated work task situation is used in empirical evaluations of interactive information retrieval (IIR) and reported in the research literature. In particular, the author is interested to learn whether the requirements of how to employ simulated work task situations are followed, and whether these requirements call for further highlighting and refinement. Design/methodology/approach - In order to study how simulated work task situations are used, the research literature in question is identified. This is done partly via citation analysis by use of Web of Science®, and partly by systematic search of online repositories. On this basis, 67 individual publications were identified and they constitute the sample of analysis. Findings - The analysis reveals a need for clarifications of how to use simulated work task situations in IIR evaluations. In particular, with respect to the design and creation of realistic simulated work task situations. There is a lack of tailoring of the simulated work task situations to the test participants. Likewise, the requirement to include the test participants' personal information needs is neglected. Further, there is a need to add and emphasise a requirement to depict the used simulated work task situations when reporting the IIR studies. Research limitations/implications - Insight about the use of simulated work task situations has implications for test design of IIR studies and hence the knowledge base generated on the basis of such studies. Originality/value - Simulated work task situations are widely used in IIR studies, and the present study is the first comprehensive study of the intended and unintended use of this test instrument since its introduction in the late 1990's. The paper addresses the need to carefully design and tailor simulated work task situations to suit the test participants in order to obtain the intended authentic and realistic IIR under study.
Inhalt: Vgl.: http://dx.doi.org/10.1108/JD-06-2015-0068.