This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 21 January 2019)
1. Rajagopal, P. ; Ravana, S.D. ; Koh, Y.S. ; Balakrishnan, V.: Evaluating the effectiveness of information retrieval systems using effort-based relevance judgment.
In: Aslib journal of information management. 71(2019) no.1, pp. 2-17.
Abstract: Purpose The effort in addition to relevance is a major factor for satisfaction and utility of the document to the actual user. The purpose of this paper is to propose a method in generating relevance judgments that incorporate effort without human judges' involvement. Then the study determines the variation in system rankings due to low effort relevance judgment in evaluating retrieval systems at different depth of evaluation. Design/methodology/approach Effort-based relevance judgments are generated using a proposed boxplot approach for simple document features, HTML features and readability features. The boxplot approach is a simple yet repeatable approach in classifying documents' effort while ensuring outlier scores do not skew the grading of the entire set of documents. Findings The retrieval systems evaluation using low effort relevance judgments has a stronger influence on shallow depth of evaluation compared to deeper depth. It is proved that difference in the system rankings is due to low effort documents and not the number of relevant documents. Originality/value Hence, it is crucial to evaluate retrieval systems at shallow depth using low effort relevance judgments.
Content: Cf.: https://doi.org/10.1108/AJIM-04-2018-0086.
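The abstract above does not spell out the boxplot rules, so the following is only a minimal sketch of the general idea: grade documents by where a feature value falls relative to the quartiles, and set aside values beyond the Tukey fence so that they do not distort the grades. The function name `grade_effort`, the four labels, the use of word counts as the feature, and the 1.5 × IQR fence are all assumptions made for illustration, not the authors' published procedure.

```python
from statistics import quantiles


def grade_effort(scores):
    """Label each score using boxplot quartiles (hypothetical grading scheme)."""
    # Quartiles of the feature distribution (inclusive method, as in a boxplot).
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr  # Tukey fence: values above it are outliers
    labels = []
    for s in scores:
        if s <= q1:
            labels.append("low")
        elif s <= q3:
            labels.append("medium")
        elif s <= upper_fence:
            labels.append("high")
        else:
            labels.append("outlier")  # graded separately so it cannot skew the rest
    return labels


# Hypothetical word counts for eight retrieved documents.
word_counts = [120, 300, 450, 500, 620, 800, 950, 20000]
print(grade_effort(word_counts))
```

Because the thresholds come from quartiles rather than from the mean, the single very long document in the example does not shift the grades of the other seven.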
2. Losada, D.E. ; Parapar, J. ; Barreiro, A.: When to stop making relevance judgments? : a study of stopping methods for building information retrieval test collections.
In: Journal of the Association for Information Science and Technology. 70(2019) no.1, pp. 49-60.
Abstract: In information retrieval evaluation, pooling is a well-known technique to extract a sample of documents to be assessed for relevance. Given the pooled documents, a number of studies have proposed different prioritization methods to adjudicate documents for judgment. These methods follow different strategies to reduce the assessment effort. However, there is no clear guidance on how many relevance judgments are required for creating a reliable test collection. In this article we investigate and further develop methods to determine when to stop making relevance judgments. We propose a highly diversified set of stopping methods and provide a comprehensive analysis of the usefulness of the resulting test collections. Some of the stopping methods introduced here combine innovative estimates of recall with time series models used in Financial Trading. Experimental results on several representative collections show that some stopping methods can reduce the assessment effort by up to 95% and still produce a robust test collection. We demonstrate that the reduced set of judgments can be reliably employed to compare search systems using disparate effectiveness metrics such as Average Precision, NDCG, P@100, and Rank Biased Precision. With all these measures, the correlations found between full pool rankings and reduced pool rankings are very high.
Content: Cf.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24077.
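Comparing system rankings under full and reduced judgment sets can be sketched as follows. The abstract does not name the correlation coefficient it uses, so Kendall's tau (in its simple tau-a form) is an assumption here, as are the run names, document identifiers, and judgment sets, which are invented purely for illustration.

```python
from itertools import combinations


def average_precision(ranking, relevant):
    """Average Precision of one ranked list against a set of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at each relevant document
    return total / len(relevant) if relevant else 0.0


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts keyed by system (no tie correction)."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical runs for one topic, scored with full and reduced judgments.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d1", "d4", "d2"],
    "sysC": ["d4", "d3", "d2", "d1"],
}
full_judgments = {"d1", "d2"}
reduced_judgments = {"d1"}  # judging stopped early

full = {name: average_precision(r, full_judgments) for name, r in runs.items()}
reduced = {name: average_precision(r, reduced_judgments) for name, r in runs.items()}
print(kendall_tau(full, reduced))
```

A value close to 1 means the reduced judgments order the systems almost exactly as the full pool does, which is the property the stopping methods aim to preserve.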
3. Munkelt, J. ; Schaer, P.: Towards an IR test collection for the German National Library.
Note: Talk given at the conference Lernen Wissen Daten Analysen 2018, Mannheim.
4. Sarigil, E. ; Sengor Altingovde, I. ; Blanco, R. ; Barla Cambazoglu, B. ; Ozcan, R. ; Ulusoy, Ö.: Characterizing, predicting, and handling web search queries that match very few or no results.
In: Journal of the Association for Information Science and Technology. 69(2018) no.2, pp. 256-270.
Abstract: A non-negligible fraction of user queries end up with very few or even no matching results in leading commercial web search engines. In this work, we provide a detailed characterization of such queries and show that search engines try to improve such queries by showing the results of related queries. Through a user study, we show that these query suggestions are usually perceived as relevant. Also, through a query log analysis, we show that the users are dissatisfied after submitting a query that matches no results at least 88.5% of the time. As a first step towards solving these no-answer queries, we devised a large number of features that can be used to identify such queries and built machine-learning models. These models can be useful for scenarios such as mobile search or meta-search, where identifying a query that will retrieve no results at the client device (i.e., even before submitting it to the search engine) may yield gains in terms of bandwidth usage, power consumption, and/or monetary costs. Experiments over query logs indicate that, despite the heavy skew in class sizes, our models achieve good prediction quality, with accuracy (in terms of area under the curve) up to 0.95.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23955/full.
Subject area: Retrieval studies ; Search engines
5. Hider, P.: The search value added by professional indexing to a bibliographic database.
In: Knowledge organization. 45(2018) no.1, pp. 23-32.
Abstract: Gross et al. (2015) have demonstrated that about a quarter of hits would typically be lost to keyword searchers if contemporary academic library catalogs dropped their controlled subject headings. This article reports on an investigation of the search value that subject descriptors and identifiers assigned by professional indexers add to a bibliographic database, namely the Australian Education Index (AEI). First, a similar methodology to that developed by Gross et al. (2015) was applied, with keyword searches representing a range of educational topics run on the AEI database with and without its subject indexing. The results indicated that AEI users would also lose, on average, about a quarter of hits per query. Second, an alternative research design was applied in which an experienced literature searcher was asked to find resources on a set of educational topics on an AEI database stripped of its subject indexing and then asked to search for additional resources on the same topics after the subject indexing had been reinserted. In this study, the proportion of additional resources that would have been lost had it not been for the subject indexing was again found to be about a quarter of the total resources found for each topic, on average.
Subject area: Retrieval studies ; Full-text retrieval
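The "about a quarter of hits lost" figure reported above is an average of per-query proportions. A minimal sketch of that arithmetic, assuming each query's hits are available as sets of record identifiers with and without subject indexing (the function names and sample records are hypothetical and do not come from the AEI study):

```python
def share_lost(hits_with_indexing, hits_without_indexing):
    """Fraction of a query's hits that disappear when subject indexing is removed."""
    with_ix = set(hits_with_indexing)
    lost = with_ix - set(hits_without_indexing)
    return len(lost) / len(with_ix) if with_ix else 0.0


def mean_share_lost(per_query):
    """Average the per-query loss over (with, without) hit-list pairs."""
    shares = [share_lost(w, wo) for w, wo in per_query]
    return sum(shares) / len(shares)


# Hypothetical results for two keyword queries.
queries = [
    (["a", "b", "c", "d"], ["a", "b", "c"]),  # one of four hits lost
    (["e", "f", "g", "h"], ["e", "f", "g"]),  # one of four hits lost
]
print(mean_share_lost(queries))
```

Averaging per query rather than pooling all hits gives each query equal weight, so a single query with a very large result set cannot dominate the figure.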
6. Munkelt, J.: Erstellung einer DNB-Retrieval-Testkollektion.
Köln : Technische Hochschule, Fakultät für Informations- und Kommunikationswissenschaften, 2018. II, 79 pp.
Abstract: Since autumn 2017, the subject indexing of certain media works at the Deutsche Nationalbibliothek has been carried out purely by machine. The quality of this procedure, which can substantially shape how libraries organise their processes, is the subject of controversy among experts. Their positions are first explained in sufficient detail before the need for a quality assessment of the procedure and its foundations is set out. A test collection is the central component of any future assessment. Its creation and documentation are the focus of this thesis. In this context, the history of test collections and the requirements for well-made ones are also discussed. Finally, a retrieval test is carried out that demonstrates the usability of the test collection developed. Its results serve solely to verify functionality. An assessment of the quality of automatic subject indexing, whether in this specific case or in general, is not undertaken and is not the aim of the thesis.
Content: Bachelor's thesis, Library Science, Fakultät für Informations- und Kommunikationswissenschaften, Technische Hochschule Köln
Subject area: Retrieval studies ; Automatic indexing
7. Munkelt, J. ; Schaer, P. ; Lepsky, K.: Towards an IR test collection for the German National Library. [Preprint].
Abstract: Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work, or in other words, to what degree can the automatic indexing replace people without a significant drop in quality?
Subject area: Retrieval studies ; Automatic indexing
8. Behnert, C. ; Lewandowski, D.: A framework for designing retrieval effectiveness studies of library information systems using human relevance assessments.
In: Journal of documentation. 73(2017) no.3, pp. 509-527.
Abstract: Purpose This paper demonstrates how to apply traditional information retrieval evaluation methods based on standards from the Text REtrieval Conference (TREC) and web search evaluation to all types of modern library information systems including online public access catalogs, discovery systems, and digital libraries that provide web search features to gather information from heterogeneous sources. Design/methodology/approach We apply conventional procedures from information retrieval evaluation to the library information system context considering the specific characteristics of modern library materials. Findings We introduce a framework consisting of five parts: (1) search queries, (2) search results, (3) assessors, (4) testing, and (5) data analysis. We show how to deal with comparability problems resulting from diverse document types, e.g., electronic articles vs. printed monographs and what issues need to be considered for retrieval tests in the library context. Practical implications The framework can be used as a guideline for conducting retrieval effectiveness studies in the library context. Originality/value Although a considerable amount of research has been done on information retrieval evaluation, and standards for conducting retrieval effectiveness studies do exist, to our knowledge this is the first attempt to provide a systematic framework for evaluating the retrieval effectiveness of twenty-first-century library information systems. We demonstrate which issues must be considered and what decisions must be made by researchers prior to a retrieval test.
Content: Cf.: http://www.emeraldinsight.com/doi/pdfplus/10.1108/JD-08-2016-0099.
9. Leiva-Mederos, A. ; Senso, J.A. ; Hidalgo-Delgado, Y. ; Hipola, P.: Working framework of semantic interoperability for CRIS with heterogeneous data sources.
In: Journal of documentation. 73(2017) no.3, pp. 481-499.
Abstract: Purpose Information from Current Research Information Systems (CRIS) is stored in different formats, in platforms that are not compatible, or even in independent networks. It would be helpful to have a well-defined methodology to allow for management data processing from a single site, so as to take advantage of the capacity to link disperse data found in different systems, platforms, sources and/or formats. Based on functionalities and materials of the VLIR project, the purpose of this paper is to present a model that provides for interoperability by means of semantic alignment techniques and metadata crosswalks, and facilitates the fusion of information stored in diverse sources. Design/methodology/approach After reviewing the state of the art regarding the diverse mechanisms for achieving semantic interoperability, the paper analyzes the following: the specific coverage of the data sets (type of data, thematic coverage and geographic coverage); the technical specifications needed to retrieve and analyze a distribution of the data set (format, protocol, etc.); the conditions of re-utilization (copyright and licenses); and the "dimensions" included in the data set as well as the semantics of these dimensions (the syntax and the taxonomies of reference). The semantic interoperability framework here presented implements semantic alignment and metadata crosswalk to convert information from three different systems (ABCD, Moodle and DSpace) to integrate all the databases in a single RDF file. Findings The paper also includes an evaluation based on the comparison - by means of calculations of recall and precision - of the proposed model and identical consultations made on Open Archives Initiative and SQL, in order to estimate its efficiency. The results have been satisfactory enough, due to the fact that the semantic interoperability facilitates the exact retrieval of information. 
Originality/value The proposed model enhances management of the syntactic and semantic interoperability of the CRIS system designed. In a real setting of use it achieves very positive results.
Content: Cf.: http://www.emeraldinsight.com/doi/full/10.1108/JD-07-2016-0091.
Subject area: Semantic interoperability ; Retrieval studies
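The evaluation in the entry above rests on recall and precision. For reference, both measures can be computed from a retrieved set and a relevant set as in the sketch below; the record identifiers are invented, and nothing here reproduces the paper's actual queries or data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, treating both inputs as sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four records retrieved, three records actually relevant.
precision, recall = precision_recall(["r1", "r2", "r3", "r4"], ["r1", "r3", "r5"])
print(precision, recall)
```

Precision is the share of retrieved records that are relevant; recall is the share of relevant records that were retrieved.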
10. Hider, P.: The search value added by professional indexing to a bibliographic database.
In: http://www.iskocus.org/NASKO2017papers/NASKO2017_paper_33.pdf [NASKO 2017, June 15-16, 2017, Champaign, IL, USA].
Abstract: Gross et al. (2015) have demonstrated that about a quarter of hits would typically be lost to keyword searchers if contemporary academic library catalogs dropped their controlled subject headings. This paper reports on an analysis of the loss levels that would result if a bibliographic database, namely the Australian Education Index (AEI), were missing the subject descriptors and identifiers assigned by its professional indexers, employing the methodology developed by Gross and Taylor (2005), and later by Gross et al. (2015). The results indicate that AEI users would lose a similar proportion of hits per query to that experienced by library catalog users: on average, 27% of the resources found by a sample of keyword queries on the AEI database would not have been found without the subject indexing, based on the Australian Thesaurus of Education Descriptors (ATED). The paper also discusses the methodological limitations of these studies, pointing out that real-life users might still find some of the resources missed by a particular query through follow-up searches, while additional resources might also be found through iterative searching on the subject vocabulary. The paper goes on to describe a new research design, based on a before - and - after experiment, which addresses some of these limitations. It is argued that this alternative design will provide a more realistic picture of the value that professionally assigned subject indexing and controlled subject vocabularies can add to literature searching of a more scholarly and thorough kind.
Content: Paper presented at: NASKO 2017: Visualizing Knowledge Organization: Bringing Focus to Abstract Realities. The sixth North American Symposium on Knowledge Organization (NASKO 2017), June 15-16, 2017, in Champaign, IL, USA.
Subject area: Retrieval studies ; Full-text retrieval
11. Li, J. ; Zhang, P. ; Song, D. ; Wu, Y.: Understanding an enriched multidimensional user relevance model by analyzing query logs.
In: Journal of the Association for Information Science and Technology. 68(2017) no.12, pp. 2743-2754.
Abstract: Modeling multidimensional relevance in information retrieval (IR) has attracted much attention in recent years. However, most existing studies are conducted through relatively small-scale user studies, which may not reflect a real-world and natural search scenario. In this article, we propose to study the multidimensional user relevance model (MURM) on large scale query logs, which record users' various search behaviors (e.g., query reformulations, clicks and dwelling time, etc.) in natural search settings. We advance an existing MURM model (including five dimensions: topicality, novelty, reliability, understandability, and scope) by providing two additional dimensions, that is, interest and habit. The two new dimensions represent personalized relevance judgment on retrieved documents. Further, for each dimension in the enriched MURM model, a set of computable features are formulated. By conducting extensive document ranking experiments on Bing's query logs and TREC session Track data, we systematically investigated the impact of each dimension on retrieval performance and gained a series of insightful findings which may bring benefits for the design of future IR systems.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23868/full.
12. Dang, E.K.F. ; Luk, R.W.P. ; Allan, J.: A context-dependent relevance model.
In: Journal of the Association for Information Science and Technology. 67(2016) no.3, pp. 582-593.
Abstract: Numerous past studies have demonstrated the effectiveness of the relevance model (RM) for information retrieval (IR). This approach enables relevance or pseudo-relevance feedback to be incorporated within the language modeling framework of IR. In the traditional RM, the feedback information is used to improve the estimate of the query language model. In this article, we introduce an extension of RM in the setting of relevance feedback. Our method provides an additional way to incorporate feedback via the improvement of the document language models. Specifically, we make use of the context information of known relevant and nonrelevant documents to obtain weighted counts of query terms for estimating the document language models. The context information is based on the words (unigrams or bigrams) appearing within a text window centered on query terms. Experiments on several Text REtrieval Conference (TREC) collections show that our context-dependent relevance model can improve retrieval performance over the baseline RM. Together with previous studies within the BM25 framework, our current study demonstrates that the effectiveness of our method for using context information in IR is quite general and not limited to any specific retrieval model.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23419/abstract.
13. Borlund, P.: A study of the use of simulated work task situations in interactive information retrieval evaluations : a meta-evaluation.
In: Journal of documentation. 72(2016) no.3, pp. 394-413.
Abstract: Purpose - The purpose of this paper is to report a study of how the test instrument of a simulated work task situation is used in empirical evaluations of interactive information retrieval (IIR) and reported in the research literature. In particular, the author is interested to learn whether the requirements of how to employ simulated work task situations are followed, and whether these requirements call for further highlighting and refinement. Design/methodology/approach - In order to study how simulated work task situations are used, the research literature in question is identified. This is done partly via citation analysis by use of Web of Science®, and partly by systematic search of online repositories. On this basis, 67 individual publications were identified and they constitute the sample of analysis. Findings - The analysis reveals a need for clarifications of how to use simulated work task situations in IIR evaluations. In particular, with respect to the design and creation of realistic simulated work task situations. There is a lack of tailoring of the simulated work task situations to the test participants. Likewise, the requirement to include the test participants' personal information needs is neglected. Further, there is a need to add and emphasise a requirement to depict the used simulated work task situations when reporting the IIR studies. Research limitations/implications - Insight about the use of simulated work task situations has implications for test design of IIR studies and hence the knowledge base generated on the basis of such studies. Originality/value - Simulated work task situations are widely used in IIR studies, and the present study is the first comprehensive study of the intended and unintended use of this test instrument since its introduction in the late 1990's. 
The paper addresses the need to carefully design and tailor simulated work task situations to suit the test participants in order to obtain the intended authentic and realistic IIR under study.
Content: Cf.: http://dx.doi.org/10.1108/JD-06-2015-0068.
14. Günther, M.: Vermitteln Suchmaschinen vollständige Bilder aktueller Themen? : Untersuchung der Gewichtung inhaltlicher Aspekte von Suchmaschinenergebnissen in Deutschland und den USA.
In: Young information scientists. 1(2016), pp. 13-29.
Abstract: Objective - Against the background of search engine bias, the study set out to determine whether the pictures of current international topics conveyed by Google and Bing in Germany and the USA differ with regard to (1) completeness, (2) coverage and (3) weighting of the respective content aspects. Research methods - For the empirical investigation, a method was developed and applied that draws on approaches from the empirical social sciences (content analysis) and information science (retrieval tests). Results - It was found that Google and Bing in Germany and the USA (1) do not convey complete pictures of current international topics, that they (2) do not cover the three most important content aspects in the top result positions, and that (3) there are no significant differences in the weighting of the content aspects. However, these results are qualified by limitations in the methodology and the analysis of the empirical investigation. Conclusions - Content-related search engine bias does indeed appear to exist, and it could affect how search engine users form their opinions. Despite the high effort required for manual analysis and the lower-quality results of automatic analysis, this topic should be investigated further.
Content: Cf.: https://yis.univie.ac.at/index.php/yis/article/view/1355. This article is based on the following thesis: Günther, Markus: Welches Weltbild vermitteln Suchmaschinen? Untersuchung der Gewichtung inhaltlicher Aspekte von Google- und Bing-Ergebnissen in Deutschland und den USA zu aktuellen internationalen Themen. Master's thesis (M.A.), Hochschule für Angewandte Wissenschaften Hamburg, 2015. Full text: http://edoc.sub.uni-hamburg.de/haw/volltexte/2016/332.
Subject area: Search engines ; Retrieval studies
Object: Google ; Bing
Country/Place: Germany ; USA
15. Schaer, P. ; Mayr, P. ; Sünkler, S. ; Lewandowski, D.: How relevant is the long tail? : a relevance assessment study on million short.
Abstract: Users of web search engines are known to mostly focus on the top ranked results of the search engine result page. While many studies support this well-known information seeking pattern, only a few studies concentrate on the question of what users are missing by neglecting lower ranked results. To learn more about the relevance distributions in the so-called long tail, we conducted a relevance assessment study with the Million Short long-tail web search engine. While we see a clear difference in the content between the head and the tail of the search engine result list, we see no statistically significant differences in the binary relevance judgments and weakly significant differences when using graded relevance. The tail contains different but still valuable results. We argue that the long tail can be a rich source for the diversification of web search engine result lists, but it needs more evaluation to clearly describe the differences.
Content: The study received the Best Poster Award at this year's CLEF conference.
Note: To appear in Experimental IR Meets Multilinguality, Multimodality, and Interaction. 7th International Conference of the CLEF Association, CLEF 2016, Évora, Portugal, September 5-8, 2016.
Subject area: Search engines ; Retrieval studies
16. Colace, F. ; Santo, M. de ; Greco, L. ; Napoletano, P.: Improving relevance feedback-based query expansion by the use of a weighted word pairs approach.
In: Journal of the Association for Information Science and Technology. 66(2015) no.11, pp. 2223-2234.
Abstract: In this article, the use of a new term extraction method for query expansion (QE) in text retrieval is investigated. The new method expands the initial query with a structured representation made of weighted word pairs (WWP) extracted from a set of training documents (relevance feedback). Standard text retrieval systems can handle a WWP structure through custom Boolean weighted models. We experimented with both the explicit and pseudorelevance feedback schemas and compared the proposed term extraction method with others in the literature, such as KLD and RM3. Evaluations have been conducted on a number of test collections (Text REtrieval Conference [TREC]-6, -7, -8, -9, and -10). Results demonstrated that the QE method based on this new structure outperforms the baseline.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23331/abstract.
Subject area: Semantic context in indexing and retrieval ; Retrieval studies
17. Tamine, L. ; Chouquet, C. ; Palmer, T.: Analysis of biomedical and health queries : lessons learned from TREC and CLEF evaluation benchmarks.
In: Journal of the Association for Information Science and Technology. 66(2015) no.12, pp. 2626-2642.
Abstract: A large body of research work examined, from both the query side and the user behavior side, the characteristics of medical- and health-related searches. One of the core issues in medical information retrieval (IR) is diversity of tasks that lead to diversity of categories of information needs and queries. From the evaluation perspective, another related and challenging issue is the limited availability of appropriate test collections allowing the experimental validation of medically task oriented IR techniques and systems. In this paper, we explore the peculiarities of TREC and CLEF medically oriented tasks and queries through the analysis of the differences and the similarities between queries across tasks, with respect to length, specificity, and clarity features and then study their effect on retrieval performance. We show that, even for expert oriented queries, language specificity level varies significantly across tasks as well as search difficulty. Additional findings highlight that query clarity factors are task dependent and that query terms specificity based on domain-specific terminology resources is not significantly linked to term rareness in the document collection. The lessons learned from our study could serve as starting points for the design of future task-based medical information retrieval frameworks.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23351/abstract.
Object: TREC ; CLEF
18. Ravana, S.D. ; Taheri, M.S. ; Rajagopal, P.: Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems.
In: Aslib journal of information management. 67(2015) no.4, pp. 408-421.
Abstract: Purpose The purpose of this paper is to propose a method that yields more accurate results when comparing the performance of paired information retrieval (IR) systems than the current method, which is based on the mean effectiveness scores of the systems across a set of identified topics/queries. Design/methodology/approach Based on the proposed approach, instead of the classic method of using a set of topic scores, the document-level scores are considered as the evaluation unit. These document scores are the defined documents' weights, which play the role of the mean average precision (MAP) score of the systems as the significance test's statistic. The experiments were conducted using the TREC 9 Web track collection. Findings The p-values generated through the two types of significance tests, namely the Student's t-test and Mann-Whitney, show that by using the document-level scores as an evaluation unit, the difference between IR systems is more significant compared with utilizing topic scores. Originality/value Utilizing a suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers in evaluating their retrieval systems and algorithms more accurately.
Content: Cf.: http://dx.doi.org/10.1108/AJIM-12-2014-0171.
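The "classic" comparison that the entry above takes as its baseline pairs two systems on per-topic scores and applies a significance test. The sketch below computes only the paired t statistic over such scores; it does not implement the document-level weighting the authors propose, and the scores are invented. Turning the statistic into a p-value would additionally require the t distribution with n - 1 degrees of freedom, which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two systems scored on the same evaluation units."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 items")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Hypothetical average-precision scores of two systems on four topics.
system_a = [0.30, 0.45, 0.50, 0.62]
system_b = [0.25, 0.40, 0.48, 0.55]
t_value = paired_t_statistic(system_a, system_b)
print(round(t_value, 3))
```

The same function applies unchanged if the evaluation unit is a document rather than a topic; only the two score lists change.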
19. Lu, K. ; Kipp, M.E.I.: Understanding the retrieval effectiveness of collaborative tags and author keywords in different retrieval environments : an experimental study on medical collections.
In: Journal of the Association for Information Science and Technology. 65(2014) no.3, pp. 483-500.
Abstract: This study investigates the retrieval effectiveness of collaborative tags and author keywords in different environments through controlled experiments. Three test collections were built. The first collection tests the impact of tags on retrieval performance when only the title and abstract are available (the abstract environment). The second tests the impact of tags when the full text is available (the full-text environment). The third compares the retrieval effectiveness of tags and author keywords in the abstract environment. In addition, both single-word queries and phrase queries are tested to understand the impact of different query types. Our findings suggest that including tags and author keywords in indexes can enhance recall but may improve or worsen average precision depending on retrieval environments and query types. Indexing tags and author keywords for searching using phrase queries in the abstract environment showed improved average precision, whereas indexing tags for searching using single-word queries in the full-text environment led to a significant drop in average precision. The comparison between tags and author keywords in the abstract environment indicates that they have comparable impact on average precision, but author keywords are more advantageous in enhancing recall. The findings from this study provide useful implications for designing retrieval systems that incorporate tags and author keywords.
20. Ruthven, I.: Relevance behaviour in TREC.
In: Journal of documentation. 70(2014) no.6, pp. 1098-1117.
Abstract: Purpose - The purpose of this paper is to examine how various types of TREC data can be used to better understand relevance and serve as test-bed for exploring relevance. The author proposes that there are many interesting studies that can be performed on the TREC data collections that are not directly related to evaluating systems but to learning more about human judgements of information and relevance and that these studies can provide useful research questions for other types of investigation. Design/methodology/approach - Through several case studies the author shows how existing data from TREC can be used to learn more about the factors that may affect relevance judgements and interactive search decisions and answer new research questions for exploring relevance. Findings - The paper uncovers factors, such as familiarity, interest and strictness of relevance criteria, that affect the nature of relevance assessments within TREC, contrasting these against findings from user studies of relevance. Research limitations/implications - The research only considers certain uses of TREC data and assessment given by professional relevance assessors but motivates further exploration of the TREC data so that the research community can further exploit the effort involved in the construction of TREC test collections. Originality/value - The paper presents an original viewpoint on relevance investigations and TREC itself by motivating TREC as a source of inspiration on understanding relevance rather than purely as a source of evaluation material.
Content: Contribution to a special issue: Festschrift in honour of Nigel Ford