Literature on Information Indexing
This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft
(As of April 28, 2022)
Search results
Results 1–12 of 12
-
1. Lehmann, J. ; Castillo, C. ; Lalmas, M. ; Baeza-Yates, R.: Story-focused reading in online news and its potential for user engagement.
In: Journal of the Association for Information Science and Technology. 68(2017) no.4, pp. 869-883.
Abstract: We study the news reading behavior of several hundred thousand users on 65 highly visited news sites. We focus on a specific phenomenon: users reading several articles related to a particular news development, which we call story-focused reading. Our goal is to understand the effect of story-focused reading on user engagement and how news sites can support this phenomenon. We found that most users focus on stories that interest them and that even casual news readers engage in story-focused reading. During story-focused reading, users spend more time reading and a larger number of news sites are involved. In addition, readers employ different strategies to find articles related to a story. We also analyze how news sites promote story-focused reading by looking at how they link their articles to related content published by them, or by other sources. The results show that providing links to related content leads to a higher engagement of the users, and that this is the case even for links to external sites. We also show that the performance of links can be affected by their type, their position, and how many of them are present within an article.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23707/full.
Note: This work was done while Janette Lehmann was a PhD student at Universitat Pompeu Fabra and it was carried out as part of her PhD internship at Yahoo! Labs Barcelona. This work was carried out while Carlos Castillo was working at Qatar Computing Research Institute.
Subject area: Internet
Discipline: Communication studies
-
2. Kucukyilmaz, T. ; Cambazoglu, B.B. ; Aykanat, C. ; Baeza-Yates, R.: A machine learning approach for result caching in web search engines.
In: Information processing and management. 53(2017) no.4, pp. 834-850.
Abstract: A commonly used technique for improving search engine performance is result caching. In result caching, precomputed results (e.g., URLs and snippets of best matching pages) of certain queries are stored in a fast-access storage. The future occurrences of a query whose results are already stored in the cache can be directly served by the result cache, eliminating the need to process the query using costly computing resources. Although other performance metrics are possible, the main performance metric for evaluating the success of a result cache is hit rate. In this work, we present a machine learning approach to improve the hit rate of a result cache by facilitating a large number of features extracted from search engine query logs. We then apply the proposed machine learning approach to static, dynamic, and static-dynamic caching. Compared to the previous methods in the literature, the proposed approach improves the hit rate of the result cache up to 0.66%, which corresponds to 9.60% of the potential room for improvement.
Content: See https://doi.org/10.1016/j.ipm.2017.02.006.
Subject area: Search engines
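The hit-rate metric named in this abstract can be illustrated with a minimal sketch. It assumes a plain least-recently-used (LRU) eviction policy for a dynamic cache, which is a common baseline and not the machine learning approach of the paper. The class `LRUResultCache` and the helper `replay` are hypothetical names introduced only for this illustration.

```python
from collections import OrderedDict


class LRUResultCache:
    """Hypothetical dynamic result cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> precomputed results
        self.hits = 0
        self.lookups = 0

    def get(self, query, compute):
        """Return cached results for query, calling compute(query) on a miss."""
        self.lookups += 1
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        results = compute(query)
        self.entries[query] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used query
        return results

    def hit_rate(self):
        """Fraction of lookups answered from the cache."""
        return self.hits / self.lookups if self.lookups else 0.0


def replay(queries, capacity):
    """Replay a query stream against a fresh cache and report its hit rate."""
    cache = LRUResultCache(capacity)
    for q in queries:
        # Stand-in for the costly backend computation (an assumption).
        cache.get(q, lambda query: ["result for " + query])
    return cache.hit_rate()
```

Replaying the same query stream with different capacities shows how the hit rate depends on cache size; a learned admission or eviction policy would replace the LRU rule in `get`.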
-
3. Castillo, C. ; Baeza-Yates, R.: Web retrieval and mining.
In: Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates. London : Taylor & Francis, 2009. pp. xx-xx.
Abstract: The advent of the Web in the mid-1990s, followed by its fast adoption in a relatively short time, posed significant challenges to classical information retrieval methods developed in the 1970s and the 1980s. The major challenges include that the Web is massive, dynamic, and distributed. The two main types of tasks that are carried out on the Web are searching and mining. Searching is locating information given an information need, and mining is extracting information and/or knowledge from a corpus. The metrics for success when carrying out these tasks on the Web include precision, recall (completeness), freshness, and efficiency.
Note: See http://www.tandfonline.com/doi/book/10.1081/E-ELIS3.
Subject area: Search engines
-
4. Baeza-Yates, R. ; Hurtado, C. ; Mendoza, M.: Improving search engines by query clustering.
In: Journal of the American Society for Information Science and Technology. 58(2007) no.12, pp. 1793-1804.
Abstract: In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate with experiments the effectiveness of our approach.
Note: Contribution to a special topic section "Mining Web resources for enhancing information retrieval"
Subject area: Search engines ; Data mining
Object: WWW
-
5. Baeza-Yates, R. ; Boldi, P. ; Castillo, C.: Generalizing PageRank : damping functions for link-based ranking algorithms.
In: http://chato.cl/papers/baeza06_general_pagerank_damping_functions_link_ranking.pdf [Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR) Conference, SIGIR'06, August 6-10, 2006, Seattle, Washington, USA].
Abstract: This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
Subject area: Search engines
Object: PageRank
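The idea of ranking by importance that decays with path length can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes every page has at least one out-link, truncates the sum at `max_len` steps, and uses the hypothetical names `functional_rank`, `exponential_damping`, and `linear_damping`. The linear form shown is just one linearly decreasing choice normalized to sum to 1.

```python
def exponential_damping(alpha):
    """damping(t) = (1 - alpha) * alpha**t, the PageRank-style decay."""
    return lambda t: (1.0 - alpha) * alpha ** t


def linear_damping(max_path):
    """An illustrative linearly decreasing damping, normalized to sum to 1."""
    norm = max_path * (max_path + 1) / 2.0
    return lambda t: (max_path - t) / norm if t < max_path else 0.0


def functional_rank(out_links, damping, max_len):
    """Sum damping(t) times the mass that reaches each node over paths of length t.

    out_links maps every node to a non-empty list of successor nodes
    (no dangling nodes; an assumption made to keep the sketch short).
    """
    nodes = list(out_links)
    mass = {v: 1.0 / len(nodes) for v in nodes}  # paths of length 0
    rank = {v: damping(0) * mass[v] for v in nodes}
    for t in range(1, max_len + 1):
        nxt = {v: 0.0 for v in nodes}
        for u, targets in out_links.items():
            share = mass[u] / len(targets)  # split mass evenly over out-links
            for v in targets:
                nxt[v] += share
        mass = nxt
        for v in nodes:
            rank[v] += damping(t) * mass[v]
    return rank
```

Swapping `exponential_damping(0.85)` for `linear_damping(4)` changes only how quickly endorsement fades with distance; the propagation loop itself stays the same, which mirrors the separation the abstract describes.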
-
6. Navarro, G. ; Baeza-Yates, R. ; Azevedo Arcoverde, J.M.: Matchsimile : a flexible approximate matching tool for searching proper names.
In: Journal of the American Society for Information Science and Technology. 54(2003) no.1, pp. 3-15.
Abstract: We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options; and outputs all the occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intraword level, the tool considers a set of specific person name formation rules at the word level, such as combination, abbreviation, duplicity detections, ordering, word omission and insertion, among others. This engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.
-
7. Baeza-Yates, R. ; Navarro, G.: XQL and proximal nodes.
In: Journal of the American Society for Information Science and Technology. 53(2002) no.6, pp. 504-514.
Abstract: Despite the fact that several models to structure text documents and to query on this structure have been proposed in the past, a standard has emerged only relatively recently with the introduction of XML and its proposed query language XQL, on which we focus in this article. Although there exist some implementations of XQL, efficiency of the query engine is still a problem. We show in this article that an already existing model, Proximal Nodes, which was defined with the goal of efficiency in mind, can be used as an efficient query engine behind an XQL front-end.
Object: XML ; XQL
-
8. Baeza-Yates, R. ; Navarro, G.: Block addressing indices for approximate text retrieval.
In: Journal of the American Society for Information Science. 51(2000) no.1, pp. 69-82.
Abstract: The issue of reducing the space overhead when indexing large text databases is becoming more and more important as text collections grow in size. Another subject, which is gaining importance as text databases grow and get more heterogeneous and error prone, is that of flexible string matching. One of the best tools to make the search more flexible is to allow a limited number of differences between the words found and those sought. This is called 'approximate text searching', which is becoming more and more popular. In recent years some indexing schemes with very low space overhead have appeared, some of them dealing with approximate searching. These low overhead indices (whose most notorious exponent is Glimpse) are modified inverted files, where space is saved by making the lists of occurrences point to text blocks instead of exact word positions. Despite their existence, little is known about the expected behaviour of these 'block addressing' indices, and even less is known when it comes to coping with approximate search. Our main contribution is an analytical study of the space-time trade-offs for indexed text searching.
Subject area: Retrieval algorithms
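The block addressing idea described in this abstract, where occurrence lists point to text blocks rather than exact word positions, might look roughly like this. The sketch handles exact word search only and leaves out the approximate matching step. The function names and the fixed `block_size` measured in words are assumptions for illustration, not details taken from the article or from Glimpse.

```python
def build_block_index(words, block_size):
    """Map each word to the sorted list of blocks (not positions) containing it."""
    index = {}
    for pos, word in enumerate(words):
        block = pos // block_size
        blocks = index.setdefault(word, [])
        if not blocks or blocks[-1] != block:  # store each block only once
            blocks.append(block)
    return index


def search(words, index, block_size, query):
    """Return exact word positions by scanning only the candidate blocks."""
    positions = []
    for block in index.get(query, []):
        start = block * block_size
        end = min(start + block_size, len(words))
        for pos in range(start, end):  # sequential scan inside the block
            if words[pos] == query:
                positions.append(pos)
    return positions
```

Larger blocks shrink the index, because more occurrences collapse into one entry, but lengthen the sequential scan at query time. This is the space-time trade-off the abstract refers to.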
-
9. Baeza-Yates, R.A.: Introduction to data structures and algorithms related to information retrieval.
In: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates. Englewood Cliffs, NJ : Prentice Hall, 1992. pp. 13-27.
Abstract: In this chapter we review the main concepts and data structures used in information retrieval, and we classify information retrieval related algorithms.
Subject area: Retrieval algorithms
-
10. Harman, D. ; Fox, E. ; Baeza-Yates, R. ; Lee, W.: Inverted files.
In: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates. Englewood Cliffs, NJ : Prentice Hall, 1992. pp. 28-43.
Abstract: This chapter presents a survey of the various structures (techniques) that can be used in building inverted files, and gives the details for producing an inverted file using sorted arrays. The chapter ends with 2 modifications to this basic method that are effective for large data collections.
Subject area: Retrieval algorithms
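Building an inverted file with sorted arrays can be sketched in a few lines. This is a minimal in-memory illustration under simplifying assumptions (whitespace tokenization, lowercasing, document IDs equal to list positions) and is not the chapter's code. The names `build_inverted_file` and `lookup` are hypothetical.

```python
from bisect import bisect_left


def build_inverted_file(docs):
    """Sort (term, doc_id) pairs, then fold them into parallel sorted arrays."""
    pairs = sorted(
        {(term, doc_id) for doc_id, text in enumerate(docs) for term in text.lower().split()}
    )
    terms, postings = [], []
    for term, doc_id in pairs:
        if not terms or terms[-1] != term:  # first pair of a new term
            terms.append(term)
            postings.append([])
        postings[-1].append(doc_id)
    return terms, postings


def lookup(terms, postings, term):
    """Binary-search the sorted term array and return the posting list."""
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[i]
    return []
```

Because the pairs are sorted before folding, both the term array and each posting list come out in order, which is what makes the binary search in `lookup` valid.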
-
11. Gonnet, G.H. ; Snider, T. ; Baeza-Yates, R.A.: New indices for text : PAT trees and PAT arrays.
In: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates. Englewood Cliffs, NJ : Prentice Hall, 1992. pp. 66-82.
Abstract: We survey new indices for text, with emphasis on PAT arrays (also called suffix arrays). A PAT array is an index based on a new model of text that does not use the concept of word and does not need to know the structure of text.
Subject area: Retrieval algorithms
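A suffix array in the sense of this abstract treats the text as a plain character string, with no notion of words. The following sketch uses a naive construction (sorting all suffixes directly), which is fine for short texts but is not one of the efficient construction methods. The function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Binary-search the suffix array for every position where pattern starts."""
    m = len(pattern)

    def prefix(k):
        start = suffix_array[k]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])
```

All suffixes that begin with the pattern sit next to each other in the sorted order, so two binary searches are enough to delimit every occurrence.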
-
12. Baeza-Yates, R.A.: String searching algorithms.
In: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates. Englewood Cliffs, NJ : Prentice Hall, 1992. pp. 219-240.
Abstract: Survey of several algorithms for searching a string in a text. Included are theoretical and empirical results, as well as the actual code of each algorithm. An extensive bibliography is included.
Subject area: Retrieval algorithms
Object: Knuth-Morris-Pratt algorithm ; Boyer-Moore algorithm ; Shift-Or algorithm ; Karp-Rabin algorithm ; Aho-Corasick algorithm
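As an illustration of one of the algorithms listed for this chapter, here is a compact Shift-Or sketch. It is written independently of the chapter's own code, which is not reproduced in this record, and relies on Python's arbitrary-precision integers instead of a fixed machine word. The function name `shift_or_search` is an assumption.

```python
def shift_or_search(text, pattern):
    """Return all start positions of pattern in text using bit-parallel Shift-Or."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # masks[ch] has bit i cleared exactly when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, all_ones) & ~(1 << i)
    # Bit i of state is 0 when pattern[0..i] matches the text ending here.
    state = all_ones
    matches = []
    for j, ch in enumerate(text):
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:  # the whole pattern matched
            matches.append(j - m + 1)
    return matches
```

Each text character costs one shift and one OR regardless of pattern length, as long as the pattern fits into the integer width the platform handles efficiently.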