Search (132 results, page 7 of 7)

  • × theme_ss:"Retrievalalgorithmen"
  • × year_i:[2000 TO 2010}
  1. White, R.W.; Jose, J.M.; Ruthven, I.: ¬An implicit feedback approach for interactive information retrieval (2006) 0.00
    0.0014112709 = product of:
      0.012701439 = sum of:
        0.012701439 = weight(_text_:of in 964) [ClassicSimilarity], result of:
          0.012701439 = score(doc=964,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.20732689 = fieldWeight in 964, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=964)
      0.11111111 = coord(1/9)
    
    Abstract
    Searchers can face problems finding the information they seek. One reason for this is that they may have difficulty devising queries to express their information needs. In this article, we describe an approach that uses unobtrusive monitoring of interaction to proactively support searchers. The approach chooses terms to better represent information needs by monitoring searcher interaction with different representations of top-ranked documents. Information needs are dynamic and can change as a searcher views information. The approach we propose gathers evidence on potential changes in these needs and uses this evidence to choose new retrieval strategies. We present an evaluation of how well our technique estimates information needs, how well it estimates changes in these needs and the appropriateness of the interface support it offers. The results are presented and the avenues for future research identified.
  2. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 0.00
    0.0014112709 = product of:
      0.012701439 = sum of:
        0.012701439 = weight(_text_:of in 2020) [ClassicSimilarity], result of:
          0.012701439 = score(doc=2020,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.20732689 = fieldWeight in 2020, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2020)
      0.11111111 = coord(1/9)
    
    Abstract
    Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
  3. Ding, Y.; Yan, E.; Frazho, A.; Caverlee, J.: PageRank for ranking authors in co-citation networks (2009) 0.00
    0.0014112709 = product of:
      0.012701439 = sum of:
        0.012701439 = weight(_text_:of in 3161) [ClassicSimilarity], result of:
          0.012701439 = score(doc=3161,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.20732689 = fieldWeight in 3161, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3161)
      0.11111111 = coord(1/9)
    
    Abstract
    This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank with different damping factors and also with different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with other measures. The key factors that have impact on the PageRank of authors in the author co-citation network are being co-cited with important authors.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.11, S.2229-2243
  4. Lewandowski, D.: How can library materials be ranked in the OPAC? (2009) 0.00
    0.0013148742 = product of:
      0.011833867 = sum of:
        0.011833867 = weight(_text_:of in 2810) [ClassicSimilarity], result of:
          0.011833867 = score(doc=2810,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.19316542 = fieldWeight in 2810, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2810)
      0.11111111 = coord(1/9)
    
    Abstract
    Some Online Public Access Catalogues offer a ranking component. However, ranking there is merely text-based and is doomed to fail due to limited text in bibliographic data. The main assumption for the talk is that we are in a situation where the appropriate ranking factors for OPACs should be defined, while the implementation is no major problem. We must define what we want, and not so much focus on the technical work. Some deep thinking is necessary on the "perfect results set" and how we can achieve it through ranking. The talk presents a set of potential ranking factors and clustering possibilities for further discussion. A look at commercial Web search engines could provide us with ideas how ranking can be improved with additional factors. Search engines are way beyond pure text-based ranking and apply ranking factors in the groups like popularity, freshness, personalisation, etc. The talk describes the main factors used in search engines and how derivatives of these could be used for libraries' purposes. The goal of ranking is to provide the user with the best-suitable results on top of the results list. How can this goal be achieved with the library catalogue and also concerning the library's different collections and databases? The assumption is that ranking of such materials is a complex problem and is yet nowhere near solved. Libraries should focus on ranking to improve user experience.
  5. Yang, L.; Ji, D.; Leong, M.: Document reranking by term distribution and maximal marginal relevance for chinese information retrieval (2007) 0.00
    0.0012221966 = product of:
      0.010999769 = sum of:
        0.010999769 = weight(_text_:of in 907) [ClassicSimilarity], result of:
          0.010999769 = score(doc=907,freq=6.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17955035 = fieldWeight in 907, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=907)
      0.11111111 = coord(1/9)
    
    Abstract
    In this paper, we propose a document reranking method for Chinese information retrieval. The method is based on a term weighting scheme, which integrates local and global distribution of terms as well as document frequency, document positions and term length. The weight scheme allows randomly setting a larger portion of the retrieved documents as relevance feedback, and lifts off the worry that very fewer relevant documents appear in top retrieved documents. It also helps to improve the performance of maximal marginal relevance (MMR) in document reranking. The method was evaluated by MAP (mean average precision), a recall-oriented measure. Significance tests showed that our method can get significant improvement against standard baselines, and outperform relevant methods consistently.
  6. Liu, A.; Zou, Q.; Chu, W.W.: Configurable indexing and ranking for XML information retrieval (2004) 0.00
    0.0011760591 = product of:
      0.010584532 = sum of:
        0.010584532 = weight(_text_:of in 4114) [ClassicSimilarity], result of:
          0.010584532 = score(doc=4114,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 4114, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=4114)
      0.11111111 = coord(1/9)
    
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference an Research and Development in Information Retrieval. Ed.: K. Järvelin, u.a
  7. Yu, K.; Tresp, V.; Yu, S.: ¬A nonparametric hierarchical Bayesian framework for information filtering (2004) 0.00
    0.0011760591 = product of:
      0.010584532 = sum of:
        0.010584532 = weight(_text_:of in 4117) [ClassicSimilarity], result of:
          0.010584532 = score(doc=4117,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 4117, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=4117)
      0.11111111 = coord(1/9)
    
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference an Research and Development in Information Retrieval. Ed.: K. Järvelin, u.a
  8. Widyantoro, D.H.; Ioerger, T.R.; Yen, J.: Learning user Interest dynamics with a three-descriptor representation (2001) 0.00
    0.0011760591 = product of:
      0.010584532 = sum of:
        0.010584532 = weight(_text_:of in 5185) [ClassicSimilarity], result of:
          0.010584532 = score(doc=5185,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 5185, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5185)
      0.11111111 = coord(1/9)
    
    Abstract
    The use of documents ranked high by user feedback to profile user interests is commonly done with Rocchio's `s algorithm which uses a single list of attribute value pairs called a descriptor to carry term value weights for an individual. Negative feed back on old preferences or positive feedback on new preferences adjusts the descriptor at a fixed, predetermined, and often slow pace. Widyantoro, et alia, suggest a three descriptor model which adds two short term interest descriptors, one each for positive and negative feedback. User short term interest in a particular document is computed by subtracting the similarity measure with the negative descriptor from the similarity measure with the positive descriptor. Using a constant to represent the desired impact of long and short term interests these values may be summed for a single interest value. Using the Reuters 21578 1.0 test collection split into training and test sets, topics with at least 100 documents in a tight cluster were chosen. The TDR handles change well showing better recovery speed and accuracy than the single descriptor model. The nearest neighbor update strategy appears to keep the category concept relatively consistent when multiple TDRs are used.
    Source
    Journal of the American Society for Information Science and technology. 52(2001) no.3, S.212-225
  9. Schaefer, A.; Jordan, M.; Klas, C.-P.; Fuhr, N.: Active support for query formulation in virtual digital libraries : a case study with DAFFODIL (2005) 0.00
    0.0011760591 = product of:
      0.010584532 = sum of:
        0.010584532 = weight(_text_:of in 4296) [ClassicSimilarity], result of:
          0.010584532 = score(doc=4296,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 4296, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4296)
      0.11111111 = coord(1/9)
    
    Abstract
    Daffodil is a front-end to federated, heterogeneous digital libraries targeting at strategic support of users during the information seeking process. This is done by offering a variety of functions for searching, exploring and managing digital library objects. However, the distributed search increases response time and the conceptual model of the underlying search processes is inherently weaker. This makes query formulation harder and the resulting waiting times can be frustrating. In this paper, we investigate the concept of proactive support during the user's query formulation. For improving user efficiency and satisfaction, we implemented annotations, proactive support and error markers on the query form itself. These functions decrease the probability for syntactical or semantical errors in queries. Furthermore, the user is able to make better tactical decisions and feels more confident that the system handles the query properly. Evaluations with 30 subjects showed that user satisfaction is improved, whereas no conclusive results were received for efficiency.
  10. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.00
    9.979192E-4 = product of:
      0.0089812735 = sum of:
        0.0089812735 = weight(_text_:of in 6112) [ClassicSimilarity], result of:
          0.0089812735 = score(doc=6112,freq=4.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.14660224 = fieldWeight in 6112, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
      0.11111111 = coord(1/9)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.12, S.1606-1615
  11. Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: ¬A unified maximum likelihood approach to document retrieval (2001) 0.00
    7.0563547E-4 = product of:
      0.0063507194 = sum of:
        0.0063507194 = weight(_text_:of in 174) [ClassicSimilarity], result of:
          0.0063507194 = score(doc=174,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.103663445 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=174)
      0.11111111 = coord(1/9)
    
    Source
    Journal of the American Society for Information Science and technology. 52(2001) no.10, S.785-796
  12. Ackermann, J.: Knuth-Morris-Pratt (2005) 0.00
    4.7042366E-4 = product of:
      0.004233813 = sum of:
        0.004233813 = weight(_text_:of in 865) [ClassicSimilarity], result of:
          0.004233813 = score(doc=865,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.06910896 = fieldWeight in 865, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=865)
      0.11111111 = coord(1/9)
    
    Abstract
    Im Rahmen des Seminars Suchmaschinen und Suchalgorithmen beschäftigt sich diese Arbeit mit dem Auffinden bestimmter Wörter oder Muster in Texten. Der Begriff "Text" wird hier in einem sehr allgemeinen Sinne als strukturierte Folge beliebiger Länge von Zeichen aus einem endlichen Alphabet verstanden. Somit fällt unter diesen Bereich ganz allgemein die Suche nach einem Muster in einer Sequenz von Zeichen. Beispiele hierfür sind neben der Suche von Wörtern in "literarischen" Texten, z.B. das Finden von Pixelfolgen in Bildern oder gar das Finden von Mustern in DNS-Strängen. Das Anwendungsgebiet für eine solche Suche ist weit gefächert. Man denke hier allein an Texteditoren, Literaturdatenbanken, digitale Lexika oder die besagte DNADatenbank. Betrachtet man allein das 1989 publizierte Oxford English Dictionary mit seinen etwa 616500 definierten Stichworten auf gedruckten 21728 Seiten, so gilt es, einen möglichst effizienten Algorithmus für die Suche in Texten zu nutzen. Der in der Arbeit zugrunde liegende Datentyp ist vom Typ String (Zeichenkette), wobei hier offen gelassen wird, wie der Datentyp programmtechnisch realisiert wird. Algorithmen zur Verarbeitung von Zeichenketten (string processing) umfassen ein bestimmtes Spektrum an Anwendungsgebieten [Ot96, S.617 f.], wie z.B. das Komprimieren, das Verschlüssen, das Analysieren (parsen), das Übersetzen von Texten sowie das Suchen in Texten, welches Thema dieses Seminars ist. Im Rahmen dieser Arbeit wird der Knuth-Morris-Pratt Algorithmus vorgestellt, der wie der ebenfalls in diesem Seminar vorgestellte Boyer-Moore Algorithmus einen effizienten Suchalgorithmus darstellt. Dabei soll ein gegebenes Suchwort oder Muster (pattern) in einer gegeben Zeichenkette erkannt werden (pattern matching). Gesucht werden dabei ein oder mehrere Vorkommen eines bestimmten Suchwortes (exact pattern matching). Der Knuth-Morris-Pratt Algorithmus wurde erstmals 1974 als Institutbericht der Stanford University beschrieben und erschien 1977 in der Fachzeitschrift Journal of Computing unter dem Titel "Fast Pattern Matching in Strings" [Kn77]. Der Algorithmus beschreibt eine Suche in Zeichenketten mit linearer Laufzeit. Der Name des Algorithmus setzt sich aus den Entwicklern des Algorithmus Donald E. Knuth, James H. Morris und Vaughan R. Pratt zusammen.

Authors

Languages

  • e 125
  • d 5
  • m 1
  • sp 1
  • More… Less…

Types

  • a 121
  • m 8
  • el 3
  • s 2
  • x 1
  • More… Less…