Search (310 results, page 16 of 16)

  • theme_ss:"Retrievalalgorithmen"
  1. Silva, R.M.; Gonçalves, M.A.; Veloso, A.: A two-stage active learning method for learning to rank (2014) 0.00
    0.0011760591 = product of:
      0.010584532 = sum of:
        0.010584532 = weight(_text_:of in 1184) [ClassicSimilarity], result of:
          0.010584532 = score(doc=1184,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 1184, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1184)
      0.11111111 = coord(1/9)
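    The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output. A minimal sketch in Python that reproduces its arithmetic, assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the same computation underlies every score on this page.

      import math

      # Values taken from the explain tree for result no. 1 (doc 1184).
      freq, doc_freq, max_docs = 8.0, 25162, 44218
      query_norm, field_norm = 0.03917671, 0.0390625

      tf = math.sqrt(freq)                                # ≈ 2.828427
      idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))   # ≈ 1.5637573
      query_weight = idf * query_norm                     # ≈ 0.061262865
      field_weight = tf * idf * field_norm                # ≈ 0.17277241
      weight = query_weight * field_weight                # ≈ 0.010584532
      score = weight * (1.0 / 9.0)                        # coord(1/9) -> ≈ 0.0011760591
      print(round(score, 10))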
    
    Abstract
    Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.1, S.109-128
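    A minimal sketch of the second-stage query-by-committee selection described in the abstract above, not the authors' implementation: committee members are plain least-squares scorers trained on bootstrap resamples, disagreement is the variance of their predicted scores, and all names and parameters (budget, committee_size) are illustrative.

      import numpy as np

      def qbc_select(X_lab, y_lab, X_pool, budget=10, committee_size=5, seed=0):
          """Iteratively pick the pool instances the committee disagrees on most."""
          rng = np.random.default_rng(seed)
          X_lab, y_lab, X_pool = map(np.asarray, (X_lab, y_lab, X_pool))
          selected, pool_idx = [], list(range(len(X_pool)))
          for _ in range(budget):
              # Train the committee on bootstrap resamples of the labelled set.
              preds = []
              for _ in range(committee_size):
                  b = rng.integers(0, len(X_lab), len(X_lab))
                  w, *_ = np.linalg.lstsq(X_lab[b], y_lab[b], rcond=None)
                  preds.append(X_pool[pool_idx] @ w)
              # Disagreement = variance of the committee's predicted scores.
              disagreement = np.var(np.stack(preds), axis=0)
              pick = pool_idx[int(np.argmax(disagreement))]
              # A real loop would now ask the oracle to label `pick` and add it to
              # the labelled set; this sketch only records which instance to ask about.
              selected.append(pick)
              pool_idx.remove(pick)
          return selected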
  2. Dadashkarimi, J.; Shakery, A.; Faili, H.; Zamani, H.: An expectation-maximization algorithm for query translation based on pseudo-relevant documents (2017) 0.00
    0.0010518994 = product of:
      0.009467094 = sum of:
        0.009467094 = weight(_text_:of in 3296) [ClassicSimilarity], result of:
          0.009467094 = score(doc=3296,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.15453234 = fieldWeight in 3296, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=3296)
      0.11111111 = coord(1/9)
    
    Abstract
    Query translation in cross-language information retrieval (CLIR) can be done by employing dictionaries, aligned corpora, or machine translators. Scarcity of aligned corpora for various domains in many language pairs intensifies the importance of dictionary-based CLIR which motivates us to use only a bilingual dictionary and two independent collections in source and target languages for query translation. We exploit pseudo-relevant documents for a given query in the source language and pseudo-relevant documents for a translation of the query in the target language with a proposed expectation-maximization algorithm for improving query translation. The proposed method (called EM4QT) assumes that each target term either is translated from the source pseudo-relevant documents or has come from a noisy collection. Since EM4QT does not directly consider term coherency, which is defined as fluency of the target translation, we investigate a crucial question: can EM4QT be improved using either coherency-based methods or token-to-token translation ones? To address this question, we combine different translation models via simple linear interpolation and a proposed divergence minimization method. Evaluations over four CLEF collections in Persian, French, Spanish, and German indicate that EM4QT significantly outperforms competitive baselines in all the collections. Our experiments also reveal that since EM4QT indirectly considers term coherency, combining the method with coherency-based models cannot significantly improve the retrieval performance. On the other hand, investigating the query-by-query results supports the view that EM4QT usually gives a relatively high weight to one translation and its combination with the proposed token-to-token translation model, which is obtained by running EM4QT for each query term separately, soothes the effect and reaches better results for many queries. Comparing the method with a competitive word-embedding baseline reveals the superiority of the proposed model.
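    A minimal sketch of the mixture idea described in the abstract above, not the published EM4QT: each term in the target-language pseudo-relevant documents is assumed to come either from a translation model over bilingual-dictionary candidates or from a noisy background collection, and EM re-estimates the translation weights. Function and parameter names (lam, iters) are illustrative.

      from collections import Counter

      def em_translation_weights(candidates, prf_terms, p_background, lam=0.5, iters=20):
          """Two-component mixture EM over candidate query translations."""
          counts = Counter(t for t in prf_terms if t in candidates)
          theta = {t: 1.0 / len(candidates) for t in candidates}   # uniform start
          for _ in range(iters):
              # E-step: probability that each candidate was generated by the
              # translation model rather than the background collection.
              resp = {}
              for t in candidates:
                  trans = lam * theta[t]
                  noise = (1.0 - lam) * p_background.get(t, 1e-9)
                  resp[t] = trans / (trans + noise)
              # M-step: re-estimate translation weights from the expected counts.
              expected = {t: counts[t] * resp[t] for t in candidates}
              total = sum(expected.values()) or 1.0
              theta = {t: expected[t] / total for t in candidates}
          return theta

      print(em_translation_weights(
          {"maison", "domicile"},
          ["maison", "maison", "jardin", "domicile"],
          {"maison": 0.01, "domicile": 0.002, "jardin": 0.02}))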
  3. Purpura, A.; Silvello, G.; Susto, G.A.: Learning to rank from relevance judgments distributions (2022) 0.00
    0.0010184972 = product of:
      0.009166474 = sum of:
        0.009166474 = weight(_text_:of in 645) [ClassicSimilarity], result of:
          0.009166474 = score(doc=645,freq=6.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.1496253 = fieldWeight in 645, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=645)
      0.11111111 = coord(1/9)
    
    Abstract
    LEarning TO Rank (LETOR) algorithms are usually trained on annotated corpora where a single relevance label is assigned to each available document-topic pair. Within the Cranfield framework, relevance labels result from merging either multiple expertly curated or crowdsourced human assessments. In this paper, we explore how to train LETOR models with relevance judgments distributions (either real or synthetically generated) assigned to document-topic pairs instead of single-valued relevance labels. We propose five new probabilistic loss functions to deal with the higher expressive power provided by relevance judgments distributions and show how they can be applied both to neural and gradient boosting machine (GBM) architectures. Moreover, we show how training a LETOR model on a sampled version of the relevance judgments from certain probability distributions can improve its performance when relying either on traditional or probabilistic loss functions. Finally, we validate our hypothesis on real-world crowdsourced relevance judgments distributions. Overall, we observe that relying on relevance judgments distributions to train different LETOR models can boost their performance and even outperform strong baselines such as LambdaMART on several test collections.
    Source
    Journal of the Association for Information Science and Technology. 73(2022) no.9, S.1236-1252
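    A minimal sketch of training on a relevance judgments distribution rather than a single label, as described in the abstract above; the soft-label cross-entropy and the label-sampling helper are illustrative stand-ins, not the paper's five loss functions.

      import numpy as np

      def soft_label_cross_entropy(logits, judgment_dist):
          """Cross-entropy between the model's predicted distribution over
          relevance grades and the empirical distribution of judgments."""
          logits = np.asarray(logits, dtype=float)
          probs = np.exp(logits - logits.max())
          probs /= probs.sum()
          return -float(np.sum(np.asarray(judgment_dist) * np.log(probs + 1e-12)))

      def sample_label(judgment_dist, rng=np.random.default_rng(0)):
          """Alternative use of the same distribution: draw one relevance grade
          per training pass instead of using a fixed label."""
          return int(rng.choice(len(judgment_dist), p=judgment_dist))

      # e.g. judgments collected as [not, partially, highly] relevant
      dist = [0.2, 0.5, 0.3]
      print(soft_label_cross_entropy([0.1, 0.7, 0.2], dist), sample_label(dist))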
  4. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.00
    9.979192E-4 = product of:
      0.0089812735 = sum of:
        0.0089812735 = weight(_text_:of in 6112) [ClassicSimilarity], result of:
          0.0089812735 = score(doc=6112,freq=4.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.14660224 = fieldWeight in 6112, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
      0.11111111 = coord(1/9)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.12, S.1606-1615
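    A minimal sketch of the fKWIC idea, not Käki's implementation: collect the words surrounding each occurrence of the query keyword across the result snippets, count the contexts, and offer the most frequent ones as filters; all names are illustrative.

      import re
      from collections import Counter

      def frequent_keyword_contexts(snippets, keyword, window=1, top_n=5):
          """Return the most frequent keyword-in-context strings; selecting one
          would filter the result list to the snippets containing that context."""
          contexts = Counter()
          for text in snippets:
              tokens = re.findall(r"\w+", text.lower())
              for pos, tok in enumerate(tokens):
                  if tok == keyword.lower():
                      left = tokens[max(0, pos - window):pos]
                      right = tokens[pos + 1:pos + 1 + window]
                      contexts[" ".join(left + [tok] + right)] += 1
          return contexts.most_common(top_n)

      print(frequent_keyword_contexts(
          ["Java virtual machine tuning", "espresso machine reviews",
           "virtual machine images for testing"], "machine"))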
  5. Yan, E.; Ding, Y.; Sugimoto, C.R.: P-Rank: an indicator measuring prestige in heterogeneous scholarly networks (2011) 0.00
    9.979192E-4 = product of:
      0.0089812735 = sum of:
        0.0089812735 = weight(_text_:of in 4349) [ClassicSimilarity], result of:
          0.0089812735 = score(doc=4349,freq=4.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.14660224 = fieldWeight in 4349, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=4349)
      0.11111111 = coord(1/9)
    
    Abstract
    Rankings of scientific productivity and prestige are often limited to homogeneous networks. These networks are unable to account for the multiple factors that constitute the scholarly communication and reward system. This study proposes a new informetric indicator, P-Rank, for measuring prestige in heterogeneous scholarly networks containing articles, authors, and journals. P-Rank differentiates the weight of each citation based on its citing papers, citing journals, and citing authors. Articles from 16 representative library and information science journals are selected as the dataset. Principal Component Analysis is conducted to examine the relationship between P-Rank and other bibliometric indicators. We also compare the correlation and rank variances between citation counts and P-Rank scores. This work provides a new approach to examining prestige in scholarly communication networks in a more comprehensive and nuanced way.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.3, S.467-477
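    A very rough sketch of the P-Rank idea, not the published formulation: prestige flows along citations, each citation is weighted by the current prestige of the citing paper, its journal, and its authors, and journal and author prestige are in turn aggregated from their papers. All data structures and parameters below are illustrative.

      def prank_sketch(citations, paper_journal, paper_authors, iters=30, d=0.85):
          """citations: citing paper -> cited papers; paper_journal: paper -> journal;
          paper_authors: paper -> list of authors."""
          papers = set(paper_journal)
          score = {p: 1.0 / len(papers) for p in papers}
          for _ in range(iters):
              # Aggregate journal and author prestige from their papers.
              j_acc, a_acc = {}, {}
              for p, s in score.items():
                  j_acc.setdefault(paper_journal[p], []).append(s)
                  for a in paper_authors[p]:
                      a_acc.setdefault(a, []).append(s)
              j_score = {j: sum(v) / len(v) for j, v in j_acc.items()}
              a_score = {a: sum(v) / len(v) for a, v in a_acc.items()}
              # Propagate prestige along citations, weighting each citation by the
              # citing paper's score, its journal's score, and its authors' scores.
              new = {p: (1.0 - d) / len(papers) for p in papers}
              for citing, cited_list in citations.items():
                  authors = paper_authors[citing]
                  w = (score[citing] + j_score[paper_journal[citing]]
                       + sum(a_score[a] for a in authors) / len(authors)) / 3.0
                  for cited in cited_list:
                      new[cited] += d * w / len(cited_list)
              score = new
          return score

      journals = {"p1": "J1", "p2": "J1", "p3": "J2"}
      authors = {"p1": ["a1"], "p2": ["a1", "a2"], "p3": ["a2"]}
      print(prank_sketch({"p2": ["p1"], "p3": ["p1", "p2"]}, journals, authors))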
  6. Longshu, L.; Xia, Z.: On an approximate fuzzy information retrieval agent (1998) 0.00
    9.408473E-4 = product of:
      0.008467626 = sum of:
        0.008467626 = weight(_text_:of in 3294) [ClassicSimilarity], result of:
          0.008467626 = score(doc=3294,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.13821793 = fieldWeight in 3294, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=3294)
      0.11111111 = coord(1/9)
    
    Source
    Journal of the China Society for Scientific and Technical Information. 17(1998) no.3, S.180-184
  7. Harman, D.; Fox, E.; Baeza-Yates, R.; Lee, W.: Inverted files (1992) 0.00
    9.408473E-4 = product of:
      0.008467626 = sum of:
        0.008467626 = weight(_text_:of in 3497) [ClassicSimilarity], result of:
          0.008467626 = score(doc=3497,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.13821793 = fieldWeight in 3497, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=3497)
      0.11111111 = coord(1/9)
    
    Abstract
    This chapter presents a survey of the various structures (techniques) that can be used in building inverted files and gives the details for producing an inverted file using sorted arrays. The chapter ends with two modifications to this basic method that are effective for large data collections.
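    A minimal sketch of the sorted-array construction described above, assuming simple whitespace tokenisation: a sorted term dictionary whose entries point to doc-id-sorted postings arrays of (document id, term frequency) pairs.

      from collections import defaultdict

      def build_inverted_file(docs):
          """Build a sorted term dictionary and, per term, a sorted postings array."""
          postings = defaultdict(dict)
          for doc_id, text in enumerate(docs):
              for term in text.lower().split():
                  postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
          dictionary = sorted(postings)                       # binary-searchable terms
          index = {t: sorted(postings[t].items()) for t in dictionary}
          return dictionary, index

      dictionary, index = build_inverted_file(
          ["inverted files for text retrieval", "sorted arrays build inverted files"])
      print(index["inverted"])   # -> [(0, 1), (1, 1)]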
  8. Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: A unified maximum likelihood approach to document retrieval (2001) 0.00
    7.0563547E-4 = product of:
      0.0063507194 = sum of:
        0.0063507194 = weight(_text_:of in 174) [ClassicSimilarity], result of:
          0.0063507194 = score(doc=174,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.103663445 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=174)
      0.11111111 = coord(1/9)
    
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.10, S.785-796
  9. Mayr, P.: Bradfordizing als Re-Ranking-Ansatz in Literaturinformationssystemen (2011) 0.00
    7.0563547E-4 = product of:
      0.0063507194 = sum of:
        0.0063507194 = weight(_text_:of in 4292) [ClassicSimilarity], result of:
          0.0063507194 = score(doc=4292,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.103663445 = fieldWeight in 4292, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=4292)
      0.11111111 = coord(1/9)
    
    Abstract
    This article presents a re-ranking approach for search systems that can measurably improve retrieval of scholarly literature. The non-text-oriented ranking method Bradfordizing is introduced and then evaluated, in the empirical part of the article, for its effectiveness on typical subject-specific search topics. Bradfordizing rests on the Bradford Law of Scattering (BLS), according to which the literature on any given subject area or topic is spread across zones of differing document concentration: a core zone with a high concentration of the literature is followed by zones of medium and low concentration. Bradfordizing therefore sorts, or ranks, a document set by the so-called core journals. A retrieval test with 164 intellectually assessed topics in disciplinary databases from the social and political sciences, economics, psychology, and medicine shows that documents from the core journals are judged relevant significantly more often than documents from the second document zone, i.e. the peripheral journals. Implementing Bradfordizing and further re-ranking methods delivers immediate added value for the user.
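    A minimal sketch of Bradfordizing as a re-ranking step, assuming the journal of each hit is known: documents from the journals that contribute the most hits to the result set (the core zone) move to the top, with the original ranking kept as a tie-breaker. Names are illustrative.

      from collections import Counter

      def bradfordize(results):
          """results: list of (doc_id, journal) pairs in their original ranking."""
          journal_yield = Counter(journal for _, journal in results)
          return sorted(
              results,
              key=lambda item: (-journal_yield[item[1]], results.index(item)),
          )

      hits = [(1, "J. Periphery"), (2, "Core J."), (3, "Core J."),
              (4, "Other J."), (5, "Core J.")]
      print(bradfordize(hits))   # Core J. documents first, then the rest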
  10. Ackermann, J.: Knuth-Morris-Pratt (2005) 0.00
    4.7042366E-4 = product of:
      0.004233813 = sum of:
        0.004233813 = weight(_text_:of in 865) [ClassicSimilarity], result of:
          0.004233813 = score(doc=865,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.06910896 = fieldWeight in 865, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=865)
      0.11111111 = coord(1/9)
    
    Abstract
    Written for the seminar "Suchmaschinen und Suchalgorithmen" (search engines and search algorithms), this paper deals with finding particular words or patterns in texts. The term "text" is understood here in a very general sense as a structured sequence of arbitrary length of characters from a finite alphabet, so the problem covers searching for a pattern in any sequence of characters. Examples include, besides searching for words in "literary" texts, finding pixel sequences in images or patterns in DNA strands. The range of applications for such a search is wide: consider text editors, literature databases, digital dictionaries, or the aforementioned DNA database. Looking only at the Oxford English Dictionary published in 1989, with its roughly 616,500 defined headwords on 21,728 printed pages, it pays to use the most efficient algorithm available for searching in texts. The underlying data type in this paper is the string; how that type is realised in a program is left open. Algorithms for string processing cover a certain spectrum of application areas [Ot96, p. 617 f.], such as compressing, encrypting, parsing, and translating texts, as well as searching in texts, which is the topic of this seminar. This paper presents the Knuth-Morris-Pratt algorithm, which, like the Boyer-Moore algorithm also presented in this seminar, is an efficient search algorithm: a given search word or pattern is to be located in a given character string (pattern matching), and one or more occurrences of that pattern are sought (exact pattern matching). The Knuth-Morris-Pratt algorithm was first described in 1974 in a Stanford University technical report and appeared in 1977 in the SIAM Journal on Computing under the title "Fast Pattern Matching in Strings" [Kn77]. The algorithm searches character strings in linear time. Its name combines those of its developers, Donald E. Knuth, James H. Morris, and Vaughan R. Pratt.
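    For reference, a compact Python version of the Knuth-Morris-Pratt search described above, including the failure-function precomputation that yields the linear running time.

      def kmp_search(text, pattern):
          """Return the start indices of all occurrences of pattern in text,
          in O(len(text) + len(pattern)) time."""
          if not pattern:
              return list(range(len(text) + 1))
          # fail[i] = length of the longest proper prefix of pattern[:i+1]
          # that is also a suffix of it.
          fail = [0] * len(pattern)
          k = 0
          for i in range(1, len(pattern)):
              while k > 0 and pattern[i] != pattern[k]:
                  k = fail[k - 1]
              if pattern[i] == pattern[k]:
                  k += 1
              fail[i] = k
          # Scan the text, reusing the failure function so no character is re-read.
          matches, k = [], 0
          for i, ch in enumerate(text):
              while k > 0 and ch != pattern[k]:
                  k = fail[k - 1]
              if ch == pattern[k]:
                  k += 1
              if k == len(pattern):
                  matches.append(i - len(pattern) + 1)
                  k = fail[k - 1]
          return matches

      print(kmp_search("abababca", "abab"))   # -> [0, 2]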

Languages

  • e 293
  • d 12
  • chi 2
  • m 1
  • sp 1

Types

  • a 287
  • m 11
  • el 8
  • s 5
  • r 3
  • p 2
  • x 1