Search (145 results, page 1 of 8)

  • Filter: theme_ss:"Retrievalalgorithmen"
  1. Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich (IV) : Relevance Ranking nach "Popularität" von Webseiten: Google (2001) 0.06
    0.0629695 = product of:
      0.125939 = sum of:
        0.11963169 = sum of:
          0.045334514 = weight(_text_:web in 5771) [ClassicSimilarity], result of:
            0.045334514 = score(doc=5771,freq=6.0), product of:
              0.12098375 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.03707166 = queryNorm
              0.37471575 = fieldWeight in 5771, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.046875 = fieldNorm(doc=5771)
          0.074297175 = weight(_text_:seiten in 5771) [ClassicSimilarity], result of:
            0.074297175 = score(doc=5771,freq=2.0), product of:
              0.20383513 = queryWeight, product of:
                5.4984083 = idf(docFreq=491, maxDocs=44218)
                0.03707166 = queryNorm
              0.3644964 = fieldWeight in 5771, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4984083 = idf(docFreq=491, maxDocs=44218)
                0.046875 = fieldNorm(doc=5771)
        0.0063072974 = product of:
          0.031536486 = sum of:
            0.031536486 = weight(_text_:28 in 5771) [ClassicSimilarity], result of:
              0.031536486 = score(doc=5771,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23747274 = fieldWeight in 5771, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5771)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    In our retrieval test of World Wide Web search tools (Password 11/2000), the search engine Google came out best. Compared with other search engines, Google relies hardly at all on information linguistics, but rather on algorithms derived from the particular characteristics of Web documents. The core of its information-statistical technique is the "PageRank" method (named after its developer Larry Page), which computes the "popularity" of pages from the hypertext structure of the Web on the basis of their incoming and outgoing links. Google also stands out for its intuitively understandable search screens and for a number of very useful extras, such as display of a page's rank, highlighting, searching within a page, searching within a result set, and so on, all packed into its own toolbar inside the browser. Much like RealNames, Google sells search terms through its "AdWords" product. After a series of now four Password articles comparing Internet search tools, we conclude with an overall assessment. How should the state of the art of directories and search engines be judged from an information science perspective? Are "typical" Internet users, who as a rule are not information professionals, adequately served? And can information professionals also profit from these search tools?
    Date
    28. 4.2001 14:47:21
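    Code sketch
    The score breakdown above (and the ones accompanying the following hits) is Lucene/Solr "explain" output for the ClassicSimilarity TF-IDF model. As a minimal sketch, assuming the standard ClassicSimilarity formula (tf = sqrt(freq), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, coord(m/n) = m/n), the factors displayed for this hit recombine to the listed score of 0.0629695:
      import math

      def clause(freq, idf, query_norm, field_norm):
          """One term clause of Lucene ClassicSimilarity: queryWeight * fieldWeight."""
          tf = math.sqrt(freq)                     # e.g. 2.4494898 for freq=6.0
          query_weight = idf * query_norm          # idf * queryNorm
          field_weight = tf * idf * field_norm     # tf * idf * fieldNorm
          return query_weight * field_weight

      query_norm, field_norm = 0.03707166, 0.046875
      web    = clause(6.0, 3.2635105, query_norm, field_norm)  # ~0.0453345 (_text_:web)
      seiten = clause(2.0, 5.4984083, query_norm, field_norm)  # ~0.0742972 (_text_:seiten)
      date28 = clause(2.0, 3.5822632, query_norm, field_norm)  # ~0.0315365 (_text_:28)

      # coord(m/n) scales a Boolean clause group by matched/total optional clauses
      inner = (web + seiten) + date28 * (1 / 5)                # ~0.1259390
      print(round(inner * (2 / 4), 7))                         # 0.0629695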
  2. Mandl, T.: Web- und Multimedia-Dokumente : Neuere Entwicklungen bei der Evaluierung von Information Retrieval Systemen (2003) 0.03
    0.033490356 = product of:
      0.13396142 = sum of:
        0.13396142 = sum of:
          0.034898523 = weight(_text_:web in 1734) [ClassicSimilarity], result of:
            0.034898523 = score(doc=1734,freq=2.0), product of:
              0.12098375 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.03707166 = queryNorm
              0.2884563 = fieldWeight in 1734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.0625 = fieldNorm(doc=1734)
          0.0990629 = weight(_text_:seiten in 1734) [ClassicSimilarity], result of:
            0.0990629 = score(doc=1734,freq=2.0), product of:
              0.20383513 = queryWeight, product of:
                5.4984083 = idf(docFreq=491, maxDocs=44218)
                0.03707166 = queryNorm
              0.4859952 = fieldWeight in 1734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4984083 = idf(docFreq=491, maxDocs=44218)
                0.0625 = fieldNorm(doc=1734)
      0.25 = coord(1/4)
    
    Abstract
    The amount of data on the Internet continues to grow rapidly, and with it the need for high-quality information retrieval services for orientation and problem-oriented searching. Deciding whether to use or procure information retrieval software requires meaningful evaluation results. This contribution presents recent developments in the evaluation of information retrieval systems and shows the trend towards specialisation and diversification of evaluation studies, which increase the realism of the results. The focus is on the retrieval of specialist texts, Internet pages and multimedia objects.
  3. Weiß, B.: Verwandte Seiten finden : "Ähnliche Seiten" oder "What's Related" (2005) 0.02
    0.021890016 = product of:
      0.087560065 = sum of:
        0.087560065 = product of:
          0.17512013 = sum of:
            0.17512013 = weight(_text_:seiten in 868) [ClassicSimilarity], result of:
              0.17512013 = score(doc=868,freq=16.0), product of:
                0.20383513 = queryWeight, product of:
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.03707166 = queryNorm
                0.8591263 = fieldWeight in 868, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=868)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Link structure analysis (LSA) is one of the most important analysis techniques not only for crawling, web page ranking, delimiting geographical regions, predicting link usage, finding mirror pages, categorising web pages and generating web page statistics, but also for finding related pages. According to prevailing opinion, it is the main component in identifying high-quality related pages within topic-specific graphs of linked documents. Two assumptions are always made: links between two documents imply that the content of both documents is related, and if the documents come from different sources (different authors, hosts, domains, ...), this means that one source recommends the other via a link. Building on this idea, Kleinberg developed the HITS algorithm in 1998 to determine related pages through link structure analysis. The approach was further developed by Bharat and Henzinger and later pursued in algorithms such as the Companion and Cocitation algorithms, which find related pages starting from a single query URL. This seminar paper explains the algorithms behind these ideas in more detail and then presents more recent research approaches in this area.
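    Code sketch
    The abstract above traces related-page finding to link structure analysis (HITS, Companion, Cocitation). A minimal sketch of the cocitation heuristic under simple assumptions: pages that are linked together with the query URL by the same parent pages are treated as related. The published Companion and Cocitation algorithms add vicinity graphs, edge weighting and further ranking steps not shown here; the function and data layout below are illustrative.
      from collections import Counter

      def cocited_pages(query_url, link_graph, top_k=10):
          """Rank candidate related pages by how often they are co-cited with query_url.
          link_graph: {parent_url: [child_urls it links to]}."""
          counts = Counter()
          for parent, children in link_graph.items():
              if query_url in children:
                  for sibling in children:
                      if sibling != query_url:
                          counts[sibling] += 1
          return counts.most_common(top_k)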
  4. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.02
    0.021661386 = product of:
      0.04332277 = sum of:
        0.037015475 = product of:
          0.07403095 = sum of:
            0.07403095 = weight(_text_:web in 674) [ClassicSimilarity], result of:
              0.07403095 = score(doc=674,freq=16.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.6119082 = fieldWeight in 674, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=674)
          0.5 = coord(1/2)
        0.0063072974 = product of:
          0.031536486 = sum of:
            0.031536486 = weight(_text_:28 in 674) [ClassicSimilarity], result of:
              0.031536486 = score(doc=674,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23747274 = fieldWeight in 674, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=674)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced on the basis of these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that included pages from different Web sites, but it did not work well in ranking pages from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
    Date
    20. 1.2007 18:32:28
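    Code sketch
    The article evaluates PageRank variants that take directories or domains, rather than single pages, as the "Web document". A minimal sketch under stated assumptions: the standard PageRank power iteration with damping factor d, plus a helper that collapses a page-level link graph to a domain-level graph before ranking. The authors' alternative document models and test procedure are not reproduced; function names and defaults are illustrative.
      from urllib.parse import urlparse

      def pagerank(links, d=0.85, iters=50):
          """links: {node: [nodes it links to]}; returns PageRank scores by power iteration."""
          nodes = set(links) | {q for targets in links.values() for q in targets}
          n = len(nodes)
          rank = {p: 1.0 / n for p in nodes}
          for _ in range(iters):
              new = {p: (1 - d) / n for p in nodes}
              for p in nodes:
                  targets = links.get(p, [])
                  if targets:
                      share = rank[p] / len(targets)
                      for q in targets:
                          new[q] += d * share
                  else:                            # dangling node: spread its mass uniformly
                      for q in nodes:
                          new[q] += d * rank[p] / n
              rank = new
          return rank

      def collapse_to_domains(page_links):
          """Aggregate page-level links into domain-level links (one alternative document model).
          Assumes full URLs, e.g. "http://example.org/a"."""
          dom = lambda url: urlparse(url).netloc
          agg = {}
          for page, targets in page_links.items():
              agg.setdefault(dom(page), set()).update(
                  dom(t) for t in targets if dom(t) != dom(page))
          return {domain: sorted(linked) for domain, linked in agg.items()}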
  5. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.02
    0.016738456 = product of:
      0.03347691 = sum of:
        0.026445134 = product of:
          0.052890267 = sum of:
            0.052890267 = weight(_text_:web in 1319) [ClassicSimilarity], result of:
              0.052890267 = score(doc=1319,freq=6.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.43716836 = fieldWeight in 1319, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.5 = coord(1/2)
        0.007031777 = product of:
          0.035158884 = sum of:
            0.035158884 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
              0.035158884 = score(doc=1319,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.2708308 = fieldWeight in 1319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Keyword-based querying has been an immediate and efficient way to specify and retrieve the information a user is looking for. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes the idea of integrating two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
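    Code sketch
    The abstract proposes integrating query expansion with conceptual relevance feedback. As a hedged stand-in, the sketch below uses the classic Rocchio formulation to expand a query from judged documents; the authors' concept-based method is not reproduced, and the weights alpha, beta and gamma are illustrative defaults.
      from collections import Counter

      def rocchio_expand(query_terms, relevant_docs, nonrelevant_docs,
                         alpha=1.0, beta=0.75, gamma=0.15, top_k=10):
          """Expand a query from feedback documents (lists of token lists)."""
          weights = Counter({term: alpha for term in query_terms})
          for doc in relevant_docs:
              for term, freq in Counter(doc).items():
                  weights[term] += beta * freq / len(relevant_docs)
          for doc in nonrelevant_docs:
              for term, freq in Counter(doc).items():
                  weights[term] -= gamma * freq / len(nonrelevant_docs)
          return [term for term, weight in weights.most_common(top_k) if weight > 0]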
  6. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.02
    0.016100565 = product of:
      0.03220113 = sum of:
        0.026173891 = product of:
          0.052347783 = sum of:
            0.052347783 = weight(_text_:web in 2239) [ClassicSimilarity], result of:
              0.052347783 = score(doc=2239,freq=8.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.43268442 = fieldWeight in 2239, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
        0.0060272375 = product of:
          0.030136187 = sum of:
            0.030136187 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.030136187 = score(doc=2239,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
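    Code sketch
    The article contrasts fitness function designs for GP-based discovery of ranking functions. As an illustration of what such a fitness function can look like, the sketch below scores a candidate ranking function by its mean average precision over training queries; this is one common choice, not necessarily any of the designs evaluated in the article, and the data layout is assumed.
      def average_precision(ranked_ids, relevant_ids):
          """Average of the precision values at the ranks where relevant documents appear."""
          hits, total = 0, 0.0
          for rank, doc_id in enumerate(ranked_ids, start=1):
              if doc_id in relevant_ids:
                  hits += 1
                  total += hits / rank
          return total / len(relevant_ids) if relevant_ids else 0.0

      def fitness(ranking_fn, training_queries):
          """Mean average precision of a candidate ranking function.
          training_queries: list of (docs, relevant_ids), docs: {doc_id: feature dict}."""
          ap_values = []
          for docs, relevant_ids in training_queries:
              ranked = sorted(docs, key=lambda d: ranking_fn(docs[d]), reverse=True)
              ap_values.append(average_precision(ranked, relevant_ids))
          return sum(ap_values) / len(ap_values)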
  7. Agosti, M.; Pretto, L.: ¬A theoretical study of a generalized version of Kleinberg's HITS algorithm (2005) 0.01
    0.013439935 = product of:
      0.02687987 = sum of:
        0.021811578 = product of:
          0.043623157 = sum of:
            0.043623157 = weight(_text_:web in 4) [ClassicSimilarity], result of:
              0.043623157 = score(doc=4,freq=8.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.36057037 = fieldWeight in 4, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4)
          0.5 = coord(1/2)
        0.005068291 = product of:
          0.025341455 = sum of:
            0.025341455 = weight(_text_:29 in 4) [ClassicSimilarity], result of:
              0.025341455 = score(doc=4,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19432661 = fieldWeight in 4, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results.
    Date
    31.12.1996 19:29:41
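    Code sketch
    A minimal sketch of the original HITS iteration that the article generalises: hub and authority vectors are updated alternately from the link structure and normalised each round. The generalised version and the convergence analysis discussed in the abstract are not reproduced here; the iteration count is an illustrative default.
      import math

      def hits(graph, iters=50):
          """graph: {page: [pages it links to]}; returns (authority, hub) score dicts."""
          nodes = set(graph) | {q for targets in graph.values() for q in targets}
          auth = {n: 1.0 for n in nodes}
          hub = {n: 1.0 for n in nodes}
          for _ in range(iters):
              # authority(n) = sum of hub scores of pages that link to n
              auth = {n: sum(hub[p] for p, targets in graph.items() if n in targets) for n in nodes}
              norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
              auth = {n: v / norm for n, v in auth.items()}
              # hub(n) = sum of authority scores of pages that n links to
              hub = {n: sum(auth[q] for q in graph.get(n, [])) for n in nodes}
              norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
              hub = {n: v / norm for n, v in hub.items()}
          return auth, hub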
  8. Meghabghab, G.: Google's Web page ranking applied to different topological Web graph structures (2001) 0.01
    0.013075581 = product of:
      0.052302323 = sum of:
        0.052302323 = product of:
          0.10460465 = sum of:
            0.10460465 = weight(_text_:web in 6028) [ClassicSimilarity], result of:
              0.10460465 = score(doc=6028,freq=46.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.86461735 = fieldWeight in 6028, product of:
                  6.78233 = tf(freq=46.0), with freq of:
                    46.0 = termFreq=46.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6028)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This research is part of an ongoing study to better understand web page ranking on the web. It looks at a web page as a graph structure or a web graph, and tries to classify different web graphs in the new coordinate space: (out-degree, in-degree). The out-degree coordinate od is defined as the number of outgoing web pages from a given web page. The in-degree coordinate id is the number of web pages that point to a given web page. In this new coordinate space a metric is built to classify how close or far different web graphs are. Google's web ranking algorithm (Brin & Page, 1998) for ranking web pages is applied in this new coordinate space. The results of the algorithm were modified to fit different topological web graph structures. The algorithm was also not successful in the case of general web graphs, and new web ranking algorithms have to be considered. This study does not look at enhancing web ranking by adding any contextual information. It only considers web links as a source for web page ranking. The author believes that understanding the underlying web page as a graph will help design better web ranking algorithms, enhance retrieval and web performance, and recommends using graphs as part of visual aids for browsing engine designers.
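    Code sketch
    The study classifies web graphs in an (out-degree, in-degree) coordinate space. A minimal sketch of that mapping, assuming the graph is given as an adjacency dict; the metric built on top of these coordinates and the modified ranking algorithm are not reproduced here.
      def degree_coordinates(graph):
          """Map each page to its (out-degree, in-degree) point.
          graph: {page: [pages it links to]}."""
          pages = set(graph) | {q for targets in graph.values() for q in targets}
          out_degree = {p: len(graph.get(p, [])) for p in pages}
          in_degree = {p: 0 for p in pages}
          for targets in graph.values():
              for q in targets:
                  in_degree[q] += 1
          return {p: (out_degree[p], in_degree[p]) for p in pages}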
  9. Chakrabarti, S.; Dom, B.; Kumar, S.R.; Raghavan, P.; Rajagopalan, S.; Tomkins, A.; Kleinberg, J.M.; Gibson, D.: Neue Pfade durch den Internet-Dschungel : Die zweite Generation von Web-Suchmaschinen (1999) 0.01
    0.012779264 = product of:
      0.025558528 = sum of:
        0.017449262 = product of:
          0.034898523 = sum of:
            0.034898523 = weight(_text_:web in 3) [ClassicSimilarity], result of:
              0.034898523 = score(doc=3,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.2884563 = fieldWeight in 3, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3)
          0.5 = coord(1/2)
        0.008109266 = product of:
          0.040546328 = sum of:
            0.040546328 = weight(_text_:29 in 3) [ClassicSimilarity], result of:
              0.040546328 = score(doc=3,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.31092256 = fieldWeight in 3, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Date
    31.12.1996 19:29:41
  10. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.01
    0.012407517 = product of:
      0.024815034 = sum of:
        0.018507738 = product of:
          0.037015475 = sum of:
            0.037015475 = weight(_text_:web in 101) [ClassicSimilarity], result of:
              0.037015475 = score(doc=101,freq=4.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.3059541 = fieldWeight in 101, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=101)
          0.5 = coord(1/2)
        0.0063072974 = product of:
          0.031536486 = sum of:
            0.031536486 = weight(_text_:28 in 101) [ClassicSimilarity], result of:
              0.031536486 = score(doc=101,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23747274 = fieldWeight in 101, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=101)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Question answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. This chapter covers state-of-the-art question answering, focusing on an overview of systems, techniques and approaches that are likely to be employed in the next generation of search engines. Special attention is paid to question answering that uses the World Wide Web as the data source and to question answering that exploits the possibilities of the Semantic Web. Considerations about current issues and prospects for promising future research are also provided.
    Date
    17. 4.2012 15:28:17
  11. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.01
    0.012294844 = product of:
      0.024589688 = sum of:
        0.018507738 = product of:
          0.037015475 = sum of:
            0.037015475 = weight(_text_:web in 6112) [ClassicSimilarity], result of:
              0.037015475 = score(doc=6112,freq=4.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.3059541 = fieldWeight in 6112, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6112)
          0.5 = coord(1/2)
        0.0060819495 = product of:
          0.030409746 = sum of:
            0.030409746 = weight(_text_:29 in 6112) [ClassicSimilarity], result of:
              0.030409746 = score(doc=6112,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23319192 = fieldWeight in 6112, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6112)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
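    Code sketch
    A minimal sketch of the fKWIC idea described above, assuming whitespace tokenisation, a fixed context window and result records with a 'snippet' field: collect the most frequent keyword contexts from result snippets, then filter the result list by the context the user selects. The published interface and its context extraction are more elaborate.
      from collections import Counter

      def frequent_contexts(snippets, keyword, window=2, top_k=10):
          """Most frequent keyword-in-context strings across result snippets."""
          contexts = Counter()
          key = keyword.lower()
          for text in snippets:
              tokens = text.lower().split()
              for i, token in enumerate(tokens):
                  if token == key:
                      contexts[" ".join(tokens[max(0, i - window):i + window + 1])] += 1
          return contexts.most_common(top_k)

      def filter_by_context(results, chosen_context):
          """Keep only results whose snippet contains the selected context string."""
          return [r for r in results if chosen_context in r["snippet"].lower()]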
  12. Desai, M.; Spink, A.: ¬An algorithm to cluster documents based on relevance (2005) 0.01
    0.009697122 = product of:
      0.019394243 = sum of:
        0.013086946 = product of:
          0.026173891 = sum of:
            0.026173891 = weight(_text_:web in 1035) [ClassicSimilarity], result of:
              0.026173891 = score(doc=1035,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.21634221 = fieldWeight in 1035, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.5 = coord(1/2)
        0.0063072974 = product of:
          0.031536486 = sum of:
            0.031536486 = weight(_text_:28 in 1035) [ClassicSimilarity], result of:
              0.031536486 = score(doc=1035,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23747274 = fieldWeight in 1035, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Search engines fail to make a clear distinction between items of varying relevance when presenting search results to users. Instead, they rely on the user of the system to estimate which items are relevant, partially relevant, or not relevant. The user of the system is given the task of distinguishing between documents that are relevant to different degrees. This process often hinders the accessibility of relevant or partially relevant documents, particularly when the results set is large and documents of varying relevance are scattered throughout the set. In this paper, we present a clustering scheme that groups documents within relevant, partially relevant, and not relevant regions for a given search. A clustering algorithm accomplishes the task of clustering documents based on relevance. The clusters were evaluated by end-users issuing categorical, interval, and descriptive relevance judgments for the documents returned from a search. The degree of overlap between users and the system for each of the clustered regions was measured to determine the overall effectiveness of the algorithm. This research showed that clustering documents on the Web by regions of relevance is highly necessary and quite feasible.
    Date
    26.12.2007 20:28:53
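    Code sketch
    The paper groups retrieved documents into relevant, partially relevant and not relevant regions. As a hedged stand-in for the authors' clustering algorithm, the sketch below groups documents by a one-dimensional k-means over their retrieval scores; the number of iterations and the use of scores as the clustering signal are assumptions.
      def cluster_by_relevance(scored_docs, iters=20):
          """Group documents into not / partially / fully relevant regions via 1-D k-means.
          scored_docs: {doc_id: retrieval score}."""
          values = sorted(scored_docs.values())
          centers = [values[0], values[len(values) // 2], values[-1]]

          def assign(centers):
              groups = {0: [], 1: [], 2: []}
              for doc_id, score in scored_docs.items():
                  nearest = min(range(3), key=lambda k: abs(score - centers[k]))
                  groups[nearest].append(doc_id)
              return groups

          for _ in range(iters):
              groups = assign(centers)
              centers = [sum(scored_docs[d] for d in groups[k]) / len(groups[k])
                         if groups[k] else centers[k] for k in range(3)]
          groups = assign(centers)
          order = sorted(range(3), key=lambda k: centers[k])
          labels = ["not relevant", "partially relevant", "relevant"]
          return {labels[i]: groups[order[i]] for i in range(3)}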
  13. Liu, X.; Turtle, H.: Real-time user interest modeling for real-time ranking (2013) 0.01
    0.009697122 = product of:
      0.019394243 = sum of:
        0.013086946 = product of:
          0.026173891 = sum of:
            0.026173891 = weight(_text_:web in 1035) [ClassicSimilarity], result of:
              0.026173891 = score(doc=1035,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.21634221 = fieldWeight in 1035, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.5 = coord(1/2)
        0.0063072974 = product of:
          0.031536486 = sum of:
            0.031536486 = weight(_text_:28 in 1035) [ClassicSimilarity], result of:
              0.031536486 = score(doc=1035,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23747274 = fieldWeight in 1035, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    User interest as a very dynamic information need is often ignored in most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify the dynamic and changing query level interests regarding social media outputs. Unlike most existing ranking methods, our ranking approach targets calculation of the probability that user interest in the content of the document is subject to very dynamic user interest change. We describe 2 formulations of the model (real-time interest vector space and real-time interest language model) stemming from classical relevance ranking methods and develop a novel methodology for evaluating the performance of RIM using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research.
    Date
    28. 7.2013 12:59:19
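    Code sketch
    A minimal sketch of the vector-space reading of a real-time interest model: the interest vector decays over time, absorbs newly observed terms, and candidate documents are ranked by cosine similarity to it. The decay factor and update rule are illustrative assumptions, not RIM's published formulation.
      import math
      from collections import Counter

      def update_interest(interest, observed_terms, decay=0.9):
          """Decay old interest weights and add newly observed terms (a token list)."""
          updated = Counter({term: weight * decay for term, weight in interest.items()})
          updated.update(observed_terms)
          return updated

      def rank_by_interest(interest, docs):
          """docs: {doc_id: token list}; rank by cosine similarity to the interest vector."""
          def cosine(vec, tokens):
              tf = Counter(tokens)
              dot = sum(vec[t] * tf[t] for t in tf)
              norm_a = math.sqrt(sum(w * w for w in vec.values())) or 1.0
              norm_b = math.sqrt(sum(f * f for f in tf.values())) or 1.0
              return dot / (norm_a * norm_b)
          return sorted(docs, key=lambda doc_id: cosine(interest, docs[doc_id]), reverse=True)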
  14. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.01
    0.009584447 = product of:
      0.019168895 = sum of:
        0.013086946 = product of:
          0.026173891 = sum of:
            0.026173891 = weight(_text_:web in 5764) [ClassicSimilarity], result of:
              0.026173891 = score(doc=5764,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.21634221 = fieldWeight in 5764, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5764)
          0.5 = coord(1/2)
        0.0060819495 = product of:
          0.030409746 = sum of:
            0.030409746 = weight(_text_:29 in 5764) [ClassicSimilarity], result of:
              0.030409746 = score(doc=5764,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23319192 = fieldWeight in 5764, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5764)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents
    Date
    29. 9.2001 14:00:39
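    Code sketch
    A minimal sketch of the arbitrary-passage idea: cut a tokenised document into fixed-length, overlapping word windows and rank the document by its best-scoring passage. The window length and stride are illustrative; the article compares several passage types, including variable-length ones not shown here.
      def overlapping_passages(tokens, length=150, stride=75):
          """Fixed-length, overlapping word-window passages of a tokenised document."""
          if len(tokens) <= length:
              return [tokens]
          passages = [tokens[i:i + length] for i in range(0, len(tokens) - length + 1, stride)]
          if (len(tokens) - length) % stride:      # make sure the document tail is covered
              passages.append(tokens[-length:])
          return passages

      def best_passage_score(tokens, score_fn, length=150, stride=75):
          """Score a long document by its highest-scoring passage instead of the full text.
          score_fn: any function scoring a token list against the current query."""
          return max(score_fn(p) for p in overlapping_passages(tokens, length, stride))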
  15. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.01
    0.009391997 = product of:
      0.018783994 = sum of:
        0.015268105 = product of:
          0.03053621 = sum of:
            0.03053621 = weight(_text_:web in 2509) [ClassicSimilarity], result of:
              0.03053621 = score(doc=2509,freq=8.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.25239927 = fieldWeight in 2509, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
        0.0035158885 = product of:
          0.017579442 = sum of:
            0.017579442 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
              0.017579442 = score(doc=2509,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.1354154 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
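    Code sketch
    A minimal sketch of the middleware idea quoted above, under simple assumptions: the natural-language query is turned into progressively broader Boolean statements, and the records returned by the OPAC are ranked by query-term overlap. The E-Referencer's actual repertoire of search and reformulation strategies, and its Z39.50 handling, are much richer; the record structure below is assumed.
      def boolean_statements(query):
          """Yield progressively broader Boolean reformulations of a natural-language query."""
          terms = [t for t in query.lower().split() if len(t) > 2]
          yield " AND ".join(terms)         # strict conjunction first
          yield " OR ".join(terms)          # relax if the OPAC returns too few records

      def rank_records(records, query):
          """Rank retrieved OPAC records by query-term overlap in title and subject fields.
          records: list of dicts with 'title' and 'subjects' strings."""
          terms = set(query.lower().split())
          def overlap(record):
              text = (record.get("title", "") + " " + record.get("subjects", "")).lower()
              return sum(term in text for term in terms)
          return sorted(records, key=overlap, reverse=True)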
  16. Fichtner, K.: Boyer-Moore Suchalgorithmus (2005) 0.01
    0.009287147 = product of:
      0.037148587 = sum of:
        0.037148587 = product of:
          0.074297175 = sum of:
            0.074297175 = weight(_text_:seiten in 864) [ClassicSimilarity], result of:
              0.074297175 = score(doc=864,freq=2.0), product of:
                0.20383513 = queryWeight, product of:
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.03707166 = queryNorm
                0.3644964 = fieldWeight in 864, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.046875 = fieldNorm(doc=864)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The great mass of search algorithms can be divided into two fundamentally different subareas. On the one side stand algorithms that find whole records on complex (often tree-like) data structures using an index; binary search on sorted arrays or in binary trees is a common representative. The other group, to which this paper is devoted, serves to find occurrences of patterns in given character strings. The following pages first introduce some terms needed for further understanding and for a comparison of different search algorithms. A naive search algorithm is then presented and compared with the idea of Boyer and Moore. To this end, their algorithm is first described informally, then explained in more detail with a view to an implementation, and finally subjected to an efficiency analysis, both empirical and theoretical. The paper closes with a brief assessment of weaknesses, strengths and possible improvements, in the course of which some prominent modifications of the Boyer-Moore algorithm are presented.
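    Code sketch
    A minimal sketch of the bad-character shift that drives the Boyer-Moore family, given here in Horspool's simplified variant; the full Boyer-Moore algorithm treated in the paper additionally uses the good-suffix rule, which is omitted.
      def boyer_moore_horspool(text, pattern):
          """Return the index of the first occurrence of pattern in text, or -1."""
          m, n = len(pattern), len(text)
          if m == 0:
              return 0
          if m > n:
              return -1
          # shift distance when a given text character is aligned with the pattern's last position
          shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
          i = m - 1                                # text index aligned with the pattern's end
          while i < n:
              matched = 0
              while matched < m and pattern[m - 1 - matched] == text[i - matched]:
                  matched += 1
              if matched == m:
                  return i - m + 1
              i += shift.get(text[i], m)           # characters not in the pattern shift by m
          return -1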
  17. Oberhauser, O.: Relevance Ranking in den Online-Katalogen der "nächsten Generation" (2010) 0.01
    0.009287147 = product of:
      0.037148587 = sum of:
        0.037148587 = product of:
          0.074297175 = sum of:
            0.074297175 = weight(_text_:seiten in 4308) [ClassicSimilarity], result of:
              0.074297175 = score(doc=4308,freq=2.0), product of:
                0.20383513 = queryWeight, product of:
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.03707166 = queryNorm
                0.3644964 = fieldWeight in 4308, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4308)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Relevance ranking in online catalogues is not a new topic, but there is not much literature on it that deserves to be taken seriously. One reason is that interest in ranked result lists has traditionally been low among all parties involved (librarians, software vendors, users). Another is that the criticism of existing OPACs that has become popular in recent years often started from an insufficient knowledge base and frequently produced only polemical or emotionally coloured contributions that added little on the subject of ranking. ... The test described here is of course in no way exhaustive or representative. Nevertheless, I believe it gives grounds for some hope. It suggests that the "new" OPACs are, at least as far as relevance ranking is concerned, moving in the right direction. How well they will really manage to catch up with the ranking performance of search engines such as Google, which operate under completely different conditions, only the future will tell.
  18. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.01
    0.009004478 = product of:
      0.018008957 = sum of:
        0.010905789 = product of:
          0.021811578 = sum of:
            0.021811578 = weight(_text_:web in 2591) [ClassicSimilarity], result of:
              0.021811578 = score(doc=2591,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.18028519 = fieldWeight in 2591, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.5 = coord(1/2)
        0.0071031675 = product of:
          0.035515837 = sum of:
            0.035515837 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
              0.035515837 = score(doc=2591,freq=4.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.27358043 = fieldWeight in 2591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Purpose: In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic when creating relevance judgments through human assessors is infeasible. Because of the large number of documents that require judgment, errors may be introduced by disagreements between human assessors. The paper aims to discuss these issues. Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods overcome the problem of judging large numbers of documents while avoiding errors from human disagreement during the judgment process. The study uses two key factors: the number of occurrences of each document per topic across all system runs, and document rankings, to generate the alternative methods. Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems, based on mean average precision scores, between the original Text REtrieval Conference (TREC) relevance judgments and the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce the human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value: The simple methods proposed in this study improve the correlation coefficient when generating alternative relevance judgments without human assessors, and thus contribute to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
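    Code sketch
    A minimal sketch of the pooling idea behind pseudo relevance judgments: for one topic, documents that many system runs retrieve within the pool depth are treated as relevant without human assessors. The vote threshold is an illustrative assumption; the paper's exponential-variation and document-ranking methods weight occurrences and rank positions more carefully.
      from collections import Counter

      def pseudo_relevance_judgments(runs, pool_depth=100, min_votes=None):
          """runs: ranked doc-id lists from different systems for one topic.
          Returns the set of documents judged pseudo-relevant by occurrence voting."""
          votes = Counter()
          for run in runs:
              for doc_id in run[:pool_depth]:
                  votes[doc_id] += 1
          if min_votes is None:
              min_votes = max(2, len(runs) // 2)   # require roughly half the runs by default
          return {doc_id for doc_id, count in votes.items() if count >= min_votes}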
  19. Hubert, G.; Pitarch, Y.; Pinel-Sauvagnat, K.; Tournier, R.; Laporte, L.: TournaRank : when retrieval becomes document competition (2018) 0.01
    0.008080935 = product of:
      0.01616187 = sum of:
        0.010905789 = product of:
          0.021811578 = sum of:
            0.021811578 = weight(_text_:web in 5087) [ClassicSimilarity], result of:
              0.021811578 = score(doc=5087,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.18028519 = fieldWeight in 5087, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5087)
          0.5 = coord(1/2)
        0.0052560815 = product of:
          0.026280407 = sum of:
            0.026280407 = weight(_text_:28 in 5087) [ClassicSimilarity], result of:
              0.026280407 = score(doc=5087,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19789396 = fieldWeight in 5087, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5087)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Numerous feature-based models have recently been proposed by the information retrieval community. The capability of features to express different relevance facets (query- or document-dependent) can explain such a success story. Such models are most of the time supervised and thus require a learning phase. To leverage the advantages of feature-based representations of documents, we propose TournaRank, an unsupervised approach inspired by real-life game and sport competition principles. Documents compete against each other in tournaments, using features as evidence of relevance. Tournaments are modeled as a sequence of matches in which pairs of documents play their features in turn. Once a tournament has ended, documents are ranked according to the number of matches they won during the tournament. This principle is generic, since it can be applied to any collection type. It also provides great flexibility, since different alternatives can be considered by changing the tournament type, the match rules, the feature set, or the strategies adopted by documents during matches. TournaRank was experimented on several collections to evaluate our model in different contexts and to compare it with related approaches such as Learning to Rank and fusion ones: the TREC Robust2004 collection for homogeneous documents, the TREC Web2014 (ClueWeb12) collection for heterogeneous web documents, and the LETOR3.0 collection for comparison with supervised feature-based models.
    Date
    17. 3.2019 11:28:52
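    Code sketch
    A minimal sketch of the tournament principle described above: every pair of documents plays a match in which they compare their features in turn, and the final ranking follows the number of won matches. Tournament types, match rules and document strategies from the published model are not reproduced; the data layout is assumed.
      from itertools import combinations

      def tournarank(docs, feature_order):
          """Rank documents by won pairwise matches, each decided feature by feature.
          docs: {doc_id: {feature_name: value}}; feature_order: features played, in turn."""
          wins = {doc_id: 0 for doc_id in docs}
          for a, b in combinations(docs, 2):
              points_a = sum(docs[a].get(f, 0.0) > docs[b].get(f, 0.0) for f in feature_order)
              points_b = sum(docs[b].get(f, 0.0) > docs[a].get(f, 0.0) for f in feature_order)
              if points_a > points_b:
                  wins[a] += 1
              elif points_b > points_a:
                  wins[b] += 1
          return sorted(docs, key=lambda doc_id: wins[doc_id], reverse=True)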
  20. Hammache, A.; Boughanem, M.: Term position-based language model for information retrieval (2021) 0.01
    0.008080935 = product of:
      0.01616187 = sum of:
        0.010905789 = product of:
          0.021811578 = sum of:
            0.021811578 = weight(_text_:web in 216) [ClassicSimilarity], result of:
              0.021811578 = score(doc=216,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.18028519 = fieldWeight in 216, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=216)
          0.5 = coord(1/2)
        0.0052560815 = product of:
          0.026280407 = sum of:
            0.026280407 = weight(_text_:28 in 216) [ClassicSimilarity], result of:
              0.026280407 = score(doc=216,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19789396 = fieldWeight in 216, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=216)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    The term position feature is widely and successfully used in IR and Web search engines to enhance retrieval effectiveness. This feature is essentially used for two purposes: to capture query term proximity or to boost the weight of terms appearing in some parts of a document. In this paper, we are interested in this second category. We propose two novel query-independent techniques based on absolute term positions in a document, whose goal is to boost the weight of terms appearing at the beginning of a document. The first one considers only the earliest occurrence of a term in a document. The second one takes into account all term positions in a document. We formalize each of these two techniques as a document model based on term position, and then incorporate it into a basic language model (LM). Two smoothing techniques, Dirichlet and Jelinek-Mercer, are considered in the basic LM. Experiments conducted on three TREC test collections show that our model, especially the version based on all term positions, achieves significant improvements over the baseline LMs, and it also often performs better than two state-of-the-art baseline models, the chronological term rank model and the Markov random field model.
    Date
    11. 4.2021 18:28:34
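    Code sketch
    A minimal sketch of the second idea named in the abstract: boosting terms that appear early in a document inside a Dirichlet-smoothed query-likelihood model. The boost function, its strength and the smoothing parameter mu are illustrative assumptions; the paper's two position models and their integration into the LM are defined differently.
      import math
      from collections import Counter

      def position_boost(first_position, doc_length):
          """Simple linear boost: terms whose first occurrence is early count for more."""
          return 1.0 + (doc_length - first_position) / doc_length

      def lm_dirichlet_position_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000):
          """Dirichlet-smoothed query likelihood, with term counts scaled by a boost
          for the earliest position of each query term in the document."""
          tf = Counter(doc_terms)
          first = {}
          for position, term in enumerate(doc_terms):
              first.setdefault(term, position)
          doc_length = max(len(doc_terms), 1)
          score = 0.0
          for q in query_terms:
              boosted_tf = tf.get(q, 0) * position_boost(first.get(q, doc_length), doc_length)
              p_collection = (collection_tf.get(q, 0) + 0.5) / (collection_len + 1.0)
              score += math.log((boosted_tf + mu * p_collection) / (len(doc_terms) + mu))
          return score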

Languages

  • e 122
  • d 21
  • m 1
  • pt 1

Types

  • a 130
  • m 7
  • x 5
  • el 3
  • s 2
  • r 1