Search (97 results, page 1 of 5)

  • theme_ss:"Retrievalalgorithmen"
  1. Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich (IV) : Relevance Ranking nach "Popularität" von Webseiten: Google (2001) 0.04
    0.03585235 = product of:
      0.16133557 = sum of:
        0.12244281 = sum of:
          0.035959117 = weight(_text_:web in 5771) [ClassicSimilarity], result of:
            0.035959117 = score(doc=5771,freq=6.0), product of:
              0.09596372 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.02940506 = queryNorm
              0.37471575 = fieldWeight in 5771, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.046875 = fieldNorm(doc=5771)
          0.086483695 = weight(_text_:seite in 5771) [ClassicSimilarity], result of:
            0.086483695 = score(doc=5771,freq=4.0), product of:
              0.16469958 = queryWeight, product of:
                5.601063 = idf(docFreq=443, maxDocs=44218)
                0.02940506 = queryNorm
              0.52509964 = fieldWeight in 5771, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.601063 = idf(docFreq=443, maxDocs=44218)
                0.046875 = fieldNorm(doc=5771)
        0.038892757 = product of:
          0.077785514 = sum of:
            0.077785514 = weight(_text_:bewertung in 5771) [ClassicSimilarity], result of:
              0.077785514 = score(doc=5771,freq=2.0), product of:
                0.18575147 = queryWeight, product of:
                  6.31699 = idf(docFreq=216, maxDocs=44218)
                  0.02940506 = queryNorm
                0.41876122 = fieldWeight in 5771, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.31699 = idf(docFreq=216, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5771)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    In our retrieval test of World Wide Web search tools (Password 11/2000), the search engine Google performed best. Compared with other search engines, Google relies hardly at all on information linguistics; instead it uses algorithms derived from the particular properties of Web documents. The core of this information-statistical technique is the "PageRank" method (named after its developer Larry Page), which computes the "popularity" of pages from the hypertext structure of the Web, based on their incoming and outgoing links. Google stands out for its intuitively understandable search screens and for several very useful extras, such as displaying a page's rank, highlighting, searching within a page, searching within a result set, and so on, all housed in its own toolbar inside the browser. Much like RealNames, Google offers the purchase of search terms through its "AdWords" product. After what is now a series of four Password articles comparing Internet search tools, we conclude with an overall assessment. How should the state of the art of directories and search engines be judged from an information science perspective? Are "typical" Internet users, who as a rule are not information professionals, served adequately? And can information professionals, too, profit from these search tools?
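    The score breakdowns shown with each result follow Lucene's ClassicSimilarity formula. A minimal Python sketch (our illustration, not part of the retrieval software) that re-derives the weight(_text_:web in 5771) figure from the tf, idf, queryNorm, and fieldNorm values listed above:

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          tf = math.sqrt(freq)                       # tf(freq) = sqrt(freq)
          term_idf = idf(doc_freq, max_docs)
          query_weight = term_idf * query_norm       # queryWeight = idf * queryNorm
          field_weight = tf * term_idf * field_norm  # fieldWeight = tf * idf * fieldNorm
          return query_weight * field_weight

      # Reproduces weight(_text_:web in 5771) from result no. 1 above:
      print(term_score(freq=6.0, doc_freq=4597, max_docs=44218,
                       query_norm=0.02940506, field_norm=0.046875))  # ~0.0359591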
  2. Fichtner, K.: Boyer-Moore Suchalgorithmus (2005) 0.02
    0.0154376365 = product of:
      0.06946936 = sum of:
        0.030576602 = product of:
          0.061153203 = sum of:
            0.061153203 = weight(_text_:seite in 864) [ClassicSimilarity], result of:
              0.061153203 = score(doc=864,freq=2.0), product of:
                0.16469958 = queryWeight, product of:
                  5.601063 = idf(docFreq=443, maxDocs=44218)
                  0.02940506 = queryNorm
                0.3713015 = fieldWeight in 864, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.601063 = idf(docFreq=443, maxDocs=44218)
                  0.046875 = fieldNorm(doc=864)
          0.5 = coord(1/2)
        0.038892757 = product of:
          0.077785514 = sum of:
            0.077785514 = weight(_text_:bewertung in 864) [ClassicSimilarity], result of:
              0.077785514 = score(doc=864,freq=2.0), product of:
                0.18575147 = queryWeight, product of:
                  6.31699 = idf(docFreq=216, maxDocs=44218)
                  0.02940506 = queryNorm
                0.41876122 = fieldWeight in 864, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.31699 = idf(docFreq=216, maxDocs=44218)
                  0.046875 = fieldNorm(doc=864)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    The mass of search algorithms can be divided into two fundamentally different subareas. On one side are algorithms that locate entire records via an index on complex (often tree-like) data structures; a familiar representative is binary search on sorted arrays or in binary trees. The other group, to which this paper is devoted, serves to find occurrences of patterns in given character strings. The following pages first introduce some concepts needed for further understanding and for comparing different search algorithms. A naive search algorithm is then presented and compared with the idea of Boyer and Moore: their algorithm is first described informally, then explained in more detail with a view to an implementation, and finally subjected to an efficiency analysis, both empirical and theoretical. The paper closes with a brief assessment of weaknesses, strengths, and possibilities for improvement, in the course of which several prominent modifications of the Boyer-Moore algorithm are presented.
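    As an illustration of the pattern-matching family this paper belongs to, here is a compact Boyer-Moore search in Python using only the bad-character rule; the full algorithm adds the good-suffix rule, so this is an illustrative sketch rather than the paper's implementation:

      def boyer_moore(text, pattern):
          """Return the index of the first match of pattern in text, or -1."""
          m, n = len(pattern), len(text)
          if m == 0 or m > n:
              return -1
          # Bad-character table: rightmost position of each character in the pattern.
          last = {ch: i for i, ch in enumerate(pattern)}
          s = 0  # current alignment of the pattern against the text
          while s <= n - m:
              j = m - 1
              while j >= 0 and pattern[j] == text[s + j]:  # compare right to left
                  j -= 1
              if j < 0:
                  return s  # full match at alignment s
              # Shift so the mismatched text character lines up with its rightmost
              # occurrence in the pattern; always advance by at least one.
              s += max(1, j - last.get(text[s + j], -1))
          return -1

      print(boyer_moore("here is a simple example", "example"))  # 17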
  3. Tober, M.; Hennig, L.; Furch, D.: SEO Ranking-Faktoren und Rang-Korrelationen 2014 : Google Deutschland (2014) 0.02
    0.015065095 = product of:
      0.13558586 = sum of:
        0.13558586 = sum of:
          0.10371402 = weight(_text_:bewertung in 1484) [ClassicSimilarity], result of:
            0.10371402 = score(doc=1484,freq=2.0), product of:
              0.18575147 = queryWeight, product of:
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.02940506 = queryNorm
              0.5583483 = fieldWeight in 1484, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.0625 = fieldNorm(doc=1484)
          0.031871837 = weight(_text_:22 in 1484) [ClassicSimilarity], result of:
            0.031871837 = score(doc=1484,freq=2.0), product of:
              0.10297151 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02940506 = queryNorm
              0.30952093 = fieldWeight in 1484, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1484)
      0.11111111 = coord(1/9)
    
    Abstract
    This whitepaper deals with the definition and evaluation of factors that exhibit a high rank correlation coefficient with organic search results, and it serves the purpose of a deeper analysis of search engine algorithms. The data collection and its evaluation cover ranking factors for Google Germany in 2014. In addition, the correlations and factors were interpreted with regard to their relevance for top search result positions, using among other things mean and median values as well as trends relative to previous years.
    Date
    13. 9.2014 14:45:22
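    A small sketch of the kind of rank correlation the whitepaper in no. 3 reports: Spearman's rho between a ranking factor and result positions. The data are hypothetical, ties are not rank-averaged here, and the study's actual tooling is not public:

      def spearman_rho(xs, ys):
          """Spearman's rank correlation (ties not averaged in this sketch)."""
          def ranks(values):
              order = sorted(range(len(values)), key=lambda i: values[i])
              r = [0.0] * len(values)
              for rank, i in enumerate(order):
                  r[i] = rank + 1.0
              return r
          rx, ry = ranks(xs), ranks(ys)
          n = len(xs)
          d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
          return 1.0 - 6.0 * d2 / (n * (n * n - 1))

      # Hypothetical factor values per result vs. their SERP positions 1..4;
      # a factor that decreases with position number yields a strong negative rho,
      # i.e. the factor is associated with top positions.
      print(spearman_rho([0.9, 0.7, 0.4, 0.2], [1, 2, 3, 4]))  # -1.0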
  4. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.01
    0.008264816 = product of:
      0.07438335 = sum of:
        0.07438335 = sum of:
          0.031141505 = weight(_text_:web in 6) [ClassicSimilarity], result of:
            0.031141505 = score(doc=6,freq=18.0), product of:
              0.09596372 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.02940506 = queryNorm
              0.32451332 = fieldWeight in 6, product of:
                4.2426405 = tf(freq=18.0), with freq of:
                  18.0 = termFreq=18.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.0234375 = fieldNorm(doc=6)
          0.043241847 = weight(_text_:seite in 6) [ClassicSimilarity], result of:
            0.043241847 = score(doc=6,freq=4.0), product of:
              0.16469958 = queryWeight, product of:
                5.601063 = idf(docFreq=443, maxDocs=44218)
                0.02940506 = queryNorm
              0.26254982 = fieldWeight in 6, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.601063 = idf(docFreq=443, maxDocs=44218)
                0.0234375 = fieldNorm(doc=6)
      0.11111111 = coord(1/9)
    
    Abstract
    Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings, and how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB code samples and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; an accessible and informal style; and a complete and self-contained section for mathematics review.
    Content
    Contents: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The alpha Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank: 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
    RSWK
    Google / Web-Seite / Rangstatistik (HEBIS)
    Subject
    Google / Web-Seite / Rangstatistik (HEBIS)
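    Chapters 4-6 of the book in no. 4 develop PageRank as a teleporting random walk computed with the power method. A minimal Python sketch of that computation on a hypothetical four-link graph (alpha is the damping factor; the book's own examples use MATLAB):

      def pagerank(links, alpha=0.85, iters=50):
          """Power-method PageRank on a list of (source, target) links."""
          nodes = sorted({u for u, v in links} | {v for u, v in links})
          idx = {p: i for i, p in enumerate(nodes)}
          n = len(nodes)
          out = [[] for _ in range(n)]
          for u, v in links:
              out[idx[u]].append(idx[v])
          pr = [1.0 / n] * n
          for _ in range(iters):
              nxt = [(1.0 - alpha) / n] * n  # teleportation term
              for i in range(n):
                  if out[i]:  # distribute rank along outlinks
                      share = alpha * pr[i] / len(out[i])
                      for j in out[i]:
                          nxt[j] += share
                  else:  # dangling node: spread its rank uniformly
                      for j in range(n):
                          nxt[j] += alpha * pr[i] / n
              pr = nxt
          return dict(zip(nodes, pr))

      print(pagerank([("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]))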
  5. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.01
    0.007760017 = product of:
      0.034920078 = sum of:
        0.02097615 = product of:
          0.0419523 = sum of:
            0.0419523 = weight(_text_:web in 1319) [ClassicSimilarity], result of:
              0.0419523 = score(doc=1319,freq=6.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.43716836 = fieldWeight in 1319, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.5 = coord(1/2)
        0.013943928 = product of:
          0.027887857 = sum of:
            0.027887857 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
              0.027887857 = score(doc=1319,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.2708308 = fieldWeight in 1319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Keyword-based querying has been an immediate and efficient way to specify and retrieve the information a user is seeking. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
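    The paper in no. 5 combines query expansion with relevance feedback. For reference, here is the classical Rocchio formulation in Python, which also expands the query by pulling in new terms from judged documents; this is the standard textbook method, not necessarily the concept-based variant the authors propose:

      def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
          """Classical Rocchio feedback on sparse term->weight vectors (dicts)."""
          terms = set(query)
          for d in relevant + nonrelevant:
              terms |= set(d)
          new_q = {}
          for t in terms:
              w = alpha * query.get(t, 0.0)
              if relevant:
                  w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
              if nonrelevant:
                  w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
              if w > 0:
                  new_q[t] = w  # keep only positively weighted terms
          return new_q

      q = {"web": 1.0, "retrieval": 1.0}
      feedback_docs = [{"web": 0.8, "ranking": 0.6}]  # judged relevant by the user
      print(rocchio(q, feedback_docs, []))  # "ranking" enters the expanded query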
  6. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.01
    0.007269542 = product of:
      0.03271294 = sum of:
        0.020761002 = product of:
          0.041522004 = sum of:
            0.041522004 = weight(_text_:web in 2239) [ClassicSimilarity], result of:
              0.041522004 = score(doc=2239,freq=8.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.43268442 = fieldWeight in 2239, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
        0.011951938 = product of:
          0.023903877 = sum of:
            0.023903877 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.023903877 = score(doc=2239,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task - discovery of ranking functions for Web search - and has achieved very promising results. However, in our prior research, only one fitness function had been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
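    Why the choice of fitness function matters for no. 6: different effectiveness measures reward different rankings, so the same GP run can evolve different ranking functions. A sketch of two candidate fitness functions, precision at k and average precision (illustrative; not the specific designs studied in the article):

      def precision_at_k(ranked, relevant, k=10):
          """Fraction of the top k results that are relevant."""
          hits = sum(1 for d in ranked[:k] if d in relevant)
          return hits / k

      def average_precision(ranked, relevant):
          """Mean of precision values at each rank where a relevant doc appears."""
          hits, total = 0, 0.0
          for i, d in enumerate(ranked, start=1):
              if d in relevant:
                  hits += 1
                  total += hits / i
          return total / len(relevant) if relevant else 0.0

      ranked = ["d3", "d1", "d7", "d2", "d9"]   # hypothetical system output
      relevant = {"d1", "d2"}
      print(precision_at_k(ranked, relevant, k=5))   # 0.4
      print(average_precision(ranked, relevant))     # 0.5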
  7. Effektive Information Retrieval Verfahren in Theorie und Praxis : ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005 (2006) 0.01
    0.005161697 = product of:
      0.023227636 = sum of:
        0.0048934156 = product of:
          0.009786831 = sum of:
            0.009786831 = weight(_text_:web in 5973) [ClassicSimilarity], result of:
              0.009786831 = score(doc=5973,freq=4.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.1019847 = fieldWeight in 5973, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.015625 = fieldNorm(doc=5973)
          0.5 = coord(1/2)
        0.018334221 = product of:
          0.036668442 = sum of:
            0.036668442 = weight(_text_:bewertung in 5973) [ClassicSimilarity], result of:
              0.036668442 = score(doc=5973,freq=4.0), product of:
                0.18575147 = queryWeight, product of:
                  6.31699 = idf(docFreq=216, maxDocs=44218)
                  0.02940506 = queryNorm
                0.19740593 = fieldWeight in 5973, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.31699 = idf(docFreq=216, maxDocs=44218)
                  0.015625 = fieldNorm(doc=5973)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Content
    Contents: Jan-Hendrik Scheufen: RECOIN: Modell offener Schnittstellen für Information-Retrieval-Systeme und -Komponenten - Markus Nick, Klaus-Dieter Althoff: Designing Maintainable Experience-based Information Systems - Gesine Quint, Steffen Weichert: Die benutzerzentrierte Entwicklung des Produkt-Retrieval-Systems EIKON der Blaupunkt GmbH - Claus-Peter Klas, Sascha Kriewel, André Schaefer, Gudrun Fischer: Das DAFFODIL System - Strategische Literaturrecherche in Digitalen Bibliotheken - Matthias Meiert: Entwicklung eines Modells zur Integration digitaler Dokumente in die Universitätsbibliothek Hildesheim - Daniel Harbig, René Schneider: Ontology Learning im Rahmen von MyShelf - Michael Kluck, Marco Winter: Topic-Entwicklung und Relevanzbewertung bei GIRT: ein Werkstattbericht - Thomas Mandl: Neue Entwicklungen bei den Evaluierungsinitiativen im Information Retrieval - Joachim Pfister: Clustering von Patent-Dokumenten am Beispiel der Datenbanken des Fachinformationszentrums Karlsruhe - Ralph Kölle, Glenn Langemeier, Wolfgang Semar: Programmieren lernen in kollaborativen Lernumgebungen - Olga Tartakovski, Margaryta Shramko: Implementierung eines Werkzeugs zur Sprachidentifikation in mono- und multilingualen Texten - Nina Kummer: Indexierungstechniken für das japanische Retrieval - Suriya Na Nhongkai, Hans-Joachim Bentz: Bilinguale Suche mittels Konzeptnetzen - Robert Strötgen, Thomas Mandl, René Schneider: Entwicklung und Evaluierung eines Question Answering Systems im Rahmen des Cross Language Evaluation Forum (CLEF) - Niels Jensen: Evaluierung von mehrsprachigem Web-Retrieval: Experimente mit dem EuroGOV-Korpus im Rahmen des Cross Language Evaluation Forum (CLEF)
    Footnote
    "Evaluierung", das Thema des dritten Kapitels, ist in seiner Breite nicht auf das Information Retrieval beschränkt sondern beinhaltet ebenso einzelne Aspekte der Bereiche Mensch-Maschine-Interaktion sowie des E-Learning. Michael Muck und Marco Winter von der Stiftung Wissenschaft und Politik sowie dem Informationszentrum Sozialwissenschaften thematisieren in ihrem Beitrag den Einfluss der Fragestellung (Topic) auf die Bewertung von Relevanz und zeigen Verfahrensweisen für die Topic-Erstellung auf, die beim Cross Language Evaluation Forum (CLEF) Anwendung finden. Im darauf folgenden Aufsatz stellt Thomas Mandl verschiedene Evaluierungsinitiativen im Information Retrieval und aktuelle Entwicklungen dar. Joachim Pfister erläutert in seinem Beitrag das automatisierte Gruppieren, das sogenannte Clustering, von Patent-Dokumenten in den Datenbanken des Fachinformationszentrums Karlsruhe und evaluiert unterschiedliche Clusterverfahren auf Basis von Nutzerbewertungen. Ralph Kölle, Glenn Langemeier und Wolfgang Semar widmen sich dem kollaborativen Lernen unter den speziellen Bedingungen des Programmierens. Dabei werden das System VitaminL zur synchronen Bearbeitung von Programmieraufgaben und das Kennzahlensystem K-3 für die Bewertung kollaborativer Zusammenarbeit in einer Lehrveranstaltung angewendet. Der aktuelle Forschungsschwerpunkt der Hildesheimer Informationswissenschaft zeichnet sich im vierten Kapitel unter dem Thema "Multilinguale Systeme" ab. Hier finden sich die meisten Beiträge des Tagungsbandes wieder. Olga Tartakovski und Margaryta Shramko beschreiben und prüfen das System Langldent, das die Sprache von mono- und multilingualen Texten identifiziert. Die Eigenheiten der japanischen Schriftzeichen stellt Nina Kummer dar und vergleicht experimentell die unterschiedlichen Techniken der Indexierung. Suriya Na Nhongkai und Hans-Joachim Bentz präsentieren und prüfen eine bilinguale Suche auf Basis von Konzeptnetzen, wobei die Konzeptstruktur das verbindende Elemente der beiden Textsammlungen darstellt. Das Entwickeln und Evaluieren eines mehrsprachigen Question-Answering-Systems im Rahmen des Cross Language Evaluation Forum (CLEF), das die alltagssprachliche Formulierung von konkreten Fragestellungen ermöglicht, wird im Beitrag von Robert Strötgen, Thomas Mandl und Rene Schneider thematisiert. Den Schluss bildet der Aufsatz von Niels Jensen, der ein mehrsprachiges Web-Retrieval-System ebenfalls im Zusammenhang mit dem CLEF anhand des multilingualen EuroGOVKorpus evaluiert.
  8. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.01
    0.005052425 = product of:
      0.022735912 = sum of:
        0.008650418 = product of:
          0.017300837 = sum of:
            0.017300837 = weight(_text_:web in 2591) [ClassicSimilarity], result of:
              0.017300837 = score(doc=2591,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.18028519 = fieldWeight in 2591, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.5 = coord(1/2)
        0.014085495 = product of:
          0.02817099 = sum of:
            0.02817099 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
              0.02817099 = score(doc=2591,freq=4.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.27358043 = fieldWeight in 2591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Purpose: In a system-based approach, replicating the Web would require large test collections, and having human assessors judge the relevance of every document per topic is infeasible. Because of the large number of documents requiring judgment, human assessors also introduce errors through disagreement. The paper aims to discuss these issues.
    Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods overcome the problem of the large number of documents to be judged while avoiding human disagreement errors during the judgment process. The study utilizes two key factors: the number of occurrences of each document per topic across all system runs, and document rankings.
    Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient of systems ranked by mean average precision under the original Text REtrieval Conference (TREC) relevance judgments and under the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative that reduces the human effort and disagreement errors involved in generating TREC-like relevance judgments.
    Originality/value: The simple methods proposed in this study improve the correlation coefficient when generating alternate relevance judgments without human assessors, contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
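    The occurrence-counting idea in no. 8 can be sketched directly: documents retrieved by many system runs within the pool depth are taken as pseudo-relevant. The pool depth and vote threshold below are assumptions, not the paper's exact parameters:

      from collections import Counter

      def pseudo_qrels(runs, pool_depth=100, min_votes=3):
          """runs: ranked document-id lists, one per system, for a single topic."""
          votes = Counter()
          for run in runs:
              votes.update(set(run[:pool_depth]))  # one vote per run per document
          return {doc for doc, v in votes.items() if v >= min_votes}

      runs = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d3", "d1"]]
      print(pseudo_qrels(runs, pool_depth=3, min_votes=3))  # {'d1', 'd2'}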
  9. Meghabghab, G.: Google's Web page ranking applied to different topological Web graph structures (2001) 0.00
    0.00460955 = product of:
      0.041485947 = sum of:
        0.041485947 = product of:
          0.08297189 = sum of:
            0.08297189 = weight(_text_:web in 6028) [ClassicSimilarity], result of:
              0.08297189 = score(doc=6028,freq=46.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.86461735 = fieldWeight in 6028, product of:
                  6.78233 = tf(freq=46.0), with freq of:
                    46.0 = termFreq=46.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6028)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    This research is part of an ongoing study to better understand Web page ranking on the Web. It treats a Web page as part of a graph structure, a Web graph, and classifies different Web graphs in a new coordinate space: (out-degree, in-degree). The out-degree coordinate od is defined as the number of Web pages a given page links out to; the in-degree coordinate id is the number of Web pages that point to a given page. In this new coordinate space, a metric is built to classify how close or far apart different Web graphs are. Google's Web ranking algorithm (Brin & Page, 1998) is applied in this coordinate space, and its results are modified to fit different topological Web graph structures. The algorithm was not successful for general Web graphs, however, so new Web ranking algorithms have to be considered. This study does not attempt to enhance ranking by adding contextual information; it considers only Web links as a source for page ranking. The author believes that understanding the underlying Web page as a graph will help in designing better ranking algorithms and enhancing retrieval and Web performance, and recommends using graphs as part of a visual aid for browsing-engine designers.
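    A sketch of the (out-degree, in-degree) coordinate space described above: each page becomes a point, and two Web graphs are compared by the distance between their mean coordinates. The averaging is a simplification for illustration; the paper constructs its own metric:

      import math
      from collections import defaultdict

      def degree_coordinates(links):
          """Map each page to its (out-degree, in-degree) point."""
          out_deg, in_deg = defaultdict(int), defaultdict(int)
          pages = set()
          for u, v in links:
              out_deg[u] += 1
              in_deg[v] += 1
              pages.update((u, v))
          return {p: (out_deg[p], in_deg[p]) for p in pages}

      def graph_distance(links_a, links_b):
          """Euclidean distance between the mean (od, id) points of two graphs."""
          def centroid(coords):
              n = len(coords)
              return (sum(od for od, _ in coords.values()) / n,
                      sum(id_ for _, id_ in coords.values()) / n)
          ax, ay = centroid(degree_coordinates(links_a))
          bx, by = centroid(degree_coordinates(links_b))
          return math.hypot(ax - bx, ay - by)

      star = [("hub", f"p{i}") for i in range(4)]             # one hub, four sinks
      chain = [("p0", "p1"), ("p1", "p2"), ("p2", "p3")]      # a simple path
      print(graph_distance(star, chain))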
  10. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.00
    0.004240567 = product of:
      0.01908255 = sum of:
        0.012110585 = product of:
          0.02422117 = sum of:
            0.02422117 = weight(_text_:web in 2509) [ClassicSimilarity], result of:
              0.02422117 = score(doc=2509,freq=8.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.25239927 = fieldWeight in 2509, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
        0.006971964 = product of:
          0.013943928 = sum of:
            0.013943928 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
              0.013943928 = score(doc=2509,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.1354154 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
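    The middleware step in no. 10 of reformulating a natural language query as a series of Boolean search statements can be sketched as follows; the stopword list and the broadening order are assumptions, not E-Referencer's actual strategies:

      STOPWORDS = {"the", "a", "an", "of", "for", "in", "on", "and", "to"}

      def boolean_statements(query):
          """Turn a natural language query into progressively broader Boolean forms."""
          terms = [t for t in query.lower().split() if t not in STOPWORDS]
          statements = [" AND ".join(terms)]        # strictest: all terms required
          if len(terms) > 2:                        # then drop one term at a time
              for i in range(len(terms)):
                  statements.append(" AND ".join(terms[:i] + terms[i + 1:]))
          statements.append(" OR ".join(terms))     # broadest: any term matches
          return statements

      for s in boolean_statements("ranking algorithms for library catalogs"):
          print(s)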
  11. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.00
    0.004135637 = product of:
      0.018610368 = sum of:
        0.008650418 = product of:
          0.017300837 = sum of:
            0.017300837 = weight(_text_:web in 56) [ClassicSimilarity], result of:
              0.017300837 = score(doc=56,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.18028519 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
        0.009959949 = product of:
          0.019919898 = sum of:
            0.019919898 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.019919898 = score(doc=56,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
  12. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.00
    0.0035413152 = product of:
      0.031871837 = sum of:
        0.031871837 = product of:
          0.06374367 = sum of:
            0.06374367 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.06374367 = score(doc=402,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  13. Wilhelmy, A.: Phonetische Ähnlichkeitssuche in Datenbanken (1991) 0.00
    0.0033974003 = product of:
      0.030576602 = sum of:
        0.030576602 = product of:
          0.061153203 = sum of:
            0.061153203 = weight(_text_:seite in 5684) [ClassicSimilarity], result of:
              0.061153203 = score(doc=5684,freq=2.0), product of:
                0.16469958 = queryWeight, product of:
                  5.601063 = idf(docFreq=443, maxDocs=44218)
                  0.02940506 = queryNorm
                0.3713015 = fieldWeight in 5684, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.601063 = idf(docFreq=443, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5684)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    In dialogue-based information retrieval systems (IRS), the interplay between human and computer can be understood, roughly, as an iterative process of increasing precision on the one hand and recall on the other. This paper presents a machine-applicable procedure that goes back to the phonological work of the linguist Nikolaj S. Trubetzkoy (1890-1938). In its essentials, it can contribute considerably to improving recall. Because it draws the 'similarity neighborhoods' of search terms into the search, it proves advantageous above all for systems with coordinate machine indexing. For alphabetic terms, introducing such a procedure, which at first is oriented purely towards the user, also turns out to be favorable from a technical point of view, since it keeps the number of accesses during search operations low even for large data volumes.
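    To illustrate the general idea of searching a term's 'similarity neighborhood', here is a simplified Soundex coder in Python; Soundex is only a stand-in, since the article develops a different, Trubetzkoy-based phonological procedure:

      SOUNDEX = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
                 **dict.fromkeys("dt", "3"), "l": "4",
                 **dict.fromkeys("mn", "5"), "r": "6"}

      def soundex(word):
          """Simplified Soundex code (h/w handling of the full standard omitted)."""
          word = word.lower()
          code = word[0].upper()
          prev = SOUNDEX.get(word[0], "")
          for ch in word[1:]:
              digit = SOUNDEX.get(ch, "")
              if digit and digit != prev:  # skip vowels and adjacent duplicates
                  code += digit
              prev = digit
          return (code + "000")[:4]

      print(soundex("Meyer"), soundex("Maier"))  # both M600: a phonetic match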
  14. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.00
    0.003262277 = product of:
      0.029360492 = sum of:
        0.029360492 = product of:
          0.058720984 = sum of:
            0.058720984 = weight(_text_:web in 674) [ClassicSimilarity], result of:
              0.058720984 = score(doc=674,freq=16.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.6119082 = fieldWeight in 674, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=674)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and in most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced on the basis of these alternatives were used to rank four sets of Web pages, and the ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites, but it did not work well in ranking pages from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
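    The aggregation behind these alternative document models can be sketched by collapsing page-level links into a domain-level graph before ranking; the URL handling here is simplified, and the paper also considers directory-level aggregation:

      from urllib.parse import urlparse

      def aggregate_by_domain(page_links):
          """Collapse (source URL, target URL) pairs into domain-to-domain links."""
          domain_links = set()
          for src, dst in page_links:
              a, b = urlparse(src).netloc, urlparse(dst).netloc
              if a and b and a != b:  # drop within-site links
                  domain_links.add((a, b))
          return sorted(domain_links)

      links = [
          ("http://a.example/p1", "http://b.example/x"),
          ("http://a.example/p2", "http://b.example/y"),  # collapses with the link above
          ("http://b.example/x", "http://b.example/y"),   # internal link: removed
      ]
      print(aggregate_by_domain(links))  # [('a.example', 'b.example')]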
  15. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.00
    0.0030986508 = product of:
      0.027887857 = sum of:
        0.027887857 = product of:
          0.055775713 = sum of:
            0.055775713 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.055775713 = score(doc=2134,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Date
    30. 3.2001 13:32:22
  16. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.00
    0.0030986508 = product of:
      0.027887857 = sum of:
        0.027887857 = product of:
          0.055775713 = sum of:
            0.055775713 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.055775713 = score(doc=3445,freq=2.0), product of:
                0.10297151 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02940506 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Date
    25. 8.2005 17:42:22
  17. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.00
    0.0027724023 = product of:
      0.02495162 = sum of:
        0.02495162 = product of:
          0.04990324 = sum of:
            0.04990324 = weight(_text_:web in 8) [ClassicSimilarity], result of:
              0.04990324 = score(doc=8,freq=26.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.520022 = fieldWeight in 8, product of:
                  5.0990195 = tf(freq=26.0), with freq of:
                    26.0 = termFreq=26.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=8)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    Hyperlink analysis algorithms allow search engines to deliver focused results to user queries. This article surveys ranking algorithms used to retrieve information on the Web.
    Content
    Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithm for these systems was usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags. How are hyperlinks useful? The hyperlink functionality alone - that is, the hyperlink to Web page B that is contained in Web page A - is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents. Hyperlink analysis significantly improves the relevance of the search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform - mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.
  18. Ning, X.; Jin, H.; Wu, H.: RSS: a framework enabling ranked search on the semantic web (2008) 0.00
    0.0027185641 = product of:
      0.024467077 = sum of:
        0.024467077 = product of:
          0.048934154 = sum of:
            0.048934154 = weight(_text_:web in 2069) [ClassicSimilarity], result of:
              0.048934154 = score(doc=2069,freq=16.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.5099235 = fieldWeight in 2069, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2069)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
    Abstract
    The semantic web contains not only resources but also the heterogeneous relationships among them, which sharply distinguishes it from the current web. As the semantic web grows, specialized search techniques gain significance. In this paper, we present RSS - a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with the entities most semantically related to the query, so that users receive properly ordered semantic search results obtained by combining global ranking values with the relevance between the resources and the query. The proposed semantic search model, which supports inference, is very different from traditional keyword-based search methods. Moreover, RSS also differs from many current methods of accessing semantic web data in that it applies novel ranking strategies to prevent search results from being returned in disorder. The experimental results show that the framework is feasible and produces a better ordering of semantic search results than directly applying the standard PageRank algorithm to the semantic web.
    Theme
    Semantic Web
  19. Zhang, D.; Dong, Y.: ¬An effective algorithm to rank Web resources (2000) 0.00
    0.0026912412 = product of:
      0.02422117 = sum of:
        0.02422117 = product of:
          0.04844234 = sum of:
            0.04844234 = weight(_text_:web in 3662) [ClassicSimilarity], result of:
              0.04844234 = score(doc=3662,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.50479853 = fieldWeight in 3662, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3662)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    
  20. Finding anything in the billion page Web : are algorithms the key? (1999) 0.00
    0.0026912412 = product of:
      0.02422117 = sum of:
        0.02422117 = product of:
          0.04844234 = sum of:
            0.04844234 = weight(_text_:web in 6248) [ClassicSimilarity], result of:
              0.04844234 = score(doc=6248,freq=2.0), product of:
                0.09596372 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02940506 = queryNorm
                0.50479853 = fieldWeight in 6248, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6248)
          0.5 = coord(1/2)
      0.11111111 = coord(1/9)
    

Languages

  • e 83
  • d 13
  • m 1

Types

  • a 85
  • m 7
  • el 2
  • s 2
  • x 2
  • r 1