Search (34 results, page 1 of 2)

  • × theme_ss:"Suchmaschinen"
  • × theme_ss:"Retrievalalgorithmen"
  1. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.03
    0.030123372 = product of:
      0.060246743 = sum of:
        0.010820055 = weight(_text_:in in 3276) [ClassicSimilarity], result of:
          0.010820055 = score(doc=3276,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1822149 = fieldWeight in 3276, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3276)
        0.028725822 = weight(_text_:und in 3276) [ClassicSimilarity], result of:
          0.028725822 = score(doc=3276,freq=6.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.2968967 = fieldWeight in 3276, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3276)
        0.020700864 = product of:
          0.04140173 = sum of:
            0.04140173 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
              0.04140173 = score(doc=3276,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.2708308 = fieldWeight in 3276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3276)
          0.5 = coord(1/2)
      0.5 = coord(3/6)
    
    Abstract
    Im Rahmen des klassischen Information Retrieval wurden verschiedene Verfahren für das Ranking sowie die Suche in einer homogenen strukturlosen Dokumentenmenge entwickelt. Die Erfolge der Suchmaschine Google haben gezeigt dass die Suche in einer zwar inhomogenen aber zusammenhängenden Dokumentenmenge wie dem Internet unter Berücksichtigung der Dokumentenverbindungen (Links) sehr effektiv sein kann. Unter den von der Suchmaschine Google realisierten Konzepten ist ein Verfahren zum Ranking von Suchergebnissen (PageRank), das in diesem Artikel kurz erklärt wird. Darüber hinaus wird auf die Konzepte eines Systems namens CiteSeer eingegangen, welches automatisch bibliographische Angaben indexiert (engl. Autonomous Citation Indexing, ACI). Letzteres erzeugt aus einer Menge von nicht vernetzten wissenschaftlichen Dokumenten eine zusammenhängende Dokumentenmenge und ermöglicht den Einsatz von Banking-Verfahren, die auf den von Google genutzten Verfahren basieren.
    Date
    20. 3.2005 16:23:22
    Source
    Information - Wissenschaft und Praxis. 56(2005) H.2, S.87-92
  2. Tober, M.; Hennig, L.; Furch, D.: SEO Ranking-Faktoren und Rang-Korrelationen 2014 : Google Deutschland (2014) 0.02
    0.022013616 = product of:
      0.06604084 = sum of:
        0.042382717 = weight(_text_:und in 1484) [ClassicSimilarity], result of:
          0.042382717 = score(doc=1484,freq=10.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.438048 = fieldWeight in 1484, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=1484)
        0.02365813 = product of:
          0.04731626 = sum of:
            0.04731626 = weight(_text_:22 in 1484) [ClassicSimilarity], result of:
              0.04731626 = score(doc=1484,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.30952093 = fieldWeight in 1484, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1484)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Dieses Whitepaper beschäftigt sich mit der Definition und Bewertung von Faktoren, die eine hohe Rangkorrelation-Koeffizienz mit organischen Suchergebnissen aufweisen und dient dem Zweck der tieferen Analyse von Suchmaschinen-Algorithmen. Die Datenerhebung samt Auswertung bezieht sich auf Ranking-Faktoren für Google-Deutschland im Jahr 2014. Zusätzlich wurden die Korrelationen und Faktoren unter anderem anhand von Durchschnitts- und Medianwerten sowie Entwicklungstendenzen zu den Vorjahren hinsichtlich ihrer Relevanz für vordere Suchergebnis-Positionen interpretiert.
    Date
    13. 9.2014 14:45:22
  3. Behnert, C.; Plassmeier, K.; Borst, T.; Lewandowski, D.: Evaluierung von Rankingverfahren für bibliothekarische Informationssysteme (2019) 0.01
    0.01400142 = product of:
      0.042004257 = sum of:
        0.008834538 = weight(_text_:in in 5023) [ClassicSimilarity], result of:
          0.008834538 = score(doc=5023,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.14877784 = fieldWeight in 5023, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5023)
        0.03316972 = weight(_text_:und in 5023) [ClassicSimilarity], result of:
          0.03316972 = score(doc=5023,freq=8.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.34282678 = fieldWeight in 5023, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5023)
      0.33333334 = coord(2/6)
    
    Abstract
    Dieser Beitrag beschreibt eine Studie zur Entwicklung und Evaluierung von Rankingverfahren für bibliothekarische Informationssysteme. Dazu wurden mögliche Faktoren für das Relevanzranking ausgehend von den Verfahren in Websuchmaschinen identifiziert, auf den Bibliothekskontext übertragen und systematisch evaluiert. Mithilfe eines Testsystems, das auf dem ZBW-Informationsportal EconBiz und einer web-basierten Software zur Evaluierung von Suchsystemen aufsetzt, wurden verschiedene Relevanzfaktoren (z. B. Popularität in Verbindung mit Aktualität) getestet. Obwohl die getesteten Rankingverfahren auf einer theoretischen Ebene divers sind, konnten keine einheitlichen Verbesserungen gegenüber den Baseline-Rankings gemessen werden. Die Ergebnisse deuten darauf hin, dass eine Adaptierung des Rankings auf individuelle Nutzer bzw. Nutzungskontexte notwendig sein könnte, um eine höhere Performance zu erzielen.
    Source
    Information - Wissenschaft und Praxis. 70(2019) H.1, S.14-23
  4. Weiß, B.: Verwandte Seiten finden : "Ähnliche Seiten" oder "What's Related" (2005) 0.01
    0.013023684 = product of:
      0.03907105 = sum of:
        0.007728611 = weight(_text_:in in 868) [ClassicSimilarity], result of:
          0.007728611 = score(doc=868,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1301535 = fieldWeight in 868, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=868)
        0.03134244 = weight(_text_:und in 868) [ClassicSimilarity], result of:
          0.03134244 = score(doc=868,freq=14.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.32394084 = fieldWeight in 868, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0390625 = fieldNorm(doc=868)
      0.33333334 = coord(2/6)
    
    Abstract
    Die Link-Struktur-Analyse (LSA) ist nicht nur beim Crawling, dem Webseitenranking, der Abgrenzung geographischer Bereiche, der Vorhersage von Linkverwendungen, dem Auffinden von "Mirror"-Seiten, dem Kategorisieren von Webseiten und beim Generieren von Webseitenstatistiken eines der wichtigsten Analyseverfahren, sondern auch bei der Suche nach verwandten Seiten. Um qualitativ hochwertige verwandte Seiten zu finden, bildet sie nach herrschender Meinung den Hauptbestandteil bei der Identifizierung von ähnlichen Seiten innerhalb themenspezifischer Graphen vernetzter Dokumente. Dabei wird stets von zwei Annahmen ausgegangen: Links zwischen zwei Dokumenten implizieren einen verwandten Inhalt beider Dokumente und wenn die Dokumente aus unterschiedlichen Quellen (von unterschiedlichen Autoren, Hosts, Domänen, .) stammen, so bedeutet dies das eine Quelle die andere über einen Link empfiehlt. Aufbauend auf dieser Idee entwickelte Kleinberg 1998 den HITS Algorithmus um verwandte Seiten über die Link-Struktur-Analyse zu bestimmen. Dieser Ansatz wurde von Bharat und Henzinger weiterentwickelt und später auch in Algorithmen wie dem Companion und Cocitation Algorithmus zur Suche von verwandten Seiten basierend auf nur einer Anfrage-URL weiter verfolgt. In der vorliegenden Seminararbeit sollen dabei die Algorithmen, die hinter diesen Überlegungen stehen, näher erläutert werden und im Anschluss jeweils neuere Forschungsansätze auf diesem Themengebiet aufgezeigt werden.
    Content
    Ausarbeitung im Rahmen des Seminars Suchmaschinen und Suchalgorithmen, Institut für Wirtschaftsinformatik Praktische Informatik in der Wirtschaft, Westfälische Wilhelms-Universität Münster. - Vgl.: http://www-wi.uni-muenster.de/pi/lehre/ss05/seminarSuchen/Ausarbeitungen/BurkhardWei%DF.pdf
  5. Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich (IV) : Relevance Ranking nach "Popularität" von Webseiten: Google (2001) 0.01
    0.011777069 = product of:
      0.03533121 = sum of:
        0.010709076 = weight(_text_:in in 5771) [ClassicSimilarity], result of:
          0.010709076 = score(doc=5771,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 5771, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=5771)
        0.024622133 = weight(_text_:und in 5771) [ClassicSimilarity], result of:
          0.024622133 = score(doc=5771,freq=6.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.2544829 = fieldWeight in 5771, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.046875 = fieldNorm(doc=5771)
      0.33333334 = coord(2/6)
    
    Abstract
    In unserem Retrievaltest von Suchwerkzeugen im World Wide Web (Password 11/2000) schnitt die Suchmaschine Google am besten ab. Im Vergleich zu anderen Search Engines setzt Google kaum auf Informationslinguistik, sondern auf Algorithmen, die sich aus den Besonderheiten der Web-Dokumente ableiten lassen. Kernstück der informationsstatistischen Technik ist das "PageRank"- Verfahren (benannt nach dem Entwickler Larry Page), das aus der Hypertextstruktur des Web die "Popularität" von Seiten anhand ihrer ein- und ausgehenden Links berechnet. Google besticht durch das Angebot intuitiv verstehbarer Suchbildschirme sowie durch einige sehr nützliche "Kleinigkeiten" wie die Angabe des Rangs einer Seite, Highlighting, Suchen in der Seite, Suchen innerhalb eines Suchergebnisses usw., alles verstaut in einer eigenen Befehlsleiste innerhalb des Browsers. Ähnlich wie RealNames bietet Google mit dem Produkt "AdWords" den Aufkauf von Suchtermen an. Nach einer Reihe von nunmehr vier Password-Artikeln über InternetSuchwerkzeugen im Vergleich wollen wir abschließend zu einer Bewertung kommen. Wie ist der Stand der Technik bei Directories und Search Engines aus informationswissenschaftlicher Sicht einzuschätzen? Werden die "typischen" Internetnutzer, die ja in der Regel keine Information Professionals sind, adäquat bedient? Und können auch Informationsfachleute von den Suchwerkzeugen profitieren?
  6. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.01
    0.009484224 = product of:
      0.028452672 = sum of:
        0.010709076 = weight(_text_:in in 2717) [ClassicSimilarity], result of:
          0.010709076 = score(doc=2717,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 2717, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.017743597 = product of:
          0.035487194 = sum of:
            0.035487194 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
              0.035487194 = score(doc=2717,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.23214069 = fieldWeight in 2717, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2717)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Previous work an search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking are clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
    Series
    Advances in knowledge organization; vol.8
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  7. Lanvent, A.: Licht im Daten Chaos (2004) 0.01
    0.009443581 = product of:
      0.028330743 = sum of:
        0.0071393843 = weight(_text_:in in 2806) [ClassicSimilarity], result of:
          0.0071393843 = score(doc=2806,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.120230645 = fieldWeight in 2806, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=2806)
        0.021191359 = weight(_text_:und in 2806) [ClassicSimilarity], result of:
          0.021191359 = score(doc=2806,freq=10.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.219024 = fieldWeight in 2806, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.03125 = fieldNorm(doc=2806)
      0.33333334 = coord(2/6)
    
    Content
    "Bitte suchen Sie alle Unterlagen, die im PC zum Ibelshäuser-Vertrag in Sprockhövel gespeichert sind. Finden Sie alles, was wir haben - Dokumente, Tabellen, Präsentationen, Scans, E-Mails. Und erledigen Sie das gleich! « Wer diese Aufgabe an das Windows-eigene Suchmodul vergibt, wird zwangsläufig enttäuscht. Denn das Betriebssystem beherrscht weder die formatübergreifende Recherche noch die Kontextsuche, die für solche komplexen Aufträge nötig sind. Professionelle Desktop-Suchmaschinen erledigen Aufgaben dieser Art jedoch im Handumdrehen - genauer gesagt in einer einzigen Sekunde. Spitzenprogramme wie Global Brain benötigen dafür nicht einmal umfangreiche Abfrageformulare. Es genügt, einen Satz im Eingabefeld zu formulieren, der das Thema der gewünschten Dokumente eingrenzt. Dabei suchen die Programme über alle Laufwerke, die sich auf dem System einbinden lassen - also auch im Netzwerk-Ordner (Shared Folder), sofern dieser freigegeben wurde. Allen Testkandidaten - mit Ausnahme von Search 32 - gemeinsam ist, dass sie weitaus bessere Rechercheergebnisse abliefern als Windows, deutlich schneller arbeiten und meist auch in den Online-Postfächern stöbern. Wer schon öfter vergeblich über die Windows-Suche nach wichtigen Dokumenten gefahndet hat, kommt angesichts der Qualität der Search-Engines kaum mehr um die Anschaffung eines Desktop-Suchtools herum. Aber Microsoft will nachbessern. Für den Windows-XP-Nachfolger Longhorn wirbt der Hersteller vor allem mit dem Hinweis auf das neue Dateisystem WinFS, das sämtliche Files auf der Festplatte über Meta-Tags indiziert und dem Anwender damit lange Suchläufe erspart. So sollen sich anders als bei Windows XP alle Dateien zu bestimmten Themen in wenigen Sekunden auflisten lassen - unabhängig vom Format und vom physikalischen Speicherort der Files. Für die Recherche selbst ist dann weder der Dateiname noch das Erstelldatum ausschlaggebend. Anhand der kontextsensitiven Suche von WinFS kann der Anwender einfach einen Suchbefehl wie »Vertragsabschluss mit Firma XYZ, Neunkirchen/Saar« eingeben, der dann ohne Umwege zum Ziel führt."
    Footnote
    Darin auch 2 Teilbeiträge: (1) Know-how - Suchverfahren; (2) Praxis - Windows-Suche und Indexdienst
  8. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 0.01
    0.00922545 = product of:
      0.02767635 = sum of:
        0.0075724614 = weight(_text_:in in 5777) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=5777,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 5777, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=5777)
        0.020103889 = weight(_text_:und in 5777) [ClassicSimilarity], result of:
          0.020103889 = score(doc=5777,freq=4.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.20778441 = fieldWeight in 5777, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.046875 = fieldNorm(doc=5777)
      0.33333334 = coord(2/6)
    
    Abstract
    This book discusses many of the key design issues for building search engines and emphazises the important role that applied mathematics can play in improving information retrieval. The authors discuss not only important data structures, algorithms, and software but also user-centered issues such as interfaces, manual indexing, and document preparation. They also present some of the current problems in information retrieval that many not be familiar to applied mathematicians and computer scientists and some of the driving computational methods (SVD, SDD) for automated conceptual indexing
    Classification
    ST 230 [Informatik # Monographien # Software und -entwicklung # Software allgemein, (Einführung, Lehrbücher, Methoden der Programmierung) Software engineering, Programmentwicklungssysteme, Softwarewerkzeuge]
    RVK
    ST 230 [Informatik # Monographien # Software und -entwicklung # Software allgemein, (Einführung, Lehrbücher, Methoden der Programmierung) Software engineering, Programmentwicklungssysteme, Softwarewerkzeuge]
  9. Chakrabarti, S.; Dom, B.; Kumar, S.R.; Raghavan, P.; Rajagopalan, S.; Tomkins, A.; Kleinberg, J.M.; Gibson, D.: Neue Pfade durch den Internet-Dschungel : Die zweite Generation von Web-Suchmaschinen (1999) 0.01
    0.008697838 = product of:
      0.02609351 = sum of:
        0.0071393843 = weight(_text_:in in 3) [ClassicSimilarity], result of:
          0.0071393843 = score(doc=3,freq=2.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.120230645 = fieldWeight in 3, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=3)
        0.018954126 = weight(_text_:und in 3) [ClassicSimilarity], result of:
          0.018954126 = score(doc=3,freq=2.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.19590102 = fieldWeight in 3, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=3)
      0.33333334 = coord(2/6)
    
    Abstract
    Die im WWW verfügbare Datenmenge wächst mit atemberaubender Geschwindigkeit; entsprechend schwieriger wird es, relevante Informationen zu finden. ein neues Analyseverfahren stellt nahezu automatische Abhilfe in Aussicht
    Content
    Ausnutzen der Hyperlinks für verbesserte Such- und Findeverfahren; Darstellung des HITS-Algorithmus
  10. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.01
    0.006900288 = product of:
      0.04140173 = sum of:
        0.04140173 = product of:
          0.08280346 = sum of:
            0.08280346 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.08280346 = score(doc=3445,freq=2.0), product of:
                0.15286934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043654136 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    25. 8.2005 17:42:22
  11. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.00
    0.0021859813 = product of:
      0.013115887 = sum of:
        0.013115887 = weight(_text_:in in 4457) [ClassicSimilarity], result of:
          0.013115887 = score(doc=4457,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.22087781 = fieldWeight in 4457, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=4457)
      0.16666667 = coord(1/6)
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with a powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
  12. Ding, Y.; Yan, E.; Frazho, A.; Caverlee, J.: PageRank for ranking authors in co-citation networks (2009) 0.00
    0.0021859813 = product of:
      0.013115887 = sum of:
        0.013115887 = weight(_text_:in in 3161) [ClassicSimilarity], result of:
          0.013115887 = score(doc=3161,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.22087781 = fieldWeight in 3161, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3161)
      0.16666667 = coord(1/6)
    
    Abstract
    This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank with different damping factors and also with different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with other measures. The key factors that have impact on the PageRank of authors in the author co-citation network are being co-cited with important authors.
  13. Bauckhage, C.: Marginalizing over the PageRank damping factor (2014) 0.00
    0.0021034614 = product of:
      0.012620768 = sum of:
        0.012620768 = weight(_text_:in in 928) [ClassicSimilarity], result of:
          0.012620768 = score(doc=928,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.21253976 = fieldWeight in 928, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=928)
      0.16666667 = coord(1/6)
    
    Abstract
    In this note, we show how to marginalize over the damping parameter of the PageRank equation so as to obtain a parameter-free version known as TotalRank. Our discussion is meant as a reference and intended to provide a guided tour towards an interesting result that has applications in information retrieval and classification.
  14. Agosti, M.; Pretto, L.: ¬A theoretical study of a generalized version of kleinberg's HITS algorithm (2005) 0.00
    0.0021034614 = product of:
      0.012620768 = sum of:
        0.012620768 = weight(_text_:in in 4) [ClassicSimilarity], result of:
          0.012620768 = score(doc=4,freq=16.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.21253976 = fieldWeight in 4, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4)
      0.16666667 = coord(1/6)
    
    Abstract
    Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results.
    Source
    Advances in mathematical/formal methods in information retrieval. 8(2005) no.2 , S.219-243
  15. Finding anything in the billion page Web : are algorithms the key? (1999) 0.00
    0.0020823204 = product of:
      0.012493922 = sum of:
        0.012493922 = weight(_text_:in in 6248) [ClassicSimilarity], result of:
          0.012493922 = score(doc=6248,freq=2.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.21040362 = fieldWeight in 6248, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.109375 = fieldNorm(doc=6248)
      0.16666667 = coord(1/6)
    
  16. Ding, Y.; Chowdhury, G.; Foo, S.: Organsising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.00
    0.0019955188 = product of:
      0.011973113 = sum of:
        0.011973113 = weight(_text_:in in 105) [ClassicSimilarity], result of:
          0.011973113 = score(doc=105,freq=10.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.20163295 = fieldWeight in 105, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=105)
      0.16666667 = coord(1/6)
    
    Abstract
    The rapid development of the Internet and World Wide Web has caused some critical problem for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists as traditional information retrieval tools have been criticised for their efficiency to tackle these newly emerging problems. This paper proposes an information retrieval tool generated by cocitation analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that can provide information about the keywords in a domain that might be useful for information searching and query expansion
    Series
    Advances in knowledge organization; vol.7
    Source
    Dynamism and stability in knowledge organization: Proceedings of the 6th International ISKO-Conference, 10-13 July 2000, Toronto, Canada. Ed.: C. Beghtol et al
  17. Wills, R.S.: Google's PageRank : the math behind the search engine (2006) 0.00
    0.0019732218 = product of:
      0.01183933 = sum of:
        0.01183933 = weight(_text_:in in 5954) [ClassicSimilarity], result of:
          0.01183933 = score(doc=5954,freq=22.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19937998 = fieldWeight in 5954, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
      0.16666667 = coord(1/6)
    
    Abstract
    Approximately 91 million American adults use the Internet on a typical day The number-one Internet activity is reading and writing e-mail. Search engine use is next in line and continues to increase in popularity. In fact, survey findings indicate that nearly 60 million American adults use search engines on a given day. Even though there are many Internet search engines, Google, Yahoo!, and MSN receive over 81% of all search requests. Despite claims that the quality of search provided by Yahoo! and MSN now equals that of Google, Google continues to thrive as the search engine of choice, receiving over 46% of all search requests, nearly double the volume of Yahoo! and over four times that of MSN. I use Google's search engine on a daily basis and rarely request information from other search engines. One day, I decided to visit the homepages of Google. Yahoo!, and MSN to compare the quality of search results. Coffee was on my mind that day, so I entered the simple query "coffee" in the search box at each homepage. Table 1 shows the top ten (unsponsored) results returned by each search engine. Although ordered differently, two webpages, www.peets.com and www.coffeegeek.com, appear in all three top ten lists. In addition, each pairing of top ten lists has two additional results in common. Depending on the information I hoped to obtain about coffee by using the search engines, I could argue that any one of the three returned better results: however, I was not looking for a particular webpage, so all three listings of search results seemed of equal quality. Thus, I plan to continue using Google. My decision is indicative of the problem Yahoo!, MSN, and other search engine companies face in the quest to obtain a larger percentage of Internet search volume. Search engine users are loyal to one or a few search engines and are generally happy with search results. Thus, as long as Google continues to provide results deemed high in quality, Google likely will remain the top search engine. But what set Google apart from its competitors in the first place? The answer is PageRank. In this article I explain this simple mathematical algorithm that revolutionized Web search.
  18. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.00
    0.0019478323 = product of:
      0.011686994 = sum of:
        0.011686994 = weight(_text_:in in 93) [ClassicSimilarity], result of:
          0.011686994 = score(doc=93,freq=28.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19681457 = fieldWeight in 93, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
      0.16666667 = coord(1/6)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
  19. Bar-Ilan, J.; Levene, M.; Mat-Hassan, M.: Methods for evaluating dynamic changes in search engine rankings : a case study (2006) 0.00
    0.0018813931 = product of:
      0.011288359 = sum of:
        0.011288359 = weight(_text_:in in 616) [ClassicSimilarity], result of:
          0.011288359 = score(doc=616,freq=20.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.19010136 = fieldWeight in 616, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
      0.16666667 = coord(1/6)
    
    Abstract
    Purpose - The objective of this paper is to characterize the changes in the rankings of the top ten results of major search engines over time and to compare the rankings between these engines. Design/methodology/approach - The papers compare rankings of the top-ten results of the search engines Google and AlltheWeb on ten identical queries over a period of three weeks. Only the top-ten results were considered, since users do not normally inspect more than the first results page returned by a search engine. The experiment was repeated twice, in October 2003 and in January 2004, in order to assess changes to the top-ten results of some of the queries during the three months interval. In order to assess the changes in the rankings, three measures were computed for each data collection point and each search engine. Findings - The findings in this paper show that the rankings of AlltheWeb were highly stable over each period, while the rankings of Google underwent constant yet minor changes, with occasional major ones. Changes over time can be explained by the dynamic nature of the web or by fluctuations in the search engines' indexes. The top-ten results of the two search engines had surprisingly low overlap. With such small overlap, the task of comparing the rankings of the two engines becomes extremely challenging. Originality/value - The paper shows that because of the abundance of information on the web, ranking search results is of extreme importance. The paper compares several measures for computing the similarity between rankings of search tools, and shows that none of the measures is fully satisfactory as a standalone measure. It also demonstrates the apparent differences in the ranking algorithms of two widely used search engines.
  20. Meghabghab, G.: Google's Web page ranking applied to different topological Web graph structures (2001) 0.00
    0.001821651 = product of:
      0.010929906 = sum of:
        0.010929906 = weight(_text_:in in 6028) [ClassicSimilarity], result of:
          0.010929906 = score(doc=6028,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18406484 = fieldWeight in 6028, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6028)
      0.16666667 = coord(1/6)
    
    Abstract
    This research is part of the ongoing study to better understand web page ranking on the web. It looks at a web page as a graph structure or a web graph, and tries to classify different web graphs in the new coordinate space: (out-degree, in-degree). The out-degree coordinate od is defined as the number of outgoing web pages from a given web page. The in-degree id coordinate is the number of web pages that point to a given web page. In this new coordinate space a metric is built to classify how close or far different web graphs are. Google's web ranking algorithm (Brin & Page, 1998) on ranking web pages is applied in this new coordinate space. The results of the algorithm has been modified to fit different topological web graph structures. Also the algorithm was not successful in the case of general web graphs and new ranking web algorithms have to be considered. This study does not look at enhancing web ranking by adding any contextual information. It only considers web links as a source to web page ranking. The author believes that understanding the underlying web page as a graph will help design better ranking web algorithms, enhance retrieval and web performance, and recommends using graphs as a part of visual aid for browsing engine designers

Languages

  • e 24
  • d 10

Types

  • a 28
  • m 3
  • el 2
  • r 1
  • x 1
  • More… Less…