Document (#31993)

Author
Lehrke, C.
Title
Architektur von Suchmaschinen : Googles Architektur, insb. Crawler und Indizierer
Imprint
Münster : Institut für Wirtschaftsinformatik der Westfälischen Wilhelms-Universität Münster
Year
2005
Pages
22 pp.
Abstract
The Internet, with its constant influx of new users and its extreme growth, brings many new challenges. Because of this growth, most people rely on search engines to find content on the Internet. Search engines use information retrieval techniques to answer user queries. The problem is that traditional information retrieval (IR) systems were developed for relatively small, coherent document collections. The Internet, by contrast, grows continuously, changes rapidly, and is spread across geographically distributed computers. The old techniques therefore have to be extended, or entirely new IR techniques developed. One search engine that meets these challenges comparatively successfully is Google. The aim of this paper is to show how search engines work, with a focus on Google. Chapter 2 first covers the structure of search engines in general, to give a basic understanding of the individual components. The second part of that chapter builds on this with an overview of Google's architecture. Chapters 3 and 4 look more closely at the crawler and the indexer, two central components of a search engine.
Content
Seminar paper written for the seminar "Suchmaschinen und Suchalgorithmen" (Search Engines and Search Algorithms), Institut für Wirtschaftsinformatik, Praktische Informatik in der Wirtschaft, Westfälische Wilhelms-Universität Münster. - Cf.: http://www-wi.uni-muenster.de/pi/lehre/ss05/seminarSuchen/Ausarbeitungen/ChristophLehrke.pdf
Theme
Search engines (Suchmaschinen)
Object
Google

Similar documents (content)

  1. Wyss, V.; Keel, G.: Google als Trojanisches Pferd? : Konsequenzen der Internet-Recherche von Journalisten für die journalistische Qualität (2007) 0.23
    0.2299153 = sum of:
      0.2299153 = product of:
        0.8211261 = sum of:
          0.11440588 = weight(abstract_txt:kapitels in 385) [ClassicSimilarity], result of:
            0.11440588 = score(doc=385,freq=1.0), product of:
              0.16291264 = queryWeight, product of:
                1.0551723 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.01717623 = queryNorm
              0.7022529 = fieldWeight in 385, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.078125 = fieldNorm(doc=385)
          0.035892807 = weight(abstract_txt:neue in 385) [ClassicSimilarity], result of:
            0.035892807 = score(doc=385,freq=1.0), product of:
              0.09476999 = queryWeight, product of:
                1.138142 = boost
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.01717623 = queryNorm
              0.378736 = fieldWeight in 385, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.078125 = fieldNorm(doc=385)
          0.03460321 = weight(abstract_txt:internet in 385) [ClassicSimilarity], result of:
            0.03460321 = score(doc=385,freq=2.0), product of:
              0.084029265 = queryWeight, product of:
                1.3125684 = boost
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.01717623 = queryNorm
              0.4117995 = fieldWeight in 385, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.078125 = fieldNorm(doc=385)
          0.083315104 = weight(abstract_txt:suchmaschine in 385) [ClassicSimilarity], result of:
            0.083315104 = score(doc=385,freq=1.0), product of:
              0.16614287 = queryWeight, product of:
                1.5069605 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01717623 = queryNorm
              0.50146663 = fieldWeight in 385, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.078125 = fieldNorm(doc=385)
          0.09361219 = weight(abstract_txt:kapitel in 385) [ClassicSimilarity], result of:
            0.09361219 = score(doc=385,freq=1.0), product of:
              0.17956464 = queryWeight, product of:
                1.566648 = boost
                6.6730065 = idf(docFreq=151, maxDocs=44218)
                0.01717623 = queryNorm
              0.5213286 = fieldWeight in 385, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6730065 = idf(docFreq=151, maxDocs=44218)
                0.078125 = fieldNorm(doc=385)
          0.028394913 = weight(abstract_txt:sich in 385) [ClassicSimilarity], result of:
            0.028394913 = score(doc=385,freq=1.0), product of:
              0.102133825 = queryWeight, product of:
                1.6709399 = boost
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.01717623 = queryNorm
              0.27801675 = fieldWeight in 385, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.078125 = fieldNorm(doc=385)
          0.43090203 = weight(abstract_txt:suchmaschinen in 385) [ClassicSimilarity], result of:
            0.43090203 = score(doc=385,freq=5.0), product of:
              0.41908488 = queryWeight, product of:
                4.145459 = boost
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.01717623 = queryNorm
              1.0281975 = fieldWeight in 385, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.078125 = fieldNorm(doc=385)
        0.28 = coord(7/25)
    
  2. Sadrozinski, J.: Suchmaschinen und öffentlich-rechtlicher Onlinejournalismus am Beispiel tagesschau.de (2007) 0.19
    0.19327042 = sum of:
      0.19327042 = product of:
        0.69025147 = sum of:
          0.021115432 = weight(abstract_txt:dieser in 375) [ClassicSimilarity], result of:
            0.021115432 = score(doc=375,freq=1.0), product of:
              0.07720982 = queryWeight, product of:
                1.0273 = boost
                4.3756986 = idf(docFreq=1511, maxDocs=44218)
                0.01717623 = queryNorm
              0.27348116 = fieldWeight in 375, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3756986 = idf(docFreq=1511, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.0915247 = weight(abstract_txt:kapitels in 375) [ClassicSimilarity], result of:
            0.0915247 = score(doc=375,freq=1.0), product of:
              0.16291264 = queryWeight, product of:
                1.0551723 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.01717623 = queryNorm
              0.5618023 = fieldWeight in 375, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.01957453 = weight(abstract_txt:internet in 375) [ClassicSimilarity], result of:
            0.01957453 = score(doc=375,freq=1.0), product of:
              0.084029265 = queryWeight, product of:
                1.3125684 = boost
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.01717623 = queryNorm
              0.23294897 = fieldWeight in 375, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.06665208 = weight(abstract_txt:suchmaschine in 375) [ClassicSimilarity], result of:
            0.06665208 = score(doc=375,freq=1.0), product of:
              0.16614287 = queryWeight, product of:
                1.5069605 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01717623 = queryNorm
              0.4011733 = fieldWeight in 375, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.03934515 = weight(abstract_txt:sich in 375) [ClassicSimilarity], result of:
            0.03934515 = score(doc=375,freq=3.0), product of:
              0.102133825 = queryWeight, product of:
                1.6709399 = boost
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.01717623 = queryNorm
              0.38523132 = fieldWeight in 375, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.18501936 = weight(abstract_txt:google in 375) [ClassicSimilarity], result of:
            0.18501936 = score(doc=375,freq=10.0), product of:
              0.17436036 = queryWeight, product of:
                1.8907344 = boost
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.01717623 = queryNorm
              1.061132 = fieldWeight in 375, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.26702023 = weight(abstract_txt:suchmaschinen in 375) [ClassicSimilarity], result of:
            0.26702023 = score(doc=375,freq=3.0), product of:
              0.41908488 = queryWeight, product of:
                4.145459 = boost
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.01717623 = queryNorm
              0.6371507 = fieldWeight in 375, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
        0.28 = coord(7/25)
    
  3. tz: Mein Freund Google und ich (2006) 0.18
    0.17710593 = sum of:
      0.17710593 = product of:
        0.6325212 = sum of:
          0.021115432 = weight(abstract_txt:dieser in 2144) [ClassicSimilarity], result of:
            0.021115432 = score(doc=2144,freq=1.0), product of:
              0.07720982 = queryWeight, product of:
                1.0273 = boost
                4.3756986 = idf(docFreq=1511, maxDocs=44218)
                0.01717623 = queryNorm
              0.27348116 = fieldWeight in 2144, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3756986 = idf(docFreq=1511, maxDocs=44218)
                0.0625 = fieldNorm(doc=2144)
          0.042607382 = weight(abstract_txt:entwickelt in 2144) [ClassicSimilarity], result of:
            0.042607382 = score(doc=2144,freq=1.0), product of:
              0.12329036 = queryWeight, product of:
                1.2981521 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.01717623 = queryNorm
              0.34558567 = fieldWeight in 2144, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.0625 = fieldNorm(doc=2144)
          0.027682565 = weight(abstract_txt:internet in 2144) [ClassicSimilarity], result of:
            0.027682565 = score(doc=2144,freq=2.0), product of:
              0.084029265 = queryWeight, product of:
                1.3125684 = boost
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.01717623 = queryNorm
              0.32943958 = fieldWeight in 2144, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.0625 = fieldNorm(doc=2144)
          0.115444794 = weight(abstract_txt:suchmaschine in 2144) [ClassicSimilarity], result of:
            0.115444794 = score(doc=2144,freq=3.0), product of:
              0.16614287 = queryWeight, product of:
                1.5069605 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01717623 = queryNorm
              0.69485253 = fieldWeight in 2144, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.0625 = fieldNorm(doc=2144)
          0.03212518 = weight(abstract_txt:sich in 2144) [ClassicSimilarity], result of:
            0.03212518 = score(doc=2144,freq=2.0), product of:
              0.102133825 = queryWeight, product of:
                1.6709399 = boost
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.01717623 = queryNorm
              0.31454006 = fieldWeight in 2144, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.0625 = fieldNorm(doc=2144)
          0.17552479 = weight(abstract_txt:google in 2144) [ClassicSimilarity], result of:
            0.17552479 = score(doc=2144,freq=9.0), product of:
              0.17436036 = queryWeight, product of:
                1.8907344 = boost
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.01717623 = queryNorm
              1.0066782 = fieldWeight in 2144, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.0625 = fieldNorm(doc=2144)
          0.2180211 = weight(abstract_txt:suchmaschinen in 2144) [ClassicSimilarity], result of:
            0.2180211 = score(doc=2144,freq=2.0), product of:
              0.41908488 = queryWeight, product of:
                4.145459 = boost
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.01717623 = queryNorm
              0.52023137 = fieldWeight in 2144, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.0625 = fieldNorm(doc=2144)
        0.28 = coord(7/25)
    
  4. Sietmann, R.: Suchmaschine für das akademische Internet (2004) 0.17
    0.16905631 = sum of:
      0.16905631 = product of:
        0.528301 = sum of:
          0.018476004 = weight(abstract_txt:dieser in 5742) [ClassicSimilarity], result of:
            0.018476004 = score(doc=5742,freq=1.0), product of:
              0.07720982 = queryWeight, product of:
                1.0273 = boost
                4.3756986 = idf(docFreq=1511, maxDocs=44218)
                0.01717623 = queryNorm
              0.23929602 = fieldWeight in 5742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3756986 = idf(docFreq=1511, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
          0.03728146 = weight(abstract_txt:entwickelt in 5742) [ClassicSimilarity], result of:
            0.03728146 = score(doc=5742,freq=1.0), product of:
              0.12329036 = queryWeight, product of:
                1.2981521 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.01717623 = queryNorm
              0.30238748 = fieldWeight in 5742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
          0.017127715 = weight(abstract_txt:internet in 5742) [ClassicSimilarity], result of:
            0.017127715 = score(doc=5742,freq=1.0), product of:
              0.084029265 = queryWeight, product of:
                1.3125684 = boost
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.01717623 = queryNorm
              0.20383035 = fieldWeight in 5742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
          0.05832057 = weight(abstract_txt:suchmaschine in 5742) [ClassicSimilarity], result of:
            0.05832057 = score(doc=5742,freq=1.0), product of:
              0.16614287 = queryWeight, product of:
                1.5069605 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01717623 = queryNorm
              0.35102662 = fieldWeight in 5742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
          0.04444507 = weight(abstract_txt:sich in 5742) [ClassicSimilarity], result of:
            0.04444507 = score(doc=5742,freq=5.0), product of:
              0.102133825 = queryWeight, product of:
                1.6709399 = boost
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.01717623 = queryNorm
              0.43516505 = fieldWeight in 5742, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
          0.07240027 = weight(abstract_txt:google in 5742) [ClassicSimilarity], result of:
            0.07240027 = score(doc=5742,freq=2.0), product of:
              0.17436036 = queryWeight, product of:
                1.8907344 = boost
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.01717623 = queryNorm
              0.41523355 = fieldWeight in 5742, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
          0.14535624 = weight(abstract_txt:architektur in 5742) [ClassicSimilarity], result of:
            0.14535624 = score(doc=5742,freq=1.0), product of:
              0.34961233 = queryWeight, product of:
                2.6773183 = boost
                7.602543 = idf(docFreq=59, maxDocs=44218)
                0.01717623 = queryNorm
              0.41576406 = fieldWeight in 5742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.602543 = idf(docFreq=59, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
          0.13489367 = weight(abstract_txt:suchmaschinen in 5742) [ClassicSimilarity], result of:
            0.13489367 = score(doc=5742,freq=1.0), product of:
              0.41908488 = queryWeight, product of:
                4.145459 = boost
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.01717623 = queryNorm
              0.32187673 = fieldWeight in 5742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5742)
        0.32 = coord(8/25)
    
  5. Wolf, S.: Konkurrenz bei der wissenschaftlichen Recherche (2005) 0.16
    0.15729254 = sum of:
      0.15729254 = product of:
        0.7864627 = sum of:
          0.041523848 = weight(abstract_txt:internet in 3256) [ClassicSimilarity], result of:
            0.041523848 = score(doc=3256,freq=2.0), product of:
              0.084029265 = queryWeight, product of:
                1.3125684 = boost
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.01717623 = queryNorm
              0.49415937 = fieldWeight in 3256, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.09375 = fieldNorm(doc=3256)
          0.09997812 = weight(abstract_txt:suchmaschine in 3256) [ClassicSimilarity], result of:
            0.09997812 = score(doc=3256,freq=1.0), product of:
              0.16614287 = queryWeight, product of:
                1.5069605 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01717623 = queryNorm
              0.6017599 = fieldWeight in 3256, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.09375 = fieldNorm(doc=3256)
          0.04818777 = weight(abstract_txt:sich in 3256) [ClassicSimilarity], result of:
            0.04818777 = score(doc=3256,freq=2.0), product of:
              0.102133825 = queryWeight, product of:
                1.6709399 = boost
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.01717623 = queryNorm
              0.4718101 = fieldWeight in 3256, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5586145 = idf(docFreq=3422, maxDocs=44218)
                0.09375 = fieldNorm(doc=3256)
          0.19624266 = weight(abstract_txt:google in 3256) [ClassicSimilarity], result of:
            0.19624266 = score(doc=3256,freq=5.0), product of:
              0.17436036 = queryWeight, product of:
                1.8907344 = boost
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.01717623 = queryNorm
              1.1255004 = fieldWeight in 3256, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.09375 = fieldNorm(doc=3256)
          0.40053034 = weight(abstract_txt:suchmaschinen in 3256) [ClassicSimilarity], result of:
            0.40053034 = score(doc=3256,freq=3.0), product of:
              0.41908488 = queryWeight, product of:
                4.145459 = boost
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.01717623 = queryNorm
              0.955726 = fieldWeight in 3256, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.885746 = idf(docFreq=333, maxDocs=44218)
                0.09375 = fieldNorm(doc=3256)
        0.2 = coord(5/25)
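
The score explanations above all follow the same arithmetic, which can be retraced with a short Python sketch. It is an illustration, not the database's actual code: the function names are hypothetical, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption inferred from the printed values (it reproduces them, but Lucene versions differ in the exact form). The input numbers are copied from entry 1 (doc=385).

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed form, inferred from the listing: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # tf is the square root of the raw term frequency, as printed in the listing.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * term_idf * query_norm
    # fieldWeight = tf * idf * fieldNorm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    # The sum of the per-term scores is scaled by coord = matched / total query terms.
    return sum(term_scores) * (matched_terms / query_terms)


# Term "kapitels" in doc 385: freq=1, docFreq=14, maxDocs=44218.
kapitels = term_score(
    freq=1.0, doc_freq=14, max_docs=44218,
    boost=1.0551723, query_norm=0.01717623, field_norm=0.078125,
)

# The seven per-term scores listed for doc 385, combined with coord(7/25).
doc_385_terms = [
    0.11440588, 0.035892807, 0.03460321, 0.083315104,
    0.09361219, 0.028394913, 0.43090203,
]
doc_385 = document_score(doc_385_terms, matched_terms=7, query_terms=25)

print(round(kapitels, 4))  # 0.1144, agreeing with the listed 0.11440588
print(round(doc_385, 4))   # 0.2299, agreeing with the listed 0.2299153
```

Small deviations in the trailing digits are expected, since the listing was presumably computed in single precision while Python uses double precision.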