Document (#29335)

Author
Westermeyer, D.
Title
Adaptive Techniken zur Informationsgewinnung : der Webcrawler InfoSpiders
Imprint
Münster : Institut für Wirtschaftsinformatik der Westfälische Wilhelms-Universität Münster
Year
2005
Pages
22 pp.
Abstract
Searching for information on the Internet usually takes users straight to a search engine. Some of the returned results, however, sometimes do not contain what the user was looking for. This is where so-called adaptive agents come in: they try to learn their user's habits so that they can later make decisions on that basis independently, without having to consult the user. The foundations section first discusses adaptive techniques for information retrieval and the basic properties of web crawlers. The main part then explains the web crawler InfoSpiders. This program works with several adaptive agents that search the Internet for information in parallel, starting from a set of seed links. The agents draw on a variety of techniques, including statistical methods that analyze the content of web pages and neural networks that evaluate that content. Another technique is implemented by the genetic algorithm, which lets the agents produce offspring with new mutations. A concrete implementation of the InfoSpiders algorithm is then illustrated using MySpiders. Next, the InfoSpiders algorithm and MySpiders are evaluated with respect to their added benefit over conventional search engines. A summary with an outlook on further developments in adaptive agents for Internet search concludes the paper.
Content
Paper written for the seminar Suchmaschinen und Suchalgorithmen (Search Engines and Search Algorithms), Institut für Wirtschaftsinformatik, Praktische Informatik in der Wirtschaft, Westfälische Wilhelms-Universität Münster. - Cf.: http://www-wi.uni-muenster.de/pi/lehre/ss05/seminarSuchen/Ausarbeitungen/DenisWestermeyer.pdf
Theme
Search engines
Object
InfoSpider

Similar documents (content)

  1. Lahme, N.: Information Retrieval im Wissensmanagement : ein am Vorwissen orientierter Ansatz zur Komposition von Informationsressourcen (2004) 0.18
    0.18022166 = sum of:
      0.18022166 = product of:
        0.5631927 = sum of:
          0.029565984 = weight(abstract_txt:nach in 943) [ClassicSimilarity], result of:
            0.029565984 = score(doc=943,freq=2.0), product of:
              0.061169438 = queryWeight, product of:
                1.0106896 = boost
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.013834513 = queryNorm
              0.4833457 = fieldWeight in 943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
          0.030757815 = weight(abstract_txt:informationen in 943) [ClassicSimilarity], result of:
            0.030757815 = score(doc=943,freq=1.0), product of:
              0.07912613 = queryWeight, product of:
                1.1495041 = boost
                4.9756007 = idf(docFreq=793, maxDocs=42306)
                0.013834513 = queryNorm
              0.3887188 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9756007 = idf(docFreq=793, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
          0.028659333 = weight(abstract_txt:eine in 943) [ClassicSimilarity], result of:
            0.028659333 = score(doc=943,freq=3.0), product of:
              0.059912436 = queryWeight, product of:
                1.2250525 = boost
                3.5350735 = idf(docFreq=3352, maxDocs=42306)
                0.013834513 = queryNorm
              0.47835365 = fieldWeight in 943, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5350735 = idf(docFreq=3352, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
          0.044313356 = weight(abstract_txt:suche in 943) [ClassicSimilarity], result of:
            0.044313356 = score(doc=943,freq=1.0), product of:
              0.10093444 = queryWeight, product of:
                1.2982856 = boost
                5.619598 = idf(docFreq=416, maxDocs=42306)
                0.013834513 = queryNorm
              0.4390311 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.619598 = idf(docFreq=416, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
          0.061094366 = weight(abstract_txt:dessen in 943) [ClassicSimilarity], result of:
            0.061094366 = score(doc=943,freq=1.0), product of:
              0.12503082 = queryWeight, product of:
                1.4449708 = boost
                6.2545214 = idf(docFreq=220, maxDocs=42306)
                0.013834513 = queryNorm
              0.48863447 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2545214 = idf(docFreq=220, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
          0.053307343 = weight(abstract_txt:sowie in 943) [ClassicSimilarity], result of:
            0.053307343 = score(doc=943,freq=2.0), product of:
              0.1037277 = queryWeight, product of:
                1.6119202 = boost
                4.6514387 = idf(docFreq=1097, maxDocs=42306)
                0.013834513 = queryNorm
              0.5139162 = fieldWeight in 943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6514387 = idf(docFreq=1097, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
          0.034271024 = weight(abstract_txt:wird in 943) [ClassicSimilarity], result of:
            0.034271024 = score(doc=943,freq=1.0), product of:
              0.11541998 = queryWeight, product of:
                2.1951342 = boost
                3.800634 = idf(docFreq=2570, maxDocs=42306)
                0.013834513 = queryNorm
              0.29692453 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.800634 = idf(docFreq=2570, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
          0.28122348 = weight(abstract_txt:algorithmus in 943) [ClassicSimilarity], result of:
            0.28122348 = score(doc=943,freq=2.0), product of:
              0.31434345 = queryWeight, product of:
                2.8060694 = boost
                8.097336 = idf(docFreq=34, maxDocs=42306)
                0.013834513 = queryNorm
              0.89463764 = fieldWeight in 943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.097336 = idf(docFreq=34, maxDocs=42306)
                0.078125 = fieldNorm(doc=943)
        0.32 = coord(8/25)
    
  2. Voregger, M.: Angriff der Heinzelmännchen : Steuern in den Informationsfluten des Webs - mit Hilfe digitaler Agenten (1997) 0.18
    0.17515029 = sum of:
      0.17515029 = product of:
        2.1893787 = sum of:
          0.091149144 = weight(abstract_txt:internet in 380) [ClassicSimilarity], result of:
            0.091149144 = score(doc=380,freq=1.0), product of:
              0.06567311 = queryWeight, product of:
                1.2825963 = boost
                3.701125 = idf(docFreq=2839, maxDocs=42306)
                0.013834513 = queryNorm
              1.3879218 = fieldWeight in 380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.701125 = idf(docFreq=2839, maxDocs=42306)
                0.375 = fieldNorm(doc=380)
          2.0982296 = weight(abstract_txt:agenten in 380) [ClassicSimilarity], result of:
            2.0982296 = score(doc=380,freq=1.0), product of:
              0.63009226 = queryWeight, product of:
                5.1288815 = boost
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.013834513 = queryNorm
              3.3300357 = fieldWeight in 380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.375 = fieldNorm(doc=380)
        0.08 = coord(2/25)
    
  3. Weiner, M.: Die Agenten kommen (2002) 0.17
    0.1719276 = sum of:
      0.1719276 = product of:
        0.859638 = sum of:
          0.036210787 = weight(abstract_txt:nach in 735) [ClassicSimilarity], result of:
            0.036210787 = score(doc=735,freq=3.0), product of:
              0.061169438 = queryWeight, product of:
                1.0106896 = boost
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.013834513 = queryNorm
              0.59197515 = fieldWeight in 735, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.078125 = fieldNorm(doc=735)
          0.030757815 = weight(abstract_txt:informationen in 735) [ClassicSimilarity], result of:
            0.030757815 = score(doc=735,freq=1.0), product of:
              0.07912613 = queryWeight, product of:
                1.1495041 = boost
                4.9756007 = idf(docFreq=793, maxDocs=42306)
                0.013834513 = queryNorm
              0.3887188 = fieldWeight in 735, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9756007 = idf(docFreq=793, maxDocs=42306)
                0.078125 = fieldNorm(doc=735)
          0.016546473 = weight(abstract_txt:eine in 735) [ClassicSimilarity], result of:
            0.016546473 = score(doc=735,freq=1.0), product of:
              0.059912436 = queryWeight, product of:
                1.2250525 = boost
                3.5350735 = idf(docFreq=3352, maxDocs=42306)
                0.013834513 = queryNorm
              0.27617761 = fieldWeight in 735, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5350735 = idf(docFreq=3352, maxDocs=42306)
                0.078125 = fieldNorm(doc=735)
          0.018989407 = weight(abstract_txt:internet in 735) [ClassicSimilarity], result of:
            0.018989407 = score(doc=735,freq=1.0), product of:
              0.06567311 = queryWeight, product of:
                1.2825963 = boost
                3.701125 = idf(docFreq=2839, maxDocs=42306)
                0.013834513 = queryNorm
              0.2891504 = fieldWeight in 735, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.701125 = idf(docFreq=2839, maxDocs=42306)
                0.078125 = fieldNorm(doc=735)
          0.7571335 = weight(abstract_txt:agenten in 735) [ClassicSimilarity], result of:
            0.7571335 = score(doc=735,freq=3.0), product of:
              0.63009226 = queryWeight, product of:
                5.1288815 = boost
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.013834513 = queryNorm
              1.2016232 = fieldWeight in 735, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.078125 = fieldNorm(doc=735)
        0.2 = coord(5/25)
    
  4. Röscheisen, E.: Fin de such (2001) 0.17
    0.17171901 = sum of:
      0.17171901 = product of:
        1.0732439 = sum of:
          0.041812617 = weight(abstract_txt:nach in 497) [ClassicSimilarity], result of:
            0.041812617 = score(doc=497,freq=1.0), product of:
              0.061169438 = queryWeight, product of:
                1.0106896 = boost
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.013834513 = queryNorm
              0.68355405 = fieldWeight in 497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.15625 = fieldNorm(doc=497)
          0.08862671 = weight(abstract_txt:suche in 497) [ClassicSimilarity], result of:
            0.08862671 = score(doc=497,freq=1.0), product of:
              0.10093444 = queryWeight, product of:
                1.2982856 = boost
                5.619598 = idf(docFreq=416, maxDocs=42306)
                0.013834513 = queryNorm
              0.8780622 = fieldWeight in 497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.619598 = idf(docFreq=416, maxDocs=42306)
                0.15625 = fieldNorm(doc=497)
          0.06854205 = weight(abstract_txt:wird in 497) [ClassicSimilarity], result of:
            0.06854205 = score(doc=497,freq=1.0), product of:
              0.11541998 = queryWeight, product of:
                2.1951342 = boost
                3.800634 = idf(docFreq=2570, maxDocs=42306)
                0.013834513 = queryNorm
              0.59384906 = fieldWeight in 497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.800634 = idf(docFreq=2570, maxDocs=42306)
                0.15625 = fieldNorm(doc=497)
          0.87426245 = weight(abstract_txt:agenten in 497) [ClassicSimilarity], result of:
            0.87426245 = score(doc=497,freq=1.0), product of:
              0.63009226 = queryWeight, product of:
                5.1288815 = boost
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.013834513 = queryNorm
              1.387515 = fieldWeight in 497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.15625 = fieldNorm(doc=497)
        0.16 = coord(4/25)
    
  5. Göbel, R.: Semantic Web & Linked Data für professionelle Informationsangebote : Hoffnungsträger oder "alter Hut" - Eine Praxisbetrachtung für die Wirtschaftsinformationen (2010) 0.14
    0.13617113 = sum of:
      0.13617113 = product of:
        0.68085563 = sum of:
          0.025087569 = weight(abstract_txt:nach in 1259) [ClassicSimilarity], result of:
            0.025087569 = score(doc=1259,freq=1.0), product of:
              0.061169438 = queryWeight, product of:
                1.0106896 = boost
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.013834513 = queryNorm
              0.4101324 = fieldWeight in 1259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.374746 = idf(docFreq=1447, maxDocs=42306)
                0.09375 = fieldNorm(doc=1259)
          0.036909375 = weight(abstract_txt:informationen in 1259) [ClassicSimilarity], result of:
            0.036909375 = score(doc=1259,freq=1.0), product of:
              0.07912613 = queryWeight, product of:
                1.1495041 = boost
                4.9756007 = idf(docFreq=793, maxDocs=42306)
                0.013834513 = queryNorm
              0.46646255 = fieldWeight in 1259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9756007 = idf(docFreq=793, maxDocs=42306)
                0.09375 = fieldNorm(doc=1259)
          0.053176027 = weight(abstract_txt:suche in 1259) [ClassicSimilarity], result of:
            0.053176027 = score(doc=1259,freq=1.0), product of:
              0.10093444 = queryWeight, product of:
                1.2982856 = boost
                5.619598 = idf(docFreq=416, maxDocs=42306)
                0.013834513 = queryNorm
              0.5268373 = fieldWeight in 1259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.619598 = idf(docFreq=416, maxDocs=42306)
                0.09375 = fieldNorm(doc=1259)
          0.041125223 = weight(abstract_txt:wird in 1259) [ClassicSimilarity], result of:
            0.041125223 = score(doc=1259,freq=1.0), product of:
              0.11541998 = queryWeight, product of:
                2.1951342 = boost
                3.800634 = idf(docFreq=2570, maxDocs=42306)
                0.013834513 = queryNorm
              0.3563094 = fieldWeight in 1259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.800634 = idf(docFreq=2570, maxDocs=42306)
                0.09375 = fieldNorm(doc=1259)
          0.5245574 = weight(abstract_txt:agenten in 1259) [ClassicSimilarity], result of:
            0.5245574 = score(doc=1259,freq=1.0), product of:
              0.63009226 = queryWeight, product of:
                5.1288815 = boost
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.013834513 = queryNorm
              0.8325089 = fieldWeight in 1259, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.8800955 = idf(docFreq=15, maxDocs=42306)
                0.09375 = fieldNorm(doc=1259)
        0.2 = coord(5/25)
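
The score trees above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and variable names are illustrative and do not come from the catalog software. It recomputes the total for entry 2 (doc 380) from the boost, queryNorm and fieldNorm values listed in its tree.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 42306
QUERY_NORM = 0.013834513


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one query term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                      # tf(freq) in the tree
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Entry 2 (doc 380): two of the 25 query terms match.
internet = term_score(freq=1.0, doc_freq=2839, boost=1.2825963, field_norm=0.375)
agenten = term_score(freq=1.0, doc_freq=15, boost=5.1288815, field_norm=0.375)
coord = 2 / 25                                # coord(2/25) in the tree

total = (internet + agenten) * coord
print(f"internet={internet:.6f} agenten={agenten:.6f} total={total:.6f}")
```

The result should agree with the listed 0.17515029 to within rounding, since Lucene computes in single precision while Python uses double precision. The same function reproduces the other entries when given their freq, docFreq, boost and fieldNorm values.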