Document (#29334)

Author
Westermeyer, D.
Title
Adaptive Techniken zur Informationsgewinnung : der Webcrawler InfoSpiders
Imprint
Münster : Institut für Wirtschaftsinformatik der Westfälischen Wilhelms-Universität Münster
Year
2005
Pages
22 pp.
Abstract
A search for information on the Internet usually takes the user straight to a search engine. Some of the results returned, however, do not contain what the user was looking for. This is where so-called adaptive agents come in: they try to learn their user's habits so that they can later make decisions on that basis independently, without having to consult the user. The foundations section first discusses adaptive techniques for information gathering and the basic properties of web crawlers. The main part then explains the web crawler InfoSpiders. This program works with several adaptive agents that search the Internet for information in parallel, starting from a set of seed links. The agents draw on a wide range of techniques, including statistical methods that analyze the content of web pages and neural networks that are used to evaluate that content. Another technique is implemented by the genetic algorithm, with which the agents can produce offspring with new mutations. A concrete implementation of the InfoSpiders algorithm is then illustrated using MySpiders. Following this, the InfoSpiders algorithm and MySpiders are evaluated with regard to their added benefit over conventional search engines. A summary with an outlook on further developments in the field of adaptive agents for Internet search concludes the paper.
Content
Paper written for the seminar Suchmaschinen und Suchalgorithmen (Search Engines and Search Algorithms), Institut für Wirtschaftsinformatik, Praktische Informatik in der Wirtschaft, Westfälische Wilhelms-Universität Münster. - Cf.: http://www-wi.uni-muenster.de/pi/lehre/ss05/seminarSuchen/Ausarbeitungen/DenisWestermeyer.pdf
Theme
Search engines
Object
InfoSpider

Similar documents (content)

  1. Lahme, N.: Information Retrieval im Wissensmanagement : ein am Vorwissen orientierter Ansatz zur Komposition von Informationsressourcen (2004) 0.18
    0.17683193 = sum of:
      0.17683193 = product of:
        0.5525998 = sum of:
          0.02910441 = weight(abstract_txt:nach in 4942) [ClassicSimilarity], result of:
            0.02910441 = score(doc=4942,freq=2.0), product of:
              0.060553793 = queryWeight, product of:
                1.0055591 = boost
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.013842717 = queryNorm
              0.48063728 = fieldWeight in 4942, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
          0.03081217 = weight(abstract_txt:informationen in 4942) [ClassicSimilarity], result of:
            0.03081217 = score(doc=4942,freq=1.0), product of:
              0.07924898 = queryWeight, product of:
                1.15036 = boost
                4.976667 = idf(docFreq=828, maxDocs=44218)
                0.013842717 = queryNorm
              0.3888021 = fieldWeight in 4942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.976667 = idf(docFreq=828, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
          0.027589306 = weight(abstract_txt:eine in 4942) [ClassicSimilarity], result of:
            0.027589306 = score(doc=4942,freq=3.0), product of:
              0.058433603 = queryWeight, product of:
                1.2098008 = boost
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.013842717 = queryNorm
              0.47214794 = fieldWeight in 4942, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
          0.044138882 = weight(abstract_txt:suche in 4942) [ClassicSimilarity], result of:
            0.044138882 = score(doc=4942,freq=1.0), product of:
              0.10070701 = queryWeight, product of:
                1.2967814 = boost
                5.6101127 = idf(docFreq=439, maxDocs=44218)
                0.013842717 = queryNorm
              0.43829006 = fieldWeight in 4942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6101127 = idf(docFreq=439, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
          0.06016812 = weight(abstract_txt:dessen in 4942) [ClassicSimilarity], result of:
            0.06016812 = score(doc=4942,freq=1.0), product of:
              0.1238102 = queryWeight, product of:
                1.4378551 = boost
                6.2204237 = idf(docFreq=238, maxDocs=44218)
                0.013842717 = queryNorm
              0.48597062 = fieldWeight in 4942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2204237 = idf(docFreq=238, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
          0.05230035 = weight(abstract_txt:sowie in 4942) [ClassicSimilarity], result of:
            0.05230035 = score(doc=4942,freq=2.0), product of:
              0.102455586 = queryWeight, product of:
                1.6019552 = boost
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.013842717 = queryNorm
              0.5104685 = fieldWeight in 4942, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
          0.033571295 = weight(abstract_txt:wird in 4942) [ClassicSimilarity], result of:
            0.033571295 = score(doc=4942,freq=1.0), product of:
              0.11388615 = queryWeight, product of:
                2.1804311 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.013842717 = queryNorm
              0.29477945 = fieldWeight in 4942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
          0.27491525 = weight(abstract_txt:algorithmus in 4942) [ClassicSimilarity], result of:
            0.27491525 = score(doc=4942,freq=2.0), product of:
              0.30974084 = queryWeight, product of:
                2.785363 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.013842717 = queryNorm
              0.8875654 = fieldWeight in 4942, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=4942)
        0.32 = coord(8/25)
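
The per-term lines in the tree above can be checked by hand. The following is a minimal sketch, assuming the usual TF-IDF formulas associated with ClassicSimilarity (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function names `idf` and `term_score` are illustrative and not part of any library. All input values are copied from the entry for the term "nach" in document 4942. Because the listing was produced with single-precision arithmetic, the last digits may differ slightly.

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the listing:
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # tf is the square root of the raw term frequency
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * term_idf * query_norm
    # fieldWeight = tf * idf * fieldNorm
    field_weight = tf * term_idf * field_norm
    # The term's contribution is the product of both weights
    return query_weight * field_weight


# Values taken from the "nach" entry for doc 4942 in the listing
nach_score = term_score(
    freq=2.0,
    doc_freq=1550,
    max_docs=44218,
    boost=1.0055591,
    query_norm=0.013842717,
    field_norm=0.078125,
)
print(round(nach_score, 6))
```

The result agrees with the listed weight of 0.02910441 to within rounding.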
    
  2. Voregger, M.: Angriff der Heinzelmännchen : Steuern in den Informationsfluten des Webs - mit Hilfe digitaler Agenten (1997) 0.17
    0.17457183 = sum of:
      0.17457183 = product of:
        2.182148 = sum of:
          0.09319259 = weight(abstract_txt:internet in 379) [ClassicSimilarity], result of:
            0.09319259 = score(doc=379,freq=1.0), product of:
              0.06667597 = queryWeight, product of:
                1.2923115 = boost
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.013842717 = queryNorm
              1.3976939 = fieldWeight in 379, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.375 = fieldNorm(doc=379)
          2.0889554 = weight(abstract_txt:agenten in 379) [ClassicSimilarity], result of:
            2.0889554 = score(doc=379,freq=1.0), product of:
              0.62846935 = queryWeight, product of:
                5.122111 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.013842717 = queryNorm
              3.3238778 = fieldWeight in 379, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.375 = fieldNorm(doc=379)
        0.08 = coord(2/25)
    
  3. Weiner, M.: Die Agenten kommen (2002) 0.17
    0.17111766 = sum of:
      0.17111766 = product of:
        0.8555883 = sum of:
          0.035645477 = weight(abstract_txt:nach in 6734) [ClassicSimilarity], result of:
            0.035645477 = score(doc=6734,freq=3.0), product of:
              0.060553793 = queryWeight, product of:
                1.0055591 = boost
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.013842717 = queryNorm
              0.58865803 = fieldWeight in 6734, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.078125 = fieldNorm(doc=6734)
          0.03081217 = weight(abstract_txt:informationen in 6734) [ClassicSimilarity], result of:
            0.03081217 = score(doc=6734,freq=1.0), product of:
              0.07924898 = queryWeight, product of:
                1.15036 = boost
                4.976667 = idf(docFreq=828, maxDocs=44218)
                0.013842717 = queryNorm
              0.3888021 = fieldWeight in 6734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.976667 = idf(docFreq=828, maxDocs=44218)
                0.078125 = fieldNorm(doc=6734)
          0.015928693 = weight(abstract_txt:eine in 6734) [ClassicSimilarity], result of:
            0.015928693 = score(doc=6734,freq=1.0), product of:
              0.058433603 = queryWeight, product of:
                1.2098008 = boost
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.013842717 = queryNorm
              0.27259475 = fieldWeight in 6734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.078125 = fieldNorm(doc=6734)
          0.019415123 = weight(abstract_txt:internet in 6734) [ClassicSimilarity], result of:
            0.019415123 = score(doc=6734,freq=1.0), product of:
              0.06667597 = queryWeight, product of:
                1.2923115 = boost
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.013842717 = queryNorm
              0.2911862 = fieldWeight in 6734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7271836 = idf(docFreq=2891, maxDocs=44218)
                0.078125 = fieldNorm(doc=6734)
          0.75378686 = weight(abstract_txt:agenten in 6734) [ClassicSimilarity], result of:
            0.75378686 = score(doc=6734,freq=3.0), product of:
              0.62846935 = queryWeight, product of:
                5.122111 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.013842717 = queryNorm
              1.1994011 = fieldWeight in 6734, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.078125 = fieldNorm(doc=6734)
        0.2 = coord(5/25)
    
  4. Röscheisen, E.: Fin de such (2001) 0.17
    0.17071651 = sum of:
      0.17071651 = product of:
        1.0669782 = sum of:
          0.04115985 = weight(abstract_txt:nach in 6496) [ClassicSimilarity], result of:
            0.04115985 = score(doc=6496,freq=1.0), product of:
              0.060553793 = queryWeight, product of:
                1.0055591 = boost
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.013842717 = queryNorm
              0.67972374 = fieldWeight in 6496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.15625 = fieldNorm(doc=6496)
          0.088277765 = weight(abstract_txt:suche in 6496) [ClassicSimilarity], result of:
            0.088277765 = score(doc=6496,freq=1.0), product of:
              0.10070701 = queryWeight, product of:
                1.2967814 = boost
                5.6101127 = idf(docFreq=439, maxDocs=44218)
                0.013842717 = queryNorm
              0.8765801 = fieldWeight in 6496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6101127 = idf(docFreq=439, maxDocs=44218)
                0.15625 = fieldNorm(doc=6496)
          0.06714259 = weight(abstract_txt:wird in 6496) [ClassicSimilarity], result of:
            0.06714259 = score(doc=6496,freq=1.0), product of:
              0.11388615 = queryWeight, product of:
                2.1804311 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.013842717 = queryNorm
              0.5895589 = fieldWeight in 6496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.15625 = fieldNorm(doc=6496)
          0.87039804 = weight(abstract_txt:agenten in 6496) [ClassicSimilarity], result of:
            0.87039804 = score(doc=6496,freq=1.0), product of:
              0.62846935 = queryWeight, product of:
                5.122111 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.013842717 = queryNorm
              1.3849491 = fieldWeight in 6496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.15625 = fieldNorm(doc=6496)
        0.16 = coord(4/25)
    
  5. Göbel, R.: Semantic Web & Linked Data für professionelle Informationsangebote : Hoffnungsträger oder "alter Hut" - Eine Praxisbetrachtung für die Wirtschaftsinformationen (2010) 0.14
    0.13543232 = sum of:
      0.13543232 = product of:
        0.6771616 = sum of:
          0.02469591 = weight(abstract_txt:nach in 4258) [ClassicSimilarity], result of:
            0.02469591 = score(doc=4258,freq=1.0), product of:
              0.060553793 = queryWeight, product of:
                1.0055591 = boost
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.013842717 = queryNorm
              0.40783426 = fieldWeight in 4258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.350232 = idf(docFreq=1550, maxDocs=44218)
                0.09375 = fieldNorm(doc=4258)
          0.0369746 = weight(abstract_txt:informationen in 4258) [ClassicSimilarity], result of:
            0.0369746 = score(doc=4258,freq=1.0), product of:
              0.07924898 = queryWeight, product of:
                1.15036 = boost
                4.976667 = idf(docFreq=828, maxDocs=44218)
                0.013842717 = queryNorm
              0.4665625 = fieldWeight in 4258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.976667 = idf(docFreq=828, maxDocs=44218)
                0.09375 = fieldNorm(doc=4258)
          0.052966654 = weight(abstract_txt:suche in 4258) [ClassicSimilarity], result of:
            0.052966654 = score(doc=4258,freq=1.0), product of:
              0.10070701 = queryWeight, product of:
                1.2967814 = boost
                5.6101127 = idf(docFreq=439, maxDocs=44218)
                0.013842717 = queryNorm
              0.52594805 = fieldWeight in 4258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6101127 = idf(docFreq=439, maxDocs=44218)
                0.09375 = fieldNorm(doc=4258)
          0.040285554 = weight(abstract_txt:wird in 4258) [ClassicSimilarity], result of:
            0.040285554 = score(doc=4258,freq=1.0), product of:
              0.11388615 = queryWeight, product of:
                2.1804311 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.013842717 = queryNorm
              0.35373533 = fieldWeight in 4258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.09375 = fieldNorm(doc=4258)
          0.52223885 = weight(abstract_txt:agenten in 4258) [ClassicSimilarity], result of:
            0.52223885 = score(doc=4258,freq=1.0), product of:
              0.62846935 = queryWeight, product of:
                5.122111 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.013842717 = queryNorm
              0.83096945 = fieldWeight in 4258, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.09375 = fieldNorm(doc=4258)
        0.2 = coord(5/25)