Document (#31988)

Author
Fichtner, K.
Title
Boyer-Moore Suchalgorithmus
Imprint
Münster : Institut für Wirtschaftsinformatik der Westfälische Wilhelms-Universität Münster
Year
2005
Pages
26 S
Abstract
Die Masse der Suchalgorithmen lässt sich in zwei grundlegend verschiedene Teilbereiche untergliedern. Auf der einen Seite stehen Algorithmen, die auf komplexen Datenstrukturen (häufig baumartig) ganze Datensätze unter Verwendung eines Indizes finden. Als geläufiger Vertreter sei hier die binäre Suche auf sortierten Arrays oder in binären Bäumen genannt. Die andere Gruppe, der sich diese Ausarbeitung widmet, dient dazu, Entsprechungen von Mustern in gegebenen Zeichenketten zu finden. Auf den folgenden Seiten werden nun zunächst einige Begriffe eingeführt, die für das weitere Verständnis und einen Vergleich verschiedener Suchalgorithmen nötig sind. Weiterhin wird ein naiver Suchalgorithmus dargestellt und mit der Idee von Boyer und Moore verglichen. Hierzu wird ihr Algorithmus zunächst informal beschrieben, dann mit Blick auf eine Implementation näher erläutert und anschließend einer Effizienzanalyse - sowohl empirisch als auch theoretisch - unterzogen. Abschließend findet eine kurze Bewertung mit Bezug auf Schwachstellen, Vorzüge und Verbesserungsmöglichkeiten statt, im Zuge derer einige prominente Modifikationen des Boyer-Moore Algorithmus vorgestellt werden.
Content
Ausarbeitung im Rahmen des Seminars Suchmaschinen und Suchalgorithmen, Institut für Wirtschaftsinformatik Praktische Informatik in der Wirtschaft, Westfälische Wilhelms-Universität Münster. - Vgl.: http://www-wi.uni-muenster.de/pi/lehre/ss05/seminarSuchen/Ausarbeitungen/KristoferFichtner.pdf
Theme
Retrievalalgorithmen
Object
Boyer-Moore Suchalgorithmus

Similar documents (content)

  1. Ackermann, J.: Knuth-Morris-Pratt (2005) 0.31
    0.30650228 = sum of:
      0.30650228 = product of:
        1.094651 = sum of:
          0.020763418 = weight(abstract_txt:einen in 1988) [ClassicSimilarity], result of:
            0.020763418 = score(doc=1988,freq=2.0), product of:
              0.06281648 = queryWeight, product of:
                1.0694157 = boost
                4.2738786 = idf(docFreq=1648, maxDocs=43556)
                0.013743738 = queryNorm
              0.33054093 = fieldWeight in 1988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2738786 = idf(docFreq=1648, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1988)
          0.073149465 = weight(abstract_txt:mustern in 1988) [ClassicSimilarity], result of:
            0.073149465 = score(doc=1988,freq=1.0), product of:
              0.14543931 = queryWeight, product of:
                1.1506299 = boost
                9.196897 = idf(docFreq=11, maxDocs=43556)
                0.013743738 = queryNorm
              0.50295526 = fieldWeight in 1988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.196897 = idf(docFreq=11, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1988)
          0.047999337 = weight(abstract_txt:finden in 1988) [ClassicSimilarity], result of:
            0.047999337 = score(doc=1988,freq=2.0), product of:
              0.10982415 = queryWeight, product of:
                1.4140302 = boost
                5.6511173 = idf(docFreq=415, maxDocs=43556)
                0.013743738 = queryNorm
              0.4370563 = fieldWeight in 1988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6511173 = idf(docFreq=415, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1988)
          0.2565096 = weight(abstract_txt:algorithmus in 1988) [ClassicSimilarity], result of:
            0.2565096 = score(doc=1988,freq=7.0), product of:
              0.22109933 = queryWeight, product of:
                2.006335 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.013743738 = queryNorm
              1.1601555 = fieldWeight in 1988, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1988)
          0.17355797 = weight(abstract_txt:suchalgorithmen in 1988) [ClassicSimilarity], result of:
            0.17355797 = score(doc=1988,freq=1.0), product of:
              0.32597232 = queryWeight, product of:
                2.4361281 = boost
                9.735892 = idf(docFreq=6, maxDocs=43556)
                0.013743738 = queryNorm
              0.5324316 = fieldWeight in 1988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.735892 = idf(docFreq=6, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1988)
          0.2497714 = weight(abstract_txt:moore in 1988) [ClassicSimilarity], result of:
            0.2497714 = score(doc=1988,freq=1.0), product of:
              0.47563806 = queryWeight, product of:
                3.6040738 = boost
                9.602362 = idf(docFreq=7, maxDocs=43556)
                0.013743738 = queryNorm
              0.52512914 = fieldWeight in 1988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.602362 = idf(docFreq=7, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1988)
          0.27289975 = weight(abstract_txt:boyer in 1988) [ClassicSimilarity], result of:
            0.27289975 = score(doc=1988,freq=1.0), product of:
              0.5045647 = queryWeight, product of:
                3.71205 = boost
                9.890043 = idf(docFreq=5, maxDocs=43556)
                0.013743738 = queryNorm
              0.5408617 = fieldWeight in 1988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.890043 = idf(docFreq=5, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1988)
        0.28 = coord(7/25)
    
  2. Uratani, N.; Takeda, M.: ¬A fast string-searching algorithm for multiple patterns (1993) 0.07
    0.07168061 = sum of:
      0.07168061 = product of:
        0.89600766 = sum of:
          0.42817956 = weight(abstract_txt:moore in 6272) [ClassicSimilarity], result of:
            0.42817956 = score(doc=6272,freq=1.0), product of:
              0.47563806 = queryWeight, product of:
                3.6040738 = boost
                9.602362 = idf(docFreq=7, maxDocs=43556)
                0.013743738 = queryNorm
              0.9002214 = fieldWeight in 6272, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.602362 = idf(docFreq=7, maxDocs=43556)
                0.09375 = fieldNorm(doc=6272)
          0.46782812 = weight(abstract_txt:boyer in 6272) [ClassicSimilarity], result of:
            0.46782812 = score(doc=6272,freq=1.0), product of:
              0.5045647 = queryWeight, product of:
                3.71205 = boost
                9.890043 = idf(docFreq=5, maxDocs=43556)
                0.013743738 = queryNorm
              0.92719156 = fieldWeight in 6272, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.890043 = idf(docFreq=5, maxDocs=43556)
                0.09375 = fieldNorm(doc=6272)
        0.08 = coord(2/25)
    
  3. Korves, J.: Seiten bewerten : Googles PageRank (2005) 0.05
    0.054041907 = sum of:
      0.054041907 = product of:
        0.33776194 = sum of:
          0.056997873 = weight(abstract_txt:ausarbeitung in 1989) [ClassicSimilarity], result of:
            0.056997873 = score(doc=1989,freq=1.0), product of:
              0.13648282 = queryWeight, product of:
                1.1146377 = boost
                8.909214 = idf(docFreq=15, maxDocs=43556)
                0.013743738 = queryNorm
              0.4176194 = fieldWeight in 1989, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.909214 = idf(docFreq=15, maxDocs=43556)
                0.046875 = fieldNorm(doc=1989)
          0.031292617 = weight(abstract_txt:einige in 1989) [ClassicSimilarity], result of:
            0.031292617 = score(doc=1989,freq=1.0), product of:
              0.11529492 = queryWeight, product of:
                1.4488213 = boost
                5.7901587 = idf(docFreq=361, maxDocs=43556)
                0.013743738 = queryNorm
              0.27141368 = fieldWeight in 1989, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7901587 = idf(docFreq=361, maxDocs=43556)
                0.046875 = fieldNorm(doc=1989)
          0.04591564 = weight(abstract_txt:zunächst in 1989) [ClassicSimilarity], result of:
            0.04591564 = score(doc=1989,freq=2.0), product of:
              0.1181624 = queryWeight, product of:
                1.4667274 = boost
                5.8617196 = idf(docFreq=336, maxDocs=43556)
                0.013743738 = queryNorm
              0.3885808 = fieldWeight in 1989, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8617196 = idf(docFreq=336, maxDocs=43556)
                0.046875 = fieldNorm(doc=1989)
          0.2035558 = weight(abstract_txt:algorithmus in 1989) [ClassicSimilarity], result of:
            0.2035558 = score(doc=1989,freq=6.0), product of:
              0.22109933 = queryWeight, product of:
                2.006335 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.013743738 = queryNorm
              0.9206531 = fieldWeight in 1989, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.046875 = fieldNorm(doc=1989)
        0.16 = coord(4/25)
    
  4. Karisch, K.-H.: Am Grund (2001) 0.04
    0.036904756 = sum of:
      0.036904756 = product of:
        0.30753964 = sum of:
          0.0411583 = weight(abstract_txt:theoretisch in 4364) [ClassicSimilarity], result of:
            0.0411583 = score(doc=4364,freq=1.0), product of:
              0.10985264 = queryWeight, product of:
                7.9929233 = idf(docFreq=39, maxDocs=43556)
                0.013743738 = queryNorm
              0.37466827 = fieldWeight in 4364, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9929233 = idf(docFreq=39, maxDocs=43556)
                0.046875 = fieldNorm(doc=4364)
          0.03246726 = weight(abstract_txt:zunächst in 4364) [ClassicSimilarity], result of:
            0.03246726 = score(doc=4364,freq=1.0), product of:
              0.1181624 = queryWeight, product of:
                1.4667274 = boost
                5.8617196 = idf(docFreq=336, maxDocs=43556)
                0.013743738 = queryNorm
              0.2747681 = fieldWeight in 4364, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8617196 = idf(docFreq=336, maxDocs=43556)
                0.046875 = fieldNorm(doc=4364)
          0.23391406 = weight(abstract_txt:boyer in 4364) [ClassicSimilarity], result of:
            0.23391406 = score(doc=4364,freq=1.0), product of:
              0.5045647 = queryWeight, product of:
                3.71205 = boost
                9.890043 = idf(docFreq=5, maxDocs=43556)
                0.013743738 = queryNorm
              0.46359578 = fieldWeight in 4364, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.890043 = idf(docFreq=5, maxDocs=43556)
                0.046875 = fieldNorm(doc=4364)
        0.12 = coord(3/25)
    
  5. Westermeyer, D.: Adaptive Techniken zur Informationsgewinnung : der Webcrawler InfoSpiders (2005) 0.04
    0.036826514 = sum of:
      0.036826514 = product of:
        0.30688763 = sum of:
          0.07168372 = weight(abstract_txt:unterzogen in 331) [ClassicSimilarity], result of:
            0.07168372 = score(doc=331,freq=1.0), product of:
              0.13126837 = queryWeight, product of:
                1.0931375 = boost
                8.737364 = idf(docFreq=18, maxDocs=43556)
                0.013743738 = queryNorm
              0.54608524 = fieldWeight in 331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.737364 = idf(docFreq=18, maxDocs=43556)
                0.0625 = fieldNorm(doc=331)
          0.04328968 = weight(abstract_txt:zunächst in 331) [ClassicSimilarity], result of:
            0.04328968 = score(doc=331,freq=1.0), product of:
              0.1181624 = queryWeight, product of:
                1.4667274 = boost
                5.8617196 = idf(docFreq=336, maxDocs=43556)
                0.013743738 = queryNorm
              0.36635748 = fieldWeight in 331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8617196 = idf(docFreq=336, maxDocs=43556)
                0.0625 = fieldNorm(doc=331)
          0.19191423 = weight(abstract_txt:algorithmus in 331) [ClassicSimilarity], result of:
            0.19191423 = score(doc=331,freq=3.0), product of:
              0.22109933 = queryWeight, product of:
                2.006335 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.013743738 = queryNorm
              0.86800003 = fieldWeight in 331, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.0625 = fieldNorm(doc=331)
        0.12 = coord(3/25)