Document (#31978)

Author
Khare, R.
Cutting, D.
Sitaker, K.
Rifkin, A.
Title
Nutch: a flexible and scalable open-source Web search engine
Source
http://wiki.commerce.net/images/0/06/CN-TR-04-04.pdf
Year
2004
Series
CommerceNet Labs Technical Report 04-04
Abstract
Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest - one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; on a personal scale to index a user's files, email, and web-surfing history; and we also report on several other research projects built on Nutch. In this paper, we present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.
Content
See also: www.nutch.org
Theme
Search engines
Object
Nutch

Similar documents (content)

  1. Brin, S.; Page, L.: The anatomy of a large-scale hypertextual Web search engine (1998) 0.09
    0.09375968 = sum of:
      0.09375968 = product of:
        0.4687984 = sum of:
          0.01581374 = weight(abstract_txt:also in 947) [ClassicSimilarity], result of:
            0.01581374 = score(doc=947,freq=1.0), product of:
              0.074453786 = queryWeight, product of:
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.02190882 = queryNorm
              0.21239673 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.0625 = fieldNorm(doc=947)
          0.06621487 = weight(abstract_txt:comparable in 947) [ClassicSimilarity], result of:
            0.06621487 = score(doc=947,freq=1.0), product of:
              0.15351732 = queryWeight, product of:
                1.0153606 = boost
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.02190882 = queryNorm
              0.43131855 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.0625 = fieldNorm(doc=947)
          0.13234945 = weight(abstract_txt:scale in 947) [ClassicSimilarity], result of:
            0.13234945 = score(doc=947,freq=4.0), product of:
              0.19334152 = queryWeight, product of:
                1.6114588 = boost
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.02190882 = queryNorm
              0.6845371 = fieldWeight in 947, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.0625 = fieldNorm(doc=947)
          0.13608015 = weight(abstract_txt:engine in 947) [ClassicSimilarity], result of:
            0.13608015 = score(doc=947,freq=4.0), product of:
              0.19695796 = queryWeight, product of:
                1.62646 = boost
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.02190882 = queryNorm
              0.69090956 = fieldWeight in 947, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.0625 = fieldNorm(doc=947)
          0.11834019 = weight(abstract_txt:search in 947) [ClassicSimilarity], result of:
            0.11834019 = score(doc=947,freq=9.0), product of:
              0.17253652 = queryWeight, product of:
                2.1528418 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.02190882 = queryNorm
              0.68588483 = fieldWeight in 947, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=947)
        0.2 = coord(5/25)
    
  2. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.09
    0.09169855 = sum of:
      0.09169855 = product of:
        0.3820773 = sum of:
          0.01581374 = weight(abstract_txt:also in 5041) [ClassicSimilarity], result of:
            0.01581374 = score(doc=5041,freq=1.0), product of:
              0.074453786 = queryWeight, product of:
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.02190882 = queryNorm
              0.21239673 = fieldWeight in 5041, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.0625 = fieldNorm(doc=5041)
          0.065744095 = weight(abstract_txt:rankings in 5041) [ClassicSimilarity], result of:
            0.065744095 = score(doc=5041,freq=1.0), product of:
              0.1527888 = queryWeight, product of:
                1.0129485 = boost
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.02190882 = queryNorm
              0.43029392 = fieldWeight in 5041, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.0625 = fieldNorm(doc=5041)
          0.024863236 = weight(abstract_txt:other in 5041) [ClassicSimilarity], result of:
            0.024863236 = score(doc=5041,freq=2.0), product of:
              0.07990222 = queryWeight, product of:
                1.0359434 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.02190882 = queryNorm
              0.3111708 = fieldWeight in 5041, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.0625 = fieldNorm(doc=5041)
          0.053441133 = weight(abstract_txt:source in 5041) [ClassicSimilarity], result of:
            0.053441133 = score(doc=5041,freq=1.0), product of:
              0.16766696 = queryWeight, product of:
                1.5006533 = boost
                5.0997415 = idf(docFreq=732, maxDocs=44218)
                0.02190882 = queryNorm
              0.31873384 = fieldWeight in 5041, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0997415 = idf(docFreq=732, maxDocs=44218)
                0.0625 = fieldNorm(doc=5041)
          0.11784885 = weight(abstract_txt:engine in 5041) [ClassicSimilarity], result of:
            0.11784885 = score(doc=5041,freq=3.0), product of:
              0.19695796 = queryWeight, product of:
                1.62646 = boost
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.02190882 = queryNorm
              0.5983452 = fieldWeight in 5041, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.0625 = fieldNorm(doc=5041)
          0.104366235 = weight(abstract_txt:search in 5041) [ClassicSimilarity], result of:
            0.104366235 = score(doc=5041,freq=7.0), product of:
              0.17253652 = queryWeight, product of:
                2.1528418 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.02190882 = queryNorm
              0.60489357 = fieldWeight in 5041, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=5041)
        0.24 = coord(6/25)
    
  3. Wakeling, S.; Clough, P.; Connaway, L.S.; Sen, B.; Tomás, D.: Users and uses of a global union catalog : a mixed-methods study of WorldCat.org (2017) 0.09
    0.08713321 = sum of:
      0.08713321 = product of:
        0.36305505 = sum of:
          0.01581374 = weight(abstract_txt:also in 3794) [ClassicSimilarity], result of:
            0.01581374 = score(doc=3794,freq=1.0), product of:
              0.074453786 = queryWeight, product of:
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.02190882 = queryNorm
              0.21239673 = fieldWeight in 3794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.0625 = fieldNorm(doc=3794)
          0.017580964 = weight(abstract_txt:other in 3794) [ClassicSimilarity], result of:
            0.017580964 = score(doc=3794,freq=1.0), product of:
              0.07990222 = queryWeight, product of:
                1.0359434 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.02190882 = queryNorm
              0.22003098 = fieldWeight in 3794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.0625 = fieldNorm(doc=3794)
          0.06617472 = weight(abstract_txt:scale in 3794) [ClassicSimilarity], result of:
            0.06617472 = score(doc=3794,freq=1.0), product of:
              0.19334152 = queryWeight, product of:
                1.6114588 = boost
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.02190882 = queryNorm
              0.34226856 = fieldWeight in 3794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.0625 = fieldNorm(doc=3794)
          0.09622318 = weight(abstract_txt:engine in 3794) [ClassicSimilarity], result of:
            0.09622318 = score(doc=3794,freq=2.0), product of:
              0.19695796 = queryWeight, product of:
                1.62646 = boost
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.02190882 = queryNorm
              0.48854682 = fieldWeight in 3794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.0625 = fieldNorm(doc=3794)
          0.09893872 = weight(abstract_txt:global in 3794) [ClassicSimilarity], result of:
            0.09893872 = score(doc=3794,freq=2.0), product of:
              0.20064634 = queryWeight, product of:
                1.6416185 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.02190882 = queryNorm
              0.49310005 = fieldWeight in 3794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.0625 = fieldNorm(doc=3794)
          0.06832374 = weight(abstract_txt:search in 3794) [ClassicSimilarity], result of:
            0.06832374 = score(doc=3794,freq=3.0), product of:
              0.17253652 = queryWeight, product of:
                2.1528418 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.02190882 = queryNorm
              0.3959958 = fieldWeight in 3794, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=3794)
        0.24 = coord(6/25)
    
  4. Petrelli, D.; Lanfranchi, V.; Ciravegna, F.; Begdev, R.; Chapman, S.: Highly focused document retrieval in aerospace engineering : user interaction design and evaluation (2011) 0.08
    0.07987188 = sum of:
      0.07987188 = product of:
        0.3327995 = sum of:
          0.013837023 = weight(abstract_txt:also in 4535) [ClassicSimilarity], result of:
            0.013837023 = score(doc=4535,freq=1.0), product of:
              0.074453786 = queryWeight, product of:
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.02190882 = queryNorm
              0.18584713 = fieldWeight in 4535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4535)
          0.02175533 = weight(abstract_txt:other in 4535) [ClassicSimilarity], result of:
            0.02175533 = score(doc=4535,freq=2.0), product of:
              0.07990222 = queryWeight, product of:
                1.0359434 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.02190882 = queryNorm
              0.27227443 = fieldWeight in 4535, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4535)
          0.07219744 = weight(abstract_txt:personal in 4535) [ClassicSimilarity], result of:
            0.07219744 = score(doc=4535,freq=2.0), product of:
              0.1777718 = queryWeight, product of:
                1.5452119 = boost
                5.2511673 = idf(docFreq=629, maxDocs=44218)
                0.02190882 = queryNorm
              0.40612423 = fieldWeight in 4535, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2511673 = idf(docFreq=629, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4535)
          0.059535064 = weight(abstract_txt:engine in 4535) [ClassicSimilarity], result of:
            0.059535064 = score(doc=4535,freq=1.0), product of:
              0.19695796 = queryWeight, product of:
                1.62646 = boost
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.02190882 = queryNorm
              0.30227295 = fieldWeight in 4535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4535)
          0.08829476 = weight(abstract_txt:flexible in 4535) [ClassicSimilarity], result of:
            0.08829476 = score(doc=4535,freq=1.0), product of:
              0.2561425 = queryWeight, product of:
                1.8548014 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.02190882 = queryNorm
              0.34470952 = fieldWeight in 4535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4535)
          0.07717987 = weight(abstract_txt:search in 4535) [ClassicSimilarity], result of:
            0.07717987 = score(doc=4535,freq=5.0), product of:
              0.17253652 = queryWeight, product of:
                2.1528418 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.02190882 = queryNorm
              0.44732484 = fieldWeight in 4535, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4535)
        0.24 = coord(6/25)
    
  5. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Analysis of change in users' assessment of search results over time (2017) 0.08
    0.07958329 = sum of:
      0.07958329 = product of:
        0.39791644 = sum of:
          0.065744095 = weight(abstract_txt:rankings in 3593) [ClassicSimilarity], result of:
            0.065744095 = score(doc=3593,freq=1.0), product of:
              0.1527888 = queryWeight, product of:
                1.0129485 = boost
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.02190882 = queryNorm
              0.43029392 = fieldWeight in 3593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.0625 = fieldNorm(doc=3593)
          0.08119055 = weight(abstract_txt:local in 3593) [ClassicSimilarity], result of:
            0.08119055 = score(doc=3593,freq=2.0), product of:
              0.17586957 = queryWeight, product of:
                1.5369225 = boost
                5.2229967 = idf(docFreq=647, maxDocs=44218)
                0.02190882 = queryNorm
              0.46165204 = fieldWeight in 3593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2229967 = idf(docFreq=647, maxDocs=44218)
                0.0625 = fieldNorm(doc=3593)
          0.11461799 = weight(abstract_txt:scale in 3593) [ClassicSimilarity], result of:
            0.11461799 = score(doc=3593,freq=3.0), product of:
              0.19334152 = queryWeight, product of:
                1.6114588 = boost
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.02190882 = queryNorm
              0.59282655 = fieldWeight in 3593, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.0625 = fieldNorm(doc=3593)
          0.06804007 = weight(abstract_txt:engine in 3593) [ClassicSimilarity], result of:
            0.06804007 = score(doc=3593,freq=1.0), product of:
              0.19695796 = queryWeight, product of:
                1.62646 = boost
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.02190882 = queryNorm
              0.34545478 = fieldWeight in 3593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.0625 = fieldNorm(doc=3593)
          0.06832374 = weight(abstract_txt:search in 3593) [ClassicSimilarity], result of:
            0.06832374 = score(doc=3593,freq=3.0), product of:
              0.17253652 = queryWeight, product of:
                2.1528418 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.02190882 = queryNorm
              0.3959958 = fieldWeight in 3593, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=3593)
        0.2 = coord(5/25)
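
The score breakdowns above all follow the same arithmetic: each matching term contributes queryWeight x fieldWeight, the contributions are summed, and the sum is multiplied by coord (matched query terms / total query terms). As a minimal sketch, not the catalog's actual code, the Python below recomputes result 1 (doc=947) from the inputs listed in its breakdown and then re-derives the final scores of all five results. The function and variable names are illustrative assumptions. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is also an assumption: it is not stated in the record, but it reproduces the listed idf values.

```python
import math

MAX_DOCS = 44218          # maxDocs, as listed in every idf line
QUERY_NORM = 0.02190882   # queryNorm, as listed in every queryWeight block


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula; it matches the listed values, e.g. docFreq=4017 -> 3.3983476.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, mirroring the explain tree.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(term_scores, matched, total=25):
    # Final score = sum of term scores * coord(matched/total).
    return sum(term_scores) * (matched / total)


# (term, freq, docFreq, boost) for result 1; "also" lists no boost, so 1.0 is used.
doc_947_terms = [
    ("also", 1.0, 4017, 1.0),
    ("comparable", 1.0, 120, 1.0153606),
    ("scale", 4.0, 502, 1.6114588),
    ("engine", 4.0, 477, 1.62646),
    ("search", 9.0, 3098, 2.1528418),
]
scores_947 = [term_score(f, df, 0.0625, b) for _, f, df, b in doc_947_terms]
total_947 = document_score(scores_947, matched=5)

# Listed per-document sums and matched-term counts, keyed by doc id.
listed = {
    947: (0.4687984, 5),
    5041: (0.3820773, 6),
    3794: (0.36305505, 6),
    4535: (0.3327995, 6),
    3593: (0.39791644, 5),
}
final = {doc: s * (m / 25) for doc, (s, m) in listed.items()}
ranking = sorted(final, key=final.get, reverse=True)
raw_ranking = sorted(listed, key=lambda d: listed[d][0], reverse=True)

print(round(total_947, 6))
print(ranking)
print(raw_ranking)
```

Comparing the two orderings shows what coord does here: by raw sum alone, result 5 (doc=3593) would rank second, but it matches only 5 of the 25 query terms where results 2 to 4 match 6, so the coord factor moves it to last place.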