Document (#37096)

Author
Das, A.
Jain, A.
Title
Indexing the World Wide Web : the journey so far
Source
Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
Imprint
Hershey, PA : IGI Publishing
Year
2012
Pages
pp. 1-28
Abstract
In this chapter, the authors describe the key indexing components of today's web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. The authors present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. They highlight techniques that improve the relevance of results, discuss trade-offs to best utilize machine resources, and cover distributed processing concepts in this context. In particular, the authors delve into the topics of indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. Some thoughts on information organization for the newly emerging data forms conclude the chapter.
Footnote
Cf.: http://www.igi-global.com/book/next-generation-search-engines/64418.
Theme
Search engines
Object
Google

Similar documents (author)

  1. Jain, H.C.: Colon Classification : a review article (1964) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:jain in 1952) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 1952, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=1952)
    
  2. Jain, A.K.: Image data compression : a review (1981) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:jain in 8696) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 8696, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=8696)
    
  3. Jain, R.: Visual information retrieval in digital libraries (1997) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:jain in 760) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 760, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=760)
    
  4. Jain, P.: An empirical study of knowledge management in academic libraries in East and Southern Africa (2007) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:jain in 864) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 864, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=864)
    
  5. Saggi, M.K.; Jain, S.: A survey towards an integration of big data analytics to big insights for value-creation (2018) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:jain in 5053) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 5053, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=5053)
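
The author scores above can be reproduced by hand. The sketch below assumes the figures follow the classic TF-IDF arithmetic that the output labels as ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and the field weight is their product with the field norm. The function names are illustrative only and are not part of any library; the idf formula is the one that matches the reported values, not a statement about every Lucene version.

```python
import math


def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term count.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that reproduces the listed values
    # (assumed formula: 1 + ln(maxDocs / (docFreq + 1))).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Results 1-4: author_txt:jain, freq=1, docFreq=9, maxDocs=44218, fieldNorm=0.625
weight_single_author = field_weight(1.0, 9, 44218, 0.625)

# Result 5: same term statistics, but fieldNorm=0.5 (two authors in the field)
weight_two_authors = field_weight(1.0, 9, 44218, 0.5)

print(round(weight_single_author, 4), round(weight_two_authors, 4))
```

The only difference between the first four hits and the fifth is the field norm, which is smaller for the record with the longer author field.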
    

Similar documents (content)

  1. Ceri, S.; Bozzon, A.; Brambilla, M.; Della Valle, E.; Fraternali, P.; Quarteroni, S.: Web Information Retrieval (2013) 0.12
    0.12239238 = sum of:
      0.12239238 = product of:
        0.43711564 = sum of:
          0.050773297 = weight(abstract_txt:cover in 1082) [ClassicSimilarity], result of:
            0.050773297 = score(doc=1082,freq=1.0), product of:
              0.1438274 = queryWeight, product of:
                1.0181125 = boost
                6.45514 = idf(docFreq=188, maxDocs=44218)
                0.021884678 = queryNorm
              0.35301548 = fieldWeight in 1082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.45514 = idf(docFreq=188, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1082)
          0.071031146 = weight(abstract_txt:grown in 1082) [ClassicSimilarity], result of:
            0.071031146 = score(doc=1082,freq=1.0), product of:
              0.17990805 = queryWeight, product of:
                1.1386763 = boost
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.021884678 = queryNorm
              0.39481917 = fieldWeight in 1082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1082)
          0.058438372 = weight(abstract_txt:search in 1082) [ClassicSimilarity], result of:
            0.058438372 = score(doc=1082,freq=10.0), product of:
              0.0923762 = queryWeight, product of:
                1.1539049 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.021884678 = queryNorm
              0.6326128 = fieldWeight in 1082, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1082)
          0.042061385 = weight(abstract_txt:data in 1082) [ClassicSimilarity], result of:
            0.042061385 = score(doc=1082,freq=4.0), product of:
              0.11526413 = queryWeight, product of:
                1.5786386 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.021884678 = queryNorm
              0.36491305 = fieldWeight in 1082, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1082)
          0.09579592 = weight(abstract_txt:chapter in 1082) [ClassicSimilarity], result of:
            0.09579592 = score(doc=1082,freq=1.0), product of:
              0.27668953 = queryWeight, product of:
                1.997038 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.021884678 = queryNorm
              0.34622172 = fieldWeight in 1082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1082)
          0.05688246 = weight(abstract_txt:authors in 1082) [ClassicSimilarity], result of:
            0.05688246 = score(doc=1082,freq=1.0), product of:
              0.2237574 = queryWeight, product of:
                2.1995018 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.021884678 = queryNorm
              0.25421488 = fieldWeight in 1082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1082)
          0.062133037 = weight(abstract_txt:indexing in 1082) [ClassicSimilarity], result of:
            0.062133037 = score(doc=1082,freq=1.0), product of:
              0.26120797 = queryWeight, product of:
                2.744089 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021884678 = queryNorm
              0.23786807 = fieldWeight in 1082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1082)
        0.28 = coord(7/25)
    
  2. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.10
    0.096012354 = sum of:
      0.096012354 = product of:
        0.48006177 = sum of:
          0.05279953 = weight(abstract_txt:search in 7) [ClassicSimilarity], result of:
            0.05279953 = score(doc=7,freq=4.0), product of:
              0.0923762 = queryWeight, product of:
                1.1539049 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.021884678 = queryNorm
              0.5715707 = fieldWeight in 7, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=7)
          0.030043848 = weight(abstract_txt:data in 7) [ClassicSimilarity], result of:
            0.030043848 = score(doc=7,freq=1.0), product of:
              0.11526413 = queryWeight, product of:
                1.5786386 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.021884678 = queryNorm
              0.26065218 = fieldWeight in 7, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=7)
          0.193537 = weight(abstract_txt:chapter in 7) [ClassicSimilarity], result of:
            0.193537 = score(doc=7,freq=2.0), product of:
              0.27668953 = queryWeight, product of:
                1.997038 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.021884678 = queryNorm
              0.6994735 = fieldWeight in 7, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.078125 = fieldNorm(doc=7)
          0.11491993 = weight(abstract_txt:authors in 7) [ClassicSimilarity], result of:
            0.11491993 = score(doc=7,freq=2.0), product of:
              0.2237574 = queryWeight, product of:
                2.1995018 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.021884678 = queryNorm
              0.51359165 = fieldWeight in 7, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=7)
          0.08876147 = weight(abstract_txt:indexing in 7) [ClassicSimilarity], result of:
            0.08876147 = score(doc=7,freq=1.0), product of:
              0.26120797 = queryWeight, product of:
                2.744089 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021884678 = queryNorm
              0.3398115 = fieldWeight in 7, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=7)
        0.2 = coord(5/25)
    
  3. Milstead, J.L.: Methodologies for subject analysis in bibliographic databases (1992) 0.10
    0.09508077 = sum of:
      0.09508077 = product of:
        0.59425485 = sum of:
          0.1206341 = weight(abstract_txt:trade in 2311) [ClassicSimilarity], result of:
            0.1206341 = score(doc=2311,freq=1.0), product of:
              0.17878976 = queryWeight, product of:
                1.1351318 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.021884678 = queryNorm
              0.674726 = fieldWeight in 2311, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.09375 = fieldNorm(doc=2311)
          0.20960711 = weight(abstract_txt:offs in 2311) [ClassicSimilarity], result of:
            0.20960711 = score(doc=2311,freq=1.0), product of:
              0.2584044 = queryWeight, product of:
                1.3646615 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.021884678 = queryNorm
              0.8111592 = fieldWeight in 2311, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=2311)
          0.0509861 = weight(abstract_txt:data in 2311) [ClassicSimilarity], result of:
            0.0509861 = score(doc=2311,freq=2.0), product of:
              0.11526413 = queryWeight, product of:
                1.5786386 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.021884678 = queryNorm
              0.44234142 = fieldWeight in 2311, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=2311)
          0.21302754 = weight(abstract_txt:indexing in 2311) [ClassicSimilarity], result of:
            0.21302754 = score(doc=2311,freq=4.0), product of:
              0.26120797 = queryWeight, product of:
                2.744089 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021884678 = queryNorm
              0.81554765 = fieldWeight in 2311, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=2311)
        0.16 = coord(4/25)
    
  4. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.09
    0.09211096 = sum of:
      0.09211096 = product of:
        0.38379568 = sum of:
          0.026399765 = weight(abstract_txt:search in 101) [ClassicSimilarity], result of:
            0.026399765 = score(doc=101,freq=1.0), product of:
              0.0923762 = queryWeight, product of:
                1.1539049 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.021884678 = queryNorm
              0.28578535 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.04693427 = weight(abstract_txt:world in 101) [ClassicSimilarity], result of:
            0.04693427 = score(doc=101,freq=1.0), product of:
              0.13556683 = queryWeight, product of:
                1.3978697 = boost
                4.4314575 = idf(docFreq=1429, maxDocs=44218)
                0.021884678 = queryNorm
              0.34620762 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4314575 = idf(docFreq=1429, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.062305823 = weight(abstract_txt:wide in 101) [ClassicSimilarity], result of:
            0.062305823 = score(doc=101,freq=1.0), product of:
              0.16374919 = queryWeight, product of:
                1.5363125 = boost
                4.870342 = idf(docFreq=921, maxDocs=44218)
                0.021884678 = queryNorm
              0.38049546 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.870342 = idf(docFreq=921, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.030043848 = weight(abstract_txt:data in 101) [ClassicSimilarity], result of:
            0.030043848 = score(doc=101,freq=1.0), product of:
              0.11526413 = queryWeight, product of:
                1.5786386 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.021884678 = queryNorm
              0.26065218 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.13685131 = weight(abstract_txt:chapter in 101) [ClassicSimilarity], result of:
            0.13685131 = score(doc=101,freq=1.0), product of:
              0.27668953 = queryWeight, product of:
                1.997038 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.021884678 = queryNorm
              0.49460244 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
          0.08126066 = weight(abstract_txt:authors in 101) [ClassicSimilarity], result of:
            0.08126066 = score(doc=101,freq=1.0), product of:
              0.2237574 = queryWeight, product of:
                2.1995018 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.021884678 = queryNorm
              0.36316413 = fieldWeight in 101, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=101)
        0.24 = coord(6/25)
    
  5. Head, A.J.: A question of interface design : how do online service GUIs measure up? (1997) 0.09
    0.08866986 = sum of:
      0.08866986 = product of:
        0.55418664 = sum of:
          0.12684381 = weight(abstract_txt:newly in 427) [ClassicSimilarity], result of:
            0.12684381 = score(doc=427,freq=1.0), product of:
              0.1668185 = queryWeight, product of:
                1.0964708 = boost
                6.9519553 = idf(docFreq=114, maxDocs=44218)
                0.021884678 = queryNorm
              0.76037014 = fieldWeight in 427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9519553 = idf(docFreq=114, maxDocs=44218)
                0.109375 = fieldNorm(doc=427)
          0.1407398 = weight(abstract_txt:trade in 427) [ClassicSimilarity], result of:
            0.1407398 = score(doc=427,freq=1.0), product of:
              0.17878976 = queryWeight, product of:
                1.1351318 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.021884678 = queryNorm
              0.78718036 = fieldWeight in 427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.109375 = fieldNorm(doc=427)
          0.24454162 = weight(abstract_txt:offs in 427) [ClassicSimilarity], result of:
            0.24454162 = score(doc=427,freq=1.0), product of:
              0.2584044 = queryWeight, product of:
                1.3646615 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.021884678 = queryNorm
              0.94635236 = fieldWeight in 427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.109375 = fieldNorm(doc=427)
          0.042061385 = weight(abstract_txt:data in 427) [ClassicSimilarity], result of:
            0.042061385 = score(doc=427,freq=1.0), product of:
              0.11526413 = queryWeight, product of:
                1.5786386 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.021884678 = queryNorm
              0.36491305 = fieldWeight in 427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.109375 = fieldNorm(doc=427)
        0.16 = coord(4/25)
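
The content scores combine per-term weights with a coordination factor. As a minimal sketch, assuming the same classic TF-IDF arithmetic as labeled in the output, the following recomputes the total for result 3 (doc=2311) from the boost, docFreq, and freq values listed in its explain tree. The names `TermMatch` and `term_score` are hypothetical helpers introduced for illustration.

```python
import math
from typing import NamedTuple

MAX_DOCS = 44218          # maxDocs from the explain output
QUERY_NORM = 0.021884678  # queryNorm from the explain output


class TermMatch(NamedTuple):
    term: str
    boost: float
    doc_freq: int
    freq: float


def idf(doc_freq: int) -> float:
    # Assumed formula that reproduces the listed idf values.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(m: TermMatch, field_norm: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = m.boost * idf(m.doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(m.freq) * idf(m.doc_freq) * field_norm
    return query_weight * field_weight


# Matching terms for doc=2311, copied from the explain tree above.
matches = [
    TermMatch("trade", 1.1351318, 89, 1.0),
    TermMatch("offs", 1.3646615, 20, 1.0),
    TermMatch("data", 1.5786386, 4274, 2.0),
    TermMatch("indexing", 2.744089, 1551, 4.0),
]

term_sum = sum(term_score(m, field_norm=0.09375) for m in matches)
coord = len(matches) / 25  # 4 of the 25 query terms matched
total_score = term_sum * coord

print(round(term_sum, 4), round(total_score, 4))
```

The coordination factor explains why result 3 ranks below result 1 despite a larger term sum: it matches only 4 of 25 query terms, while result 1 matches 7.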