Document (#37507)

Author
Doran, D.
Gokhale, S.S.
Title
¬A classification framework for web robots
Source
Journal of the American Society for Information Science and Technology. 63(2012) no.12, p.2549-2554
Year
2012
Series
Brief communication
Abstract
The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
Theme
Internet
Data Mining

Similar documents (author)

  1. Doran, K.: Unified disparity : theory and practice of union listing (1996) 6.18
    6.176928 = sum of:
      6.176928 = weight(author_txt:doran in 5795) [ClassicSimilarity], result of:
        6.176928 = score(doc=5795,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.101182975 = queryNorm
          6.1769285 = fieldWeight in 5795, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.625 = fieldNorm(doc=5795)
    
  2. Doran, C.; Martin, C.: Measuring success in outsourced cataloging : a data-driven investigation (2017) 4.94
    4.941542 = sum of:
      4.941542 = weight(author_txt:doran in 151) [ClassicSimilarity], result of:
        4.941542 = score(doc=151,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.101182975 = queryNorm
          4.9415426 = fieldWeight in 151, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.5 = fieldNorm(doc=151)
    
  3. Rittschof, K.A.; Kulhavy, R.W.; Stock, W.A.; Verdi, M.P.; Doran, J.M.: Thematic maps improve memory for facts and inferences : a test of the stimulus order hypothesis (1994) 3.09
    3.088464 = sum of:
      3.088464 = weight(author_txt:doran in 3158) [ClassicSimilarity], result of:
        3.088464 = score(doc=3158,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.101182975 = queryNorm
          3.0884643 = fieldWeight in 3158, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.3125 = fieldNorm(doc=3158)
    
  4. Monireh, E.; Sarker, M.K.; Bianchi, F.; Hitzler, P.; Doran, D.; Xie, N.: Reasoning over RDF knowledge bases using deep learning (2018) 2.47
    2.470771 = sum of:
      2.470771 = weight(author_txt:doran in 554) [ClassicSimilarity], result of:
        2.470771 = score(doc=554,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.101182975 = queryNorm
          2.4707713 = fieldWeight in 554, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.883085 = idf(docFreq=5, maxDocs=43254)
            0.25 = fieldNorm(doc=554)
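
Note: the author scores above can be reproduced by hand. The sketch below assumes the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the idf values printed in the listing. The function names are illustrative only and are not Lucene API calls; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # Term frequency is dampened with a square root.
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


def query_weight(doc_freq, max_docs, query_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm (no boost is listed for author_txt).
    return boost * classic_idf(doc_freq, max_docs) * query_norm


# Values copied from item 1 (doc=5795) of the author listing.
MAX_DOCS = 43254
DOC_FREQ = 5
QUERY_NORM = 0.101182975

idf = classic_idf(DOC_FREQ, MAX_DOCS)
score = query_weight(DOC_FREQ, MAX_DOCS, QUERY_NORM) * field_weight(
    1.0, DOC_FREQ, MAX_DOCS, 0.625
)
print(f"idf={idf:.6f} score={score:.6f}")
```

Because the query has a single term and every hit has freq=1.0, only fieldNorm differs between the four items (0.625, 0.5, 0.3125, 0.25), which is why the scores fall in the same proportion.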
    

Similar documents (content)

  1. Byers, D.: Full-text indexing of non-textual resources (1998) 0.13
    0.13166808 = sum of:
      0.13166808 = product of:
        0.82292545 = sum of:
          0.03028537 = weight(abstract_txt:from in 5607) [ClassicSimilarity], result of:
            0.03028537 = score(doc=5607,freq=3.0), product of:
              0.050306708 = queryWeight, product of:
                1.0499873 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017230801 = queryNorm
              0.60201454 = fieldWeight in 5607, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.125 = fieldNorm(doc=5607)
          0.08898005 = weight(abstract_txt:server in 5607) [ClassicSimilarity], result of:
            0.08898005 = score(doc=5607,freq=1.0), product of:
              0.118130706 = queryWeight, product of:
                1.1377256 = boost
                6.025871 = idf(docFreq=283, maxDocs=43254)
                0.017230801 = queryNorm
              0.75323385 = fieldWeight in 5607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.025871 = idf(docFreq=283, maxDocs=43254)
                0.125 = fieldNorm(doc=5607)
          0.043597694 = weight(abstract_txt:they in 5607) [ClassicSimilarity], result of:
            0.043597694 = score(doc=5607,freq=1.0), product of:
              0.09250248 = queryWeight, product of:
                1.4237962 = boost
                3.7705102 = idf(docFreq=2708, maxDocs=43254)
                0.017230801 = queryNorm
              0.47131377 = fieldWeight in 5607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7705102 = idf(docFreq=2708, maxDocs=43254)
                0.125 = fieldNorm(doc=5607)
          0.6600623 = weight(abstract_txt:robots in 5607) [ClassicSimilarity], result of:
            0.6600623 = score(doc=5607,freq=1.0), product of:
              0.64803445 = queryWeight, product of:
                4.6154685 = boost
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.017230801 = queryNorm
              1.0185605 = fieldWeight in 5607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.125 = fieldNorm(doc=5607)
        0.16 = coord(4/25)
    
  2. Kimmel, S.: Robot-generated databases on the World Wide Web (1996) 0.12
    0.11934624 = sum of:
      0.11934624 = product of:
        0.994552 = sum of:
          0.017485267 = weight(abstract_txt:from in 5793) [ClassicSimilarity], result of:
            0.017485267 = score(doc=5793,freq=1.0), product of:
              0.050306708 = queryWeight, product of:
                1.0499873 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017230801 = queryNorm
              0.34757328 = fieldWeight in 5793, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.125 = fieldNorm(doc=5793)
          0.043597694 = weight(abstract_txt:they in 5793) [ClassicSimilarity], result of:
            0.043597694 = score(doc=5793,freq=1.0), product of:
              0.09250248 = queryWeight, product of:
                1.4237962 = boost
                3.7705102 = idf(docFreq=2708, maxDocs=43254)
                0.017230801 = queryNorm
              0.47131377 = fieldWeight in 5793, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7705102 = idf(docFreq=2708, maxDocs=43254)
                0.125 = fieldNorm(doc=5793)
          0.93346906 = weight(abstract_txt:robots in 5793) [ClassicSimilarity], result of:
            0.93346906 = score(doc=5793,freq=2.0), product of:
              0.64803445 = queryWeight, product of:
                4.6154685 = boost
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.017230801 = queryNorm
              1.4404621 = fieldWeight in 5793, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.125 = fieldNorm(doc=5793)
        0.12 = coord(3/25)
    
  3. Moya Anegón, F. de; López-Huertas, M.J.: ¬An automatic model for updating the conceptual structure of a scientific discipline (2000) 0.10
    0.098047175 = sum of:
      0.098047175 = product of:
        0.4085299 = sum of:
          0.015299609 = weight(abstract_txt:from in 2127) [ClassicSimilarity], result of:
            0.015299609 = score(doc=2127,freq=4.0), product of:
              0.050306708 = queryWeight, product of:
                1.0499873 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017230801 = queryNorm
              0.30412662 = fieldWeight in 2127, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2127)
          0.037014827 = weight(abstract_txt:applying in 2127) [ClassicSimilarity], result of:
            0.037014827 = score(doc=2127,freq=1.0), product of:
              0.11422631 = queryWeight, product of:
                1.1187658 = boost
                5.925452 = idf(docFreq=313, maxDocs=43254)
                0.017230801 = queryNorm
              0.32404816 = fieldWeight in 2127, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.925452 = idf(docFreq=313, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2127)
          0.016208772 = weight(abstract_txt:their in 2127) [ClassicSimilarity], result of:
            0.016208772 = score(doc=2127,freq=2.0), product of:
              0.065869205 = queryWeight, product of:
                1.2014683 = boost
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.017230801 = queryNorm
              0.24607511 = fieldWeight in 2127, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2127)
          0.019073991 = weight(abstract_txt:they in 2127) [ClassicSimilarity], result of:
            0.019073991 = score(doc=2127,freq=1.0), product of:
              0.09250248 = queryWeight, product of:
                1.4237962 = boost
                3.7705102 = idf(docFreq=2708, maxDocs=43254)
                0.017230801 = queryNorm
              0.20619978 = fieldWeight in 2127, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7705102 = idf(docFreq=2708, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2127)
          0.032155447 = weight(abstract_txt:classification in 2127) [ClassicSimilarity], result of:
            0.032155447 = score(doc=2127,freq=2.0), product of:
              0.1039965 = queryWeight, product of:
                1.5096647 = boost
                3.9979079 = idf(docFreq=2157, maxDocs=43254)
                0.017230801 = queryNorm
              0.3091974 = fieldWeight in 2127, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9979079 = idf(docFreq=2157, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2127)
          0.28877726 = weight(abstract_txt:robots in 2127) [ClassicSimilarity], result of:
            0.28877726 = score(doc=2127,freq=1.0), product of:
              0.64803445 = queryWeight, product of:
                4.6154685 = boost
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.017230801 = queryNorm
              0.44562024 = fieldWeight in 2127, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.0546875 = fieldNorm(doc=2127)
        0.24 = coord(6/25)
    
  4. Day, R.E.: Indexing it all : the subject in the age of documentation, information, and data (2014) 0.09
    0.09315822 = sum of:
      0.09315822 = product of:
        0.4657911 = sum of:
          0.015142685 = weight(abstract_txt:from in 4489) [ClassicSimilarity], result of:
            0.015142685 = score(doc=4489,freq=3.0), product of:
              0.050306708 = queryWeight, product of:
                1.0499873 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017230801 = queryNorm
              0.30100727 = fieldWeight in 4489, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.0625 = fieldNorm(doc=4489)
          0.038506046 = weight(abstract_txt:purposes in 4489) [ClassicSimilarity], result of:
            0.038506046 = score(doc=4489,freq=1.0), product of:
              0.10728533 = queryWeight, product of:
                1.0842421 = boost
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.017230801 = queryNorm
              0.3589125 = fieldWeight in 4489, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7426 = idf(docFreq=376, maxDocs=43254)
                0.0625 = fieldNorm(doc=4489)
          0.06901257 = weight(abstract_txt:modern in 4489) [ClassicSimilarity], result of:
            0.06901257 = score(doc=4489,freq=3.0), product of:
              0.10975713 = queryWeight, product of:
                1.0966612 = boost
                5.808377 = idf(docFreq=352, maxDocs=43254)
                0.017230801 = queryNorm
              0.62877524 = fieldWeight in 4489, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.808377 = idf(docFreq=352, maxDocs=43254)
                0.0625 = fieldNorm(doc=4489)
          0.0130986655 = weight(abstract_txt:their in 4489) [ClassicSimilarity], result of:
            0.0130986655 = score(doc=4489,freq=1.0), product of:
              0.065869205 = queryWeight, product of:
                1.2014683 = boost
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.017230801 = queryNorm
              0.19885872 = fieldWeight in 4489, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.0625 = fieldNorm(doc=4489)
          0.33003116 = weight(abstract_txt:robots in 4489) [ClassicSimilarity], result of:
            0.33003116 = score(doc=4489,freq=1.0), product of:
              0.64803445 = queryWeight, product of:
                4.6154685 = boost
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.017230801 = queryNorm
              0.50928026 = fieldWeight in 4489, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.0625 = fieldNorm(doc=4489)
        0.2 = coord(5/25)
    
  5. Hidalgo, C.: Why information grows : the evolution of order, from atoms to economies (2015) 0.09
    0.08859318 = sum of:
      0.08859318 = product of:
        0.44296587 = sum of:
          0.011357013 = weight(abstract_txt:from in 3619) [ClassicSimilarity], result of:
            0.011357013 = score(doc=3619,freq=3.0), product of:
              0.050306708 = queryWeight, product of:
                1.0499873 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.017230801 = queryNorm
              0.22575545 = fieldWeight in 3619, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.046875 = fieldNorm(doc=3619)
          0.031726994 = weight(abstract_txt:applying in 3619) [ClassicSimilarity], result of:
            0.031726994 = score(doc=3619,freq=1.0), product of:
              0.11422631 = queryWeight, product of:
                1.1187658 = boost
                5.925452 = idf(docFreq=313, maxDocs=43254)
                0.017230801 = queryNorm
              0.27775556 = fieldWeight in 3619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.925452 = idf(docFreq=313, maxDocs=43254)
                0.046875 = fieldNorm(doc=3619)
          0.013893234 = weight(abstract_txt:their in 3619) [ClassicSimilarity], result of:
            0.013893234 = score(doc=3619,freq=2.0), product of:
              0.065869205 = queryWeight, product of:
                1.2014683 = boost
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.017230801 = queryNorm
              0.21092153 = fieldWeight in 3619, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1817396 = idf(docFreq=4880, maxDocs=43254)
                0.046875 = fieldNorm(doc=3619)
          0.03593774 = weight(abstract_txt:present in 3619) [ClassicSimilarity], result of:
            0.03593774 = score(doc=3619,freq=2.0), product of:
              0.124121614 = queryWeight, product of:
                1.6492816 = boost
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.017230801 = queryNorm
              0.28953654 = fieldWeight in 3619, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.367643 = idf(docFreq=1490, maxDocs=43254)
                0.046875 = fieldNorm(doc=3619)
          0.3500509 = weight(abstract_txt:robots in 3619) [ClassicSimilarity], result of:
            0.3500509 = score(doc=3619,freq=2.0), product of:
              0.64803445 = queryWeight, product of:
                4.6154685 = boost
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.017230801 = queryNorm
              0.5401733 = fieldWeight in 3619, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.046875 = fieldNorm(doc=3619)
        0.2 = coord(5/25)
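
Note: the content scores combine several matched terms. The sketch below recomputes item 1 (doc=5607) under the same assumed classic TF-IDF formulas: each matched term contributes queryWeight * fieldWeight, and the sum is scaled by coord, the fraction of the 25 query terms that matched. Boosts, queryNorm and fieldNorm are copied from the listing; the names are illustrative and not Lucene API calls.

```python
import math

# Constants copied from the explain tree for doc=5607.
MAX_DOCS = 43254
QUERY_NORM = 0.017230801
FIELD_NORM = 0.125
QUERY_TERM_COUNT = 25

# (term, boost, docFreq, termFreq) for each matched abstract term.
MATCHED_TERMS = [
    ("from", 1.0499873, 7289, 3.0),
    ("server", 1.1377256, 283, 1.0),
    ("they", 1.4237962, 2708, 1.0),
    ("robots", 4.6154685, 33, 1.0),
]


def classic_idf(doc_freq, max_docs):
    # Assumed formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, term_freq):
    term_idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(term_freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


term_sum = sum(term_score(b, df, tf) for _, b, df, tf in MATCHED_TERMS)
coord = len(MATCHED_TERMS) / QUERY_TERM_COUNT
total = coord * term_sum
print(f"sum={term_sum:.6f} coord={coord:.2f} total={total:.6f}")
```

The rare term "robots" (docFreq=33) carries the largest boost and idf, so it dominates every content score in the listing, while common words such as "from" contribute very little.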