Document (#40896)

Author
Neumann, M.
Steinberg, J.
Schaer, P.
Title
Web-scraping for non-programmers : introducing OXPath for digital library metadata harvesting
Source
Code4Lib journal. Issue 38(2017), [http://journal.code4lib.org]
Year
2017
Abstract
Building up new collections for digital libraries is a demanding task. Available data sets have to be extracted, which is usually done with the help of software developers because it involves custom data handlers or conversion scripts. In cases where the desired data is only available on the data provider's website, custom web scrapers are needed. This may be the case for small to medium-size publishers, research institutes or funding agencies. Data curation is typically done by people with a library and information science background, who are usually proficient with XML technologies but are not full-stack programmers. Therefore, we would like to present a web scraping tool that does not require digital library curators to program custom web scrapers from scratch. We present the open-source tool OXPath, an extension of XPath, that allows the user to define data to be extracted from websites in a declarative way. Taking one of our own use cases as an example, we guide you in more detail through the process of creating an OXPath wrapper for metadata harvesting. We also point out some practical things to consider when creating a web scraper (with OXPath). In addition, we present a syntax highlighting plugin for the popular text editor Atom that we developed to further support OXPath users and to simplify the authoring process.
Content
Cf.: http://journal.code4lib.org/articles/13007.
Theme
Metadata

Similar documents (author)

  1. Schaer, P.: Integration von Open-Access-Repositorien in Fachportale (2010) 2.12
    2.117316 = sum of:
      2.117316 = product of:
        4.234632 = sum of:
          4.234632 = weight(author_txt:schaer in 2320) [ClassicSimilarity], result of:
            4.234632 = score(doc=2320,freq=1.0), product of:
              0.75920945 = queryWeight, product of:
                1.0800443 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.0787673 = queryNorm
              5.5776863 = fieldWeight in 2320, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.625 = fieldNorm(doc=2320)
        0.5 = coord(1/2)
    
  2. Schaer, P.: Sprachmodelle und neuronale Netze im Information Retrieval (2023) 2.12
    2.117316 = sum of:
      2.117316 = product of:
        4.234632 = sum of:
          4.234632 = weight(author_txt:schaer in 799) [ClassicSimilarity], result of:
            4.234632 = score(doc=799,freq=1.0), product of:
              0.75920945 = queryWeight, product of:
                1.0800443 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.0787673 = queryNorm
              5.5776863 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.625 = fieldNorm(doc=799)
        0.5 = coord(1/2)
    
  3. Munkelt, J.; Schaer, P.: Towards an IR test collection for the German National Library (2018) 1.69
    1.6938529 = sum of:
      1.6938529 = product of:
        3.3877058 = sum of:
          3.3877058 = weight(author_txt:schaer in 5780) [ClassicSimilarity], result of:
            3.3877058 = score(doc=5780,freq=1.0), product of:
              0.75920945 = queryWeight, product of:
                1.0800443 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.0787673 = queryNorm
              4.462149 = fieldWeight in 5780, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.5 = fieldNorm(doc=5780)
        0.5 = coord(1/2)
    
  4. Neumann, M.: HAL: Hyperspace Analogue to Language (2012) 1.68
    1.680587 = sum of:
      1.680587 = product of:
        3.361174 = sum of:
          3.361174 = weight(author_txt:neumann in 966) [ClassicSimilarity], result of:
            3.361174 = score(doc=966,freq=1.0), product of:
              0.65084636 = queryWeight, product of:
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0787673 = queryNorm
              5.164313 = fieldWeight in 966, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.625 = fieldNorm(doc=966)
        0.5 = coord(1/2)
    
  5. Neumann, G.: Studienanleitung für das Lehrgebiet alphabetische Katalogisierung (1986) 1.68
    1.680587 = sum of:
      1.680587 = product of:
        3.361174 = sum of:
          3.361174 = weight(author_txt:neumann in 6011) [ClassicSimilarity], result of:
            3.361174 = score(doc=6011,freq=1.0), product of:
              0.65084636 = queryWeight, product of:
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0787673 = queryNorm
              5.164313 = fieldWeight in 6011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.625 = fieldNorm(doc=6011)
        0.5 = coord(1/2)
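
The author scores above can be reproduced from the numbers in the listing. The following is a minimal sketch, assuming the usual ClassicSimilarity conventions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the values shown here. The function names are illustrative and not part of any Lucene API; queryNorm, fieldNorm and boost are copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Hit 1: author_txt:schaer in doc 2320, with coord(1/2) = 0.5
schaer_2320 = 0.5 * term_score(
    freq=1.0, doc_freq=15, max_docs=44218,
    query_norm=0.0787673, field_norm=0.625, boost=1.0800443,
)

# Hit 4: author_txt:neumann in doc 966 (no boost line in the listing)
neumann_966 = 0.5 * term_score(
    freq=1.0, doc_freq=30, max_docs=44218,
    query_norm=0.0787673, field_norm=0.625,
)

print(round(schaer_2320, 4), round(neumann_966, 4))
```

Hit 3 scores lower than hits 1 and 2 for the same term only because its fieldNorm is 0.5 instead of 0.625.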
    

Similar documents (content)

  1. Harlow, C.: Data munging tools in Preparation for RDF : Catmandu and LODRefine (2015) 0.16
    0.16191551 = sum of:
      0.16191551 = product of:
        0.505986 = sum of:
          0.02527125 = weight(abstract_txt:library in 2277) [ClassicSimilarity], result of:
            0.02527125 = score(doc=2277,freq=3.0), product of:
              0.07325586 = queryWeight, product of:
                1.0987227 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.020922353 = queryNorm
              0.3449724 = fieldWeight in 2277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.060547564 = weight(abstract_txt:metadata in 2277) [ClassicSimilarity], result of:
            0.060547564 = score(doc=2277,freq=3.0), product of:
              0.114584334 = queryWeight, product of:
                1.1219769 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.020922353 = queryNorm
              0.5284105 = fieldWeight in 2277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.05160702 = weight(abstract_txt:tool in 2277) [ClassicSimilarity], result of:
            0.05160702 = score(doc=2277,freq=2.0), product of:
              0.11791355 = queryWeight, product of:
                1.1381595 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.020922353 = queryNorm
              0.43766826 = fieldWeight in 2277, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.01626355 = weight(abstract_txt:with in 2277) [ClassicSimilarity], result of:
            0.01626355 = score(doc=2277,freq=3.0), product of:
              0.06010091 = queryWeight, product of:
                1.1491503 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020922353 = queryNorm
              0.27060407 = fieldWeight in 2277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.05454017 = weight(abstract_txt:cases in 2277) [ClassicSimilarity], result of:
            0.05454017 = score(doc=2277,freq=1.0), product of:
              0.15413888 = queryWeight, product of:
                1.3012998 = boost
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.020922353 = queryNorm
              0.35383785 = fieldWeight in 2277, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.014136929 = weight(abstract_txt:that in 2277) [ClassicSimilarity], result of:
            0.014136929 = score(doc=2277,freq=2.0), product of:
              0.06750064 = queryWeight, product of:
                1.3615866 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020922353 = queryNorm
              0.20943399 = fieldWeight in 2277, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.18315789 = weight(abstract_txt:programmers in 2277) [ClassicSimilarity], result of:
            0.18315789 = score(doc=2277,freq=1.0), product of:
              0.345662 = queryWeight, product of:
                1.9487095 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.020922353 = queryNorm
              0.5298757 = fieldWeight in 2277, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.1004616 = weight(abstract_txt:data in 2277) [ClassicSimilarity], result of:
            0.1004616 = score(doc=2277,freq=9.0), product of:
              0.16059333 = queryWeight, product of:
                2.3006241 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020922353 = queryNorm
              0.62556523 = fieldWeight in 2277, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
        0.32 = coord(8/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.15
    0.14839374 = sum of:
      0.14839374 = product of:
        0.9274609 = sum of:
          0.01643209 = weight(abstract_txt:with in 2585) [ClassicSimilarity], result of:
            0.01643209 = score(doc=2585,freq=1.0), product of:
              0.06010091 = queryWeight, product of:
                1.1491503 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020922353 = queryNorm
              0.27340835 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.017493557 = weight(abstract_txt:that in 2585) [ClassicSimilarity], result of:
            0.017493557 = score(doc=2585,freq=1.0), product of:
              0.06750064 = queryWeight, product of:
                1.3615866 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020922353 = queryNorm
              0.25916135 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.3205263 = weight(abstract_txt:programmers in 2585) [ClassicSimilarity], result of:
            0.3205263 = score(doc=2585,freq=1.0), product of:
              0.345662 = queryWeight, product of:
                1.9487095 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.020922353 = queryNorm
              0.92728245 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.57300895 = weight(abstract_txt:custom in 2585) [ClassicSimilarity], result of:
            0.57300895 = score(doc=2585,freq=2.0), product of:
              0.46259815 = queryWeight, product of:
                2.7610157 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.020922353 = queryNorm
              1.2386755 = fieldWeight in 2585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
        0.16 = coord(4/25)
    
  3. Güven, S.; Feiner, S.: A hypermedia authoring tool for augmented and virtual reality (2003) 0.14
    0.13570262 = sum of:
      0.13570262 = product of:
        0.48465225 = sum of:
          0.044508506 = weight(abstract_txt:task in 5935) [ClassicSimilarity], result of:
            0.044508506 = score(doc=5935,freq=1.0), product of:
              0.11599961 = queryWeight, product of:
                1.1288846 = boost
                4.9112997 = idf(docFreq=884, maxDocs=44218)
                0.020922353 = queryNorm
              0.3836953 = fieldWeight in 5935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9112997 = idf(docFreq=884, maxDocs=44218)
                0.078125 = fieldNorm(doc=5935)
          0.04561459 = weight(abstract_txt:tool in 5935) [ClassicSimilarity], result of:
            0.04561459 = score(doc=5935,freq=1.0), product of:
              0.11791355 = queryWeight, product of:
                1.1381595 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.020922353 = queryNorm
              0.38684773 = fieldWeight in 5935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.078125 = fieldNorm(doc=5935)
          0.016598918 = weight(abstract_txt:with in 5935) [ClassicSimilarity], result of:
            0.016598918 = score(doc=5935,freq=2.0), product of:
              0.06010091 = queryWeight, product of:
                1.1491503 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020922353 = queryNorm
              0.27618414 = fieldWeight in 5935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=5935)
          0.09013252 = weight(abstract_txt:creating in 5935) [ClassicSimilarity], result of:
            0.09013252 = score(doc=5935,freq=2.0), product of:
              0.1473688 = queryWeight, product of:
                1.2724012 = boost
                5.53568 = idf(docFreq=473, maxDocs=44218)
                0.020922353 = queryNorm
              0.61161196 = fieldWeight in 5935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.53568 = idf(docFreq=473, maxDocs=44218)
                0.078125 = fieldNorm(doc=5935)
          0.0124953985 = weight(abstract_txt:that in 5935) [ClassicSimilarity], result of:
            0.0124953985 = score(doc=5935,freq=1.0), product of:
              0.06750064 = queryWeight, product of:
                1.3615866 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020922353 = queryNorm
              0.18511525 = fieldWeight in 5935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=5935)
          0.04635495 = weight(abstract_txt:present in 5935) [ClassicSimilarity], result of:
            0.04635495 = score(doc=5935,freq=1.0), product of:
              0.13643391 = queryWeight, product of:
                1.4994365 = boost
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.020922353 = queryNorm
              0.3397612 = fieldWeight in 5935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.078125 = fieldNorm(doc=5935)
          0.22894737 = weight(abstract_txt:programmers in 5935) [ClassicSimilarity], result of:
            0.22894737 = score(doc=5935,freq=1.0), product of:
              0.345662 = queryWeight, product of:
                1.9487095 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.020922353 = queryNorm
              0.66234463 = fieldWeight in 5935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.078125 = fieldNorm(doc=5935)
        0.28 = coord(7/25)
    
  4. Foulonneau, M.: Information redundancy across metadata collections (2007) 0.13
    0.13324486 = sum of:
      0.13324486 = product of:
        0.41639018 = sum of:
          0.02063389 = weight(abstract_txt:library in 915) [ClassicSimilarity], result of:
            0.02063389 = score(doc=915,freq=2.0), product of:
              0.07325586 = queryWeight, product of:
                1.0987227 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.020922353 = queryNorm
              0.28166878 = fieldWeight in 915, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
          0.115939766 = weight(abstract_txt:metadata in 915) [ClassicSimilarity], result of:
            0.115939766 = score(doc=915,freq=11.0), product of:
              0.114584334 = queryWeight, product of:
                1.1219769 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.020922353 = queryNorm
              1.0118291 = fieldWeight in 915, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
          0.009389766 = weight(abstract_txt:with in 915) [ClassicSimilarity], result of:
            0.009389766 = score(doc=915,freq=1.0), product of:
              0.06010091 = queryWeight, product of:
                1.1491503 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020922353 = queryNorm
              0.15623334 = fieldWeight in 915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
          0.014136929 = weight(abstract_txt:that in 915) [ClassicSimilarity], result of:
            0.014136929 = score(doc=915,freq=2.0), product of:
              0.06750064 = queryWeight, product of:
                1.3615866 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020922353 = queryNorm
              0.20943399 = fieldWeight in 915, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
          0.05186903 = weight(abstract_txt:digital in 915) [ClassicSimilarity], result of:
            0.05186903 = score(doc=915,freq=2.0), product of:
              0.1354338 = queryWeight, product of:
                1.4939307 = boost
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.020922353 = queryNorm
              0.3829844 = fieldWeight in 915, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
          0.03708396 = weight(abstract_txt:present in 915) [ClassicSimilarity], result of:
            0.03708396 = score(doc=915,freq=1.0), product of:
              0.13643391 = queryWeight, product of:
                1.4994365 = boost
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.020922353 = queryNorm
              0.27180895 = fieldWeight in 915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
          0.13384962 = weight(abstract_txt:harvesting in 915) [ClassicSimilarity], result of:
            0.13384962 = score(doc=915,freq=1.0), product of:
              0.28044388 = queryWeight, product of:
                1.7552714 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.020922353 = queryNorm
              0.47727776 = fieldWeight in 915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
          0.0334872 = weight(abstract_txt:data in 915) [ClassicSimilarity], result of:
            0.0334872 = score(doc=915,freq=1.0), product of:
              0.16059333 = queryWeight, product of:
                2.3006241 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020922353 = queryNorm
              0.20852174 = fieldWeight in 915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=915)
        0.32 = coord(8/25)
    
  5. Shreeves, S.L.; Kaczmarek, J.S.; Cole, T.W.: Harvesting cultural heritage metadata using OAI Protocol (2003) 0.12
    0.12106367 = sum of:
      0.12106367 = product of:
        0.50443196 = sum of:
          0.02579236 = weight(abstract_txt:library in 4775) [ClassicSimilarity], result of:
            0.02579236 = score(doc=4775,freq=2.0), product of:
              0.07325586 = queryWeight, product of:
                1.0987227 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.020922353 = queryNorm
              0.35208598 = fieldWeight in 4775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.078125 = fieldNorm(doc=4775)
          0.1235922 = weight(abstract_txt:metadata in 4775) [ClassicSimilarity], result of:
            0.1235922 = score(doc=4775,freq=8.0), product of:
              0.114584334 = queryWeight, product of:
                1.1219769 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.020922353 = queryNorm
              1.0786134 = fieldWeight in 4775, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.078125 = fieldNorm(doc=4775)
          0.011737207 = weight(abstract_txt:with in 4775) [ClassicSimilarity], result of:
            0.011737207 = score(doc=4775,freq=1.0), product of:
              0.06010091 = queryWeight, product of:
                1.1491503 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020922353 = queryNorm
              0.19529167 = fieldWeight in 4775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=4775)
          0.064836286 = weight(abstract_txt:digital in 4775) [ClassicSimilarity], result of:
            0.064836286 = score(doc=4775,freq=2.0), product of:
              0.1354338 = queryWeight, product of:
                1.4939307 = boost
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.020922353 = queryNorm
              0.4787305 = fieldWeight in 4775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.078125 = fieldNorm(doc=4775)
          0.23661493 = weight(abstract_txt:harvesting in 4775) [ClassicSimilarity], result of:
            0.23661493 = score(doc=4775,freq=2.0), product of:
              0.28044388 = queryWeight, product of:
                1.7552714 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.020922353 = queryNorm
              0.8437158 = fieldWeight in 4775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.078125 = fieldNorm(doc=4775)
          0.041859 = weight(abstract_txt:data in 4775) [ClassicSimilarity], result of:
            0.041859 = score(doc=4775,freq=1.0), product of:
              0.16059333 = queryWeight, product of:
                2.3006241 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020922353 = queryNorm
              0.26065218 = fieldWeight in 4775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=4775)
        0.24 = coord(6/25)
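
For the content-based hits, the final score is the sum of the per-term weights multiplied by the coordination factor coord(matched/total), where 25 is the number of query terms shown in the listing. The sketch below only illustrates that last step. The helper name is hypothetical, and the per-term weights are copied from the explanations above rather than recomputed.

```python
def combined_score(term_weights, query_terms):
    """Sum the matched term weights and scale by coord = matched / total."""
    coord = len(term_weights) / query_terms
    return coord * sum(term_weights.values())


# Hit 5 (doc 4775): 6 of 25 query terms matched
shreeves_4775 = combined_score(
    {
        "library": 0.02579236,
        "metadata": 0.1235922,
        "with": 0.011737207,
        "digital": 0.064836286,
        "harvesting": 0.23661493,
        "data": 0.041859,
    },
    query_terms=25,
)

# Hit 2 (doc 2585): 4 of 25 query terms matched
fox_2585 = combined_score(
    {
        "with": 0.01643209,
        "that": 0.017493557,
        "programmers": 0.3205263,
        "custom": 0.57300895,
    },
    query_terms=25,
)

print(round(shreeves_4775, 4), round(fox_2585, 4))
```

The coord factor explains why hit 2 ranks below hit 1 despite a larger raw sum (0.927 against 0.506): it matches only 4 of the 25 query terms, while hit 1 matches 8.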