Document (#26670)

Author
Koch, T.
Ardö, A.
Noodén, L.
Title
¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1
Source
http://www.lub.lu.se/desire/DESIRE36a-WP1.html
Year
1999
Abstract
This working paper describes the creation of a test database on which to carry out the automatic classification tasks of DESIRE II work package D3.6a. The database is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcomes of two different approaches to collecting the documents. These two methods were selected from seven general methodologies for building robot-generated subject indices, which are presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and subsequently an even more surprisingly low overlap between the resources collected by the two approaches, in spite of using basically the same services as starting points for the harvesting process. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the two approaches was the coverage of the resulting database.
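The abstract does not say how overlap was measured. As a hedged illustration only, the sketch below compares two hypothetical collections of URLs after light normalization and reports the number of shared URLs, the Jaccard index, and the share of the smaller collection that also appears in the larger one. The normalization rules, both measures and the example URLs are assumptions for illustration, not the method or data of the working paper.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Hypothetical normalization: lower-case scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def overlap(a: list[str], b: list[str]) -> dict:
    """Compare two URL collections after normalization (assumed measures)."""
    set_a = {normalize(u) for u in a}
    set_b = {normalize(u) for u in b}
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "share_of_smaller": len(shared) / min(len(set_a), len(set_b)) if set_a and set_b else 0.0,
    }


# Hypothetical seed collections (not data from the paper).
seeds_a = ["http://Example.org/eng/", "http://example.org/mech", "http://example.net/civil"]
seeds_b = ["http://example.org/eng", "http://example.com/chem", "http://example.com/elec"]

print(overlap(seeds_a, seeds_b))
```

Normalizing before comparing matters because trivially different spellings of the same address (host case, trailing slash) would otherwise count as distinct resources and understate the overlap.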
Theme
Automatic classification
Internet
Object
DESIRE

Similar documents (author)

  1. Ardö, A.; Koch, T.: Lunds Universitets Elektroniska Bibliotek : Del.2: Gopher, World Wide Web (WWW). Planerade projekt (1993) 5.88
    5.8811736 = sum of:
      5.8811736 = sum of:
        1.9631602 = weight(author_txt:koch in 6001) [ClassicSimilarity], result of:
          1.9631602 = score(doc=6001,freq=1.0), product of:
            0.5335526 = queryWeight, product of:
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.07250513 = queryNorm
            3.6794126 = fieldWeight in 6001, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.5 = fieldNorm(doc=6001)
        3.9180133 = weight(author_txt:ardö in 6001) [ClassicSimilarity], result of:
          3.9180133 = score(doc=6001,freq=1.0), product of:
            0.8457669 = queryWeight, product of:
              1.2590319 = boost
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.07250513 = queryNorm
            4.632498 = fieldWeight in 6001, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.5 = fieldNorm(doc=6001)
    
  2. Ardö, A.; Koch, T.: Wide-area information server (WAIS) as the hub of an electronic library service at Lund University (1993) 5.88
    5.8811736 = sum of:
      5.8811736 = sum of:
        1.9631602 = weight(author_txt:koch in 459) [ClassicSimilarity], result of:
          1.9631602 = score(doc=459,freq=1.0), product of:
            0.5335526 = queryWeight, product of:
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.07250513 = queryNorm
            3.6794126 = fieldWeight in 459, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.5 = fieldNorm(doc=459)
        3.9180133 = weight(author_txt:ardö in 459) [ClassicSimilarity], result of:
          3.9180133 = score(doc=459,freq=1.0), product of:
            0.8457669 = queryWeight, product of:
              1.2590319 = boost
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.07250513 = queryNorm
            4.632498 = fieldWeight in 459, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.5 = fieldNorm(doc=459)
    
  3. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 5.88
    5.8811736 = sum of:
      5.8811736 = sum of:
        1.9631602 = weight(author_txt:koch in 1383) [ClassicSimilarity], result of:
          1.9631602 = score(doc=1383,freq=1.0), product of:
            0.5335526 = queryWeight, product of:
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.07250513 = queryNorm
            3.6794126 = fieldWeight in 1383, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.5 = fieldNorm(doc=1383)
        3.9180133 = weight(author_txt:ardö in 1383) [ClassicSimilarity], result of:
          3.9180133 = score(doc=1383,freq=1.0), product of:
            0.8457669 = queryWeight, product of:
              1.2590319 = boost
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.07250513 = queryNorm
            4.632498 = fieldWeight in 1383, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.5 = fieldNorm(doc=1383)
    
  4. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 5.88
    5.8811736 = sum of:
      5.8811736 = sum of:
        1.9631602 = weight(author_txt:koch in 2668) [ClassicSimilarity], result of:
          1.9631602 = score(doc=2668,freq=1.0), product of:
            0.5335526 = queryWeight, product of:
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.07250513 = queryNorm
            3.6794126 = fieldWeight in 2668, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.5 = fieldNorm(doc=2668)
        3.9180133 = weight(author_txt:ardö in 2668) [ClassicSimilarity], result of:
          3.9180133 = score(doc=2668,freq=1.0), product of:
            0.8457669 = queryWeight, product of:
              1.2590319 = boost
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.07250513 = queryNorm
            4.632498 = fieldWeight in 2668, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.5 = fieldNorm(doc=2668)
    
  5. Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 4.41
    4.41088 = sum of:
      4.41088 = sum of:
        1.47237 = weight(author_txt:koch in 2670) [ClassicSimilarity], result of:
          1.47237 = score(doc=2670,freq=1.0), product of:
            0.5335526 = queryWeight, product of:
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.07250513 = queryNorm
            2.7595594 = fieldWeight in 2670, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.358825 = idf(docFreq=73, maxDocs=42740)
              0.375 = fieldNorm(doc=2670)
        2.93851 = weight(author_txt:ardö in 2670) [ClassicSimilarity], result of:
          2.93851 = score(doc=2670,freq=1.0), product of:
            0.8457669 = queryWeight, product of:
              1.2590319 = boost
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.07250513 = queryNorm
            3.4743733 = fieldWeight in 2670, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.264996 = idf(docFreq=10, maxDocs=42740)
              0.375 = fieldNorm(doc=2670)
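The explain trees above follow Lucene's ClassicSimilarity (TF-IDF) formulas: each term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. The sketch below recomputes the 5.88 score of items 1 to 4 (shown for doc 6001) from docFreq, maxDocs, the boost and fieldNorm alone. It assumes idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), and that queryNorm is 1 / sqrt of the summed squares of (boost × idf) over just the two author terms koch and ardö. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

MAX_DOCS = 42740  # taken from the explain output


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float, query_norm: float) -> float:
    """Score of one term = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Query terms as (name, boost, docFreq); values copied from the explain trees.
author_terms = [("koch", 1.0, 73), ("ardö", 1.2590319, 10)]

# Assumption: the query contains only these two terms, so queryNorm is
# 1 / sqrt(sum of (boost * idf)^2) over them.
query_norm = 1.0 / math.sqrt(sum((b * idf(df)) ** 2 for _, b, df in author_terms))

# Each author name occurs once (freq=1.0) in the document, with fieldNorm 0.5.
scores = {name: term_score(b, df, 1.0, 0.5, query_norm) for name, b, df in author_terms}
total = sum(scores.values())

print(f"queryNorm = {query_norm:.6f}")
for name, s in scores.items():
    print(f"{name}: {s:.5f}")
print(f"total: {total:.5f}")
```

The rarer name (ardö, docFreq 10) contributes about twice as much as koch (docFreq 73), because idf enters each term score twice: once in queryWeight and once in fieldWeight.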
    
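Item 5 scores lower (4.41 instead of 5.88) even though it matches the same two author terms with the same frequency. The only difference in its explain tree is fieldNorm = 0.375 instead of 0.5, presumably because its author field is longer (three names instead of two), which ClassicSimilarity's length normalization penalizes. Because fieldNorm enters every fieldWeight as a plain multiplier, the whole score should scale by 0.375 / 0.5 = 0.75. The short check below uses only values copied from the explain output.

```python
# Term scores for items 1-4, where fieldNorm = 0.5 (copied from the explain output).
koch_full, ardo_full = 1.9631602, 3.9180133
NORM_ITEMS_1_TO_4 = 0.5
NORM_ITEM_5 = 0.375

# fieldWeight = tf * idf * fieldNorm, so each term score is linear in fieldNorm.
ratio = NORM_ITEM_5 / NORM_ITEMS_1_TO_4
koch_item5 = koch_full * ratio
ardo_item5 = ardo_full * ratio
total_item5 = koch_item5 + ardo_item5

print(f"ratio = {ratio}")
print(f"koch = {koch_item5:.5f}, ardö = {ardo_item5:.5f}, total = {total_item5:.5f}")
```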

Similar documents (content)

  1. Ardö, A.; Godby, J.; Houghton, A.; Koch, T.; Reighart, R.; Thompson, R.; Vizine-Goetz, D.: Browsing engineering resources on the Web : a general knowledge organization scheme (Dewey) vs. a special scheme (EI) (2000) 0.25
    0.2549624 = sum of:
      0.2549624 = product of:
        1.0623434 = sum of:
          0.0295696 = weight(abstract_txt:subject in 1087) [ClassicSimilarity], result of:
            0.0295696 = score(doc=1087,freq=1.0), product of:
              0.08066469 = queryWeight, product of:
                1.1304277 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.018249456 = queryNorm
              0.3665743 = fieldWeight in 1087, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.09375 = fieldNorm(doc=1087)
          0.048755195 = weight(abstract_txt:documents in 1087) [ClassicSimilarity], result of:
            0.048755195 = score(doc=1087,freq=2.0), product of:
              0.089356 = queryWeight, product of:
                1.1897697 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018249456 = queryNorm
              0.54562867 = fieldWeight in 1087, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.09375 = fieldNorm(doc=1087)
          0.08411938 = weight(abstract_txt:generated in 1087) [ClassicSimilarity], result of:
            0.08411938 = score(doc=1087,freq=1.0), product of:
              0.16195108 = queryWeight, product of:
                1.6017436 = boost
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.018249456 = queryNorm
              0.5194123 = fieldWeight in 1087, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.09375 = fieldNorm(doc=1087)
          0.21111375 = weight(abstract_txt:engineering in 1087) [ClassicSimilarity], result of:
            0.21111375 = score(doc=1087,freq=4.0), product of:
              0.18841298 = queryWeight, product of:
                1.7276528 = boost
                5.975915 = idf(docFreq=294, maxDocs=42740)
                0.018249456 = queryNorm
              1.1204841 = fieldWeight in 1087, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.975915 = idf(docFreq=294, maxDocs=42740)
                0.09375 = fieldNorm(doc=1087)
          0.19710547 = weight(abstract_txt:desire in 1087) [ClassicSimilarity], result of:
            0.19710547 = score(doc=1087,freq=1.0), product of:
              0.28570572 = queryWeight, product of:
                2.1274557 = boost
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.018249456 = queryNorm
              0.68988985 = fieldWeight in 1087, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.09375 = fieldNorm(doc=1087)
          0.49168 = weight(abstract_txt:robot in 1087) [ClassicSimilarity], result of:
            0.49168 = score(doc=1087,freq=1.0), product of:
              0.6015503 = queryWeight, product of:
                3.7807908 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.018249456 = queryNorm
              0.8173548 = fieldWeight in 1087, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.09375 = fieldNorm(doc=1087)
        0.24 = coord(6/25)
    
  2. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.19
    0.18501695 = sum of:
      0.18501695 = product of:
        0.9250847 = sum of:
          0.0295696 = weight(abstract_txt:subject in 2569) [ClassicSimilarity], result of:
            0.0295696 = score(doc=2569,freq=1.0), product of:
              0.08066469 = queryWeight, product of:
                1.1304277 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.018249456 = queryNorm
              0.3665743 = fieldWeight in 2569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.09375 = fieldNorm(doc=2569)
          0.14927995 = weight(abstract_txt:engineering in 2569) [ClassicSimilarity], result of:
            0.14927995 = score(doc=2569,freq=2.0), product of:
              0.18841298 = queryWeight, product of:
                1.7276528 = boost
                5.975915 = idf(docFreq=294, maxDocs=42740)
                0.018249456 = queryNorm
              0.79230183 = fieldWeight in 2569, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.975915 = idf(docFreq=294, maxDocs=42740)
                0.09375 = fieldNorm(doc=2569)
          0.05744965 = weight(abstract_txt:database in 2569) [ClassicSimilarity], result of:
            0.05744965 = score(doc=2569,freq=1.0), product of:
              0.14377227 = queryWeight, product of:
                1.8483503 = boost
                4.26227 = idf(docFreq=1636, maxDocs=42740)
                0.018249456 = queryNorm
              0.3995878 = fieldWeight in 2569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.26227 = idf(docFreq=1636, maxDocs=42740)
                0.09375 = fieldNorm(doc=2569)
          0.19710547 = weight(abstract_txt:desire in 2569) [ClassicSimilarity], result of:
            0.19710547 = score(doc=2569,freq=1.0), product of:
              0.28570572 = queryWeight, product of:
                2.1274557 = boost
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.018249456 = queryNorm
              0.68988985 = fieldWeight in 2569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.09375 = fieldNorm(doc=2569)
          0.49168 = weight(abstract_txt:robot in 2569) [ClassicSimilarity], result of:
            0.49168 = score(doc=2569,freq=1.0), product of:
              0.6015503 = queryWeight, product of:
                3.7807908 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.018249456 = queryNorm
              0.8173548 = fieldWeight in 2569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.09375 = fieldNorm(doc=2569)
        0.2 = coord(5/25)
    
  3. Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.15
    0.15166418 = sum of:
      0.15166418 = product of:
        0.9479012 = sum of:
          0.039426133 = weight(abstract_txt:subject in 89) [ClassicSimilarity], result of:
            0.039426133 = score(doc=89,freq=1.0), product of:
              0.08066469 = queryWeight, product of:
                1.1304277 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.018249456 = queryNorm
              0.48876572 = fieldWeight in 89, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.125 = fieldNorm(doc=89)
          0.11215917 = weight(abstract_txt:generated in 89) [ClassicSimilarity], result of:
            0.11215917 = score(doc=89,freq=1.0), product of:
              0.16195108 = queryWeight, product of:
                1.6017436 = boost
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.018249456 = queryNorm
              0.6925497 = fieldWeight in 89, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.125 = fieldNorm(doc=89)
          0.1407425 = weight(abstract_txt:engineering in 89) [ClassicSimilarity], result of:
            0.1407425 = score(doc=89,freq=1.0), product of:
              0.18841298 = queryWeight, product of:
                1.7276528 = boost
                5.975915 = idf(docFreq=294, maxDocs=42740)
                0.018249456 = queryNorm
              0.74698937 = fieldWeight in 89, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.975915 = idf(docFreq=294, maxDocs=42740)
                0.125 = fieldNorm(doc=89)
          0.65557337 = weight(abstract_txt:robot in 89) [ClassicSimilarity], result of:
            0.65557337 = score(doc=89,freq=1.0), product of:
              0.6015503 = queryWeight, product of:
                3.7807908 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.018249456 = queryNorm
              1.0898064 = fieldWeight in 89, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.125 = fieldNorm(doc=89)
        0.16 = coord(4/25)
    
  4. MacCain, K.W.: Descriptor and citation retrieval in the medical behavioral sciences literature : retrieval overlaps and novelty distribution (1989) 0.15
    0.15153207 = sum of:
      0.15153207 = product of:
        0.541186 = sum of:
          0.07574632 = weight(abstract_txt:percentage in 2290) [ClassicSimilarity], result of:
            0.07574632 = score(doc=2290,freq=1.0), product of:
              0.13535418 = queryWeight, product of:
                1.0354328 = boost
                7.1630807 = idf(docFreq=89, maxDocs=42740)
                0.018249456 = queryNorm
              0.5596157 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1630807 = idf(docFreq=89, maxDocs=42740)
                0.078125 = fieldNorm(doc=2290)
          0.024641333 = weight(abstract_txt:subject in 2290) [ClassicSimilarity], result of:
            0.024641333 = score(doc=2290,freq=1.0), product of:
              0.08066469 = queryWeight, product of:
                1.1304277 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.018249456 = queryNorm
              0.30547857 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.078125 = fieldNorm(doc=2290)
          0.028729271 = weight(abstract_txt:documents in 2290) [ClassicSimilarity], result of:
            0.028729271 = score(doc=2290,freq=1.0), product of:
              0.089356 = queryWeight, product of:
                1.1897697 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018249456 = queryNorm
              0.32151476 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=2290)
          0.037110582 = weight(abstract_txt:between in 2290) [ClassicSimilarity], result of:
            0.037110582 = score(doc=2290,freq=2.0), product of:
              0.0962926 = queryWeight, product of:
                1.5126663 = boost
                3.4881876 = idf(docFreq=3549, maxDocs=42740)
                0.018249456 = queryNorm
              0.38539392 = fieldWeight in 2290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4881876 = idf(docFreq=3549, maxDocs=42740)
                0.078125 = fieldNorm(doc=2290)
          0.031179942 = weight(abstract_txt:different in 2290) [ClassicSimilarity], result of:
            0.031179942 = score(doc=2290,freq=1.0), product of:
              0.108024254 = queryWeight, product of:
                1.6021655 = boost
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.018249456 = queryNorm
              0.28863835 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.078125 = fieldNorm(doc=2290)
          0.06770506 = weight(abstract_txt:database in 2290) [ClassicSimilarity], result of:
            0.06770506 = score(doc=2290,freq=2.0), product of:
              0.14377227 = queryWeight, product of:
                1.8483503 = boost
                4.26227 = idf(docFreq=1636, maxDocs=42740)
                0.018249456 = queryNorm
              0.47091874 = fieldWeight in 2290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.26227 = idf(docFreq=1636, maxDocs=42740)
                0.078125 = fieldNorm(doc=2290)
          0.27607343 = weight(abstract_txt:overlap in 2290) [ClassicSimilarity], result of:
            0.27607343 = score(doc=2290,freq=4.0), product of:
              0.25443122 = queryWeight, product of:
                2.007642 = boost
                6.9443917 = idf(docFreq=111, maxDocs=42740)
                0.018249456 = queryNorm
              1.0850612 = fieldWeight in 2290, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.9443917 = idf(docFreq=111, maxDocs=42740)
                0.078125 = fieldNorm(doc=2290)
        0.28 = coord(7/25)
    
  5. Kimmel, S.: WWW search tools in reference services (1997) 0.15
    0.14528856 = sum of:
      0.14528856 = product of:
        1.210738 = sum of:
          0.0591392 = weight(abstract_txt:subject in 1620) [ClassicSimilarity], result of:
            0.0591392 = score(doc=1620,freq=1.0), product of:
              0.08066469 = queryWeight, product of:
                1.1304277 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.018249456 = queryNorm
              0.7331486 = fieldWeight in 1620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.1875 = fieldNorm(doc=1620)
          0.16823876 = weight(abstract_txt:generated in 1620) [ClassicSimilarity], result of:
            0.16823876 = score(doc=1620,freq=1.0), product of:
              0.16195108 = queryWeight, product of:
                1.6017436 = boost
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.018249456 = queryNorm
              1.0388246 = fieldWeight in 1620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.1875 = fieldNorm(doc=1620)
          0.98336 = weight(abstract_txt:robot in 1620) [ClassicSimilarity], result of:
            0.98336 = score(doc=1620,freq=1.0), product of:
              0.6015503 = queryWeight, product of:
                3.7807908 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.018249456 = queryNorm
              1.6347096 = fieldWeight in 1620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.1875 = fieldNorm(doc=1620)
        0.12 = coord(3/25)
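The content-based scores add one more ClassicSimilarity factor, coord: the sum of matching term weights is multiplied by the fraction of the 25 query terms found in the document. The sketch below first recomputes one term weight ("engineering" in doc 1087, where freq = 4.0 gives tf = sqrt(4) = 2.0), then ranks the five documents with and without coord, using the sums and match counts copied from the explain trees. The queryNorm 0.018249456 is taken as given because the full 25-term query is not shown, and idf = 1 + ln(maxDocs / (docFreq + 1)) is assumed, as in ClassicSimilarity.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.018249456  # copied from the explain output; the full query is not shown
TOTAL_TERMS = 25          # the "/25" in every coord(n/25) entry


def idf(doc_freq: int) -> float:
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


# 1) One term weight: "engineering" in doc 1087
#    (boost 1.7276528, docFreq 294, freq 4.0, fieldNorm 0.09375).
term_idf = idf(294)
query_weight = 1.7276528 * term_idf * QUERY_NORM
field_weight = math.sqrt(4.0) * term_idf * 0.09375  # tf = sqrt(freq) = 2.0
engineering = query_weight * field_weight

# 2) Final scores: sum of term weights times coord(matched / TOTAL_TERMS).
#    (doc id, first author, sum of term weights, matched terms) from the explain trees.
results = [
    (1087, "Ardö", 1.0623434, 6),
    (2569, "Koch", 0.9250847, 5),
    (89, "Lindholm", 0.9479012, 4),
    (2290, "MacCain", 0.541186, 7),
    (1620, "Kimmel", 1.210738, 3),
]


def with_coord(row) -> float:
    _, _, term_sum, matched = row
    return term_sum * matched / TOTAL_TERMS


ranked = sorted(results, key=with_coord, reverse=True)
ranked_raw = sorted(results, key=lambda row: row[2], reverse=True)

print(f"engineering weight in doc 1087: {engineering:.6f}")
for row in ranked:
    print(f"{row[0]:>5} {row[1]:<9} {with_coord(row):.6f}")
print("ranking without coord:", [row[0] for row in ranked_raw])
```

Without coord, Kimmel (1997) would rank first on the strength of the rare term "robot" (weight 0.98336); because it matches only 3 of the 25 terms, coord(3/25) pushes it to last place, which is the order shown in the list.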