Search (3 results, page 1 of 1)

  • theme_ss:"Internet"
  • author_ss:"Koch, T."
  1. Koch, T.; Ardö, A.; Noodén, L.: The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1 (1999) 0.03
    0.034764133 = product of:
      0.06952827 = sum of:
        0.06952827 = product of:
          0.13905653 = sum of:
            0.13905653 = weight(_text_:ii in 1668) [ClassicSimilarity], result of:
              0.13905653 = score(doc=1668,freq=4.0), product of:
                0.2745971 = queryWeight, product of:
                  5.4016213 = idf(docFreq=541, maxDocs=44218)
                  0.050836053 = queryNorm
                0.506402 = fieldWeight in 1668, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.4016213 = idf(docFreq=541, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1668)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This working paper describes the creation of a test database on which to carry out the automatic classification tasks of DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven different general methodologies for building robot-generated subject indices, all presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and subsequently an even more surprisingly low overlap between the resources collected by the two approaches. This was in spite of using basically the same services as starting points for the harvesting process. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the approaches was the coverage of the resulting database.
  2. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 0.03
    0.03277594 = product of:
      0.06555188 = sum of:
        0.06555188 = product of:
          0.13110375 = sum of:
            0.13110375 = weight(_text_:ii in 1667) [ClassicSimilarity], result of:
              0.13110375 = score(doc=1667,freq=2.0), product of:
                0.2745971 = queryWeight, product of:
                  5.4016213 = idf(docFreq=541, maxDocs=44218)
                  0.050836053 = queryNorm
                0.4774404 = fieldWeight in 1667, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4016213 = idf(docFreq=541, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1667)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  3. Ardö, A.; Godby, J.; Houghton, A.; Koch, T.; Reighart, R.; Thompson, R.; Vizine-Goetz, D.: Browsing engineering resources on the Web : a general knowledge organization scheme (Dewey) vs. a special scheme (EI) (2000) 0.02
    0.024581954 = product of:
      0.049163908 = sum of:
        0.049163908 = product of:
          0.098327816 = sum of:
            0.098327816 = weight(_text_:ii in 86) [ClassicSimilarity], result of:
              0.098327816 = score(doc=86,freq=2.0), product of:
                0.2745971 = queryWeight, product of:
                  5.4016213 = idf(docFreq=541, maxDocs=44218)
                  0.050836053 = queryNorm
                0.3580803 = fieldWeight in 86, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4016213 = idf(docFreq=541, maxDocs=44218)
                  0.046875 = fieldNorm(doc=86)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Under the auspices of the Desire II project, researchers at NetLab and OCLC are providing searching and browsing of a test collection of engineering documents on the Web. The goal of the project is to explore simple methods of automatic classification to provide subject browsing of a robot-generated engineering index. At NetLab the documents are automatically classified and organized using an engineering-specific scheme, the Engineering Index (Ei) Thesaurus and Classification; at OCLC the Dewey Decimal Classification (DDC), a general knowledge organization scheme, is being used.
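
The score breakdowns listed under each result can be checked by hand. The following is a minimal sketch, assuming the usual TF-IDF formulas of Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names `classic_idf` and `classic_score` are illustrative only and are not part of any library; the input values are copied from the breakdowns above. Small differences in the last digits are to be expected, since the listed values appear to be single-precision floats.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming the ClassicSimilarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute one term's contribution as laid out in the breakdowns.

    score = queryWeight * fieldWeight, then multiplied by each coord factor.
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    score = query_weight * field_weight
    for coord in coords:                               # e.g. coord(1/2) twice
        score *= coord
    return score


# Values taken from result 1 (doc=1668) and result 2 (doc=1667).
result_1 = classic_score(4.0, 541, 44218, 0.050836053, 0.046875, (0.5, 0.5))
result_2 = classic_score(2.0, 541, 44218, 0.050836053, 0.0625, (0.5, 0.5))
print(f"result 1: {result_1:.6f}")
print(f"result 2: {result_2:.6f}")
```

Result 1 ranks above result 2 even though its field is longer (a smaller fieldNorm), because the term "ii" occurs four times rather than twice; result 3 has the same term frequency as result 2 but a smaller fieldNorm, which is why it ranks last.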