Search (2 results, page 1 of 1)

  • × author_ss:"Ardö, A."
  • × author_ss:"Golub, K."
  1. Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.02
    0.015148896 = product of:
      0.03787224 = sum of:
        0.027688364 = weight(_text_:on in 1461) [ClassicSimilarity], result of:
          0.027688364 = score(doc=1461,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.25253648 = fieldWeight in 1461, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1461)
        0.0101838745 = weight(_text_:information in 1461) [ClassicSimilarity], result of:
          0.0101838745 = score(doc=1461,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.116372846 = fieldWeight in 1461, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1461)
      0.4 = coord(2/5)
    
    Abstract
    Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to rapid increase of digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents - instead it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against title and abstract of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and en- richment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.
  2. Koch, T.; Golub, K.; Ardö, A.: Users browsing behaviour in a DDC-based Web service : a log analysis (2006) 0.00
    0.0031971764 = product of:
      0.015985882 = sum of:
        0.015985882 = weight(_text_:on in 2234) [ClassicSimilarity], result of:
          0.015985882 = score(doc=2234,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.14580199 = fieldWeight in 2234, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=2234)
      0.2 = coord(1/5)
    
    Abstract
    This study explores the navigation behaviour of all users of a large web service, Renardus, using web log analysis. Renardus provides integrated searching and browsing access to quality-controlled web resources from major individual subject gateway services. The main navigation feature is subject browsing through the Dewey Decimal Classification (DDC) based on mapping of classes of resources from the distributed gateways to the DDC structure. Among the more surprising results are the hugely dominant share of browsing activities, the good use of browsing support features like the graphical fish-eye overviews, rather long and varied navigation sequences, as well as extensive hierarchical directory-style browsing through the large DDC system.