Search (6 results, page 1 of 1)

Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.03
```
0.026122017 = product of:
  0.13061008 = sum of:
    0.13061008 = weight(_text_:engineering in 1461) [ClassicSimilarity], result of:
      0.13061008 = score(doc=1461,freq=6.0), product of:
        0.21172935 = queryWeight, product of:
          5.372528 = idf(docFreq=557, maxDocs=44218)
          0.03940963 = queryNorm
        0.6168728 = fieldWeight in 1461, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          5.372528 = idf(docFreq=557, maxDocs=44218)
          0.046875 = fieldNorm(doc=1461)
  0.2 = coord(1/5)
```
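The explanation above is Lucene's ClassicSimilarity (TF-IDF) breakdown. The sketch below recomputes it in Python, assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). `queryNorm` and `fieldNorm` are copied from the explanation rather than derived, because they depend on the full query and on the indexed field length, and this page shows neither. The function names are illustrative, not Lucene's API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def coord(overlap: int, max_overlap: int) -> float:
    # Fraction of the top-level query clauses that matched the document
    return overlap / max_overlap


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float, boost: float = 1.0) -> float:
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * boost * query_norm             # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight in ..., product of"
    return query_weight * field_weight                  # "score(doc=..., freq=...)"


if __name__ == "__main__":
    # queryNorm and fieldNorm are copied from the explanation for doc 1461.
    weight = term_score(freq=6.0, doc_freq=557, max_docs=44218,
                        query_norm=0.03940963, field_norm=0.046875)
    print(f"term weight: {weight:.6f}")
    print(f"final score: {weight * coord(1, 5):.6f}")
```

With `freq=4.0` and `field_norm=0.0390625`, the same function reproduces the 0.08886889 weight of doc 3614. The last digits may differ slightly because Lucene computes in 32-bit floats.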
Abstract

Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to the rapid increase in digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the training documents. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents; instead, it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against the titles and abstracts of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision for individual classes reaches up to 98%. These results are comparable to those of state-of-the-art machine-learning algorithms.
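The abstract describes the approach only in outline. The following is a minimal Python sketch of controlled-vocabulary string matching that uses term weighting, a cut-off, and an exclusion list. The vocabulary, class codes (`CLS-A` etc.), weights, and cut-off are hypothetical placeholders. They are not the Ei thesaurus and not the parameters used in the study. A set-based precision/recall helper is included to show how the two reported measures relate to assigned and correct classes. The exact averaging used in the paper is not stated here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical controlled vocabulary (term -> class code); NOT real Ei terms or codes.
VOCAB: Dict[str, str] = {
    "neural network": "CLS-A",
    "backpropagation": "CLS-A",
    "heat exchanger": "CLS-B",
    "thermal conductivity": "CLS-B",
    "system": "CLS-C",
}
EXCLUDED = {"system"}   # assumed exclusion list: overly general terms that hurt precision
TITLE_WEIGHT = 2.0      # assumption: a title match counts twice as much as an abstract match
ABSTRACT_WEIGHT = 1.0
CUTOFF = 0.5            # assumption: keep classes scoring at least 50% of the best class


def count_term(term: str, text: str) -> int:
    # Case-insensitive whole-word string match of a vocabulary term.
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE))


def classify(title: str, abstract: str) -> List[str]:
    scores: Dict[str, float] = defaultdict(float)
    for term, cls in VOCAB.items():
        if term in EXCLUDED:
            continue
        hits = (TITLE_WEIGHT * count_term(term, title)
                + ABSTRACT_WEIGHT * count_term(term, abstract))
        if hits:
            # Assumption: multi-word terms are more specific, so weight by word count.
            scores[cls] += hits * len(term.split())
    if not scores:
        return []
    best = max(scores.values())
    kept = [c for c, s in scores.items() if s >= CUTOFF * best]
    return sorted(kept, key=lambda c: -scores[c])


def precision_recall(assigned: List[str], correct: List[str]) -> Tuple[float, float]:
    # Set-based precision and recall for one document's assigned classes.
    hits = len(set(assigned) & set(correct))
    precision = hits / len(set(assigned)) if assigned else 0.0
    recall = hits / len(set(correct)) if correct else 0.0
    return precision, recall


classes = classify("Training a neural network",
                   "We study backpropagation in a system with a heat exchanger.")
print(classes, precision_recall(classes, ["CLS-A", "CLS-B"]))
```

On the sample input, `CLS-B` scores 2.0 against a best score of 5.0 and falls below the cut-off. Only `CLS-A` is assigned, which gives precision 1.0 and recall 0.5 against the two-class gold set. Lowering `CUTOFF` or enlarging `VOCAB` tends to raise recall. Extending `EXCLUDED` or raising `CUTOFF` tends to raise precision.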
Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 0.02
```
0.021328537 = product of:
  0.10664268 = sum of:
    0.10664268 = weight(_text_:engineering in 5897) [ClassicSimilarity], result of:
      0.10664268 = score(doc=5897,freq=4.0), product of:
        0.21172935 = queryWeight, product of:
          5.372528 = idf(docFreq=557, maxDocs=44218)
          0.03940963 = queryNorm
        0.5036745 = fieldWeight in 5897, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.372528 = idf(docFreq=557, maxDocs=44218)
          0.046875 = fieldNorm(doc=5897)
  0.2 = coord(1/5)
```
Abstract

The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.02
```
0.01777378 = product of:
  0.08886889 = sum of:
    0.08886889 = weight(_text_:engineering in 3614) [ClassicSimilarity], result of:
      0.08886889 = score(doc=3614,freq=4.0), product of:
        0.21172935 = queryWeight, product of:
          5.372528 = idf(docFreq=557, maxDocs=44218)
          0.03940963 = queryNorm
        0.41972876 = fieldWeight in 3614, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.372528 = idf(docFreq=557, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3614)
  0.2 = coord(1/5)
```
Abstract

Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing and, if it proves useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme.

Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine its suitability for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes.

Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success proved to be correlated with, and dependent on, classification correctness.

Research limitations/implications - Further research should address the problem of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation.

Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searches for words from class captions with synonym search (easily provided for Ei, since the classes are mapped to thesaurus terms); and, when searching for class captions, returning the hierarchical tree expanded around the class whose caption contains the search term. The need for improvements to classification schemes was also indicated.

Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
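The last two practical implications describe concrete search behaviour. The following is a minimal sketch under stated assumptions. A hypothetical hierarchy (`CLASSES`) and synonym list (`SYNONYMS`) stand in for the Ei scheme and its thesaurus mapping. A caption search is expanded with synonyms. Each hit is then returned together with its ancestors and direct subclasses, which is one way to read "the hierarchical tree expanded around the class".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment of a classification hierarchy: code -> (caption, parent code).
# These codes and captions are placeholders, not actual Ei classes.
CLASSES: Dict[str, Tuple[str, Optional[str]]] = {
    "X": ("Engineering, general", None),
    "X1": ("Electrical engineering", "X"),
    "X1.1": ("Power electronics", "X1"),
    "X1.2": ("Signal processing", "X1"),
    "X2": ("Civil engineering", "X"),
}

# Hypothetical synonym ring, standing in for thesaurus terms mapped to classes.
SYNONYMS: Dict[str, List[str]] = {"filtering": ["signal processing"]}


def search_captions(word: str) -> List[str]:
    """Return codes of classes whose caption contains the word or one of its synonyms."""
    needles = [word.lower()] + SYNONYMS.get(word.lower(), [])
    return [code for code, (caption, _) in CLASSES.items()
            if any(n in caption.lower() for n in needles)]


def expanded_tree(code: str) -> Tuple[List[str], List[str]]:
    """Return the path from the root down to `code`, plus the direct subclasses of `code`."""
    path: List[str] = []
    node: Optional[str] = code
    while node is not None:
        path.append(node)
        node = CLASSES[node][1]
    path.reverse()
    children = [c for c, (_, parent) in CLASSES.items() if parent == code]
    return path, children


for hit in search_captions("filtering"):
    print(hit, expanded_tree(hit))
```

Substring matching on captions is a simplification. A real system would more likely match on tokenized captions and on the thesaurus relations themselves.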

Object

Engineering Index Classification

Golub, K.; Tudhope, D.; Zeng, M.L.; Zumer, M.: Terminology registries for knowledge organization systems : functionality, use, and attributes (2014) 0.00

```
0.0021357841 = product of:
  0.010678921 = sum of:
    0.010678921 = product of:
      0.032036763 = sum of:
        0.032036763 = weight(_text_:22 in 1347) [ClassicSimilarity], result of:
          0.032036763 = score(doc=1347,freq=2.0), product of:
            0.13800581 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03940963 = queryNorm
            0.23214069 = fieldWeight in 1347, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1347)
      0.33333334 = coord(1/3)
  0.2 = coord(1/5)
```

Date: 22. 8.2014 17:12:54
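This explanation differs from the first three. The matching term sits inside a nested clause group, so two coordination factors apply: coord(1/3) inside and coord(1/5) outside. The small sketch below models the explanation as a tree of hypothetical `Node` objects, assuming that `sum` and `product` nodes combine their children exactly as the explain text says. It evaluates the tree bottom-up using the leaf values printed above.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One line of an explanation: a leaf value, or a sum/product of its children."""
    kind: str                                   # "leaf", "sum" or "product"
    value: float = 0.0                          # only used when kind == "leaf"
    children: List["Node"] = field(default_factory=list)


def leaf(value: float) -> Node:
    return Node("leaf", value)


def evaluate(node: Node) -> float:
    # Combine child values bottom-up, as the "sum of" / "product of" lines state.
    if node.kind == "leaf":
        return node.value
    values = [evaluate(child) for child in node.children]
    return sum(values) if node.kind == "sum" else math.prod(values)


# Leaf values copied from the explanation for doc 1347.
IDF, QUERY_NORM = 3.5018296, 0.03940963
query_weight = Node("product", children=[leaf(IDF), leaf(QUERY_NORM)])
field_weight = Node("product", children=[leaf(1.4142135), leaf(IDF), leaf(0.046875)])
term_weight = Node("product", children=[query_weight, field_weight])

# Inner clause group: 1 of 3 sub-clauses matched, hence coord(1/3).
inner = Node("product", children=[Node("sum", children=[term_weight]), leaf(1 / 3)])
# Top-level query: 1 of 5 clauses matched, hence coord(1/5).
root = Node("product", children=[Node("sum", children=[inner]), leaf(1 / 5)])

print(f"weight(_text_:22): {evaluate(term_weight):.6f}")
print(f"final score:       {evaluate(root):.7f}")
```

Up to float rounding, the evaluated tree should reproduce 0.032036763 for the term weight and 0.0021357841 for the final score. The two following records have the same nested shape and can be evaluated the same way.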

Matthews, B.; Jones, C.; Puzon, B.; Moon, J.; Tudhope, D.; Golub, K.; Nielsen, M.L.: An evaluation of enhancing social tagging with a knowledge organization system (2010) 0.00

```
0.0017959764 = product of:
  0.008979882 = sum of:
    0.008979882 = product of:
      0.026939645 = sum of:
        0.026939645 = weight(_text_:29 in 4171) [ClassicSimilarity], result of:
          0.026939645 = score(doc=4171,freq=2.0), product of:
            0.13863076 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03940963 = queryNorm
            0.19432661 = fieldWeight in 4171, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4171)
      0.33333334 = coord(1/3)
  0.2 = coord(1/5)
```

Date: 29. 8.2010 11:39:20

Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.00

```
0.0017959764 = product of:
  0.008979882 = sum of:
    0.008979882 = product of:
      0.026939645 = sum of:
        0.026939645 = weight(_text_:29 in 2300) [ClassicSimilarity], result of:
          0.026939645 = score(doc=2300,freq=2.0), product of:
            0.13863076 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03940963 = queryNorm
            0.19432661 = fieldWeight in 2300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
      0.33333334 = coord(1/3)
  0.2 = coord(1/5)
```

Source: Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro
