Search (8 results, page 1 of 1)

  • × author_ss:"Golub, K."
  1. Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 0.01
    0.0071366318 = product of:
      0.03568316 = sum of:
        0.03568316 = product of:
          0.07136632 = sum of:
            0.07136632 = weight(_text_:problems in 5897) [ClassicSimilarity], result of:
              0.07136632 = score(doc=5897,freq=6.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.47391602 = fieldWeight in 5897, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5897)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
  2. Golub, K.: Automated subject classification of textual web documents (2006) 0.00
    0.004855863 = product of:
      0.024279313 = sum of:
        0.024279313 = product of:
          0.048558626 = sum of:
            0.048558626 = weight(_text_:problems in 5600) [ClassicSimilarity], result of:
              0.048558626 = score(doc=5600,freq=4.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.322459 = fieldWeight in 5600, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5600)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Purpose - To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and point to problems with the approaches and automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics is common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities. Originality/value - To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.
  3. Golub, K.: Subject access to information : an interdisciplinary approach (2015) 0.00
    0.0041203364 = product of:
      0.02060168 = sum of:
        0.02060168 = product of:
          0.04120336 = sum of:
            0.04120336 = weight(_text_:problems in 134) [ClassicSimilarity], result of:
              0.04120336 = score(doc=134,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.27361554 = fieldWeight in 134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.046875 = fieldNorm(doc=134)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Drawing on the research of experts from the fields of computing and library science, this ground-breaking work will show you how to combine two very different approaches to classification to create more effective, user-friendly information-retrieval systems. * Provides an interdisciplinary overview of current and potential approaches to organizing information by subject * Covers both pure computer science and pure library science topics in easy-to-understand language accessible to audiences from both disciplines * Reviews technological standards for representation, storage, and retrieval of varied knowledge-organization systems and their constituent elements * Suggests a collaborative approach that will reduce duplicate efforts and make it easier to find solutions to practical problems.
  4. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.00
    0.0034336136 = product of:
      0.017168067 = sum of:
        0.017168067 = product of:
          0.034336135 = sum of:
            0.034336135 = weight(_text_:problems in 3614) [ClassicSimilarity], result of:
              0.034336135 = score(doc=3614,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.22801295 = fieldWeight in 3614, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3614)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification systems for browsing. The classification algorithm was evaluated by the users who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success of browsing showed to be correlated and dependent on classification correctness. Research limitations/implications - Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from start; allowing for searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); when searching for class captions, returning the hierarchical tree expanded around the class in which caption the search term is found. The need for improvements of classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
  5. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.00
    0.0034336136 = product of:
      0.017168067 = sum of:
        0.017168067 = product of:
          0.034336135 = sum of:
            0.034336135 = weight(_text_:problems in 2300) [ClassicSimilarity], result of:
              0.034336135 = score(doc=2300,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.22801295 = fieldWeight in 2300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2300)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools of Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, domain analysis. The gold standard is built based on input from at least two catalogue librarians, end-users expert in the subject, end users inexperienced in the subject and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
  6. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.00
    0.0034336136 = product of:
      0.017168067 = sum of:
        0.017168067 = product of:
          0.034336135 = sum of:
            0.034336135 = weight(_text_:problems in 3311) [ClassicSimilarity], result of:
              0.034336135 = score(doc=3311,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.22801295 = fieldWeight in 3311, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3311)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
  7. Golub, K.: Automatic subject indexing of text (2019) 0.00
    0.0034336136 = product of:
      0.017168067 = sum of:
        0.017168067 = product of:
          0.034336135 = sum of:
            0.034336135 = weight(_text_:problems in 5268) [ClassicSimilarity], result of:
              0.034336135 = score(doc=5268,freq=2.0), product of:
                0.15058853 = queryWeight, product of:
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.036484417 = queryNorm
                0.22801295 = fieldWeight in 5268, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1274753 = idf(docFreq=1937, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5268)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collec-tions, and enhance consistency of the metadata. In this work, au-tomatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are dis-cussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most wide-spread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for sub-ject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equiv-alent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
  8. Golub, K.; Tudhope, D.; Zeng, M.L.; Zumer, M.: Terminology registries for knowledge organization systems : functionality, use, and attributes (2014) 0.00
    0.002965881 = product of:
      0.014829405 = sum of:
        0.014829405 = product of:
          0.02965881 = sum of:
            0.02965881 = weight(_text_:22 in 1347) [ClassicSimilarity], result of:
              0.02965881 = score(doc=1347,freq=2.0), product of:
                0.12776221 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.036484417 = queryNorm
                0.23214069 = fieldWeight in 1347, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1347)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 8.2014 17:12:54