Search (139 results, page 7 of 7)

  • theme_ss:"Automatisches Klassifizieren"
  • type_ss:"a"
  1. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 2563) [ClassicSimilarity], result of:
          0.00972145 = score(doc=2563,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 2563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2563)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 40(2004) no.2, S.239-255
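    The indented tree under each result is Lucene's ClassicSimilarity explain() output. As a sanity check, the following minimal sketch (Python, assuming Lucene's documented ClassicSimilarity formulas) recomputes the score of result 1 from the factors shown above:

      import math

      # Factors copied from the explanation tree for result 1 (doc 2563).
      freq = 2.0
      doc_freq, max_docs = 20772, 44218
      query_norm = 0.047586527
      field_norm = 0.046875
      coord = 1 / 4  # one of four query clauses matched

      tf = math.sqrt(freq)                            # 1.4142135
      idf = 1 + math.log(max_docs / (doc_freq + 1))   # 1.7554779
      query_weight = idf * query_norm                 # 0.083537094
      field_weight = tf * idf * field_norm            # 0.116372846
      score = query_weight * field_weight * coord     # 0.0024303624

      print(f"tf={tf:.7f} idf={idf:.7f} score={score:.10f}")

    The remaining trees on this page differ only in fieldNorm (0.046875, 0.0390625, 0.03125, 0.02734375), which is why the displayed scores fall into a few discrete values.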
  2. Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 1168) [ClassicSimilarity], result of:
          0.00972145 = score(doc=1168,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
      0.25 = coord(1/4)
    
    Abstract
    The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.
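    The paper does not reproduce its code, but the core step it describes - clustering metadata records by the dominant topic a topic model assigns them - can be sketched as follows. The records, the choice of scikit-learn's LatentDirichletAllocation, and the number of topics are illustrative assumptions, not the authors' actual setup:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      # Toy metadata records standing in for OAIster entries (invented data).
      records = [
          "survey of coastal bird migration patterns",
          "migration routes of arctic shorebirds",
          "introduction to group theory and symmetry",
          "symmetry groups in abstract algebra",
      ]

      X = CountVectorizer(stop_words="english").fit_transform(records)
      lda = LatentDirichletAllocation(n_components=2, random_state=0)
      doc_topics = lda.fit_transform(X)

      # Each record's dominant topic serves as a crude cluster label;
      # labelling the clusters themselves would use the topics' top words.
      print(doc_topics.argmax(axis=1))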
  3. Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 1461) [ClassicSimilarity], result of:
          0.00972145 = score(doc=1461,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 1461, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1461)
      0.25 = coord(1/4)
    
    Abstract
    Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to the rapid increase of digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents - instead it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against title and abstract of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.
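    As a rough sketch of the string-matching idea described above - not the authors' actual system - the following matches controlled-vocabulary terms against title and abstract, weights title hits more heavily, and applies a score cut-off. The vocabulary entries, boost, and threshold are invented for the example:

      import re

      # Tiny stand-in for the Ei thesaurus: term -> class (invented entries).
      vocabulary = {
          "finite element method": "921 Mathematics",
          "heat transfer": "641.2 Heat Transfer",
          "neural network": "723.4 Artificial Intelligence",
      }
      TITLE_BOOST = 2.0  # assumed weighting: title matches count double

      def classify(title, abstract, cutoff=1.0):
          """Score classes by weighted occurrences of vocabulary terms."""
          scores = {}
          for term, cls in vocabulary.items():
              hits = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
              weight = (TITLE_BOOST * len(hits.findall(title))
                        + len(hits.findall(abstract)))
              if weight:
                  scores[cls] = scores.get(cls, 0.0) + weight
          # Cut-off: keep only classes whose score reaches the threshold.
          return {c: s for c, s in scores.items() if s >= cutoff}

      print(classify("Heat transfer in finite element models",
                     "We apply the finite element method to heat transfer."))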
  4. Malenica, M.; Smuc, T.; Snajder, J.; Basic, B.D.: Language morphology offset : text classification on a Croatian-English parallel corpus (2008) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 2035) [ClassicSimilarity], result of:
          0.00972145 = score(doc=2035,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 2035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2035)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 44(2008) no.1, S.325-339
  5. Zhou, G.D.; Zhang, M.; Ji, D.H.; Zhu, Q.M.: Hierarchical learning strategy in semantic relation extraction (2008) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 2077) [ClassicSimilarity], result of:
          0.00972145 = score(doc=2077,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 2077, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2077)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 44(2008) no.3, S.1008-1021
  6. Ko, Y.; Seo, J.: Text classification from unlabeled documents with bootstrapping and feature projection techniques (2009) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 2452) [ClassicSimilarity], result of:
          0.00972145 = score(doc=2452,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 2452, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2452)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 45(2009) no.1, S.70-83
  7. Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 4558) [ClassicSimilarity], result of:
          0.00972145 = score(doc=4558,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 4558, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4558)
      0.25 = coord(1/4)
    
    Abstract
    While automated methods for information organization have been around for several decades now, exponential growth of the World Wide Web has put them into the forefront of research in different communities, within which several approaches can be identified: 1) machine learning (algorithms that allow computers to improve their performance based on learning from pre-existing data); 2) document clustering (algorithms for unsupervised document organization and automated topic extraction); and 3) string matching (algorithms that match given strings within larger text). Here the aim was to automatically organize textual documents into hierarchical structures for subject browsing. The string-matching approach was tested using a controlled vocabulary (containing pre-selected and pre-defined authorized terms, each corresponding to only one concept). The results imply that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. Then, if the same controlled vocabulary had an appropriate hierarchical structure, it would at the same time provide a good browsing structure for the collection of automatically classified documents.
  8. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.00
    0.0024303624 = product of:
      0.00972145 = sum of:
        0.00972145 = weight(_text_:information in 1071) [ClassicSimilarity], result of:
          0.00972145 = score(doc=1071,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 1071, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1071)
      0.25 = coord(1/4)
    
    Abstract
    This paper aims to provide an overview of automatic classification research, which focuses on issues related to the automatic classification of documents in a library environment. The review covers literature published in mainstream library and information science studies. The review was done on literature published in both academic and professional LIS journals and other documents. This review reveals that basically three types of research are being done on automatic classification: 1) hierarchical classification using different library classification schemes, 2) text categorization and document categorization using different types of classifiers with or without using training documents, and 3) automatic bibliographic classification. Predominantly this research is directed towards solving problems of organization of digital documents in an online environment. However, very little research is devoted to solving the problems of arrangement of physical documents.
  9. Na, J.-C.; Sui, H.; Khoo, C.; Chan, S.; Zhou, Y.: Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews (2004) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 2624) [ClassicSimilarity], result of:
          0.008101207 = score(doc=2624,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 2624, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2624)
      0.25 = coord(1/4)
    
    Source
    Knowledge organization and the global information society: Proceedings of the 8th International ISKO Conference 13-16 July 2004, London, UK. Ed.: I.C. McIlwaine
  10. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 1070) [ClassicSimilarity], result of:
          0.008101207 = score(doc=1070,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 39(2003) no.1, S.25-44
  11. Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 2532) [ClassicSimilarity], result of:
          0.008101207 = score(doc=2532,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 2532, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
      0.25 = coord(1/4)
    
    Source
    Journal of information science. 34(2008) no.2, S.213-230
  12. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 2194) [ClassicSimilarity], result of:
          0.008101207 = score(doc=2194,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 2194, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2194)
      0.25 = coord(1/4)
    
    Abstract
    In the Thomson Reuters Web of Science database, the subject categories of a journal are applied to all articles in the journal. However, many articles in multidisciplinary sciences journals may only be represented by a small number of subject categories. To provide more accurate information on the research areas of articles in such journals, we can classify articles in these journals into subject categories as defined by Web of Science based on their references. For an article in a multidisciplinary sciences journal, the method counts the subject categories in all of the article's references indexed by Web of Science, and uses the most numerous subject categories of the references to determine the most appropriate classification of the article. We used articles in an issue of Proceedings of the National Academy of Sciences (PNAS) to validate the correctness of the method by comparing the obtained results with the categories of the articles as defined by PNAS and their content. This study shows that the method provides more precise search results for the subject category of interest in bibliometric investigations through recognition of articles in multidisciplinary sciences journals whose work relates to a particular subject category.
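    The counting rule described here amounts to a majority vote over the subject categories of an article's references. A minimal sketch (the input format is an assumption):

      from collections import Counter

      def classify_article(reference_categories):
          """Return the most numerous WoS subject categories among the
          references; ties return every top category."""
          counts = Counter(c for cats in reference_categories for c in cats)
          if not counts:
              return []
          top = max(counts.values())
          return [c for c, n in counts.items() if n == top]

      # Invented example: category lists of four indexed references.
      refs = [["Biochemistry & Molecular Biology"],
              ["Biochemistry & Molecular Biology", "Cell Biology"],
              ["Cell Biology"],
              ["Biochemistry & Molecular Biology"]]
      print(classify_article(refs))  # ['Biochemistry & Molecular Biology']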
  13. Borodin, Y.; Polishchuk, V.; Mahmud, J.; Ramakrishnan, I.V.; Stent, A.: Live and learn from mistakes : a lightweight system for document classification (2013) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 2722) [ClassicSimilarity], result of:
          0.008101207 = score(doc=2722,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 2722, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2722)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 49(2013) no.1, S.83-98
  14. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 3627) [ClassicSimilarity], result of:
          0.008101207 = score(doc=3627,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 3627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
      0.25 = coord(1/4)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
  15. Wang, H.; Hong, M.: Supervised Hebb rule based feature selection for text classification (2019) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 5036) [ClassicSimilarity], result of:
          0.008101207 = score(doc=5036,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 5036, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5036)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 56(2019) no.1, S.167-191
  16. Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.00
    0.0020253018 = product of:
      0.008101207 = sum of:
        0.008101207 = weight(_text_:information in 5055) [ClassicSimilarity], result of:
          0.008101207 = score(doc=5055,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 5055, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5055)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 54(2018) no.4, S.593-608
  17. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.00
    0.0016202416 = product of:
      0.006480966 = sum of:
        0.006480966 = weight(_text_:information in 2301) [ClassicSimilarity], result of:
          0.006480966 = score(doc=2301,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.0775819 = fieldWeight in 2301, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2301)
      0.25 = coord(1/4)
    
    Abstract
    Analytico-synthetic and faceted classifications, such as the Universal Decimal Classification (UDC), express the content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations are stored in an intermediate format (in this case, XML) by automatic means, without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats, or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers, which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is available online for testing purposes at http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces so that the software's features can be used as a service. This would allow the algorithm to be employed in both existing and future library systems to analyse UDC numbers without significant programming effort.
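    The interpreter itself is not reproduced in the paper, but the idea of decomposing a pre-combined notation into intermediate XML can be hinted at with a toy parser. It handles only two connectors (+ for addition, : for simple relation); real UDC syntax (/, [], =..., (0...), and so on) is far richer, so this is purely illustrative:

      import xml.etree.ElementTree as ET

      def udc_to_xml(notation):
          """Split a UDC number on + and : and record the parts in XML."""
          root = ET.Element("udc", code=notation)
          for part in notation.split("+"):
              for i, sub in enumerate(part.split(":")):
                  kind = "base" if i == 0 else "relation"
                  ET.SubElement(root, "component", kind=kind).text = sub.strip()
          return ET.tostring(root, encoding="unicode")

      print(udc_to_xml("821.111:81'255.4+929"))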
  18. Kragelj, M.; Borstnar, M.K.: Automatic classification of older electronic texts into the Universal Decimal Classification-UDC (2021) 0.00
    0.0016202416 = product of:
      0.006480966 = sum of:
        0.006480966 = weight(_text_:information in 175) [ClassicSimilarity], result of:
          0.006480966 = score(doc=175,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.0775819 = fieldWeight in 175, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=175)
      0.25 = coord(1/4)
    
    Abstract
    Purpose: The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.
    Design/methodology/approach: The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.
    Findings: Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.
    Research limitations/implications: The main limitations of this study were the unavailability of labelled older texts and the limited availability of librarians.
    Practical implications: The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.
    Social implications: The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and usable.
    Originality/value: These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.
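    The abstract does not name the learning algorithm, so the sketch below stands in with a generic TF-IDF plus logistic-regression pipeline; the corpus, the top-level UDC labels, and the model choice are assumptions made for illustration:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      # Invented stand-in for the 70,000 bibliographically processed texts.
      texts = ["on the theory of prime numbers",
               "a grammar of the slovenian language",
               "prime factorisation algorithms",
               "historical development of slavic languages"]
      labels = ["51", "81", "51", "81"]  # 51 = mathematics, 81 = linguistics

      model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
      model.fit(texts, labels)
      print(model.predict(["an old treatise on prime numbers"]))  # likely ['51']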
  19. Borko, H.: Research in computer based classification systems (1985) 0.00
    0.0014177114 = product of:
      0.0056708455 = sum of:
        0.0056708455 = weight(_text_:information in 3647) [ClassicSimilarity], result of:
          0.0056708455 = score(doc=3647,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.06788416 = fieldWeight in 3647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3647)
      0.25 = coord(1/4)
    
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
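    Borko's first step - scoring term pairs by how their occurrence profiles correlate across documents - can be sketched with a term-document incidence matrix and Pearson correlation (toy data; the factor-analysis step that then derives the categories is not shown):

      import numpy as np

      # Rows = index terms, columns = documents (invented incidence data).
      terms = ["classification", "retrieval", "syntax", "grammar"]
      X = np.array([[1, 1, 0, 1, 0],
                    [1, 1, 0, 0, 0],
                    [0, 0, 1, 0, 1],
                    [0, 0, 1, 1, 1]])

      corr = np.corrcoef(X)  # correlation between every pair of term rows
      for i in range(len(terms)):
          for j in range(i + 1, len(terms)):
              print(f"{terms[i]:15s} {terms[j]:15s} {corr[i, j]: .2f}")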

Languages

  • e 131
  • d 7
  • chi 1