Search (145 results, page 1 of 8)

  • theme_ss:"Automatisches Klassifizieren"
  1. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.15
    0.14780095 = product of:
      0.22170141 = sum of:
        0.059888236 = weight(_text_:bibliographic in 2560) [ClassicSimilarity], result of:
          0.059888236 = score(doc=2560,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.30108726 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
        0.16181318 = sum of:
          0.11335658 = weight(_text_:classification in 2560) [ClassicSimilarity], result of:
            0.11335658 = score(doc=2560,freq=16.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.69665456 = fieldWeight in 2560, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2560)
          0.048456598 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
            0.048456598 = score(doc=2560,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.2708308 = fieldWeight in 2560, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2560)
      0.6666667 = coord(2/3)
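    The tree above is Lucene's ClassicSimilarity explanation, and the same formula repeats for every hit below. As a worked check (a reader-side reconstruction, not output of the search system), the following sketch reproduces the displayed 0.15 score of this first hit from the factors shown: per-term weight = queryNorm · idf² · sqrt(termFreq) · fieldNorm, summed over the matching clauses and scaled by the coordination factor.
      # Minimal reconstruction of the ClassicSimilarity arithmetic shown above;
      # idf, queryNorm and fieldNorm values are copied from the explain tree.
      import math

      def term_weight(term_freq, idf, field_norm, query_norm):
          tf = math.sqrt(term_freq)                 # tf(freq) = sqrt(termFreq)
          field_weight = tf * idf * field_norm      # fieldWeight
          query_weight = idf * query_norm           # queryWeight
          return query_weight * field_weight        # weight(_text_:term in doc)

      query_norm, field_norm = 0.051092815, 0.0546875
      w_bib = term_weight(2.0, 3.893044, field_norm, query_norm)    # ~0.059888
      w_cls = term_weight(16.0, 3.1847067, field_norm, query_norm)  # ~0.113357
      w_22  = term_weight(2.0, 3.5018296, field_norm, query_norm)   # ~0.048457

      raw = w_bib + (w_cls + w_22)      # 0.22170141 = sum of the clause scores
      print(raw * 2.0 / 3.0)            # coord(2/3) -> ~0.14780095, the document score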
    
    Abstract
    The proliferation of digital resources and their integration into a traditional library setting have created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field concerned with developing tools, methods, and models for automating text classification. This article describes the currently popular approach to text classification and the major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for addressing the challenges are examined.
    Date
    22. 9.2008 18:31:54
    Source
    International cataloguing and bibliographic control. 36(2007) no.4, S.78-82
  2. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.12
    0.12145533 = product of:
      0.18218298 = sum of:
        0.08114894 = product of:
          0.2434468 = sum of:
            0.2434468 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.2434468 = score(doc=562,freq=2.0), product of:
                0.43316546 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051092815 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.101034045 = sum of:
          0.059499815 = weight(_text_:classification in 562) [ClassicSimilarity], result of:
            0.059499815 = score(doc=562,freq=6.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.3656675 = fieldWeight in 562, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
          0.041534226 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.041534226 = score(doc=562,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
      0.6666667 = coord(2/3)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
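    A minimal sketch of the general idea described above: bag-of-words features are extended with concept features drawn from background knowledge, and boosted weak learners do the classification. The tiny corpus and the term-to-concept mapping are invented placeholders, not the authors' data or ontology.
      # Bag-of-words features plus concept features, classified with AdaBoost
      # (whose default weak learner is a depth-1 decision stump).
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.ensemble import AdaBoostClassifier

      docs = ["stocks fell on wall street", "the striker scored a late goal"]
      labels = [0, 1]
      concept_map = {"stocks": "CONCEPT_finance", "striker": "CONCEPT_sport",
                     "goal": "CONCEPT_sport"}      # stand-in for background knowledge

      def add_concepts(text):
          concepts = [concept_map[t] for t in text.split() if t in concept_map]
          return text + " " + " ".join(concepts)

      vec = CountVectorizer()
      X = vec.fit_transform([add_concepts(d) for d in docs])
      clf = AdaBoostClassifier(n_estimators=50).fit(X.toarray(), labels)
      print(clf.predict(vec.transform([add_concepts("goal in the final minute")]).toarray()))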
    Content
    See: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  3. Schiminovich, S.: Automatic classification and retrieval of documents by means of a bibliographic pattern discovery algorithm (1971) 0.11
    0.106569394 = product of:
      0.15985408 = sum of:
        0.11977647 = weight(_text_:bibliographic in 4846) [ClassicSimilarity], result of:
          0.11977647 = score(doc=4846,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.6021745 = fieldWeight in 4846, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.109375 = fieldNorm(doc=4846)
        0.040077604 = product of:
          0.08015521 = sum of:
            0.08015521 = weight(_text_:classification in 4846) [ClassicSimilarity], result of:
              0.08015521 = score(doc=4846,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.49260917 = fieldWeight in 4846, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4846)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
  4. Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.08
    0.08402608 = product of:
      0.12603912 = sum of:
        0.08555462 = weight(_text_:bibliographic in 3172) [ClassicSimilarity], result of:
          0.08555462 = score(doc=3172,freq=8.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.43012467 = fieldWeight in 3172, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
        0.040484495 = product of:
          0.08096899 = sum of:
            0.08096899 = weight(_text_:classification in 3172) [ClassicSimilarity], result of:
              0.08096899 = score(doc=3172,freq=16.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.49761042 = fieldWeight in 3172, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3172)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-the-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable for real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
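    The reshaping step described above can be pictured with a toy sketch: nodes deeper than a chosen cutoff are re-parented under their ancestor at that depth, so a deep, skewed tree becomes a shallower "virtual" tree. This illustrates one simple way to flatten a hierarchy under assumed data; it is not the authors' algorithm.
      # Toy hierarchy flattening: every node below max_depth is pulled up and
      # attached directly under its ancestor at max_depth. Not Wang's algorithm.
      def flatten(tree, max_depth, depth=0):
          """tree is a nested dict {category: {child: {...}}}."""
          if depth >= max_depth:
              return {node: {} for node in descendants(tree)}
          return {cat: flatten(kids, max_depth, depth + 1) for cat, kids in tree.items()}

      def descendants(tree):
          out = []
          for cat, kids in tree.items():
              out.append(cat)
              out.extend(descendants(kids))
          return out

      ddc_fragment = {"500": {"510": {"516": {"516.3": {}}}, "530": {"531": {}}}}
      print(flatten(ddc_fragment, max_depth=2))
      # {'500': {'510': {'516': {}, '516.3': {}}, '530': {'531': {}}}}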
  5. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.06
    0.06451768 = product of:
      0.09677651 = sum of:
        0.05133277 = weight(_text_:bibliographic in 1071) [ClassicSimilarity], result of:
          0.05133277 = score(doc=1071,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.2580748 = fieldWeight in 1071, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=1071)
        0.045443736 = product of:
          0.09088747 = sum of:
            0.09088747 = weight(_text_:classification in 1071) [ClassicSimilarity], result of:
              0.09088747 = score(doc=1071,freq=14.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.55856633 = fieldWeight in 1071, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1071)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper aims to provide an overview of automatic classification research, which focuses on issues related to the automatic classification of documents in a library environment. The review covers literature published in mainstream library and information science studies, drawing on both academic and professional LIS journals and other documents. The review reveals that three types of research are being done on automatic classification: 1) hierarchical classification using different library classification schemes, 2) text categorization and document categorization using different types of classifiers with or without training documents, and 3) automatic bibliographic classification. Predominantly, this research is directed towards solving problems of organization of digital documents in an online environment. However, very little research is devoted to solving the problems of arrangement of physical documents.
  6. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.06
    0.06370457 = product of:
      0.095556855 = sum of:
        0.060496252 = weight(_text_:bibliographic in 2300) [ClassicSimilarity], result of:
          0.060496252 = score(doc=2300,freq=4.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.30414405 = fieldWeight in 2300, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
        0.0350606 = product of:
          0.0701212 = sum of:
            0.0701212 = weight(_text_:classification in 2300) [ClassicSimilarity], result of:
              0.0701212 = score(doc=2300,freq=12.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.43094325 = fieldWeight in 2300, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2300)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools for Swedish textual documents based on the Dewey Decimal Classification (DDC), recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, and domain analysis. The gold standard is built based on input from at least two catalogue librarians, end users expert in the subject, end users inexperienced in the subject, and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
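    At its simplest, the gold-standard comparison implied above reduces to scoring automatically assigned classes against the classes agreed by the catalogue librarians. A minimal sketch with invented records:
      # Precision/recall of automatic class assignments against a gold standard.
      gold = {"rec1": {"025.4"}, "rec2": {"004", "020"}, "rec3": {"510"}}
      auto = {"rec1": {"025.4"}, "rec2": {"004"}, "rec3": {"530"}}

      hits = sum(len(gold[r] & auto[r]) for r in gold)
      precision = hits / sum(len(auto[r]) for r in gold)   # 2/3
      recall = hits / sum(len(gold[r]) for r in gold)      # 2/4
      print(f"precision={precision:.2f} recall={recall:.2f}")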
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
  7. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.06
    0.060896795 = product of:
      0.09134519 = sum of:
        0.0684437 = weight(_text_:bibliographic in 2564) [ClassicSimilarity], result of:
          0.0684437 = score(doc=2564,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.34409973 = fieldWeight in 2564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
        0.022901488 = product of:
          0.045802977 = sum of:
            0.045802977 = weight(_text_:classification in 2564) [ClassicSimilarity], result of:
              0.045802977 = score(doc=2564,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.28149095 = fieldWeight in 2564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described for vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example, a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
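    A minimal NumPy sketch of the Kohonen (self-organizing map) step described above: document vectors are mapped onto a small two-dimensional grid so that similar documents end up on nearby cells. The random vectors stand in for the vectorized LISA records; grid size and schedules are arbitrary choices.
      # Tiny self-organizing map: 202 random "document vectors" organized on an
      # 8x8 grid. Real use would start from TF-IDF or similar document vectors.
      import numpy as np

      rng = np.random.default_rng(0)
      docs = rng.random((202, 20))                     # 202 docs, 20-dim vectors
      w, h, steps = 8, 8, 2000
      weights = rng.random((w, h, 20))
      coords = np.indices((w, h)).transpose(1, 2, 0)   # (x, y) of every grid cell

      for t in range(steps):
          lr = 0.5 * (1 - t / steps)                   # decaying learning rate
          sigma = 0.5 + 3.0 * (1 - t / steps)          # decaying neighbourhood radius
          x = docs[rng.integers(len(docs))]
          bmu = np.unravel_index(np.linalg.norm(weights - x, axis=2).argmin(), (w, h))
          d = np.linalg.norm(coords - np.array(bmu), axis=2)
          neigh = np.exp(-d ** 2 / (2 * sigma ** 2))   # neighbourhood function
          weights += lr * neigh[..., None] * (x - weights)

      # Each document's winning cell gives its position on the map.
      cell = lambda v: np.unravel_index(np.linalg.norm(weights - v, axis=2).argmin(), (w, h))
      print([cell(v) for v in docs[:5]])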
  8. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.06
    0.05712334 = product of:
      0.08568501 = sum of:
        0.05133277 = weight(_text_:bibliographic in 2166) [ClassicSimilarity], result of:
          0.05133277 = score(doc=2166,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.2580748 = fieldWeight in 2166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=2166)
        0.034352235 = product of:
          0.06870447 = sum of:
            0.06870447 = weight(_text_:classification in 2166) [ClassicSimilarity], result of:
              0.06870447 = score(doc=2166,freq=8.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.42223644 = fieldWeight in 2166, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2166)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In 2004, the German National Library began to classify title records of the German National Bibliography according to subject groups based on the divisions of the Dewey Decimal Classification (DDC). Since 2006, all titles of the main series of the German National Bibliography are classified in strict compliance with the DDC. On this basis, an enhanced DDC-based search can be realized - e.g., searching the data of the German National Bibliography for title records using number components of synthesized classification numbers or searching for DDC numbers using unclassified title records. This paper gives an account of the current research and development of the DDC-based search. The work is conducted in the VZG project Colibri that focuses on the automatic analysis of DDC-synthesized numbers and the automatic classification of bibliographic title records.
    Source
    New perspectives on subject indexing and classification: essays in honour of Magda Heiner-Freiling. Red.: K. Knull-Schlomann, u.a
  9. Wille, J.: Automatisches Klassifizieren bibliographischer Beschreibungsdaten : Vorgehensweise und Ergebnisse (2006) 0.05
    0.053284697 = product of:
      0.07992704 = sum of:
        0.059888236 = weight(_text_:bibliographic in 6090) [ClassicSimilarity], result of:
          0.059888236 = score(doc=6090,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.30108726 = fieldWeight in 6090, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6090)
        0.020038802 = product of:
          0.040077604 = sum of:
            0.040077604 = weight(_text_:classification in 6090) [ClassicSimilarity], result of:
              0.040077604 = score(doc=6090,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.24630459 = fieldWeight in 6090, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6090)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This work deals with the practical aspects of the automatic classification of bibliographic reference data. The focus is on the concrete procedure, based on the open-source program COBRA "Classification Of Bibliographic Records, Automatic", which was developed specifically for this purpose. The general conditions and parameters for its use in a library setting are clarified. Finally, classification results are evaluated, using social-science data from the SOLIS database as an example.
  10. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.05
    0.04887543 = product of:
      0.14662628 = sum of:
        0.14662628 = sum of:
          0.098169684 = weight(_text_:classification in 5273) [ClassicSimilarity], result of:
            0.098169684 = score(doc=5273,freq=12.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.60332054 = fieldWeight in 5273, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5273)
          0.048456598 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
            0.048456598 = score(doc=5273,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.2708308 = fieldWeight in 5273, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5273)
      0.33333334 = coord(1/3)
    
    Abstract
    In text categorization tasks, classification on some class hierarchies gives better results than in cases without the hierarchy. Currently, because a large number of documents are divided into several subgroups in a hierarchy, we can appropriately use a hierarchical classification method. However, we have no systematic method to build a hierarchical classification system that performs well with large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels.
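    A sketch of the general top-down scheme such systems rely on: a separate classifier is trained at each internal node of the hierarchy and a document is routed from the root down to a leaf. The two-level hierarchy and the tiny corpus are illustrative only, not the authors' data or evaluation scheme.
      # Top-down hierarchical text classification with one classifier per node.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression

      hierarchy = {"root": ["science", "sports"], "science": ["physics", "biology"]}
      train = {
          "science": ["quantum electron experiment", "cell protein gene"],
          "sports":  ["league match goal", "tournament final score"],
          "physics": ["quantum electron experiment"],
          "biology": ["cell protein gene"],
      }

      node_models = {}                                  # (vectorizer, classifier) per node
      for node, children in hierarchy.items():
          texts, labels = [], []
          for child in children:
              texts += train[child]
              labels += [child] * len(train[child])
          vec = TfidfVectorizer()
          clf = LogisticRegression().fit(vec.fit_transform(texts), labels)
          node_models[node] = (vec, clf)

      def classify(text, node="root"):
          while node in node_models:                    # descend until a leaf
              vec, clf = node_models[node]
              node = clf.predict(vec.transform([text]))[0]
          return node

      print(classify("electron quantum theory"))        # expected: physics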
    Date
    22. 7.2006 16:24:52
  11. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.04
    0.042159148 = product of:
      0.12647744 = sum of:
        0.12647744 = sum of:
          0.057253722 = weight(_text_:classification in 2748) [ClassicSimilarity], result of:
            0.057253722 = score(doc=2748,freq=2.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.35186368 = fieldWeight in 2748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.078125 = fieldNorm(doc=2748)
          0.06922371 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
            0.06922371 = score(doc=2748,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.38690117 = fieldWeight in 2748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.078125 = fieldNorm(doc=2748)
      0.33333334 = coord(1/3)
    
    Date
    1. 2.2016 18:25:22
  12. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.04
    0.039449386 = product of:
      0.11834815 = sum of:
        0.11834815 = sum of:
          0.07681393 = weight(_text_:classification in 2760) [ClassicSimilarity], result of:
            0.07681393 = score(doc=2760,freq=10.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.4720747 = fieldWeight in 2760, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.046875 = fieldNorm(doc=2760)
          0.041534226 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
            0.041534226 = score(doc=2760,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.23214069 = fieldWeight in 2760, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2760)
      0.33333334 = coord(1/3)
    
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
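    A loose sketch of the gating idea described above: a document is considered for a category only if it shares enough terms with the context contributed by that category's ancestors. The categories, ancestor texts, and threshold are invented placeholders, not the CRHTC technique itself.
      # Context-of-discussion gating (toy): categories whose ancestor context the
      # document does not match are ruled out before any classification happens.
      ancestor_text = {"neural_networks": "machine learning algorithm model training",
                       "football": "sport league match team player"}

      def passes_context(document, category, min_overlap=1):
          context = set(ancestor_text[category].split())
          return len(set(document.split()) & context) >= min_overlap

      doc = "training a deep model with a new algorithm"
      print([c for c in ancestor_text if passes_context(doc, c)])   # ['neural_networks']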
    Date
    22. 3.2009 19:11:54
  13. Automatic classification research at OCLC (2002) 0.04
    0.03929102 = product of:
      0.11787306 = sum of:
        0.11787306 = sum of:
          0.069416456 = weight(_text_:classification in 1563) [ClassicSimilarity], result of:
            0.069416456 = score(doc=1563,freq=6.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.42661208 = fieldWeight in 1563, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1563)
          0.048456598 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
            0.048456598 = score(doc=1563,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.2708308 = fieldWeight in 1563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1563)
      0.33333334 = coord(1/3)
    
    Abstract
    OCLC enlists the cooperation of the world's libraries to make the written record of humankind's cultural heritage more accessible through electronic media. Part of this goal can be accomplished through the application of the principles of knowledge organization. We believe that cultural artifacts are effectively lost unless they are indexed, cataloged and classified. Accordingly, OCLC has developed products, sponsored research projects, and encouraged participation in international standards communities whose outcome has been improved library classification schemes, cataloging productivity tools, and new proposals for the creation and maintenance of metadata. Though cataloging and classification require expert intellectual effort, we recognize that at least some of the work must be automated if we hope to keep pace with cultural change.
    Date
    5. 5.2003 9:22:09
  14. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.04
    0.03929102 = product of:
      0.11787306 = sum of:
        0.11787306 = sum of:
          0.069416456 = weight(_text_:classification in 1673) [ClassicSimilarity], result of:
            0.069416456 = score(doc=1673,freq=6.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.42661208 = fieldWeight in 1673, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1673)
          0.048456598 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
            0.048456598 = score(doc=1673,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.2708308 = fieldWeight in 1673, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1673)
      0.33333334 = coord(1/3)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
    Date
    1. 8.1996 22:08:06
  15. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.03
    0.033678018 = product of:
      0.101034045 = sum of:
        0.101034045 = sum of:
          0.059499815 = weight(_text_:classification in 2158) [ClassicSimilarity], result of:
            0.059499815 = score(doc=2158,freq=6.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.3656675 = fieldWeight in 2158, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.046875 = fieldNorm(doc=2158)
          0.041534226 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
            0.041534226 = score(doc=2158,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.23214069 = fieldWeight in 2158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2158)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and to apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in two key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project, we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
    4. 8.2015 19:22:04
  16. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.03
    0.03062186 = product of:
      0.09186558 = sum of:
        0.09186558 = sum of:
          0.057253722 = weight(_text_:classification in 1107) [ClassicSimilarity], result of:
            0.057253722 = score(doc=1107,freq=8.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.35186368 = fieldWeight in 1107, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1107)
          0.034611855 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
            0.034611855 = score(doc=1107,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.19345059 = fieldWeight in 1107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1107)
      0.33333334 = coord(1/3)
    
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
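    A generic sketch of the passage idea described above (not PETC itself): the text is split into passages, each passage is scored by an SVM trained on aspect-labelled snippets, and the best-matching passage for the aspect of interest is kept. The snippets and labels below are invented.
      # Split into passages, score each with an SVM, keep the passage that best
      # matches the aspect of interest. Generic stand-in, not the PETC technique.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import LinearSVC

      train_texts = ["antibiotic therapy for ten days", "vaccination prevents infection",
                     "fever and cough are common", "surgical treatment of the lesion"]
      train_labels = ["treatment", "prevention", "symptoms", "treatment"]

      vec = TfidfVectorizer()
      clf = LinearSVC().fit(vec.fit_transform(train_texts), train_labels)

      document = ("The patient reported fever and persistent cough. "
                  "Antibiotic therapy was started and continued for ten days.")
      passages = document.split(". ")                    # naive passage splitting
      scores = clf.decision_function(vec.transform(passages))

      aspect = list(clf.classes_).index("treatment")
      best = max(range(len(passages)), key=lambda i: scores[i][aspect])
      print(passages[best])                              # the therapy passage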
    Date
    28.10.2013 19:22:57
  17. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.03
    0.025295487 = product of:
      0.07588646 = sum of:
        0.07588646 = sum of:
          0.034352235 = weight(_text_:classification in 3051) [ClassicSimilarity], result of:
            0.034352235 = score(doc=3051,freq=2.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.21111822 = fieldWeight in 3051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.046875 = fieldNorm(doc=3051)
          0.041534226 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
            0.041534226 = score(doc=3051,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.23214069 = fieldWeight in 3051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3051)
      0.33333334 = coord(1/3)
    
    Date
    22. 8.2009 19:51:28
    Footnote
    See also the presentations at: http://www.bibliothek.uni-regensburg.de/Systematik/pdf/Anw2008_PPT1.pdf. http://blog.bib.uni-mannheim.de/Classification/wp-content/uploads/2007/10/hu-berlin-2007-2.pdf. Full texts at:
  18. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.03
    0.025032118 = product of:
      0.075096354 = sum of:
        0.075096354 = sum of:
          0.040484495 = weight(_text_:classification in 2765) [ClassicSimilarity], result of:
            0.040484495 = score(doc=2765,freq=4.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.24880521 = fieldWeight in 2765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2765)
          0.034611855 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
            0.034611855 = score(doc=2765,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.19345059 = fieldWeight in 2765, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2765)
      0.33333334 = coord(1/3)
    
    Abstract
    Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into passages of a document. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP statistically significantly outperforms (99% confidence) the other document-splitting approaches by 12% to 18% in the passage-detection and passage category-prediction tasks. Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
  19. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.02
    0.024697494 = product of:
      0.07409248 = sum of:
        0.07409248 = weight(_text_:bibliographic in 977) [ClassicSimilarity], result of:
          0.07409248 = score(doc=977,freq=6.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.3724989 = fieldWeight in 977, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=977)
      0.33333334 = coord(1/3)
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
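    The suggestion step described above can be approximated, very roughly, by TF-IDF similarity between a new record and already indexed records, carrying over the headings of the nearest neighbours. This is in spirit close to what a TF-IDF backend does, but the sketch below is not Annif, and the records and headings are invented placeholders.
      # Suggest subject headings for a new record from its TF-IDF-most-similar
      # training records. Records and headings (cf. MARC 650$a) are invented.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      records = ["introduction to machine learning with python",
                 "cataloguing rules for bibliographic description",
                 "deep neural networks for image recognition"]
      headings = ["Machine learning", "Descriptive cataloging",
                  "Neural networks (Computer science)"]

      vec = TfidfVectorizer()
      X = vec.fit_transform(records)

      def suggest(text, top_n=2):
          sims = cosine_similarity(vec.transform([text]), X)[0]
          best = sims.argsort()[::-1][:top_n]
          return [(headings[i], round(float(sims[i]), 2)) for i in best if sims[i] > 0]

      print(suggest("training neural networks by machine learning"))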
  20. Sparck Jones, K.: Automatic classification (1976) 0.02
    0.02159173 = product of:
      0.06477519 = sum of:
        0.06477519 = product of:
          0.12955038 = sum of:
            0.12955038 = weight(_text_:classification in 2908) [ClassicSimilarity], result of:
              0.12955038 = score(doc=2908,freq=4.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.7961767 = fieldWeight in 2908, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.125 = fieldNorm(doc=2908)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Classification in the 1970s: a second look. Rev. ed. Ed.: A. Maltby

Languages

  • e 136
  • d 7
  • a 1

Types

  • a 128
  • el 20
  • s 2
  • m 1
  • r 1
  • x 1