Search (201 results, page 1 of 11)

  • Filter: theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.065768644 = sum of:
      0.04781587 = product of:
        0.19126348 = sum of:
          0.19126348 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.19126348 = score(doc=562,freq=2.0), product of:
              0.34031555 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.040140964 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.25 = coord(1/4)
      0.0070756786 = weight(_text_:a in 562) [ClassicSimilarity], result of:
        0.0070756786 = score(doc=562,freq=8.0), product of:
          0.04628442 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.040140964 = queryNorm
          0.15287387 = fieldWeight in 562, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
      0.010877093 = product of:
        0.032631278 = sum of:
          0.032631278 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.032631278 = score(doc=562,freq=2.0), product of:
              0.14056681 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.040140964 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Type
    a
  2. Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.03
    0.025802074 = product of:
      0.03870311 = sum of:
        0.009434237 = weight(_text_:a in 5169) [ClassicSimilarity], result of:
          0.009434237 = score(doc=5169,freq=2.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.20383182 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=5169)
        0.029268874 = product of:
          0.08780662 = sum of:
            0.08780662 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
              0.08780662 = score(doc=5169,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.6218451 = fieldWeight in 5169, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.125 = fieldNorm(doc=5169)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Source
    Nachrichten für Dokumentation. 29(1978), S.92-96
    Type
    a
  3. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02
    0.01921991 = product of:
      0.028829865 = sum of:
        0.0070756786 = weight(_text_:a in 1046) [ClassicSimilarity], result of:
          0.0070756786 = score(doc=1046,freq=2.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.15287387 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
        0.021754187 = product of:
          0.065262556 = sum of:
            0.065262556 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.065262556 = score(doc=1046,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Date
    5. 5.2003 14:17:22
    Type
    a
  4. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.02
    0.01671492 = product of:
      0.025072377 = sum of:
        0.0123824375 = weight(_text_:a in 5273) [ClassicSimilarity], result of:
          0.0123824375 = score(doc=5273,freq=18.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.26752928 = fieldWeight in 5273, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
        0.012689941 = product of:
          0.038069822 = sum of:
            0.038069822 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.038069822 = score(doc=5273,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    In text categorization tasks, classification on a class hierarchy often yields better results than classification without the hierarchy. Currently, because a large number of documents are divided into several subgroups in a hierarchy, we can appropriately use a hierarchical classification method. However, we have no systematic method to build a hierarchical classification system that performs well with large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels.
    Date
    22. 7.2006 16:24:52
    Type
    a
  5. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02
    0.016016591 = product of:
      0.024024887 = sum of:
        0.0058963983 = weight(_text_:a in 2748) [ClassicSimilarity], result of:
          0.0058963983 = score(doc=2748,freq=2.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.12739488 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
        0.018128488 = product of:
          0.054385465 = sum of:
            0.054385465 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.054385465 = score(doc=2748,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Date
    1. 2.2016 18:25:22
    Type
    a
  6. Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.02
    0.015139677 = product of:
      0.022709515 = sum of:
        0.011733686 = weight(_text_:a in 1566) [ClassicSimilarity], result of:
          0.011733686 = score(doc=1566,freq=22.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.25351265 = fieldWeight in 1566, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
        0.010975828 = product of:
          0.032927483 = sum of:
            0.032927483 = weight(_text_:29 in 1566) [ClassicSimilarity], result of:
              0.032927483 = score(doc=1566,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.23319192 = fieldWeight in 1566, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1566)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77% and 60%, respectively. The third experiment employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.
    Source
    Journal of information science. 29(2003) no.2, S.117-126
    Type
    a
  7. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.02
    0.015073854 = product of:
      0.02261078 = sum of:
        0.011733686 = weight(_text_:a in 2158) [ClassicSimilarity], result of:
          0.011733686 = score(doc=2158,freq=22.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.25351265 = fieldWeight in 2158, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
        0.010877093 = product of:
          0.032631278 = sum of:
            0.032631278 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.032631278 = score(doc=2158,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
    4. 8.2015 19:22:04
    Type
    a
  8. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    0.014612844 = product of:
      0.021919265 = sum of:
        0.009229324 = weight(_text_:a in 1673) [ClassicSimilarity], result of:
          0.009229324 = score(doc=1673,freq=10.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.19940455 = fieldWeight in 1673, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.012689941 = product of:
          0.038069822 = sum of:
            0.038069822 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.038069822 = score(doc=1673,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia; see also: http://www7.scu.edu.au/programme/posters/1846/com1846.htm.
    Type
    a
  9. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    0.014612844 = product of:
      0.021919265 = sum of:
        0.009229324 = weight(_text_:a in 2560) [ClassicSimilarity], result of:
          0.009229324 = score(doc=2560,freq=10.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.19940455 = fieldWeight in 2560, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
        0.012689941 = product of:
          0.038069822 = sum of:
            0.038069822 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.038069822 = score(doc=2560,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    The proliferation of digital resources and their integration into a traditional library setting has created a pressing need for an automated tool that organizes textual information based on library classification schemes. Automated text classification is a research field of developing tools, methods, and models to automate text classification. This article describes the current popular approach for text classification and major text classification projects and applications that are based on library classification schemes. Related issues and challenges are discussed, and a number of considerations for the challenges are examined.
    Date
    22. 9.2008 18:31:54
    Type
    a
  10. Savic, D.: Designing an expert system for classifying office documents (1994) 0.01
    0.014203634 = product of:
      0.021305451 = sum of:
        0.006671014 = weight(_text_:a in 2655) [ClassicSimilarity], result of:
          0.006671014 = score(doc=2655,freq=4.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.14413087 = fieldWeight in 2655, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2655)
        0.014634437 = product of:
          0.04390331 = sum of:
            0.04390331 = weight(_text_:29 in 2655) [ClassicSimilarity], result of:
              0.04390331 = score(doc=2655,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.31092256 = fieldWeight in 2655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2655)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Can records management benefit from artificial intelligence technology, in particular from expert systems? Gives an answer to this question by showing an example of a small-scale prototype project in automatic classification of office documents. Project methodology and basic elements of an expert system's approach are elaborated to give guidelines to potential users of this promising technology.
    Source
    Records management quarterly. 28(1994) no.3, S.20-29
    Type
    a
  11. Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.01
    0.01404006 = product of:
      0.02106009 = sum of:
        0.008254958 = weight(_text_:a in 1661) [ClassicSimilarity], result of:
          0.008254958 = score(doc=1661,freq=8.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.17835285 = fieldWeight in 1661, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
        0.012805132 = product of:
          0.038415395 = sum of:
            0.038415395 = weight(_text_:29 in 1661) [ClassicSimilarity], result of:
              0.038415395 = score(doc=1661,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.27205724 = fieldWeight in 1661, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1661)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Proposes the use of parallel computing systems to overcome the computationally intensive clustering process. Examines 2 operations: clustering a document set and classifying the document set. Uses a subset of the TIPSTER corpus, specifically, articles from the Wall Street Journal. Document set classification was performed without the large storage requirements for ancillary data matrices. The time performance of the parallel systems was an improvement over sequential system times, and produced the same clustering and classification scheme. Results show near-linear speed-up in higher threshold clustering applications.
    Date
    29. 7.1998 17:45:02
    Type
    a
  12. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
    0.01404006 = product of:
      0.02106009 = sum of:
        0.008254958 = weight(_text_:a in 1595) [ClassicSimilarity], result of:
          0.008254958 = score(doc=1595,freq=8.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.17835285 = fieldWeight in 1595, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1595)
        0.012805132 = product of:
          0.038415395 = sum of:
            0.038415395 = weight(_text_:29 in 1595) [ClassicSimilarity], result of:
              0.038415395 = score(doc=1595,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.27205724 = fieldWeight in 1595, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm that learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
    Date
    11. 5.2003 18:29:44
    Type
    a
  13. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01
    0.013491558 = product of:
      0.020237336 = sum of:
        0.009360243 = weight(_text_:a in 2760) [ClassicSimilarity], result of:
          0.009360243 = score(doc=2760,freq=14.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.20223314 = fieldWeight in 2760, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
        0.010877093 = product of:
          0.032631278 = sum of:
            0.032631278 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
              0.032631278 = score(doc=2760,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.23214069 = fieldWeight in 2760, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2760)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
    Date
    22. 3.2009 19:11:54
    Type
    a
  14. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.01
    0.013302757 = product of:
      0.019954136 = sum of:
        0.0071490034 = weight(_text_:a in 2219) [ClassicSimilarity], result of:
          0.0071490034 = score(doc=2219,freq=6.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.1544581 = fieldWeight in 2219, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
        0.012805132 = product of:
          0.038415395 = sum of:
            0.038415395 = weight(_text_:29 in 2219) [ClassicSimilarity], result of:
              0.038415395 = score(doc=2219,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.27205724 = fieldWeight in 2219, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2219)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing this increasing amount of incoming and outgoing mail, in particular its classification, is time-consuming and expensive. More and more organizations are seeking a solution for meeting this challenge by designing computer-based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus is also placed on the application of artificial intelligence.
    Source
    Records management quarterly. 29(1995) no.4, S.3-18
    Type
    a
  15. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01
    0.012851405 = product of:
      0.019277107 = sum of:
        0.010212863 = weight(_text_:a in 1107) [ClassicSimilarity], result of:
          0.010212863 = score(doc=1107,freq=24.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.22065444 = fieldWeight in 1107, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
        0.009064244 = product of:
          0.027192732 = sum of:
            0.027192732 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.027192732 = score(doc=1107,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
    Type
    a
  16. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.01
    0.012616396 = product of:
      0.018924594 = sum of:
        0.009778071 = weight(_text_:a in 1070) [ClassicSimilarity], result of:
          0.009778071 = score(doc=1070,freq=22.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.21126054 = fieldWeight in 1070, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
        0.009146524 = product of:
          0.02743957 = sum of:
            0.02743957 = weight(_text_:29 in 1070) [ClassicSimilarity], result of:
              0.02743957 = score(doc=1070,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.19432661 = fieldWeight in 1070, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1070)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes the use of Web pages linked with the home page in a different manner from the sole use of home pages in previous research. To implement our proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection chooses several representative Web pages using connectivity analysis. The k-NN classifier next classifies each of the selected Web pages. Finally, the classified Web pages are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also reform its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the performance of micro-averaging breakeven point by 30.02%, compared with an ordinary classification which uses a home page only.
    Date
    27.12.2007 17:32:29
    Type
    a
  17. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.01
    0.012591119 = product of:
      0.018886678 = sum of:
        0.0079108495 = weight(_text_:a in 3464) [ClassicSimilarity], result of:
          0.0079108495 = score(doc=3464,freq=10.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.1709182 = fieldWeight in 3464, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
        0.010975828 = product of:
          0.032927483 = sum of:
            0.032927483 = weight(_text_:29 in 3464) [ClassicSimilarity], result of:
              0.032927483 = score(doc=3464,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.23319192 = fieldWeight in 3464, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Date
    1. 6.2010 9:29:57
    Type
    a
  18. Dubin, D.: Dimensions and discriminability (1998) 0.01
    0.012351385 = product of:
      0.018527078 = sum of:
        0.005837137 = weight(_text_:a in 2338) [ClassicSimilarity], result of:
          0.005837137 = score(doc=2338,freq=4.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.12611452 = fieldWeight in 2338, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.012689941 = product of:
          0.038069822 = sum of:
            0.038069822 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.038069822 = score(doc=2338,freq=2.0), product of:
                0.14056681 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040140964 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined
    Date
    22. 9.1997 19:16:05
    Type
    a
  19. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.01
    0.011994082 = product of:
      0.017991122 = sum of:
        0.008844598 = weight(_text_:a in 5172) [ClassicSimilarity], result of:
          0.008844598 = score(doc=5172,freq=18.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.19109234 = fieldWeight in 5172, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5172)
        0.009146524 = product of:
          0.02743957 = sum of:
            0.02743957 = weight(_text_:29 in 5172) [ClassicSimilarity], result of:
              0.02743957 = score(doc=5172,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.19432661 = fieldWeight in 5172, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5172)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    In this issue Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by creating classifiers which learn from training sets of manually coded answers. The manual effort required is only that of classifying a representative set of documents, not creating a dictionary of words that trigger an assignment. They use a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multi-class support vector machine learner from Hsu and Lin's BSVM package, both examples of text categorization techniques. Data from the 1996 General Social Survey by the U.S. National Opinion Research Center provided a set of answers to three questions (previously tested by Viechnicki using a dictionary approach), their associated manually assigned category codes, and a complete set of predefined category codes. The learners were run on three random disjoint subsets of the answer sets to create the classifiers, and a remaining set was used as a test set. The dictionary approach is outperformed by 18% for RAINBOW and by 17% for BSVM, while the standard deviation of the results is reduced by 28% and 34%, respectively, over the dictionary approach.
    Date
    9. 7.2006 10:29:12
    Type
    a
  20. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01
    0.011656861 = product of:
      0.01748529 = sum of:
        0.008338767 = weight(_text_:a in 2300) [ClassicSimilarity], result of:
          0.008338767 = score(doc=2300,freq=16.0), product of:
            0.04628442 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040140964 = queryNorm
            0.18016359 = fieldWeight in 2300, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
        0.009146524 = product of:
          0.02743957 = sum of:
            0.02743957 = weight(_text_:29 in 2300) [ClassicSimilarity], result of:
              0.02743957 = score(doc=2300,freq=2.0), product of:
                0.14120336 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.040140964 = queryNorm
                0.19432661 = fieldWeight in 2300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2300)
          0.33333334 = coord(1/3)
      0.6666667 = coord(2/3)
    
    Abstract
    Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools of Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, domain analysis. The gold standard is built based on input from at least two catalogue librarians, end-users expert in the subject, end users inexperienced in the subject and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
    Type
    a

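The nested figures printed under each result above are Lucene's explain output for the ClassicSimilarity (TF-IDF) ranking that produced the relevance scores. As a minimal sketch, the following Python snippet re-derives the first leaf of entry 1, weight(_text_:3a in 562), from the components shown there, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); it only recomputes numbers already listed in the explain tree.

    import math

    max_docs   = 44218        # maxDocs from the explain output
    doc_freq   = 24           # docFreq of the term "3a"
    freq       = 2.0          # termFreq of "3a" in document 562
    query_norm = 0.040140964  # queryNorm
    field_norm = 0.046875     # fieldNorm(doc=562)

    tf  = math.sqrt(freq)                            # 1.4142135
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 8.478011

    query_weight = idf * query_norm                  # 0.34031555
    field_weight = tf * idf * field_norm             # 0.56201804
    weight       = query_weight * field_weight       # 0.19126348

    # coord(1/4): only one of the four query clauses matched this document.
    print(weight, weight * 0.25)                     # ~0.19126348, ~0.04781587

A document's total score is the sum of such coord-scaled clause contributions, e.g. 0.04781587 + 0.0070756786 + 0.010877093 = 0.065768644 for entry 1.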
Languages

  • e 167
  • d 31
  • a 1
  • chi 1

Types

  • a 178
  • el 26
  • r 2
  • x 2
  • m 1
  • s 1