Search (89 results, page 1 of 5)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.33
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  2. Sebastiani, F.: Classification of text, automatic (2006) 0.02
    
    Date
    17. 5.2006 20:45:26
    Source
    Encyclopedia of language and linguistics. 2nd ed. Ed.: K. Brown. Vol. 14
  3. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.02
    
    Date
    22. 9.2008 18:31:54
    Source
    International cataloguing and bibliographic control. 36(2007) no.4, S.78-82
  4. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.02
    
    Abstract
    Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable for five of the measures, and that JDI is superior for one. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI variant that associates JDs with MeSH indexing rather than textwords; it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF might be combined and evaluated as complementary approaches.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2530-2539
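    The comparison above hinges on ranked-retrieval measures taken from trec_eval. As a rough illustration of how one such measure works, the sketch below computes mean average precision for two hypothetical systems against a tiny gold standard; all document IDs and category names are invented, not taken from the paper.

```python
# A minimal sketch (not the paper's actual evaluation harness) of one
# trec_eval-style measure: mean average precision of ranked category
# assignments against a gold standard. All names and data are illustrative.

def average_precision(ranked_categories, gold):
    """Average precision of one document's ranked category list."""
    hits, precision_sum = 0, 0.0
    for rank, cat in enumerate(ranked_categories, start=1):
        if cat in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold) if gold else 0.0

def mean_average_precision(system_output, gold_standard):
    """system_output maps doc id -> ranked category list."""
    scores = [average_precision(system_output[doc], gold)
              for doc, gold in gold_standard.items()]
    return sum(scores) / len(scores)

# Toy comparison of a rule-based and a statistical system on two documents.
gold = {"d1": {"Cardiology"}, "d2": {"Genetics", "Oncology"}}
rule_based = {"d1": ["Cardiology", "Surgery"], "d2": ["Oncology", "Surgery"]}
statistical = {"d1": ["Surgery", "Cardiology"], "d2": ["Genetics", "Oncology"]}
print(mean_average_precision(rule_based, gold))   # 0.75
print(mean_average_precision(statistical, gold))  # 0.75
```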
  5. Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.01
    
    Abstract
    In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. Associative classifiers have many favorable characteristics, such as rapid training, good classification accuracy, and excellent interpretability. However, associative classifiers also have some obstacles to overcome when they are applied to text classification. The target text collection generally has very high dimensionality, so the training process can take a very long time. We propose a feature selection method based on the mutual information between the word and class variables to reduce the dimensionality of the feature space. In addition, the training process of an associative classifier produces a huge number of classification rules, which makes prediction on a new document inefficient. We resolve this by introducing a new, efficient method for storing and pruning classification rules, which can also be used when predicting a test document. Experimental results on the 20-newsgroups dataset show many benefits of associative classification, in both training and prediction, when applied to a real-world problem.
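    The mutual-information feature selection this abstract describes is easiest to see in miniature. The sketch below scores each word by its mutual information with the class variable over binary occurrence and keeps the top-scoring words; the toy corpus, labels, and k are illustrative assumptions, not the paper's data.

```python
# Sketch of mutual-information feature selection over binary word
# occurrence: score each word by I(W; C) and keep the top-k words.
# The toy corpus, labels, and k are illustrative assumptions.
import math
from collections import Counter

def mutual_information(docs, labels, vocab):
    """I(W;C) per word, with W = word present/absent and C = class."""
    n = len(docs)
    class_counts = Counter(labels)
    mi = {}
    for w in vocab:
        score = 0.0
        for present in (True, False):
            n_w = sum((w in d) == present for d in docs)
            for c in class_counts:
                n_wc = sum((w in d) == present and lab == c
                           for d, lab in zip(docs, labels))
                if n_wc == 0:
                    continue
                p_joint = n_wc / n
                score += p_joint * math.log2(
                    p_joint / ((n_w / n) * (class_counts[c] / n)))
        mi[w] = score
    return mi

docs = [{"football", "goal"}, {"goal", "match"},
        {"election", "vote"}, {"vote", "match"}]
labels = ["sport", "sport", "politics", "politics"]
vocab = set().union(*docs)
scores = mutual_information(docs, labels, vocab)
top2 = sorted(vocab, key=scores.get, reverse=True)[:2]
print(top2)  # 'goal' and 'vote': the class-discriminating words
```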
  6. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.01
    
    Abstract
    The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge engineering: manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are very good effectiveness, considerable savings in expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.
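    The inductive paradigm the tutorial describes reduces, in its simplest modern form, to fitting a classifier on pre-labelled documents. The sketch below assumes scikit-learn (which postdates the tutorial) and an invented four-document corpus.

```python
# A minimal sketch of the inductive approach: learn a classifier from
# previously classified documents instead of hand-written rules.
# scikit-learn and the tiny corpus are assumptions for illustration;
# the tutorial itself is library-agnostic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = [
    "stocks fell sharply on monday",            # finance
    "the central bank raised interest rates",   # finance
    "the striker scored twice in the final",    # sport
    "the team won the championship match",      # sport
]
train_labels = ["finance", "finance", "sport", "sport"]

# Document indexing (tf-idf weighting) followed by classifier construction.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_labels)

print(model.predict(["bank shares rose after the rates decision"]))
# likely: ['finance']
```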
  7. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.01
    
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
    Date
    26. 9.2006 18:02:28
    26. 9.2006 18:20:10
  8. Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.01
    
    Abstract
    In current library practice, trained human experts usually carry out document cataloguing and indexing manually. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using a manual approach alone. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years, but applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered in the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system that enhances current library practice. Moreover, we make some concrete recommendations on how to practically apply the KNN algorithm to automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.
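    A minimal sketch of the k-nearest-neighbours idea the study recommends: a new item receives the majority class of its k most similar catalogued items under cosine similarity. The LCC-style class codes and the five-record catalogue are illustrative assumptions, not the paper's data.

```python
# KNN classification sketch: assign a new item the majority class of
# its k most similar catalogued items under cosine similarity.
# "QC" (physics) and "QD" (chemistry) are LCC-style codes used purely
# for illustration.
import math
from collections import Counter

def bag(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_classify(item, catalogue, k=3):
    neighbours = sorted(catalogue, key=lambda rec: cosine(item, rec[0]),
                        reverse=True)[:k]
    return Counter(cls for _, cls in neighbours).most_common(1)[0][0]

catalogue = [(bag("introduction to quantum mechanics"), "QC"),
             (bag("classical mechanics and dynamics"), "QC"),
             (bag("principles of organic chemistry"), "QD"),
             (bag("inorganic chemistry laboratory manual"), "QD"),
             (bag("modern physics and quantum theory"), "QC")]

print(knn_classify(bag("quantum theory of fields"), catalogue))  # QC
```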
  9. Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.01
    
    Abstract
    Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology: rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
  10. Mu, T.; Goulermas, J.Y.; Korkontzelos, I.; Ananiadou, S.: Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities (2016) 0.01
    
    Abstract
    Descriptive document clustering aims at discovering clusters of semantically interrelated documents together with meaningful labels that summarize the content of each document cluster. In this work, we propose a novel descriptive clustering framework, referred to as CEDL. It relies on the formulation and generation of two types of heterogeneous objects, corresponding to documents and candidate phrases, using multilevel similarity information. CEDL is composed of five main processing stages. First, it simultaneously maps the documents and candidate phrases into a common co-embedded space that preserves higher-order, neighbor-based proximities between the combined sets of documents and phrases. Then, it discovers an approximate cluster structure of documents in the common space. The third stage extracts promising topic phrases by constructing a discriminant model in which documents, along with their cluster memberships, are used as training instances. Subsequently, the final cluster labels are selected from the topic phrases by a ranking scheme that combines multiple scores based on the extracted co-embedding information and the discriminant output. The final stage polishes the initial clusters to reduce noise and accommodate the multi-topic nature of documents. The effectiveness and competitiveness of CEDL is demonstrated both qualitatively and quantitatively through experiments on document databases from different application fields.
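    The following is not CEDL itself, only a baseline sketch of the descriptive-clustering task the abstract defines: cluster the documents, then label each cluster, here simply with the top-weighted terms of its tf-idf centroid. Library choice, corpus, and cluster count are assumptions.

```python
# Baseline illustration of descriptive clustering (not the paper's CEDL
# method): cluster documents with k-means over tf-idf vectors, then
# label each cluster with its highest-weighted centroid terms.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["deep neural networks for image recognition",
        "convolutional networks classify images",
        "interest rates and central bank policy",
        "bank lending and monetary policy"]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:3]  # top-3 centroid terms
    print(c, [terms[i] for i in top])
# e.g. one cluster labelled with network/image terms, the other with
# bank/policy terms
```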
  11. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    
    Date
    22. 7.2006 16:24:52
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.3, S.431-442
  12. Rooney, N.; Patterson, D.; Galushka, M.; Dobrynin, V.; Smirnova, E.: ¬An investigation into the stability of contextual document clustering (2008) 0.01
    
    Date
    9. 2.2008 16:39:26
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.2, S.256-266
  13. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.00
    
    Source
    Internet world and document delivery world international 94: Proceedings of the 2nd Annual Conference, London, May 1994
  14. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00
    
    Date
    22. 3.2009 19:11:54
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.803-813
  15. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.00
    
    Date
    23. 3.2013 13:22:36
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.4, S.844-860
  16. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.00
    
    Date
    22. 3.2009 19:14:43
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.814-825
  17. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.00
    
    Date
    28.10.2013 19:22:57
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.11, S.2265-2277
  18. Kleinoeder, H.H.; Puzicha, J.: Automatische Katalogisierung am Beispiel einer Pilotanwendung [Automatic cataloguing: the example of a pilot application] (2002) 0.00
    
    Date
    26. 2.1996 17:51:49
  19. Dubin, D.: Dimensions and discriminability (1998) 0.00
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  20. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.00
    
    Source
    Proceedings of the 2nd International Workshop on Semantic Digital Archives held in conjunction with the 16th Int. Conference on Theory and Practice of Digital Libraries (TPDL) on September 27, 2012 in Paphos, Cyprus [http://ceur-ws.org/Vol-912/proceedings.pdf]. Eds.: A. Mitschik et al

Languages

  • e 80
  • d 9

Types

  • a 79
  • el 8
  • m 2
  • s 2
  • r 1
  • x 1