Search (45 results, page 1 of 3)

  • year_i:[2000 TO 2010}
  • theme_ss:"Automatisches Klassifizieren"
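
Each hit below carries a relevance score at the end of its heading (e.g. 0.36 for the first hit), computed by Lucene's ClassicSimilarity ranking. The following minimal Python sketch shows how a single query term contributes to such a score, using the tf, idf, queryNorm, and fieldNorm values the engine reports for the first hit; treating this installation's similarity as an unmodified ClassicSimilarity is an assumption.

    import math

    # Lucene ClassicSimilarity: a matching term contributes
    #   queryWeight * fieldWeight,
    # where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm.
    def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
        tf = math.sqrt(freq)                               # tf(freq)
        idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # idf(docFreq, maxDocs)
        return (idf * query_norm) * (tf * idf * field_norm)

    # Values reported for the first hit: freq=2.0, docFreq=24, maxDocs=44218,
    # queryNorm=0.046827413, fieldNorm=0.046875
    print(classic_term_score(2.0, 24, 44218, 0.046827413, 0.046875))
    # -> ~0.22312303; per-term contributions are then summed, scaled by
    #    coord() (the fraction of query terms that matched), to give the
    #    final document score.
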
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.36
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
  2. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.04
    Abstract
    Given the huge amount of information on the internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
    RSWK
    Data Mining / Kongress / Passau <2000>
    Series
    (Proceedings of the ... annual conference of the Gesellschaft für Klassifikation e.V. ; 24) (Studies in classification, data analysis, and knowledge organization)
    Subject
    Data Mining / Kongress / Passau <2000>
    Theme
    Data Mining
  3. Zhou, G.D.; Zhang, M.; Ji, D.H.; Zhu, Q.M.: Hierarchical learning strategy in semantic relation extraction (2008) 0.03
    Abstract
    This paper proposes a novel hierarchical learning strategy to deal with the data sparseness problem in semantic relation extraction by modeling the commonality among related classes. For each class in the hierarchy, whether manually predefined or automatically clustered, a discriminative function is determined in a top-down way. As an upper-level class normally has many more positive training examples than a lower-level class, its discriminative function can be determined more reliably and can effectively guide the learning of the discriminative function in the lower-level class, which otherwise might suffer from limited training data. In this paper, two classifier learning approaches, i.e. the simple perceptron algorithm and state-of-the-art Support Vector Machines, are applied using the hierarchical learning strategy. Moreover, several kinds of class hierarchies, either manually predefined or automatically clustered, are explored and compared. Evaluation on the ACE RDC 2003 and 2004 corpora shows that the hierarchical learning strategy substantially improves performance on the least frequent and medium-frequent relations.
    Source
    Information processing and management. 44(2008) no.3, S.1008-1021
  4. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.03
    Abstract
    In text categorization tasks, classification over a class hierarchy often gives better results than classification without one. Because a large number of documents are divided into several subgroups in a hierarchy, a hierarchical classification method can be applied appropriately. However, there has been no systematic method for building a hierarchical classification system that performs well with large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels.
    Date
    22. 7.2006 16:24:52
  5. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.03
    Abstract
    In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics for STW (bibliometric and scientometric studies) rely solely on statistical data analysis methods (co-citation analysis, co-word analysis). Such methods usually work on structured databases where the units of analysis (words, keywords) have already been attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has made it necessary to integrate natural language processing (NLP) techniques to first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted di-graphs, which the clustering algorithm CPCL (Classification by Preferential Clustered Link) seeks to reduce in order to produce classes. These classes ideally represent the research topics present in the corpus. The results of the classification are subjected to validation by an expert in STW.
  6. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.03
    Source
    Information processing and management. 40(2004) no.2, S.239-255
    Theme
    Data Mining
  7. Li, T.; Zhu, S.; Ogihara, M.: Text categorization via generalized discriminant analysis (2008) 0.02
    Abstract
    Text categorization is an important research area and has been receiving much attention due to the growth of on-line information and of the Internet. Automated text categorization is generally cast as a multi-class classification problem. Much of the previous work focused on binary document classification problems. Support vector machines (SVMs) excel at binary classification, but the elegant theory behind the large-margin hyperplane cannot be easily extended to multi-class text classification. In addition, training time and scaling are also important concerns. On the other hand, other techniques that naturally extend to multi-class classification are generally not as accurate as SVM. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflect the inherent similarity in the data. While most previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. By using generalized singular value decomposition (GSVD), a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values is identified. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.
    Source
    Information processing and management. 44(2008) no.5, S.1684-1697
  8. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.02
    Abstract
    Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into the passages of a document. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP statistically significantly (99% confidence) outperforms the other document-splitting approaches by 12% to 18% on the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and, finally, the training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
  9. Fong, A.C.M.: Mining a Web citation database for document clustering (2002) 0.02
    Theme
    Data Mining
  10. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.01
    Source
    Information processing and management. 37(2001) no.3, S.459-484
  11. Major, R.L.; Ragsdale, C.T.: An aggregation approach to the classification problem using multiple prediction experts (2000) 0.01
    Source
    Information processing and management. 36(2000) no.4, S.683-696
  12. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.01
    Abstract
    In 2004, the German National Library began to classify title records of the German National Bibliography according to subject groups based on the divisions of the Dewey Decimal Classification (DDC). Since 2006, all titles of the main series of the German National Bibliography are classified in strict compliance with the DDC. On this basis, an enhanced DDC-based search can be realized - e.g., searching the data of the German National Bibliography for title records using number components of synthesized classification numbers or searching for DDC numbers using unclassified title records. This paper gives an account of the current research and development of the DDC-based search. The work is conducted in the VZG project Colibri, which focuses on the automatic analysis of DDC-synthesized numbers and the automatic classification of bibliographic title records.
  13. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.01
    Abstract
    Text classifiers automatically classify documents into appropriate concepts for different applications. Most classification approaches use flat classifiers that treat each concept as independent, even when the concept space is hierarchically structured. In contrast, hierarchical text classification exploits the structural relationships between the concepts. In this article, we explore the effectiveness of hierarchical classification for a large concept hierarchy. Since the quality of the classification depends on the quality and quantity of the training data, we evaluate the use of documents selected from subconcepts to address the sparseness of training data for the top-level classifiers, and the use of document relationships to identify the most representative training documents. By selecting training documents using structural and similarity relationships, we achieve a statistically significant improvement of 39.8% (from 54.5% to 76.2%) in the accuracy of the hierarchical classifier over that of the flat classifier for a large, three-level concept hierarchy.
  14. Brückner, T.; Dambeck, H.: Sortierautomaten : Grundlagen der Textklassifizierung (2003) 0.01
    Theme
    Data Mining
  15. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    Date
    5. 5.2003 14:17:22
  16. Na, J.-C.; Sui, H.; Khoo, C.; Chan, S.; Zhou, Y.: Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews (2004) 0.01
    Abstract
    This paper reports a study in automatic sentiment classification, i.e., automatically classifying documents as expressing positive or negative sentiments/opinions. The study investigates the effectiveness of using SVM (Support Vector Machine) on various text features to classify product reviews into recommended (positive sentiment) and not recommended (negative sentiment). Compared with traditional topical classification, it was hypothesized that syntactic and semantic processing of text would be more important for sentiment classification. In the first part of this study, several different approaches, unigrams (individual words), selected words (such as verbs, adjectives, and adverbs), and words labelled with part-of-speech tags, were investigated. A sample of 1,800 product reviews was retrieved from Review Centre (www.reviewcentre.com) for the study. 1,200 reviews were used for training, and 600 for testing. Using SVM, the baseline unigram approach obtained an accuracy rate of around 76%. The use of selected words obtained a marginally better result of 77.33%. Error analysis suggests various approaches for improving classification accuracy: use of negation phrases, making inferences from superficial words, and solving the problem of comments on parts. The second part of the study, which is in progress, investigates the use of negation phrases through simple linguistic processing to improve classification accuracy. This approach increased the accuracy rate to 79.33%.
  17. Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.01
    Abstract
    In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-the-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
  18. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
    Source
    Information processing and management. 38(2002) no.1, S.79-89
  19. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    Date
    22. 8.2009 12:54:24
  20. Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.01
    Abstract
    This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: the root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique, showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.

Languages

  • e 41
  • d 4

Types

  • a 42
  • el 3
  • m 1
  • s 1