Search (16 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Automatisches Klassifizieren"
  • year_i:[2000 TO 2010}
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23
    0.23356806 = product of:
      0.31142408 = sum of:
        0.07317444 = product of:
          0.21952331 = sum of:
            0.21952331 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.21952331 = score(doc=562,freq=2.0), product of:
                0.39059833 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046071928 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.21952331 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.21952331 = score(doc=562,freq=2.0), product of:
            0.39059833 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046071928 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.01872633 = product of:
          0.03745266 = sum of:
            0.03745266 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.03745266 = score(doc=562,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
     Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
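
The score breakdown above is Lucene's "explain" output for ClassicSimilarity (TF-IDF): each matching clause scores queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, and coord factors down-weight partially matching queries. A minimal Python sketch reproducing result 1's total from the values printed in the tree (function and variable names are illustrative, not Lucene API):

```python
import math

def clause_score(freq, idf, query_norm, field_norm):
    """One weight(...) clause: queryWeight * fieldWeight, as in the explain tree."""
    tf = math.sqrt(freq)                  # 1.4142135 for freq=2.0
    query_weight = idf * query_norm       # 8.478011 * 0.046071928 = 0.39059833
    field_weight = tf * idf * field_norm  # 1.4142135 * 8.478011 * 0.046875 = 0.56201804
    return query_weight * field_weight

QUERY_NORM = 0.046071928  # shared by every clause of one query

# The three matching clauses for doc 562:
c_3a = clause_score(2.0, 8.478011, QUERY_NORM, 0.046875) / 3   # coord(1/3) -> 0.07317444
c_2f = clause_score(2.0, 8.478011, QUERY_NORM, 0.046875)       #            -> 0.21952331
c_22 = clause_score(2.0, 3.5018296, QUERY_NORM, 0.046875) / 2  # coord(1/2) -> 0.01872633

print((c_3a + c_2f + c_22) * 3 / 4)  # coord(3/4) -> ~0.23356806
```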
  2. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.009363165 = product of:
      0.03745266 = sum of:
        0.03745266 = product of:
          0.07490532 = sum of:
            0.07490532 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.07490532 = score(doc=1046,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 14:17:22
  3. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.01
    0.0062964396 = product of:
      0.025185758 = sum of:
        0.025185758 = product of:
          0.050371516 = sum of:
            0.050371516 = weight(_text_:design in 1595) [ClassicSimilarity], result of:
              0.050371516 = score(doc=1595,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.29078758 = fieldWeight in 1595, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1595)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
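
The divide-and-conquer design described here can be sketched as a tree of local classifiers, each deciding only among its node's children. A minimal sketch, with hypothetical `route` callables standing in for the paper's trained backpropagation networks:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    """One node of the category hierarchy, with its own local classifier."""
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Maps a document to the key of the child to descend into (None = stop).
    route: Optional[Callable[[str], Optional[str]]] = None

def classify(doc: str, node: Node) -> List[str]:
    """Walk the hierarchy top-down; each node only ranks its own children."""
    path = [node.name]
    while node.route and node.children:
        choice = node.route(doc)
        if choice is None or choice not in node.children:
            break
        node = node.children[choice]
        path.append(node.name)
    return path

# Toy MeSH-like hierarchy; each lambda stands in for a trained local network.
root = Node("Diseases",
            {"cvd": Node("Cardiovascular Diseases"), "onc": Node("Neoplasms")},
            route=lambda d: "cvd" if "heart" in d else "onc")
print(classify("heart failure study", root))  # ['Diseases', 'Cardiovascular Diseases']
```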
  4. Automatic classification research at OCLC (2002) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.04369477 = score(doc=1563,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 9:22:09
  5. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.04369477 = score(doc=5273,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 16:24:52
  6. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    0.0054618465 = product of:
      0.021847386 = sum of:
        0.021847386 = product of:
          0.04369477 = sum of:
            0.04369477 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.04369477 = score(doc=2560,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 9.2008 18:31:54
  7. Prabowo, R.; Jackson, M.; Burden, P.; Knoell, H.-D.: Ontology-based automatic classification for the Web pages : design, implementation and evaluation (2002) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 3383) [ClassicSimilarity], result of:
              0.04317559 = score(doc=3383,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 3383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3383)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  8. Cosh, K.J.; Burns, R.; Daniel, T.: Content clouds : classifying content in Web 2.0 (2008) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2013) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2013,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2013, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2013)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     Purpose - With increasing amounts of user-generated content being produced electronically in the form of wikis, blogs, forums, etc., the purpose of this paper is to investigate a new approach to classifying ad hoc content. Design/methodology/approach - The approach applies natural language processing (NLP) tools to automatically extract the content of some text, visualizing the results in a content cloud. Findings - Content clouds share the visual simplicity of a tag cloud, but display the details of an article at a different level of abstraction, providing a complementary classification. Research limitations/implications - Provides the general approach to creating a content cloud. In the future, the process can be refined and enhanced by further evaluation of results. Further work is also required to better identify closely related articles. Practical implications - Being able to automatically classify the content generated by web users will enable others to find more appropriate content. Originality/value - The approach is original. Other researchers have produced clouds simply by using skiplists to filter unwanted words; this paper improves on that approach by applying appropriate NLP techniques.
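
As a rough contrast with the skiplist approach mentioned above, a content cloud can be approximated by counting terms after stopword filtering; the paper's NLP route would replace this naive tokenizer with proper linguistic extraction (e.g. noun-phrase chunking). A sketch, with a deliberately tiny stopword list:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}  # toy skiplist

def content_cloud(text: str, top_n: int = 20) -> list[tuple[str, int]]:
    """Term frequencies after filtering; the weights drive font size in the cloud."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)
```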
  9. Ozmutlu, S.; Cosar, G.C.: Analyzing the results of automatic new topic identification (2008) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 2604) [ClassicSimilarity], result of:
              0.04317559 = score(doc=2604,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 2604, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2604)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification. Design/methodology/approach - Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations. Findings - The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification. Originality/value - Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom-tailored graphical user interfaces for search engine users.
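
Studies of this kind commonly flag a topic shift when consecutive queries in a session share no terms, sometimes combined with time-interval cues. A minimal sketch of that term-overlap rule; the cleaning list and the overlap criterion are assumptions for illustration, not the authors' exact procedure:

```python
COMMON_WORDS = {"free", "new", "pictures", "download"}  # assumed cleaning list

def terms(query: str) -> set[str]:
    return {w for w in query.lower().split() if w not in COMMON_WORDS}

def label_session(queries: list[str]) -> list[str]:
    """Label each query after the first as a topic 'shift' or 'continuation'."""
    return ["continuation" if terms(prev) & terms(curr) else "shift"
            for prev, curr in zip(queries, queries[1:])]
```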
  10. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00
    0.0046815826 = product of:
      0.01872633 = sum of:
        0.01872633 = product of:
          0.03745266 = sum of:
            0.03745266 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
              0.03745266 = score(doc=2760,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.23214069 = fieldWeight in 2760, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2760)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2009 19:11:54
  11. Golub, K.: Automated subject classification of textual web documents (2006) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 5600) [ClassicSimilarity], result of:
              0.03597966 = score(doc=5600,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 5600, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5600)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     Purpose - To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application, and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics are common to all the approaches; the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain information on how similar tasks are conducted in other communities. Originality/value - To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
  12. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 831) [ClassicSimilarity], result of:
              0.03597966 = score(doc=831,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 831, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=831)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese, where no word boundary information is available in written text. The paper advocates a simple language-modeling-based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machine, and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text, and a segmentation-based approach was compared with a non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word-level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Applying the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification and web search.
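
The language-modeling approach sidesteps word segmentation by scoring character n-grams directly. A minimal sketch of a class-conditional character-bigram model with add-one smoothing, a simplification of the paper's models rather than their implementation:

```python
import math
from collections import Counter, defaultdict

class CharBigramClassifier:
    """Per-class character bigram language model; no word segmentation needed."""
    def __init__(self):
        self.bigrams = defaultdict(Counter)  # class -> bigram counts
        self.chars = defaultdict(Counter)    # class -> character counts
        self.vocab = set()

    def train(self, label: str, text: str) -> None:
        self.vocab.update(text)
        self.chars[label].update(text)
        self.bigrams[label].update(zip(text, text[1:]))

    def log_prob(self, label: str, text: str) -> float:
        v = len(self.vocab)
        return sum(math.log((self.bigrams[label][(a, b)] + 1)   # add-one smoothing
                            / (self.chars[label][a] + v))
                   for a, b in zip(text, text[1:]))

    def classify(self, text: str) -> str:
        return max(self.bigrams, key=lambda c: self.log_prob(c, text))
```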
  13. Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 2532) [ClassicSimilarity], result of:
              0.03597966 = score(doc=2532,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 2532, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2532)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     In current library practice, trained human experts usually carry out document cataloguing and indexing based on a manual approach. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine-learning-based automatic document classification system to alleviate the manual categorization problem encountered in the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations are made regarding how to practically apply the KNN algorithm to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.
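
The KNN recommendation is straightforward to sketch: represent catalogued documents as TF-IDF vectors and assign a new item the majority class (e.g. an LCC class) among its k nearest neighbours by cosine similarity. A library-free sketch; the tokenization and the choice of k are assumptions:

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF vectors over pre-tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def knn_classify(query_vec, vecs, labels, k=3) -> str:
    """Majority class among the k catalogued documents most similar to the query."""
    ranked = sorted(range(len(vecs)), key=lambda i: cosine(query_vec, vecs[i]),
                    reverse=True)
    return Counter(labels[i] for i in ranked[:k]).most_common(1)[0][0]
```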
  14. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 3614) [ClassicSimilarity], result of:
              0.03597966 = score(doc=3614,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 3614, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3614)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
     Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification system for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success in browsing was shown to be correlated with, and dependent on, classification correctness. Research limitations/implications - Further research should address the problem of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); and, when searching for class captions, returning the hierarchical tree expanded around the class in whose caption the search term is found. The need for improvements to classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
  15. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.00
    0.003901319 = product of:
      0.015605276 = sum of:
        0.015605276 = product of:
          0.031210553 = sum of:
            0.031210553 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
              0.031210553 = score(doc=2765,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.19345059 = fieldWeight in 2765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2765)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2009 19:14:43
  16. Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.00
    0.0031210552 = product of:
      0.012484221 = sum of:
        0.012484221 = product of:
          0.024968442 = sum of:
            0.024968442 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
              0.024968442 = score(doc=2741,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.15476047 = fieldWeight in 2741, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2741)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    12. 9.2004 9:56:22