Search (52 results, page 2 of 3)

Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.00

```
0.0044140695 = product of:
  0.017656278 = sum of:
    0.017656278 = product of:
      0.05296883 = sum of:
        0.05296883 = weight(_text_:k in 4132) [ClassicSimilarity], result of:
          0.05296883 = score(doc=4132,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.39440846 = fieldWeight in 4132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.078125 = fieldNorm(doc=4132)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
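
Note on the score breakdowns: every tree on this page follows the same pattern, so the first one (doc 4132) can be recomputed by hand as a check. The sketch below assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and parameter names are illustrative, not part of any library API, and queryNorm, fieldNorm and the coord factors are simply copied from the explain output rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute one leaf of the explain tree and apply the coord factors."""
    tf = math.sqrt(freq)                     # tf(freq=...) line
    idf = classic_idf(doc_freq, max_docs)    # idf(docFreq=..., maxDocs=...) line
    query_weight = idf * query_norm          # queryWeight line
    field_weight = tf * idf * field_norm     # fieldWeight line
    score = query_weight * field_weight      # score(doc=..., freq=...) line
    for coord in coords:                     # coord(1/3), then coord(1/4)
        score *= coord
    return score


# Values copied from the explain tree of doc 4132 (term "k").
score_4132 = classic_score(
    freq=2.0,
    doc_freq=3384,
    max_docs=44218,
    query_norm=0.037621226,
    field_norm=0.078125,
    coords=(1 / 3, 1 / 4),
)
print(f"idf   = {classic_idf(3384, 44218):.6f}")
print(f"score = {score_4132:.10f}")
```

The result should agree with the top line of the tree up to single-precision rounding, since the engine reports 32-bit floats while Python computes in double precision. The same function covers the `_text_:22` trees by passing docFreq=3622 and coords=(1/2, 1/4).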

Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.00
```
0.004369706 = product of:
  0.017478824 = sum of:
    0.017478824 = product of:
      0.052436467 = sum of:
        0.052436467 = weight(_text_:k in 2113) [ClassicSimilarity], result of:
          0.052436467 = score(doc=2113,freq=4.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.39044446 = fieldWeight in 2113, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2113)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

Document clustering is an important tool for organizing and browsing document collections. In real applications, some limited knowledge about the cluster membership of a small number of documents is often available, such as pairs of documents known to belong to the same cluster. This kind of prior knowledge can serve as constraints for the clustering process. We integrate the constraints into the trace formulation of the sum of squared Euclidean distances function of K-means. The combined criterion function is then transformed into a trace maximization problem, which is further optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method achieves better performance than three existing methods.

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00

```
0.0038228673 = product of:
  0.015291469 = sum of:
    0.015291469 = product of:
      0.030582938 = sum of:
        0.030582938 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.030582938 = score(doc=2760,freq=2.0), product of:
            0.13174312 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037621226 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 22. 3.2009 19:11:54

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.00

```
0.0038228673 = product of:
  0.015291469 = sum of:
    0.015291469 = product of:
      0.030582938 = sum of:
        0.030582938 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.030582938 = score(doc=2158,freq=2.0), product of:
            0.13174312 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037621226 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 4. 8.2015 19:22:04

Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.00
```
0.0038226962 = product of:
  0.015290785 = sum of:
    0.015290785 = product of:
      0.045872353 = sum of:
        0.045872353 = weight(_text_:k in 238) [ClassicSimilarity], result of:
          0.045872353 = score(doc=238,freq=6.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.34156775 = fieldWeight in 238, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=238)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants. Inspired by the fields of pragmatics and speech act theory, this typology comprising four top level categories and 13 subcategories represents the typical email triage behaviors of managers in an organizational context. The second study phase was conducted on a corpus of 1,703 messages using email samples of two managers. Using the k-NN (k-nearest neighbor) algorithm, statistical treatments automatically classified the email according to lexical and nonlexical features representative of managers' triage patterns. The automatic classification of email according to the lexicon of the messages was found to be substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. It appears that a message's nonlexical features are also deeply influenced by email pragmatics. Features related to the recipient and the sender were the most relevant for characterizing email.
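
Note on the ranking: this record (doc 238) matches the term "k" six times, yet it ranks below doc 4132 (freq=2.0) and doc 2113 (freq=4.0). The explain trees suggest why: fieldWeight multiplies tf = sqrt(freq) by fieldNorm, and fieldNorm shrinks as the indexed field gets longer, so a long abstract can outweigh a higher term count. The sketch below only recombines numbers copied from the trees above; the names `IDF_K`, `records` and `field_weight` are illustrative assumptions, not taken from the search engine.

```python
import math

# idf of the term "k", copied from the explain output (docFreq=3384, maxDocs=44218).
IDF_K = 3.569778

# doc id -> (termFreq, fieldNorm), copied from the three explain trees.
records = {
    4132: (2.0, 0.078125),
    2113: (4.0, 0.0546875),
    238: (6.0, 0.0390625),
}


def field_weight(freq: float, field_norm: float, idf: float = IDF_K) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm


# Sort documents by fieldWeight, highest first. queryWeight and the coord
# factors are identical for these three records, so this is also their
# relative order in the result list.
ranked = sorted(records, key=lambda doc: field_weight(*records[doc]), reverse=True)

for doc in ranked:
    freq, norm = records[doc]
    print(f"doc={doc:<5} freq={freq:.1f} fieldNorm={norm:<10} fieldWeight={field_weight(freq, norm):.6f}")
```

Because tf grows only with the square root of the frequency, tripling the term count from 2 to 6 raises tf by a factor of about 1.73, while the fieldNorm of doc 238 is half that of doc 4132.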

Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.00

```
0.0035312555 = product of:
  0.014125022 = sum of:
    0.014125022 = product of:
      0.042375065 = sum of:
        0.042375065 = weight(_text_:k in 4088) [ClassicSimilarity], result of:
          0.042375065 = score(doc=4088,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.31552678 = fieldWeight in 4088, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0625 = fieldNorm(doc=4088)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Yi, K.: Challenges in automated classification using library classification schemes (2006) 0.00

```
0.0035312555 = product of:
  0.014125022 = sum of:
    0.014125022 = product of:
      0.042375065 = sum of:
        0.042375065 = weight(_text_:k in 5810) [ClassicSimilarity], result of:
          0.042375065 = score(doc=5810,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.31552678 = fieldWeight in 5810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0625 = fieldNorm(doc=5810)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.00

```
0.003185723 = product of:
  0.012742892 = sum of:
    0.012742892 = product of:
      0.025485784 = sum of:
        0.025485784 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
          0.025485784 = score(doc=2765,freq=2.0), product of:
            0.13174312 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037621226 = queryNorm
            0.19345059 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 22. 3.2009 19:14:43

Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.00

```
0.003185723 = product of:
  0.012742892 = sum of:
    0.012742892 = product of:
      0.025485784 = sum of:
        0.025485784 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.025485784 = score(doc=1107,freq=2.0), product of:
            0.13174312 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037621226 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 28.10.2013 19:22:57

Pong, J.Y.-H.; Kwok, R.C.-W.; Lau, R.Y.-K.; Hao, J.-X.; Wong, P.C.-C.: ¬A comparative study of two automatic document classification methods in a library setting (2008) 0.00
```
0.0031212184 = product of:
  0.012484874 = sum of:
    0.012484874 = product of:
      0.03745462 = sum of:
        0.03745462 = weight(_text_:k in 2532) [ClassicSimilarity], result of:
          0.03745462 = score(doc=2532,freq=4.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.2788889 = fieldWeight in 2532, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2532)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

In current library practice, trained human experts usually carry out document cataloguing and indexing manually. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using a manual approach alone. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations are made on how to apply the KNN algorithm in practice to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.

Han, K.; Rezapour, R.; Nakamura, K.; Devkota, D.; Miller, D.C.; Diesner, J.: ¬An expert-in-the-loop method for domain-specific document categorization based on small training data (2023) 0.00

```
0.0031212184 = product of:
  0.012484874 = sum of:
    0.012484874 = product of:
      0.03745462 = sum of:
        0.03745462 = weight(_text_:k in 967) [ClassicSimilarity], result of:
          0.03745462 = score(doc=967,freq=4.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.2788889 = fieldWeight in 967, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Yang, Y.; Liu, X.: ¬A re-examination of text categorization methods (1999) 0.00
```
0.0030898487 = product of:
  0.012359395 = sum of:
    0.012359395 = product of:
      0.037078183 = sum of:
        0.037078183 = weight(_text_:k in 3386) [ClassicSimilarity], result of:
          0.037078183 = score(doc=3386,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.27608594 = fieldWeight in 3386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3386)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and on their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (less than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).

Sebastiani, F.: Classification of text, automatic (2006) 0.00

```
0.0030898487 = product of:
  0.012359395 = sum of:
    0.012359395 = product of:
      0.037078183 = sum of:
        0.037078183 = weight(_text_:k in 5003) [ClassicSimilarity], result of:
          0.037078183 = score(doc=5003,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.27608594 = fieldWeight in 5003, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5003)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Source: Encyclopedia of language and linguistics. 2nd ed. Ed.: K. Brown. Vol. 14

Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.00
```
0.0026484418 = product of:
  0.010593767 = sum of:
    0.010593767 = product of:
      0.0317813 = sum of:
        0.0317813 = weight(_text_:k in 1566) [ClassicSimilarity], result of:
          0.0317813 = score(doc=1566,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.23664509 = fieldWeight in 1566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77% and 60%, respectively. The third experiment, employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting, achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.

Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.00

```
0.0026484418 = product of:
  0.010593767 = sum of:
    0.010593767 = product of:
      0.0317813 = sum of:
        0.0317813 = weight(_text_:k in 1808) [ClassicSimilarity], result of:
          0.0317813 = score(doc=1808,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.23664509 = fieldWeight in 1808, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=1808)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 0.00

```
0.0026484418 = product of:
  0.010593767 = sum of:
    0.010593767 = product of:
      0.0317813 = sum of:
        0.0317813 = weight(_text_:k in 5897) [ClassicSimilarity], result of:
          0.0317813 = score(doc=5897,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.23664509 = fieldWeight in 5897, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=5897)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.00

```
0.0026484418 = product of:
  0.010593767 = sum of:
    0.010593767 = product of:
      0.0317813 = sum of:
        0.0317813 = weight(_text_:k in 1168) [ClassicSimilarity], result of:
          0.0317813 = score(doc=1168,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.23664509 = fieldWeight in 1168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=1168)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.00

```
0.0026484418 = product of:
  0.010593767 = sum of:
    0.010593767 = product of:
      0.0317813 = sum of:
        0.0317813 = weight(_text_:k in 1461) [ClassicSimilarity], result of:
          0.0317813 = score(doc=1461,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.23664509 = fieldWeight in 1461, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=1461)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.00

```
0.0026484418 = product of:
  0.010593767 = sum of:
    0.010593767 = product of:
      0.0317813 = sum of:
        0.0317813 = weight(_text_:k in 2166) [ClassicSimilarity], result of:
          0.0317813 = score(doc=2166,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.23664509 = fieldWeight in 2166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=2166)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Source: New perspectives on subject indexing and classification: essays in honour of Magda Heiner-Freiling. Ed.: K. Knull-Schlomann et al.

Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 0.00

```
0.0026484418 = product of:
  0.010593767 = sum of:
    0.010593767 = product of:
      0.0317813 = sum of:
        0.0317813 = weight(_text_:k in 4558) [ClassicSimilarity], result of:
          0.0317813 = score(doc=4558,freq=2.0), product of:
            0.13429943 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.037621226 = queryNorm
            0.23664509 = fieldWeight in 4558, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=4558)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
