Search (55 results, page 3 of 3)

  • × theme_ss:"Automatisches Klassifizieren"
  • × language_ss:"e"
  1. Golub, K.: Automated subject classification of textual web documents (2006) 0.00
  2. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.00
    Abstract
    A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n²/p) time on p processors rather than the worst-case O(n³/p) time. Furthermore, the O(n²/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
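    The clustering method named in the abstract, group-average hierarchical agglomerative clustering, can be illustrated with a minimal single-process sketch. The Python below is a reference sketch only: the function name, the toy term-frequency data, and the use of cosine similarity are assumptions for illustration, not details from the paper, and this naive merge loop runs in roughly cubic time rather than the parallel O(n²/p) behaviour the authors report.

import numpy as np

def group_average_hac(X, n_clusters=2):
    """Repeatedly merge the two clusters with the highest group-average similarity."""
    # Normalize rows so that dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T                                  # document-document similarity matrix
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Group-average linkage: mean similarity over all cross-cluster pairs.
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]    # merge cluster b into cluster a
        del clusters[b]
    return clusters

if __name__ == "__main__":
    docs = np.array([[2, 0, 1], [3, 0, 0], [0, 2, 2], [0, 3, 1]], dtype=float)
    print(group_average_hac(docs, n_clusters=2))   # -> [[0, 1], [2, 3]]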
  3. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.00
  4. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.00
  5. Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification: An experimental comparison (2010) 0.00
    Abstract
    Hierarchical text classification (HTC) approaches have recently attracted a lot of interest on the part of researchers in human language technology and machine learning, since they have been shown to bring about equal, if not better, classification accuracy with respect to their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: Given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories which are siblings of c in the hierarchy. However, this policy has always been taken for granted and never been subjected to careful scrutiny since first proposed 15 years ago. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BEST LOCAL (k)) is being proposed for the first time in this article. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naïve Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-V2.
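    The default "local" negative-selection policy described above (take as negatives for a category c the training documents that are positive for a sibling of c but not for c itself) is easy to state in code. The sketch below is a hedged illustration: the hierarchy encoding, the function name, and the toy labels are assumptions of this sketch, and the paper's alternative policies, including BEST LOCAL (k), are not reproduced here.

def sibling_negatives(category, parent, labels):
    """Docs that are negative for `category` but positive for at least one sibling.

    parent: dict mapping each category to its parent (None for a root)
    labels: dict mapping a doc id to the set of categories it belongs to
    """
    siblings = {c for c, p in parent.items()
                if p == parent[category] and c != category}
    return {doc for doc, cats in labels.items()
            if category not in cats and cats & siblings}

if __name__ == "__main__":
    parent = {"root": None, "sports": "root", "politics": "root",
              "soccer": "sports", "tennis": "sports"}
    labels = {1: {"sports", "soccer"}, 2: {"sports", "tennis"}, 3: {"politics"}}
    # Under the sibling policy, the negatives for "soccer" are the documents
    # filed under its sibling "tennis" but not under "soccer" itself.
    print(sibling_negatives("soccer", parent, labels))   # -> {2}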
  6. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: A link-bridged topic model for cross-domain document classification (2013) 0.00
  7. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.00
  8. Ribeiro-Neto, B.; Laender, A.H.F.; Lima, L.R.S. de: An experimental study in automatically categorizing medical documents (2001) 0.00
    Date
    29. 9.2001 13:59:42
  9. Chung, Y.M.; Lee, J.Y.: A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.00
    Date
    29. 9.2001 14:01:18
  10. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.00
    Source
    Knowledge organization. 29(2002) nos.3/4, pp.181-197
  11. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.00
    Date
    9. 7.2006 10:29:12
  12. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.00
    Date
    22. 3.2009 19:14:43
  13. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.00
    Date
    28.10.2013 19:22:57
  14. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.00
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro
  15. Borko, H.: Research in computer based classification systems (1985) 0.00
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major area of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
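    The two-step procedure sketched in this selection (term-term similarity from co-occurrence, then factor analysis to derive broad categories) can be outlined in a few lines. The Python below is only an illustration under stated assumptions: the toy term-document matrix, the choice of Pearson correlation, and the use of scikit-learn's FactorAnalysis as the factoring step are this sketch's simplifications, not Borko's actual formula, software, or data.

import numpy as np
from sklearn.decomposition import FactorAnalysis

# Toy term-document matrix: rows are documents, columns are index terms.
terms = ["retrieval", "index", "query", "circuit", "voltage"]
X = np.array([[3, 2, 1, 0, 0],
              [2, 3, 2, 0, 0],
              [1, 2, 3, 0, 0],
              [0, 0, 0, 3, 2],
              [0, 0, 0, 2, 3]], dtype=float)

# Step 1: term-term similarity via a standard correlation coefficient.
term_corr = np.corrcoef(X.T)          # shape: (n_terms, n_terms)
print(np.round(term_corr, 2))

# Step 2: factor analysis over the documents; each factor is read as one broad
# category, and each term is assigned to the factor it loads on most heavily.
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
loadings = fa.components_             # shape: (n_factors, n_terms)
for term, factor in zip(terms, np.abs(loadings).argmax(axis=0)):
    print(f"{term}: category {factor}")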
