Search (44 results, page 2 of 3)

  • × theme_ss:"Automatisches Klassifizieren"
  • × type_ss:"a"
  1. Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.01
    0.009611183 = coord(1/2) × coord(1/2) × weight(_text_:k in 1566) [ClassicSimilarity] (0.03844473; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
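    The score line above is Lucene ClassicSimilarity "explain" output. As a worked check (a sketch assuming Lucene's standard ClassicSimilarity formulas tf = √freq and idf = 1 + ln(maxDocs/(docFreq + 1)), which reproduce the numbers shown), the score for this record composes as:

    ```latex
    \begin{align*}
    \mathrm{tf} &= \sqrt{\mathrm{freq}} = \sqrt{2.0} = 1.4142135\\
    \mathrm{idf} &= 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}+1}
                 = 1 + \ln\frac{44218}{3384+1} = 3.569778\\
    \mathrm{queryWeight} &= \mathrm{idf}\times\mathrm{queryNorm}
                 = 3.569778 \times 0.045509085 = 0.16245733\\
    \mathrm{fieldWeight} &= \mathrm{tf}\times\mathrm{idf}\times\mathrm{fieldNorm}
                 = 1.4142135 \times 3.569778 \times 0.046875 = 0.23664509\\
    \mathrm{score} &= \mathrm{coord}\times\mathrm{queryWeight}\times\mathrm{fieldWeight}
                 = \tfrac{1}{2}\cdot\tfrac{1}{2}\times 0.16245733 \times 0.23664509
                 = 0.009611183
    \end{align*}
    ```

    The same composition applies to every score breakdown on this page; only the matched term, document id, idf, and fieldNorm vary from record to record.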
    
    Abstract
    This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77% and 60%, respectively. The third experiment employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.
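    As an illustration only (not the authors' code): the kNN stage described above can be sketched with scikit-learn, assigning a new document the majority label of its k most similar training documents. The toy documents and DDC-style class numbers below are hypothetical.

    ```python
    # Toy sketch of a kNN text classifier of the kind described in the abstract:
    # documents become TF-IDF vectors, and a new document takes the majority
    # label of its k nearest (cosine-most-similar) training documents.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    train_docs = [
        "supply and demand curves in microeconomic theory",
        "price elasticity and consumer behaviour",
        "central bank interest rates and monetary policy",
        "inflation targeting by central banks",
    ]
    train_labels = ["338.5", "338.5", "332.4", "332.4"]  # DDC-style class numbers

    knn = make_pipeline(
        TfidfVectorizer(),
        KNeighborsClassifier(n_neighbors=3, metric="cosine"),
    )
    knn.fit(train_docs, train_labels)
    print(knn.predict(["money supply and interest rate policy"]))  # e.g. ['332.4']
    ```

    A dictionary-based pass (matching DDC representative terms) could pre-filter candidate categories before this kNN vote, which is the hybrid idea the abstract points to.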
  2. Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.01
    0.009611183 = coord(1/2) × coord(1/2) × weight(_text_:k in 1808) [ClassicSimilarity] (0.03844473; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
  3. Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 0.01
    0.009611183 = coord(1/2) × coord(1/2) × weight(_text_:k in 5897) [ClassicSimilarity] (0.03844473; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
  4. Hagedorn, K.; Chapman, S.; Newman, D.: Enhancing search and browse using automated clustering of subject metadata (2007) 0.01
    0.009611183 = coord(1/2) × coord(1/2) × weight(_text_:k in 1168) [ClassicSimilarity] (0.03844473; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
  5. Golub, K.; Hamon, T.; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering (2007) 0.01
    0.009611183 = coord(1/2) × coord(1/2) × weight(_text_:k in 1461) [ClassicSimilarity] (0.03844473; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
  6. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.01
    0.009611183 = coord(1/2) × coord(1/2) × weight(_text_:k in 2166) [ClassicSimilarity] (0.03844473; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
    Source
    New perspectives on subject indexing and classification: essays in honour of Magda Heiner-Freiling. Ed.: K. Knull-Schlomann, et al.
  7. Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 0.01
    0.009611183 = coord(1/2) × coord(1/2) × weight(_text_:k in 4558) [ClassicSimilarity] (0.03844473; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
  8. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01
    0.009248778 = coord(1/2) × coord(1/2) × weight(_text_:22 in 2760) [ClassicSimilarity] (0.036995113; freq 2.0, idf 3.5018296 at docFreq 3622 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
    Date
    22. 3.2009 19:11:54
  9. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen [Automatic assignment of RVK notations by means of case-based reasoning] (2009) 0.01
    0.009248778 = coord(1/2) × coord(1/2) × weight(_text_:22 in 3051) [ClassicSimilarity] (0.036995113; freq 2.0, idf 3.5018296 at docFreq 3622 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
    Date
    22. 8.2009 19:51:28
  10. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.009248778 = coord(1/2) × coord(1/2) × weight(_text_:22 in 2158) [ClassicSimilarity] (0.036995113; freq 2.0, idf 3.5018296 at docFreq 3622 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.046875)
    
    Date
    4. 8.2015 19:22:04
  11. Golub, K.: Automated subject classification of textual web documents (2006) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 5600) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
  12. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 448) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
    Abstract
    A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n²/p) time on p processors rather than the worst-case O(n³/p) time. Furthermore, the O(n²/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
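    For orientation, here is a minimal serial sketch of the group-average (UPGMA) merge criterion that the paper parallelizes, using SciPy; the distributed-memory, message-passing version is the paper's actual contribution and is not attempted here, and the documents are placeholders.

    ```python
    # Serial sketch of group-average hierarchical agglomerative clustering:
    # SciPy's "average" linkage repeatedly merges the pair of clusters with
    # the smallest mean pairwise (here: cosine) distance between members.
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "parallel algorithms for distributed memory machines",
        "message passing and load balancing in compute clusters",
        "retrieval effectiveness in TREC test collections",
        "document ranking experiments on TREC data",
    ]
    X = TfidfVectorizer().fit_transform(docs).toarray()
    Z = linkage(X, method="average", metric="cosine")  # needs all O(n^2) pair distances
    print(fcluster(Z, t=2, criterion="maxclust"))      # e.g. [1 1 2 2]
    ```

    The O(n²) pairwise-distance matrix is what makes the serial version expensive and motivates distributing both computation and memory across p nodes.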
  13. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 3463) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
  14. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 3614) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
  15. Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification: An experimental comparison (2010) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 4101) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
    Abstract
    Hierarchical text classification (HTC) approaches have recently attracted a lot of interest on the part of researchers in human language technology and machine learning, since they have been shown to bring about equal, if not better, classification accuracy with respect to their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: Given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories which are siblings of c in the hierarchy. However, this policy has always been taken for granted and never been subjected to careful scrutiny since first proposed 15 years ago. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BEST LOCAL (k)) is being proposed for the first time in this article. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naïve Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-V2.
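    To make the default "siblings" policy concrete, here is a toy sketch (the hierarchy, labels, and helper function are hypothetical; the article's BEST LOCAL(k) policy is not reproduced): for a category c, the negative training examples are the documents not labelled with c but labelled with at least one sibling of c.

    ```python
    # Sketch of the default "siblings" negative-example policy in hierarchical
    # text classification: negatives for c = documents that are negative for c
    # and positive for some category sharing c's parent.
    def sibling_negatives(c, parent_of, labels_of, docs):
        siblings = {d for d in parent_of if parent_of[d] == parent_of[c] and d != c}
        return [doc for doc in docs
                if c not in labels_of[doc] and labels_of[doc] & siblings]

    parent_of = {"sports": "root", "politics": "root", "soccer": "sports"}
    labels_of = {"doc1": {"politics"}, "doc2": {"sports", "soccer"}, "doc3": {"politics"}}
    docs = ["doc1", "doc2", "doc3"]
    print(sibling_negatives("sports", parent_of, labels_of, docs))  # ['doc1', 'doc3']
    ```

    The appeal of this policy is locality: each node's classifier only ever sees documents from its own neighbourhood of the hierarchy, which is what yields the exponential training-time savings mentioned above.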
  16. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 967) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
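    The setup reduces to ordinary supervised classification once the features are extracted. A minimal sketch with scikit-learn, using logistic regression as reported best in the abstract (the feature columns, values, and popularity labels below are invented for illustration, not taken from the paper's data):

    ```python
    # Hashtag popularity prediction as binary classification: each hashtag is
    # a feature vector (content features from the hashtag string and its
    # tweets, contextual features from the adopter graph), label 1 = popular.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Columns (hypothetical): [hashtag length, token count,
    #                          early adopters, adopter-graph density]
    X = np.array([[8, 1, 120, 0.30],
                  [15, 3, 12, 0.05],
                  [6, 1, 300, 0.45],
                  [20, 4, 8, 0.02]])
    y = np.array([1, 0, 1, 0])

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[10, 2, 150, 0.25]]))  # e.g. [1]
    ```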
  17. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 2300) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
  18. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: A link-bridged topic model for cross-domain document classification (2013) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 2706) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
  19. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.008009318 = coord(1/2) × coord(1/2) × weight(_text_:k in 3311) [ClassicSimilarity] (0.032037273; freq 2.0, idf 3.569778 at docFreq 3384 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
  20. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01
    0.007707316 = coord(1/2) × coord(1/2) × weight(_text_:22 in 2765) [ClassicSimilarity] (0.030829264; freq 2.0, idf 3.5018296 at docFreq 3622 / maxDocs 44218, queryNorm 0.045509085, fieldNorm 0.0390625)
    
    Date
    22. 3.2009 19:14:43

Languages

  • e 40
  • d 4