Search (13 results, page 1 of 1)

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.05
```
0.051791668 = product of:
  0.103583336 = sum of:
    0.103583336 = sum of:
      0.06658822 = weight(_text_:k in 690) [ClassicSimilarity], result of:
        0.06658822 = score(doc=690,freq=6.0), product of:
          0.16245733 = queryWeight, product of:
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.045509085 = queryNorm
          0.40988132 = fieldWeight in 690, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            3.569778 = idf(docFreq=3384, maxDocs=44218)
            0.046875 = fieldNorm(doc=690)
      0.036995113 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
        0.036995113 = score(doc=690,freq=2.0), product of:
          0.15936506 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045509085 = queryNorm
          0.23214069 = fieldWeight in 690, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=690)
  0.5 = coord(1/2)
```
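The score tree above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas that the figures appear to follow: `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`. The function names are illustrative and are not part of any library. `queryNorm` and `fieldNorm` are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed classic formula: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)                        # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # idf * queryNorm
    field_weight = tf * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output for doc 690
QUERY_NORM = 0.045509085
MAX_DOCS = 44218
FIELD_NORM = 0.046875

score_k = term_score(6.0, 3384, MAX_DOCS, QUERY_NORM, FIELD_NORM)   # term "k"
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # term "22"

# coord(1/2): one of two top-level query clauses matched
total = 0.5 * (score_k + score_22)
print(round(score_k, 6), round(score_22, 6), round(total, 6))
```

Up to single-precision rounding, the three printed values should agree with the 0.06658822, 0.036995113 and 0.051791668 reported in the tree.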
Abstract

We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM does feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.

Date: 23. 3.2013 13:22:36

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02

```
0.015414632 = product of:
  0.030829264 = sum of:
    0.030829264 = product of:
      0.061658528 = sum of:
        0.061658528 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.061658528 = score(doc=2748,freq=2.0), product of:
            0.15936506 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045509085 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 1. 2.2016 18:25:22

Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.01
```
0.013872546 = product of:
  0.027745092 = sum of:
    0.027745092 = product of:
      0.055490185 = sum of:
        0.055490185 = weight(_text_:k in 238) [ClassicSimilarity], result of:
          0.055490185 = score(doc=238,freq=6.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.34156775 = fieldWeight in 238, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=238)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants. Inspired by the fields of pragmatics and speech act theory, this typology comprising four top level categories and 13 subcategories represents the typical email triage behaviors of managers in an organizational context. The second study phase was conducted on a corpus of 1,703 messages using email samples of two managers. Using the k-NN (k-nearest neighbor) algorithm, statistical treatments automatically classified the email according to lexical and nonlexical features representative of managers' triage patterns. The automatic classification of email according to the lexicon of the messages was found to be substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. It appears that a message's nonlexical features are also deeply influenced by email pragmatics. Features related to the recipient and the sender were the most relevant for characterizing email.

Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 0.01

```
0.009611183 = product of:
  0.019222366 = sum of:
    0.019222366 = product of:
      0.03844473 = sum of:
        0.03844473 = weight(_text_:k in 4558) [ClassicSimilarity], result of:
          0.03844473 = score(doc=4558,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.23664509 = fieldWeight in 4558, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=4558)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.01

```
0.009611183 = product of:
  0.019222366 = sum of:
    0.019222366 = product of:
      0.03844473 = sum of:
        0.03844473 = weight(_text_:k in 1057) [ClassicSimilarity], result of:
          0.03844473 = score(doc=1057,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.23664509 = fieldWeight in 1057, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=1057)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01

```
0.009248778 = product of:
  0.018497556 = sum of:
    0.018497556 = product of:
      0.036995113 = sum of:
        0.036995113 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.036995113 = score(doc=2158,freq=2.0), product of:
            0.15936506 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045509085 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 4. 8.2015 19:22:04

Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.01

```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 3463) [ClassicSimilarity], result of:
          0.032037273 = score(doc=3463,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 3463, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3463)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification: An experimental comparison (2010) 0.01
```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 4101) [ClassicSimilarity], result of:
          0.032037273 = score(doc=4101,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 4101, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4101)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Hierarchical text classification (HTC) approaches have recently attracted a lot of interest on the part of researchers in human language technology and machine learning, since they have been shown to bring about equal, if not better, classification accuracy with respect to their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: Given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories which are siblings of c in the hierarchy. However, this policy has always been taken for granted and never been subjected to careful scrutiny since first proposed 15 years ago. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BEST LOCAL (k)) is being proposed for the first time in this article. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naïve Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-V2.

Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 967) [ClassicSimilarity], result of:
          0.032037273 = score(doc=967,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.

Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01

```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 2300) [ClassicSimilarity], result of:
          0.032037273 = score(doc=2300,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 2300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: A link-bridged topic model for cross-domain document classification (2013) 0.01

```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 2706) [ClassicSimilarity], result of:
          0.032037273 = score(doc=2706,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 2706, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2706)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01

```
0.008009318 = product of:
  0.016018637 = sum of:
    0.016018637 = product of:
      0.032037273 = sum of:
        0.032037273 = weight(_text_:k in 3311) [ClassicSimilarity], result of:
          0.032037273 = score(doc=3311,freq=2.0), product of:
            0.16245733 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.045509085 = queryNorm
            0.19720423 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01

```
0.007707316 = product of:
  0.015414632 = sum of:
    0.015414632 = product of:
      0.030829264 = sum of:
        0.030829264 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.030829264 = score(doc=1107,freq=2.0), product of:
            0.15936506 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045509085 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 28.10.2013 19:22:57
