Search (23 results, page 1 of 2)

  • year_i:[2010 TO 2020}
  • theme_ss:"Automatisches Klassifizieren"
  (Solr range syntax: "[" marks an inclusive and "}" an exclusive bound, i.e. 2010 <= year < 2020.)
  1. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.03
    0.03430918 = product of:
      0.06861836 = sum of:
        0.06861836 = product of:
          0.102927536 = sum of:
            0.06616664 = weight(_text_:k in 690) [ClassicSimilarity], result of:
              0.06616664 = score(doc=690,freq=6.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.40988132 = fieldWeight in 690, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
            0.036760893 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.036760893 = score(doc=690,freq=2.0), product of:
                0.15835609 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045220956 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
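    The tree above is Lucene ClassicSimilarity (TF-IDF) "explain" output. As a minimal sketch (not the catalogue's own code), the displayed score can be reproduced from the printed factors: tf = sqrt(freq), idf = ln(maxDocs / (docFreq + 1)) + 1, queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the coord() factors down-weight documents that match only some of the query clauses.

    ```python
    import math

    def classic_tf(freq):                 # ClassicSimilarity: tf = sqrt(termFreq)
        return math.sqrt(freq)

    def classic_idf(doc_freq, max_docs):  # idf = ln(maxDocs / (docFreq + 1)) + 1
        return math.log(max_docs / (doc_freq + 1)) + 1

    def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
        idf = classic_idf(doc_freq, max_docs)
        query_weight = idf * query_norm                      # "queryWeight" above
        field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight" above
        return query_weight * field_weight

    # Values copied from the explain tree for doc 690:
    s_k  = term_score(6.0, 3384, 44218, 0.045220956, 0.046875)  # ~0.06616664
    s_22 = term_score(2.0, 3622, 44218, 0.045220956, 0.046875)  # ~0.036760893

    # coord(2/3) and coord(1/2) are the partial-match factors shown above.
    print(round((s_k + s_22) * (2 / 3) * 0.5, 8))  # ~0.03430918
    ```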
    
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded in singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable, sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution-ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which can degrade both the efficiency and the effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
    Date
    23.3.2013 13:22:36
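    Although LSISSM itself is not shown here, the LSI-then-cluster pipeline it extends can be sketched generically (a sketch assuming scikit-learn; toy documents, not the authors' data): a truncated SVD of the tf-idf term-document matrix yields low-rank document representations, which standard K-means then clusters.

    ```python
    # Generic LSI + K-means sketch (not the authors' LSISSM); assumes scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.cluster import KMeans

    docs = ["latent semantic indexing of text", "k-means clustering of documents",
            "singular value decomposition basics", "document clustering with LSI"]

    X = TfidfVectorizer().fit_transform(docs)           # sparse term-document matrix
    lsi = TruncatedSVD(n_components=2, random_state=0)  # top-ranking latent dimensions
    signatures = lsi.fit_transform(X)                   # low-rank document vectors

    print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(signatures))
    ```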
  2. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.01021136 = product of:
      0.02042272 = sum of:
        0.02042272 = product of:
          0.061268155 = sum of:
            0.061268155 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.061268155 = score(doc=2748,freq=2.0), product of:
                0.15835609 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045220956 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Date
    1.2.2016 18:25:22
  3. Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.01
    0.009189812 = product of:
      0.018379623 = sum of:
        0.018379623 = product of:
          0.055138867 = sum of:
            0.055138867 = weight(_text_:k in 238) [ClassicSimilarity], result of:
              0.055138867 = score(doc=238,freq=6.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.34156775 = fieldWeight in 238, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=238)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants. Inspired by the fields of pragmatics and speech act theory, this typology, comprising four top-level categories and 13 subcategories, represents the typical email triage behaviors of managers in an organizational context. The second phase was conducted on a corpus of 1,703 messages sampled from the email of two managers. Using the k-NN (k-nearest neighbor) algorithm, the messages were automatically classified according to lexical and nonlexical features representative of the managers' triage patterns. Automatic classification based on the lexicon of the messages was substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. A message's nonlexical features also appear to be deeply influenced by email pragmatics; features related to the recipient and the sender were the most relevant for characterizing email.
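    The study's corpus is not reproducible here, but the core classification step (k-NN over lexical features with k = 2) can be sketched as follows (scikit-learn assumed; messages and triage labels are invented stand-ins):

    ```python
    # Toy k-NN email-triage sketch with k = 2 (illustrative data only).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    emails = ["please approve the attached budget", "meeting moved to 3pm",
              "approve purchase order 442", "agenda for tomorrow's meeting"]
    labels = ["action", "information", "action", "information"]  # hypothetical categories

    clf = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=2))
    clf.fit(emails, labels)
    print(clf.predict(["can you approve this request"]))  # likely ['action']
    ```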
  4. Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 0.01
    0.006366888 = product of:
      0.012733776 = sum of:
        0.012733776 = product of:
          0.03820133 = sum of:
            0.03820133 = weight(_text_:k in 4558) [ClassicSimilarity], result of:
              0.03820133 = score(doc=4558,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.23664509 = fieldWeight in 4558, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4558)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  5. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.01
    0.006366888 = product of:
      0.012733776 = sum of:
        0.012733776 = product of:
          0.03820133 = sum of:
            0.03820133 = weight(_text_:k in 1057) [ClassicSimilarity], result of:
              0.03820133 = score(doc=1057,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.23664509 = fieldWeight in 1057, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1057)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  6. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.0061268155 = product of:
      0.012253631 = sum of:
        0.012253631 = product of:
          0.036760893 = sum of:
            0.036760893 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.036760893 = score(doc=2158,freq=2.0), product of:
                0.15835609 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045220956 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Date
    4.8.2015 19:22:04
  7. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.01
    0.00530574 = product of:
      0.01061148 = sum of:
        0.01061148 = product of:
          0.03183444 = sum of:
            0.03183444 = weight(_text_:k in 3463) [ClassicSimilarity], result of:
              0.03183444 = score(doc=3463,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19720423 = fieldWeight in 3463, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3463)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  8. Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification : an experimental comparison (2010) 0.01
    0.00530574 = product of:
      0.01061148 = sum of:
        0.01061148 = product of:
          0.03183444 = sum of:
            0.03183444 = weight(_text_:k in 4101) [ClassicSimilarity], result of:
              0.03183444 = score(doc=4101,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19720423 = fieldWeight in 4101, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4101)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Hierarchical text classification (HTC) approaches have recently attracted considerable interest among researchers in human language technology and machine learning, since they have been shown to match, if not improve on, the classification accuracy of their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories that are siblings of c in the hierarchy. However, this policy has been taken for granted since it was first proposed 15 years ago and has never been subjected to careful scrutiny. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BEST LOCAL (k)) is proposed here for the first time. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naïve Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-V2.
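    The default "siblings" policy scrutinized in the article can be stated compactly: for a category c, the negative training examples are the documents positive for some sibling of c and negative for c itself. A minimal sketch with hypothetical data structures:

    ```python
    # Sketch of the default "siblings" negative-selection policy for HTC.
    # hierarchy: category -> parent; doc_labels: doc id -> set of positive categories.

    def siblings(cat, hierarchy):
        parent = hierarchy[cat]
        return {c for c, p in hierarchy.items() if p == parent and c != cat}

    def negative_examples(cat, hierarchy, doc_labels):
        sibs = siblings(cat, hierarchy)
        return {d for d, cats in doc_labels.items()
                if cats & sibs and cat not in cats}  # positive for a sibling, not for cat

    hierarchy = {"grain": "commodities", "oil": "commodities", "gold": "commodities"}
    doc_labels = {1: {"grain"}, 2: {"oil"}, 3: {"gold", "grain"}, 4: {"oil", "gold"}}
    print(negative_examples("grain", hierarchy, doc_labels))  # -> {2, 4}
    ```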
  9. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
    0.00530574 = product of:
      0.01061148 = sum of:
        0.01061148 = product of:
          0.03183444 = sum of:
            0.03183444 = weight(_text_:k in 967) [ClassicSimilarity], result of:
              0.03183444 = score(doc=967,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19720423 = fieldWeight in 967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=967)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future is a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag, and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
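    As a structural sketch only (the paper's 7 content and 11 contextual features are replaced here by invented stand-ins), the prediction task reduces to binary classification over a numeric feature vector:

    ```python
    # Structural sketch of hashtag-popularity prediction as binary classification.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Columns are hypothetical: [hashtag_length, tweets_day_1, adopter_graph_density]
    X = np.array([[8, 120, 0.30], [15, 5, 0.02], [6, 300, 0.45], [20, 2, 0.01]])
    y = np.array([1, 0, 1, 0])  # 1 = became popular, 0 = did not (toy labels)

    model = LogisticRegression().fit(X, y)
    print(model.predict([[7, 250, 0.40]]))  # likely [1] on this toy data
    ```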
  10. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01
    0.00530574 = product of:
      0.01061148 = sum of:
        0.01061148 = product of:
          0.03183444 = sum of:
            0.03183444 = weight(_text_:k in 2300) [ClassicSimilarity], result of:
              0.03183444 = score(doc=2300,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19720423 = fieldWeight in 2300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2300)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  11. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: A link-bridged topic model for cross-domain document classification (2013) 0.01
    0.00530574 = product of:
      0.01061148 = sum of:
        0.01061148 = product of:
          0.03183444 = sum of:
            0.03183444 = weight(_text_:k in 2706) [ClassicSimilarity], result of:
              0.03183444 = score(doc=2706,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19720423 = fieldWeight in 2706, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2706)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  12. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.00530574 = product of:
      0.01061148 = sum of:
        0.01061148 = product of:
          0.03183444 = sum of:
            0.03183444 = weight(_text_:k in 3311) [ClassicSimilarity], result of:
              0.03183444 = score(doc=3311,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19720423 = fieldWeight in 3311, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3311)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  13. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01
    0.00510568 = product of:
      0.01021136 = sum of:
        0.01021136 = product of:
          0.030634077 = sum of:
            0.030634077 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.030634077 = score(doc=1107,freq=2.0), product of:
                0.15835609 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Date
    28.10.2013 19:22:57
  14. Kasprzik, A.: Automatisierte und semiautomatisierte Klassifizierung : eine Analyse aktueller Projekte [Automated and semi-automated classification : an analysis of current projects] (2014) 0.00
    0.0030839336 = product of:
      0.006167867 = sum of:
        0.006167867 = product of:
          0.0185036 = sum of:
            0.0185036 = weight(_text_:h in 2470) [ClassicSimilarity], result of:
              0.0185036 = score(doc=2470,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.16469726 = fieldWeight in 2470, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2470)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    Perspektive Bibliothek. 3(2014) H.1, S.85-110
  15. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.00
    0.0030839336 = product of:
      0.006167867 = sum of:
        0.006167867 = product of:
          0.0185036 = sum of:
            0.0185036 = weight(_text_:h in 3015) [ClassicSimilarity], result of:
              0.0185036 = score(doc=3015,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.16469726 = fieldWeight in 3015, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  16. HaCohen-Kerner, Y.; Beck, H.; Yehudai, E.; Rosenstein, M.; Mughaz, D.: Cuisine : classification using stylistic feature sets and/or name-based feature sets (2010) 0.00
    0.0025699446 = product of:
      0.005139889 = sum of:
        0.005139889 = product of:
          0.015419668 = sum of:
            0.015419668 = weight(_text_:h in 3706) [ClassicSimilarity], result of:
              0.015419668 = score(doc=3706,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.13724773 = fieldWeight in 3706, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3706)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  17. Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.00
    0.0025699446 = product of:
      0.005139889 = sum of:
        0.005139889 = product of:
          0.015419668 = sum of:
            0.015419668 = weight(_text_:h in 237) [ClassicSimilarity], result of:
              0.015419668 = score(doc=237,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.13724773 = fieldWeight in 237, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=237)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  18. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.00
    0.0025699446 = product of:
      0.005139889 = sum of:
        0.005139889 = product of:
          0.015419668 = sum of:
            0.015419668 = weight(_text_:h in 2194) [ClassicSimilarity], result of:
              0.015419668 = score(doc=2194,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.13724773 = fieldWeight in 2194, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2194)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  19. AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.00
    0.0025699446 = product of:
      0.005139889 = sum of:
        0.005139889 = product of:
          0.015419668 = sum of:
            0.015419668 = weight(_text_:h in 2836) [ClassicSimilarity], result of:
              0.015419668 = score(doc=2836,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.13724773 = fieldWeight in 2836, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2836)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents classified by their authors under one of the groups of the ACM Computing Classification System class "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. Analyzing this representation with both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related in the vector space: Class 1 (Content Analysis and Indexing) is close to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is close to Class 5 (Online Information Services). Further analysis tested the diffusion of the words in these class pairs using both cosine and Euclidean distance.
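    The pipeline the abstract describes (SVD of a weighted term-frequency matrix, then supervised and unsupervised analysis of the reduced vectors) can be sketched generically (toy documents and labels; the paper's 50 dimensions are cut to 2 here):

    ```python
    # Generic sketch: SVD representation, then a decision tree and clustering.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import cosine_distances

    docs = ["content analysis and indexing", "information search and retrieval",
            "systems and software", "online information services"]
    y = ["H.3.1", "H.3.3", "H.3.4", "H.3.5"]  # toy stand-ins for the ACM H.3 groups

    vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(
        TfidfVectorizer().fit_transform(docs))

    tree = DecisionTreeClassifier(random_state=0).fit(vecs, y)       # supervised view
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vecs)
    print(clusters, cosine_distances(vecs[:1], vecs[1:2]))           # unsupervised view
    ```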
  20. Suominen, A.; Toivanen, H.: Map of science with topic modeling : comparison of unsupervised learning and human-assigned subject classification (2016) 0.00
    0.0025699446 = product of:
      0.005139889 = sum of:
        0.005139889 = product of:
          0.015419668 = sum of:
            0.015419668 = weight(_text_:h in 3121) [ClassicSimilarity], result of:
              0.015419668 = score(doc=3121,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.13724773 = fieldWeight in 3121, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3121)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)