Search (65 results, page 1 of 4)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05

0.054370552 = product of:
  0.08155583 = sum of:
    0.06966957 = product of:
      0.2090087 = sum of:
        0.2090087 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.2090087 = score(doc=562,freq=2.0), product of:
            0.37188965 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0438652 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.011886258 = product of:
      0.035658773 = sum of:
        0.035658773 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.035658773 = score(doc=562,freq=2.0), product of:
            0.15360846 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0438652 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
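
The score breakdown for document 562 can be reproduced by hand. The Python sketch below is a reconstruction under stated assumptions, not the search engine's own code: it assumes `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which agree with the values printed in the listing, and it copies `queryNorm`, `fieldNorm`, and the `coord` factors directly from the explanation. The function and constant names are hypothetical.

```python
import math

# Values copied from the explanation for doc 562.
QUERY_NORM = 0.0438652
MAX_DOCS = 44218
FIELD_NORM = 0.046875


def idf(doc_freq, max_docs):
    # Assumed formula; it matches the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                       # tf(freq=2.0) = 1.4142135
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # "queryWeight" in the listing
    field_weight = tf * term_idf * field_norm  # "fieldWeight" in the listing
    return query_weight * field_weight


# Two matching terms, each inside its own clause with coord(1/3).
score_3a = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Outer sum over both clauses, scaled by coord(2/3).
total = (score_3a * (1 / 3) + score_22 * (1 / 3)) * (2 / 3)
print(round(total, 6))
```

Because the engine works in single precision and the listing rounds `queryNorm`, the result should agree with the listed 0.054370552 only to about six decimal places.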

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.02
```
0.022187045 = product of:
  0.06656113 = sum of:
    0.06656113 = product of:
      0.0998417 = sum of:
        0.06418292 = weight(_text_:k in 690) [ClassicSimilarity], result of:
          0.06418292 = score(doc=690,freq=6.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.40988132 = fieldWeight in 690, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
        0.035658773 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
          0.035658773 = score(doc=690,freq=2.0), product of:
            0.15360846 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0438652 = queryNorm
            0.23214069 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
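
The explanation for document 690 nests differently from the one for document 562: both matching terms (`k` and `22`) sit inside a single clause, so the inner factor is coord(2/3) and the outer one is coord(1/3). As a rough sketch, the tree can be written as nested tuples and evaluated recursively. The tuple encoding and the `evaluate` helper are illustrative assumptions; only the numbers come from the listing.

```python
import math


def evaluate(node):
    """Evaluate a ("sum" | "product", [children]) tree whose leaves are floats."""
    if isinstance(node, float):
        return node
    op, children = node
    values = [evaluate(child) for child in children]
    return sum(values) if op == "sum" else math.prod(values)


# Leaf values copied from the explanation for doc 690.
doc_690 = ("product", [
    ("sum", [
        ("product", [
            ("sum", [0.06418292, 0.035658773]),  # weights of terms "k" and "22"
            0.6666667,                           # coord(2/3)
        ]),
    ]),
    0.33333334,                                  # coord(1/3)
])

score_690 = evaluate(doc_690)
print(round(score_690, 6))
```
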
Abstract

We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM does feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with a random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.

Date

23. 3.2013 13:22:36

Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.02
```
0.022007827 = product of:
  0.06602348 = sum of:
    0.06602348 = product of:
      0.09903521 = sum of:
        0.06904983 = weight(_text_:k in 1070) [ClassicSimilarity], result of:
          0.06904983 = score(doc=1070,freq=10.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.44096208 = fieldWeight in 1070, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
        0.029985385 = weight(_text_:29 in 1070) [ClassicSimilarity], result of:
          0.029985385 = score(doc=1070,freq=2.0), product of:
            0.15430406 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0438652 = queryNorm
            0.19432661 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract

Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes the use of Web pages linked with the home page in a different manner from the sole use of home pages in previous research. To implement our proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection chooses several representative Web pages using connectivity analysis. The k-NN classifier next classifies each of the selected Web pages. Finally, the classified Web pages are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also reform its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the performance of micro-averaging breakeven point by 30.02%, compared with an ordinary classification which uses a home page only.

Date

27.12.2007 17:32:29

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.02

0.018851986 = product of:
  0.056555957 = sum of:
    0.056555957 = product of:
      0.084833935 = sum of:
        0.04323203 = weight(_text_:k in 2560) [ClassicSimilarity], result of:
          0.04323203 = score(doc=2560,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.27608594 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
        0.041601904 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.041601904 = score(doc=2560,freq=2.0), product of:
            0.15360846 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0438652 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Date: 22. 9.2008 18:31:54

Chung, Y.-M.; Noh, Y.-H.: Developing a specialized directory system by automatically classifying Web documents (2003) 0.02
```
0.016230777 = product of:
  0.048692327 = sum of:
    0.048692327 = product of:
      0.07303849 = sum of:
        0.037056025 = weight(_text_:k in 1566) [ClassicSimilarity], result of:
          0.037056025 = score(doc=1566,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.23664509 = fieldWeight in 1566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
        0.03598246 = weight(_text_:29 in 1566) [ClassicSimilarity], result of:
          0.03598246 = score(doc=1566,freq=2.0), product of:
            0.15430406 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0438652 = queryNorm
            0.23319192 = fieldWeight in 1566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=1566)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract

This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77 and 60%, respectively. The third experiment employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.

Source

Journal of information science. 29(2003) no.2, pp. 117-126

Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
```
0.013525645 = product of:
  0.040576935 = sum of:
    0.040576935 = product of:
      0.060865402 = sum of:
        0.03088002 = weight(_text_:k in 967) [ClassicSimilarity], result of:
          0.03088002 = score(doc=967,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.19720423 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
        0.029985385 = weight(_text_:29 in 967) [ClassicSimilarity], result of:
          0.029985385 = score(doc=967,freq=2.0), product of:
            0.15430406 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0438652 = queryNorm
            0.19432661 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)
```
Abstract

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.

Date

25. 6.2013 19:05:29

Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01

0.013525645 = product of:
  0.040576935 = sum of:
    0.040576935 = product of:
      0.060865402 = sum of:
        0.03088002 = weight(_text_:k in 2300) [ClassicSimilarity], result of:
          0.03088002 = score(doc=2300,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.19720423 = fieldWeight in 2300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
        0.029985385 = weight(_text_:29 in 2300) [ClassicSimilarity], result of:
          0.029985385 = score(doc=2300,freq=2.0), product of:
            0.15430406 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0438652 = queryNorm
            0.19432661 = fieldWeight in 2300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Source: Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro

Sparck Jones, K.: Automatic classification (1976) 0.01

0.010979563 = product of:
  0.03293869 = sum of:
    0.03293869 = product of:
      0.09881607 = sum of:
        0.09881607 = weight(_text_:k in 2908) [ClassicSimilarity], result of:
          0.09881607 = score(doc=2908,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.63105357 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.125 = fieldNorm(doc=2908)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.01

0.0107725635 = product of:
  0.03231769 = sum of:
    0.03231769 = product of:
      0.048476532 = sum of:
        0.024704017 = weight(_text_:k in 2741) [ClassicSimilarity], result of:
          0.024704017 = score(doc=2741,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.15776339 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
        0.023772515 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.023772515 = score(doc=2741,freq=2.0), product of:
            0.15360846 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0438652 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.6666667 = coord(2/3)
  0.33333334 = coord(1/3)

Date: 12. 9.2004 9:56:22

Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.01

0.010661471 = product of:
  0.03198441 = sum of:
    0.03198441 = product of:
      0.095953226 = sum of:
        0.095953226 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
          0.095953226 = score(doc=5169,freq=2.0), product of:
            0.15430406 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0438652 = queryNorm
            0.6218451 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.125 = fieldNorm(doc=5169)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: Nachrichten für Dokumentation. 29(1978), pp. 92-96

Yu, W.; Gong, Y.: Document clustering by concept factorization (2004) 0.01

0.008234672 = product of:
  0.024704017 = sum of:
    0.024704017 = product of:
      0.07411205 = sum of:
        0.07411205 = weight(_text_:k in 4084) [ClassicSimilarity], result of:
          0.07411205 = score(doc=4084,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.47329018 = fieldWeight in 4084, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.09375 = fieldNorm(doc=4084)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.007924172 = product of:
  0.023772515 = sum of:
    0.023772515 = product of:
      0.071317546 = sum of:
        0.071317546 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.071317546 = score(doc=1046,freq=2.0), product of:
            0.15360846 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0438652 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 5. 5.2003 14:17:22

Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.01

0.006862227 = product of:
  0.02058668 = sum of:
    0.02058668 = product of:
      0.06176004 = sum of:
        0.06176004 = weight(_text_:k in 3065) [ClassicSimilarity], result of:
          0.06176004 = score(doc=3065,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.39440846 = fieldWeight in 3065, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.078125 = fieldNorm(doc=3065)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.01

0.006862227 = product of:
  0.02058668 = sum of:
    0.02058668 = product of:
      0.06176004 = sum of:
        0.06176004 = weight(_text_:k in 4132) [ClassicSimilarity], result of:
          0.06176004 = score(doc=4132,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.39440846 = fieldWeight in 4132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.078125 = fieldNorm(doc=4132)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Source: SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.

Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.01
```
0.0067932582 = product of:
  0.020379774 = sum of:
    0.020379774 = product of:
      0.06113932 = sum of:
        0.06113932 = weight(_text_:k in 2113) [ClassicSimilarity], result of:
          0.06113932 = score(doc=2113,freq=4.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.39044446 = fieldWeight in 2113, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2113)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

Document clustering is an important tool for document collection organization and browsing. In real applications, some limited knowledge about cluster membership of a small number of documents is often available, such as some pairs of documents belonging to the same cluster. This kind of prior knowledge can serve as constraints for the clustering process. We integrate the constraints into the trace formulation of the sum of squared Euclidean distances function of K-means. Then, the combined criterion function is transformed into trace maximization, which is further optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method can achieve better performance, compared to three existing methods.

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01

0.006603477 = product of:
  0.01981043 = sum of:
    0.01981043 = product of:
      0.059431292 = sum of:
        0.059431292 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.059431292 = score(doc=611,freq=2.0), product of:
            0.15360846 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0438652 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 22. 8.2009 12:54:24

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01

0.006603477 = product of:
  0.01981043 = sum of:
    0.01981043 = product of:
      0.059431292 = sum of:
        0.059431292 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.059431292 = score(doc=2748,freq=2.0), product of:
            0.15360846 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0438652 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Date: 1. 2.2016 18:25:22

Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.01
```
0.0059428625 = product of:
  0.017828587 = sum of:
    0.017828587 = product of:
      0.053485762 = sum of:
        0.053485762 = weight(_text_:k in 238) [ClassicSimilarity], result of:
          0.053485762 = score(doc=238,freq=6.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.34156775 = fieldWeight in 238, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0390625 = fieldNorm(doc=238)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
```
Abstract

This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants. Inspired by the fields of pragmatics and speech act theory, this typology comprising four top level categories and 13 subcategories represents the typical email triage behaviors of managers in an organizational context. The second study phase was conducted on a corpus of 1,703 messages using email samples of two managers. Using the k-NN (k-nearest neighbor) algorithm, statistical treatments automatically classified the email according to lexical and nonlexical features representative of managers' triage patterns. The automatic classification of email according to the lexicon of the messages was found to be substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. It appears that a message's nonlexical features are also deeply influenced by email pragmatics. Features related to the recipient and the sender were the most relevant for characterizing email.

Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.01

0.0054897815 = product of:
  0.016469344 = sum of:
    0.016469344 = product of:
      0.049408033 = sum of:
        0.049408033 = weight(_text_:k in 4088) [ClassicSimilarity], result of:
          0.049408033 = score(doc=4088,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.31552678 = fieldWeight in 4088, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0625 = fieldNorm(doc=4088)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)

Yi, K.: Challenges in automated classification using library classification schemes (2006) 0.01

0.0054897815 = product of:
  0.016469344 = sum of:
    0.016469344 = product of:
      0.049408033 = sum of:
        0.049408033 = weight(_text_:k in 5810) [ClassicSimilarity], result of:
          0.049408033 = score(doc=5810,freq=2.0), product of:
            0.15658903 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0438652 = queryNorm
            0.31552678 = fieldWeight in 5810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.0625 = fieldNorm(doc=5810)
      0.33333334 = coord(1/3)
  0.33333334 = coord(1/3)
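
Several of the lower-ranked hits match only the term `k` with the same frequency (freq=2.0), so their order is decided entirely by `fieldNorm`, which in this similarity acts as a length normalization that favors shorter fields. The sketch below illustrates this for four such documents. It assumes the same `tf` and `idf` formulas that fit the values in the listing, and the helper name is hypothetical.

```python
import math

QUERY_NORM = 0.0438652
IDF_K = 1.0 + math.log(44218 / (3384 + 1))  # assumed formula; listing shows 3.569778
TF = math.sqrt(2.0)                         # tf(freq=2.0)


def single_term_score(field_norm):
    query_weight = IDF_K * QUERY_NORM
    field_weight = TF * IDF_K * field_norm
    # One matching term: coord(1/3) is applied at both nesting levels.
    return query_weight * field_weight * (1 / 3) * (1 / 3)


# fieldNorm values copied from the explanations above (doc id -> fieldNorm).
field_norms = {2908: 0.125, 4084: 0.09375, 3065: 0.078125, 4088: 0.0625}

scores = {doc: single_term_score(fn) for doc, fn in field_norms.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for doc in ranking:
    print(doc, round(scores[doc], 6))
```

Halving `fieldNorm` (0.125 for doc 2908 versus 0.0625 for doc 4088) halves the final score, which is consistent with the listed 0.010979563 and 0.0054897815.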
