Search (8 results, page 1 of 1)

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02

```
0.0153221395 = product of:
  0.030644279 = sum of:
    0.030644279 = product of:
      0.061288558 = sum of:
        0.061288558 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.061288558 = score(doc=2748,freq=2.0), product of:
            0.15840882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045236014 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 1. 2.2016 18:25:22
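
The score breakdown for the first result can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the standard Lucene ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))) and takes queryNorm, fieldNorm, and the two coord factors directly from the explanation as given inputs. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm,
                       coords=(0.5, 0.5)):
    """Recompute a single-term score from the values listed in an explain tree.

    query_norm, field_norm and coords are copied from the explanation;
    they are not derived here.
    """
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm         # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm    # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight
    for coord in coords:                    # two nested coord(1/2) factors
        score *= coord
    return score


# Values taken from the explanation for doc 2748 (term "22").
score = classic_term_score(
    freq=2.0,
    doc_freq=3622,
    max_docs=44218,
    query_norm=0.045236014,
    field_norm=0.078125,
)
print(score)
```

The printed value should agree with the reported 0.0153221395 up to rounding, since the engine computes in single precision. The other results differ only in docFreq and fieldNorm, so the same function applies to them.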

Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01
```
0.013936987 = product of:
  0.027873974 = sum of:
    0.027873974 = product of:
      0.05574795 = sum of:
        0.05574795 = weight(_text_:n in 3015) [ClassicSimilarity], result of:
          0.05574795 = score(doc=3015,freq=2.0), product of:
            0.19504215 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.045236014 = queryNorm
            0.28582513 = fieldWeight in 3015, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.

Maghsoodi, N.; Homayounpour, M.M.: Improving Farsi multiclass text classification using a thesaurus and two-stage feature selection (2011) 0.01

```
0.011614156 = product of:
  0.023228312 = sum of:
    0.023228312 = product of:
      0.046456624 = sum of:
        0.046456624 = weight(_text_:n in 4775) [ClassicSimilarity], result of:
          0.046456624 = score(doc=4775,freq=2.0), product of:
            0.19504215 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.045236014 = queryNorm
            0.23818761 = fieldWeight in 4775, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4775)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: ¬An evaluation of classification models for question topic categorization (2012) 0.01
```
0.011614156 = product of:
  0.023228312 = sum of:
    0.023228312 = product of:
      0.046456624 = sum of:
        0.046456624 = weight(_text_:n in 237) [ClassicSimilarity], result of:
          0.046456624 = score(doc=237,freq=2.0), product of:
            0.19504215 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.045236014 = queryNorm
            0.23818761 = fieldWeight in 237, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=237)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.

Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.01
```
0.011614156 = product of:
  0.023228312 = sum of:
    0.023228312 = product of:
      0.046456624 = sum of:
        0.046456624 = weight(_text_:n in 238) [ClassicSimilarity], result of:
          0.046456624 = score(doc=238,freq=2.0), product of:
            0.19504215 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.045236014 = queryNorm
            0.23818761 = fieldWeight in 238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=238)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants. Inspired by the fields of pragmatics and speech act theory, this typology comprising four top level categories and 13 subcategories represents the typical email triage behaviors of managers in an organizational context. The second study phase was conducted on a corpus of 1,703 messages using email samples of two managers. Using the k-NN (k-nearest neighbor) algorithm, statistical treatments automatically classified the email according to lexical and nonlexical features representative of managers' triage patterns. The automatic classification of email according to the lexicon of the messages was found to be substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. It appears that a message's nonlexical features are also deeply influenced by email pragmatics. Features related to the recipient and the sender were the most relevant for characterizing email.

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01

```
0.0091932835 = product of:
  0.018386567 = sum of:
    0.018386567 = product of:
      0.036773134 = sum of:
        0.036773134 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
          0.036773134 = score(doc=690,freq=2.0), product of:
            0.15840882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045236014 = queryNorm
            0.23214069 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 23. 3.2013 13:22:36

Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01

```
0.0091932835 = product of:
  0.018386567 = sum of:
    0.018386567 = product of:
      0.036773134 = sum of:
        0.036773134 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
          0.036773134 = score(doc=2158,freq=2.0), product of:
            0.15840882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045236014 = queryNorm
            0.23214069 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 4. 8.2015 19:22:04

Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01

```
0.0076610697 = product of:
  0.0153221395 = sum of:
    0.0153221395 = product of:
      0.030644279 = sum of:
        0.030644279 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
          0.030644279 = score(doc=1107,freq=2.0), product of:
            0.15840882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045236014 = queryNorm
            0.19345059 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 28.10.2013 19:22:57

