Search (8 results, page 1 of 1)

  • theme_ss:"Automatisches Klassifizieren"
  • year_i:[2010 TO 2020}
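The two bullets above are the active filter queries, in Solr syntax. The mixed brackets in year_i:[2010 TO 2020} are deliberate: "[" makes the lower bound 2010 inclusive, "}" makes the upper bound 2020 exclusive. Below is a minimal sketch of how such a request could be assembled; the host, core name, and row count are invented, and only the two fq values come from this page:

```python
import urllib.parse

# Hypothetical endpoint; the catalog's real host and core are not shown on this page.
SOLR_URL = "http://localhost:8983/solr/catalog/select"

params = {
    "q": "*:*",
    # The two active facet filters from above. In the range filter, "[" means
    # 2010 is included and "}" means 2020 is excluded (standard Solr syntax).
    "fq": [
        'theme_ss:"Automatisches Klassifizieren"',
        "year_i:[2010 TO 2020}",
    ],
    "rows": 10,              # invented page size
    "debugQuery": "true",    # asks Solr to include the explain() trees shown below
}

print(SOLR_URL + "?" + urllib.parse.urlencode(params, doseq=True))
```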
  1. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02
    0.01662725 = product of:
      0.0332545 = sum of:
        0.0332545 = product of:
          0.066509 = sum of:
            0.066509 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.066509 = score(doc=2748,freq=2.0), product of:
                0.17190179 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049089137 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
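    The indented block under each hit is Lucene's explain() output for ClassicSimilarity (TF-IDF) scoring; in this hit the query term "22" matched the _text_ field (it also appears in the timestamp above). As a sanity check, the tree's arithmetic can be reproduced directly; a minimal sketch with every constant copied from the tree above, only the comments added:

```python
import math

# Constants copied from the explain tree of hit 1 (term "22", doc 2748).
freq       = 2.0         # termFreq: "22" occurs twice in the field
doc_freq   = 3622
max_docs   = 44218
query_norm = 0.049089137
field_norm = 0.078125    # length normalization for this field

idf          = math.log(max_docs / (doc_freq + 1)) + 1   # 3.5018296
tf           = math.sqrt(freq)                           # 1.4142135
query_weight = idf * query_norm                          # 0.17190179
field_weight = tf * idf * field_norm                     # 0.38690117
term_score   = query_weight * field_weight               # 0.066509

# Two nested coord(1/2) factors: at each level only 1 of 2 query clauses matched.
print(term_score * 0.5 * 0.5)                            # 0.01662725, shown as 0.02
```

    Every tree below follows the same pattern; only fieldNorm, idf, and the coord fractions differ between hits.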
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01
    0.00997635 = product of:
      0.0199527 = sum of:
        0.0199527 = product of:
          0.0399054 = sum of:
            0.0399054 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.0399054 = score(doc=690,freq=2.0), product of:
                0.17190179 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049089137 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    23. 3.2013 13:22:36
  3. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.00997635 = product of:
      0.0199527 = sum of:
        0.0199527 = product of:
          0.0399054 = sum of:
            0.0399054 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.0399054 = score(doc=2158,freq=2.0), product of:
                0.17190179 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049089137 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    4. 8.2015 19:22:04
  4. Schaalje, G.B.; Blades, N.J.; Funai, T.: An open-set size-adjusted Bayesian classifier for authorship attribution (2013) 0.01
    0.008453867 = product of:
      0.016907735 = sum of:
        0.016907735 = product of:
          0.06763094 = sum of:
            0.06763094 = weight(_text_:authors in 1041) [ClassicSimilarity], result of:
              0.06763094 = score(doc=1041,freq=2.0), product of:
                0.22378825 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049089137 = queryNorm
                0.30220953 = fieldWeight in 1041, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1041)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Recent studies of authorship attribution have used machine-learning methods including regularized multinomial logistic regression, neural nets, support vector machines, and the nearest shrunken centroid classifier to identify likely authors of disputed texts. These methods are all limited by an inability to perform open-set classification and to account for text and corpus size. We propose a customized Bayesian logit-normal-beta-binomial classification model for supervised authorship attribution. The model is based on the beta-binomial distribution, with an explicit inverse relationship between extra-binomial variation and text size. The model internally estimates this relationship and uses Markov chain Monte Carlo (MCMC) to produce distributions of posterior authorship probabilities instead of point estimates. We illustrate the method by training the machine-learning methods as well as the open-set Bayesian classifier on undisputed papers of The Federalist, and by testing them on documents historically attributed to Alexander Hamilton, John Jay, and James Madison. The Bayesian classifier was the best classifier of these texts.
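    A minimal sketch of the beta-binomial building block described here, under loudly stated assumptions: this is not the authors' full logit-normal-beta-binomial MCMC model, and every count and hyperparameter below is invented for illustration. Candidate authors are scored by the beta-binomial log-likelihood of one marker-word count, and the log-likelihoods are normalized into posterior authorship probabilities under a uniform prior:

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_loglik(k, n, alpha, beta):
    """Log P(k | n) when the per-token rate is Beta(alpha, beta) distributed.
    The beta-binomial allows extra-binomial variation relative to a binomial."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

# Invented hyperparameters: per-author Beta(alpha, beta) rates for one marker
# word, nominally fit on each author's undisputed training texts.
authors = {"Hamilton": (3.0, 500.0), "Madison": (0.3, 800.0)}

k, n = 6, 2000  # invented disputed text: 6 marker-word hits in 2,000 tokens
logliks = {a: beta_binomial_loglik(k, n, al, be) for a, (al, be) in authors.items()}

# Uniform prior over authors, so the posterior is a softmax of log-likelihoods.
m = max(logliks.values())
z = sum(math.exp(v - m) for v in logliks.values())
for a, v in sorted(logliks.items()):
    print(a, round(math.exp(v - m) / z, 4))
```

    The paper goes further: it ties extra-binomial variation to text size and samples the posterior with MCMC, both of which this sketch deliberately omits.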
  5. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.01
    0.008313625 = product of:
      0.01662725 = sum of:
        0.01662725 = product of:
          0.0332545 = sum of:
            0.0332545 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.0332545 = score(doc=1107,freq=2.0), product of:
                0.17190179 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049089137 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    28.10.2013 19:22:57
  6. HaCohen-Kerner, Y.; Beck, H.; Yehudai, E.; Rosenstein, M.; Mughaz, D.: Cuisine : classification using stylistic feature sets and/or name-based feature sets (2010) 0.01
    0.0070448895 = product of:
      0.014089779 = sum of:
        0.014089779 = product of:
          0.056359116 = sum of:
            0.056359116 = weight(_text_:authors in 3706) [ClassicSimilarity], result of:
              0.056359116 = score(doc=3706,freq=2.0), product of:
                0.22378825 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049089137 = queryNorm
                0.25184128 = fieldWeight in 3706, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3706)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Document classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigated the use of six stylistic feature sets (42 features in total) and/or six name-based feature sets (234 features in total) for various combinations of the following classification tasks: the ethnic group of the authors, the period when the documents were written, and the place where they were written. The investigated corpus contains Jewish Law articles written in Hebrew-Aramaic, which present interesting problems for classification. Our system CUISINE (Classification UsIng Stylistic feature sets and/or NamE-based feature sets) achieves accuracies between 90.71% and 98.99% for the seven classification experiments (ethnicity, time, place, ethnicity&time, ethnicity&place, time&place, ethnicity&time&place). For the first six tasks, the stylistic feature sets in general, and the quantitative feature set in particular, are enough for excellent classification results. In contrast, the name-based feature sets perform rather poorly on these tasks. However, for the most complex task (ethnicity&time&place), a hill-climbing model using all feature sets succeeds in significantly improving the classification results. Most of the stylistic features (34 of 42) are language-independent and domain-independent, and might be useful to the community at large, at least for rather simple tasks.
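    A hedged sketch of the hill-climbing idea mentioned above: greedily add whole feature sets (not individual features) as long as the evaluation score improves. The evaluation function here is a deterministic toy stand-in; in the real system it would train CUISINE's classifier on the union of the selected sets and return cross-validated accuracy, and the feature-set names are paraphrased from the abstract rather than taken from the paper:

```python
def hill_climb_feature_sets(feature_sets, evaluate):
    """Greedy hill climbing over whole feature sets: repeatedly add the set
    whose addition yields the best score; stop when no addition helps."""
    selected, best = [], 0.0
    remaining = list(feature_sets)
    improved = True
    while improved and remaining:
        improved = False
        score, fs = max((evaluate(selected + [f]), f) for f in remaining)
        if score > best:
            best, improved = score, True
            selected.append(fs)
            remaining.remove(fs)
    return selected, best

# Deterministic toy evaluator: best single set plus a small bonus per extra set.
BASE = {"quantitative": 0.93, "orthographic": 0.90, "lexical": 0.91,
        "function-words": 0.92, "first-names": 0.87, "surnames": 0.86}

def fake_eval(sets):
    return min(0.99, max(BASE[s] for s in sets) + 0.005 * (len(sets) - 1))

print(hill_climb_feature_sets(list(BASE), fake_eval))
```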
  7. AlQenaei, Z.M.; Monarchi, D.E.: ¬The use of learning techniques to analyze the results of a manual classification system (2016) 0.01
    0.0070448895 = product of:
      0.014089779 = sum of:
        0.014089779 = product of:
          0.056359116 = sum of:
            0.056359116 = weight(_text_:authors in 2836) [ClassicSimilarity], result of:
              0.056359116 = score(doc=2836,freq=2.0), product of:
                0.22378825 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049089137 = queryNorm
                0.25184128 = fieldWeight in 2836, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2836)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Classification is the process of assigning objects to predefined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents classified by their authors into one of the groups of the ACM Computing Classification System under "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. Analyzing this representation with both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related in the vector space: Class 1 (Content Analysis and Indexing) is close to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is close to Class 5 (Online Information Services). Further analysis tested the diffusion of the words across each pair of classes using both cosine and Euclidean distance.
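    A small sketch of the representation step described above, under stated assumptions: a 4x4 toy term-frequency matrix stands in for the study's 1,026-document weighted matrix, the SVD is truncated to k=2 dimensions instead of the paper's 50, and two documents are then compared with both cosine and Euclidean distance:

```python
import numpy as np

# Toy matrix: rows are documents, columns are (already weighted) term frequencies.
A = np.array([[2.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0, 2.0]])

k = 2  # the study keeps 50 dimensions; 2 is enough for a toy
U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs_k = U[:, :k] * s[:k]  # each row: one document in the k-dimensional space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Compare two documents the way the study compares class members.
print("cosine:   ", cosine(docs_k[0], docs_k[1]))
print("euclidean:", float(np.linalg.norm(docs_k[0] - docs_k[1])))
```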
  8. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
    0.0070448895 = product of:
      0.014089779 = sum of:
        0.014089779 = product of:
          0.056359116 = sum of:
            0.056359116 = weight(_text_:authors in 3627) [ClassicSimilarity], result of:
              0.056359116 = score(doc=3627,freq=2.0), product of:
                0.22378825 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.049089137 = queryNorm
                0.25184128 = fieldWeight in 3627, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3627)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering, and otherwise organizing knowledge by mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain-analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases, but we have found no coherence, no common activity, and no social semantics. We have not found a research front, or a common teleology, within the KO domain. We have, however, found a lively group of authors who succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and on automatic classification (which involves semantic groupings at the meta-document level).
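    Of the techniques named above, co-word analysis is the most mechanical to illustrate. A minimal sketch with invented keyword lists: count how often two descriptors are assigned to the same paper, and read high counts as shared thematic emphasis:

```python
from collections import Counter
from itertools import combinations

# Invented descriptor lists standing in for the keywords of the special-issue papers.
papers = [
    ["clustering", "machine learning", "knowledge organization"],
    ["automatic classification", "machine learning"],
    ["automatic indexing", "information retrieval", "clustering"],
    ["automatic classification", "clustering", "information retrieval"],
]

# Co-word analysis: count pairwise co-assignments of descriptors to papers.
cooc = Counter()
for kws in papers:
    for a, b in combinations(sorted(set(kws)), 2):
        cooc[(a, b)] += 1

for (a, b), n in cooc.most_common(5):
    print(n, a, "--", b)
```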