Search (45 results, page 3 of 3)

Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.00
```
0.0011642005 = product of:
  0.002328401 = sum of:
    0.002328401 = product of:
      0.006985203 = sum of:
        0.006985203 = weight(_text_:a in 1071) [ClassicSimilarity], result of:
          0.006985203 = score(doc=1071,freq=6.0), product of:
            0.052761257 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045758117 = queryNorm
            0.13239266 = fieldWeight in 1071, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1071)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```
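The explain tree above can be recomputed by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the values listed; the function name and parameter names are hypothetical, and queryNorm, fieldNorm, and the coord factors are simply copied from the output rather than derived.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute a single-term ClassicSimilarity score from explain-output values.

    Hypothetical helper for illustration; query_norm, field_norm and coords
    are taken verbatim from the explain output, not derived here.
    """
    tf = math.sqrt(freq)                             # tf(freq=6.0) = 2.4494898
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                  # queryWeight
    field_weight = tf * idf * field_norm             # fieldWeight
    score = query_weight * field_weight              # weight(_text_:a in doc)
    for factor in coords:                            # coord(1/3), coord(1/2)
        score *= factor
    return score


# Values copied from the explain output for doc 1071.
score = classic_similarity_score(
    freq=6.0,
    doc_freq=37942,
    max_docs=44218,
    query_norm=0.045758117,
    field_norm=0.046875,
    coords=(1 / 3, 1 / 2),
)
print(f"{score:.10f}")
```

Up to floating-point rounding (Lucene computes in 32-bit floats), the result should agree with the 0.0011642005 shown for this record. The other records differ only in termFreq and fieldNorm, so the same call with those values recomputes their scores.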
Abstract

This paper aims to provide an overview of automatic classification research, which focuses on issues related to the automatic classification of documents in a library environment. The review covers literature published in mainstream library and information science studies. The review was done on literature published in both academic and professional LIS journals and other documents. This review reveals that basically three types of research are being done on automatic classification: 1) hierarchical classification using different library classification schemes, 2) text categorization and document categorization using different types of classifiers with or without using training documents, and 3) automatic bibliographic classification. Predominantly this research is directed towards solving problems of organization of digital documents in an online environment. However, very little research is devoted towards solving the problems of arrangement of physical documents.

Type

a
Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.00
```
0.0011202524 = product of:
  0.0022405048 = sum of:
    0.0022405048 = product of:
      0.0067215143 = sum of:
        0.0067215143 = weight(_text_:a in 472) [ClassicSimilarity], result of:
          0.0067215143 = score(doc=472,freq=8.0), product of:
            0.052761257 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045758117 = queryNorm
            0.12739488 = fieldWeight in 472, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```
Abstract

The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.

Source

Proceedings of the 2nd International Workshop on Semantic Digital Archives held in conjunction with the 16th Int. Conference on Theory and Practice of Digital Libraries (TPDL) on September 27, 2012 in Paphos, Cyprus [http://ceur-ws.org/Vol-912/proceedings.pdf]. Eds.: A. Mitschik et al
Salles, T.; Rocha, L.; Gonçalves, M.A.; Almeida, J.M.; Mourão, F.; Meira Jr., W.; Viegas, F.: ¬A quantitative analysis of the temporal effects on automatic text classification (2016) 0.00
```
0.0011202524 = product of:
  0.0022405048 = sum of:
    0.0022405048 = product of:
      0.0067215143 = sum of:
        0.0067215143 = weight(_text_:a in 3014) [ClassicSimilarity], result of:
          0.0067215143 = score(doc=3014,freq=8.0), product of:
            0.052761257 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045758117 = queryNorm
            0.12739488 = fieldWeight in 3014, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3014)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```
Abstract

Automatic text classification (TC) continues to be a relevant research topic and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes. We then quantify, using a series of full factorial design experiments, the impact of these effects on four well-known TC algorithms. We show that these temporal effects affect each analyzed data set differently and that they restrict the performance of each considered TC algorithm to different extents. The reported quantitative analyses, which are the original contributions of this article, provide valuable new insights to better understand the behavior of TC algorithms when faced with nonstatic (temporal) data distributions and highlight important requirements for the proposal of more accurate classification models.

Type

a
Schaalje, G.B.; Blades, N.J.; Funai, T.: ¬An open-set size-adjusted Bayesian classifier for authorship attribution (2013) 0.00
```
9.5056574E-4 = product of:
  0.0019011315 = sum of:
    0.0019011315 = product of:
      0.0057033943 = sum of:
        0.0057033943 = weight(_text_:a in 1041) [ClassicSimilarity], result of:
          0.0057033943 = score(doc=1041,freq=4.0), product of:
            0.052761257 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045758117 = queryNorm
            0.10809815 = fieldWeight in 1041, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1041)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```
Abstract

Recent studies of authorship attribution have used machine-learning methods including regularized multinomial logistic regression, neural nets, support vector machines, and the nearest shrunken centroid classifier to identify likely authors of disputed texts. These methods are all limited by an inability to perform open-set classification and account for text and corpus size. We propose a customized Bayesian logit-normal-beta-binomial classification model for supervised authorship attribution. The model is based on the beta-binomial distribution with an explicit inverse relationship between extra-binomial variation and text size. The model internally estimates the relationship of extra-binomial variation to text size, and uses Markov Chain Monte Carlo (MCMC) to produce distributions of posterior authorship probabilities instead of point estimates. We illustrate the method by training the machine-learning methods as well as the open-set Bayesian classifier on undisputed papers of The Federalist, and testing the method on documents historically attributed to Alexander Hamilton, John Jay, and James Madison. The Bayesian classifier was the best classifier of these texts.

Type

a
Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.00
```
6.721515E-4 = product of:
  0.001344303 = sum of:
    0.001344303 = product of:
      0.004032909 = sum of:
        0.004032909 = weight(_text_:a in 1057) [ClassicSimilarity], result of:
          0.004032909 = score(doc=1057,freq=2.0), product of:
            0.052761257 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045758117 = queryNorm
            0.07643694 = fieldWeight in 1057, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1057)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```
Abstract

In this document we describe the final release of the toolset for entity and semantic associations, integrating two versions (language dependent and language independent) of Unsupervised Document Similarity implemented by MU (using gensim tool) and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of tools, the rationale behind decisions made, and provide elementary evaluation. Tools are integrated in the main project result, EuDML website, and they deliver the needed functionality for exploratory searching and browsing the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state of the art machine learning and matching methods.
