Search (16 results, page 1 of 1)

  • × theme_ss:"Automatisches Klassifizieren"
  • × year_i:[2010 TO 2020}
  1. Schaalje, G.B.; Blades, N.J.; Funai, T.: An open-set size-adjusted Bayesian classifier for authorship attribution (2013) 0.02
    0.017150164 = product of:
      0.102900974 = sum of:
        0.102900974 = weight(_text_:john in 1041) [ClassicSimilarity], result of:
          0.102900974 = score(doc=1041,freq=2.0), product of:
            0.24518675 = queryWeight, product of:
              6.330911 = idf(docFreq=213, maxDocs=44218)
              0.03872851 = queryNorm
            0.41968408 = fieldWeight in 1041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.330911 = idf(docFreq=213, maxDocs=44218)
              0.046875 = fieldNorm(doc=1041)
      0.16666667 = coord(1/6)
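
     The indented trees shown with each hit are Lucene "explain" output for the ClassicSimilarity (TF-IDF) model: a matched term contributes coord * queryWeight * fieldWeight, where queryWeight = idf * queryNorm, fieldWeight = sqrt(termFreq) * idf * fieldNorm, and idf = 1 + ln(maxDocs / (docFreq + 1)). A minimal Python sketch that reproduces the weight(_text_:john in 1041) figures above (function and variable names are ours, not Lucene's):

         import math

         def classic_idf(doc_freq, max_docs):
             # ClassicSimilarity: idf(t) = 1 + ln(maxDocs / (docFreq + 1))
             return 1.0 + math.log(max_docs / (doc_freq + 1))

         def term_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
             idf = classic_idf(doc_freq, max_docs)               # 6.330911
             query_weight = idf * query_norm                     # 0.24518675
             field_weight = math.sqrt(freq) * idf * field_norm   # 0.41968408
             return coord * query_weight * field_weight

         print(term_score(freq=2.0, doc_freq=213, max_docs=44218,
                          query_norm=0.03872851, field_norm=0.046875,
                          coord=1/6))   # ~0.01715, the 0.02 shown after the title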
    
    Abstract
    Recent studies of authorship attribution have used machine-learning methods including regularized multinomial logistic regression, neural nets, support vector machines, and the nearest shrunken centroid classifier to identify likely authors of disputed texts. These methods are all limited by an inability to perform open-set classification and account for text and corpus size. We propose a customized Bayesian logit-normal-beta-binomial classification model for supervised authorship attribution. The model is based on the beta-binomial distribution with an explicit inverse relationship between extra-binomial variation and text size. The model internally estimates the relationship of extra-binomial variation to text size, and uses Markov Chain Monte Carlo (MCMC) to produce distributions of posterior authorship probabilities instead of point estimates. We illustrate the method by training the machine-learning methods as well as the open-set Bayesian classifier on undisputed papers of The Federalist, and testing the method on documents historically attributed to Alexander Hamilton, John Jay, and James Madison. The Bayesian classifier was the best classifier of these texts.
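
     To make the core idea concrete, here is a drastically simplified sketch in Python: a beta-binomial likelihood for one author-discriminating marker word, turned into posterior authorship probabilities by Bayes' rule. The parameters below are illustrative placeholders, and the sketch is closed-set with fixed values; the paper's logit-normal-beta-binomial model instead ties extra-binomial variation to text size and estimates the posteriors by MCMC.

         import numpy as np
         from scipy.stats import betabinom

         def author_posterior(k, n, author_params, prior=None):
             # k occurrences of the marker word in a test text of n tokens.
             # author_params: {author: (alpha, beta)} fitted per author on
             # undisputed texts; larger alpha+beta means less extra-binomial variation.
             authors = list(author_params)
             prior = prior or {a: 1.0 / len(authors) for a in authors}
             like = np.array([betabinom.pmf(k, n, *author_params[a]) for a in authors])
             post = like * np.array([prior[a] for a in authors])
             return dict(zip(authors, post / post.sum()))

         # Rate of "upon" -- a classic Federalist marker word; the parameters are
         # hypothetical (Hamilton roughly 3 per 1,000 words, Madison far fewer).
         params = {"Hamilton": (3.0, 997.0), "Madison": (0.3, 999.7)}
         print(author_posterior(k=5, n=1000, author_params=params))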
  2. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01
    0.012187276 = product of:
      0.036561828 = sum of:
        0.023443883 = weight(_text_:r in 1107) [ClassicSimilarity], result of:
          0.023443883 = score(doc=1107,freq=2.0), product of:
            0.12820137 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.03872851 = queryNorm
            0.18286766 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
        0.013117946 = product of:
          0.026235892 = sum of:
            0.026235892 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.026235892 = score(doc=1107,freq=2.0), product of:
                0.13562064 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03872851 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Date
    28.10.2013 19:22:57
  3. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01
    0.008398255 = product of:
      0.05038953 = sum of:
        0.05038953 = sum of:
          0.01890646 = weight(_text_:4 in 690) [ClassicSimilarity], result of:
            0.01890646 = score(doc=690,freq=2.0), product of:
              0.105097495 = queryWeight, product of:
                2.7136984 = idf(docFreq=7967, maxDocs=44218)
                0.03872851 = queryNorm
              0.17989448 = fieldWeight in 690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7136984 = idf(docFreq=7967, maxDocs=44218)
                0.046875 = fieldNorm(doc=690)
          0.03148307 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
            0.03148307 = score(doc=690,freq=2.0), product of:
              0.13562064 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03872851 = queryNorm
              0.23214069 = fieldWeight in 690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=690)
      0.16666667 = coord(1/6)
    
    Date
    23.3.2013 13:22:36
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.4, pp.844-860
  4. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.008398255 = product of:
      0.05038953 = sum of:
        0.05038953 = sum of:
          0.01890646 = weight(_text_:4 in 2158) [ClassicSimilarity], result of:
            0.01890646 = score(doc=2158,freq=2.0), product of:
              0.105097495 = queryWeight, product of:
                2.7136984 = idf(docFreq=7967, maxDocs=44218)
                0.03872851 = queryNorm
              0.17989448 = fieldWeight in 2158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7136984 = idf(docFreq=7967, maxDocs=44218)
                0.046875 = fieldNorm(doc=2158)
          0.03148307 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
            0.03148307 = score(doc=2158,freq=2.0), product of:
              0.13562064 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03872851 = queryNorm
              0.23214069 = fieldWeight in 2158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2158)
      0.16666667 = coord(1/6)
    
    Date
    4.8.2015 19:22:04
  5. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.01
    0.0081212 = product of:
      0.0487272 = sum of:
        0.0487272 = weight(_text_:r in 1057) [ClassicSimilarity], result of:
          0.0487272 = score(doc=1057,freq=6.0), product of:
            0.12820137 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.03872851 = queryNorm
            0.38008332 = fieldWeight in 1057, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.046875 = fieldNorm(doc=1057)
      0.16666667 = coord(1/6)
    
  6. Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.00
    0.004688777 = product of:
      0.028132662 = sum of:
        0.028132662 = weight(_text_:r in 3331) [ClassicSimilarity], result of:
          0.028132662 = score(doc=3331,freq=2.0), product of:
            0.12820137 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.03872851 = queryNorm
            0.2194412 = fieldWeight in 3331, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.046875 = fieldNorm(doc=3331)
      0.16666667 = coord(1/6)
    
  7. Desale, S.K.; Kumbhar, R.: Research on automatic classification of documents in library environment : a literature review (2013) 0.00
    0.004688777 = product of:
      0.028132662 = sum of:
        0.028132662 = weight(_text_:r in 1071) [ClassicSimilarity], result of:
          0.028132662 = score(doc=1071,freq=2.0), product of:
            0.12820137 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.03872851 = queryNorm
            0.2194412 = fieldWeight in 1071, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.046875 = fieldNorm(doc=1071)
      0.16666667 = coord(1/6)
    
  8. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00
    0.004372649 = product of:
      0.026235892 = sum of:
        0.026235892 = product of:
          0.052471783 = sum of:
            0.052471783 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.052471783 = score(doc=2748,freq=2.0), product of:
                0.13562064 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03872851 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    1.2.2016 18:25:22
  9. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.00
    0.003907314 = product of:
      0.023443883 = sum of:
        0.023443883 = weight(_text_:r in 5041) [ClassicSimilarity], result of:
          0.023443883 = score(doc=5041,freq=2.0), product of:
            0.12820137 = queryWeight, product of:
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.03872851 = queryNorm
            0.18286766 = fieldWeight in 5041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3102584 = idf(docFreq=4387, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
      0.16666667 = coord(1/6)
    
  10. Aphinyanaphongs, Y.; Fu, L.D.; Li, Z.; Peskin, E.R.; Efstathiadis, E.; Aliferis, C.F.; Statnikov, A.: A comprehensive empirical comparison of modern supervised classification and feature selection methods for text categorization (2014) 0.00
    0.0015755383 = product of:
      0.00945323 = sum of:
        0.00945323 = product of:
          0.01890646 = sum of:
            0.01890646 = weight(_text_:4 in 1496) [ClassicSimilarity], result of:
              0.01890646 = score(doc=1496,freq=2.0), product of:
                0.105097495 = queryWeight, product of:
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03872851 = queryNorm
                0.17989448 = fieldWeight in 1496, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1496)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    An important aspect of performing text categorization is selecting appropriate supervised classification and feature selection methods. A comprehensive benchmark is needed to inform best practices in this broad application field. Previous benchmarks have evaluated performance for a few supervised classification and feature selection methods and limited ways to optimize them. The present work updates prior benchmarks by increasing the number of classifiers and feature selection methods by an order of magnitude, including recently developed, state-of-the-art methods. Specifically, this study used 229 text categorization data sets/tasks, and evaluated 28 classification methods (both well-established and proprietary/commercial) and 19 feature selection methods according to 4 classification performance metrics. We report several key findings that will be helpful in establishing best methodological practices for text categorization.
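
     A toy version of such a benchmark protocol, sketched with scikit-learn: every classifier is crossed with every feature-selection setting and scored on the same folds. The 3 x 2 grid, data set, and metric below are ours for illustration; the study's grid is 28 classifiers x 19 selectors over 229 data sets and 4 metrics.

         from itertools import product
         from sklearn.datasets import fetch_20newsgroups
         from sklearn.feature_extraction.text import TfidfVectorizer
         from sklearn.feature_selection import SelectKBest, chi2
         from sklearn.linear_model import LogisticRegression
         from sklearn.naive_bayes import MultinomialNB
         from sklearn.svm import LinearSVC
         from sklearn.pipeline import make_pipeline
         from sklearn.model_selection import cross_val_score

         data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
         classifiers = {"nb": MultinomialNB(),
                        "svm": LinearSVC(),
                        "logreg": LogisticRegression(max_iter=1000)}
         selectors = {"chi2_500": SelectKBest(chi2, k=500),
                      "chi2_2000": SelectKBest(chi2, k=2000)}

         # Same folds for every combination, so the scores are comparable.
         for (cname, clf), (sname, sel) in product(classifiers.items(), selectors.items()):
             pipe = make_pipeline(TfidfVectorizer(), sel, clf)
             acc = cross_val_score(pipe, data.data, data.target, cv=3).mean()
             print(f"{cname:7s} + {sname:9s}: accuracy {acc:.3f}")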
  11. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.00
    0.0014854318 = product of:
      0.00891259 = sum of:
        0.00891259 = product of:
          0.01782518 = sum of:
            0.01782518 = weight(_text_:4 in 4095) [ClassicSimilarity], result of:
              0.01782518 = score(doc=4095,freq=4.0), product of:
                0.105097495 = queryWeight, product of:
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03872851 = queryNorm
                0.16960615 = fieldWeight in 4095, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4095)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    In traditional text classification, classes are mutually exclusive, i.e. it is not possible to have one text or text fragment classified into more than one class. On the other hand, in multi-label classification an individual text may belong to several classes simultaneously. This type of classification is required by a large number of current applications such as big data classification and image and video annotation. Supervised learning is the most used type of machine learning in the classification task. It requires large quantities of labeled data and the intervention of a human tagger in the creation of the training sets. When the data sets become very large or heavily noisy, this operation can be tedious, prone to error and time consuming. In this case, semi-supervised learning, which requires only a few labels, is a better choice. In this paper, we study and evaluate several methods to address the problem of multi-label classification using semi-supervised learning and data from social networks. First, we propose a linguistic pre-processing involving tokenisation, recognition of named entities and hashtag segmentation in order to decrease the noise in this type of massive and unstructured real data, and then we perform a word sense disambiguation using WordNet. Second, several experiments related to multi-label classification and semi-supervised learning are carried out on these data sets and compared to each other. These evaluations compare the results of the approaches considered. This paper proposes a method for combining semi-supervised methods with a graph method for the extraction of subjects in social networks using a multi-label classification approach. Experiments show that the proposed model increases the precision of the classification by 4 percentage points compared to a baseline.
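
     The graph-based, semi-supervised multi-label step can be sketched with scikit-learn's LabelSpreading, run once per label with -1 marking unlabelled texts. The mini-corpus and labels below are invented, and the paper's pipeline additionally performs named-entity recognition, hashtag segmentation and WordNet disambiguation upstream of this step.

         import numpy as np
         from sklearn.feature_extraction.text import TfidfVectorizer
         from sklearn.semi_supervised import LabelSpreading

         texts = ["flood warning downtown", "team wins the cup",
                  "storm closes roads", "cup final tonight",
                  "roads flooded again", "big match in a big storm"]
         # Two labels (weather, sports); -1 = unlabelled (the last two texts).
         Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [-1, -1], [-1, -1]])

         X = TfidfVectorizer().fit_transform(texts).toarray()
         pred = np.column_stack([
             LabelSpreading(kernel="knn", n_neighbors=3).fit(X, Y[:, j]).transduction_
             for j in range(Y.shape[1])
         ])
         print(pred)   # the unlabelled rows now carry propagated label vectors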
    Date
    4.2.2018 13:10:17
  12. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.00
    0.0013129486 = product of:
      0.007877692 = sum of:
        0.007877692 = product of:
          0.015755383 = sum of:
            0.015755383 = weight(_text_:4 in 2161) [ClassicSimilarity], result of:
              0.015755383 = score(doc=2161,freq=2.0), product of:
                0.105097495 = queryWeight, product of:
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03872851 = queryNorm
                0.14991207 = fieldWeight in 2161, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2161)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    4.8.2015 19:18:47
  13. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.00
    0.0013129486 = product of:
      0.007877692 = sum of:
        0.007877692 = product of:
          0.015755383 = sum of:
            0.015755383 = weight(_text_:4 in 2194) [ClassicSimilarity], result of:
              0.015755383 = score(doc=2194,freq=2.0), product of:
                0.105097495 = queryWeight, product of:
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03872851 = queryNorm
                0.14991207 = fieldWeight in 2194, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2194)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    4.9.2015 15:35:34
  14. AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.00
    0.0013129486 = product of:
      0.007877692 = sum of:
        0.007877692 = product of:
          0.015755383 = sum of:
            0.015755383 = weight(_text_:4 in 2836) [ClassicSimilarity], result of:
              0.015755383 = score(doc=2836,freq=2.0), product of:
                0.105097495 = queryWeight, product of:
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03872851 = queryNorm
                0.14991207 = fieldWeight in 2836, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2836)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
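
     The study's pipeline in miniature, sketched in Python: a weighted term-frequency matrix, truncated SVD down to a low-rank document space, then both an unsupervised (clustering) and a supervised (decision-tree) look at how the manual classes sit in that space. The inputs docs and labels (each document's ACM H.3 subclass) are assumed; the paper uses 1,026 documents and 50 dimensions.

         from sklearn.feature_extraction.text import TfidfVectorizer
         from sklearn.decomposition import TruncatedSVD
         from sklearn.cluster import KMeans
         from sklearn.tree import DecisionTreeClassifier

         def analyze(docs, labels, dims=50):
             X = TfidfVectorizer().fit_transform(docs)       # weighted term-frequency matrix
             Z = TruncatedSVD(n_components=dims).fit_transform(X)  # dims-dimensional vectors
             clusters = KMeans(n_clusters=len(set(labels))).fit_predict(Z)
             tree = DecisionTreeClassifier().fit(Z, labels)
             # Classes whose documents share clusters or confuse the tree are
             # candidates for the "closely related" pairs the authors report.
             return Z, clusters, tree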
  15. Chae, G.; Park, J.; Park, J.; Yeo, W.S.; Shi, C.: Linking and clustering artworks using social tags : revitalizing crowd-sourced information on cultural collections (2016) 0.00
    0.0013129486 = product of:
      0.007877692 = sum of:
        0.007877692 = product of:
          0.015755383 = sum of:
            0.015755383 = weight(_text_:4 in 2852) [ClassicSimilarity], result of:
              0.015755383 = score(doc=2852,freq=2.0), product of:
                0.105097495 = queryWeight, product of:
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03872851 = queryNorm
                0.14991207 = fieldWeight in 2852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2852)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.4, pp.885-899
  16. Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.00
    0.0013129486 = product of:
      0.007877692 = sum of:
        0.007877692 = product of:
          0.015755383 = sum of:
            0.015755383 = weight(_text_:4 in 5055) [ClassicSimilarity], result of:
              0.015755383 = score(doc=5055,freq=2.0), product of:
                0.105097495 = queryWeight, product of:
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.03872851 = queryNorm
                0.14991207 = fieldWeight in 5055, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.7136984 = idf(docFreq=7967, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5055)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Source
    Information Processing and Management. 54(2018) no.4, pp.593-608