Search (31 results, page 1 of 2)

  • × theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.07227165 = sum of:
      0.053885084 = product of:
        0.21554033 = sum of:
          0.21554033 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.21554033 = score(doc=562,freq=2.0), product of:
              0.38351142 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.045236014 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.25 = coord(1/4)
      0.018386567 = product of:
        0.036773134 = sum of:
          0.036773134 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.036773134 = score(doc=562,freq=2.0), product of:
              0.15840882 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045236014 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.04
    0.03855045 = product of:
      0.0771009 = sum of:
        0.0771009 = sum of:
          0.046456624 = weight(_text_:n in 2765) [ClassicSimilarity], result of:
            0.046456624 = score(doc=2765,freq=2.0), product of:
              0.19504215 = queryWeight, product of:
                4.3116565 = idf(docFreq=1611, maxDocs=44218)
                0.045236014 = queryNorm
              0.23818761 = fieldWeight in 2765, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3116565 = idf(docFreq=1611, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2765)
          0.030644279 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
            0.030644279 = score(doc=2765,freq=2.0), product of:
              0.15840882 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045236014 = queryNorm
              0.19345059 = fieldWeight in 2765, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2765)
      0.5 = coord(1/2)
    
    Date
    22. 3.2009 19:14:43
  3. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.02
    0.020116309 = product of:
      0.040232617 = sum of:
        0.040232617 = product of:
          0.080465235 = sum of:
            0.080465235 = weight(_text_:n in 448) [ClassicSimilarity], result of:
              0.080465235 = score(doc=448,freq=6.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.41255307 = fieldWeight in 448, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=448)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n**2/p) time on p processors rather than the worst-case O(n**3/p) time. Furthermore, the O(n**2/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.
  4. Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.02
    0.019709876 = product of:
      0.03941975 = sum of:
        0.03941975 = product of:
          0.0788395 = sum of:
            0.0788395 = weight(_text_:n in 2555) [ClassicSimilarity], result of:
              0.0788395 = score(doc=2555,freq=4.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.40421778 = fieldWeight in 2555, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2555)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from (n) to (log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
  5. Fuhr, N.: Klassifikationsverfahren bei der automatischen Indexierung (1983) 0.02
    0.01858265 = product of:
      0.0371653 = sum of:
        0.0371653 = product of:
          0.0743306 = sum of:
            0.0743306 = weight(_text_:n in 7697) [ClassicSimilarity], result of:
              0.0743306 = score(doc=7697,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38110018 = fieldWeight in 7697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7697)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  6. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02
    0.018386567 = product of:
      0.036773134 = sum of:
        0.036773134 = product of:
          0.07354627 = sum of:
            0.07354627 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.07354627 = score(doc=1046,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    5. 5.2003 14:17:22
  7. Meder, N.: Artificial intelligence as a tool of classification, or: the network of language games as cognitive paradigm (1985) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 7694) [ClassicSimilarity], result of:
              0.06503927 = score(doc=7694,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 7694, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7694)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  8. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02
    0.0153221395 = product of:
      0.030644279 = sum of:
        0.030644279 = product of:
          0.061288558 = sum of:
            0.061288558 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.061288558 = score(doc=611,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 8.2009 12:54:24
  9. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02
    0.0153221395 = product of:
      0.030644279 = sum of:
        0.030644279 = product of:
          0.061288558 = sum of:
            0.061288558 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.061288558 = score(doc=2748,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  10. Drori, O.; Alon, N.: Using document classification for displaying search results (2003) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 1565) [ClassicSimilarity], result of:
              0.05574795 = score(doc=1565,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 1565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1565)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  11. Finn, A.; Kushmerick, N.: Learning to classify documents according to genre (2006) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 6010) [ClassicSimilarity], result of:
              0.05574795 = score(doc=6010,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 6010, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6010)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  12. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 3015) [ClassicSimilarity], result of:
              0.05574795 = score(doc=3015,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 3015, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use-both individually and collectively-over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
  13. Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 4921) [ClassicSimilarity], result of:
              0.046456624 = score(doc=4921,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 4921, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4921)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  14. Rooney, N.; Patterson, D.; Galushka, M.; Dobrynin, V.; Smirnova, E.: ¬An investigation into the stability of contextual document clustering (2008) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 1356) [ClassicSimilarity], result of:
              0.046456624 = score(doc=1356,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 1356, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1356)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  15. Mengle, S.S.R.; Goharian, N.: Ambiguity measure feature-selection algorithm (2009) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 2804) [ClassicSimilarity], result of:
              0.046456624 = score(doc=2804,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 2804, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2804)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  16. Maghsoodi, N.; Homayounpour, M.M.: Improving Farsi multiclass text classification using a thesaurus and two-stage feature selection (2011) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 4775) [ClassicSimilarity], result of:
              0.046456624 = score(doc=4775,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 4775, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4775)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  17. Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: ¬An evaluation of classification models for question topic categorization (2012) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 237) [ClassicSimilarity], result of:
              0.046456624 = score(doc=237,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 237, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=237)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
  18. Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 238) [ClassicSimilarity], result of:
              0.046456624 = score(doc=238,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 238, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=238)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants. Inspired by the fields of pragmatics and speech act theory, this typology comprising four top level categories and 13 subcategories represents the typical email triage behaviors of managers in an organizational context. The second study phase was conducted on a corpus of 1,703 messages using email samples of two managers. Using the k-NN (k-nearest neighbor) algorithm, statistical treatments automatically classified the email according to lexical and nonlexical features representative of managers' triage patterns. The automatic classification of email according to the lexicon of the messages was found to be substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. It appears that a message's nonlexical features are also deeply influenced by email pragmatics. Features related to the recipient and the sender were the most relevant for characterizing email.
  19. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01
    0.010725497 = product of:
      0.021450995 = sum of:
        0.021450995 = product of:
          0.04290199 = sum of:
            0.04290199 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
              0.04290199 = score(doc=141,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.2708308 = fieldWeight in 141, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=141)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Pages
    S.1-22
  20. Dubin, D.: Dimensions and discriminability (1998) 0.01
    0.010725497 = product of:
      0.021450995 = sum of:
        0.021450995 = product of:
          0.04290199 = sum of:
            0.04290199 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.04290199 = score(doc=2338,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 9.1997 19:16:05