Search (126 results, page 1 of 7)

  • theme_ss:"Automatisches Klassifizieren"
  1. Dubin, D.: Dimensions and discriminability (1998) 0.05
    0.052232232 = product of:
      0.104464464 = sum of:
        0.05503747 = weight(_text_:processing in 2338) [ClassicSimilarity], result of:
          0.05503747 = score(doc=2338,freq=2.0), product of:
            0.175792 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.043425296 = queryNorm
            0.3130829 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.049426995 = product of:
          0.07414049 = sum of:
            0.03295579 = weight(_text_:science in 2338) [ClassicSimilarity], result of:
              0.03295579 = score(doc=2338,freq=4.0), product of:
                0.11438741 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.043425296 = queryNorm
                0.2881068 = fieldWeight in 2338, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
            0.041184697 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.041184697 = score(doc=2338,freq=2.0), product of:
                0.15206799 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043425296 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.6666667 = coord(2/3)
      0.5 = coord(2/4)
    
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
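    The explain tree above is Lucene's ClassicSimilarity breakdown: each matching term contributes queryWeight x fieldWeight, with tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), and coord factors that scale down partial matches. A minimal Python sketch that reproduces this entry's 0.05 score from the values shown in the tree (no inputs beyond the tree itself):

      import math

      QUERY_NORM = 0.043425296  # queryNorm from the explain tree

      def idf(doc_freq, max_docs):
          # ClassicSimilarity: idf(t) = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def term_score(freq, doc_freq, field_norm, max_docs=44218):
          # score(t) = queryWeight * fieldWeight
          #          = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)
          i = idf(doc_freq, max_docs)
          return (i * QUERY_NORM) * (math.sqrt(freq) * i * field_norm)

      processing = term_score(2.0, 2097, 0.0546875)  # ~0.05503747
      science    = term_score(4.0, 8627, 0.0546875)  # ~0.03295579
      term_22    = term_score(2.0, 3622, 0.0546875)  # ~0.04118470

      inner = (science + term_22) * (2 / 3)   # coord(2/3)
      total = (processing + inner) * (2 / 4)  # coord(2/4)
      print(f"{total:.9f}")                   # ~0.052232232, the score shown above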
  2. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.04
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods and cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Date
    1. 6.2010 9:29:57
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1105-1119
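    The weighted hybrid clustering described above combines evidence from two data sources before clustering. A loose illustration of the kernel-fusion half of that idea (the weights, data, and trace normalization here are illustrative choices, not the authors' information-based weighting scheme):

      import numpy as np
      from sklearn.cluster import KMeans

      def fuse_kernels(kernels, weights):
          # Weighted sum of trace-normalized similarity kernels, one per source.
          return sum(w * k / np.trace(k) for w, k in zip(weights, kernels))

      rng = np.random.default_rng(0)
      X_text = rng.random((60, 40))   # hypothetical journal-by-term features
      X_cite = rng.random((60, 30))   # hypothetical journal-by-citation features
      K = fuse_kernels([X_text @ X_text.T, X_cite @ X_cite.T], weights=[0.6, 0.4])

      # Cluster journals in the spectral embedding of the fused kernel.
      vals, vecs = np.linalg.eigh(K)
      labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(vecs[:, -5:])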
  3. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.04
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  4. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.03
    Abstract
    Classification of office documents is one of the administrative functions carried out by almost every organization and institution which sends and receives correspondence. Processing of this increasing amount of incoming and outgoing mail, in particular its classification, is time consuming and expensive. More and more organizations are seeking a solution for meeting this challenge by designing computer-based systems for automatic classification. Examines the present status of available knowledge and methodology which can be used for automatic classification of office documents. Besides a review of classic methods and techniques, the focus is also placed on the application of artificial intelligence.
    Source
    Records management quarterly. 29(1995) no.4, S.3-18
  5. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.03
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1668-1678
  6. Kwok, K.L.: ¬The use of titles and cited titles as document representations for automatic classification (1975) 0.03
    Source
    Information processing and management. 11(1975), S.201-206
  7. Wu, M.; Fuller, M.; Wilkinson, R.: Using clustering and classification approaches in interactive retrieval (2001) 0.03
    Source
    Information processing and management. 37(2001) no.3, S.459-484
  8. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.03
    Abstract
    Topic discovery is an important means for marketing, e-Business and social science studies. As well, it can be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without using manual examination. Given a query, ATD returns with topics associated with the query and top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
    Source
    Information processing and management. 40(2004) no.2, S.239-255
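    The eigenvector machinery referenced above (following Kleinberg's HITS) reduces to power iteration on a link matrix; the principal eigenvector of A^T A ranks pages by authority within a topic. A small sketch with an invented adjacency matrix, not the full ATD pipeline:

      import numpy as np

      def principal_eigenvector(M, iters=100, tol=1e-9):
          # Power iteration: repeatedly apply M and renormalize.
          v = np.ones(M.shape[0]) / M.shape[0]
          for _ in range(iters):
              nxt = M @ v
              nxt /= np.linalg.norm(nxt)
              if np.linalg.norm(nxt - v) < tol:
                  break
              v = nxt
          return v

      A = np.array([[0, 1, 1, 0, 0],   # hypothetical hyperlinks among 5 pages
                    [0, 0, 1, 0, 0],
                    [1, 0, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 0]], dtype=float)
      authority = principal_eigenvector(A.T @ A)
      print(authority.round(3))        # per-page authority scores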
  9. Barbu, E.: What kind of knowledge is in Wikipedia? : unsupervised extraction of properties for similar concepts (2014) 0.03
    Abstract
    This article presents a novel method for extracting knowledge from Wikipedia and a classification schema for annotating the extracted knowledge. Unlike the majority of approaches in the literature, we use the raw Wikipedia text for knowledge acquisition. The main assumption made is that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The annotation of the extracted knowledge is done at two levels: ontological and logical. The extracted properties are evaluated in the traditional way, that is, by computing the precision of the extraction procedure and in a clustering task. The second method of evaluation is seldom used in the natural language processing community, but it is regularly employed in cognitive psychology.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.12, S.2489-2497
  10. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.02
    Abstract
    In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics for STW (bibliometrics and scientometrics studies) rely solely on statistical data analysis methods (co-citation analysis, co-word analysis). Such methods usually work on structured databases where the units of analysis (words, keywords) are already attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has made it necessary to integrate natural language processing (NLP) techniques to first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted di-graphs, which the clustering algorithm, CPCL (Classification by Preferential Clustered Link), will seek to reduce in order to produce classes. These classes ideally represent the research topics present in the corpus. The results of the classification are subjected to validation by an expert in STW.
    Source
    Knowledge organization. 29(2002) nos.3/4, S.181-197
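    A loose illustration of the "preferential clustered link" idea from the abstract above: terms are nodes, syntactic variations are weighted edges, and classes form by merging term pairs across their strongest links first. This is a simplification for intuition, not the published CPCL algorithm:

      from collections import defaultdict

      # Hypothetical term-variation graph: edge weight = strength of the relation.
      edges = {
          ("information retrieval", "retrieval of information"): 0.9,
          ("text mining", "mining of textual data"): 0.8,
          ("information retrieval", "information retrieval system"): 0.7,
          ("text mining", "text data mining"): 0.6,
      }

      parent = {}
      def find(t):                     # union-find with path compression
          parent.setdefault(t, t)
          while parent[t] != t:
              parent[t] = parent[parent[t]]
              t = parent[t]
          return t

      for (a, b), w in sorted(edges.items(), key=lambda e: -e[1]):
          if w >= 0.5 and find(a) != find(b):
              parent[find(a)] = find(b)    # merge along the preferred link

      classes = defaultdict(list)
      for term in {t for pair in edges for t in pair}:
          classes[find(term)].append(term)
      print(list(classes.values()))    # two classes ~ two candidate research topics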
  11. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.02
    Date
    27.12.2007 17:32:29
    Source
    Information processing and management. 39(2003) no.1, S.25-44
  12. Golub, K.: Automated subject classification of textual web documents (2006) 0.02
    Abstract
    Purpose - To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with the approaches and with automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics are common to all the approaches; major differences lie in the applied algorithms and in the employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain information on how similar tasks are conducted in different communities. Originality/value - To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
  13. Major, R.L.; Ragsdale, C.T.: ¬An aggregation approach to the classification problem using multiple prediction experts (2000) 0.02
    Source
    Information processing and management. 36(2000) no.4, S.683-696
  14. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02
    Abstract
    Given the huge amount of information in the internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
  15. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.02
    Abstract
    Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable for five of the measures, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and evaluated to show whether they are complementary to one another.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2530-2539
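    The JDI half of the comparison above rests on precomputed statistical associations between textwords and journal descriptors (JDs); categorizing a document then amounts to averaging those weights over its words. A toy sketch with invented weights and descriptors:

      from collections import defaultdict

      assoc = {                                  # hypothetical word-to-JD weights
          "neuron":  {"Neurology": 0.9, "Psychiatry": 0.3},
          "synapse": {"Neurology": 0.8},
          "anxiety": {"Psychiatry": 0.9, "Neurology": 0.1},
      }

      def rank_descriptors(textwords):
          # Average each JD's association weight over the document's textwords.
          totals = defaultdict(float)
          for w in textwords:
              for jd, weight in assoc.get(w, {}).items():
                  totals[jd] += weight
          n = max(len(textwords), 1)
          return sorted(((s / n, jd) for jd, s in totals.items()), reverse=True)

      print(rank_descriptors(["neuron", "synapse", "anxiety"]))
      # [(0.6, 'Neurology'), (0.4, 'Psychiatry')] for this invented document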
  16. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.02
    Abstract
    Document clustering is an important tool, but it is not yet widely used in practice, probably because of its high computational complexity. This article explores techniques for high-speed rough clustering of documents, assuming that it is sometimes necessary to obtain a clustering result in a shorter time, even if the result is just an approximate outline of the document clusters. A promising approach for such clustering is to reduce the number of documents to be checked when generating cluster vectors in the leader-follower clustering algorithm. Based on this idea, the present article proposes a modified Crouch algorithm and an incomplete single-pass leader-follower algorithm. Also, a two-stage grouping technique is developed, in which the first stage attempts to decrease the number of documents to be processed in the second stage by applying a quick merging technique. An experiment using a part of the Reuters corpus RCV1 showed empirically that both the modified Crouch and the incomplete single-pass leader-follower algorithms achieved clustering results more efficiently than the original methods and also improved the effectiveness of the clustering results. On the other hand, the two-stage grouping technique did not reduce the processing time in this experiment.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1092-1104
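    The leader-follower scheme at the heart of the article above is a single pass over the collection: assign each document to the nearest existing leader if it is similar enough, otherwise let it found a new cluster. A minimal cosine-similarity sketch (threshold and data illustrative; the full algorithm also updates cluster vectors as followers join, and Kishida's variants additionally restrict which documents are checked):

      import numpy as np

      def leader_follower(docs, threshold=0.7):
          # docs: array of row vectors; returns one cluster id per document.
          leaders, assignments = [], []
          for d in docs:
              d = d / np.linalg.norm(d)
              if leaders:
                  sims = np.array([l @ d for l in leaders])
                  best = int(sims.argmax())
                  if sims[best] >= threshold:
                      assignments.append(best)
                      continue
              leaders.append(d)                  # d founds a new cluster
              assignments.append(len(leaders) - 1)
          return assignments

      rng = np.random.default_rng(1)
      print(leader_follower(rng.random((8, 20))))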
  17. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.02
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1799-1816
  18. Mu, T.; Goulermas, J.Y.; Korkontzelos, I.; Ananiadou, S.: Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities (2016) 0.02
    Abstract
    Descriptive document clustering aims at discovering clusters of semantically interrelated documents together with meaningful labels to summarize the content of each document cluster. In this work, we propose a novel descriptive clustering framework, referred to as CEDL. It relies on the formulation and generation of two types of heterogeneous objects, which correspond to documents and candidate phrases, using multilevel similarity information. CEDL is composed of five main processing stages. First, it simultaneously maps the documents and candidate phrases into a common co-embedded space that preserves higher-order, neighbor-based proximities between the combined sets of documents and phrases. Then, it discovers an approximate cluster structure of documents in the common space. The third stage extracts promising topic phrases by constructing a discriminant model in which documents, along with their cluster memberships, are used as training instances. Subsequently, the final cluster labels are selected from the topic phrases using a ranking scheme with multiple scores based on the extracted co-embedding information and the discriminant output. The final stage polishes the initial clusters to reduce noise and accommodate the multitopic nature of documents. The effectiveness and competitiveness of CEDL are demonstrated qualitatively and quantitatively with experiments using document databases from different application fields.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, S.106-133
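    Not CEDL itself, but its central co-embedding intuition from the abstract above can be shown compactly: place documents and candidate phrases in one latent space, cluster the documents there, and label each cluster with the phrase nearest its centroid. A rough LSA-style sketch with an invented doc-by-phrase matrix:

      import numpy as np
      from sklearn.cluster import KMeans

      phrases = ["neural network", "deep learning", "court ruling", "legal case"]
      M = np.array([[3, 2, 0, 0],      # hypothetical doc-by-phrase counts
                    [2, 3, 0, 1],
                    [0, 0, 3, 2],
                    [0, 1, 2, 3]], dtype=float)

      # Truncated SVD co-embeds rows (docs) and columns (phrases) in one space.
      U, S, Vt = np.linalg.svd(M, full_matrices=False)
      k = 2
      doc_emb, phrase_emb = U[:, :k] * S[:k], Vt[:k].T * S[:k]

      km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(doc_emb)
      for c, centroid in enumerate(km.cluster_centers_):
          dists = np.linalg.norm(phrase_emb - centroid, axis=1)
          print(f"cluster {c}: label = {phrases[int(dists.argmin())]}")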
  19. Na, J.-C.; Sui, H.; Khoo, C.; Chan, S.; Zhou, Y.: Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews (2004) 0.02
    Abstract
    This paper reports a study in automatic sentiment classification, i.e., automatically classifying documents as expressing positive or negative sentiments/opinions. The study investigates the effectiveness of using SVM (Support Vector Machine) and various text features to classify product reviews into recommended (positive sentiment) and not recommended (negative sentiment). Compared with traditional topical classification, it was hypothesized that syntactic and semantic processing of text would be more important for sentiment classification. In the first part of this study, several different approaches were investigated: unigrams (individual words), selected words (such as verbs, adjectives, and adverbs), and words labelled with part-of-speech tags. A sample of 1,800 product reviews was retrieved from Review Centre (www.reviewcentre.com) for the study; 1,200 reviews were used for training and 600 for testing. Using SVM, the baseline unigram approach obtained an accuracy rate of around 76%. The use of selected words obtained a marginally better result of 77.33%. Error analysis suggests various approaches for improving classification accuracy: use of negation phrases, making inferences from superficial words, and solving the problem of comments on parts. The second part of the study, which is in progress, investigates the use of negation phrases through simple linguistic processing to improve classification accuracy. This approach increased the accuracy rate up to 79.33%.
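    The baseline above (unigram features plus an SVM) and the negation idea from the second part fit in a few lines of scikit-learn. A sketch with invented training data; the accuracies quoted in the abstract come from the authors' 1,800-review sample, not from this toy:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      NEGATIONS = {"not", "no", "never"}

      def mark_negation(text):
          # Tag tokens inside a negation's scope (here: until punctuation).
          out, negating = [], False
          for tok in text.lower().split():
              word = tok.strip(".,!?")
              out.append(word + "_NEG" if negating and word not in NEGATIONS else word)
              if word in NEGATIONS:
                  negating = True
              elif word != tok:                  # token ended a clause
                  negating = False
          return " ".join(out)

      train = ["great product, would buy again", "would not buy again, bad value",
               "works well and is recommended", "not recommended at all"]
      labels = [1, 0, 1, 0]                      # 1 = recommended, 0 = not

      clf = make_pipeline(CountVectorizer(preprocessor=mark_negation), LinearSVC())
      clf.fit(train, labels)
      print(clf.predict(["i would not recommend this"]))   # likely [0]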
  20. Autonomy, Inc.: Automatic classification (o.J.) 0.02
    Abstract
    Autonomy's Classification solutions remove the need for organizations to rely on the human intervention or manual processing of information, such as manual tagging, that is typically required to make most other e-business applications work. Autonomy's ability to classify data automatically, consistently, and accurately is a unique infrastructure solution that addresses the problems posed by the exponential growth of unstructured data.

Languages

  • e 117
  • d 8
  • chi 1

Types

  • a 116
  • el 7
  • m 2
  • x 2
  • s 1