Search (24 results, page 1 of 2)

  • × theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.17
    0.17360047 = product of:
      0.2893341 = sum of:
        0.067984015 = product of:
          0.20395203 = sum of:
            0.20395203 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.20395203 = score(doc=562,freq=2.0), product of:
                0.36289233 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.042803947 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.20395203 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.20395203 = score(doc=562,freq=2.0), product of:
            0.36289233 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.042803947 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.017398031 = product of:
          0.034796063 = sum of:
            0.034796063 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.034796063 = score(doc=562,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
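    The score breakdown shown for this record is Lucene's ClassicSimilarity (TF-IDF) explanation. Below is a minimal Python sketch of how one leaf weight and the final score are assembled from the listed factors (tf, idf, queryNorm, fieldNorm and the coord factors); the numbers are copied from the explanation above, while the variable names are ours and not part of the search engine's output:
      from math import sqrt

      # Leaf weight for the term "3a" in document 562 (values from the explain tree above).
      freq = 2.0
      idf = 8.478011            # idf(docFreq=24, maxDocs=44218)
      query_norm = 0.042803947
      field_norm = 0.046875

      tf = sqrt(freq)                           # 1.4142135
      query_weight = idf * query_norm           # 0.36289233
      field_weight = tf * idf * field_norm      # 0.56201804
      leaf_score = query_weight * field_weight  # 0.20395203

      # The document score combines three matching clauses and two coord factors:
      # (leaf_3a * coord(1/3) + leaf_2f + leaf_22 * coord(1/2)) * coord(3/5)
      score = (0.20395203 * (1 / 3) + 0.20395203 + 0.034796063 * 0.5) * (3 / 5)
      print(round(leaf_score, 6), round(score, 6))  # ~0.203952, ~0.1736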
  2. Fagni, T.; Sebastiani, F.: Selecting negative examples for hierarchical text classification: An experimental comparison (2010) 0.02
    0.023549197 = product of:
      0.11774598 = sum of:
        0.11774598 = weight(_text_:policy in 4101) [ClassicSimilarity], result of:
          0.11774598 = score(doc=4101,freq=6.0), product of:
            0.22950763 = queryWeight, product of:
              5.361833 = idf(docFreq=563, maxDocs=44218)
              0.042803947 = queryNorm
            0.5130373 = fieldWeight in 4101, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.361833 = idf(docFreq=563, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4101)
      0.2 = coord(1/5)
    
    Abstract
    Hierarchical text classification (HTC) approaches have recently attracted a lot of interest on the part of researchers in human language technology and machine learning, since they have been shown to bring about equal, if not better, classification accuracy with respect to their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: Given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories which are siblings of c in the hierarchy. However, this policy has always been taken for granted and never been subjected to careful scrutiny since first proposed 15 years ago. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BEST LOCAL (k)) is being proposed for the first time in this article. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naïve Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-V2.
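    The "local" policy described in this abstract (for a category c, its negatives are the training examples that are negative for c but positive for c's siblings) can be illustrated with a short sketch. The hierarchy, labels, and function names below are hypothetical placeholders, not taken from the article:
      # Hedged sketch of the default sibling-based negative-selection policy in HTC.
      # `labels` maps a document id to the set of categories it belongs to;
      # `parent` maps each category to its parent (None for root categories).
      def siblings(c, parent):
          return {k for k, p in parent.items() if p == parent[c] and k != c}

      def select_negatives(c, labels, parent):
          sibs = siblings(c, parent)
          return {doc for doc, cats in labels.items() if c not in cats and cats & sibs}

      # Hypothetical example:
      parent = {"science": None, "physics": "science", "biology": "science"}
      labels = {"d1": {"physics"}, "d2": {"biology"}, "d3": {"physics", "biology"}}
      print(select_negatives("physics", labels, parent))  # {'d2'}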
  3. Cui, H.; Heidorn, P.B.; Zhang, H.: ¬An approach to automatic classification of text for information retrieval (2002) 0.02
    0.020992003 = product of:
      0.10496002 = sum of:
        0.10496002 = weight(_text_:great in 174) [ClassicSimilarity], result of:
          0.10496002 = score(doc=174,freq=2.0), product of:
            0.24101958 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042803947 = queryNorm
            0.43548337 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
      0.2 = coord(1/5)
    
    Abstract
    In this paper, we explore an approach to making better use of semi-structured documents for information retrieval in the domain of biology. Using machine learning techniques, we make the documents' inherent structures explicit through XML markup. This markup has great potential to improve task performance in specimen identification and the usability of online flora and fauna.
  4. Xu, Y.; Bernard, A.: Knowledge organization through statistical computation : a new approach (2009) 0.02
    0.017993147 = product of:
      0.08996573 = sum of:
        0.08996573 = weight(_text_:great in 3252) [ClassicSimilarity], result of:
          0.08996573 = score(doc=3252,freq=2.0), product of:
            0.24101958 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042803947 = queryNorm
            0.37327147 = fieldWeight in 3252, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.046875 = fieldNorm(doc=3252)
      0.2 = coord(1/5)
    
    Abstract
    Knowledge organization (KO) is an interdisciplinary field that includes problems in knowledge classification, such as how to classify newly emerged knowledge. Given the great complexity and ambiguity of knowledge, classifying it by logical reasoning alone is often inefficient. This paper proposes a statistical approach to knowledge organization in order to resolve the problems of classifying complex and large-scale knowledge. By integrating the classification process into a mathematical model, a knowledge classifier based on maximum entropy theory is constructed, and the experimental results show that the classifications it produces are reliable. The proposed approach is formal and not dependent on specific contexts, so it could easily be adapted to knowledge classification in other domains within KO.
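    A maximum entropy classifier over text features corresponds, in its standard form, to multinomial logistic regression. A minimal sketch using scikit-learn, with invented documents and class labels rather than the authors' data or model:
      # Hedged sketch: maximum entropy (multinomial logistic regression) text classifier.
      # The documents and class labels are invented placeholders.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      docs = ["quantum entanglement experiment", "protein folding simulation",
              "stock market volatility model", "gene expression analysis"]
      classes = ["physics", "biology", "economics", "biology"]

      clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
      clf.fit(docs, classes)
      print(clf.predict(["market risk and volatility"]))  # likely ['economics']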
  5. Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.01
    0.014994288 = product of:
      0.07497144 = sum of:
        0.07497144 = weight(_text_:great in 3172) [ClassicSimilarity], result of:
          0.07497144 = score(doc=3172,freq=2.0), product of:
            0.24101958 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042803947 = queryNorm
            0.31105953 = fieldWeight in 3172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
      0.2 = coord(1/5)
    
    Abstract
    In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems such as the DDC impose great obstacles on state-of-the-art text categorization (TC) technologies, including a deep hierarchy, data sparseness, and skewed distributions. We first statistically analyze the document and category distributions over the DDC and discuss the obstacles that bibliographic corpora and library classification schemes impose on TC technology. To overcome these obstacles, we propose an innovative algorithm that reshapes the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve classification effectiveness to a level acceptable for real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, providing a practical solution to the automatic bibliographic classification problem.
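    The reshaping step can be pictured as mapping deep DDC notations onto a shallower, more balanced virtual tree. The sketch below only illustrates the flattening idea by truncating notations to a fixed number of leading digits; it is not the balancing algorithm proposed in the paper:
      # Hedged illustration of "flattening": map full DDC notations onto a shallower
      # virtual class by keeping only a fixed number of leading digits.
      from collections import Counter

      def flatten(notation, depth=3):
          return notation.replace(".", "")[:depth]

      notations = ["004.678", "004.6", "510.1", "511.33", "511.3"]
      virtual_classes = Counter(flatten(n) for n in notations)
      print(virtual_classes)  # roughly Counter({'004': 2, '511': 2, '510': 1})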
  6. Barthel, S.; Tönnies, S.; Balke, W.-T.: Large-scale experiments for mathematical document classification (2013) 0.01
    0.014994288 = product of:
      0.07497144 = sum of:
        0.07497144 = weight(_text_:great in 1056) [ClassicSimilarity], result of:
          0.07497144 = score(doc=1056,freq=2.0), product of:
            0.24101958 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042803947 = queryNorm
            0.31105953 = fieldWeight in 1056, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1056)
      0.2 = coord(1/5)
    
    Abstract
    The ever-increasing amount of digitally available information is a curse and a blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Established digital libraries therefore offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g. by annotating them with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g. chemistry (CAS), medicine (MeSH), or mathematics (MSC). Due to the growing amount of data, however, this manual task becomes ever more time-consuming and expensive. The only solution to this problem seems to be the use of automated classification algorithms, but evaluations in previous research make it difficult to draw conclusions about real-world scenarios. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, Zentralblatt MATH, with special focus on practical applicability.
  7. Pech, G.; Delgado, C.; Sorella, S.P.: Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics (2022) 0.01
    0.013596135 = product of:
      0.06798068 = sum of:
        0.06798068 = weight(_text_:policy in 744) [ClassicSimilarity], result of:
          0.06798068 = score(doc=744,freq=2.0), product of:
            0.22950763 = queryWeight, product of:
              5.361833 = idf(docFreq=563, maxDocs=44218)
              0.042803947 = queryNorm
            0.29620224 = fieldWeight in 744, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.361833 = idf(docFreq=563, maxDocs=44218)
              0.0390625 = fieldNorm(doc=744)
      0.2 = coord(1/5)
    
    Abstract
    Classifying papers according to fields of knowledge is critical to clearly understanding the dynamics of scientific (sub)fields, their leading questions, and trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may neither correctly map the existing subfields nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and KeyWords Plus) and the help of experts to identify the existing subfields and the journals exclusive to each subfield. These "exclusive journals" are critical for obtaining, through a pattern detection procedure that uses machine learning techniques (from the software NVivo), a list of the frequent terms that are specific to each subfield. With that list of terms, and with the help of optimization procedures, we can identify to which subfield each paper most likely belongs. This study can help support scientific policy-makers and funding and research institutions (via more accurate academic performance evaluations), support editors in their task of redefining the scopes of journals, and support popular databases in refining their categories.
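    The final assignment step, matching each paper's text against the list of frequent terms specific to each subfield, can be sketched as a simple overlap score. The subfield names, term lists, and scoring rule below are illustrative placeholders, not the optimization procedure used in the study:
      # Hedged sketch: assign a paper to the subfield whose specific term list
      # overlaps most with the paper's title/abstract/keyword text.
      subfield_terms = {  # illustrative term lists, not taken from the study
          "particle physics": {"quark", "hadron", "collider", "boson"},
          "condensed matter": {"lattice", "superconductivity", "phonon", "magnetism"},
      }

      def classify(text):
          tokens = set(text.lower().split())
          scores = {sf: len(tokens & terms) for sf, terms in subfield_terms.items()}
          return max(scores, key=scores.get)

      print(classify("Evidence for a new boson at the collider"))  # particle physics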
  8. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.0069592125 = product of:
      0.034796063 = sum of:
        0.034796063 = product of:
          0.069592126 = sum of:
            0.069592126 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.069592126 = score(doc=1046,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    5. 5.2003 14:17:22
  9. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    0.0057993443 = product of:
      0.02899672 = sum of:
        0.02899672 = product of:
          0.05799344 = sum of:
            0.05799344 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.05799344 = score(doc=611,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 8.2009 12:54:24
  10. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.0057993443 = product of:
      0.02899672 = sum of:
        0.02899672 = product of:
          0.05799344 = sum of:
            0.05799344 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.05799344 = score(doc=2748,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    1. 2.2016 18:25:22
  11. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.00
    0.004059541 = product of:
      0.020297704 = sum of:
        0.020297704 = product of:
          0.04059541 = sum of:
            0.04059541 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
              0.04059541 = score(doc=141,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.2708308 = fieldWeight in 141, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=141)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Pages
    S.1-22
  12. Dubin, D.: Dimensions and discriminability (1998) 0.00
    0.004059541 = product of:
      0.020297704 = sum of:
        0.020297704 = product of:
          0.04059541 = sum of:
            0.04059541 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.04059541 = score(doc=2338,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 9.1997 19:16:05
  13. Automatic classification research at OCLC (2002) 0.00
    0.004059541 = product of:
      0.020297704 = sum of:
        0.020297704 = product of:
          0.04059541 = sum of:
            0.04059541 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.04059541 = score(doc=1563,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    5. 5.2003 9:22:09
  14. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.00
    0.004059541 = product of:
      0.020297704 = sum of:
        0.020297704 = product of:
          0.04059541 = sum of:
            0.04059541 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.04059541 = score(doc=1673,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    1. 8.1996 22:08:06
  15. Yoon, Y.; Lee, C.; Lee, G.G.: ¬An effective procedure for constructing a hierarchical text classification system (2006) 0.00
    0.004059541 = product of:
      0.020297704 = sum of:
        0.020297704 = product of:
          0.04059541 = sum of:
            0.04059541 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.04059541 = score(doc=5273,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 7.2006 16:24:52
  16. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.00
    0.004059541 = product of:
      0.020297704 = sum of:
        0.020297704 = product of:
          0.04059541 = sum of:
            0.04059541 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.04059541 = score(doc=2560,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 9.2008 18:31:54
  17. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00
    0.0034796062 = product of:
      0.017398031 = sum of:
        0.017398031 = product of:
          0.034796063 = sum of:
            0.034796063 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
              0.034796063 = score(doc=2760,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.23214069 = fieldWeight in 2760, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2760)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 3.2009 19:11:54
  18. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.00
    0.0034796062 = product of:
      0.017398031 = sum of:
        0.017398031 = product of:
          0.034796063 = sum of:
            0.034796063 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
              0.034796063 = score(doc=3051,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.23214069 = fieldWeight in 3051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3051)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 8.2009 19:51:28
  19. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.00
    0.0034796062 = product of:
      0.017398031 = sum of:
        0.017398031 = product of:
          0.034796063 = sum of:
            0.034796063 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.034796063 = score(doc=690,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    23. 3.2013 13:22:36
  20. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.00
    0.0034796062 = product of:
      0.017398031 = sum of:
        0.017398031 = product of:
          0.034796063 = sum of:
            0.034796063 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.034796063 = score(doc=2158,freq=2.0), product of:
                0.14989214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042803947 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    4. 8.2015 19:22:04