Search (30 results, page 1 of 2)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10
    0.10312565 = sum of:
      0.08211206 = product of:
        0.24633618 = sum of:
          0.24633618 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.24633618 = score(doc=562,freq=2.0), product of:
              0.43830654 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.051699217 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.02101359 = product of:
        0.04202718 = sum of:
          0.04202718 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04202718 = score(doc=562,freq=2.0), product of:
              0.18104185 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051699217 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
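
The score explanation for the first result can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of any library. All numeric inputs (queryNorm, fieldNorm, docFreq, maxDocs, coord) are copied from the explanation for doc 562 above.

```python
import math

# Values copied from the explanation tree for doc 562.
QUERY_NORM = 0.051699217
MAX_DOCS = 44218


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for a single term in a single document."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Term "3a": freq=2, docFreq=24, fieldNorm=0.046875, coord(1/3)
part_3a = term_score(2.0, 24, 0.046875) * (1 / 3)
# Term "22": freq=2, docFreq=3622, fieldNorm=0.046875, coord(1/2)
part_22 = term_score(2.0, 3622, 0.046875) * (1 / 2)

total = part_3a + part_22
print(f"{part_3a:.6f} + {part_22:.6f} = {total:.6f}")
```

Small differences in the last digits against the listed values are to be expected, since the listing shows single-precision floats while Python computes in double precision.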
  2. Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.02
    0.02336838 = product of:
      0.04673676 = sum of:
        0.04673676 = product of:
          0.09347352 = sum of:
            0.09347352 = weight(_text_:searching in 4180) [ClassicSimilarity], result of:
              0.09347352 = score(doc=4180,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.44694576 = fieldWeight in 4180, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4180)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  3. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.02
    0.02101359 = product of:
      0.04202718 = sum of:
        0.04202718 = product of:
          0.08405436 = sum of:
            0.08405436 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.08405436 = score(doc=1046,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    5. 5.2003 14:17:22
  4. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.02
    0.020237612 = product of:
      0.040475223 = sum of:
        0.040475223 = product of:
          0.08095045 = sum of:
            0.08095045 = weight(_text_:searching in 3614) [ClassicSimilarity], result of:
              0.08095045 = score(doc=3614,freq=6.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.38706642 = fieldWeight in 3614, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3614)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification systems for browsing. The classification algorithm was evaluated by the users who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success proved to be correlated with, and dependent on, classification correctness. Research limitations/implications - Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing for searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); when searching for class captions, returning the hierarchical tree expanded around the class whose caption contains the search term. The need for improvements of classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
  5. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.02
    0.019828727 = product of:
      0.039657455 = sum of:
        0.039657455 = product of:
          0.07931491 = sum of:
            0.07931491 = weight(_text_:searching in 2166) [ClassicSimilarity], result of:
              0.07931491 = score(doc=2166,freq=4.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.37924606 = fieldWeight in 2166, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2166)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In 2004, the German National Library began to classify title records of the German National Bibliography according to subject groups based on the divisions of the Dewey Decimal Classification (DDC). Since 2006, all titles of the main series of the German National Bibliography have been classified in strict compliance with the DDC. On this basis, an enhanced DDC-based search can be realized - e.g., searching the data of the German National Bibliography for title records using number components of synthesized classification numbers or searching for DDC numbers using unclassified title records. This paper gives an account of the current research and development of the DDC-based search. The work is conducted in the VZG project Colibri that focuses on the automatic analysis of DDC-synthesized numbers and the automatic classification of bibliographic title records.
  6. Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.02
    0.019828727 = product of:
      0.039657455 = sum of:
        0.039657455 = product of:
          0.07931491 = sum of:
            0.07931491 = weight(_text_:searching in 2555) [ClassicSimilarity], result of:
              0.07931491 = score(doc=2555,freq=4.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.37924606 = fieldWeight in 2555, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2555)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
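
The single-path idea in this abstract can be illustrated with a short sketch. This is not the authors' implementation: the `Category` class, the keyword-overlap measure, and the toy category tree are all hypothetical stand-ins chosen for brevity. The point is only that classification descends from the root and compares the page against the children of one node per level, rather than against every category in the system.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical category node: a name, a keyword set, and child categories."""
    name: str
    keywords: set
    children: list = field(default_factory=list)


def overlap(words: set, category: Category) -> int:
    """Toy similarity measure (an assumption): number of shared keywords."""
    return len(words & category.keywords)


def classify(root: Category, words: set):
    """Follow a single root-to-leaf path, picking the best-matching child at each level."""
    path = [root.name]
    comparisons = 0
    node = root
    while node.children:
        comparisons += len(node.children)
        node = max(node.children, key=lambda child: overlap(words, child))
        path.append(node.name)
    return path, comparisons


# Hypothetical category tree for illustration only.
physics = Category("physics", {"quantum", "particle", "physics"})
chemistry = Category("chemistry", {"reaction", "molecule", "chemistry"})
science = Category("science", {"physics", "chemistry", "research"}, [physics, chemistry])
sports = Category("sports", {"football", "tennis", "match"})
root = Category("root", set(), [science, sports])

path, comparisons = classify(root, {"quantum", "physics", "research"})
print(path, comparisons)
```

The number of comparisons grows with the depth of the tree times the branching factor, which is logarithmic in the number of categories only when the tree is reasonably balanced and the branching factor is bounded.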
  7. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02
    0.017511327 = product of:
      0.035022654 = sum of:
        0.035022654 = product of:
          0.07004531 = sum of:
            0.07004531 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.07004531 = score(doc=611,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 8.2009 12:54:24
  8. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.02
    0.017511327 = product of:
      0.035022654 = sum of:
        0.035022654 = product of:
          0.07004531 = sum of:
            0.07004531 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.07004531 = score(doc=2748,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  9. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.02
    0.016357865 = product of:
      0.03271573 = sum of:
        0.03271573 = product of:
          0.06543146 = sum of:
            0.06543146 = weight(_text_:searching in 7696) [ClassicSimilarity], result of:
              0.06543146 = score(doc=7696,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.31286204 = fieldWeight in 7696, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7696)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide efficient access to reaction information and indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem.
  10. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.02
    0.016357865 = product of:
      0.03271573 = sum of:
        0.03271573 = product of:
          0.06543146 = sum of:
            0.06543146 = weight(_text_:searching in 1568) [ClassicSimilarity], result of:
              0.06543146 = score(doc=1568,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.31286204 = fieldWeight in 1568, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1568)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically, the generation of the robot-gathered database 'All' engineering and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.
  11. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.01
    0.014021028 = product of:
      0.028042056 = sum of:
        0.028042056 = product of:
          0.05608411 = sum of:
            0.05608411 = weight(_text_:searching in 995) [ClassicSimilarity], result of:
              0.05608411 = score(doc=995,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.26816747 = fieldWeight in 995, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.046875 = fieldNorm(doc=995)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model able to handle both structure and content which is based on Bayesian networks. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended for dealing with different types of content information (here text and images). The model was tested on three databases: the classical webKB corpus composed of HTML pages, the new INEX corpus which has become a reference in the field of ad-hoc retrieval for XML documents, and a multimedia corpus of Web pages.
  12. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.01
    0.014021028 = product of:
      0.028042056 = sum of:
        0.028042056 = product of:
          0.05608411 = sum of:
            0.05608411 = weight(_text_:searching in 1057) [ClassicSimilarity], result of:
              0.05608411 = score(doc=1057,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.26816747 = fieldWeight in 1057, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1057)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this document we describe the final release of the toolset for entity and semantic associations, integrating two versions (language dependent and language independent) of Unsupervised Document Similarity implemented by MU (using the gensim tool) and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of the tools, the rationale behind decisions made, and provide an elementary evaluation. Tools are integrated in the main project result, the EuDML website, and they deliver the needed functionality for exploratory searching and browsing of the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state-of-the-art machine learning and matching methods.
  13. Wu, M.; Liu, Y.-H.; Brownlee, R.; Zhang, X.: Evaluating utility and automatic classification of subject metadata from Research Data Australia (2021) 0.01
    0.014021028 = product of:
      0.028042056 = sum of:
        0.028042056 = product of:
          0.05608411 = sum of:
            0.05608411 = weight(_text_:searching in 453) [ClassicSimilarity], result of:
              0.05608411 = score(doc=453,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.26816747 = fieldWeight in 453, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.046875 = fieldNorm(doc=453)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we present a case study of how well subject metadata (comprising headings from an international classification scheme) has been deployed in a national data catalogue, and how often data seekers use subject metadata when searching for data. Through an analysis of user search behaviour as recorded in search logs, we find evidence that users utilise the subject metadata for data discovery. Since approximately half of the records ingested by the catalogue did not include subject metadata at the time of harvest, we experimented with automatic subject classification approaches in order to enrich these records and to provide additional support for user search and data discovery. Our results show that automatic methods work well for well represented categories of subject metadata, and these categories tend to have features that can distinguish themselves from the other categories. Our findings raise implications for data catalogue providers; they should invest more effort to enhance the quality of data records by providing an adequate description of these records for under-represented subject categories.
  14. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01
    0.012257928 = product of:
      0.024515856 = sum of:
        0.024515856 = product of:
          0.049031712 = sum of:
            0.049031712 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
              0.049031712 = score(doc=141,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.2708308 = fieldWeight in 141, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=141)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Pages
    S.1-22
  15. Dubin, D.: Dimensions and discriminability (1998) 0.01
    0.012257928 = product of:
      0.024515856 = sum of:
        0.024515856 = product of:
          0.049031712 = sum of:
            0.049031712 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.049031712 = score(doc=2338,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 9.1997 19:16:05
  16. Automatic classification research at OCLC (2002) 0.01
    0.012257928 = product of:
      0.024515856 = sum of:
        0.024515856 = product of:
          0.049031712 = sum of:
            0.049031712 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.049031712 = score(doc=1563,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    5. 5.2003 9:22:09
  17. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    0.012257928 = product of:
      0.024515856 = sum of:
        0.024515856 = product of:
          0.049031712 = sum of:
            0.049031712 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.049031712 = score(doc=1673,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 8.1996 22:08:06
  18. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    0.012257928 = product of:
      0.024515856 = sum of:
        0.024515856 = product of:
          0.049031712 = sum of:
            0.049031712 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.049031712 = score(doc=5273,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 7.2006 16:24:52
  19. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    0.012257928 = product of:
      0.024515856 = sum of:
        0.024515856 = product of:
          0.049031712 = sum of:
            0.049031712 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.049031712 = score(doc=2560,freq=2.0), product of:
                0.18104185 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051699217 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 9.2008 18:31:54
  20. Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.01
    0.01168419 = product of:
      0.02336838 = sum of:
        0.02336838 = product of:
          0.04673676 = sum of:
            0.04673676 = weight(_text_:searching in 1665) [ClassicSimilarity], result of:
              0.04673676 = score(doc=1665,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.22347288 = fieldWeight in 1665, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1665)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology - rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.