Search (24 results, page 1 of 2)

  Filter: theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.05
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
  2. Pfeffer, M.: Automatische Vergabe von RVK-Notationen anhand von bibliografischen Daten mittels fallbasiertem Schließen (2007) 0.03
    Content
    Master's thesis written as part of the postgraduate distance-learning programme Master of Arts (Library and Information Science)
  3. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.02
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
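
A minimal sketch of the kind of document map the paper describes, assuming documents are already vectorized (e.g., as TF-IDF vectors); the grid size, decay schedule, and all function names are illustrative choices, not taken from the paper:

```python
# Minimal self-organizing map (SOM) sketch for document vectors,
# in the spirit of the Kohonen-based organization described above.
import numpy as np

def train_som(docs, grid=(10, 10), epochs=50, lr0=0.5, radius0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, docs.shape[1]))  # one prototype vector per map cell
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)              # linearly decaying learning rate
        radius = max(radius0 * (1 - t / epochs), 0.5)
        for x in docs[rng.permutation(len(docs))]:
            # best-matching unit: cell whose prototype is closest to the document
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighbourhood pulls nearby prototypes toward the document
            g = np.exp(-np.sum((coords - bmu) ** 2, axis=2) / (2 * radius ** 2))
            weights += lr * g[..., None] * (x - weights)
    return weights

def map_documents(docs, weights):
    """Assign each document to its best-matching map cell."""
    d = np.linalg.norm(weights[None] - docs[:, None, None, :], axis=3)
    return [np.unravel_index(np.argmin(di), di.shape) for di in d]
```

Each document ends up assigned to a map cell, and topically similar documents land in nearby cells, which is what makes graphical browsing over such a map feasible.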
  4. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.01
    Abstract
    The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with broad coverage for the classification of German scientific articles.
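
A hedged sketch of the training-free idea described here: subject-heading terms found in a text vote for the DDC notations they are linked to, and mean reciprocal rank scores the resulting ranking. The tiny term-to-DDC lexicon below is a hypothetical stand-in for the enriched SWD, not the actual dataset:

```python
from collections import Counter

term_to_ddc = {                      # assumed lexicon: heading term -> DDC notations
    "bibliothek": ["020"],
    "klassifikation": ["020", "025.4"],
    "neuronales netz": ["006.3"],
}

def classify(text, lexicon=term_to_ddc):
    """Rank DDC notations by how many lexicon terms in the text point to them."""
    votes = Counter()
    lowered = text.lower()
    for term, notations in lexicon.items():
        if term in lowered:
            for n in notations:
                votes[n] += 1
    return [n for n, _ in votes.most_common()]

def mean_reciprocal_rank(predictions, gold):
    """MRR over records: reciprocal rank of the first correct notation, else 0."""
    total = 0.0
    for ranked, correct in zip(predictions, gold):
        for i, n in enumerate(ranked, start=1):
            if n in correct:
                total += 1.0 / i
                break
    return total / len(gold)
```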
  5. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.01
    Abstract
    Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model able to handle both structure and content which is based on Bayesian networks. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended for dealing with different types of content information (here text and images). The model was tested on three databases: the classical WebKB corpus composed of HTML pages, the new INEX corpus, which has become a reference in the field of ad hoc retrieval for XML documents, and a multimedia corpus of Web pages.
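
The paper's Bayesian network handles arbitrary document structure; the sketch below is a drastically simplified stand-in that captures one core idea, conditioning word distributions on the XML element (tag) containing each word, so that structure and content both influence the class score. The class and method names are invented for the example:

```python
import math
from collections import defaultdict

class StructuredNB:
    """Toy generative classifier: P(word | class, enclosing tag) with smoothing."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))  # (class, tag) -> word counts
        self.totals = defaultdict(int)                       # (class, tag) -> total words
        self.vocab = set()
        self.priors = defaultdict(int)

    def fit(self, docs, labels):
        """docs: list of [(tag, word), ...] pairs; labels: one class per document."""
        for doc, y in zip(docs, labels):
            self.priors[y] += 1
            for tag, word in doc:
                self.counts[(y, tag)][word] += 1
                self.totals[(y, tag)] += 1
                self.vocab.add(word)

    def predict(self, doc):
        n = sum(self.priors.values())
        best, best_lp = None, -math.inf
        for y, c in self.priors.items():
            lp = math.log(c / n)
            for tag, word in doc:
                num = self.counts[(y, tag)][word] + self.alpha
                den = self.totals[(y, tag)] + self.alpha * len(self.vocab)
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = y, lp
        return best
```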
  6. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    Date
    5. 5.2003 14:17:22
  7. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.01
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses a Python virtual environment to install and configure an open-source AI environment (named Annif) and to feed it the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable for training Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiency, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
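
A toy stand-in for the TF-IDF backend idea that Annif applies here (not Annif's actual implementation): represent each subject heading by the concatenated text of the records indexed with it, then suggest the headings whose TF-IDF vectors are most similar to a new document. Assumes scikit-learn is available:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_suggester(records):
    """records: list of (text, [subject headings]) training pairs."""
    corpus_by_subject = {}
    for text, subjects in records:
        for s in subjects:
            # pool the text of every record carrying this heading
            corpus_by_subject[s] = corpus_by_subject.get(s, "") + " " + text
    subjects = list(corpus_by_subject)
    vectorizer = TfidfVectorizer()
    subject_matrix = vectorizer.fit_transform(corpus_by_subject[s] for s in subjects)

    def suggest(text, k=5):
        sims = cosine_similarity(vectorizer.transform([text]), subject_matrix)[0]
        return sorted(zip(subjects, sims), key=lambda p: -p[1])[:k]

    return suggest
```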
  8. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.01
    Date
    22. 8.2009 12:54:24
  9. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    Date
    1. 2.2016 18:25:22
  10. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.01
    Abstract
    Analytico-synthetic and faceted classifications, such as the Universal Decimal Classification (UDC), express the content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations are stored in an intermediate format (in this case, XML) by automatic means, without any loss of data or information. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats, or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is available online for testing purposes at http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces so that the software's features can be used as a service. This would allow the algorithm to be employed in both existing and future library systems to analyse UDC numbers without significant programming effort.
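
A toy illustration of the parsing-to-XML idea (real UDC syntax, with its auxiliaries, parentheses, and nesting, is far richer; the element names here are invented for the example, not taken from the paper's schema):

```python
import re
import xml.etree.ElementTree as ET

CONNECTORS = {"+": "addition", "/": "range", ":": "relation"}

def parse_udc(notation):
    """Split a pre-combined UDC number on top-level connectors, emit XML."""
    root = ET.Element("udc", number=notation)
    # re.split with a capture group keeps the connector symbols as tokens
    for token in re.split(r"([+/:])", notation):
        if token in CONNECTORS:
            ET.SubElement(root, "connector", type=CONNECTORS[token])
        elif token:
            ET.SubElement(root, "component").text = token
    return root

print(ET.tostring(parse_udc("821.111:004.8"), encoding="unicode"))
# <udc number="821.111:004.8"><component>821.111</component>
# <connector type="relation" /><component>004.8</component></udc>
# (line break added here for readability)
```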
  11. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01
    Pages
    pp. 1-22
  12. Dubin, D.: Dimensions and discriminability (1998) 0.01
    Date
    22. 9.1997 19:16:05
  13. Automatic classification research at OCLC (2002) 0.01
    Date
    5. 5.2003 9:22:09
  14. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    Date
    1. 8.1996 22:08:06
  15. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    Date
    22. 7.2006 16:24:52
  16. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    Date
    22. 9.2008 18:31:54
  17. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00
    Date
    22. 3.2009 19:11:54
  18. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.00
    Date
    22. 8.2009 19:51:28
  19. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.00
    Date
    23. 3.2013 13:22:36
  20. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.00
    Date
    4. 8.2015 19:22:04