Search (60 results, page 1 of 3)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.20
    0.20075516 = product of:
      0.26767355 = sum of:
        0.064180925 = product of:
          0.19254276 = sum of:
            0.19254276 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.19254276 = score(doc=562,freq=2.0), product of:
                0.3425918 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.040409453 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.19254276 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.19254276 = score(doc=562,freq=2.0), product of:
            0.3425918 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.040409453 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.010949845 = product of:
          0.032849535 = sum of:
            0.032849535 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.032849535 = score(doc=562,freq=2.0), product of:
                0.14150701 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040409453 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
      0.75 = coord(3/4)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
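The score shown after each result is a Lucene ClassicSimilarity (tf-idf) value, and the breakdown under result 1 follows that formula exactly. A minimal sketch that reproduces one term leaf of such a breakdown (the function name is ours; the constants are the figures from the explain output: freq=2, docFreq=24, maxDocs=44218, queryNorm=0.040409453, fieldNorm=0.046875):

```python
import math

def classic_similarity(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one term in one field, decomposed the way Lucene's
    ClassicSimilarity explain output decomposes it."""
    tf = math.sqrt(freq)                             # 1.4142135 for freq=2
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 8.478011 for docFreq=24
    query_weight = idf * query_norm                  # query-side normalization
    field_weight = tf * idf * field_norm             # document-side weight
    return query_weight * field_weight

score = classic_similarity(freq=2.0, doc_freq=24, max_docs=44218,
                           query_norm=0.040409453, field_norm=0.046875)
print(round(score, 6))   # ≈ 0.192543, the weight(_text_:3a) leaf above
```

Multiplying the query-side weight (idf × queryNorm) by the document-side weight (tf × idf × fieldNorm) gives one leaf score; Lucene then sums the leaves and scales the sum by the coord() factors shown in the breakdown.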
  2. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02
    
    Date
    22. 8.2009 12:54:24
    Theme
    Klassifikationssysteme im Online-Retrieval
  3. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    
    Date
    1. 8.1996 22:08:06
    Theme
    Klassifikationssysteme im Online-Retrieval
  4. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.01
    
    Abstract
    Analytico-synthetic and faceted classifications, such as Universal Decimal Classification (UDC) express content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations would be stored into an intermediate format (in this case, in XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of software as a service. This would result in the algorithm being able to be employed both in existing and future library systems to analyse UDC numbers without any significant programming effort.
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
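The abstract above describes parsing pre-combined UDC notations into an intermediate XML format without information loss. As a toy illustration of that idea (not Piros's program: the connector table, element names, and depth of analysis here are all invented and far shallower than the interpreter described), a complex number can be split on its common connecting symbols:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, minimal sketch: split a pre-combined UDC notation on
# its common connecting symbols and store the parts in a simple XML
# structure that downstream tools could convert further.
CONNECTORS = {"+": "addition", "/": "range", ":": "relation"}

def udc_to_xml(notation):
    root = ET.Element("udc", number=notation)
    parts = re.split(r"([+/:])", notation)
    for i in range(0, len(parts), 2):
        elem = ET.SubElement(root, "component")
        elem.text = parts[i]
        if i + 1 < len(parts):
            elem.set("connector", CONNECTORS[parts[i + 1]])
    return ET.tostring(root, encoding="unicode")

print(udc_to_xml("94(100):355.48"))
```

A real interpreter must additionally handle parenthesised common auxiliaries, apostrophe and point subdivisions, and nesting, which is what makes lossless analysis the hard part.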
  5. Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.01
    
    Source
    Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al
    Theme
    Klassifikationssysteme im Online-Retrieval
  6. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.01
    
    Source
    Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al
  7. Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.01
    
    Source
    Nachrichten für Dokumentation. 29(1978), S.92-96
  8. Wätjen, H.-J.: Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web : das DFG-Projekt GERHARD (1998) 0.01
    
    Footnote
    Paper presented at the 20th Online-Tagung of the Deutsche Gesellschaft für Dokumentation, 5-7 May 1998. Session 3: WWW search engines
    Theme
    Klassifikationssysteme im Online-Retrieval
  9. GERHARD : eine Spezialsuchmaschine für die Wissenschaft (1998) 0.01
    
    Theme
    Klassifikationssysteme im Online-Retrieval
  10. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    
    Date
    5. 5.2003 14:17:22
  11. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.01
    
    Theme
    Klassifikationssysteme im Online-Retrieval
  12. Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.01
    
    Theme
    Klassifikationssysteme im Online-Retrieval
  13. Oberhauser, O.: Automatisches Klassifizieren und Bibliothekskataloge (2005) 0.01
    
    Abstract
    In the library world, the advantages of classification-based subject indexing have long been well known. Even in the age of online catalogues there is no real substitute for it, since, in short, keyword-based retrieval alone cannot cope with problems such as ambiguity and multilingualism. Numerous online catalogues therefore carry notations from various classification systems; the query options built on them, however, are mostly still badly underdeveloped. Many records in OPACs have no subject indexing at all, whether because they stem from retrospectively converted name catalogues or because a lack of staff resources prevented their subject analysis. Given large quantities of such records, an interest in automatic methods of subject indexing is quite natural.
  14. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00
    
    Date
    1. 2.2016 18:25:22
  15. Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 0.00
    
    Abstract
    Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabularly-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages was 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
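The classifier the abstract describes is a vocabulary-based naïve Bayes model over text difficulty levels. A minimal sketch of that idea, with invented two-snippet training data and level names standing in for the authors' 250-document corpus:

```python
import math
from collections import Counter

# Illustrative vocabulary-based naive Bayes: classify a text into a
# difficulty level from word counts with Laplace smoothing. The training
# snippets and level names are invented stand-ins, not the paper's data.
def train(docs_by_level):
    priors, likelihoods = {}, {}
    total_docs = sum(len(docs) for docs in docs_by_level.values())
    vocab = {w for docs in docs_by_level.values() for d in docs for w in d.split()}
    for level, docs in docs_by_level.items():
        priors[level] = math.log(len(docs) / total_docs)
        counts = Counter(w for d in docs for w in d.split())
        n = sum(counts.values())
        likelihoods[level] = {w: math.log((counts[w] + 1) / (n + len(vocab)))
                              for w in vocab}
    return priors, likelihoods

def classify(text, priors, likelihoods):
    # Words outside the training vocabulary are simply ignored here.
    scores = {lvl: priors[lvl] + sum(likelihoods[lvl].get(w, 0.0)
                                     for w in text.split())
              for lvl in priors}
    return max(scores, key=scores.get)

data = {"easy": ["the doctor helps you get well"],
        "hard": ["myocardial infarction prognosis correlates with comorbidity"]}
priors, lik = train(data)
print(classify("doctor helps you", priors, lik))   # → easy
```

The point of the vocabulary-based approach is visible even at this scale: the verdict depends on which words appear, not on sentence length or syllable counts as in readability formulas.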
  16. Koch, T.: Nutzung von Klassifikationssystemen zur verbesserten Beschreibung, Organisation und Suche von Internetressourcen (1998) 0.00
    
    Theme
    Klassifikationssysteme im Online-Retrieval
  17. Savic, D.: Designing an expert system for classifying office documents (1994) 0.00
    
    Source
    Records management quarterly. 28(1994) no.3, S.20-29
  18. Borodin, Y.; Polishchuk, V.; Mahmud, J.; Ramakrishnan, I.V.; Stent, A.: Live and learn from mistakes : a lightweight system for document classification (2013) 0.00
    
    Abstract
    We present a Life-Long Learning from Mistakes (3LM) algorithm for document classification, which could be used in various scenarios such as spam filtering, blog classification, and web resource categorization. We extend the ideas of online clustering and batch-mode centroid-based classification to online learning with negative feedback. The 3LM is a competitive learning algorithm, which avoids over-smoothing, characteristic of the centroid-based classifiers, by using a different class representative, which we call clusterhead. The clusterheads competing for vector-space dominance are drawn toward misclassified documents, eventually bringing the model to a "balanced state" for a fixed distribution of documents. Subsequently, the clusterheads oscillate between the misclassified documents, heuristically minimizing the rate of misclassifications, an NP-complete problem. Further, the 3LM algorithm prevents over-fitting by "leashing" the clusterheads to their respective centroids. A clusterhead provably converges if its class can be separated by a hyper-plane from all other classes. Lifelong learning with fixed learning rate allows 3LM to adapt to possibly changing distribution of the data and continually learn and unlearn document classes. We report on our experiments, which demonstrate high accuracy of document classification on Reuters21578, OHSUMED, and TREC07p-spam datasets. The 3LM algorithm did not show over-fitting, while consistently outperforming centroid-based, Naïve Bayes, C4.5, AdaBoost, kNN, and SVM whose accuracy had been reported on the same three corpora.
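As a rough, hypothetical sketch of the competitive step the abstract describes (the exact update rule, learning rate, and "leash" constant are our assumptions, not the paper's): a clusterhead is drawn toward a misclassified document and then pulled part-way back toward its class centroid, which is the over-fitting guard the abstract calls "leashing".

```python
def update_clusterhead(clusterhead, centroid, doc, lr=0.1, leash=0.5):
    """One hypothetical 3LM-style step on plain list vectors:
    1) competitive step: move the clusterhead toward the document
       its class failed to capture;
    2) leash step: keep it within a fraction of its distance
       from the class centroid, limiting drift."""
    pulled = [c + lr * (d - c) for c, d in zip(clusterhead, doc)]
    return [m + leash * (p - m) for m, p in zip(centroid, pulled)]

head = update_clusterhead([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])
print(head)   # → [0.05, 0.0]
```

Iterating such updates over a stream of misclassified documents is what lets the model track a changing document distribution, at the cost of the oscillation the abstract mentions.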
  19. Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.00
    
    Theme
    Klassifikationssysteme im Online-Retrieval
  20. Cui, H.; Heidorn, P.B.; Zhang, H.: An approach to automatic classification of text for information retrieval (2002) 0.00
    
    Abstract
    In this paper, we explore an approach to make better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make those inherent structures explicit by XML markups. This marking up has great potentials in improving task performance in specimen identification and the usability of online flora and fauna.

Languages

  • e 45
  • d 15

Types

  • a 53
  • el 9
  • x 2
  • r 1