Document (#39850)

Scherer Auberson, K.
Counteracting concept drift in natural language classifiers : proposal for an automated method
Chur : Hochschule für Technik und Wirtschaft / Arbeitsbereich Informationswissenschaft
VIII, 88 pp.
Churer Schriften zur Informationswissenschaft / Arbeitsbereich Informationswissenschaft; Schrift 98
Natural language classifiers increasingly help companies cope with the flood of text data. Once trained, however, these classifiers lose their usefulness over time. They remain static while the underlying domain of the text data changes, so their accuracy declines because of a phenomenon known as concept drift. The question is whether concept drift can be reliably detected from a classifier's output and, if so, whether it can be counteracted by retraining the classifier. A proof-of-concept system implementation is presented in which the classifier's confidence measure is used to detect concept drift. The classifier is then retrained iteratively by selecting samples with a low confidence measure, correcting them, and using them in the training set of the next iteration. The classifier's performance is measured over time, and the performance of the system is observed. Based on this, recommendations are finally given that may prove useful when implementing such systems.
This publication originated as a thesis for the Master of Science FHO in Business Administration, Major Information and Data Management.
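
The loop described in the abstract (detect drift from the classifier's confidence, then pick low-confidence samples for correction and retraining) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the mean-confidence drift test, and the `tolerance` and `k` parameters are assumptions chosen for illustration only.

```python
from statistics import mean


def detect_drift(confidences, baseline, tolerance=0.1):
    """Hypothetical drift test: flag drift when the mean confidence of a
    batch falls more than `tolerance` below a reference `baseline`."""
    return mean(confidences) < baseline - tolerance


def select_low_confidence(samples, confidences, k):
    """Return the k samples the classifier was least confident about.
    These would be corrected (relabeled) and added to the training set
    of the next iteration."""
    ranked = sorted(zip(confidences, samples), key=lambda pair: pair[0])
    return [sample for _, sample in ranked[:k]]


# Illustrative batch: texts with the confidence the classifier reported.
batch = ["text a", "text b", "text c", "text d"]
batch_confidences = [0.91, 0.42, 0.55, 0.38]

if detect_drift(batch_confidences, baseline=0.85):
    to_correct = select_low_confidence(batch, batch_confidences, k=2)
```

In this sketch the correction step itself is left out, since the abstract does not say how the corrected labels are obtained.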

Similar documents (author)

  1. Scherer, A.: Neuronale Netze : Grundlagen und Anwendungen (1995) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:scherer in 1898) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 1898, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=1898)
  2. Scherer, H.: Zwerge auf den Schultern von Riesen : das Schachbuch Alfons' des Weisen: Beispiel früher Fachkommunikation (1996) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:scherer in 3526) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 3526, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=3526)
  3. Scherer, A.: Intranet : Kommunikation im Unternehmen (1998) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:scherer in 4906) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 4906, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=4906)
  4. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:scherer in 4283) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 4283, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=4283)
  5. Scherer, B.: ¬Die Pandemie ist kein Überfall von Außerirdischen (2020) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:scherer in 5706) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 5706, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=5706)
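
The author scores above can be reproduced by hand. The sketch below assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the numbers in the listing; the function names are illustrative and not part of any Lucene API.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the listing above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: author term "scherer", freq=1.0,
# docFreq=11, maxDocs=44218, fieldNorm=0.625.
author_score = field_weight(1.0, 11, 44218, 0.625)
```

Because tf, idf and fieldNorm are identical for all five records, they all receive the same score of about 5.76. Small differences in the last digits are expected, since Lucene computes in single precision.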

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.10
    0.09673552 = sum of:
      0.09673552 = product of:
        0.8061294 = sum of:
          0.052065164 = weight(abstract_txt:concept in 2697) [ClassicSimilarity], result of:
            0.052065164 = score(doc=2697,freq=4.0), product of:
              0.07395853 = queryWeight, product of:
                1.0470693 = boost
                4.505458 = idf(docFreq=1327, maxDocs=44218)
                0.015677394 = queryNorm
              0.7039778 = fieldWeight in 2697, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.505458 = idf(docFreq=1327, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.22939256 = weight(abstract_txt:classifier in 2697) [ClassicSimilarity], result of:
            0.22939256 = score(doc=2697,freq=2.0), product of:
              0.28667074 = queryWeight, product of:
                2.5247514 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.015677394 = queryNorm
              0.8001952 = fieldWeight in 2697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.5246717 = weight(abstract_txt:classifiers in 2697) [ClassicSimilarity], result of:
            0.5246717 = score(doc=2697,freq=3.0), product of:
              0.51543605 = queryWeight, product of:
                4.3705764 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.015677394 = queryNorm
              1.0179181 = fieldWeight in 2697, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
        0.12 = coord(3/25)
  2. Fabian, C.; Haller, K.: ¬Der Image-Katalog als alternatives Modell der Konversion : Die Konversion des Alphabetischen Katalogs 1953-1981 der Bayerischen Staatsbibliothek (1998) 0.08
    0.07516226 = sum of:
      0.07516226 = product of:
        0.46976414 = sum of:
          0.03858331 = weight(abstract_txt:eines in 865) [ClassicSimilarity], result of:
            0.03858331 = score(doc=865,freq=2.0), product of:
              0.07630736 = queryWeight, product of:
                1.0635662 = boost
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.015677394 = queryNorm
              0.50563025 = fieldWeight in 865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.078125 = fieldNorm(doc=865)
          0.027384138 = weight(abstract_txt:aber in 865) [ClassicSimilarity], result of:
            0.027384138 = score(doc=865,freq=1.0), product of:
              0.07649672 = queryWeight, product of:
                1.064885 = boost
                4.5821176 = idf(docFreq=1229, maxDocs=44218)
                0.015677394 = queryNorm
              0.35797793 = fieldWeight in 865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5821176 = idf(docFreq=1229, maxDocs=44218)
                0.078125 = fieldNorm(doc=865)
          0.030581018 = weight(abstract_txt:wird in 865) [ClassicSimilarity], result of:
            0.030581018 = score(doc=865,freq=1.0), product of:
              0.103742026 = queryWeight, product of:
                1.7537742 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.015677394 = queryNorm
              0.29477945 = fieldWeight in 865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=865)
          0.37321568 = weight(abstract_txt:textdaten in 865) [ClassicSimilarity], result of:
            0.37321568 = score(doc=865,freq=2.0), product of:
              0.346423 = queryWeight, product of:
                2.2661293 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.015677394 = queryNorm
              1.077341 = fieldWeight in 865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=865)
        0.16 = coord(4/25)
  3. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.06
    0.06216899 = sum of:
      0.06216899 = product of:
        0.77711236 = sum of:
          0.18351404 = weight(abstract_txt:classifier in 1107) [ClassicSimilarity], result of:
            0.18351404 = score(doc=1107,freq=2.0), product of:
              0.28667074 = queryWeight, product of:
                2.5247514 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.015677394 = queryNorm
              0.64015615 = fieldWeight in 1107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.5935983 = weight(abstract_txt:classifiers in 1107) [ClassicSimilarity], result of:
            0.5935983 = score(doc=1107,freq=6.0), product of:
              0.51543605 = queryWeight, product of:
                4.3705764 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.015677394 = queryNorm
              1.1516429 = fieldWeight in 1107, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
        0.08 = coord(2/25)
  4. Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen : Beiträge zur GLDV Tagung 2005 in Bonn (2005) 0.06
    0.06124414 = sum of:
      0.06124414 = product of:
        0.3827759 = sum of:
          0.03286097 = weight(abstract_txt:aber in 3578) [ClassicSimilarity], result of:
            0.03286097 = score(doc=3578,freq=1.0), product of:
              0.07649672 = queryWeight, product of:
                1.064885 = boost
                4.5821176 = idf(docFreq=1229, maxDocs=44218)
                0.015677394 = queryNorm
              0.42957354 = fieldWeight in 3578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5821176 = idf(docFreq=1229, maxDocs=44218)
                0.09375 = fieldNorm(doc=3578)
          0.038561556 = weight(abstract_txt:ihre in 3578) [ClassicSimilarity], result of:
            0.038561556 = score(doc=3578,freq=1.0), product of:
              0.08510577 = queryWeight, product of:
                1.1232096 = boost
                4.8330836 = idf(docFreq=956, maxDocs=44218)
                0.015677394 = queryNorm
              0.45310158 = fieldWeight in 3578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8330836 = idf(docFreq=956, maxDocs=44218)
                0.09375 = fieldNorm(doc=3578)
          0.03669722 = weight(abstract_txt:wird in 3578) [ClassicSimilarity], result of:
            0.03669722 = score(doc=3578,freq=1.0), product of:
              0.103742026 = queryWeight, product of:
                1.7537742 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.015677394 = queryNorm
              0.35373533 = fieldWeight in 3578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.09375 = fieldNorm(doc=3578)
          0.27465615 = weight(abstract_txt:trainiert in 3578) [ClassicSimilarity], result of:
            0.27465615 = score(doc=3578,freq=1.0), product of:
              0.31505194 = queryWeight, product of:
                2.1610878 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.015677394 = queryNorm
              0.8717805 = fieldWeight in 3578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.09375 = fieldNorm(doc=3578)
        0.16 = coord(4/25)
  5. Oberhauser, O.: ¬Die Dewey Decimal Classification im Österreichischen Verbundkatalog : Status und Perspektiven (2009) 0.06
    0.05528898 = sum of:
      0.05528898 = product of:
        0.2764449 = sum of:
          0.02728252 = weight(abstract_txt:eines in 2922) [ClassicSimilarity], result of:
            0.02728252 = score(doc=2922,freq=1.0), product of:
              0.07630736 = queryWeight, product of:
                1.0635662 = boost
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.015677394 = queryNorm
              0.3575346 = fieldWeight in 2922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.078125 = fieldNorm(doc=2922)
          0.027384138 = weight(abstract_txt:aber in 2922) [ClassicSimilarity], result of:
            0.027384138 = score(doc=2922,freq=1.0), product of:
              0.07649672 = queryWeight, product of:
                1.064885 = boost
                4.5821176 = idf(docFreq=1229, maxDocs=44218)
                0.015677394 = queryNorm
              0.35797793 = fieldWeight in 2922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5821176 = idf(docFreq=1229, maxDocs=44218)
                0.078125 = fieldNorm(doc=2922)
          0.08508413 = weight(abstract_txt:verwendet in 2922) [ClassicSimilarity], result of:
            0.08508413 = score(doc=2922,freq=1.0), product of:
              0.16288303 = queryWeight, product of:
                1.5538863 = boost
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.015677394 = queryNorm
              0.5223634 = fieldWeight in 2922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.078125 = fieldNorm(doc=2922)
          0.1061131 = weight(abstract_txt:implementierung in 2922) [ClassicSimilarity], result of:
            0.1061131 = score(doc=2922,freq=1.0), product of:
              0.1887221 = queryWeight, product of:
                1.6726023 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.015677394 = queryNorm
              0.5622717 = fieldWeight in 2922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.078125 = fieldNorm(doc=2922)
          0.030581018 = weight(abstract_txt:wird in 2922) [ClassicSimilarity], result of:
            0.030581018 = score(doc=2922,freq=1.0), product of:
              0.103742026 = queryWeight, product of:
                1.7537742 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.015677394 = queryNorm
              0.29477945 = fieldWeight in 2922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=2922)
        0.2 = coord(5/25)
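
The content scores combine several matching terms. As a sketch, the first result (Gauch et al., doc 2697) can be recomputed from the values in its listing, assuming that each term score is queryWeight * fieldWeight with queryWeight = boost * idf * queryNorm, and that the sum is scaled by the coord factor (matched terms / query terms). The function names are illustrative only.

```python
import math

QUERY_NORM = 0.015677394  # queryNorm as printed in the listing


def term_score(boost, idf, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total):
    """Sum of the term scores scaled by coord = matched / total."""
    return sum(term_scores) * (matched / total)


# (boost, idf, freq) for "concept", "classifier", "classifiers" in doc 2697.
terms = [
    (1.0470693, 4.505458, 4.0),
    (2.5247514, 7.24254, 2.0),
    (4.3705764, 7.5225, 3.0),
]
scores = [term_score(b, i, f, field_norm=0.078125) for b, i, f in terms]
total_score = document_score(scores, matched=3, total=25)
```

The coord factor explains why a record matching only a few of the 25 query terms ends up with a low overall score even when its individual term weights are high.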