Document (#34080)

Author
Liu, R.-L.
Title
Interactive high-quality text classification
Source
Information processing and management. 44(2008) no.3, pp. 1062-1075
Year
2008
Abstract
Automatic text classification (TC) is essential for information sharing and management. Its ideal goal is high-quality TC: (1) accepting almost all documents that should be accepted (i.e., high recall) and (2) rejecting almost all documents that should be rejected (i.e., high precision). Unfortunately, these goals are rarely achieved, which makes automatic TC unsuitable for applications in which a classifier's erroneous decision may incur high cost and/or serious problems. One way to pursue the ideal is to ask users to confirm the classifier's decisions so that potential errors can be corrected. The main challenge, however, lies in controlling the number of confirmations, since each one adds to the users' cognitive load. We thus develop an intelligent and classifier-independent confirmation strategy, ICCOM. Empirical evaluation shows that ICCOM may help various kinds of classifiers achieve very high precision and recall while conducting fewer confirmations. The contributions are significant for the archiving and recommendation of critical information, since identifying possible TC errors (those that require confirmation) is the key to processing information more properly.
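
To make the trade-off in the abstract concrete, the following minimal Python sketch computes precision and recall for a binary classifier and then simulates asking a user to confirm only the borderline decisions. It is not ICCOM: the record does not describe how ICCOM selects decisions for confirmation, so the fixed score band used here (`threshold`, `band`), the function names, and the toy scores and labels are all hypothetical stand-ins chosen for illustration.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly accepted
    fp = sum(1 for p, a in pairs if p and not a)    # wrongly accepted
    fn = sum(1 for p, a in pairs if not p and a)    # wrongly rejected
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def confirm_uncertain(scores, actual, threshold=0.5, band=0.15):
    """Hypothetical strategy: ask the user whenever a score is near the threshold.

    The user's answer is simulated by looking up the true label.
    Returns the final decisions and the number of confirmations requested.
    """
    decisions, confirmations = [], 0
    for score, truth in zip(scores, actual):
        if abs(score - threshold) <= band:
            decisions.append(truth)          # user corrects or confirms
            confirmations += 1
        else:
            decisions.append(score >= threshold)
    return decisions, confirmations


# Toy data (invented): classifier scores and the true accept/reject labels.
scores = [0.95, 0.60, 0.55, 0.40, 0.45, 0.10, 0.80, 0.30]
actual = [True, False, True, True, False, False, True, False]

automatic = [s >= 0.5 for s in scores]
confirmed, n_confirmations = confirm_uncertain(scores, actual)

print("automatic:", precision_recall(automatic, actual))
print("confirmed:", precision_recall(confirmed, actual), "after", n_confirmations, "confirmations")
```

Widening `band` removes more errors but asks the user more questions, which is the cost the abstract says a confirmation strategy has to keep under control.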

Similar documents (content)

  1. Taghva, K.; Borsack, J.; Condit, A.: Effects of OCR errors on ranking and feedback using the vector space model (1996) 0.16
    0.16380681 = sum of:
      0.16380681 = product of:
        0.6825284 = sum of:
          0.036834102 = weight(abstract_txt:text in 5020) [ClassicSimilarity], result of:
            0.036834102 = score(doc=5020,freq=1.0), product of:
              0.08315161 = queryWeight, product of:
                1.0093223 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.020341333 = queryNorm
              0.44297522 = fieldWeight in 5020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.109375 = fieldNorm(doc=5020)
          0.038645472 = weight(abstract_txt:documents in 5020) [ClassicSimilarity], result of:
            0.038645472 = score(doc=5020,freq=1.0), product of:
              0.085855804 = queryWeight, product of:
                1.0256032 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.020341333 = queryNorm
              0.45012066 = fieldWeight in 5020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.109375 = fieldNorm(doc=5020)
          0.19084048 = weight(abstract_txt:corrected in 5020) [ClassicSimilarity], result of:
            0.19084048 = score(doc=5020,freq=1.0), product of:
              0.1976094 = queryWeight, product of:
                1.1002296 = boost
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.020341333 = queryNorm
              0.965746 = fieldWeight in 5020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.109375 = fieldNorm(doc=5020)
          0.093083195 = weight(abstract_txt:precision in 5020) [ClassicSimilarity], result of:
            0.093083195 = score(doc=5020,freq=1.0), product of:
              0.15427117 = queryWeight, product of:
                1.3747917 = boost
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.020341333 = queryNorm
              0.6033739 = fieldWeight in 5020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.109375 = fieldNorm(doc=5020)
          0.10449132 = weight(abstract_txt:recall in 5020) [ClassicSimilarity], result of:
            0.10449132 = score(doc=5020,freq=1.0), product of:
              0.16663161 = queryWeight, product of:
                1.4288058 = boost
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.020341333 = queryNorm
              0.62707984 = fieldWeight in 5020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.109375 = fieldNorm(doc=5020)
          0.21863379 = weight(abstract_txt:errors in 5020) [ClassicSimilarity], result of:
            0.21863379 = score(doc=5020,freq=2.0), product of:
              0.21635757 = queryWeight, product of:
                1.6280981 = boost
                6.532992 = idf(docFreq=168, maxDocs=42740)
                0.020341333 = queryNorm
              1.0105206 = fieldWeight in 5020, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.532992 = idf(docFreq=168, maxDocs=42740)
                0.109375 = fieldNorm(doc=5020)
        0.24 = coord(6/25)
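
The explanation tree above follows the usual Lucene ClassicSimilarity pattern: each matching term contributes queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, and the sum is scaled by coord. The sketch below recomputes the score of entry 1 from the listed inputs. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values printed in the tree; the function and constant names are illustrative, and the boosts, queryNorm, and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 42740          # maxDocs in the listing
QUERY_NORM = 0.020341333  # queryNorm in the listing
FIELD_NORM = 0.109375     # fieldNorm(doc=5020)
QUERY_TERMS = 25          # denominator of coord(6/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed ClassicSimilarity form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm=FIELD_NORM):
    """Score contribution of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) copied from the tree for doc 5020.
matches = [
    ("text", 1.0093223, 2023, 1),
    ("documents", 1.0256032, 1895, 1),
    ("corrected", 1.1002296, 16, 1),
    ("precision", 1.3747917, 466, 1),
    ("recall", 1.4288058, 375, 1),
    ("errors", 1.6280981, 168, 2),
]

# coord rewards documents that match more of the query's terms.
coord = len(matches) / QUERY_TERMS
total = coord * sum(term_score(b, df, f) for _, b, df, f in matches)

for term, b, df, f in matches:
    print(f"{term:10s} {term_score(b, df, f):.6f}")
print(f"score      {total:.6f}")
```

Because idf enters both factors, rare terms such as "corrected" (docFreq=16) dominate the sum even at freq=1, while coord penalizes the document for matching only 6 of the 25 query terms.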
    
  2. Ringlstetter, C.; Stubbe, A.: Practical aspects of automatic genre classification (2008) 0.15
    0.15432173 = sum of:
      0.15432173 = product of:
        0.42867145 = sum of:
          0.02976645 = weight(abstract_txt:text in 3955) [ClassicSimilarity], result of:
            0.02976645 = score(doc=3955,freq=2.0), product of:
              0.08315161 = queryWeight, product of:
                1.0093223 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.020341333 = queryNorm
              0.35797805 = fieldWeight in 3955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.054092396 = weight(abstract_txt:documents in 3955) [ClassicSimilarity], result of:
            0.054092396 = score(doc=3955,freq=6.0), product of:
              0.085855804 = queryWeight, product of:
                1.0256032 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.020341333 = queryNorm
              0.6300377 = fieldWeight in 3955, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.09977489 = weight(abstract_txt:rejected in 3955) [ClassicSimilarity], result of:
            0.09977489 = score(doc=3955,freq=1.0), product of:
              0.18623735 = queryWeight, product of:
                1.0681025 = boost
                8.571848 = idf(docFreq=21, maxDocs=42740)
                0.020341333 = queryNorm
              0.5357405 = fieldWeight in 3955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.571848 = idf(docFreq=21, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.025660764 = weight(abstract_txt:those in 3955) [ClassicSimilarity], result of:
            0.025660764 = score(doc=3955,freq=1.0), product of:
              0.09489478 = queryWeight, product of:
                1.0782406 = boost
                4.326605 = idf(docFreq=1534, maxDocs=42740)
                0.020341333 = queryNorm
              0.2704128 = fieldWeight in 3955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.326605 = idf(docFreq=1534, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.028098427 = weight(abstract_txt:should in 3955) [ClassicSimilarity], result of:
            0.028098427 = score(doc=3955,freq=1.0), product of:
              0.10081317 = queryWeight, product of:
                1.1113559 = boost
                4.459485 = idf(docFreq=1343, maxDocs=42740)
                0.020341333 = queryNorm
              0.27871782 = fieldWeight in 3955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.459485 = idf(docFreq=1343, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.06299674 = weight(abstract_txt:automatic in 3955) [ClassicSimilarity], result of:
            0.06299674 = score(doc=3955,freq=2.0), product of:
              0.1370665 = queryWeight, product of:
                1.2958663 = boost
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.020341333 = queryNorm
              0.45960712 = fieldWeight in 3955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.0531904 = weight(abstract_txt:precision in 3955) [ClassicSimilarity], result of:
            0.0531904 = score(doc=3955,freq=1.0), product of:
              0.15427117 = queryWeight, product of:
                1.3747917 = boost
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.020341333 = queryNorm
              0.3447851 = fieldWeight in 3955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.059709325 = weight(abstract_txt:recall in 3955) [ClassicSimilarity], result of:
            0.059709325 = score(doc=3955,freq=1.0), product of:
              0.16663161 = queryWeight, product of:
                1.4288058 = boost
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.020341333 = queryNorm
              0.35833132 = fieldWeight in 3955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
          0.015382032 = weight(abstract_txt:that in 3955) [ClassicSimilarity], result of:
            0.015382032 = score(doc=3955,freq=2.0), product of:
              0.072673336 = queryWeight, product of:
                1.4919425 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.020341333 = queryNorm
              0.21165991 = fieldWeight in 3955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=3955)
        0.36 = coord(9/25)
    
  3. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 0.14
    0.13740593 = sum of:
      0.13740593 = product of:
        0.49073547 = sum of:
          0.02976645 = weight(abstract_txt:text in 75) [ClassicSimilarity], result of:
            0.02976645 = score(doc=75,freq=2.0), product of:
              0.08315161 = queryWeight, product of:
                1.0093223 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.020341333 = queryNorm
              0.35797805 = fieldWeight in 75, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=75)
          0.022083126 = weight(abstract_txt:documents in 75) [ClassicSimilarity], result of:
            0.022083126 = score(doc=75,freq=1.0), product of:
              0.085855804 = queryWeight, product of:
                1.0256032 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.020341333 = queryNorm
              0.2572118 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=75)
          0.1063808 = weight(abstract_txt:precision in 75) [ClassicSimilarity], result of:
            0.1063808 = score(doc=75,freq=4.0), product of:
              0.15427117 = queryWeight, product of:
                1.3747917 = boost
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.020341333 = queryNorm
              0.6895702 = fieldWeight in 75, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.0625 = fieldNorm(doc=75)
          0.1335141 = weight(abstract_txt:recall in 75) [ClassicSimilarity], result of:
            0.1335141 = score(doc=75,freq=5.0), product of:
              0.16663161 = queryWeight, product of:
                1.4288058 = boost
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.020341333 = queryNorm
              0.8012532 = fieldWeight in 75, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.0625 = fieldNorm(doc=75)
          0.018839065 = weight(abstract_txt:that in 75) [ClassicSimilarity], result of:
            0.018839065 = score(doc=75,freq=3.0), product of:
              0.072673336 = queryWeight, product of:
                1.4919425 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.020341333 = queryNorm
              0.2592294 = fieldWeight in 75, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=75)
          0.070407614 = weight(abstract_txt:achieve in 75) [ClassicSimilarity], result of:
            0.070407614 = score(doc=75,freq=1.0), product of:
              0.18598405 = queryWeight, product of:
                1.5094974 = boost
                6.0570884 = idf(docFreq=271, maxDocs=42740)
                0.020341333 = queryNorm
              0.37856802 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0570884 = idf(docFreq=271, maxDocs=42740)
                0.0625 = fieldNorm(doc=75)
          0.1097443 = weight(abstract_txt:high in 75) [ClassicSimilarity], result of:
            0.1097443 = score(doc=75,freq=1.0), product of:
              0.36059886 = queryWeight, product of:
                3.6405528 = boost
                4.8694243 = idf(docFreq=891, maxDocs=42740)
                0.020341333 = queryNorm
              0.30433902 = fieldWeight in 75, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8694243 = idf(docFreq=891, maxDocs=42740)
                0.0625 = fieldNorm(doc=75)
        0.28 = coord(7/25)
    
  4. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.12
    0.1223894 = sum of:
      0.1223894 = product of:
        0.437105 = sum of:
          0.026310075 = weight(abstract_txt:text in 310) [ClassicSimilarity], result of:
            0.026310075 = score(doc=310,freq=1.0), product of:
              0.08315161 = queryWeight, product of:
                1.0093223 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.020341333 = queryNorm
              0.3164109 = fieldWeight in 310, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=310)
          0.027603908 = weight(abstract_txt:documents in 310) [ClassicSimilarity], result of:
            0.027603908 = score(doc=310,freq=1.0), product of:
              0.085855804 = queryWeight, product of:
                1.0256032 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.020341333 = queryNorm
              0.32151476 = fieldWeight in 310, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=310)
          0.081337124 = weight(abstract_txt:quality in 310) [ClassicSimilarity], result of:
            0.081337124 = score(doc=310,freq=4.0), product of:
              0.11116339 = queryWeight, product of:
                1.1670122 = boost
                4.6828146 = idf(docFreq=1074, maxDocs=42740)
                0.020341333 = queryNorm
              0.7316898 = fieldWeight in 310, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6828146 = idf(docFreq=1074, maxDocs=42740)
                0.078125 = fieldNorm(doc=310)
          0.066488 = weight(abstract_txt:precision in 310) [ClassicSimilarity], result of:
            0.066488 = score(doc=310,freq=1.0), product of:
              0.15427117 = queryWeight, product of:
                1.3747917 = boost
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.020341333 = queryNorm
              0.43098137 = fieldWeight in 310, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.078125 = fieldNorm(doc=310)
          0.07463665 = weight(abstract_txt:recall in 310) [ClassicSimilarity], result of:
            0.07463665 = score(doc=310,freq=1.0), product of:
              0.16663161 = queryWeight, product of:
                1.4288058 = boost
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.020341333 = queryNorm
              0.44791415 = fieldWeight in 310, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.078125 = fieldNorm(doc=310)
          0.023548832 = weight(abstract_txt:that in 310) [ClassicSimilarity], result of:
            0.023548832 = score(doc=310,freq=3.0), product of:
              0.072673336 = queryWeight, product of:
                1.4919425 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.020341333 = queryNorm
              0.32403675 = fieldWeight in 310, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=310)
          0.13718039 = weight(abstract_txt:high in 310) [ClassicSimilarity], result of:
            0.13718039 = score(doc=310,freq=1.0), product of:
              0.36059886 = queryWeight, product of:
                3.6405528 = boost
                4.8694243 = idf(docFreq=891, maxDocs=42740)
                0.020341333 = queryNorm
              0.38042378 = fieldWeight in 310, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8694243 = idf(docFreq=891, maxDocs=42740)
                0.078125 = fieldNorm(doc=310)
        0.28 = coord(7/25)
    
  5. Taghva, K.; Borsack, J.; Condit, A.: Evaluation of model-based retrieval effectiveness with OCR text (1996) 0.12
    0.12071796 = sum of:
      0.12071796 = product of:
        0.5029915 = sum of:
          0.05468446 = weight(abstract_txt:text in 4554) [ClassicSimilarity], result of:
            0.05468446 = score(doc=4554,freq=3.0), product of:
              0.08315161 = queryWeight, product of:
                1.0093223 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.020341333 = queryNorm
              0.65764767 = fieldWeight in 4554, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.09375 = fieldNorm(doc=4554)
          0.03312469 = weight(abstract_txt:documents in 4554) [ClassicSimilarity], result of:
            0.03312469 = score(doc=4554,freq=1.0), product of:
              0.085855804 = queryWeight, product of:
                1.0256032 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.020341333 = queryNorm
              0.3858177 = fieldWeight in 4554, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.09375 = fieldNorm(doc=4554)
          0.0797856 = weight(abstract_txt:precision in 4554) [ClassicSimilarity], result of:
            0.0797856 = score(doc=4554,freq=1.0), product of:
              0.15427117 = queryWeight, product of:
                1.3747917 = boost
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.020341333 = queryNorm
              0.51717764 = fieldWeight in 4554, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.09375 = fieldNorm(doc=4554)
          0.08956399 = weight(abstract_txt:recall in 4554) [ClassicSimilarity], result of:
            0.08956399 = score(doc=4554,freq=1.0), product of:
              0.16663161 = queryWeight, product of:
                1.4288058 = boost
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.020341333 = queryNorm
              0.537497 = fieldWeight in 4554, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.09375 = fieldNorm(doc=4554)
          0.016315108 = weight(abstract_txt:that in 4554) [ClassicSimilarity], result of:
            0.016315108 = score(doc=4554,freq=1.0), product of:
              0.072673336 = queryWeight, product of:
                1.4919425 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.020341333 = queryNorm
              0.22449924 = fieldWeight in 4554, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.09375 = fieldNorm(doc=4554)
          0.22951765 = weight(abstract_txt:errors in 4554) [ClassicSimilarity], result of:
            0.22951765 = score(doc=4554,freq=3.0), product of:
              0.21635757 = queryWeight, product of:
                1.6280981 = boost
                6.532992 = idf(docFreq=168, maxDocs=42740)
                0.020341333 = queryNorm
              1.0608256 = fieldWeight in 4554, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.532992 = idf(docFreq=168, maxDocs=42740)
                0.09375 = fieldNorm(doc=4554)
        0.24 = coord(6/25)