Document (#34079)

Author
Liu, R.-L.
Title
Interactive high-quality text classification
Source
Information processing and management. 44(2008) no.3, p.1062-1075
Year
2008
Abstract
Automatic text classification (TC) is essential for information sharing and management. Its ideal goals are to achieve high-quality TC: (1) accepting almost all documents that should be accepted (i.e., high recall) and (2) rejecting almost all documents that should be rejected (i.e., high precision). Unfortunately, the ideal goals are rarely achieved, making automatic TC unsuitable for applications in which a classifier's erroneous decision may incur high cost and/or serious problems. One way to pursue the ideal is to consult users to confirm the classifier's decisions so that potential errors may be corrected. However, the main challenge of this approach lies in controlling the number of confirmations, which may impose a heavy cognitive load on the users. We thus develop an intelligent and classifier-independent confirmation strategy, ICCOM. Empirical evaluation shows that ICCOM may help various kinds of classifiers to achieve very high precision and recall while conducting fewer confirmations. The contributions are significant to the archiving and recommendation of critical information, since identifying possible TC errors (those that require confirmation) is the key to processing information more properly.
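
The two quality goals named in the abstract can be made concrete with a small sketch. It uses the standard definitions of precision and recall for a single accept/reject category; the function name `precision_recall` and the five-document toy data are hypothetical illustrations, not taken from the paper, and the sketch says nothing about how ICCOM itself works.

```python
def precision_recall(should_accept, accepted):
    """Return (precision, recall) for one category.

    should_accept: ground-truth booleans, one per document (hypothetical data).
    accepted:      the classifier's accept/reject decisions, same order.
    """
    pairs = list(zip(should_accept, accepted))
    true_pos = sum(1 for truth, decision in pairs if truth and decision)
    false_pos = sum(1 for truth, decision in pairs if not truth and decision)
    false_neg = sum(1 for truth, decision in pairs if truth and not decision)

    # Precision: share of accepted documents that should have been accepted.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of documents that should be accepted that actually were.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Toy example: three documents belong to the category, two do not.
truth = [True, True, True, False, False]
decisions = [True, True, False, True, False]
precision, recall = precision_recall(truth, decisions)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case one wrong acceptance lowers precision and one wrong rejection lowers recall; these are the kinds of errors that a user confirmation could correct.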

Similar documents (content)

  1. Taghva, K.; Borsack, J.; Condit, A.: Effects of OCR errors on ranking and feedback using the vector space model (1996) 0.17
    0.16574417 = sum of:
      0.16574417 = product of:
        0.69060075 = sum of:
          0.036821127 = weight(abstract_txt:text in 4951) [ClassicSimilarity], result of:
            0.036821127 = score(doc=4951,freq=1.0), product of:
              0.083249606 = queryWeight, product of:
                1.0099556 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.020383703 = queryNorm
              0.4422979 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.038977154 = weight(abstract_txt:documents in 4951) [ClassicSimilarity], result of:
            0.038977154 = score(doc=4951,freq=1.0), product of:
              0.08646843 = queryWeight, product of:
                1.0292953 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.020383703 = queryNorm
              0.45076746 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.1938733 = weight(abstract_txt:corrected in 4951) [ClassicSimilarity], result of:
            0.1938733 = score(doc=4951,freq=1.0), product of:
              0.19997981 = queryWeight, product of:
                1.1068513 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.020383703 = queryNorm
              0.96946436 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.09391771 = weight(abstract_txt:precision in 4951) [ClassicSimilarity], result of:
            0.09391771 = score(doc=4951,freq=1.0), product of:
              0.15541126 = queryWeight, product of:
                1.3799154 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.020383703 = queryNorm
              0.6043173 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.10579133 = weight(abstract_txt:recall in 4951) [ClassicSimilarity], result of:
            0.10579133 = score(doc=4951,freq=1.0), product of:
              0.16824837 = queryWeight, product of:
                1.4357759 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.020383703 = queryNorm
              0.6287807 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.22122017 = weight(abstract_txt:errors in 4951) [ClassicSimilarity], result of:
            0.22122017 = score(doc=4951,freq=2.0), product of:
              0.21836883 = queryWeight, product of:
                1.6357108 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.020383703 = queryNorm
              1.0130575 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
        0.24 = coord(6/25)
    
  2. Ringlstetter, C.; Stubbe, A.: Practical aspects of automatic genre classification (2008) 0.15
    0.15360475 = sum of:
      0.15360475 = product of:
        0.42667982 = sum of:
          0.029755965 = weight(abstract_txt:text in 1954) [ClassicSimilarity], result of:
            0.029755965 = score(doc=1954,freq=2.0), product of:
              0.083249606 = queryWeight, product of:
                1.0099556 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.020383703 = queryNorm
              0.3574307 = fieldWeight in 1954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.054556657 = weight(abstract_txt:documents in 1954) [ClassicSimilarity], result of:
            0.054556657 = score(doc=1954,freq=6.0), product of:
              0.08646843 = queryWeight, product of:
                1.0292953 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.020383703 = queryNorm
              0.63094306 = fieldWeight in 1954, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.096943915 = weight(abstract_txt:rejected in 1954) [ClassicSimilarity], result of:
            0.096943915 = score(doc=1954,freq=1.0), product of:
              0.18295595 = queryWeight, product of:
                1.0586916 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.020383703 = queryNorm
              0.5298757 = fieldWeight in 1954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.025514586 = weight(abstract_txt:those in 1954) [ClassicSimilarity], result of:
            0.025514586 = score(doc=1954,freq=1.0), product of:
              0.09466771 = queryWeight, product of:
                1.0769912 = boost
                4.312277 = idf(docFreq=1610, maxDocs=44218)
                0.020383703 = queryNorm
              0.2695173 = fieldWeight in 1954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.312277 = idf(docFreq=1610, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.027715312 = weight(abstract_txt:should in 1954) [ClassicSimilarity], result of:
            0.027715312 = score(doc=1954,freq=1.0), product of:
              0.100035936 = queryWeight, product of:
                1.1071061 = boost
                4.432857 = idf(docFreq=1427, maxDocs=44218)
                0.020383703 = queryNorm
              0.27705356 = fieldWeight in 1954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.432857 = idf(docFreq=1427, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.063108824 = weight(abstract_txt:automatic in 1954) [ClassicSimilarity], result of:
            0.063108824 = score(doc=1954,freq=2.0), product of:
              0.13742305 = queryWeight, product of:
                1.2976005 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.020383703 = queryNorm
              0.45923027 = fieldWeight in 1954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.053667262 = weight(abstract_txt:precision in 1954) [ClassicSimilarity], result of:
            0.053667262 = score(doc=1954,freq=1.0), product of:
              0.15541126 = queryWeight, product of:
                1.3799154 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.020383703 = queryNorm
              0.34532416 = fieldWeight in 1954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.06045219 = weight(abstract_txt:recall in 1954) [ClassicSimilarity], result of:
            0.06045219 = score(doc=1954,freq=1.0), product of:
              0.16824837 = queryWeight, product of:
                1.4357759 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.020383703 = queryNorm
              0.35930327 = fieldWeight in 1954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
          0.014965114 = weight(abstract_txt:that in 1954) [ClassicSimilarity], result of:
            0.014965114 = score(doc=1954,freq=2.0), product of:
              0.07145504 = queryWeight, product of:
                1.479441 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020383703 = queryNorm
              0.20943399 = fieldWeight in 1954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1954)
        0.36 = coord(9/25)
    
  3. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 0.14
    0.1374835 = sum of:
      0.1374835 = product of:
        0.49101254 = sum of:
          0.029755965 = weight(abstract_txt:text in 5159) [ClassicSimilarity], result of:
            0.029755965 = score(doc=5159,freq=2.0), product of:
              0.083249606 = queryWeight, product of:
                1.0099556 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.020383703 = queryNorm
              0.3574307 = fieldWeight in 5159, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5159)
          0.02227266 = weight(abstract_txt:documents in 5159) [ClassicSimilarity], result of:
            0.02227266 = score(doc=5159,freq=1.0), product of:
              0.08646843 = queryWeight, product of:
                1.0292953 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.020383703 = queryNorm
              0.2575814 = fieldWeight in 5159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=5159)
          0.107334524 = weight(abstract_txt:precision in 5159) [ClassicSimilarity], result of:
            0.107334524 = score(doc=5159,freq=4.0), product of:
              0.15541126 = queryWeight, product of:
                1.3799154 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.020383703 = queryNorm
              0.6906483 = fieldWeight in 5159, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.0625 = fieldNorm(doc=5159)
          0.13517521 = weight(abstract_txt:recall in 5159) [ClassicSimilarity], result of:
            0.13517521 = score(doc=5159,freq=5.0), product of:
              0.16824837 = queryWeight, product of:
                1.4357759 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.020383703 = queryNorm
              0.80342656 = fieldWeight in 5159, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.0625 = fieldNorm(doc=5159)
          0.018328445 = weight(abstract_txt:that in 5159) [ClassicSimilarity], result of:
            0.018328445 = score(doc=5159,freq=3.0), product of:
              0.07145504 = queryWeight, product of:
                1.479441 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020383703 = queryNorm
              0.2565032 = fieldWeight in 5159, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=5159)
          0.068604074 = weight(abstract_txt:achieve in 5159) [ClassicSimilarity], result of:
            0.068604074 = score(doc=5159,freq=1.0), product of:
              0.1830527 = queryWeight, product of:
                1.4976119 = boost
                5.9964437 = idf(docFreq=298, maxDocs=44218)
                0.020383703 = queryNorm
              0.37477773 = fieldWeight in 5159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9964437 = idf(docFreq=298, maxDocs=44218)
                0.0625 = fieldNorm(doc=5159)
          0.10954162 = weight(abstract_txt:high in 5159) [ClassicSimilarity], result of:
            0.10954162 = score(doc=5159,freq=1.0), product of:
              0.36066392 = queryWeight, product of:
                3.641021 = boost
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.020383703 = queryNorm
              0.30372214 = fieldWeight in 5159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.0625 = fieldNorm(doc=5159)
        0.28 = coord(7/25)
    
  4. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.12
    0.12229093 = sum of:
      0.12229093 = product of:
        0.43675333 = sum of:
          0.026300807 = weight(abstract_txt:text in 4309) [ClassicSimilarity], result of:
            0.026300807 = score(doc=4309,freq=1.0), product of:
              0.083249606 = queryWeight, product of:
                1.0099556 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.020383703 = queryNorm
              0.3159271 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.027840827 = weight(abstract_txt:documents in 4309) [ClassicSimilarity], result of:
            0.027840827 = score(doc=4309,freq=1.0), product of:
              0.08646843 = queryWeight, product of:
                1.0292953 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.020383703 = queryNorm
              0.32197678 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.08012481 = weight(abstract_txt:quality in 4309) [ClassicSimilarity], result of:
            0.08012481 = score(doc=4309,freq=4.0), product of:
              0.11021165 = queryWeight, product of:
                1.1620504 = boost
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.020383703 = queryNorm
              0.7270085 = fieldWeight in 4309, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.067084074 = weight(abstract_txt:precision in 4309) [ClassicSimilarity], result of:
            0.067084074 = score(doc=4309,freq=1.0), product of:
              0.15541126 = queryWeight, product of:
                1.3799154 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.020383703 = queryNorm
              0.4316552 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.075565234 = weight(abstract_txt:recall in 4309) [ClassicSimilarity], result of:
            0.075565234 = score(doc=4309,freq=1.0), product of:
              0.16824837 = queryWeight, product of:
                1.4357759 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.020383703 = queryNorm
              0.44912907 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.022910558 = weight(abstract_txt:that in 4309) [ClassicSimilarity], result of:
            0.022910558 = score(doc=4309,freq=3.0), product of:
              0.07145504 = queryWeight, product of:
                1.479441 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020383703 = queryNorm
              0.320629 = fieldWeight in 4309, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
          0.13692702 = weight(abstract_txt:high in 4309) [ClassicSimilarity], result of:
            0.13692702 = score(doc=4309,freq=1.0), product of:
              0.36066392 = queryWeight, product of:
                3.641021 = boost
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.020383703 = queryNorm
              0.37965268 = fieldWeight in 4309, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.078125 = fieldNorm(doc=4309)
        0.28 = coord(7/25)
    
  5. Taghva, K.; Borsack, J.; Condit, A.: Evaluation of model-based retrieval effectiveness with OCR text (1996) 0.12
    0.12176618 = sum of:
      0.12176618 = product of:
        0.5073591 = sum of:
          0.054665193 = weight(abstract_txt:text in 4485) [ClassicSimilarity], result of:
            0.054665193 = score(doc=4485,freq=3.0), product of:
              0.083249606 = queryWeight, product of:
                1.0099556 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.020383703 = queryNorm
              0.6566421 = fieldWeight in 4485, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.03340899 = weight(abstract_txt:documents in 4485) [ClassicSimilarity], result of:
            0.03340899 = score(doc=4485,freq=1.0), product of:
              0.08646843 = queryWeight, product of:
                1.0292953 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.020383703 = queryNorm
              0.38637212 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.08050089 = weight(abstract_txt:precision in 4485) [ClassicSimilarity], result of:
            0.08050089 = score(doc=4485,freq=1.0), product of:
              0.15541126 = queryWeight, product of:
                1.3799154 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.020383703 = queryNorm
              0.51798624 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.09067829 = weight(abstract_txt:recall in 4485) [ClassicSimilarity], result of:
            0.09067829 = score(doc=4485,freq=1.0), product of:
              0.16824837 = queryWeight, product of:
                1.4357759 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.020383703 = queryNorm
              0.5389549 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.0158729 = weight(abstract_txt:that in 4485) [ClassicSimilarity], result of:
            0.0158729 = score(doc=4485,freq=1.0), product of:
              0.07145504 = queryWeight, product of:
                1.479441 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.020383703 = queryNorm
              0.22213829 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.23223281 = weight(abstract_txt:errors in 4485) [ClassicSimilarity], result of:
            0.23223281 = score(doc=4485,freq=3.0), product of:
              0.21836883 = queryWeight, product of:
                1.6357108 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.020383703 = queryNorm
              1.0634888 = fieldWeight in 4485, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
        0.24 = coord(6/25)
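
The score trees above all follow the same arithmetic, which can be replayed in a few lines. The sketch below recomputes the total for entry 5 (doc 4485) from the boost, docFreq, termFreq, fieldNorm, queryNorm and coord values shown in the listing. The formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), queryWeight and fieldWeight as the products shown) are inferred from the numbers in the listing itself; the helper names `idf`, `term_score` and `document_score` are hypothetical and not part of any library API.

```python
import math

MAX_DOCS = 44218          # maxDocs reported in every idf(...) line above
QUERY_NORM = 0.020383703  # queryNorm reported in every queryWeight block above


def idf(doc_freq, max_docs=MAX_DOCS):
    # Inverse document frequency, as it appears to be computed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM             # "queryWeight, product of"
    field_weight = math.sqrt(freq) * term_idf * field_norm   # "fieldWeight, product of"
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    # coord(matched/total) scales the sum of the per-term scores.
    coord = matched / total
    return coord * sum(term_score(b, df, f, field_norm) for b, df, f in terms)


# (boost, docFreq, termFreq) copied from the tree for doc 4485.
DOC_4485_TERMS = [
    (1.0099556, 2106, 3.0),   # text
    (1.0292953, 1949, 1.0),   # documents
    (1.3799154, 478, 1.0),    # precision
    (1.4357759, 382, 1.0),    # recall
    (1.479441, 11241, 1.0),   # that
    (1.6357108, 171, 3.0),    # errors
]

score_4485 = document_score(DOC_4485_TERMS, field_norm=0.09375, matched=6, total=25)
print(score_4485)
```

The result should land very close to the 0.12176618 reported for doc 4485; small differences in the last digits are to be expected, since the listing presumably comes from single-precision arithmetic while Python uses double precision.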