Document (#19862)

Author
Jett, M.
Reuse, B.
Kessling, G.
Title
Implementation of an online database for tables of contents of books
Source
Electronic library. 16(1998) no.2, S.123-130
Year
1998
Abstract
Many small libraries do not have the resources to build a holdings database, but the availability of affordable scanners and improved OCR software has made possible a new approach to creating online databases. Describes the work undertaken at the Otto Hahn Library of the Max Planck Institute for Biophysical Chemistry, Germany, to create a database consisting of the titles, bibliographic descriptions and contents tables of books acquired by the library. The book information and table of contents pages are scanned and converted to text using OCR software. A computer program is used to extract as much information as possible, in particular from the CIP data, with corrections and missing information supplied manually. Finally, the information, which consists of title, author, ISBN, publication year, call number, series, language and other relevant information for books, as well as the entire table of contents, is stored and added to an Ovid database using the Ovid Local Loader software. Pays particular attention to the algorithm used to extract specific information from the CIP data. Two OCR software packages were tested: OmniPage Pro 7.0 and FineReader 3.0. Experience has shown that FineReader is better at character recognition and retains the formatting better, but OmniPage Pro is easier to train to recognize special characters.
Theme
Katalogfragen allgemein (general cataloguing questions)
Object
Ovid Local Loader

Similar documents (content)

  1. Diodato, V.: Tables of contents and book indexes : how well do they match readers' descriptions of books? (1986) 0.17
    0.1717313 = sum of:
      0.1717313 = product of:
        0.85865647 = sum of:
          0.23765403 = weight(abstract_txt:tables in 376) [ClassicSimilarity], result of:
            0.23765403 = score(doc=376,freq=4.0), product of:
              0.22770587 = queryWeight, product of:
                1.807052 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.018864818 = queryNorm
              1.0436887 = fieldWeight in 376, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.078125 = fieldNorm(doc=376)
          0.12830618 = weight(abstract_txt:table in 376) [ClassicSimilarity], result of:
            0.12830618 = score(doc=376,freq=1.0), product of:
              0.23966014 = queryWeight, product of:
                1.8538793 = boost
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.018864818 = queryNorm
              0.5353672 = fieldWeight in 376, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.078125 = fieldNorm(doc=376)
          0.01697224 = weight(abstract_txt:information in 376) [ClassicSimilarity], result of:
            0.01697224 = score(doc=376,freq=1.0), product of:
              0.08973543 = queryWeight, product of:
                1.9648354 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.018864818 = queryNorm
              0.18913643 = fieldWeight in 376, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=376)
          0.20854694 = weight(abstract_txt:books in 376) [ClassicSimilarity], result of:
            0.20854694 = score(doc=376,freq=6.0), product of:
              0.20871162 = queryWeight, product of:
                2.1188612 = boost
                5.2214546 = idf(docFreq=648, maxDocs=44218)
                0.018864818 = queryNorm
              0.99921095 = fieldWeight in 376, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.2214546 = idf(docFreq=648, maxDocs=44218)
                0.078125 = fieldNorm(doc=376)
          0.26717705 = weight(abstract_txt:contents in 376) [ClassicSimilarity], result of:
            0.26717705 = score(doc=376,freq=3.0), product of:
              0.3414023 = queryWeight, product of:
                3.1291888 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.018864818 = queryNorm
              0.7825872 = fieldWeight in 376, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.078125 = fieldNorm(doc=376)
        0.2 = coord(5/25)
    
  2. Enhancing USMARC records with table of contents (1992) 0.14
    0.14407407 = sum of:
      0.14407407 = product of:
        0.900463 = sum of:
          0.28518483 = weight(abstract_txt:tables in 2124) [ClassicSimilarity], result of:
            0.28518483 = score(doc=2124,freq=1.0), product of:
              0.22770587 = queryWeight, product of:
                1.807052 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.018864818 = queryNorm
              1.2524264 = fieldWeight in 2124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.1875 = fieldNorm(doc=2124)
          0.04073338 = weight(abstract_txt:information in 2124) [ClassicSimilarity], result of:
            0.04073338 = score(doc=2124,freq=1.0), product of:
              0.08973543 = queryWeight, product of:
                1.9648354 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.018864818 = queryNorm
              0.45392746 = fieldWeight in 2124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.1875 = fieldNorm(doc=2124)
          0.20433342 = weight(abstract_txt:books in 2124) [ClassicSimilarity], result of:
            0.20433342 = score(doc=2124,freq=1.0), product of:
              0.20871162 = queryWeight, product of:
                2.1188612 = boost
                5.2214546 = idf(docFreq=648, maxDocs=44218)
                0.018864818 = queryNorm
              0.97902274 = fieldWeight in 2124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2214546 = idf(docFreq=648, maxDocs=44218)
                0.1875 = fieldNorm(doc=2124)
          0.37021136 = weight(abstract_txt:contents in 2124) [ClassicSimilarity], result of:
            0.37021136 = score(doc=2124,freq=1.0), product of:
              0.3414023 = queryWeight, product of:
                3.1291888 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.018864818 = queryNorm
              1.0843846 = fieldWeight in 2124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.1875 = fieldNorm(doc=2124)
        0.16 = coord(4/25)
    
  3. Jordan, R.P.: Measures of personality and social psychological attitudes : an index to the instruments and their authors (1992) 0.12
    0.11715631 = sum of:
      0.11715631 = product of:
        0.73222697 = sum of:
          0.054460865 = weight(abstract_txt:particular in 3715) [ClassicSimilarity], result of:
            0.054460865 = score(doc=3715,freq=1.0), product of:
              0.098948024 = queryWeight, product of:
                1.1912067 = boost
                4.4031897 = idf(docFreq=1470, maxDocs=44218)
                0.018864818 = queryNorm
              0.5503987 = fieldWeight in 3715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4031897 = idf(docFreq=1470, maxDocs=44218)
                0.125 = fieldNorm(doc=3715)
          0.29032373 = weight(abstract_txt:table in 3715) [ClassicSimilarity], result of:
            0.29032373 = score(doc=3715,freq=2.0), product of:
              0.23966014 = queryWeight, product of:
                1.8538793 = boost
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.018864818 = queryNorm
              1.2113976 = fieldWeight in 3715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.125 = fieldNorm(doc=3715)
          0.038403794 = weight(abstract_txt:information in 3715) [ClassicSimilarity], result of:
            0.038403794 = score(doc=3715,freq=2.0), product of:
              0.08973543 = queryWeight, product of:
                1.9648354 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.018864818 = queryNorm
              0.4279669 = fieldWeight in 3715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.125 = fieldNorm(doc=3715)
          0.34903863 = weight(abstract_txt:contents in 3715) [ClassicSimilarity], result of:
            0.34903863 = score(doc=3715,freq=2.0), product of:
              0.3414023 = queryWeight, product of:
                3.1291888 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.018864818 = queryNorm
              1.0223676 = fieldWeight in 3715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.125 = fieldNorm(doc=3715)
        0.16 = coord(4/25)
    
  4. Conlon, S.P.N.; Evens, M.; Ahlswede, T.: Developing a large lexical database for information retrieval, parsing, and text generation systems (1993) 0.11
    0.10947175 = sum of:
      0.10947175 = product of:
        0.54735875 = sum of:
          0.03945354 = weight(abstract_txt:possible in 5813) [ClassicSimilarity], result of:
            0.03945354 = score(doc=5813,freq=1.0), product of:
              0.10918293 = queryWeight, product of:
                1.2512985 = boost
                4.6253138 = idf(docFreq=1177, maxDocs=44218)
                0.018864818 = queryNorm
              0.36135262 = fieldWeight in 5813, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6253138 = idf(docFreq=1177, maxDocs=44218)
                0.078125 = fieldNorm(doc=5813)
          0.16804677 = weight(abstract_txt:tables in 5813) [ClassicSimilarity], result of:
            0.16804677 = score(doc=5813,freq=2.0), product of:
              0.22770587 = queryWeight, product of:
                1.807052 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.018864818 = queryNorm
              0.7379993 = fieldWeight in 5813, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.078125 = fieldNorm(doc=5813)
          0.12830618 = weight(abstract_txt:table in 5813) [ClassicSimilarity], result of:
            0.12830618 = score(doc=5813,freq=1.0), product of:
              0.23966014 = queryWeight, product of:
                1.8538793 = boost
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.018864818 = queryNorm
              0.5353672 = fieldWeight in 5813, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.078125 = fieldNorm(doc=5813)
          0.024002371 = weight(abstract_txt:information in 5813) [ClassicSimilarity], result of:
            0.024002371 = score(doc=5813,freq=2.0), product of:
              0.08973543 = queryWeight, product of:
                1.9648354 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.018864818 = queryNorm
              0.2674793 = fieldWeight in 5813, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=5813)
          0.1875499 = weight(abstract_txt:database in 5813) [ClassicSimilarity], result of:
            0.1875499 = score(doc=5813,freq=9.0), product of:
              0.18696967 = queryWeight, product of:
                2.3157098 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.018864818 = queryNorm
              1.0031034 = fieldWeight in 5813, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.078125 = fieldNorm(doc=5813)
        0.2 = coord(5/25)
    
  5. DeHart, F.E.; Matthews, K.: Subject enhancements and OPACs : planning ahead (1990) 0.10
    0.10343728 = sum of:
      0.10343728 = product of:
        0.5171864 = sum of:
          0.06695488 = weight(abstract_txt:possible in 4654) [ClassicSimilarity], result of:
            0.06695488 = score(doc=4654,freq=2.0), product of:
              0.10918293 = queryWeight, product of:
                1.2512985 = boost
                4.6253138 = idf(docFreq=1177, maxDocs=44218)
                0.018864818 = queryNorm
              0.6132358 = fieldWeight in 4654, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6253138 = idf(docFreq=1177, maxDocs=44218)
                0.09375 = fieldNorm(doc=4654)
          0.14259242 = weight(abstract_txt:tables in 4654) [ClassicSimilarity], result of:
            0.14259242 = score(doc=4654,freq=1.0), product of:
              0.22770587 = queryWeight, product of:
                1.807052 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.018864818 = queryNorm
              0.6262132 = fieldWeight in 4654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.09375 = fieldNorm(doc=4654)
          0.02036669 = weight(abstract_txt:information in 4654) [ClassicSimilarity], result of:
            0.02036669 = score(doc=4654,freq=1.0), product of:
              0.08973543 = queryWeight, product of:
                1.9648354 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.018864818 = queryNorm
              0.22696373 = fieldWeight in 4654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.09375 = fieldNorm(doc=4654)
          0.10216671 = weight(abstract_txt:books in 4654) [ClassicSimilarity], result of:
            0.10216671 = score(doc=4654,freq=1.0), product of:
              0.20871162 = queryWeight, product of:
                2.1188612 = boost
                5.2214546 = idf(docFreq=648, maxDocs=44218)
                0.018864818 = queryNorm
              0.48951137 = fieldWeight in 4654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2214546 = idf(docFreq=648, maxDocs=44218)
                0.09375 = fieldNorm(doc=4654)
          0.18510568 = weight(abstract_txt:contents in 4654) [ClassicSimilarity], result of:
            0.18510568 = score(doc=4654,freq=1.0), product of:
              0.3414023 = queryWeight, product of:
                3.1291888 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.018864818 = queryNorm
              0.5421923 = fieldWeight in 4654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.09375 = fieldNorm(doc=4654)
        0.2 = coord(5/25)
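
The score breakdowns above follow the usual Lucene ClassicSimilarity (TF-IDF) pattern: each matching term contributes queryWeight x fieldWeight, and the sum is scaled by coord (matched terms / query terms). The sketch below recomputes the first result (doc 376) from the values printed in its breakdown. It assumes the default ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the printed tf and idf lines. The function names are hypothetical helpers and not part of any library; boosts, queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

# Constants copied from the score breakdown above.
MAX_DOCS = 44218
QUERY_NORM = 0.018864818


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, field_norm, query_terms=25):
    """Sum the term scores and apply coord = matched terms / query terms."""
    total = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return total * len(terms) / query_terms


# (termFreq, docFreq, boost) for the five terms matched in doc 376:
# tables, table, information, books, contents.
DOC_376 = [
    (4.0, 150, 1.807052),
    (1.0, 126, 1.8538793),
    (1.0, 10677, 1.9648354),
    (6.0, 648, 2.1188612),
    (3.0, 369, 3.1291888),
]

score_376 = document_score(DOC_376, field_norm=0.078125)
print(round(score_376, 2))  # 0.17, the score listed for result 1
```

The same helpers apply to the other results by substituting their term frequencies, fieldNorm and number of matched terms (for example 4 of 25 for results 2 and 3).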