Document (#18773)

Author
Prasad, A.R.D.
Title
Application of OCR in building bibliographic databases
Source
DESIDOC bulletin of information technology. 17(1997) no.4, pp. 17-19
Year
1997
Abstract
Bibliographic databases tend to be very verbose and pose a problem for libraries because of the large amount of data entry involved. Retrospective conversion and OCR are technologies that offer solutions. Discusses the building of an intelligent system for the automatic identification of bibliographic elements such as title, author and publisher. Considers the resolution of conflicts in situations where more than one bibliographic element satisfies the criteria specified for identification. This work is being carried out at the Indian Documentation Research and Training Centre, Bangalore, with the financial assistance of NISSAT (National Information System for Science and Technology).
Footnote
Contribution to the first in a series of special issues of this journal focusing on Indian bibliographic databases

Similar documents (author)

  1. Prasad, A.R.D.: PROMETHEUS: an automatic indexing system (1996) 7.96
    7.9553595 = sum of:
      7.9553595 = sum of:
        3.7470129 = weight(author_txt:prasad in 5258) [ClassicSimilarity], result of:
          3.7470129 = score(doc=5258,freq=1.0), product of:
            0.6792449 = queryWeight, product of:
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.0769569 = queryNorm
            5.516439 = fieldWeight in 5258, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.625 = fieldNorm(doc=5258)
        4.2083464 = weight(author_txt:a.r.d in 5258) [ClassicSimilarity], result of:
          4.2083464 = score(doc=5258,freq=1.0), product of:
            0.7339118 = queryWeight, product of:
              1.0394623 = boost
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.0769569 = queryNorm
            5.734131 = fieldWeight in 5258, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.625 = fieldNorm(doc=5258)
    
  2. Prasad, A.R.D.; Kar, B.B.: Parsing Boolean search expression using definite clause grammars (1994) 6.36
    6.3642874 = sum of:
      6.3642874 = sum of:
        2.9976103 = weight(author_txt:prasad in 188) [ClassicSimilarity], result of:
          2.9976103 = score(doc=188,freq=1.0), product of:
            0.6792449 = queryWeight, product of:
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.0769569 = queryNorm
            4.4131513 = fieldWeight in 188, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.5 = fieldNorm(doc=188)
        3.366677 = weight(author_txt:a.r.d in 188) [ClassicSimilarity], result of:
          3.366677 = score(doc=188,freq=1.0), product of:
            0.7339118 = queryWeight, product of:
              1.0394623 = boost
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.0769569 = queryNorm
            4.5873046 = fieldWeight in 188, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.5 = fieldNorm(doc=188)
    
  3. Karisiddappa, C.R.; Prasad, A.R.D.: Declarative programming and thesaurus construction (1993) 6.36
    6.3642874 = sum of:
      6.3642874 = sum of:
        2.9976103 = weight(author_txt:prasad in 3286) [ClassicSimilarity], result of:
          2.9976103 = score(doc=3286,freq=1.0), product of:
            0.6792449 = queryWeight, product of:
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.0769569 = queryNorm
            4.4131513 = fieldWeight in 3286, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.5 = fieldNorm(doc=3286)
        3.366677 = weight(author_txt:a.r.d in 3286) [ClassicSimilarity], result of:
          3.366677 = score(doc=3286,freq=1.0), product of:
            0.7339118 = queryWeight, product of:
              1.0394623 = boost
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.0769569 = queryNorm
            4.5873046 = fieldWeight in 3286, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.5 = fieldNorm(doc=3286)
    
  4. Mundgod, M.B.; Prasad, A.R.D.: Automatic identification of bibliographic data elements from the title pages of documents : a heuristic approach (1996) 6.36
    6.3642874 = sum of:
      6.3642874 = sum of:
        2.9976103 = weight(author_txt:prasad in 1398) [ClassicSimilarity], result of:
          2.9976103 = score(doc=1398,freq=1.0), product of:
            0.6792449 = queryWeight, product of:
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.0769569 = queryNorm
            4.4131513 = fieldWeight in 1398, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.5 = fieldNorm(doc=1398)
        3.366677 = weight(author_txt:a.r.d in 1398) [ClassicSimilarity], result of:
          3.366677 = score(doc=1398,freq=1.0), product of:
            0.7339118 = queryWeight, product of:
              1.0394623 = boost
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.0769569 = queryNorm
            4.5873046 = fieldWeight in 1398, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.5 = fieldNorm(doc=1398)
    
  5. Prasad, A.R.D.; Madalli, D.P.: Faceted infrastructure for semantic digital libraries (2008) 6.36
    6.3642874 = sum of:
      6.3642874 = sum of:
        2.9976103 = weight(author_txt:prasad in 3085) [ClassicSimilarity], result of:
          2.9976103 = score(doc=3085,freq=1.0), product of:
            0.6792449 = queryWeight, product of:
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.0769569 = queryNorm
            4.4131513 = fieldWeight in 3085, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.826303 = idf(docFreq=16, maxDocs=42596)
              0.5 = fieldNorm(doc=3085)
        3.366677 = weight(author_txt:a.r.d in 3085) [ClassicSimilarity], result of:
          3.366677 = score(doc=3085,freq=1.0), product of:
            0.7339118 = queryWeight, product of:
              1.0394623 = boost
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.0769569 = queryNorm
            4.5873046 = fieldWeight in 3085, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.174609 = idf(docFreq=11, maxDocs=42596)
              0.5 = fieldNorm(doc=3085)
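
The author scores above can be reproduced from the numbers shown in each tree. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)); the function names are illustrative and are not part of this catalog or of Lucene's API. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # ClassicSimilarity term frequency: square root of the raw count.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the explain trees above.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

MAX_DOCS = 42596
QUERY_NORM = 0.0769569

def author_score(field_norm):
    # Sum of the two author terms: "prasad" (docFreq=16) and "a.r.d" (docFreq=11, boosted).
    prasad = term_score(1.0, 16, MAX_DOCS, QUERY_NORM, field_norm)
    ard = term_score(1.0, 11, MAX_DOCS, QUERY_NORM, field_norm, boost=1.0394623)
    return prasad + ard

# Entry 1 (doc 5258) has fieldNorm 0.625; entries 2 to 5 have fieldNorm 0.5.
score_single_author = author_score(0.625)
score_two_authors = author_score(0.5)
print(score_single_author, score_two_authors)
```

The only input that differs between entry 1 and entries 2 to 5 is fieldNorm, which is larger for the shorter author field (one author rather than two); this is why the single-author record ranks first.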
    

Similar documents (content)

  1. Mundgod, M.B.; Prasad, A.R.D.: Automatic identification of bibliographic data elements from the title pages of documents : a heuristic approach (1996) 0.10
    0.09975972 = sum of:
      0.09975972 = product of:
        0.49879858 = sum of:
          0.06916623 = weight(abstract_txt:entry in 1398) [ClassicSimilarity], result of:
            0.06916623 = score(doc=1398,freq=1.0), product of:
              0.12536578 = queryWeight, product of:
                5.884964 = idf(docFreq=321, maxDocs=42596)
                0.021302726 = queryNorm
              0.5517154 = fieldWeight in 1398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.884964 = idf(docFreq=321, maxDocs=42596)
                0.09375 = fieldNorm(doc=1398)
          0.04470183 = weight(abstract_txt:system in 1398) [ClassicSimilarity], result of:
            0.04470183 = score(doc=1398,freq=3.0), product of:
              0.08186584 = queryWeight, product of:
                1.1428175 = boost
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.021302726 = queryNorm
              0.5460376 = fieldWeight in 1398, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.09375 = fieldNorm(doc=1398)
          0.13079989 = weight(abstract_txt:publisher in 1398) [ClassicSimilarity], result of:
            0.13079989 = score(doc=1398,freq=1.0), product of:
              0.1917143 = queryWeight, product of:
                1.2366242 = boost
                7.277489 = idf(docFreq=79, maxDocs=42596)
                0.021302726 = queryNorm
              0.6822646 = fieldWeight in 1398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.277489 = idf(docFreq=79, maxDocs=42596)
                0.09375 = fieldNorm(doc=1398)
          0.10980519 = weight(abstract_txt:building in 1398) [ClassicSimilarity], result of:
            0.10980519 = score(doc=1398,freq=1.0), product of:
              0.214952 = queryWeight, product of:
                1.8518093 = boost
                5.4489155 = idf(docFreq=497, maxDocs=42596)
                0.021302726 = queryNorm
              0.5108358 = fieldWeight in 1398, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4489155 = idf(docFreq=497, maxDocs=42596)
                0.09375 = fieldNorm(doc=1398)
          0.14432544 = weight(abstract_txt:bibliographic in 1398) [ClassicSimilarity], result of:
            0.14432544 = score(doc=1398,freq=2.0), product of:
              0.25792187 = queryWeight, product of:
                2.8686965 = boost
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.021302726 = queryNorm
              0.5595704 = fieldWeight in 1398, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.09375 = fieldNorm(doc=1398)
        0.2 = coord(5/25)
    
  2. Hariharan, A.; Rao, B.R.K.; Somaiah, M.S.: Design and development of a database on micro-CDS/ISIS : union catalogue of the S&T conference proceedings (1991) 0.10
    0.09926479 = sum of:
      0.09926479 = product of:
        0.62040496 = sum of:
          0.080710575 = weight(abstract_txt:centre in 512) [ClassicSimilarity], result of:
            0.080710575 = score(doc=512,freq=1.0), product of:
              0.13895364 = queryWeight, product of:
                1.052799 = boost
                6.195684 = idf(docFreq=235, maxDocs=42596)
                0.021302726 = queryNorm
              0.58084536 = fieldWeight in 512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.195684 = idf(docFreq=235, maxDocs=42596)
                0.09375 = fieldNorm(doc=512)
          0.025808614 = weight(abstract_txt:system in 512) [ClassicSimilarity], result of:
            0.025808614 = score(doc=512,freq=1.0), product of:
              0.08186584 = queryWeight, product of:
                1.1428175 = boost
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.021302726 = queryNorm
              0.315255 = fieldWeight in 512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.09375 = fieldNorm(doc=512)
          0.16162568 = weight(abstract_txt:indian in 512) [ClassicSimilarity], result of:
            0.16162568 = score(doc=512,freq=1.0), product of:
              0.22076143 = queryWeight, product of:
                1.3270036 = boost
                7.809368 = idf(docFreq=46, maxDocs=42596)
                0.021302726 = queryNorm
              0.73212826 = fieldWeight in 512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.809368 = idf(docFreq=46, maxDocs=42596)
                0.09375 = fieldNorm(doc=512)
          0.35226008 = weight(abstract_txt:bangalore in 512) [ClassicSimilarity], result of:
            0.35226008 = score(doc=512,freq=2.0), product of:
              0.2945429 = queryWeight, product of:
                1.5327976 = boost
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.021302726 = queryNorm
              1.195955 = fieldWeight in 512, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.09375 = fieldNorm(doc=512)
        0.16 = coord(4/25)
    
  3. Jakac-Bizjak, V.: Planning the national electronic library in Slovenia (1998) 0.10
    0.09833946 = sum of:
      0.09833946 = product of:
        0.40974775 = sum of:
          0.06725882 = weight(abstract_txt:centre in 6195) [ClassicSimilarity], result of:
            0.06725882 = score(doc=6195,freq=1.0), product of:
              0.13895364 = queryWeight, product of:
                1.052799 = boost
                6.195684 = idf(docFreq=235, maxDocs=42596)
                0.021302726 = queryNorm
              0.48403782 = fieldWeight in 6195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.195684 = idf(docFreq=235, maxDocs=42596)
                0.078125 = fieldNorm(doc=6195)
          0.0688253 = weight(abstract_txt:conversion in 6195) [ClassicSimilarity], result of:
            0.0688253 = score(doc=6195,freq=1.0), product of:
              0.14110287 = queryWeight, product of:
                1.0609097 = boost
                6.2434154 = idf(docFreq=224, maxDocs=42596)
                0.021302726 = queryNorm
              0.48776683 = fieldWeight in 6195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2434154 = idf(docFreq=224, maxDocs=42596)
                0.078125 = fieldNorm(doc=6195)
          0.030415742 = weight(abstract_txt:system in 6195) [ClassicSimilarity], result of:
            0.030415742 = score(doc=6195,freq=2.0), product of:
              0.08186584 = queryWeight, product of:
                1.1428175 = boost
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.021302726 = queryNorm
              0.37153155 = fieldWeight in 6195, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3627198 = idf(docFreq=4010, maxDocs=42596)
                0.078125 = fieldNorm(doc=6195)
          0.09046984 = weight(abstract_txt:retrospective in 6195) [ClassicSimilarity], result of:
            0.09046984 = score(doc=6195,freq=1.0), product of:
              0.16931924 = queryWeight, product of:
                1.162154 = boost
                6.839234 = idf(docFreq=123, maxDocs=42596)
                0.021302726 = queryNorm
              0.53431517 = fieldWeight in 6195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.839234 = idf(docFreq=123, maxDocs=42596)
                0.078125 = fieldNorm(doc=6195)
          0.067733504 = weight(abstract_txt:databases in 6195) [ClassicSimilarity], result of:
            0.067733504 = score(doc=6195,freq=2.0), product of:
              0.13960665 = queryWeight, product of:
                1.492377 = boost
                4.3912926 = idf(docFreq=1433, maxDocs=42596)
                0.021302726 = queryNorm
              0.48517388 = fieldWeight in 6195, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3912926 = idf(docFreq=1433, maxDocs=42596)
                0.078125 = fieldNorm(doc=6195)
          0.08504457 = weight(abstract_txt:bibliographic in 6195) [ClassicSimilarity], result of:
            0.08504457 = score(doc=6195,freq=1.0), product of:
              0.25792187 = queryWeight, product of:
                2.8686965 = boost
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.021302726 = queryNorm
              0.32972997 = fieldWeight in 6195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.078125 = fieldNorm(doc=6195)
        0.24 = coord(6/25)
    
  4. McGarry, D.: Displays of bibliographic records in call number order : functions of the displays and data elements needed (1992) 0.09
    0.08778286 = sum of:
      0.08778286 = product of:
        0.4389143 = sum of:
          0.06521054 = weight(abstract_txt:entry in 2384) [ClassicSimilarity], result of:
            0.06521054 = score(doc=2384,freq=2.0), product of:
              0.12536578 = queryWeight, product of:
                5.884964 = idf(docFreq=321, maxDocs=42596)
                0.021302726 = queryNorm
              0.5201622 = fieldWeight in 2384, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.884964 = idf(docFreq=321, maxDocs=42596)
                0.0625 = fieldNorm(doc=2384)
          0.059076104 = weight(abstract_txt:element in 2384) [ClassicSimilarity], result of:
            0.059076104 = score(doc=2384,freq=1.0), product of:
              0.14788303 = queryWeight, product of:
                1.0860996 = boost
                6.3916574 = idf(docFreq=193, maxDocs=42596)
                0.021302726 = queryNorm
              0.39947858 = fieldWeight in 2384, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3916574 = idf(docFreq=193, maxDocs=42596)
                0.0625 = fieldNorm(doc=2384)
          0.08719992 = weight(abstract_txt:publisher in 2384) [ClassicSimilarity], result of:
            0.08719992 = score(doc=2384,freq=1.0), product of:
              0.1917143 = queryWeight, product of:
                1.2366242 = boost
                7.277489 = idf(docFreq=79, maxDocs=42596)
                0.021302726 = queryNorm
              0.45484307 = fieldWeight in 2384, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.277489 = idf(docFreq=79, maxDocs=42596)
                0.0625 = fieldNorm(doc=2384)
          0.09135641 = weight(abstract_txt:identification in 2384) [ClassicSimilarity], result of:
            0.09135641 = score(doc=2384,freq=1.0), product of:
              0.24916086 = queryWeight, product of:
                1.9937257 = boost
                5.866502 = idf(docFreq=327, maxDocs=42596)
                0.021302726 = queryNorm
              0.36665636 = fieldWeight in 2384, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.866502 = idf(docFreq=327, maxDocs=42596)
                0.0625 = fieldNorm(doc=2384)
          0.13607132 = weight(abstract_txt:bibliographic in 2384) [ClassicSimilarity], result of:
            0.13607132 = score(doc=2384,freq=4.0), product of:
              0.25792187 = queryWeight, product of:
                2.8686965 = boost
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.021302726 = queryNorm
              0.527568 = fieldWeight in 2384, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.0625 = fieldNorm(doc=2384)
        0.2 = coord(5/25)
    
  5. VanAvery, A.R.: Recat vs. Recon of serials : a problem for shared cataloging (1990) 0.08
    0.08440535 = sum of:
      0.08440535 = product of:
        0.5275334 = sum of:
          0.110120475 = weight(abstract_txt:conversion in 778) [ClassicSimilarity], result of:
            0.110120475 = score(doc=778,freq=1.0), product of:
              0.14110287 = queryWeight, product of:
                1.0609097 = boost
                6.2434154 = idf(docFreq=224, maxDocs=42596)
                0.021302726 = queryNorm
              0.7804269 = fieldWeight in 778, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2434154 = idf(docFreq=224, maxDocs=42596)
                0.125 = fieldNorm(doc=778)
          0.20470987 = weight(abstract_txt:retrospective in 778) [ClassicSimilarity], result of:
            0.20470987 = score(doc=778,freq=2.0), product of:
              0.16931924 = queryWeight, product of:
                1.162154 = boost
                6.839234 = idf(docFreq=123, maxDocs=42596)
                0.021302726 = queryNorm
              1.2090172 = fieldWeight in 778, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.839234 = idf(docFreq=123, maxDocs=42596)
                0.125 = fieldNorm(doc=778)
          0.07663171 = weight(abstract_txt:databases in 778) [ClassicSimilarity], result of:
            0.07663171 = score(doc=778,freq=1.0), product of:
              0.13960665 = queryWeight, product of:
                1.492377 = boost
                4.3912926 = idf(docFreq=1433, maxDocs=42596)
                0.021302726 = queryNorm
              0.5489116 = fieldWeight in 778, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3912926 = idf(docFreq=1433, maxDocs=42596)
                0.125 = fieldNorm(doc=778)
          0.13607132 = weight(abstract_txt:bibliographic in 778) [ClassicSimilarity], result of:
            0.13607132 = score(doc=778,freq=1.0), product of:
              0.25792187 = queryWeight, product of:
                2.8686965 = boost
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.021302726 = queryNorm
              0.527568 = fieldWeight in 778, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.220544 = idf(docFreq=1700, maxDocs=42596)
                0.125 = fieldNorm(doc=778)
        0.16 = coord(4/25)
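
The content scores add one step to the per-term calculation: the sum of the matching term scores is multiplied by coord, the fraction of query terms found in the document. The sketch below recomputes entry 5 (doc 778) from the values in its tree, again assuming the standard ClassicSimilarity formulas; the helper names are hypothetical, and results may differ from the listing in the last digits because Lucene uses single precision.

```python
import math

MAX_DOCS = 42596
QUERY_NORM = 0.021302726

def term_score(freq, doc_freq, field_norm, boost=1.0):
    # queryWeight * fieldWeight with idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq).
    term_idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (freq, docFreq, boost) for the four abstract terms matched in doc 778.
matched_terms = {
    "conversion": (1.0, 224, 1.0609097),
    "retrospective": (2.0, 123, 1.162154),
    "databases": (1.0, 1433, 1.492377),
    "bibliographic": (1.0, 1700, 2.8686965),
}
FIELD_NORM = 0.125   # fieldNorm(doc=778)
QUERY_TERMS = 25     # total query terms, from coord(4/25)

term_sum = sum(
    term_score(freq, doc_freq, FIELD_NORM, boost)
    for freq, doc_freq, boost in matched_terms.values()
)
coord = len(matched_terms) / QUERY_TERMS
content_score = term_sum * coord
print(term_sum, coord, content_score)
```

Because coord rewards documents that match more of the 25 query terms, entry 3 (6 matches, coord 0.24) can score close to entry 1 even though its individual term weights are lower.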