Document (#20655)

Author
Lawson, M.
Title
Automatic extraction of citations from the text of English-language patents : an example of template mining
Source
Journal of Information Science. 22(1996) no.6, pp.423-436
Year
1996
Abstract
Describes and evaluates methods for automatically isolating and extracting bibliographic references from the full texts of patents, designed to facilitate the work of patent examiners, who currently perform this task manually. These references include citations both to patents and to other bibliographic sources. Notes that patents are unusual as citing documents in that the citations occur mainly in the body of the text, rather than as footnotes or in separate sections. Describes the natural language processing technique of template mining, used to extract data directly from text where either the data or the text surrounding the data forms recognizable patterns. When text matches a template, the system extracts data according to instructions associated with that template. Examines the sublanguages of citations and the development of templates for the extraction of citations to patents. Reports the results of running 2 reference extraction systems against a sample of 100 European Patent Office patent documents, with recall and precision data for patent and non-patent citations, and concludes with suggestions for future improvements.
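The abstract describes template mining only in outline: text is matched against templates, and each template carries instructions for extracting data from a match. The Python sketch below illustrates that general idea under stated assumptions. The three templates, the citation formats they target, the `Citation` type, the sample sentence and the gold set are all hypothetical and are not Lawson's actual templates or data. The sketch also shows one way to compute recall and precision separately for patent and non-patent citations against a hand-checked gold set, which is the kind of evaluation the abstract reports.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    kind: str  # "patent" or "non-patent"
    text: str  # normalised citation string


# Hypothetical templates: each is (citation kind, pattern, extraction instruction).
# The instruction turns a regex match into a normalised citation string.
TEMPLATES = [
    # European-style form such as "EP-A-0 123 456": country, kind code, number.
    ("patent",
     re.compile(r"\b([A-Z]{2})-([A-Z]\d?)-(\d[\d ]{4,}\d)\b"),
     lambda m: f"{m.group(1)} {m.group(3).replace(' ', '')}"),
    # US form such as "US Patent No. 4,567,890", anchored on the cue phrase.
    ("patent",
     re.compile(r"\b(?:US|U\.S\.) Patent No\. ?(\d{1,2},\d{3},\d{3})"),
     lambda m: "US " + m.group(1).replace(",", "")),
    # Journal-style reference anchored on "et al." and a bracketed year.
    ("non-patent",
     re.compile(r"\b([A-Z][a-z]+ et al\., [^()]{3,60}?\(\d{4}\))"),
     lambda m: m.group(1)),
]


def extract_citations(text):
    """Apply every template to the running text and collect the extracted citations."""
    found = []
    for kind, pattern, instruction in TEMPLATES:
        for match in pattern.finditer(text):
            found.append(Citation(kind, instruction(match)))
    return found


def precision_recall(extracted, gold, kind):
    """Precision and recall for one citation kind against a hand-checked gold set."""
    ext = {c for c in extracted if c.kind == kind}
    ref = {c for c in gold if c.kind == kind}
    hits = len(ext & ref)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


sample = ("The device of EP-A-0 123 456 is improved upon in US Patent No. 4,567,890. "
          "A similar coating is known from Smith et al., J. Appl. Phys. 12 (1985).")
cites = extract_citations(sample)

# Hypothetical gold standard: it contains one patent citation the templates miss.
gold = {
    Citation("patent", "EP 0123456"),
    Citation("patent", "US 4567890"),
    Citation("patent", "GB 2000001"),
    Citation("non-patent", "Smith et al., J. Appl. Phys. 12 (1985)"),
}

for c in cites:
    print(c.kind, "|", c.text)
print("patent P/R:", precision_recall(cites, gold, "patent"))
print("non-patent P/R:", precision_recall(cites, gold, "non-patent"))
```

The second template keys on the surrounding cue phrase ("Patent No."), while the first keys on the shape of the number itself, mirroring the abstract's point that either the data or the text around it may form the recognizable pattern.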
Field
Patent information

Similar documents (author)

  1. Lawson, V.L.: Using a computer-assisted-instruction program to replace the traditional library tour : an experimental study (1989) 5.94
  2. Lawson, G.T.: Software reviews : Microsoft Cinemania (1994) 5.94
  3. Lawson, D.: You've come a long way, Dewey! (2001) 5.94
  4. Lawson, A.E.: How do people learn? : and what does that imply about the nature of knowledge (2000) 5.94
  5. Lawson, V.; Vasconcellos, M.: Forty ways to skin a cat : users report on machine translation (1994) 4.75

Similar documents (content)

  1. Tseng, Y.-H.; Lin, C.-J.; Lin, Y.-I.: Text mining techniques for patent analysis (2007) 0.34
  2. Perez-Molina, E.: The role of patent citations as a footprint of technology (2018) 0.34
  3. Azagra-Caro, J.M.; Mattsson, P.; Perruchas, F.: Smoothing the lies : the distinctive effects of patent characteristics on examiner and applicant citations (2011) 0.28
  4. Karki, M.M.S.: Patent citation analysis : a policy analysis tool (1997) 0.25
  5. Huang, M.-H.; Huang, W.-T.; Chang, C.-C.; Chen, D.Z.; Lin, C.-P.: The greater scattering phenomenon beyond Bradford's law in patent citation (2014) 0.24