Document (#20656)

Author
Lawson, M.
Title
Automatic extraction of citations from the text of English-language patents : an example of template mining
Source
Journal of information science. 22(1996) no.6, p.423-436
Year
1996
Abstract
Describes and evaluates methods for automatically isolating and extracting bibliographic references from the full texts of patents, designed to facilitate the work of patent examiners who currently perform this task manually. These references include citations both to patents and to other bibliographic sources. Notes that patents are unusual as citing documents in that the citations occur mainly in the body of the text, rather than as footnotes or in separate sections. Describes the natural language processing technique of template mining, which extracts data directly from the text where either the data or the text surrounding the data form recognizable patterns. When text matches a template, the system extracts data according to instructions associated with that template. Examines the sublanguages of citations and the development of templates for the extraction of citations to patents. Reports results of running two reference extraction systems against a sample of 100 European Patent Office patent documents, with recall and precision data for patent and non-patent citations, and concludes with suggestions for future improvements.
Field
Patent information
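
The template mining idea summarized in the abstract (a pattern plus extraction instructions attached to it) can be illustrated with a minimal sketch. The two templates, the field names, and the function `extract_citations` below are hypothetical and chosen only for illustration; they are not the templates developed in the paper.

```python
import re

# Hypothetical templates. Each pairs a recognizable pattern with extraction
# instructions: which named groups to copy, plus optional default values.
TEMPLATES = [
    {
        "name": "office_code_number",  # e.g. "EP-A-0 123 456"
        "pattern": re.compile(
            r"\b(?P<office>US|EP|GB|DE|FR|WO)[- ]"
            r"(?:(?P<kind>[ABC]\d?)[- ])?"
            r"(?P<number>\d[\d ,]*\d)"
        ),
        "fields": ("office", "kind", "number"),
        "defaults": {},
    },
    {
        "name": "spelled_out_us",  # e.g. "U.S. Pat. No. 4,123,456"
        "pattern": re.compile(r"U\.S\. Pat(?:ent|\.) No\. (?P<number>\d[\d,]*\d)"),
        "fields": ("number",),
        "defaults": {"office": "US"},
    },
]


def extract_citations(text):
    """Return one record per template match found in the text."""
    records = []
    for template in TEMPLATES:
        for match in template["pattern"].finditer(text):
            record = dict(template["defaults"])
            for field in template["fields"]:
                value = match.group(field)
                if value is not None:
                    record[field] = value
            # Normalize the number by dropping spaces and thousands separators.
            record["number"] = re.sub(r"[ ,]", "", record["number"])
            record["template"] = template["name"]
            records.append(record)
    return records


sample = ("A similar device is disclosed in EP-A-0 123 456 "
          "and in U.S. Pat. No. 4,123,456.")
for record in extract_citations(sample):
    print(record)
```

A real system would need many more templates, since citations embedded in running text vary widely in form; the sketch only shows how matching and extraction instructions fit together.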

Similar documents (author)

  1. Lawson, V.L.: Using a computer-assisted-instruction program to replace the traditional library tour : an experimental study (1989) 6.00
    5.9971275 = sum of:
      5.9971275 = weight(author_txt:lawson in 6668) [ClassicSimilarity], result of:
        5.9971275 = fieldWeight in 6668, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.625 = fieldNorm(doc=6668)
    
  2. Lawson, G.T.: Software reviews : Microsoft Cinemania (1994) 6.00
    5.9971275 = sum of:
      5.9971275 = weight(author_txt:lawson in 2024) [ClassicSimilarity], result of:
        5.9971275 = fieldWeight in 2024, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.625 = fieldNorm(doc=2024)
    
  3. Lawson, D.: You've come a long way, Dewey! (2001) 6.00
    5.9971275 = sum of:
      5.9971275 = weight(author_txt:lawson in 914) [ClassicSimilarity], result of:
        5.9971275 = fieldWeight in 914, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.625 = fieldNorm(doc=914)
    
  4. Lawson, A.E.: How do people learn? : and what does that imply about the nature of knowledge (2000) 6.00
    5.9971275 = sum of:
      5.9971275 = weight(author_txt:lawson in 1140) [ClassicSimilarity], result of:
        5.9971275 = fieldWeight in 1140, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.625 = fieldNorm(doc=1140)
    
  5. Lawson, V.; Vasconcellos, M.: Forty ways to skin a cat : users report on machine translation (1994) 4.80
    4.797702 = sum of:
      4.797702 = weight(author_txt:lawson in 6956) [ClassicSimilarity], result of:
        4.797702 = fieldWeight in 6956, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.5 = fieldNorm(doc=6956)
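
The author scores above follow the ClassicSimilarity formulas, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). A minimal sketch that recomputes the listed values is given below; the function names are illustrative and not part of Lucene's API, and small differences in the last digits are expected because Lucene works in single precision.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Inverse document frequency as defined by ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # Term frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explanations above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entries 1 to 4 use fieldNorm 0.625; entry 5 (two authors) uses 0.5.
print(classic_idf(7, 43254))
print(field_weight(1.0, 7, 43254, 0.625))
print(field_weight(1.0, 7, 43254, 0.5))
```

The lower fieldNorm for entry 5 reflects the longer author field (two names), which is why it scores 4.80 rather than 6.00 for the same single match on "lawson".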
    

Similar documents (content)

  1. Perez-Molina, E.: The role of patent citations as a footprint of technology (2018) 0.34
    0.34017533 = sum of:
      0.34017533 = product of:
        1.0630479 = sum of:
          0.0063758884 = weight(abstract_txt:with in 188) [ClassicSimilarity], result of:
            0.0063758884 = score(doc=188,freq=1.0), product of:
              0.03250603 = queryWeight, product of:
                1.0497453 = boost
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.012333697 = queryNorm
              0.19614479 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
          0.018733514 = weight(abstract_txt:documents in 188) [ClassicSimilarity], result of:
            0.018733514 = score(doc=188,freq=1.0), product of:
              0.05825312 = queryWeight, product of:
                1.1474029 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.012333697 = queryNorm
              0.32158816 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
          0.012249112 = weight(abstract_txt:from in 188) [ClassicSimilarity], result of:
            0.012249112 = score(doc=188,freq=2.0), product of:
              0.03987156 = queryWeight, product of:
                1.1626087 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.012333697 = queryNorm
              0.3072143 = fieldWeight in 188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
          0.047155973 = weight(abstract_txt:references in 188) [ClassicSimilarity], result of:
            0.047155973 = score(doc=188,freq=1.0), product of:
              0.10779473 = queryWeight, product of:
                1.5608282 = boost
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.012333697 = queryNorm
              0.43746084 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
          0.035990454 = weight(abstract_txt:data in 188) [ClassicSimilarity], result of:
            0.035990454 = score(doc=188,freq=2.0), product of:
              0.096976824 = queryWeight, product of:
                2.3407786 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.012333697 = queryNorm
              0.3711243 = fieldWeight in 188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
          0.4425857 = weight(abstract_txt:patents in 188) [ClassicSimilarity], result of:
            0.4425857 = score(doc=188,freq=4.0), product of:
              0.3806811 = queryWeight, product of:
                4.1481266 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.012333697 = queryNorm
              1.1626154 = fieldWeight in 188, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
          0.12354549 = weight(abstract_txt:citations in 188) [ClassicSimilarity], result of:
            0.12354549 = score(doc=188,freq=1.0), product of:
              0.29545957 = queryWeight, product of:
                4.475752 = boost
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.012333697 = queryNorm
              0.41814685 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
          0.37641177 = weight(abstract_txt:patent in 188) [ClassicSimilarity], result of:
            0.37641177 = score(doc=188,freq=2.0), product of:
              0.49284717 = queryWeight, product of:
                5.7806025 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.012333697 = queryNorm
              0.7637495 = fieldWeight in 188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.078125 = fieldNorm(doc=188)
        0.32 = coord(8/25)
    
  2. Tseng, Y.-H.; Lin, C.-J.; Lin, Y.-I.: Text mining techniques for patent analysis (2007) 0.34
    0.34012604 = sum of:
      0.34012604 = product of:
        1.0628939 = sum of:
          0.045901835 = weight(abstract_txt:extracts in 2936) [ClassicSimilarity], result of:
            0.045901835 = score(doc=2936,freq=1.0), product of:
              0.097511634 = queryWeight, product of:
                1.0497105 = boost
                7.53171 = idf(docFreq=62, maxDocs=43254)
                0.012333697 = queryNorm
              0.47073188 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.53171 = idf(docFreq=62, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
          0.01187338 = weight(abstract_txt:describes in 2936) [ClassicSimilarity], result of:
            0.01187338 = score(doc=2936,freq=1.0), product of:
              0.0498765 = queryWeight, product of:
                1.0617062 = boost
                3.8088896 = idf(docFreq=2606, maxDocs=43254)
                0.012333697 = queryNorm
              0.2380556 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8088896 = idf(docFreq=2606, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
          0.01498681 = weight(abstract_txt:documents in 2936) [ClassicSimilarity], result of:
            0.01498681 = score(doc=2936,freq=1.0), product of:
              0.05825312 = queryWeight, product of:
                1.1474029 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.012333697 = queryNorm
              0.25727051 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
          0.05116985 = weight(abstract_txt:mining in 2936) [ClassicSimilarity], result of:
            0.05116985 = score(doc=2936,freq=1.0), product of:
              0.13208571 = queryWeight, product of:
                1.7277633 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.012333697 = queryNorm
              0.38739884 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
          0.10921263 = weight(abstract_txt:extraction in 2936) [ClassicSimilarity], result of:
            0.10921263 = score(doc=2936,freq=2.0), product of:
              0.19893692 = queryWeight, product of:
                2.5969265 = boost
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.012333697 = queryNorm
              0.5489812 = fieldWeight in 2936, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
          0.050456256 = weight(abstract_txt:text in 2936) [ClassicSimilarity], result of:
            0.050456256 = score(doc=2936,freq=2.0), product of:
              0.14095908 = queryWeight, product of:
                2.822103 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.012333697 = queryNorm
              0.35794964 = fieldWeight in 2936, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
          0.17703429 = weight(abstract_txt:patents in 2936) [ClassicSimilarity], result of:
            0.17703429 = score(doc=2936,freq=1.0), product of:
              0.3806811 = queryWeight, product of:
                4.1481266 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.012333697 = queryNorm
              0.46504617 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
          0.60225886 = weight(abstract_txt:patent in 2936) [ClassicSimilarity], result of:
            0.60225886 = score(doc=2936,freq=8.0), product of:
              0.49284717 = queryWeight, product of:
                5.7806025 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.012333697 = queryNorm
              1.2219992 = fieldWeight in 2936, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.0625 = fieldNorm(doc=2936)
        0.32 = coord(8/25)
    
  3. Azagra-Caro, J.M.; Mattsson, P.; Perruchas, F.: Smoothing the lies : the distinctive effects of patent characteristics on examiner and applicant citations (2011) 0.28
    0.27927706 = sum of:
      0.27927706 = product of:
        1.3963852 = sum of:
          0.18333772 = weight(abstract_txt:examiners in 1212) [ClassicSimilarity], result of:
            0.18333772 = score(doc=1212,freq=2.0), product of:
              0.16790138 = queryWeight, product of:
                1.3774266 = boost
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.012333697 = queryNorm
              1.091937 = fieldWeight in 1212, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.078125 = fieldNorm(doc=1212)
          0.047155973 = weight(abstract_txt:references in 1212) [ClassicSimilarity], result of:
            0.047155973 = score(doc=1212,freq=1.0), product of:
              0.10779473 = queryWeight, product of:
                1.5608282 = boost
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.012333697 = queryNorm
              0.43746084 = fieldWeight in 1212, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5994987 = idf(docFreq=434, maxDocs=43254)
                0.078125 = fieldNorm(doc=1212)
          0.22129285 = weight(abstract_txt:patents in 1212) [ClassicSimilarity], result of:
            0.22129285 = score(doc=1212,freq=1.0), product of:
              0.3806811 = queryWeight, product of:
                4.1481266 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.012333697 = queryNorm
              0.5813077 = fieldWeight in 1212, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.078125 = fieldNorm(doc=1212)
          0.3494394 = weight(abstract_txt:citations in 1212) [ClassicSimilarity], result of:
            0.3494394 = score(doc=1212,freq=8.0), product of:
              0.29545957 = queryWeight, product of:
                4.475752 = boost
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.012333697 = queryNorm
              1.1826979 = fieldWeight in 1212, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.078125 = fieldNorm(doc=1212)
          0.59515923 = weight(abstract_txt:patent in 1212) [ClassicSimilarity], result of:
            0.59515923 = score(doc=1212,freq=5.0), product of:
              0.49284717 = queryWeight, product of:
                5.7806025 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.012333697 = queryNorm
              1.2075939 = fieldWeight in 1212, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.078125 = fieldNorm(doc=1212)
        0.2 = coord(5/25)
    
  4. Karki, M.M.S.: Patent citation analysis : a policy analysis tool (1997) 0.25
    0.25154853 = sum of:
      0.25154853 = product of:
        1.5721784 = sum of:
          0.029385118 = weight(abstract_txt:describes in 4077) [ClassicSimilarity], result of:
            0.029385118 = score(doc=4077,freq=2.0), product of:
              0.0498765 = queryWeight, product of:
                1.0617062 = boost
                3.8088896 = idf(docFreq=2606, maxDocs=43254)
                0.012333697 = queryNorm
              0.5891576 = fieldWeight in 4077, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8088896 = idf(docFreq=2606, maxDocs=43254)
                0.109375 = fieldNorm(doc=4077)
          0.5366066 = weight(abstract_txt:patents in 4077) [ClassicSimilarity], result of:
            0.5366066 = score(doc=4077,freq=3.0), product of:
              0.3806811 = queryWeight, product of:
                4.1481266 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.012333697 = queryNorm
              1.4095962 = fieldWeight in 4077, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.109375 = fieldNorm(doc=4077)
          0.17296368 = weight(abstract_txt:citations in 4077) [ClassicSimilarity], result of:
            0.17296368 = score(doc=4077,freq=1.0), product of:
              0.29545957 = queryWeight, product of:
                4.475752 = boost
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.012333697 = queryNorm
              0.5854056 = fieldWeight in 4077, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.109375 = fieldNorm(doc=4077)
          0.83322304 = weight(abstract_txt:patent in 4077) [ClassicSimilarity], result of:
            0.83322304 = score(doc=4077,freq=5.0), product of:
              0.49284717 = queryWeight, product of:
                5.7806025 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.012333697 = queryNorm
              1.6906316 = fieldWeight in 4077, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.109375 = fieldNorm(doc=4077)
        0.16 = coord(4/25)
    
  5. Huang, M.-H.; Huang, W.-T.; Chang, C.-C.; Chen, D. Z.; Lin, C.-P.: The greater scattering phenomenon beyond Bradford's law in patent citation (2014) 0.24
    0.24138212 = sum of:
      0.24138212 = product of:
        1.2069106 = sum of:
          0.0063758884 = weight(abstract_txt:with in 2817) [ClassicSimilarity], result of:
            0.0063758884 = score(doc=2817,freq=1.0), product of:
              0.03250603 = queryWeight, product of:
                1.0497453 = boost
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.012333697 = queryNorm
              0.19614479 = fieldWeight in 2817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.078125 = fieldNorm(doc=2817)
          0.00866143 = weight(abstract_txt:from in 2817) [ClassicSimilarity], result of:
            0.00866143 = score(doc=2817,freq=1.0), product of:
              0.03987156 = queryWeight, product of:
                1.1626087 = boost
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.012333697 = queryNorm
              0.2172333 = fieldWeight in 2817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7805862 = idf(docFreq=7289, maxDocs=43254)
                0.078125 = fieldNorm(doc=2817)
          0.3832905 = weight(abstract_txt:patents in 2817) [ClassicSimilarity], result of:
            0.3832905 = score(doc=2817,freq=3.0), product of:
              0.3806811 = queryWeight, product of:
                4.1481266 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.012333697 = queryNorm
              1.0068545 = fieldWeight in 2817, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.078125 = fieldNorm(doc=2817)
          0.2762561 = weight(abstract_txt:citations in 2817) [ClassicSimilarity], result of:
            0.2762561 = score(doc=2817,freq=5.0), product of:
              0.29545957 = queryWeight, product of:
                4.475752 = boost
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.012333697 = queryNorm
              0.93500483 = fieldWeight in 2817, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.3522797 = idf(docFreq=556, maxDocs=43254)
                0.078125 = fieldNorm(doc=2817)
          0.53232664 = weight(abstract_txt:patent in 2817) [ClassicSimilarity], result of:
            0.53232664 = score(doc=2817,freq=4.0), product of:
              0.49284717 = queryWeight, product of:
                5.7806025 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.012333697 = queryNorm
              1.0801048 = fieldWeight in 2817, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.078125 = fieldNorm(doc=2817)
        0.2 = coord(5/25)
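
The content scores combine per-term weights as coord × Σ (queryWeight × fieldWeight), with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. As a sketch, the following recomputes the score of entry 5 from the boost, idf, and frequency values listed above; the helper names are illustrative only, and the result should agree with the listed 0.24138212 up to single-precision rounding.

```python
import math

QUERY_NORM = 0.012333697   # queryNorm, shared by all terms of the query
FIELD_NORM = 0.078125      # fieldNorm(doc=2817)
QUERY_TERMS = 25           # denominator of coord(5/25)

# (term, boost, idf, termFreq) copied from the explanation for doc 2817.
MATCHED_TERMS = [
    ("with",      1.0497453, 2.5106533, 1.0),
    ("from",      1.1626087, 2.7805862, 1.0),
    ("patents",   4.1481266, 7.4407387, 3.0),
    ("citations", 4.475752,  5.3522797, 5.0),
    ("patent",    5.7806025, 6.912671,  4.0),
]


def term_score(boost, idf, freq, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight for a single matched term.
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(matched_terms, query_terms=QUERY_TERMS):
    # coord rewards documents that match a larger share of the query terms.
    coord = len(matched_terms) / query_terms
    total = sum(term_score(boost, idf, freq) for _, boost, idf, freq in matched_terms)
    return coord * total


print(document_score(MATCHED_TERMS))
```

Because coord scales the sum by the fraction of query terms matched, entries 1 and 2 (8 of 25 terms) outrank entries with larger raw sums but fewer matched terms.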