Document (#32935)

Author
Kim, J.-H.
Choi, K.-S.
Title
Patent document categorization based on semantic structural information
Source
Information processing and management. 43(2007) no.5, pp. 1200-1215
Year
2007
Abstract
The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial to search for relevant existing patents in a certain field, patent categorization is a very important and useful field. As patent documents are structural documents with their own characteristics distinguished from general documents, these unique traits should be considered in the patent categorization process. In this paper, we categorize Japanese patent documents automatically, focusing on their characteristics: patents are structured by claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method that uses the k-NN (k-Nearest Neighbour) approach. In order to retrieve similar documents from a training document set, some specific components to denote the so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole texts. Because those specific components are identified by various user-defined tags, first all of the components are clustered into several semantic elements. Such semantically clustered structural components are the basic features of patent categorization. We can achieve a 74% improvement of categorization performance over a baseline system that does not use the structural information of the patent.
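The abstract describes retrieving neighbours by comparing semantic elements (claim, purpose, application field) rather than whole texts. The sketch below illustrates that general idea only. It is not the authors' implementation: the element names, the weights, the whitespace tokenizer, the raw term-count cosine measure, and the similarity-weighted vote are all assumptions chosen to keep the example small.

```python
from collections import Counter
import math


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts over raw term counts (assumed measure)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def element_similarity(query: dict, doc: dict, weights: dict) -> float:
    """Weighted sum of per-element similarities; a missing element scores 0."""
    return sum(w * cosine(query.get(e, ""), doc.get(e, ""))
               for e, w in weights.items())


def knn_categorize(query: dict, training: list, weights: dict, k: int = 3):
    """Return the label with the largest summed similarity among the k nearest."""
    scored = sorted(
        ((element_similarity(query, doc, weights), label)
         for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: each document is a mapping from semantic element to text.
training = [
    ({"claim": "engine piston cylinder", "purpose": "reduce fuel use"}, "mechanical"),
    ({"claim": "neural network classifier", "purpose": "improve accuracy"}, "software"),
    ({"claim": "piston ring seal", "purpose": "reduce wear"}, "mechanical"),
]
weights = {"claim": 1.0, "purpose": 0.5}  # assumed element weights
query = {"claim": "piston cylinder seal", "purpose": "reduce wear"}
label = knn_categorize(query, training, weights, k=2)
```

Comparing element by element lets a match in the claim count for more than a match in a less discriminative part of the patent, which is the intuition the abstract gives for using structural information.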
Footnote
Contribution to a thematic focus: "special issue on patent processing"
Field
Patent information

Similar documents (author)

  1. Choi, Y.: Effects of contextual factors on image searching on the Web (2010) 5.17
    5.1710296 = sum of:
      5.1710296 = weight(author_txt:choi in 460) [ClassicSimilarity], result of:
        5.1710296 = fieldWeight in 460, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.273647 = idf(docFreq=29, maxDocs=43254)
          0.625 = fieldNorm(doc=460)
    
  2. Choi, Y.: A Practical application of FRBR for organizing information in digital environments (2012) 5.17
    5.1710296 = sum of:
      5.1710296 = weight(author_txt:choi in 1784) [ClassicSimilarity], result of:
        5.1710296 = fieldWeight in 1784, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.273647 = idf(docFreq=29, maxDocs=43254)
          0.625 = fieldNorm(doc=1784)
    
  3. Choi, Y.: Analysis of image search queries on the web : query modification patterns and semantic attributes (2013) 5.17
    5.1710296 = sum of:
      5.1710296 = weight(author_txt:choi in 2434) [ClassicSimilarity], result of:
        5.1710296 = fieldWeight in 2434, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.273647 = idf(docFreq=29, maxDocs=43254)
          0.625 = fieldNorm(doc=2434)
    
  4. Choi, N.: Information systems attachment : an empirical exploration of its antecedents and its impact on community participation intention (2013) 5.17
    5.1710296 = sum of:
      5.1710296 = weight(author_txt:choi in 2579) [ClassicSimilarity], result of:
        5.1710296 = fieldWeight in 2579, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.273647 = idf(docFreq=29, maxDocs=43254)
          0.625 = fieldNorm(doc=2579)
    
  5. Choi, Y.: A complete assessment of tagging quality : a consolidated methodology (2015) 5.17
    5.1710296 = sum of:
      5.1710296 = weight(author_txt:choi in 3195) [ClassicSimilarity], result of:
        5.1710296 = fieldWeight in 3195, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.273647 = idf(docFreq=29, maxDocs=43254)
          0.625 = fieldNorm(doc=3195)
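The author scores above are each a single fieldWeight, the product of tf, idf, and fieldNorm. The sketch below recomputes the value for the first entry. It assumes the classic TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the listed numbers; the function names are illustrative and not part of any library.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from entry 1 (doc=460): freq=1.0, docFreq=29, maxDocs=43254, fieldNorm=0.625
author_idf = classic_idf(29, 43254)
author_score = field_weight(freq=1.0, doc_freq=29, max_docs=43254, field_norm=0.625)
# author_idf is about 8.2736 and author_score about 5.1710; the listed values
# agree up to single-precision rounding.
```

All five entries share freq, docFreq, and fieldNorm, which is why they tie at 5.17.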
    

Similar documents (content)

  1. Liu, D.-R.; Shih, M.-J.: Hybrid-patent classification based on patent-network analysis (2011) 0.21
    0.20898859 = sum of:
      0.20898859 = product of:
        1.3061787 = sum of:
          0.04821376 = weight(abstract_txt:nearest in 654) [ClassicSimilarity], result of:
            0.04821376 = score(doc=654,freq=1.0), product of:
              0.09467039 = queryWeight, product of:
                1.1211259 = boost
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.010362939 = queryNorm
              0.50928026 = fieldWeight in 654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.0625 = fieldNorm(doc=654)
          0.103832595 = weight(abstract_txt:patents in 654) [ClassicSimilarity], result of:
            0.103832595 = score(doc=654,freq=2.0), product of:
              0.15787837 = queryWeight, product of:
                2.0474987 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.010362939 = queryNorm
              0.6576746 = fieldWeight in 654, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.0625 = fieldNorm(doc=654)
          0.052739594 = weight(abstract_txt:documents in 654) [ClassicSimilarity], result of:
            0.052739594 = score(doc=654,freq=2.0), product of:
              0.14495452 = queryWeight, product of:
                3.3981209 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.010362939 = queryNorm
              0.36383545 = fieldWeight in 654, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=654)
          1.1013927 = weight(abstract_txt:patent in 654) [ClassicSimilarity], result of:
            1.1013927 = score(doc=654,freq=14.0), product of:
              0.6813218 = queryWeight, product of:
                9.510941 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.010362939 = queryNorm
              1.616553 = fieldWeight in 654, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.0625 = fieldNorm(doc=654)
        0.16 = coord(4/25)
    
  2. Yan, B.; Luo, J.: Measuring technological distance for patent mapping (2017) 0.18
    0.1810863 = sum of:
      0.1810863 = product of:
        1.1317894 = sum of:
          0.024576625 = weight(abstract_txt:field in 4816) [ClassicSimilarity], result of:
            0.024576625 = score(doc=4816,freq=1.0), product of:
              0.08712753 = queryWeight, product of:
                1.8628832 = boost
                4.513223 = idf(docFreq=1288, maxDocs=43254)
                0.010362939 = queryNorm
              0.28207645 = fieldWeight in 4816, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.513223 = idf(docFreq=1288, maxDocs=43254)
                0.0625 = fieldNorm(doc=4816)
          0.103832595 = weight(abstract_txt:patents in 4816) [ClassicSimilarity], result of:
            0.103832595 = score(doc=4816,freq=2.0), product of:
              0.15787837 = queryWeight, product of:
                2.0474987 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.010362939 = queryNorm
              0.6576746 = fieldWeight in 4816, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.0625 = fieldNorm(doc=4816)
          0.07253342 = weight(abstract_txt:structural in 4816) [ClassicSimilarity], result of:
            0.07253342 = score(doc=4816,freq=1.0), product of:
              0.1973084 = queryWeight, product of:
                3.2370553 = boost
                5.881831 = idf(docFreq=327, maxDocs=43254)
                0.010362939 = queryNorm
              0.36761445 = fieldWeight in 4816, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.881831 = idf(docFreq=327, maxDocs=43254)
                0.0625 = fieldNorm(doc=4816)
          0.93084675 = weight(abstract_txt:patent in 4816) [ClassicSimilarity], result of:
            0.93084675 = score(doc=4816,freq=10.0), product of:
              0.6813218 = queryWeight, product of:
                9.510941 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.010362939 = queryNorm
              1.3662366 = fieldWeight in 4816, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.0625 = fieldNorm(doc=4816)
        0.16 = coord(4/25)
    
  3. Lai, K.-K.; Wu, S.-J.: Using the patent co-citation approach to establish a new patent classification system (2005) 0.17
    0.17214455 = sum of:
      0.17214455 = product of:
        1.0759034 = sum of:
          0.025007024 = weight(abstract_txt:specific in 3014) [ClassicSimilarity], result of:
            0.025007024 = score(doc=3014,freq=3.0), product of:
              0.053388096 = queryWeight, product of:
                1.1906512 = boost
                4.326901 = idf(docFreq=1552, maxDocs=43254)
                0.010362939 = queryNorm
              0.46840075 = fieldWeight in 3014, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.326901 = idf(docFreq=1552, maxDocs=43254)
                0.0625 = fieldNorm(doc=3014)
          0.020976057 = weight(abstract_txt:characteristics in 3014) [ClassicSimilarity], result of:
            0.020976057 = score(doc=3014,freq=1.0), product of:
              0.06848457 = queryWeight, product of:
                1.348524 = boost
                4.900621 = idf(docFreq=874, maxDocs=43254)
                0.010362939 = queryNorm
              0.3062888 = fieldWeight in 3014, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.900621 = idf(docFreq=874, maxDocs=43254)
                0.0625 = fieldNorm(doc=3014)
          0.14684147 = weight(abstract_txt:patents in 3014) [ClassicSimilarity], result of:
            0.14684147 = score(doc=3014,freq=4.0), product of:
              0.15787837 = queryWeight, product of:
                2.0474987 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.010362939 = queryNorm
              0.93009233 = fieldWeight in 3014, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.0625 = fieldNorm(doc=3014)
          0.8830788 = weight(abstract_txt:patent in 3014) [ClassicSimilarity], result of:
            0.8830788 = score(doc=3014,freq=9.0), product of:
              0.6813218 = queryWeight, product of:
                9.510941 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.010362939 = queryNorm
              1.2961259 = fieldWeight in 3014, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.0625 = fieldNorm(doc=3014)
        0.16 = coord(4/25)
    
  4. Cetintas, S.; Si, L.: Effective query generation and postprocessing strategies for prior art patent search (2012) 0.17
    0.16975044 = sum of:
      0.16975044 = product of:
        1.0609403 = sum of:
          0.030412022 = weight(abstract_txt:field in 1536) [ClassicSimilarity], result of:
            0.030412022 = score(doc=1536,freq=2.0), product of:
              0.08712753 = queryWeight, product of:
                1.8628832 = boost
                4.513223 = idf(docFreq=1288, maxDocs=43254)
                0.010362939 = queryNorm
              0.3490518 = fieldWeight in 1536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.513223 = idf(docFreq=1288, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1536)
          0.14365204 = weight(abstract_txt:patents in 1536) [ClassicSimilarity], result of:
            0.14365204 = score(doc=1536,freq=5.0), product of:
              0.15787837 = queryWeight, product of:
                2.0474987 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.010362939 = queryNorm
              0.90989053 = fieldWeight in 1536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1536)
          0.032630958 = weight(abstract_txt:documents in 1536) [ClassicSimilarity], result of:
            0.032630958 = score(doc=1536,freq=1.0), product of:
              0.14495452 = queryWeight, product of:
                3.3981209 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.010362939 = queryNorm
              0.2251117 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1536)
          0.8542453 = weight(abstract_txt:patent in 1536) [ClassicSimilarity], result of:
            0.8542453 = score(doc=1536,freq=11.0), product of:
              0.6813218 = queryWeight, product of:
                9.510941 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.010362939 = queryNorm
              1.2538059 = fieldWeight in 1536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.0546875 = fieldNorm(doc=1536)
        0.16 = coord(4/25)
    
  5. Kay, L.; Newman, N.; Youtie, J.; Porter, A.L.; Rafols, I.: Patent overlay mapping : visualizing technological distance (2014) 0.16
    0.15619434 = sum of:
      0.15619434 = product of:
        1.3016195 = sum of:
          0.07342073 = weight(abstract_txt:patents in 3008) [ClassicSimilarity], result of:
            0.07342073 = score(doc=3008,freq=1.0), product of:
              0.15787837 = queryWeight, product of:
                2.0474987 = boost
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.010362939 = queryNorm
              0.46504617 = fieldWeight in 3008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4407387 = idf(docFreq=68, maxDocs=43254)
                0.0625 = fieldNorm(doc=3008)
          0.20850722 = weight(abstract_txt:categorization in 3008) [ClassicSimilarity], result of:
            0.20850722 = score(doc=3008,freq=1.0), product of:
              0.5025866 = queryWeight, product of:
                7.3063045 = boost
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.010362939 = queryNorm
              0.41486827 = fieldWeight in 3008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.0625 = fieldNorm(doc=3008)
          1.0196916 = weight(abstract_txt:patent in 3008) [ClassicSimilarity], result of:
            1.0196916 = score(doc=3008,freq=12.0), product of:
              0.6813218 = queryWeight, product of:
                9.510941 = boost
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.010362939 = queryNorm
              1.4966372 = fieldWeight in 3008, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.912671 = idf(docFreq=116, maxDocs=43254)
                0.0625 = fieldNorm(doc=3008)
        0.12 = coord(3/25)
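Each content score above is the sum of per-term scores (queryWeight times fieldWeight) multiplied by the coord factor, the share of query terms that matched. The sketch below recomputes the total for entry 5 (doc=3008) from the boosts, document frequencies, term frequencies, and norms listed in its tree. It assumes the same tf and idf definitions that reproduce the listed values; the helper names are illustrative.

```python
import math

MAX_DOCS = 43254          # maxDocs from the explain output
QUERY_NORM = 0.010362939  # queryNorm from the explain output


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (boost, docFreq, freq, fieldNorm) for the three matched terms of doc 3008
terms_3008 = [
    (2.0474987, 68, 1.0, 0.0625),    # patents
    (7.3063045, 153, 1.0, 0.0625),   # categorization
    (9.510941, 116, 12.0, 0.0625),   # patent
]
matched_terms, query_terms = 3, 25   # coord(3/25)

raw_sum = sum(term_score(*t) for t in terms_3008)
total = raw_sum * (matched_terms / query_terms)
# raw_sum is about 1.3016 and total about 0.1562, in line with the listed
# 1.3016195 and 0.15619434 up to single-precision rounding.
```

The coord factor explains why entry 5 ranks last despite a raw sum close to that of entry 1: it matched 3 of 25 query terms instead of 4.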