Document (#37072)

Author
Cetintas, S.
Si, L.
Title
Effective query generation and postprocessing strategies for prior art patent search
Source
Journal of the American Society for Information Science and Technology. 63(2012) no.3, pp. 512-527
Year
2012
Abstract
Rapid increase in global competition demands increased protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by utilizing the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents along with the retrieval score. An extensive set of empirical results carried out on a large-scale, real-world dataset shows that utilizing 20 or 30 query terms extracted from all fields of an original query patent according to their log(tf)idf values helps form a representative search query out of the query patent and is found to be more effective than is using any number of query terms from any single field. It is shown that combining terms extracted from different fields of the query patent by giving higher importance to terms extracted from the abstract, claims, and description fields than to terms extracted from the title field is more effective than treating all extracted terms equally while forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and retrieved patents is shown to be beneficial to improve the effectiveness of the prior art search.
Field
Patent information
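
The abstract's query generation step (pool terms from all fields of the query patent, rank them by log(tf)idf, keep the top 20 or 30) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes log(1 + tf) * log(N / df), whitespace tokenization, and unweighted fields. The function name `top_query_terms` and its parameters are hypothetical.

```python
import math
from collections import Counter


def top_query_terms(field_texts, doc_freq, num_docs, k=20):
    """Return the k terms with the highest log(tf)*idf score.

    field_texts: dict mapping a field name (title, abstract, ...) to its text.
    doc_freq:    dict mapping a term to the number of documents containing it.
    num_docs:    collection size N.
    Assumption: score = log(1 + tf) * log(N / df); the source only says
    "log(tf)idf" and does not spell out the variant.
    """
    # Pool term frequencies over all fields of the query patent.
    tf = Counter()
    for text in field_texts.values():
        tf.update(text.lower().split())

    scores = {}
    for term, freq in tf.items():
        df = doc_freq.get(term, 1)  # unseen terms are treated as very rare
        scores[term] = math.log(1 + freq) * math.log(num_docs / df)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    fields = {
        "title": "patent search",
        "abstract": "prior art patent search query query",
    }
    df = {"patent": 100, "search": 500, "prior": 50, "art": 50, "query": 10}
    print(top_query_terms(fields, df, num_docs=1000, k=2))
```

The field weighting described in the abstract (abstract, claims, and description above title) would multiply each field's contribution by a per-field weight before pooling; the weights are not stated in the abstract, so they are omitted here.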

Similar documents (content)

  1. Liu, D.-R.; Shih, M.-J.: Hybrid-patent classification based on patent-network analysis (2011) 0.47
    0.46566066 = sum of:
      0.46566066 = product of:
        1.6630738 = sum of:
          0.013070763 = weight(abstract_txt:than in 4189) [ClassicSimilarity], result of:
            0.013070763 = score(doc=4189,freq=1.0), product of:
              0.053691283 = queryWeight, product of:
                1.4196386 = boost
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.009709767 = queryNorm
              0.24344292 = fieldWeight in 4189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.025018923 = weight(abstract_txt:effective in 4189) [ClassicSimilarity], result of:
            0.025018923 = score(doc=4189,freq=1.0), product of:
              0.08277176 = queryWeight, product of:
                1.7626538 = boost
                4.8362236 = idf(docFreq=953, maxDocs=44218)
                0.009709767 = queryNorm
              0.30226398 = fieldWeight in 4189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8362236 = idf(docFreq=953, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.016177036 = weight(abstract_txt:from in 4189) [ClassicSimilarity], result of:
            0.016177036 = score(doc=4189,freq=3.0), product of:
              0.054067798 = queryWeight, product of:
                2.0146995 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009709767 = queryNorm
              0.29919907 = fieldWeight in 4189, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.10337776 = weight(abstract_txt:extracted in 4189) [ClassicSimilarity], result of:
            0.10337776 = score(doc=4189,freq=1.0), product of:
              0.26853314 = queryWeight, product of:
                4.489933 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.009709767 = queryNorm
              0.38497207 = fieldWeight in 4189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.21542732 = weight(abstract_txt:patents in 4189) [ClassicSimilarity], result of:
            0.21542732 = score(doc=4189,freq=2.0), product of:
              0.32722256 = queryWeight, product of:
                4.5245137 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.009709767 = queryNorm
              0.65835106 = fieldWeight in 4189, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.19008704 = weight(abstract_txt:query in 4189) [ClassicSimilarity], result of:
            0.19008704 = score(doc=4189,freq=4.0), product of:
              0.31989306 = queryWeight, product of:
                6.9304 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.009709767 = queryNorm
              0.5942206 = fieldWeight in 4189, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          1.099915 = weight(abstract_txt:patent in 4189) [ClassicSimilarity], result of:
            1.099915 = score(doc=4189,freq=14.0), product of:
              0.67907834 = queryWeight, product of:
                10.097547 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.009709767 = queryNorm
              1.6197174 = fieldWeight in 4189, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
        0.28 = coord(7/25)
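
The score breakdown above can be reproduced by hand. The sketch below assumes the usual Lucene ClassicSimilarity definitions, which are consistent with the numbers in the listing: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The helper names are illustrative; the inputs are copied from the listing for doc 4189.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.009709767
FIELD_NORM = 0.0625  # fieldNorm(doc=4189)


def idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost,
               query_norm=QUERY_NORM, field_norm=FIELD_NORM):
    # score = queryWeight * fieldWeight for a single matching term.
    i = idf(doc_freq)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


# (term, freq in doc 4189, docFreq, boost), taken from the listing above.
TERMS_4189 = [
    ("than", 1.0, 2444, 1.4196386),
    ("effective", 1.0, 953, 1.7626538),
    ("from", 3.0, 7577, 2.0146995),
    ("extracted", 1.0, 253, 4.489933),
    ("patents", 2.0, 69, 4.5245137),
    ("query", 4.0, 1035, 6.9304),
    ("patent", 14.0, 117, 10.097547),
]


def document_score(terms, matched, total_query_terms):
    # Sum the per-term scores, then scale by coord = matched / total.
    raw = sum(term_score(freq, df, boost) for _, freq, df, boost in terms)
    return raw * (matched / total_query_terms)


if __name__ == "__main__":
    # The listing reports 0.46566066 for this document.
    print(document_score(TERMS_4189, matched=7, total_query_terms=25))
```

Lucene computes these values in single precision, so a double-precision recomputation agrees with the listing only to about five or six significant digits.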
    
  2. Kay, L.; Newman, N.; Youtie, J.; Porter, A.L.; Rafols, I.: Patent overlay mapping : visualizing technological distance (2014) 0.33
    0.32868627 = sum of:
      0.32868627 = product of:
        1.3695261 = sum of:
          0.041759662 = weight(abstract_txt:similarities in 1543) [ClassicSimilarity], result of:
            0.041759662 = score(doc=1543,freq=1.0), product of:
              0.10174444 = queryWeight, product of:
                1.5956427 = boost
                6.5669885 = idf(docFreq=168, maxDocs=44218)
                0.009709767 = queryNorm
              0.41043678 = fieldWeight in 1543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5669885 = idf(docFreq=168, maxDocs=44218)
                0.0625 = fieldNorm(doc=1543)
          0.016177036 = weight(abstract_txt:from in 1543) [ClassicSimilarity], result of:
            0.016177036 = score(doc=1543,freq=3.0), product of:
              0.054067798 = queryWeight, product of:
                2.0146995 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009709767 = queryNorm
              0.29919907 = fieldWeight in 1543, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=1543)
          0.03755812 = weight(abstract_txt:fields in 1543) [ClassicSimilarity], result of:
            0.03755812 = score(doc=1543,freq=1.0), product of:
              0.11944059 = queryWeight, product of:
                2.4449573 = boost
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.009709767 = queryNorm
              0.3144502 = fieldWeight in 1543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.0625 = fieldNorm(doc=1543)
          0.10337776 = weight(abstract_txt:extracted in 1543) [ClassicSimilarity], result of:
            0.10337776 = score(doc=1543,freq=1.0), product of:
              0.26853314 = queryWeight, product of:
                4.489933 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.009709767 = queryNorm
              0.38497207 = fieldWeight in 1543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=1543)
          0.15233012 = weight(abstract_txt:patents in 1543) [ClassicSimilarity], result of:
            0.15233012 = score(doc=1543,freq=1.0), product of:
              0.32722256 = queryWeight, product of:
                4.5245137 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.009709767 = queryNorm
              0.4655245 = fieldWeight in 1543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=1543)
          1.0183234 = weight(abstract_txt:patent in 1543) [ClassicSimilarity], result of:
            1.0183234 = score(doc=1543,freq=12.0), product of:
              0.67907834 = queryWeight, product of:
                10.097547 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.009709767 = queryNorm
              1.4995669 = fieldWeight in 1543, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=1543)
        0.24 = coord(6/25)
    
  3. Stock, M.; Stock, W.G.: Intellectual property information : A comparative analysis of main information providers (2006) 0.33
    0.32617086 = sum of:
      0.32617086 = product of:
        1.3590453 = sum of:
          0.03196497 = weight(abstract_txt:intellectual in 210) [ClassicSimilarity], result of:
            0.03196497 = score(doc=210,freq=1.0), product of:
              0.0733695 = queryWeight, product of:
                1.3549962 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.009709767 = queryNorm
              0.43567106 = fieldWeight in 210, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.078125 = fieldNorm(doc=210)
          0.050197896 = weight(abstract_txt:property in 210) [ClassicSimilarity], result of:
            0.050197896 = score(doc=210,freq=1.0), product of:
              0.09912649 = queryWeight, product of:
                1.5749804 = boost
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.009709767 = queryNorm
              0.50640243 = fieldWeight in 210, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.078125 = fieldNorm(doc=210)
          0.07161242 = weight(abstract_txt:search in 210) [ClassicSimilarity], result of:
            0.07161242 = score(doc=210,freq=7.0), product of:
              0.09471076 = queryWeight, product of:
                2.666494 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.009709767 = queryNorm
              0.756117 = fieldWeight in 210, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=210)
          0.042660583 = weight(abstract_txt:terms in 210) [ClassicSimilarity], result of:
            0.042660583 = score(doc=210,freq=1.0), product of:
              0.135033 = queryWeight, product of:
                3.4390166 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.009709767 = queryNorm
              0.3159271 = fieldWeight in 210, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=210)
          0.19041264 = weight(abstract_txt:patents in 210) [ClassicSimilarity], result of:
            0.19041264 = score(doc=210,freq=1.0), product of:
              0.32722256 = queryWeight, product of:
                4.5245137 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.009709767 = queryNorm
              0.5819056 = fieldWeight in 210, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.078125 = fieldNorm(doc=210)
          0.9721967 = weight(abstract_txt:patent in 210) [ClassicSimilarity], result of:
            0.9721967 = score(doc=210,freq=7.0), product of:
              0.67907834 = queryWeight, product of:
                10.097547 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.009709767 = queryNorm
              1.4316415 = fieldWeight in 210, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.078125 = fieldNorm(doc=210)
        0.24 = coord(6/25)
    
  4. Yan, B.; Luo, J.: Measuring technological distance for patent mapping (2017) 0.30
    0.29929152 = sum of:
      0.29929152 = product of:
        1.247048 = sum of:
          0.013365215 = weight(abstract_txt:field in 3351) [ClassicSimilarity], result of:
            0.013365215 = score(doc=3351,freq=1.0), product of:
              0.04760545 = queryWeight, product of:
                1.091462 = boost
                4.491995 = idf(docFreq=1345, maxDocs=44218)
                0.009709767 = queryNorm
              0.28074968 = fieldWeight in 3351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.491995 = idf(docFreq=1345, maxDocs=44218)
                0.0625 = fieldNorm(doc=3351)
          0.041759662 = weight(abstract_txt:similarities in 3351) [ClassicSimilarity], result of:
            0.041759662 = score(doc=3351,freq=1.0), product of:
              0.10174444 = queryWeight, product of:
                1.5956427 = boost
                6.5669885 = idf(docFreq=168, maxDocs=44218)
                0.009709767 = queryNorm
              0.41043678 = fieldWeight in 3351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5669885 = idf(docFreq=168, maxDocs=44218)
                0.0625 = fieldNorm(doc=3351)
          0.009339816 = weight(abstract_txt:from in 3351) [ClassicSimilarity], result of:
            0.009339816 = score(doc=3351,freq=1.0), product of:
              0.054067798 = queryWeight, product of:
                2.0146995 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009709767 = queryNorm
              0.17274266 = fieldWeight in 3351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3351)
          0.03755812 = weight(abstract_txt:fields in 3351) [ClassicSimilarity], result of:
            0.03755812 = score(doc=3351,freq=1.0), product of:
              0.11944059 = queryWeight, product of:
                2.4449573 = boost
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.009709767 = queryNorm
              0.3144502 = fieldWeight in 3351, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.0625 = fieldNorm(doc=3351)
          0.21542732 = weight(abstract_txt:patents in 3351) [ClassicSimilarity], result of:
            0.21542732 = score(doc=3351,freq=2.0), product of:
              0.32722256 = queryWeight, product of:
                4.5245137 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.009709767 = queryNorm
              0.65835106 = fieldWeight in 3351, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=3351)
          0.92959785 = weight(abstract_txt:patent in 3351) [ClassicSimilarity], result of:
            0.92959785 = score(doc=3351,freq=10.0), product of:
              0.67907834 = queryWeight, product of:
                10.097547 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.009709767 = queryNorm
              1.368911 = fieldWeight in 3351, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=3351)
        0.24 = coord(6/25)
    
  5. Fujii, A.; Iwayama, M.; Kando, N.: Introduction to the special issue on patent processing (2007) 0.29
    0.2949122 = sum of:
      0.2949122 = product of:
        1.2288008 = sum of:
          0.029312009 = weight(abstract_txt:importance in 929) [ClassicSimilarity], result of:
            0.029312009 = score(doc=929,freq=1.0), product of:
              0.061325666 = queryWeight, product of:
                1.2388006 = boost
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.009709767 = queryNorm
              0.47797295 = fieldWeight in 929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.09375 = fieldNorm(doc=929)
          0.038357962 = weight(abstract_txt:intellectual in 929) [ClassicSimilarity], result of:
            0.038357962 = score(doc=929,freq=1.0), product of:
              0.0733695 = queryWeight, product of:
                1.3549962 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.009709767 = queryNorm
              0.5228053 = fieldWeight in 929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.09375 = fieldNorm(doc=929)
          0.060237475 = weight(abstract_txt:property in 929) [ClassicSimilarity], result of:
            0.060237475 = score(doc=929,freq=1.0), product of:
              0.09912649 = queryWeight, product of:
                1.5749804 = boost
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.009709767 = queryNorm
              0.60768294 = fieldWeight in 929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.09375 = fieldNorm(doc=929)
          0.014009723 = weight(abstract_txt:from in 929) [ClassicSimilarity], result of:
            0.014009723 = score(doc=929,freq=1.0), product of:
              0.054067798 = queryWeight, product of:
                2.0146995 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009709767 = queryNorm
              0.259114 = fieldWeight in 929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.09375 = fieldNorm(doc=929)
          0.32314098 = weight(abstract_txt:patents in 929) [ClassicSimilarity], result of:
            0.32314098 = score(doc=929,freq=2.0), product of:
              0.32722256 = queryWeight, product of:
                4.5245137 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.009709767 = queryNorm
              0.9875266 = fieldWeight in 929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.09375 = fieldNorm(doc=929)
          0.76374257 = weight(abstract_txt:patent in 929) [ClassicSimilarity], result of:
            0.76374257 = score(doc=929,freq=3.0), product of:
              0.67907834 = queryWeight, product of:
                10.097547 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.009709767 = queryNorm
              1.1246752 = fieldWeight in 929, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.09375 = fieldNorm(doc=929)
        0.24 = coord(6/25)