Document (#37073)

Author
Cetintas, S.
Si, L.
Title
Effective query generation and postprocessing strategies for prior art patent search
Source
Journal of the American Society for Information Science and Technology. 63(2012) no.3, pp. 512-527
Year
2012
Abstract
Rapid increase in global competition demands increased protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by utilizing the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents along with the retrieval score. An extensive set of empirical results carried out on a large-scale, real-world dataset shows that utilizing 20 or 30 query terms extracted from all fields of an original query patent according to their log(tf)idf values helps form a representative search query out of the query patent and is found to be more effective than is using any number of query terms from any single field. It is shown that combining terms extracted from different fields of the query patent by giving higher importance to terms extracted from the abstract, claims, and description fields than to terms extracted from the title field is more effective than treating all extracted terms equally while forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and retrieved patents is shown to be beneficial to improve the effectiveness of the prior art search.
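The two stages described in the abstract can be sketched roughly as follows. The first stage picks the top-k query-patent terms per field by a log(tf)idf score and merges them into one field-weighted query. The second stage reranks retrieved patents by mixing the retrieval score with IPC code similarity. This is a minimal illustration under stated assumptions, not the authors' implementation. The exact log(tf)idf variant ((1 + ln tf) · ln(N/df)), the numbers in `FIELD_WEIGHTS`, Jaccard overlap of exact IPC codes as the similarity measure, the linear interpolation weight `alpha`, and every function name here are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical field weights: the abstract only states that abstract, claims,
# and description terms get higher importance than title terms.
FIELD_WEIGHTS = {"title": 0.5, "abstract": 1.0, "claims": 1.0, "description": 1.0}


def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately simple assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def log_tf_idf(tf, df, n_docs):
    """Assumed log(tf)idf variant: (1 + ln tf) * ln(N / df)."""
    return (1.0 + math.log(tf)) * math.log(n_docs / df)


def build_query(patent_fields, doc_freq, n_docs, k=20):
    """Select the top-k terms of each field and merge them into one weighted query."""
    query = Counter()
    for field, text in patent_fields.items():
        counts = Counter(tokenize(text))
        # Score only terms with a known, non-zero document frequency.
        scored = {
            term: log_tf_idf(tf, doc_freq[term], n_docs)
            for term, tf in counts.items()
            if doc_freq.get(term, 0) > 0
        }
        top = sorted(scored, key=scored.get, reverse=True)[:k]
        # A term chosen from several fields accumulates each field's weight.
        for term in top:
            query[term] += FIELD_WEIGHTS.get(field, 1.0)
    return dict(query)


def ipc_similarity(codes_a, codes_b):
    """Jaccard overlap of exact IPC code sets (an assumed similarity measure)."""
    a, b = set(codes_a), set(codes_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(results, query_ipc, alpha=0.7):
    """Interpolate the max-normalized retrieval score with IPC similarity.

    `results` is a list of (patent_id, retrieval_score, ipc_codes) tuples;
    `alpha` is a hypothetical weight on the retrieval score.
    """
    top_score = max((score for _, score, _ in results), default=0.0) or 1.0
    rescored = [
        (pid, alpha * score / top_score + (1 - alpha) * ipc_similarity(query_ipc, codes))
        for pid, score, codes in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    fields = {"title": "valve", "abstract": "fluid valve seal", "claims": "valve seal seal"}
    doc_freq = {"valve": 50, "fluid": 200, "seal": 20}
    print(build_query(fields, doc_freq, n_docs=1000, k=2))  # {'valve': 2.5, 'seal': 2.0}
    hits = [
        ("A", 10.0, ["F16K 1/00"]),
        ("B", 8.0, ["F16K 1/00", "F16J 15/00"]),
        ("C", 9.0, ["H04L 9/00"]),
    ]
    print([pid for pid, _ in rerank(hits, ["F16K 1/00", "F16J 15/00"])])  # ['B', 'A', 'C']
```

In the usage example, patent B overtakes A despite its lower retrieval score because its IPC codes match the query patent's codes exactly. This is the kind of effect the IPC-based postprocessing step is meant to exploit. Interpolating the two signals linearly keeps the retrieval score dominant while letting classification agreement break near-ties.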
Field
Patent information

Similar documents (content)

  1. Liu, D.-R.; Shih, M.-J.: Hybrid-patent classification based on patent-network analysis (2011) 0.46
    
  2. Kay, L.; Newman, N.; Youtie, J.; Porter, A.L.; Rafols, I.: Patent overlay mapping : visualizing technological distance (2014) 0.33
    
  3. Stock, M.; Stock, W.G.: Intellectual property information : A comparative analysis of main information providers (2006) 0.33
    
  4. Yan, B.; Luo, J.: Measuring technological distance for patent mapping (2017) 0.30
    
  5. Fujii, A.; Iwayama, M.; Kando, N.: Introduction to the special issue on patent processing (2007) 0.29