Document (#32936)

Author
Tseng, Y.-H.
Lin, C.-J.
Lin, Y.-I.
Title
Text mining techniques for patent analysis
Source
Information processing and management. 43(2007) no.5, pp. 1216-1247
Year
2007
Abstract
Patent documents contain important research results. However, they are lengthy and rich in technical terminology, so analyzing them takes considerable human effort. Automatic tools that assist patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conform to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. Efficiency and effectiveness are both considered in the design of these techniques. Important features of the proposed methodology include a rigorous approach to verifying the usefulness of segment extracts as document surrogates, a corpus- and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volumes of patents, and an automatic procedure that creates generic cluster titles for easier interpretation of results. These techniques were evaluated. The results confirm that the machine-generated summaries preserve more important content words than some other sections for classification. To demonstrate feasibility, the proposed methodology was applied to a real-world patent set for domain analysis and mapping, which shows that our approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.
Footnote
Contribution to a thematic focus: "special issue on patent processing"
Field
Patent information

Similar documents (author)

  1. Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 4.57
    4.565969 = sum of:
      4.565969 = weight(author_txt:tseng in 5421) [ClassicSimilarity], result of:
        4.565969 = fieldWeight in 5421, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.5 = fieldNorm(doc=5421)
    
  2. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 4.57
    4.565969 = sum of:
      4.565969 = weight(author_txt:tseng in 1830) [ClassicSimilarity], result of:
        4.565969 = fieldWeight in 1830, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.5 = fieldNorm(doc=1830)
    
  3. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 4.57
    4.565969 = sum of:
      4.565969 = weight(author_txt:tseng in 5159) [ClassicSimilarity], result of:
        4.565969 = fieldWeight in 5159, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.5 = fieldNorm(doc=5159)
    
  4. Tseng, Y.H.; Lin, Y.I.: Evaluation of fuzzy search, term suggestion, and term relevance feedback in an OPAC system (1998) 4.57
    4.565969 = sum of:
      4.565969 = weight(author_txt:tseng in 6430) [ClassicSimilarity], result of:
        4.565969 = fieldWeight in 6430, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.5 = fieldNorm(doc=6430)
    
  5. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 4.57
    4.565969 = sum of:
      4.565969 = weight(author_txt:tseng in 5226) [ClassicSimilarity], result of:
        4.565969 = fieldWeight in 5226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.5 = fieldNorm(doc=5226)
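
The author-match score trees above all reduce to the same arithmetic. The sketch below recomputes one of them, assuming the tf and idf formulas that reproduce the listed values (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and are not part of any search-engine API; fieldNorm is copied from the listing rather than derived.

```python
import math

def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency, in the form that matches the listed values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the "product of" lines.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the author_txt:tseng entries above.
idf_value = classic_idf(doc_freq=12, max_docs=44218)
score = field_weight(freq=1.0, doc_freq=12, max_docs=44218, field_norm=0.5)
print(f"idf = {idf_value:.6f}, score = {score:.6f}")
```

Because every author match has freq=1.0, docFreq=12, and fieldNorm=0.5, all five entries receive the same score, which is why they are all displayed as 4.57.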
    

Similar documents (content)

  1. Kim, J.-H.; Choi, K.-S.: Patent document categorization based on semantic structural information (2007) 0.33
    0.33067158 = sum of:
      0.33067158 = product of:
        1.18097 = sum of:
          0.015715675 = weight(abstract_txt:process in 933) [ClassicSimilarity], result of:
            0.015715675 = score(doc=933,freq=1.0), product of:
              0.062071115 = queryWeight, product of:
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.015322374 = queryNorm
              0.25318822 = fieldWeight in 933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0625 = fieldNorm(doc=933)
          0.011490573 = weight(abstract_txt:these in 933) [ClassicSimilarity], result of:
            0.011490573 = score(doc=933,freq=1.0), product of:
              0.057666786 = queryWeight, product of:
                1.1804938 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.015322374 = queryNorm
              0.19925809 = fieldWeight in 933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.0625 = fieldNorm(doc=933)
          0.03315501 = weight(abstract_txt:automatic in 933) [ClassicSimilarity], result of:
            0.03315501 = score(doc=933,freq=1.0), product of:
              0.10210185 = queryWeight, product of:
                1.2825433 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.015322374 = queryNorm
              0.32472485 = fieldWeight in 933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=933)
          0.011118701 = weight(abstract_txt:that in 933) [ClassicSimilarity], result of:
            0.011118701 = score(doc=933,freq=2.0), product of:
              0.053089287 = queryWeight, product of:
                1.4622737 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015322374 = queryNorm
              0.20943399 = fieldWeight in 933, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=933)
          0.022559527 = weight(abstract_txt:classification in 933) [ClassicSimilarity], result of:
            0.022559527 = score(doc=933,freq=1.0), product of:
              0.09041724 = queryWeight, product of:
                1.4781772 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015322374 = queryNorm
              0.2495047 = fieldWeight in 933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=933)
          0.026549418 = weight(abstract_txt:important in 933) [ClassicSimilarity], result of:
            0.026549418 = score(doc=933,freq=1.0), product of:
              0.10078625 = queryWeight, product of:
                1.5606356 = boost
                4.2147684 = idf(docFreq=1775, maxDocs=44218)
                0.015322374 = queryNorm
              0.26342303 = fieldWeight in 933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2147684 = idf(docFreq=1775, maxDocs=44218)
                0.0625 = fieldNorm(doc=933)
          1.060381 = weight(abstract_txt:patent in 933) [ClassicSimilarity], result of:
            1.060381 = score(doc=933,freq=9.0), product of:
              0.8165175 = queryWeight, product of:
                7.693859 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.015322374 = queryNorm
              1.298663 = fieldWeight in 933, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=933)
        0.28 = coord(7/25)
    
  2. Lawson, M.: Automatic extraction of citations from the text of English-language patents : an example of template mining (1996) 0.31
    0.30777377 = sum of:
      0.30777377 = product of:
        1.099192 = sum of:
          0.011490573 = weight(abstract_txt:these in 2654) [ClassicSimilarity], result of:
            0.011490573 = score(doc=2654,freq=1.0), product of:
              0.057666786 = queryWeight, product of:
                1.1804938 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.015322374 = queryNorm
              0.19925809 = fieldWeight in 2654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.0625 = fieldNorm(doc=2654)
          0.026363624 = weight(abstract_txt:include in 2654) [ClassicSimilarity], result of:
            0.026363624 = score(doc=2654,freq=1.0), product of:
              0.087633654 = queryWeight, product of:
                1.1882031 = boost
                4.8134246 = idf(docFreq=975, maxDocs=44218)
                0.015322374 = queryNorm
              0.30083904 = fieldWeight in 2654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8134246 = idf(docFreq=975, maxDocs=44218)
                0.0625 = fieldNorm(doc=2654)
          0.013617571 = weight(abstract_txt:that in 2654) [ClassicSimilarity], result of:
            0.013617571 = score(doc=2654,freq=3.0), product of:
              0.053089287 = queryWeight, product of:
                1.4622737 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015322374 = queryNorm
              0.2565032 = fieldWeight in 2654, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2654)
          0.046898 = weight(abstract_txt:text in 2654) [ClassicSimilarity], result of:
            0.046898 = score(doc=2654,freq=4.0), product of:
              0.092778526 = queryWeight, product of:
                1.4973544 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015322374 = queryNorm
              0.5054833 = fieldWeight in 2654, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2654)
          0.055672854 = weight(abstract_txt:mining in 2654) [ClassicSimilarity], result of:
            0.055672854 = score(doc=2654,freq=1.0), product of:
              0.14424358 = queryWeight, product of:
                1.524416 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.015322374 = queryNorm
              0.38596416 = fieldWeight in 2654, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=2654)
          0.07935183 = weight(abstract_txt:extraction in 2654) [ClassicSimilarity], result of:
            0.07935183 = score(doc=2654,freq=2.0), product of:
              0.14499804 = queryWeight, product of:
                1.5283974 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.015322374 = queryNorm
              0.54726136 = fieldWeight in 2654, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=2654)
          0.8657976 = weight(abstract_txt:patent in 2654) [ClassicSimilarity], result of:
            0.8657976 = score(doc=2654,freq=6.0), product of:
              0.8165175 = queryWeight, product of:
                7.693859 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.015322374 = queryNorm
              1.060354 = fieldWeight in 2654, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=2654)
        0.28 = coord(7/25)
    
  3. Kang, I.-S.; Na, S.-H.; Kim, J.; Lee, J.-H.: Cluster-based patent retrieval (2007) 0.30
    0.29893842 = sum of:
      0.29893842 = product of:
        1.2455767 = sum of:
          0.011490573 = weight(abstract_txt:these in 930) [ClassicSimilarity], result of:
            0.011490573 = score(doc=930,freq=1.0), product of:
              0.057666786 = queryWeight, product of:
                1.1804938 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.015322374 = queryNorm
              0.19925809 = fieldWeight in 930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.013617571 = weight(abstract_txt:that in 930) [ClassicSimilarity], result of:
            0.013617571 = score(doc=930,freq=3.0), product of:
              0.053089287 = queryWeight, product of:
                1.4622737 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015322374 = queryNorm
              0.2565032 = fieldWeight in 930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.031903986 = weight(abstract_txt:classification in 930) [ClassicSimilarity], result of:
            0.031903986 = score(doc=930,freq=2.0), product of:
              0.09041724 = queryWeight, product of:
                1.4781772 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015322374 = queryNorm
              0.3528529 = fieldWeight in 930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.17570928 = weight(abstract_txt:cluster in 930) [ClassicSimilarity], result of:
            0.17570928 = score(doc=930,freq=7.0), product of:
              0.1622425 = queryWeight, product of:
                1.6167302 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.015322374 = queryNorm
              1.083004 = fieldWeight in 930, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.07768711 = weight(abstract_txt:techniques in 930) [ClassicSimilarity], result of:
            0.07768711 = score(doc=930,freq=2.0), product of:
              0.19403057 = queryWeight, product of:
                2.7955053 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.015322374 = queryNorm
              0.40038592 = fieldWeight in 930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.9351682 = weight(abstract_txt:patent in 930) [ClassicSimilarity], result of:
            0.9351682 = score(doc=930,freq=7.0), product of:
              0.8165175 = queryWeight, product of:
                7.693859 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.015322374 = queryNorm
              1.1453131 = fieldWeight in 930, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
        0.24 = coord(6/25)
    
  4. Lai, K.-K.; Wu, S.-J.: Using the patent co-citation approach to establish a new patent classification system (2005) 0.29
    0.29427072 = sum of:
      0.29427072 = product of:
        1.2261281 = sum of:
          0.023150017 = weight(abstract_txt:proposed in 1013) [ClassicSimilarity], result of:
            0.023150017 = score(doc=1013,freq=1.0), product of:
              0.0803591 = queryWeight, product of:
                1.1378179 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.015322374 = queryNorm
              0.2880821 = fieldWeight in 1013, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=1013)
          0.031971607 = weight(abstract_txt:create in 1013) [ClassicSimilarity], result of:
            0.031971607 = score(doc=1013,freq=1.0), product of:
              0.09965762 = queryWeight, product of:
                1.2670988 = boost
                5.133032 = idf(docFreq=708, maxDocs=44218)
                0.015322374 = queryNorm
              0.3208145 = fieldWeight in 1013, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.133032 = idf(docFreq=708, maxDocs=44218)
                0.0625 = fieldNorm(doc=1013)
          0.007862109 = weight(abstract_txt:that in 1013) [ClassicSimilarity], result of:
            0.007862109 = score(doc=1013,freq=1.0), product of:
              0.053089287 = queryWeight, product of:
                1.4622737 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015322374 = queryNorm
              0.1480922 = fieldWeight in 1013, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1013)
          0.045119055 = weight(abstract_txt:classification in 1013) [ClassicSimilarity], result of:
            0.045119055 = score(doc=1013,freq=4.0), product of:
              0.09041724 = queryWeight, product of:
                1.4781772 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015322374 = queryNorm
              0.4990094 = fieldWeight in 1013, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=1013)
          0.05764426 = weight(abstract_txt:analysis in 1013) [ClassicSimilarity], result of:
            0.05764426 = score(doc=1013,freq=4.0), product of:
              0.12622099 = queryWeight, product of:
                2.2547116 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.015322374 = queryNorm
              0.45669314 = fieldWeight in 1013, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=1013)
          1.060381 = weight(abstract_txt:patent in 1013) [ClassicSimilarity], result of:
            1.060381 = score(doc=1013,freq=9.0), product of:
              0.8165175 = queryWeight, product of:
                7.693859 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.015322374 = queryNorm
              1.298663 = fieldWeight in 1013, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=1013)
        0.24 = coord(6/25)
    
  5. Liu, D.-R.; Shih, M.-J.: Hybrid-patent classification based on patent-network analysis (2011) 0.29
    0.29189998 = sum of:
      0.29189998 = product of:
        1.4594998 = sum of:
          0.03273907 = weight(abstract_txt:proposed in 4189) [ClassicSimilarity], result of:
            0.03273907 = score(doc=4189,freq=2.0), product of:
              0.0803591 = queryWeight, product of:
                1.1378179 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.015322374 = queryNorm
              0.4074096 = fieldWeight in 4189, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.015724218 = weight(abstract_txt:that in 4189) [ClassicSimilarity], result of:
            0.015724218 = score(doc=4189,freq=4.0), product of:
              0.053089287 = queryWeight, product of:
                1.4622737 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015322374 = queryNorm
              0.2961844 = fieldWeight in 4189, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.059686895 = weight(abstract_txt:classification in 4189) [ClassicSimilarity], result of:
            0.059686895 = score(doc=4189,freq=7.0), product of:
              0.09041724 = queryWeight, product of:
                1.4781772 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.015322374 = queryNorm
              0.66012734 = fieldWeight in 4189, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          0.02882213 = weight(abstract_txt:analysis in 4189) [ClassicSimilarity], result of:
            0.02882213 = score(doc=4189,freq=1.0), product of:
              0.12622099 = queryWeight, product of:
                2.2547116 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.015322374 = queryNorm
              0.22834657 = fieldWeight in 4189, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
          1.3225275 = weight(abstract_txt:patent in 4189) [ClassicSimilarity], result of:
            1.3225275 = score(doc=4189,freq=14.0), product of:
              0.8165175 = queryWeight, product of:
                7.693859 = boost
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.015322374 = queryNorm
              1.6197174 = fieldWeight in 4189, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                6.926203 = idf(docFreq=117, maxDocs=44218)
                0.0625 = fieldNorm(doc=4189)
        0.2 = coord(5/25)
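
The content-similarity trees add a query-side weight and a coordination factor to the same tf-idf arithmetic. The sketch below recomputes the total for the first entry (doc 933) from the freq, docFreq, and boost values shown in its tree. It is a hedged reconstruction: queryNorm and fieldNorm are copied from the listing, the helper names are hypothetical, and a boost of 1.0 is assumed for "process" because its tree shows no boost line.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.015322374  # copied from the listing, not derived here
FIELD_NORM = 0.0625       # copied from the listing (doc 933)
QUERY_TERMS = 25          # denominator of coord(7/25)

def idf(doc_freq: int) -> float:
    # Same idf form as in the author-match trees.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, boost: float = 1.0) -> float:
    # score = queryWeight * fieldWeight
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight

# (term, freq, docFreq, boost) as shown for doc 933.
matched = [
    ("process", 1.0, 2091, 1.0),  # boost of 1.0 is an assumption
    ("these", 1.0, 4957, 1.1804938),
    ("automatic", 1.0, 665, 1.2825433),
    ("that", 2.0, 11241, 1.4622737),
    ("classification", 1.0, 2218, 1.4781772),
    ("important", 1.0, 1775, 1.5606356),
    ("patent", 9.0, 117, 7.693859),
]

term_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in matched)
coord = len(matched) / QUERY_TERMS  # fraction of query terms that matched
final_score = term_sum * coord
print(f"sum = {term_sum:.5f}, coord = {coord:.2f}, score = {final_score:.5f}")
```

In this entry the boosted term "patent" contributes about 1.06 of the roughly 1.18 term sum. Across all five content matches, the frequency of "patent" in the abstract therefore largely determines the ranking, with coord scaling the result by how many of the 25 query terms matched.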