Search (3 results, page 1 of 1)

  • author_ss:"Choi, K.-S."
  • type_ss:"a"
  1. Kang, H.-K.; Choi, K.-S.: Two-level document ranking using mutual information in natural language information retrieval (1997) 0.00
    0.0030255679 = product of:
      0.0060511357 = sum of:
        0.0060511357 = product of:
          0.012102271 = sum of:
            0.012102271 = weight(_text_:a in 159) [ClassicSimilarity], result of:
              0.012102271 = score(doc=159,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22789092 = fieldWeight in 159, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=159)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
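    The breakdown above is Lucene's ClassicSimilarity (tf-idf) explain output for the query term "a"; the other two records follow the same structure. A minimal Python check, using only the figures shown in the breakdown, reproduces this record's score:

      import math

      # Figures taken from the breakdown above (query term "a", doc 159).
      idf = 1.153047              # idf(docFreq=37942, maxDocs=44218)
      query_norm = 0.046056706    # queryNorm
      field_norm = 0.0625         # fieldNorm(doc=159)
      freq = 10.0                 # termFreq of "a" in the field

      query_weight = idf * query_norm                     # ≈ 0.053105544
      field_weight = math.sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm ≈ 0.22789092
      weight = query_weight * field_weight                # ≈ 0.012102271
      score = weight * 0.5 * 0.5                          # two coord(1/2) factors ≈ 0.0030255679
      print(score)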
    
    Abstract
    Discusses a model of a natural language information retrieval system based on a 2-level document ranking method using mutual information. Documents are first retrieved using automatically constructed index terms and then reordered using mutual information. Shows that the method improves retrieval effectiveness over traditional linear searching methods. Analyzes 7 newly developed formulas for reordering the retrieved documents and, among them, recommends the 1 formula that dominates the others in terms of retrieval effectiveness (a schematic sketch of the two-level ranking follows this record).
    Type
    a
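    The two-level method itself is not spelled out in the abstract, so the following is only a minimal sketch in its spirit: a first-pass ranking is cut to the top k documents, which are then reordered by pointwise mutual information between query terms and each document's index terms. Function and parameter names (pmi, two_level_rank, k) are illustrative assumptions, and the PMI sum stands in for just one of the seven reordering formulas the paper compares.

      import math

      def pmi(term_a, term_b, doc_terms):
          # Pointwise mutual information of two terms over the collection.
          # doc_terms: dict mapping doc_id -> set of index terms.
          n = len(doc_terms)
          p_a = sum(term_a in terms for terms in doc_terms.values()) / n
          p_b = sum(term_b in terms for terms in doc_terms.values()) / n
          p_ab = sum(term_a in terms and term_b in terms
                     for terms in doc_terms.values()) / n
          if min(p_a, p_b, p_ab) == 0:
              return 0.0
          return math.log(p_ab / (p_a * p_b))

      def two_level_rank(query_terms, first_pass, doc_terms, k=100):
          # Level 1: keep the top-k documents from a first-pass run over
          # automatically constructed index terms.
          candidates = first_pass[:k]
          # Level 2: reorder candidates by summed PMI between query terms and
          # each document's index terms (one possible reordering formula).
          def mi_score(doc_id):
              return sum(pmi(q, t, doc_terms)
                         for q in query_terms for t in doc_terms[doc_id])
          return sorted(candidates, key=mi_score, reverse=True)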
  2. Park, Y.C.; Choi, K.-S.: Automatic thesaurus construction using Bayesian networks (1996) 0.00
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = product of:
          0.0108246 = sum of:
            0.0108246 = weight(_text_:a in 6581) [ClassicSimilarity], result of:
              0.0108246 = score(doc=6581,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20383182 = fieldWeight in 6581, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6581)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic thesaurus construction is accomplished by extracting term relations mechanically. A popular method uses statistical analysis to discover the term relations. For low-frequency terms, however, the statistical information is not reliable enough to decide term relationships; this is referred to as the data sparseness problem. Many studies have shown that low-frequency terms are of most use in thesaurus construction. Characterizes the statistical behaviour of terms using an inference network and develops a formal approach based on a Bayesian network to address the data sparseness problem (a sketch of the underlying co-occurrence statistics and the sparseness issue follows this record).
    Type
    a
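    The paper's Bayesian network model is not reproduced here; the sketch below only illustrates the plain co-occurrence statistics that the abstract contrasts it with, and marks the low-frequency term pairs for which such estimates become unreliable (the data sparseness problem). The Dice coefficient and the min_freq cutoff are illustrative assumptions, not the authors' formulation.

      from collections import Counter
      from itertools import combinations

      def cooccurrence_similarity(doc_terms, min_freq=5):
          # doc_terms: iterable of per-document term sets.
          # Returns ({(t1, t2): Dice similarity}, set of pairs too sparse to trust).
          term_freq = Counter()
          pair_freq = Counter()
          for terms in doc_terms:
              term_freq.update(terms)
              pair_freq.update(combinations(sorted(terms), 2))

          related, sparse = {}, set()
          for (t1, t2), f12 in pair_freq.items():
              if term_freq[t1] < min_freq or term_freq[t2] < min_freq:
                  # Data sparseness: too few observations for a reliable estimate.
                  sparse.add((t1, t2))
                  continue
              related[(t1, t2)] = 2 * f12 / (term_freq[t1] + term_freq[t2])
          return related, sparse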
  3. Kim, J.-H.; Choi, K.-S.: Patent document categorization based on semantic structural information (2007) 0.00
    0.0022374375 = product of:
      0.004474875 = sum of:
        0.004474875 = product of:
          0.00894975 = sum of:
            0.00894975 = weight(_text_:a in 933) [ClassicSimilarity], result of:
              0.00894975 = score(doc=933,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1685276 = fieldWeight in 933, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=933)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial for finding relevant existing patents in a given field, patent categorization is an important and useful task. As patent documents are structured documents with characteristics that distinguish them from general documents, these traits should be considered in the categorization process. In this paper, we categorize Japanese patent documents automatically, focusing on their characteristics: patents are structured by claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method that uses the k-NN (k-Nearest Neighbour) approach. To retrieve similar documents from a training document set, specific components that denote so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole text. Because these components are identified by various user-defined tags, all of the components are first clustered into several semantic elements. Such semantically clustered structural components are the basic features of patent categorization. We achieve a 74% improvement in categorization performance over a baseline system that does not use the structural information of the patent (a schematic sketch of component-wise k-NN categorization follows this record).
    Type
    a
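    A minimal sketch of component-wise k-NN categorization in the spirit of the abstract: similarity is summed over assumed semantic elements (claim, purpose, application field) rather than computed over the whole text, and the k nearest training patents vote on the category. The component names, cosine measure, and uniform weighting are illustrative assumptions rather than the paper's exact formulation.

      from collections import Counter

      COMPONENTS = ("claim", "purpose", "application_field")  # assumed semantic elements

      def cosine(a, b):
          # Cosine similarity between two bag-of-words Counters.
          common = set(a) & set(b)
          num = sum(a[t] * b[t] for t in common)
          den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
          return num / den if den else 0.0

      def knn_categorize(doc, training, k=5):
          # doc: {component: Counter of terms}; training: list of (doc, category).
          # Similarity is computed per semantic component, not over the whole text.
          def sim(train_doc):
              return sum(cosine(doc.get(c, Counter()), train_doc.get(c, Counter()))
                         for c in COMPONENTS)
          neighbours = sorted(training, key=lambda pair: sim(pair[0]), reverse=True)[:k]
          votes = Counter(category for _, category in neighbours)
          return votes.most_common(1)[0][0]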