Search (5 results, page 1 of 1)

  • × author_ss:"Lee, G.G."
  • × year_i:[2000 TO 2010}
  1. Jung, H.; Yi, E.; Kim, D.; Lee, G.G.: Information extraction with automatic knowledge expansion (2005) 0.00
    0.004371658 = product of:
      0.017486632 = sum of:
        0.017486632 = weight(_text_:information in 1008) [ClassicSimilarity], result of:
          0.017486632 = score(doc=1008,freq=12.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.2850541 = fieldWeight in 1008, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1008)
      0.25 = coord(1/4)
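The explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown. A minimal sketch, assuming Lucene's classic formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), that reproduces the factors shown for document 1008:

```python
# Recompute the ClassicSimilarity explain tree above; all constants are
# taken from the explain output itself (freq=12, docFreq=20772, etc.).
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Lucene ClassicSimilarity term score: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                               # 3.4641016 for freq=12
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 1.7554779
    query_weight = idf * query_norm                    # 0.06134496
    field_weight = tf * idf * field_norm               # 0.2850541
    return query_weight * field_weight                 # 0.017486632

score = classic_similarity_score(12.0, 20772, 44218, 0.034944877, 0.046875)
coord = 1 / 4  # only one of four query clauses matched this document
print(score * coord)  # final document score, ~0.004371658 in the explain tree
```

The displayed rank value "0.00" is just this score rounded to two decimals.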
    
    Abstract
    POSIE (POSTECH Information Extraction System) is an information extraction system that uses multiple learning strategies, i.e., SmL, user-oriented learning, and separate-context learning, in a question answering framework. POSIE replaces laborious annotation with automatic instance extraction by the SmL from structured Web documents, and places the user at the end of the user-oriented learning cycle. Treating information extraction as question answering simplifies the extraction procedure for a set of slots. We introduce techniques verified in the question answering framework, such as domain knowledge and instance rules, into an information extraction problem. To incrementally improve extraction performance, a sequence of user-oriented learning and separate-context learning produces context rules and generalizes them in both the learning and extraction phases. Experiments on the "continuing education" domain initially show an F1-measure of 0.477 and recall of 0.748 with no user training. However, as the set of training documents grows, the F1-measure rises above 0.75 with recall of 0.772. We also obtain an F-measure of about 0.9 for five out of seven slots in the "job offering" domain.
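The F1-measure quoted above is the harmonic mean of precision and recall. The abstract does not report precision, so as a quick sanity check the value implied by F1 ≈ 0.75 at recall 0.772 can be derived:

```python
# F1 is the harmonic mean of precision P and recall R.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Solving F1 = 2PR/(P+R) for P gives P = F1*R / (2R - F1).
# Precision implied by the reported F1 ~ 0.75 at recall 0.772:
implied_p = 0.75 * 0.772 / (2 * 0.772 - 0.75)
print(round(implied_p, 3))  # ~0.729
```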
    Source
    Information processing and management. 41(2005) no.2, S.217-242
  2. Lee, C.; Lee, G.G.: Probabilistic information retrieval model for a dependence structured indexing system (2005) 0.00
    0.003606434 = product of:
      0.014425736 = sum of:
        0.014425736 = weight(_text_:information in 1004) [ClassicSimilarity], result of:
          0.014425736 = score(doc=1004,freq=6.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.23515764 = fieldWeight in 1004, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1004)
      0.25 = coord(1/4)
    
    Abstract
    Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of each other. However, this conditional independence assumption is widely understood to be wrong, so we present a new method of incorporating term dependence into a probabilistic retrieval model: we adapt a dependency-structured indexing system, using a dependency parse tree and the Chow Expansion, to compensate for the weakness of the assumption. In this paper, we describe a theoretical procedure for applying the Chow Expansion to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on document collections in English and Korean, we demonstrate that incorporating term dependences via the Chow Expansion improves the performance of probabilistic IR systems.
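The Chow Expansion referenced above descends from Chow and Liu's tree-structured approximation of a joint distribution, which keeps only the strongest pairwise dependences as measured by mutual information. A minimal sketch of that first step over a toy binary document-term matrix (the data and four-term vocabulary are invented for illustration, not from the paper):

```python
# Estimate pairwise mutual information between binary term-occurrence
# variables and keep the maximum-spanning tree of dependences (Chow-Liu).
import math
from itertools import combinations

docs = [  # rows: documents, columns: terms t0..t3 (1 = term occurs)
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
]

def mutual_information(i, j):
    """Plug-in MI estimate between the occurrence of terms i and j."""
    n = len(docs)
    joint = {}
    for row in docs:
        joint[(row[i], row[j])] = joint.get((row[i], row[j]), 0) + 1
    pi = sum(r[i] for r in docs) / n
    pj = sum(r[j] for r in docs) / n
    mi = 0.0
    for (a, b), c in joint.items():
        pab = c / n
        pa = pi if a else 1 - pi
        pb = pj if b else 1 - pj
        mi += pab * math.log(pab / (pa * pb))
    return mi

# Kruskal-style maximum spanning tree over MI edge weights.
edges = sorted(((mutual_information(i, j), i, j)
                for i, j in combinations(range(4), 2)), reverse=True)
parent = list(range(4))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

tree = []
for w, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj
        tree.append((i, j))
print(tree)  # the three strongest non-cyclic term dependences
```

The retained tree edges are the pairwise dependences a tree-structured expansion would correct for; the paper additionally derives the edges from a dependency parse rather than from co-occurrence alone.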
    Source
    Information processing and management. 41(2005) no.2, S.161-176
  3. Cho, B.-H.; Lee, C.; Lee, G.G.: Exploring term dependences in probabilistic information retrieval model (2003) 0.00
    0.003606434 = product of:
      0.014425736 = sum of:
        0.014425736 = weight(_text_:information in 1077) [ClassicSimilarity], result of:
          0.014425736 = score(doc=1077,freq=6.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.23515764 = fieldWeight in 1077, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1077)
      0.25 = coord(1/4)
    
    Abstract
    Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of one another. However, this conditional independence assumption is widely understood to be wrong, so we present a new method of incorporating term dependence into a probabilistic retrieval model by adapting the Bahadur-Lazarsfeld expansion (BLE) to compensate for the weakness of the assumption. In this paper, we describe a theoretical procedure for applying the BLE to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on two standard document collections, HANTEC2.0 in Korean and WT10g in English, we demonstrate that incorporating term dependences via the BLE significantly improves performance in IR systems for at least two different languages.
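The Bahadur-Lazarsfeld expansion writes a joint distribution over binary variables as the independence product times correlation-based correction terms; truncating after the pairwise terms gives the second-order form. A toy two-term illustration (the marginals and the correlation coefficient are invented, not from the paper):

```python
# Second-order Bahadur-Lazarsfeld expansion (BLE) for two binary term
# variables: P(x) ~ prod of marginals * (1 + rho_12 * z_1 * z_2),
# where z_i = (x_i - p_i) / sqrt(p_i * (1 - p_i)).
import math

p = [0.6, 0.3]   # marginal occurrence probabilities of two terms (invented)
rho_12 = 0.4     # their correlation coefficient (invented)

def ble_second_order(x):
    base = 1.0
    z = []
    for xi, pi in zip(x, p):
        base *= pi if xi else (1 - pi)          # independence estimate
        z.append((xi - pi) / math.sqrt(pi * (1 - pi)))
    return base * (1 + rho_12 * z[0] * z[1])    # pairwise correction

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(ble_second_order(x), 4))
```

The correction terms sum to zero under the independence base measure, so the truncated expansion still sums to one over all outcomes while shifting mass toward co-occurrence.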
    Source
    Information processing and management. 39(2003) no.4, S.505-519
  4. Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.00
    0.003091229 = product of:
      0.012364916 = sum of:
        0.012364916 = weight(_text_:information in 909) [ClassicSimilarity], result of:
          0.012364916 = score(doc=909,freq=6.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.20156369 = fieldWeight in 909, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=909)
      0.25 = coord(1/4)
    
    Abstract
    In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. Associative classifiers have many favorable characteristics, such as rapid training, good classification accuracy, and excellent interpretability. However, associative classifiers also face some obstacles when applied to text classification. The target text collection generally has very high dimensionality, so the training process can take a very long time. We propose a feature selection method based on the mutual information between the word and class variables to reduce the dimensionality for the associative classifiers. In addition, the training process of an associative classifier produces a huge number of classification rules, which makes prediction on a new document inefficient. We resolve this by introducing a new, efficient method for storing and pruning classification rules, which can also be used when predicting a test document. Experimental results on the 20-newsgroups dataset show many benefits of associative classification, in both training and prediction, when applied to a real-world problem.
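The mutual-information feature selection described above can be sketched as scoring each word by its MI with the class label and keeping the top-k words before rule mining. The toy labeled documents below are invented; they are not from the 20-newsgroups experiments:

```python
# Score each word by mutual information with the class variable, then
# keep the top-k words to shrink the feature space for rule mining.
import math
from collections import Counter

docs = [({"cheap", "pills", "offer"}, "spam"),
        ({"meeting", "offer", "agenda"}, "ham"),
        ({"cheap", "offer"}, "spam"),
        ({"meeting", "minutes"}, "ham")]

def word_class_mi(word):
    """Plug-in MI between 'word occurs' and the class label."""
    n = len(docs)
    counts = Counter((word in words, label) for words, label in docs)
    p_w = sum(word in words for words, _ in docs) / n
    p_c = Counter(label for _, label in docs)
    mi = 0.0
    for (has_w, label), c in counts.items():
        p_joint = c / n
        p_word = p_w if has_w else 1 - p_w
        mi += p_joint * math.log(p_joint / (p_word * (p_c[label] / n)))
    return mi

vocab = sorted({w for words, _ in docs for w in words})
top = sorted(vocab, key=word_class_mi, reverse=True)[:3]
print(top)  # the most class-indicative words lead
```

Words that occur in only one class ("cheap", "meeting") score highest, while a word spread across classes ("offer") scores low, which is exactly the pruning behavior the abstract relies on.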
    Footnote
    Beitrag in: Special issue on AIRS2005: Information Retrieval Research in Asia
    Source
    Information processing and management. 43(2007) no.2, S.393-405
  5. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.00
    0.0020821756 = product of:
      0.008328702 = sum of:
        0.008328702 = weight(_text_:information in 5273) [ClassicSimilarity], result of:
          0.008328702 = score(doc=5273,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13576832 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.3, S.431-442