Search (3 results, page 1 of 1)

  • × author_ss:"Zhang, M."
  1. Zhou, G.D.; Zhang, M.; Ji, D.H.; Zhu, Q.M.: Hierarchical learning strategy in semantic relation extraction (2008) 0.01
    0.009606745 = product of:
      0.05764047 = sum of:
        0.05764047 = weight(_text_:problem in 2077) [ClassicSimilarity], result of:
          0.05764047 = score(doc=2077,freq=2.0), product of:
            0.20485485 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04826377 = queryNorm
            0.28137225 = fieldWeight in 2077, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=2077)
      0.16666667 = coord(1/6)
    
    Abstract
    This paper proposes a novel hierarchical learning strategy to deal with the data sparseness problem in semantic relation extraction by modeling the commonality among related classes. For each class in the hierarchy either manually predefined or automatically clustered, a discriminative function is determined in a top-down way. As the upper-level class normally has much more positive training examples than the lower-level class, the corresponding discriminative function can be determined more reliably and guide the discriminative function learning in the lower-level one more effectively, which otherwise might suffer from limited training data. In this paper, two classifier learning approaches, i.e. the simple perceptron algorithm and the state-of-the-art Support Vector Machines, are applied using the hierarchical learning strategy. Moreover, several kinds of class hierarchies either manually predefined or automatically clustered are explored and compared. Evaluation on the ACE RDC 2003 and 2004 corpora shows that the hierarchical learning strategy much improves the performance on least- and medium-frequent relations.
  2. Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.01
    0.008005621 = product of:
      0.04803372 = sum of:
        0.04803372 = weight(_text_:problem in 607) [ClassicSimilarity], result of:
          0.04803372 = score(doc=607,freq=2.0), product of:
            0.20485485 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04826377 = queryNorm
            0.23447686 = fieldWeight in 607, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
      0.16666667 = coord(1/6)
    
    Abstract
    Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
  3. Ahn, J.-w.; Soergel, D.; Lin, X.; Zhang, M.: Mapping between ARTstor terms and the Getty Art and Architecture Thesaurus (2014) 0.01
    0.0065390747 = product of:
      0.03923445 = sum of:
        0.03923445 = weight(_text_:22 in 1421) [ClassicSimilarity], result of:
          0.03923445 = score(doc=1421,freq=2.0), product of:
            0.1690115 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04826377 = queryNorm
            0.23214069 = fieldWeight in 1421, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1421)
      0.16666667 = coord(1/6)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik