Search (3 results, page 1 of 1)

  • author_ss:"Kan, M.-Y."
  • type_ss:"a"
  1. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.00
    0.0023042266 = product of:
      0.01382536 = sum of:
        0.01382536 = weight(_text_:in in 2804) [ClassicSimilarity], result of:
          0.01382536 = score(doc=2804,freq=30.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.23282567 = fieldWeight in 2804, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
      0.16666667 = coord(1/6)
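    The relevance trace above is Lucene's ClassicSimilarity (TF-IDF) explanation. As a minimal sketch, the following Python reproduces the traced numbers; the helper names are illustrative, not Lucene's API:

      import math

      def tf(freq):                        # term-frequency component: sqrt(freq)
          return math.sqrt(freq)

      def idf(doc_freq, max_docs):         # inverse document frequency
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      query_norm = 0.043654136             # taken from the trace above
      field_norm = 0.03125                 # length norm of doc 2804's field
      coord = 1.0 / 6.0                    # 1 of 6 query clauses matched

      idf_in = idf(30841, 44218)                     # -> 1.3602545
      query_weight = idf_in * query_norm             # -> 0.059380736
      field_weight = tf(30.0) * idf_in * field_norm  # -> 0.23282567
      print(coord * query_weight * field_weight)     # -> 0.0023042266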
    
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
    Content
    See also: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realizations of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in the English language that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, comprises more than 69,000 candidate terms that are manually annotated as valid and invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or, in general, a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. Non-technology terms, on the other hand, refer to important concepts other than technological ones; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset was created to serve as a gold standard for the comparison of algorithms for term recognition and classification. [http://catalog.elra.info/product_info.php?products_id=1236]
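    The annotation schema just described (valid/invalid candidate terms, with valid terms split into technology and non-technology) can be pictured as a small record type. A hypothetical sketch, with field names that are illustrative rather than the dataset's actual distribution format:

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class CandidateTerm:
          surface: str                # candidate term extracted from the ACL ARC
          valid: bool                 # manual judgement: valid term or not
          technology: Optional[bool]  # for valid terms: technology vs. non-technology

      terms = [
          CandidateTerm("machine translation", valid=True, technology=True),
          CandidateTerm("multilingual lexicon", valid=True, technology=False),
          CandidateTerm("in this paper", valid=False, technology=None),
      ]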
  2. Ye, S.; Chua, T.-S.; Kan, M.-Y.; Qiu, L.: Document concept lattice for text understanding and summarization (2007) 0.00
    0.0017848461 = product of:
      0.010709076 = sum of:
        0.010709076 = weight(_text_:in in 941) [ClassicSimilarity], result of:
          0.010709076 = score(doc=941,freq=8.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18034597 = fieldWeight in 941, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=941)
      0.16666667 = coord(1/6)
    
    Abstract
    We argue that the quality of a summary can be evaluated based on how many concepts in the original document(s) can be preserved after summarization. Here, a concept refers to an abstract or concrete entity, or its action, often expressed by diverse terms in text. Summary generation can thus be considered an optimization problem of selecting a set of sentences with minimal answer loss. In this paper, we propose a document concept lattice that indexes the hierarchy of local topics tied to a set of frequent concepts and the corresponding sentences containing these topics. The local topics specify the promising sub-spaces related to the selected concepts and sentences. Based on this lattice, the summary is an optimized selection of a set of distinct and salient local topics that leads to maximal coverage of concepts within the given number of sentences. Our summarizer based on the concept lattice has demonstrated competitive performance in the Document Understanding Conference 2005 and 2006 evaluations as well as in follow-on tests.
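    The abstract frames summarization as maximizing concept coverage under a sentence budget. A minimal sketch of that objective, using a plain greedy selection rather than the authors' lattice-based algorithm (the sentence/concept data are made up):

      def greedy_summary(sentences, budget):
          # sentences: list of (text, set_of_concepts); budget: max sentences kept
          covered, summary = set(), []
          for _ in range(budget):
              best = max(sentences, key=lambda s: len(s[1] - covered), default=None)
              if best is None or not (best[1] - covered):
                  break                        # nothing left adds new concepts
              summary.append(best[0])
              covered |= best[1]
              sentences = [s for s in sentences if s[0] != best[0]]
          return summary

      print(greedy_summary(
          [("S1", {"mt", "eval"}), ("S2", {"mt"}), ("S3", {"wsd", "lm"})],
          budget=2))                           # -> ['S1', 'S3']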
  3. An, J.; Kim, N.; Kan, M.-Y.; Kumar Chandrasekaran, M.; Song, M.: Exploring characteristics of highly cited authors according to citation location and content (2017) 0.00
    8.9242304E-4 = product of:
      0.005354538 = sum of:
        0.005354538 = weight(_text_:in in 3765) [ClassicSimilarity], result of:
          0.005354538 = score(doc=3765,freq=2.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.09017298 = fieldWeight in 3765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3765)
      0.16666667 = coord(1/6)
    
    Abstract
    Big Science and cross-disciplinary collaborations have reshaped the intellectual structure of research areas. A number of works have tried to uncover this hidden intellectual structure by analyzing citation contexts; however, none of them has analyzed citations by document logical structure, such as sections. The two major goals of this study are to find characteristics of authors who are highly cited section-wise and to identify the differences in section-wise author networks. This study uses 29,158 research articles culled from the ACL Anthology, which hosts articles on computational linguistics and natural language processing. We find that the distribution of citations across sections is skewed and that different sets of highly cited authors share distinct academic characteristics, according to their citation locations. Furthermore, the author networks based on citation-context similarity reveal that the intellectual structure of a domain differs across sections.
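    As a minimal sketch of the section-wise analysis the abstract describes, one can tally cited authors per section of the citing papers; the citation records below are made-up illustrations, not data from the study:

      from collections import Counter, defaultdict

      citations = [                  # (cited_author, section of the citing paper)
          ("Kan, M.-Y.", "related work"),
          ("Kan, M.-Y.", "introduction"),
          ("Song, M.", "method"),
          ("Song, M.", "related work"),
      ]

      by_section = defaultdict(Counter)
      for author, section in citations:
          by_section[section][author] += 1

      for section, counts in sorted(by_section.items()):
          print(section, counts.most_common(2))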