Document (#39805)

Author
Bird, S.
Dale, R.
Dorr, B.
Gibson, B.
Joseph, M.
Kan, M.-Y.
Lee, D.
Powley, B.
Radev, D.
Tan, Y.F.
Title
¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics
Source
Proceedings of Language Resources and Evaluation Conference (LREC 08). Marrakesh, Morocco, May [http://acl-arc.comp.nus.edu.sg/lrec08.pdf]
Year
2008
Abstract
The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
Content
For the corpus, see: http://acl-arc.comp.nus.edu.sg/.
See also: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realizations of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated English terms extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, comprises more than 69,000 candidate terms that are manually annotated as valid or invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. Non-technology terms, on the other hand, refer to important concepts that are not technological; examples in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset was created to serve as a gold standard for comparing term recognition and classification algorithms. [http://catalog.elra.info/product_info.php?products_id=1236].
Theme
Computational linguistics
Object
ACL Anthology Reference Corpus

Similar documents (author)

  1. Radev, D.R.; Joseph, M.T.; Gibson, B.; Muthukrishnan, P.: ¬A bibliometric and network analysis of the field of computational linguistics (2016) 1.38
    1.3817466 = sum of:
      1.3817466 = product of:
        2.7634933 = sum of:
          1.364859 = weight(author_txt:gibson in 2764) [ClassicSimilarity], result of:
            1.364859 = score(doc=2764,freq=1.0), product of:
              0.4821849 = queryWeight, product of:
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.05323404 = queryNorm
              2.830572 = fieldWeight in 2764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.3125 = fieldNorm(doc=2764)
          1.3986343 = weight(author_txt:radev in 2764) [ClassicSimilarity], result of:
            1.3986343 = score(doc=2764,freq=1.0), product of:
              0.49010733 = queryWeight, product of:
                1.0081817 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.05323404 = queryNorm
              2.8537307 = fieldWeight in 2764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.3125 = fieldNorm(doc=2764)
        0.5 = coord(2/4)
    
  2. Dale, D.C.: Subject access in online catalogs : an overview bibliography (1989) 0.76
    0.76134044 = sum of:
      0.76134044 = product of:
        3.0453618 = sum of:
          3.0453618 = weight(author_txt:dale in 368) [ClassicSimilarity], result of:
            3.0453618 = score(doc=368,freq=1.0), product of:
              0.51867384 = queryWeight, product of:
                1.0371472 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.05323404 = queryNorm
              5.871439 = fieldWeight in 368, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.625 = fieldNorm(doc=368)
        0.25 = coord(1/4)
    
  3. Dale, T.: Selecting an indexing scheme (1996) 0.76
    0.76134044 = sum of:
      0.76134044 = product of:
        3.0453618 = sum of:
          3.0453618 = weight(author_txt:dale in 3347) [ClassicSimilarity], result of:
            3.0453618 = score(doc=3347,freq=1.0), product of:
              0.51867384 = queryWeight, product of:
                1.0371472 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.05323404 = queryNorm
              5.871439 = fieldWeight in 3347, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.625 = fieldNorm(doc=3347)
        0.25 = coord(1/4)
    
  4. Dale, D.C.: Subject access in online catalogs : an overview bibliography (1989) 0.76
    0.76134044 = sum of:
      0.76134044 = product of:
        3.0453618 = sum of:
          3.0453618 = weight(author_txt:dale in 856) [ClassicSimilarity], result of:
            3.0453618 = score(doc=856,freq=1.0), product of:
              0.51867384 = queryWeight, product of:
                1.0371472 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.05323404 = queryNorm
              5.871439 = fieldWeight in 856, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.625 = fieldNorm(doc=856)
        0.25 = coord(1/4)
    
  5. Dale, P.: Gabriel - a history and a new beginning (2002) 0.76
    0.76134044 = sum of:
      0.76134044 = product of:
        3.0453618 = sum of:
          3.0453618 = weight(author_txt:dale in 945) [ClassicSimilarity], result of:
            3.0453618 = score(doc=945,freq=1.0), product of:
              0.51867384 = queryWeight, product of:
                1.0371472 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.05323404 = queryNorm
              5.871439 = fieldWeight in 945, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.625 = fieldNorm(doc=945)
        0.25 = coord(1/4)
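
The score trees in this list can be checked by hand. The following is a minimal sketch that recomputes the first hit (doc 2764, matched on the author terms "gibson" and "radev") from the values printed in its tree. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which appear consistent with the printed numbers; the function names are illustrative and are not part of any library. The fieldNorm and queryNorm values are taken from the tree as given rather than derived.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency; assumed form that matches the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight, as in the tree."""
    tf = math.sqrt(freq)                          # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm  # "queryWeight, product of"
    field_weight = tf * term_idf * field_norm     # "fieldWeight, product of"
    return query_weight * field_weight


# Constants copied from the explain output for doc 2764.
MAX_DOCS = 44218
QUERY_NORM = 0.05323404
FIELD_NORM = 0.3125

gibson = term_score(1.0, 13, MAX_DOCS, QUERY_NORM, FIELD_NORM)
radev = term_score(1.0, 12, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost=1.0081817)

coord = 2 / 4  # two of the four query terms matched this document
total = (gibson + radev) * coord

print(f"gibson={gibson:.6f} radev={radev:.6f} total={total:.6f}")
```

The engine works in single precision, so the recomputed values agree with the printed ones only to about five or six significant digits. The coord factor explains why the single-term "dale" hits score lower overall (0.76) than this two-term hit, even though their individual term weight (3.05) is higher.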
    

Similar documents (content)

  1. Zadeh, B.Q.; Handschuh, S.: ¬The ACL RD-TEC : a dataset for benchmarking terminology extraction and classification in computational linguistics (2014) 0.38
    0.3800955 = sum of:
      0.3800955 = product of:
        1.357484 = sum of:
          0.03242045 = weight(abstract_txt:derived in 2803) [ClassicSimilarity], result of:
            0.03242045 = score(doc=2803,freq=1.0), product of:
              0.072217904 = queryWeight, product of:
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.012567843 = queryNorm
              0.44892538 = fieldWeight in 2803, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.078125 = fieldNorm(doc=2803)
          0.0906592 = weight(abstract_txt:dataset in 2803) [ClassicSimilarity], result of:
            0.0906592 = score(doc=2803,freq=3.0), product of:
              0.099387795 = queryWeight, product of:
                1.1731244 = boost
                6.7410603 = idf(docFreq=141, maxDocs=44218)
                0.012567843 = queryNorm
              0.91217643 = fieldWeight in 2803, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7410603 = idf(docFreq=141, maxDocs=44218)
                0.078125 = fieldNorm(doc=2803)
          0.17577806 = weight(abstract_txt:computational in 2803) [ClassicSimilarity], result of:
            0.17577806 = score(doc=2803,freq=4.0), product of:
              0.17690168 = queryWeight, product of:
                2.213393 = boost
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.012567843 = queryNorm
              0.9936483 = fieldWeight in 2803, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.078125 = fieldNorm(doc=2803)
          0.20072888 = weight(abstract_txt:linguistics in 2803) [ClassicSimilarity], result of:
            0.20072888 = score(doc=2803,freq=4.0), product of:
              0.19326895 = queryWeight, product of:
                2.3135219 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.012567843 = queryNorm
              1.0385987 = fieldWeight in 2803, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.078125 = fieldNorm(doc=2803)
          0.07593336 = weight(abstract_txt:reference in 2803) [ClassicSimilarity], result of:
            0.07593336 = score(doc=2803,freq=1.0), product of:
              0.217793 = queryWeight, product of:
                3.8831532 = boost
                4.46271 = idf(docFreq=1385, maxDocs=44218)
                0.012567843 = queryNorm
              0.3486492 = fieldWeight in 2803, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.46271 = idf(docFreq=1385, maxDocs=44218)
                0.078125 = fieldNorm(doc=2803)
          0.27403963 = weight(abstract_txt:corpus in 2803) [ClassicSimilarity], result of:
            0.27403963 = score(doc=2803,freq=2.0), product of:
              0.4067127 = queryWeight, product of:
                5.306479 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.012567843 = queryNorm
              0.67379165 = fieldWeight in 2803, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=2803)
          0.50792444 = weight(abstract_txt:anthology in 2803) [ClassicSimilarity], result of:
            0.50792444 = score(doc=2803,freq=1.0), product of:
              0.7177694 = queryWeight, product of:
                6.305217 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.012567843 = queryNorm
              0.707643 = fieldWeight in 2803, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=2803)
        0.28 = coord(7/25)
    
  2. An, J.; Kim, N.; Kan, M.-Y.; Kumar Chandrasekaran, M.; Song, M.: Exploring characteristics of highly cited authors according to citation location and content (2017) 0.16
    0.15667465 = sum of:
      0.15667465 = product of:
        0.78337324 = sum of:
          0.04099451 = weight(abstract_txt:processing in 3765) [ClassicSimilarity], result of:
            0.04099451 = score(doc=3765,freq=1.0), product of:
              0.10639617 = queryWeight, product of:
                1.7165464 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.012567843 = queryNorm
              0.38530064 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.078125 = fieldNorm(doc=3765)
          0.08788903 = weight(abstract_txt:computational in 3765) [ClassicSimilarity], result of:
            0.08788903 = score(doc=3765,freq=1.0), product of:
              0.17690168 = queryWeight, product of:
                2.213393 = boost
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.012567843 = queryNorm
              0.49682415 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.078125 = fieldNorm(doc=3765)
          0.10036444 = weight(abstract_txt:linguistics in 3765) [ClassicSimilarity], result of:
            0.10036444 = score(doc=3765,freq=1.0), product of:
              0.19326895 = queryWeight, product of:
                2.3135219 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.012567843 = queryNorm
              0.5192993 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.078125 = fieldNorm(doc=3765)
          0.046200812 = weight(abstract_txt:research in 3765) [ClassicSimilarity], result of:
            0.046200812 = score(doc=3765,freq=2.0), product of:
              0.13189824 = queryWeight, product of:
                3.3103406 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.012567843 = queryNorm
              0.35027617 = fieldWeight in 3765, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.078125 = fieldNorm(doc=3765)
          0.50792444 = weight(abstract_txt:anthology in 3765) [ClassicSimilarity], result of:
            0.50792444 = score(doc=3765,freq=1.0), product of:
              0.7177694 = queryWeight, product of:
                6.305217 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.012567843 = queryNorm
              0.707643 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=3765)
        0.2 = coord(5/25)
    
  3. Wan, X.; Liu, F.: WL-index : leveraging citation mention number to quantify an individual's scientific impact (2014) 0.15
    0.14714311 = sum of:
      0.14714311 = product of:
        0.7357155 = sum of:
          0.055651333 = weight(abstract_txt:bibliometric in 1549) [ClassicSimilarity], result of:
            0.055651333 = score(doc=1549,freq=3.0), product of:
              0.08330073 = queryWeight, product of:
                1.0739943 = boost
                6.1714344 = idf(docFreq=250, maxDocs=44218)
                0.012567843 = queryNorm
              0.66807735 = fieldWeight in 1549, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1714344 = idf(docFreq=250, maxDocs=44218)
                0.0625 = fieldNorm(doc=1549)
          0.032795608 = weight(abstract_txt:processing in 1549) [ClassicSimilarity], result of:
            0.032795608 = score(doc=1549,freq=1.0), product of:
              0.10639617 = queryWeight, product of:
                1.7165464 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.012567843 = queryNorm
              0.3082405 = fieldWeight in 1549, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=1549)
          0.08590879 = weight(abstract_txt:reference in 1549) [ClassicSimilarity], result of:
            0.08590879 = score(doc=1549,freq=2.0), product of:
              0.217793 = queryWeight, product of:
                3.8831532 = boost
                4.46271 = idf(docFreq=1385, maxDocs=44218)
                0.012567843 = queryNorm
              0.39445156 = fieldWeight in 1549, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.46271 = idf(docFreq=1385, maxDocs=44218)
                0.0625 = fieldNorm(doc=1549)
          0.1550202 = weight(abstract_txt:corpus in 1549) [ClassicSimilarity], result of:
            0.1550202 = score(doc=1549,freq=1.0), product of:
              0.4067127 = queryWeight, product of:
                5.306479 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.012567843 = queryNorm
              0.3811541 = fieldWeight in 1549, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=1549)
          0.40633956 = weight(abstract_txt:anthology in 1549) [ClassicSimilarity], result of:
            0.40633956 = score(doc=1549,freq=1.0), product of:
              0.7177694 = queryWeight, product of:
                6.305217 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.012567843 = queryNorm
              0.56611437 = fieldWeight in 1549, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0625 = fieldNorm(doc=1549)
        0.2 = coord(5/25)
    
  4. Radev, D.R.; Joseph, M.T.; Gibson, B.; Muthukrishnan, P.: ¬A bibliometric and network analysis of the field of computational linguistics (2016) 0.14
    0.13993858 = sum of:
      0.13993858 = product of:
        0.87461615 = sum of:
          0.10546684 = weight(abstract_txt:computational in 2764) [ClassicSimilarity], result of:
            0.10546684 = score(doc=2764,freq=1.0), product of:
              0.17690168 = queryWeight, product of:
                2.213393 = boost
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.012567843 = queryNorm
              0.596189 = fieldWeight in 2764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.09375 = fieldNorm(doc=2764)
          0.120437324 = weight(abstract_txt:linguistics in 2764) [ClassicSimilarity], result of:
            0.120437324 = score(doc=2764,freq=1.0), product of:
              0.19326895 = queryWeight, product of:
                2.3135219 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.012567843 = queryNorm
              0.62315917 = fieldWeight in 2764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.09375 = fieldNorm(doc=2764)
          0.039202686 = weight(abstract_txt:research in 2764) [ClassicSimilarity], result of:
            0.039202686 = score(doc=2764,freq=1.0), product of:
              0.13189824 = queryWeight, product of:
                3.3103406 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.012567843 = queryNorm
              0.2972192 = fieldWeight in 2764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.09375 = fieldNorm(doc=2764)
          0.6095093 = weight(abstract_txt:anthology in 2764) [ClassicSimilarity], result of:
            0.6095093 = score(doc=2764,freq=1.0), product of:
              0.7177694 = queryWeight, product of:
                6.305217 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.012567843 = queryNorm
              0.8491715 = fieldWeight in 2764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.09375 = fieldNorm(doc=2764)
        0.16 = coord(4/25)
    
  5. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.11
    0.1071913 = sum of:
      0.1071913 = product of:
        0.5359565 = sum of:
          0.04099451 = weight(abstract_txt:processing in 3015) [ClassicSimilarity], result of:
            0.04099451 = score(doc=3015,freq=1.0), product of:
              0.10639617 = queryWeight, product of:
                1.7165464 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.012567843 = queryNorm
              0.38530064 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.08788903 = weight(abstract_txt:computational in 3015) [ClassicSimilarity], result of:
            0.08788903 = score(doc=3015,freq=1.0), product of:
              0.17690168 = queryWeight, product of:
                2.213393 = boost
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.012567843 = queryNorm
              0.49682415 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.10036444 = weight(abstract_txt:linguistics in 3015) [ClassicSimilarity], result of:
            0.10036444 = score(doc=3015,freq=1.0), product of:
              0.19326895 = queryWeight, product of:
                2.3135219 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.012567843 = queryNorm
              0.5192993 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.032668903 = weight(abstract_txt:research in 3015) [ClassicSimilarity], result of:
            0.032668903 = score(doc=3015,freq=1.0), product of:
              0.13189824 = queryWeight, product of:
                3.3103406 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.012567843 = queryNorm
              0.24768265 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.27403963 = weight(abstract_txt:corpus in 3015) [ClassicSimilarity], result of:
            0.27403963 = score(doc=3015,freq=2.0), product of:
              0.4067127 = queryWeight, product of:
                5.306479 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.012567843 = queryNorm
              0.67379165 = fieldWeight in 3015, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
        0.2 = coord(5/25)
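
To see which abstract terms drive a content-based match, the per-term weights of a hit can be summed and scaled by its coord factor. The sketch below does this for the first hit in this list (doc 2803), using the seven term weights exactly as printed in its tree; the variable names are illustrative, and the total of 25 query terms is read off the printed coord(7/25).

```python
# Term weights copied from the explain output for doc 2803 (abstract_txt field).
term_weights = {
    "derived": 0.03242045,
    "dataset": 0.0906592,
    "computational": 0.17577806,
    "linguistics": 0.20072888,
    "reference": 0.07593336,
    "corpus": 0.27403963,
    "anthology": 0.50792444,
}

QUERY_TERMS = 25                     # denominator of the printed coord(7/25)
coord = len(term_weights) / QUERY_TERMS

raw_sum = sum(term_weights.values())  # the "sum of" node in the tree
score = raw_sum * coord               # the "product of" node in the tree

# Share of the raw sum contributed by each term, largest first.
shares = sorted(
    ((term, weight / raw_sum) for term, weight in term_weights.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
top_term = shares[0][0]

print(f"raw_sum={raw_sum:.6f} coord={coord:.2f} score={score:.7f}")
for term, share in shares:
    print(f"{term:>14}: {share:.1%}")
```

In this hit the rare term "anthology" (docFreq=13) contributes the largest share of the raw sum, well ahead of "corpus" and "linguistics", which reflects both its high idf and the boost applied to it in the query.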