Search (1 result, page 1 of 1)

  • author_ss:"Cortez, E."
  • author_ss:"Gonçalves, M.A."
  1. Cortez, E.; Silva, A.S. da; Gonçalves, M.A.; Mesquita, F.; Moura, E.S. de: A flexible approach for extracting metadata from bibliographic citations (2009) 0.01
    0.007663213 = product of:
      0.015326426 = sum of:
        0.015326426 = product of:
          0.030652853 = sum of:
            0.030652853 = weight(_text_:web in 2848) [ClassicSimilarity], result of:
              0.030652853 = score(doc=2848,freq=2.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.18028519 = fieldWeight in 2848, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2848)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
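
    The score breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)); the function names are hypothetical and chosen only for illustration. queryNorm and fieldNorm are copied from the listing rather than derived, and Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recombine the factors shown in the explain tree into a final score."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight
    score = query_weight * field_weight                 # weight(_text_:web)
    for coord in coords:                                # one coord(1/2) per nesting level
        score *= coord
    return score


# Values taken from the explanation for doc 2848.
score = explain_score(
    freq=2.0,
    doc_freq=4597,
    max_docs=44218,
    query_norm=0.052098576,
    field_norm=0.0390625,
    coords=(0.5, 0.5),
)
print(f"{score:.9f}")
```

    Under these assumptions the result should agree with the listed 0.007663213 to within floating-point rounding. Note that idf enters twice, once through queryWeight and once through fieldWeight.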
    
    Abstract
    In this article we present FLUX-CiM, a novel method for extracting components (e.g., author names, article titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility, and allows FLUX-CiM to extract components from citations in any given format. Unlike previous methods, which are based on models learned from user-driven training, our method relies on a knowledge base automatically constructed from an existing set of sample metadata records from a given field (e.g., computer science, health sciences, social sciences). These records are usually available on the Web or in other public data repositories. To demonstrate the effectiveness and applicability of our proposed method, we present a series of experiments in which we apply it to extract bibliographic data from citations in articles from different fields. The results of these experiments show precision and recall levels above 94% for all fields, and perfect extraction for the large majority of citations tested. In addition, in a comparison against a state-of-the-art information-extraction method, ours produced superior results without the training phase required by that method. Finally, we present a strategy for using the bibliographic data produced by FLUX-CiM extraction to automatically update and expand the knowledge base of a given domain. We show that this strategy can achieve good extraction results even if only a very small initial sample of bibliographic records is available for building the knowledge base.