Search (6 results, page 1 of 1)

Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: ¬A comparative study of citations and links in document classification (2006) 0.04
```
0.035117112 = product of:
  0.10535134 = sum of:
    0.10535134 = weight(_text_:citation in 2531) [ClassicSimilarity], result of:
      0.10535134 = score(doc=2531,freq=6.0), product of:
        0.23479973 = queryWeight, product of:
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.050071523 = queryNorm
        0.44868594 = fieldWeight in 2531, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2531)
  0.33333334 = coord(1/3)
```
Abstract

It is well known that links are an important source of information when dealing with Web collections. However, the question remains whether the same techniques used on the Web can be applied to collections of scientific papers that cite one another. In this work we present a comparative study of digital library citations and Web links in the context of automatic text classification, and show that citations and links do in fact differ in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains of up to 37% over text-based classifiers, while measures based on bibliographic coupling perform better in the digital library. We also propose a simple and effective way of combining a traditional text-based classifier with a citation/link-based classifier. This combination is based on the notion of classifier reliability and yielded gains of up to 14% in micro-averaged F1 on the Web collection; however, no significant gain was obtained in the digital library. Finally, a user study was performed to further investigate the causes of these results. We found that the documents misclassified by the citation/link-based classifiers are in fact difficult cases, hard to classify even for humans.
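Co-citation and bibliographic coupling are standard link-based similarity measures: two documents are bibliographically coupled when they cite the same references, and co-cited when other documents cite both of them. The sketch below is a minimal illustration, assuming raw (unnormalized) counts over a toy graph and a simple similarity-weighted k-nearest-neighbour vote standing in for a citation/link-based classifier. The study's actual measures, normalization, and classifier may differ, and every name and document here is hypothetical.

```python
from collections import defaultdict

def bibliographic_coupling(graph, x, y):
    """Number of references that x and y both cite."""
    return len(graph.get(x, set()) & graph.get(y, set()))

def co_citation(graph, x, y):
    """Number of documents that cite both x and y."""
    return sum(1 for targets in graph.values() if x in targets and y in targets)

def knn_label(graph, labels, doc, similarity, k=3):
    """Vote among the k labelled documents most similar to doc, weighted by similarity."""
    ranked = sorted(
        ((similarity(graph, doc, other), other) for other in labels if other != doc),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for score, other in ranked:
        if score > 0:  # ignore neighbours with no shared links at all
            votes[labels[other]] += score
    return max(votes, key=votes.get) if votes else None

# Hypothetical toy graph: document -> documents it cites (or pages it links to).
cites = {
    "p1": {"a", "b", "c"},
    "p2": {"b", "c", "d"},
    "p3": {"p1", "p2"},
    "p4": {"p1", "a"},
    "p5": {"p1"},
}
print(bibliographic_coupling(cites, "p1", "p2"))  # → 2 (shared references b and c)
print(co_citation(cites, "p1", "p2"))             # → 1 (only p3 cites both)
```

Either measure can be passed as `similarity` to `knn_label`; the abstract's finding is that, in its collections, the co-citation variant tended to work better for Web pages and the coupling variant for papers.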
Ferreira, A.A.; Veloso, A.; Gonçalves, M.A.; Laender, A.H.F.: Self-training author name disambiguation for information scarce scenarios (2014) 0.03
```
0.028673 = product of:
  0.086019 = sum of:
    0.086019 = weight(_text_:citation in 1292) [ClassicSimilarity], result of:
      0.086019 = score(doc=1292,freq=4.0), product of:
        0.23479973 = queryWeight, product of:
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.050071523 = queryNorm
        0.36635053 = fieldWeight in 1292, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1292)
  0.33333334 = coord(1/3)
```
Abstract

We present a novel 3-step self-training method for author name disambiguation, SAND (self-training associative name disambiguator), which requires no manual labeling and no parameterization (in real-world scenarios), and is particularly suitable for the common situation in which only the most basic information about a citation record is available (i.e., author names and work and venue titles). In the first step, real-world heuristics on coauthors produce highly pure (although fragmented) clusters. In the second step, the most representative of these clusters are selected to serve as training data for the third step, a supervised author assignment. The third step exploits a state-of-the-art transductive disambiguation method capable of detecting unseen authors not included in any training example and of incorporating reliable predictions into the training data. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation, demonstrate that our method outperforms all representative unsupervised author grouping disambiguation methods and is very competitive with fully supervised author assignment methods. Thus, unlike other bootstrapping methods that exploit privileged, hard-to-obtain information such as self-citations and personal information, our method achieves top-notch performance with no (manual) training data or parameterization and in the presence of scarce information.
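The abstract does not spell out SAND's coauthor heuristics for the first step. The sketch below uses a hypothetical heuristic in that spirit, not SAND's actual rules: two citation records are merged when their ambiguous author names share a normalized key (first initial plus surname, an assumption) and the records share at least one coauthor, with union-find collecting the transitive merges. It only shows why clusters built this way tend to be pure but fragmented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    rid: str
    author: str           # the ambiguous author name on this citation
    coauthors: frozenset  # the other author names on the same citation

def name_key(name: str) -> str:
    """Hypothetical normalization: first initial + surname, lowercased."""
    parts = name.replace(".", " ").split()
    return f"{parts[0][0]} {parts[-1]}".lower()

def coauthor_clusters(records):
    """Group records whose names match and that share a coauthor (union-find)."""
    parent = {r.rid: r.rid for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if name_key(a.author) == name_key(b.author) and a.coauthors & b.coauthors:
                parent[find(a.rid)] = find(b.rid)

    groups = {}
    for r in records:
        groups.setdefault(find(r.rid), []).append(r.rid)
    return sorted(sorted(g) for g in groups.values())

records = [
    Record("r1", "A. Silva", frozenset({"B. Costa"})),
    Record("r2", "Ana Silva", frozenset({"B. Costa", "C. Lima"})),
    Record("r3", "A. Silva", frozenset({"D. Rocha"})),
    Record("r4", "A. Silva", frozenset({"C. Lima"})),
]
print(coauthor_clusters(records))  # → [['r1', 'r2', 'r4'], ['r3']]
```

Record r3 may well belong to the same person, but with no shared coauthor the heuristic leaves it in its own cluster: the kind of fragmentation the abstract mentions.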
Santana, A.F.; Gonçalves, M.A.; Laender, A.H.F.; Ferreira, A.A.: Incremental author name disambiguation by exploiting domain-specific heuristics (2017) 0.02
```
0.024329849 = product of:
  0.072989546 = sum of:
    0.072989546 = weight(_text_:citation in 3587) [ClassicSimilarity], result of:
      0.072989546 = score(doc=3587,freq=2.0), product of:
        0.23479973 = queryWeight, product of:
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.050071523 = queryNorm
        0.31085873 = fieldWeight in 3587, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.046875 = fieldNorm(doc=3587)
  0.33333334 = coord(1/3)
```
Abstract

The vast majority of current author name disambiguation solutions are designed to disambiguate a whole digital library (DL) at once, considering the entire repository. However, besides being very expensive and having scalability problems, these solutions may not benefit from occasional manual corrections, as these may be lost whenever the entire repository has to be disambiguated again. In the real world, where repositories are updated daily, incremental solutions that disambiguate only the newly introduced citation records are likely to produce better results in the long run. However, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article we present a new author name disambiguation method specifically designed for the incremental scenario. In our experiments, our new method largely outperforms recent incremental proposals reported in the literature as well as the current state-of-the-art non-incremental method.
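The incremental idea can be sketched in a few lines, assuming a simple hypothetical assignment rule rather than the article's actual heuristics: a new record joins the same-name cluster with which it shares the most coauthors, and otherwise opens a new cluster. Only the new record is processed, so earlier assignments, including any manual corrections, are never recomputed. All names here are illustrative.

```python
from dataclasses import dataclass, field

def name_key(name: str) -> str:
    """Hypothetical normalization: first initial + surname, lowercased."""
    parts = name.replace(".", " ").split()
    return f"{parts[0][0]} {parts[-1]}".lower()

@dataclass
class AuthorCluster:
    key: str
    coauthors: set = field(default_factory=set)
    records: list = field(default_factory=list)

def add_record(clusters, rid, author, coauthors):
    """Assign one newly arrived citation record; existing clusters are only extended."""
    key, coauthors = name_key(author), set(coauthors)
    best, best_overlap = None, 0
    for cid, cluster in clusters.items():
        if cluster.key == key:
            overlap = len(cluster.coauthors & coauthors)
            if overlap > best_overlap:
                best, best_overlap = cid, overlap
    if best is None:  # no same-name cluster shares a coauthor: assume a new author
        best = f"author{len(clusters) + 1}"
        clusters[best] = AuthorCluster(key)
    clusters[best].coauthors |= coauthors
    clusters[best].records.append(rid)
    return best

clusters = {}
print(add_record(clusters, "r1", "A. Silva", {"B. Costa"}))              # → author1
print(add_record(clusters, "r2", "Ana Silva", {"B. Costa", "C. Lima"}))  # → author1
print(add_record(clusters, "r3", "A. Silva", {"D. Rocha"}))              # → author2
```

Because `add_record` touches only the chosen cluster, moving a record between clusters by hand stays in effect for all later updates, which is the property the abstract argues whole-repository runs lose.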
Cortez, E.; Silva, A.S. da; Gonçalves, M.A.; Mesquita, F.; Moura, E.S. de: ¬A flexible approach for extracting metadata from bibliographic citations (2009) 0.02
```
0.020274874 = product of:
  0.06082462 = sum of:
    0.06082462 = weight(_text_:citation in 2848) [ClassicSimilarity], result of:
      0.06082462 = score(doc=2848,freq=2.0), product of:
        0.23479973 = queryWeight, product of:
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.050071523 = queryNorm
        0.25904894 = fieldWeight in 2848, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6892867 = idf(docFreq=1104, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2848)
  0.33333334 = coord(1/3)
```
Abstract

In this article we present FLUX-CiM, a novel method for extracting components (e.g., author names, article titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility, and allows FLUX-CiM to extract components from citations in any format. Unlike previous methods, which are based on models learned from user-driven training, our method relies on a knowledge base automatically constructed from an existing set of sample metadata records from a given field (e.g., computer science, health sciences, social sciences). These records are usually available on the Web or in other public data repositories. To demonstrate the effectiveness and applicability of our method, we present a series of experiments in which we apply it to extract bibliographic data from citations in articles from different fields. These experiments show precision and recall above 94% for all fields, and perfect extraction for the large majority of citations tested. In addition, in a comparison against a state-of-the-art information-extraction method, ours produced superior results without the training phase required by that method. Finally, we present a strategy for using the bibliographic data produced by FLUX-CiM's extraction to automatically update and expand the knowledge base of a given domain. We show that this strategy achieves good extraction results even when only a very small initial sample of bibliographic records is available for building the knowledge base.
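FLUX-CiM's central idea is to label citation text using a knowledge base built from existing metadata records instead of style-specific delimiter patterns. The toy sketch below is only loosely in that spirit and is not the published algorithm. It assumes a knowledge base of per-field token counts, labels each token with the field it occurs in most often, lets unknown tokens inherit the previous label, and merges adjacent tokens that share a label. The sample records and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

def build_kb(sample_records):
    """Knowledge base: token -> Counter of the fields it appears in."""
    kb = defaultdict(Counter)
    for record in sample_records:
        for fld, text in record.items():
            for tok in re.findall(r"\w+", text.lower()):
                kb[tok][fld] += 1
    return kb

def label_citation(citation, kb):
    """Label tokens by their most frequent field and merge adjacent runs."""
    segments = []  # list of [field, [tokens]]
    prev = None
    for tok in re.findall(r"\w+", citation):
        counts = kb.get(tok.lower())
        fld = counts.most_common(1)[0][0] if counts else prev  # unknown: inherit label
        if segments and segments[-1][0] == fld:
            segments[-1][1].append(tok)
        else:
            segments.append([fld, [tok]])
        prev = fld
    return [(fld, " ".join(toks)) for fld, toks in segments]

samples = [
    {"author": "Ziviani N", "title": "Text classification with links", "venue": "SIGIR"},
    {"author": "Calado P", "title": "Link analysis for digital libraries", "venue": "JCDL"},
]
kb = build_kb(samples)
print(label_citation("Calado P. Link classification. SIGIR", kb))
# → [('author', 'Calado P'), ('title', 'Link classification'), ('venue', 'SIGIR')]
```

The periods in the example citation play no role in the labeling; the fields are recovered from the knowledge base alone, which is the property the abstract emphasizes.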

Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P.: ¬A general multiview framework for assessing the quality of collaboratively created content on web 2.0 (2017) 0.01

```
0.005653334 = product of:
  0.01696 = sum of:
    0.01696 = product of:
      0.03392 = sum of:
        0.03392 = weight(_text_:22 in 3343) [ClassicSimilarity], result of:
          0.03392 = score(doc=3343,freq=2.0), product of:
            0.17534193 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050071523 = queryNorm
            0.19345059 = fieldWeight in 3343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3343)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 16.11.2017 13:04:22

Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: ¬A survey on tag recommendation methods : a review (2017) 0.01

```
0.005653334 = product of:
  0.01696 = sum of:
    0.01696 = product of:
      0.03392 = sum of:
        0.03392 = weight(_text_:22 in 3524) [ClassicSimilarity], result of:
          0.03392 = score(doc=3524,freq=2.0), product of:
            0.17534193 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050071523 = queryNorm
            0.19345059 = fieldWeight in 3524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3524)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 16.11.2017 13:30:22
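Every score explanation in these results follows Lucene's ClassicSimilarity (TF-IDF) as it worked before Lucene 7, when queryNorm and coord were still part of the formula. The sketch below recomputes scores from their printed parts. It assumes idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = √freq, both of which reproduce the idf and tf values printed in the explanations, and it copies queryNorm and fieldNorm from the output because they depend on the whole query and on index-time length encoding. The helper names `classic_idf` and `explain_score` are hypothetical and not part of the Lucene API.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Pre-Lucene-7 ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute one single-term ClassicSimilarity explanation.

    query_norm and field_norm are copied from the explain output.
    coords lists the (matched, total) coord factors, innermost first.
    """
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # tf(freq) = sqrt(freq)
    query_weight = idf * query_norm       # queryWeight = idf * queryNorm (boost 1)
    field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight
    for matched, total in coords:
        score *= matched / total          # coord(matched/total)
    return score

QUERY_NORM, MAX_DOCS = 0.050071523, 44218

# doc 2531: "citation" occurs 6 times; 1 of 3 top-level clauses matched
print(round(explain_score(6.0, 1104, MAX_DOCS, QUERY_NORM, 0.0390625, [(1, 3)]), 6))  # → 0.035117
# doc 3343: "22" occurs twice; coord(1/2) nested inside coord(1/3)
print(round(explain_score(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0390625, [(1, 2), (1, 3)]), 6))  # → 0.005653
```

The `coords` list makes the nested case explicit: documents 3343 and 3524 matched one of two sub-clauses and one of three top-level clauses, so a raw term score of about 0.0339 is divided by 6.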