Search (4 results, page 1 of 1)

Silva, A.J.C.; Gonçalves, M.A.; Laender, A.H.F.; Modesto, M.A.B.; Cristo, M.; Ziviani, N.: Finding what is missing from a digital library : a case study in the computer science field (2009) 0.06
```
0.05671598 = product of:
  0.11343196 = sum of:
    0.07853575 = weight(_text_:digital in 4219) [ClassicSimilarity], result of:
      0.07853575 = score(doc=4219,freq=6.0), product of:
        0.20808177 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.052751686 = queryNorm
        0.37742734 = fieldWeight in 4219, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4219)
    0.03489621 = weight(_text_:library in 4219) [ClassicSimilarity], result of:
      0.03489621 = score(doc=4219,freq=6.0), product of:
        0.13870415 = queryWeight, product of:
          2.6293786 = idf(docFreq=8668, maxDocs=44218)
          0.052751686 = queryNorm
        0.25158736 = fieldWeight in 4219, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.6293786 = idf(docFreq=8668, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4219)
  0.5 = coord(2/4)
```
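The score tree above can be reproduced by hand. The following is a minimal sketch, assuming the default ClassicSimilarity formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), which are consistent with the values listed. The function names are illustrative and not part of any library; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as used by ClassicSimilarity: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one term: queryWeight * fieldWeight, as in the explain tree."""
    tf = math.sqrt(freq)                        # tf(freq) = sqrt(termFreq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm             # idf * queryNorm
    field_weight = tf * idf * field_norm        # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the explanation for doc 4219.
MAX_DOCS = 44218
QUERY_NORM = 0.052751686
FIELD_NORM = 0.0390625

digital = term_score(6.0, 2326, MAX_DOCS, QUERY_NORM, FIELD_NORM)
library = term_score(6.0, 8668, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/4): two of the four query clauses matched this document.
coord = 2 / 4
total = (digital + library) * coord

print(f"digital={digital:.8f} library={library:.8f} total={total:.8f}")
```

The printed values should agree with the listing up to floating-point rounding, since Lucene computes these factors in single precision.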
Abstract

This article proposes a process to retrieve the URL of a document for which metadata records exist in a digital library catalog but a pointer to the full text of the document is not available. The process uses results from queries submitted to Web search engines for finding the URL of the corresponding full text or any related material. We present a comprehensive study of this process in different situations by investigating different query strategies applied to three general purpose search engines (Google, Yahoo!, MSN) and two specialized ones (Scholar and CiteSeer), considering five user scenarios. Specifically, we have conducted experiments with metadata records taken from the Brazilian Digital Library of Computing (BDBComp) and The DBLP Computer Science Bibliography (DBLP). We found that Scholar was the most effective search engine for this task in all considered scenarios and that simple strategies for combining and re-ranking results from Scholar and Google significantly improve the retrieval quality. Moreover, we study the influence of the number of query results on the effectiveness of finding missing information as well as the coverage of the proposed scenarios.
Santana, A.F.; Gonçalves, M.A.; Laender, A.H.F.; Ferreira, A.A.: Incremental author name disambiguation by exploiting domain-specific heuristics (2017) 0.04
```
0.039293982 = product of:
  0.078587964 = sum of:
    0.05441116 = weight(_text_:digital in 3587) [ClassicSimilarity], result of:
      0.05441116 = score(doc=3587,freq=2.0), product of:
        0.20808177 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.052751686 = queryNorm
        0.26148933 = fieldWeight in 3587, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.046875 = fieldNorm(doc=3587)
    0.0241768 = weight(_text_:library in 3587) [ClassicSimilarity], result of:
      0.0241768 = score(doc=3587,freq=2.0), product of:
        0.13870415 = queryWeight, product of:
          2.6293786 = idf(docFreq=8668, maxDocs=44218)
          0.052751686 = queryNorm
        0.17430481 = fieldWeight in 3587, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6293786 = idf(docFreq=8668, maxDocs=44218)
          0.046875 = fieldNorm(doc=3587)
  0.5 = coord(2/4)
```
Abstract

The vast majority of the current author name disambiguation solutions are designed to disambiguate a whole digital library (DL) at once, considering the entire repository. However, these solutions, besides being very expensive and having scalability problems, may also not benefit from eventual manual corrections, as these may be lost whenever the process of disambiguating the entire repository is required. In the real world, in which repositories are updated on a daily basis, incremental solutions that disambiguate only the newly introduced citation records are likely to produce improved results in the long run. However, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article we present a new author name disambiguation method, specially designed for the incremental scenario. In our experiments, our new method largely outperforms recent incremental proposals reported in the literature as well as the current state-of-the-art non-incremental method.
Cota, R.G.; Ferreira, A.A.; Nascimento, C.; Gonçalves, M.A.; Laender, A.H.F.: An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations (2010) 0.01
```
0.011335658 = product of:
  0.04534263 = sum of:
    0.04534263 = weight(_text_:digital in 3986) [ClassicSimilarity], result of:
      0.04534263 = score(doc=3986,freq=2.0), product of:
        0.20808177 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.052751686 = queryNorm
        0.21790776 = fieldWeight in 3986, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3986)
  0.25 = coord(1/4)
```
Abstract

Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts from the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated providing more information for the next round of fusion. In order to demonstrate the effectiveness of our method, we ran a series of experiments in two different collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods, under both metrics, losing only in one case against a supervised method, whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information such as knowing a priori the correct number of clusters.

Freitas-Junior, H.R.; Ribeiro-Neto, B.A.; Freitas-Vale, R. de; Laender, A.H.F.; Lima, L.R.S. de: Categorization-driven cross-language retrieval of medical information (2006) 0.00
```
0.004466953 = product of:
  0.017867813 = sum of:
    0.017867813 = product of:
      0.035735626 = sum of:
        0.035735626 = weight(_text_:22 in 5282) [ClassicSimilarity], result of:
          0.035735626 = score(doc=5282,freq=2.0), product of:
            0.18472742 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052751686 = queryNorm
            0.19345059 = fieldWeight in 5282, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5282)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 22. 7.2006 16:46:36
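The explanation for doc 5282 nests one coordination factor inside another, which is why a matched term worth about 0.036 ends up with a displayed score of 0.00. Using only the values from that explanation, the arithmetic works out as follows:

```latex
\begin{aligned}
\text{score}(\texttt{22}) &= \text{queryWeight} \times \text{fieldWeight}
  = 0.18472742 \times 0.19345059 \approx 0.035735626 \\
\text{inner clause} &= 0.035735626 \times \text{coord}(1/2)
  = 0.035735626 \times 0.5 = 0.017867813 \\
\text{total} &= 0.017867813 \times \text{coord}(1/4)
  = 0.017867813 \times 0.25 \approx 0.004466953
\end{aligned}
```

The only matching term is `22`, which presumably comes from date-like fields such as the one shown above rather than from the subject of the article, so this hit is likely incidental.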
