Search (8 results, page 1 of 1)

Gonçalves, M.A.; Moreira, B.L.; Fox, E.A.; Watson, L.T.: "What is a good digital library?" : a quality model for digital libraries (2007) 0.04
```
0.03731078 = product of:
  0.13058773 = sum of:
    0.03718255 = weight(_text_:processing in 937) [ClassicSimilarity], result of:
      0.03718255 = score(doc=937,freq=2.0), product of:
        0.1662677 = queryWeight, product of:
          4.048147 = idf(docFreq=2097, maxDocs=44218)
          0.04107254 = queryNorm
        0.22363065 = fieldWeight in 937, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.048147 = idf(docFreq=2097, maxDocs=44218)
          0.0390625 = fieldNorm(doc=937)
    0.09340518 = weight(_text_:digital in 937) [ClassicSimilarity], result of:
      0.09340518 = score(doc=937,freq=14.0), product of:
        0.16201277 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.04107254 = queryNorm
        0.57652974 = fieldWeight in 937, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.0390625 = fieldNorm(doc=937)
  0.2857143 = coord(2/7)
```
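The arithmetic in the explain tree above can be reproduced by hand. The sketch below is a minimal illustration, not the search engine's own code. It assumes the classic TF-IDF form that the listed values are consistent with: `tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`, and a final `coord` factor. The function and constant names are hypothetical, and the numeric inputs are copied from the listing for doc 937.

```python
import math

# Values copied from the explain output for doc 937.
QUERY_NORM = 0.04107254
MAX_DOCS = 44218
FIELD_NORM = 0.0390625


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, in the form the listed idf values match."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM  # tf * idf * fieldNorm
    return query_weight * field_weight


processing = term_score(freq=2.0, doc_freq=2097)
digital = term_score(freq=14.0, doc_freq=2326)

# coord(2/7): two of the seven query terms matched this document.
total = (processing + digital) * (2 / 7)
print(f"processing={processing:.6f} digital={digital:.6f} total={total:.6f}")
```

Because the listing reports 32-bit float values, the double-precision results here agree only to about six decimal places.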
Abstract

In this article, we elaborate on the meaning of quality in digital libraries (DLs) by proposing a model that is deeply grounded in a formal framework for digital libraries: 5S (Streams, Structures, Spaces, Scenarios, and Societies). For each major DL concept in the framework we formally define a number of dimensions of quality and propose a set of numerical indicators for those quality dimensions. In particular, we consider key concepts of a minimal DL: catalog, collection, digital object, metadata specification, repository, and services. Regarding quality dimensions, we consider: accessibility, accuracy, completeness, composability, conformance, consistency, effectiveness, efficiency, extensibility, pertinence, preservability, relevance, reliability, reusability, significance, similarity, and timeliness. Regarding measurement, we consider characteristics like: response time (with regard to efficiency), cost of migration (with respect to preservability), and number of service failures (to assess reliability). For some key DL concepts, the (quality dimension, numerical indicator) pairs are illustrated through their application to a number of "real-world" digital libraries. We also discuss connections between the proposed dimensions of DL quality and an expanded version of a workshop's consensus view of the life cycle of information in digital libraries. Such connections can be used to determine when and where quality issues can be measured, assessed, and improved - as well as how possible quality problems can be prevented, detected, and eliminated.

Source

Information processing and management. 43(2007) no.5, S.1416-1437
Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: ¬A comparative study of citations and links in document classification (2006) 0.04
```
0.037287988 = product of:
  0.13050795 = sum of:
    0.08647639 = weight(_text_:digital in 2531) [ClassicSimilarity], result of:
      0.08647639 = score(doc=2531,freq=12.0), product of:
        0.16201277 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.04107254 = queryNorm
        0.5337628 = fieldWeight in 2531, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2531)
    0.044031553 = weight(_text_:techniques in 2531) [ClassicSimilarity], result of:
      0.044031553 = score(doc=2531,freq=2.0), product of:
        0.18093403 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.04107254 = queryNorm
        0.24335694 = fieldWeight in 2531, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2531)
  0.2857143 = coord(2/7)
```
Abstract

It is well known that links are an important source of information when dealing with Web collections. However, the question remains on whether the same techniques that are used on the Web can be applied to collections of documents containing citations between scientific papers. In this work we present a comparative study of digital library citations and Web links, in the context of automatic text classification. We show that there are in fact differences between citations and links in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains up to 37% over text based classifiers, while measures based on bibliographic coupling perform better in a digital library. We also propose a simple and effective way of combining a traditional text based classifier with a citation-link based classifier. This combination is based on the notion of classifier reliability and presented gains of up to 14% in micro-averaged F1 in the Web collection. However, no significant gain was obtained in the digital library. Finally, a user study was performed to further investigate the causes for these results. We discovered that misclassifications by the citation-link based classifiers are in fact difficult cases, hard to classify even for humans.

Source

International Conference on Digital Libraries: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, Chapel Hill, NC, USA
Silva, A.J.C.; Gonçalves, M.A.; Laender, A.H.F.; Modesto, M.A.B.; Cristo, M.; Ziviani, N.: Finding what is missing from a digital library : a case study in the computer science field (2009) 0.03
```
0.02809446 = product of:
  0.0983306 = sum of:
    0.03718255 = weight(_text_:processing in 4219) [ClassicSimilarity], result of:
      0.03718255 = score(doc=4219,freq=2.0), product of:
        0.1662677 = queryWeight, product of:
          4.048147 = idf(docFreq=2097, maxDocs=44218)
          0.04107254 = queryNorm
        0.22363065 = fieldWeight in 4219, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.048147 = idf(docFreq=2097, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4219)
    0.061148047 = weight(_text_:digital in 4219) [ClassicSimilarity], result of:
      0.061148047 = score(doc=4219,freq=6.0), product of:
        0.16201277 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.04107254 = queryNorm
        0.37742734 = fieldWeight in 4219, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4219)
  0.2857143 = coord(2/7)
```
Abstract

This article proposes a process to retrieve the URL of a document for which metadata records exist in a digital library catalog but a pointer to the full text of the document is not available. The process uses results from queries submitted to Web search engines for finding the URL of the corresponding full text or any related material. We present a comprehensive study of this process in different situations by investigating different query strategies applied to three general purpose search engines (Google, Yahoo!, MSN) and two specialized ones (Scholar and CiteSeer), considering five user scenarios. Specifically, we have conducted experiments with metadata records taken from the Brazilian Digital Library of Computing (BDBComp) and The DBLP Computer Science Bibliography (DBLP). We found that Scholar was the most effective search engine for this task in all considered scenarios and that simple strategies for combining and re-ranking results from Scholar and Google significantly improve the retrieval quality. Moreover, we study the influence of the number of query results on the effectiveness of finding missing information as well as the coverage of the proposed scenarios.

Source

Information processing and management. 45(2009) no.3, S.380-391
Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: ¬A survey on tag recommendation methods : a review (2017) 0.02
```
0.016555276 = product of:
  0.05794346 = sum of:
    0.044031553 = weight(_text_:techniques in 3524) [ClassicSimilarity], result of:
      0.044031553 = score(doc=3524,freq=2.0), product of:
        0.18093403 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.04107254 = queryNorm
        0.24335694 = fieldWeight in 3524, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3524)
    0.013911906 = product of:
      0.027823811 = sum of:
        0.027823811 = weight(_text_:22 in 3524) [ClassicSimilarity], result of:
          0.027823811 = score(doc=3524,freq=2.0), product of:
            0.14382903 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04107254 = queryNorm
            0.19345059 = fieldWeight in 3524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3524)
      0.5 = coord(1/2)
  0.2857143 = coord(2/7)
```
Abstract

Tags (keywords freely assigned by users to describe web content) have become highly popular on Web 2.0 applications, because of the strong stimuli and easiness for users to create and describe their own content. This increase in tag popularity has led to a vast literature on tag recommendation methods. These methods aim at assisting users in the tagging process, possibly increasing the quality of the generated tags and, consequently, improving the quality of the information retrieval (IR) services that rely on tags as data sources. Regardless of the numerous and diversified previous studies on tag recommendation, to our knowledge, no previous work has summarized and organized them into a single survey article. In this article, we propose a taxonomy for tag recommendation methods, classifying them according to the target of the recommendations, their objectives, exploited data sources, and underlying techniques. Moreover, we provide a critical overview of these methods, pointing out their advantages and disadvantages. Finally, we describe the main open challenges related to the field, such as tag ambiguity, cold start, and evaluation issues.

Date

16.11.2017 13:30:22
Martins, E.F.; Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: On cold start for associative tag recommendation (2016) 0.01
```
0.006290222 = product of:
  0.044031553 = sum of:
    0.044031553 = weight(_text_:techniques in 2494) [ClassicSimilarity], result of:
      0.044031553 = score(doc=2494,freq=2.0), product of:
        0.18093403 = queryWeight, product of:
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.04107254 = queryNorm
        0.24335694 = fieldWeight in 2494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.405231 = idf(docFreq=1467, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2494)
  0.14285715 = coord(1/7)
```
Abstract

Tag recommendation strategies that exploit term co-occurrence patterns with tags previously assigned to the target object have consistently produced state-of-the-art results. However, such techniques work only for objects with previously assigned tags. Here we focus on tag recommendation for objects with no tags, a variation of the well-known cold start problem. We start by evaluating state-of-the-art co-occurrence based methods in cold start. Our results show that the effectiveness of these methods suffers in this situation. Moreover, we show that employing various automatic filtering strategies to generate an initial tag set that enables the use of co-occurrence patterns produces only marginal improvements. We then propose a new approach that exploits both positive and negative user feedback to iteratively select input tags along with a genetic programming strategy to learn the recommendation function. Our experimental results indicate that extending the methods to include user relevance feedback leads to gains in precision of up to 58% over the best baseline in cold start scenarios and gains of up to 43% over the best baseline in objects that contain some initial tags (i.e., no cold start). We also show that our best relevance-feedback-driven strategy performs well even in scenarios that lack user cooperation (i.e., users may refuse to provide feedback) and user reliability (i.e., users may provide the wrong feedback).
Santana, A.F.; Gonçalves, M.A.; Laender, A.H.F.; Ferreira, A.A.: Incremental author name disambiguation by exploiting domain-specific heuristics (2017) 0.01
```
0.0060520875 = product of:
  0.042364612 = sum of:
    0.042364612 = weight(_text_:digital in 3587) [ClassicSimilarity], result of:
      0.042364612 = score(doc=3587,freq=2.0), product of:
        0.16201277 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.04107254 = queryNorm
        0.26148933 = fieldWeight in 3587, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.046875 = fieldNorm(doc=3587)
  0.14285715 = coord(1/7)
```
Abstract

The vast majority of the current author name disambiguation solutions are designed to disambiguate a whole digital library (DL) at once, considering the entire repository. However, these solutions, besides being very expensive and having scalability problems, also may not benefit from eventual manual corrections, as they may be lost whenever the process of disambiguating the entire repository is required. In the real world, in which repositories are updated on a daily basis, incremental solutions that disambiguate only the newly introduced citation records are likely to produce improved results in the long run. However, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article we present a new author name disambiguation method, specially designed for the incremental scenario. In our experiments, our new method largely outperforms recent incremental proposals reported in the literature as well as the current state-of-the-art non-incremental method.
Cota, R.G.; Ferreira, A.A.; Nascimento, C.; Gonçalves, M.A.; Laender, A.H.F.: ¬An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations (2010) 0.01
```
0.0050434056 = product of:
  0.03530384 = sum of:
    0.03530384 = weight(_text_:digital in 3986) [ClassicSimilarity], result of:
      0.03530384 = score(doc=3986,freq=2.0), product of:
        0.16201277 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.04107254 = queryNorm
        0.21790776 = fieldWeight in 3986, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3986)
  0.14285715 = coord(1/7)
```
Abstract

Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts from the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated providing more information for the next round of fusion. In order to demonstrate the effectiveness of our method, we ran a series of experiments in two different collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods, under both metrics, losing only in one case against a supervised method, whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information such as knowing a priori the correct number of clusters.

Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P.: ¬A general multiview framework for assessing the quality of collaboratively created content on web 2.0 (2017) 0.00
```
0.0019874151 = product of:
  0.013911906 = sum of:
    0.013911906 = product of:
      0.027823811 = sum of:
        0.027823811 = weight(_text_:22 in 3343) [ClassicSimilarity], result of:
          0.027823811 = score(doc=3343,freq=2.0), product of:
            0.14382903 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04107254 = queryNorm
            0.19345059 = fieldWeight in 3343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3343)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```
Date: 16.11.2017 13:04:22
