Search (7 results, page 1 of 1)

Silva, R.M.; Gonçalves, M.A.; Veloso, A.: ¬A Two-stage active learning method for learning to rank (2014) 0.03
```
0.03400038 = product of:
  0.10200114 = sum of:
    0.10200114 = weight(_text_:query in 1184) [ClassicSimilarity], result of:
      0.10200114 = score(doc=1184,freq=6.0), product of:
        0.22937049 = queryWeight, product of:
          4.6476326 = idf(docFreq=1151, maxDocs=44218)
          0.049352113 = queryNorm
        0.44470036 = fieldWeight in 1184, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.6476326 = idf(docFreq=1151, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1184)
  0.33333334 = coord(1/3)
```
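The explain tree above follows the usual Lucene ClassicSimilarity (TF-IDF) arithmetic, so it can be checked by hand. The sketch below recomputes the score for doc 1184 from the numbers shown in the listing. The function names are illustrative, not part of any library, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed from ClassicSimilarity's documented defaults.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity is assumed to define it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float,
                       coord: float) -> float:
    """Recompute one term's contribution the way the explain tree lays it out."""
    tf = math.sqrt(freq)                      # tf(freq=6.0) -> 2.4494898
    idf = classic_idf(doc_freq, max_docs)     # idf(docFreq=1151, maxDocs=44218)
    query_weight = idf * query_norm           # queryWeight line
    field_weight = tf * idf * field_norm      # fieldWeight line
    return query_weight * field_weight * coord


# Values copied from the explain output for doc 1184 (term "query").
score = classic_term_score(
    freq=6.0,
    doc_freq=1151,
    max_docs=44218,
    query_norm=0.049352113,
    field_norm=0.0390625,
    coord=1 / 3,                              # coord(1/3): 1 of 3 query clauses matched
)
print(f"{score:.8f}")
```

Lucene computes these values in 32-bit floats, so the result should agree with the listed 0.03400038 only to about five or six decimal places.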
Abstract

Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets.
Silva, A.J.C.; Gonçalves, M.A.; Laender, A.H.F.; Modesto, M.A.B.; Cristo, M.; Ziviani, N.: Finding what is missing from a digital library : a case study in the computer science field (2009) 0.03
```
0.027761191 = product of:
  0.08328357 = sum of:
    0.08328357 = weight(_text_:query in 4219) [ClassicSimilarity], result of:
      0.08328357 = score(doc=4219,freq=4.0), product of:
        0.22937049 = queryWeight, product of:
          4.6476326 = idf(docFreq=1151, maxDocs=44218)
          0.049352113 = queryNorm
        0.3630963 = fieldWeight in 4219, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.6476326 = idf(docFreq=1151, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4219)
  0.33333334 = coord(1/3)
```
Abstract

This article proposes a process to retrieve the URL of a document for which metadata records exist in a digital library catalog but a pointer to the full text of the document is not available. The process uses results from queries submitted to Web search engines for finding the URL of the corresponding full text or any related material. We present a comprehensive study of this process in different situations by investigating different query strategies applied to three general purpose search engines (Google, Yahoo!, MSN) and two specialized ones (Scholar and CiteSeer), considering five user scenarios. Specifically, we have conducted experiments with metadata records taken from the Brazilian Digital Library of Computing (BDBComp) and The DBLP Computer Science Bibliography (DBLP). We found that Scholar was the most effective search engine for this task in all considered scenarios and that simple strategies for combining and re-ranking results from Scholar and Google significantly improve the retrieval quality. Moreover, we study the influence of the number of query results on the effectiveness of finding missing information as well as the coverage of the proposed scenarios.
Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.02
```
0.01701071 = product of:
  0.051032126 = sum of:
    0.051032126 = product of:
      0.10206425 = sum of:
        0.10206425 = weight(_text_:page in 4119) [ClassicSimilarity], result of:
          0.10206425 = score(doc=4119,freq=2.0), product of:
            0.27565226 = queryWeight, product of:
              5.5854197 = idf(docFreq=450, maxDocs=44218)
              0.049352113 = queryNorm
            0.37026453 = fieldWeight in 4119, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5854197 = idf(docFreq=450, maxDocs=44218)
              0.046875 = fieldNorm(doc=4119)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
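This explain tree nests two coord factors (0.5 and 1/3) around a single term weight. As a rough check, the sketch below multiplies them through for doc 4119 and for doc 4921 further down the listing, which has the same term frequency but a smaller fieldNorm. The helper names are hypothetical; the idf and queryNorm values are copied from the explain output rather than recomputed.

```python
import math
from functools import reduce


def term_weight(freq: float, idf: float, query_norm: float,
                field_norm: float) -> float:
    """queryWeight * fieldWeight for one term, as in the explain tree."""
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def apply_coords(weight: float, coords: list[float]) -> float:
    """Multiply a term weight by each enclosing coord factor in turn."""
    return reduce(lambda acc, c: acc * c, coords, weight)


IDF_PAGE = 5.5854197        # idf(docFreq=450, maxDocs=44218), from the listing
QUERY_NORM = 0.049352113    # queryNorm, from the listing
COORDS = [0.5, 1 / 3]       # coord(1/2) inside coord(1/3)

score_4119 = apply_coords(term_weight(2.0, IDF_PAGE, QUERY_NORM, 0.046875), COORDS)
score_4921 = apply_coords(term_weight(2.0, IDF_PAGE, QUERY_NORM, 0.0390625), COORDS)

print(f"doc 4119: {score_4119:.8f}")
print(f"doc 4921: {score_4921:.8f}")
print(f"ratio:    {score_4119 / score_4921:.4f}")
```

With every other factor equal, the two scores differ only by the ratio of their fieldNorm values (0.046875 / 0.0390625 = 1.2), which is why doc 4119 ranks above doc 4921 for the same term frequency.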
Abstract

In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account best blocks. Our results suggest that our block-weighting ranking method is superior to all baselines across all collections we used, with average gains in precision ranging from 5 to 20%.
Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.01
```
0.01417559 = product of:
  0.04252677 = sum of:
    0.04252677 = product of:
      0.08505354 = sum of:
        0.08505354 = weight(_text_:page in 4921) [ClassicSimilarity], result of:
          0.08505354 = score(doc=4921,freq=2.0), product of:
            0.27565226 = queryWeight, product of:
              5.5854197 = idf(docFreq=450, maxDocs=44218)
              0.049352113 = queryNorm
            0.30855376 = fieldWeight in 4921, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5854197 = idf(docFreq=450, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4921)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.
Cortez, E.; Silva, A.S. da; Gonçalves, M.A.; Mesquita, F.; Moura, E.S. de: ¬A flexible approach for extracting metadata from bibliographic citations (2009) 0.01
```
0.01417559 = product of:
  0.04252677 = sum of:
    0.04252677 = product of:
      0.08505354 = sum of:
        0.08505354 = weight(_text_:page in 2848) [ClassicSimilarity], result of:
          0.08505354 = score(doc=2848,freq=2.0), product of:
            0.27565226 = queryWeight, product of:
              5.5854197 = idf(docFreq=450, maxDocs=44218)
              0.049352113 = queryNorm
            0.30855376 = fieldWeight in 2848, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.5854197 = idf(docFreq=450, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2848)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

In this article we present FLUX-CiM, a novel method for extracting components (e.g., author names, article titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility, and allows FLUX-CiM to extract from citations in any given format. Unlike previous methods that are based on models learned from user-driven training, our method relies on a knowledge base automatically constructed from an existing set of sample metadata records from a given field (e.g., computer science, health sciences, social sciences). These records are usually available on the Web or in other public data repositories. To demonstrate the effectiveness and applicability of our proposed method, we present a series of experiments in which we apply it to extract bibliographic data from citations in articles of different fields. Results of these experiments exhibit precision and recall levels above 94% for all fields, and perfect extraction for the large majority of citations tested. In addition, in a comparison against a state-of-the-art information-extraction method, ours produced superior results without the training phase required by that method. Finally, we present a strategy for using bibliographic data resulting from the extraction process with FLUX-CiM to automatically update and expand the knowledge base of a given domain. We show that this strategy can be used to achieve good extraction results even if only a very small initial sample of bibliographic records is available for building the knowledge base.

Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P.: ¬A general multiview framework for assessing the quality of collaboratively created content on web 2.0 (2017) 0.01

```
0.0055721086 = product of:
  0.016716326 = sum of:
    0.016716326 = product of:
      0.03343265 = sum of:
        0.03343265 = weight(_text_:22 in 3343) [ClassicSimilarity], result of:
          0.03343265 = score(doc=3343,freq=2.0), product of:
            0.1728227 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049352113 = queryNorm
            0.19345059 = fieldWeight in 3343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3343)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 16.11.2017 13:04:22

Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: ¬A survey on tag recommendation methods : a review (2017) 0.01

```
0.0055721086 = product of:
  0.016716326 = sum of:
    0.016716326 = product of:
      0.03343265 = sum of:
        0.03343265 = weight(_text_:22 in 3524) [ClassicSimilarity], result of:
          0.03343265 = score(doc=3524,freq=2.0), product of:
            0.1728227 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049352113 = queryNorm
            0.19345059 = fieldWeight in 3524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3524)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Date: 16.11.2017 13:30:22
