Search (4 results)

  • author_ss:"Cristo, M."
  1. Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.03
    0.031929266 = product of:
      0.15964633 = sum of:
        0.15964633 = weight(_text_:link in 4921) [ClassicSimilarity], result of:
          0.15964633 = score(doc=4921,freq=8.0), product of:
            0.2711644 = queryWeight, product of:
              5.3287 = idf(docFreq=582, maxDocs=44218)
              0.05088753 = queryNorm
            0.58874375 = fieldWeight in 4921, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.3287 = idf(docFreq=582, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4921)
      0.2 = coord(1/5)
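    The explanation above follows Lucene's ClassicSimilarity (TF-IDF) formula: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and the final score is queryWeight × fieldWeight × coord. The sketch below recomputes the numbers from the raw statistics as a sanity check. queryNorm and fieldNorm are taken verbatim from the explanation rather than derived, because fieldNorm is stored in a lossy single-byte encoding. The function names are illustrative, not part of any Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def explain_single_term(freq: float, doc_freq: int, max_docs: int,
                        query_norm: float, field_norm: float,
                        matched: int, total_clauses: int) -> float:
    """Recompute a one-term ClassicSimilarity score (illustrative helper)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight = idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    term_score = query_weight * field_weight            # weight(_text_:link in doc)
    coord = matched / total_clauses                     # coord(1/5): 1 of 5 clauses matched
    return term_score * coord


# Values copied from the explanation for doc 4921 (term "link", freq = 8)
score = explain_single_term(freq=8.0, doc_freq=582, max_docs=44218,
                            query_norm=0.05088753, field_norm=0.0390625,
                            matched=1, total_clauses=5)
print(round(score, 6))  # prints 0.031929
```

    The same call with freq=4.0 reproduces the 0.0226 scores of the next two results, and with freq=2.0 and doc_freq=3622 it reproduces the score of the fourth result.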
    
    Abstract
    Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful for document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.
  2. Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: A comparative study of citations and links in document classification (2006) 0.02
    0.022577403 = product of:
      0.11288701 = sum of:
        0.11288701 = weight(_text_:link in 2531) [ClassicSimilarity], result of:
          0.11288701 = score(doc=2531,freq=4.0), product of:
            0.2711644 = queryWeight, product of:
              5.3287 = idf(docFreq=582, maxDocs=44218)
              0.05088753 = queryNorm
            0.4163047 = fieldWeight in 2531, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.3287 = idf(docFreq=582, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2531)
      0.2 = coord(1/5)
    
    Abstract
    It is well known that links are an important source of information when dealing with Web collections. However, the question remains whether the same techniques that are used on the Web can be applied to collections of documents containing citations between scientific papers. In this work we present a comparative study of digital library citations and Web links, in the context of automatic text classification. We show that there are in fact differences between citations and links in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains of up to 37% over text-based classifiers, while measures based on bibliographic coupling perform better in a digital library. We also propose a simple and effective way of combining a traditional text-based classifier with a citation-link-based classifier. This combination is based on the notion of classifier reliability and yielded gains of up to 14% in micro-averaged F1 in the Web collection. However, no significant gain was obtained in the digital library. Finally, a user study was performed to further investigate the causes of these results. We discovered that misclassifications by the citation-link-based classifiers are in fact difficult cases, hard to classify even for humans.
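    Co-citation and bibliographic coupling, the two families of measures contrasted in this abstract, have standard textbook definitions: two documents are co-cited when a third document links to both, and bibliographically coupled when they both link to the same third document. The sketch below computes the raw counts on a small hypothetical link graph. It is an illustration of those standard definitions only; the abstract does not say how the authors normalize the counts or turn them into classifiers, and that is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy link graph: each page maps to the set of pages it links to (cites).
out_links = {
    "a": {"c"},
    "b": {"c", "d", "e"},
    "x": {"a", "b"},
    "y": {"a", "b"},
    "z": {"a", "c"},
}


def invert(graph):
    """Build the in-link map (page -> set of pages linking to it)."""
    incoming = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            incoming[dst].add(src)
    return incoming


def bibliographic_coupling(graph, p, q):
    # Number of pages that both p and q link to (shared out-links).
    return len(graph.get(p, set()) & graph.get(q, set()))


def co_citation(graph, p, q):
    # Number of pages that link to both p and q (shared in-links).
    incoming = invert(graph)
    return len(incoming.get(p, set()) & incoming.get(q, set()))


print(bibliographic_coupling(out_links, "a", "b"))  # shared target: c
print(co_citation(out_links, "a", "b"))             # shared sources: x, y
```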
  3. Souza, J.; Carvalho, A.; Cristo, M.; Moura, E.; Calado, P.; Chirita, P.-A.; Nejdl, W.: Using site-level connections to estimate link confidence (2012) 0.02
    0.022577403 = product of:
      0.11288701 = sum of:
        0.11288701 = weight(_text_:link in 498) [ClassicSimilarity], result of:
          0.11288701 = score(doc=498,freq=4.0), product of:
            0.2711644 = queryWeight, product of:
              5.3287 = idf(docFreq=582, maxDocs=44218)
              0.05088753 = queryNorm
            0.4163047 = fieldWeight in 498, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.3287 = idf(docFreq=582, maxDocs=44218)
              0.0390625 = fieldNorm(doc=498)
      0.2 = coord(1/5)
    
    Abstract
    Search engines are essential tools for web users today. They rely on a large number of features to compute the rank of search results for each given query. The estimated reputation of pages is one of the effective features available to search engine designers and is probably used by most current commercial search engines. Page reputation is estimated by analyzing the linkage relationships between pages. This information is used by link analysis algorithms as a query-independent feature, to be taken into account when computing the rank of the results. Unfortunately, several types of links found on the web may damage the estimated page reputation and thus degrade the quality of search results. This work studies alternatives for reducing the negative impact of such noisy links. More specifically, the authors propose and evaluate new methods that deal with noisy links, considering scenarios where the reputation of pages is computed using the PageRank algorithm. They show, through experiments with real web content, that their methods achieve significant improvements when compared to previous solutions proposed in the literature.
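    As background for the abstract above, the sketch below shows standard PageRank computed by power iteration, the query-independent reputation score the paper builds on. It does not implement the authors' noisy-link methods, which the abstract does not describe. The damping factor of 0.85, the iteration count, the uniform redistribution of rank from pages without out-links, and the example graph are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (illustrative sketch).

    graph maps each page to the set of pages it links to.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread uniformly over all pages.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in graph.items():
            if targets:
                # Each page splits its damped rank equally among its out-links.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a -> b, b -> a and c, c -> a
web = {"a": {"b"}, "b": {"a", "c"}, "c": {"a"}}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    Methods that discount noisy links would typically change which edges enter this computation, or how much weight each edge carries, before the iteration runs.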
  4. Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P.: A general multiview framework for assessing the quality of collaboratively created content on web 2.0 (2017) 0.01
    0.006894558 = product of:
      0.03447279 = sum of:
        0.03447279 = weight(_text_:22 in 3343) [ClassicSimilarity], result of:
          0.03447279 = score(doc=3343,freq=2.0), product of:
            0.17819946 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05088753 = queryNorm
            0.19345059 = fieldWeight in 3343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3343)
      0.2 = coord(1/5)
    
    Date
    16.11.2017 13:04:22