Search (4 results, page 1 of 1)

Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.03
```
0.031929266 = product of:
  0.15964633 = sum of:
    0.15964633 = weight(_text_:link in 4921) [ClassicSimilarity], result of:
      0.15964633 = score(doc=4921,freq=8.0), product of:
        0.2711644 = queryWeight, product of:
          5.3287 = idf(docFreq=582, maxDocs=44218)
          0.05088753 = queryNorm
        0.58874375 = fieldWeight in 4921, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          5.3287 = idf(docFreq=582, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4921)
  0.2 = coord(1/5)
```
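
The breakdown above can be re-derived by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`), which are consistent with the numbers shown. The function name and parameter names are hypothetical and are not part of the search system itself.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term score from the values in an explain breakdown.

    Assumes the classic TF-IDF formulas; all parameter names are illustrative.
    """
    tf = math.sqrt(freq)                              # tf(freq=8.0) -> 2.828427
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=582, maxDocs=44218) -> 5.3287
    query_weight = idf * query_norm                   # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm              # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight * coord        # coord(1/5) scales the final sum


# Values copied from the breakdown for doc 4921.
score = classic_similarity_score(
    freq=8.0, doc_freq=582, max_docs=44218,
    query_norm=0.05088753, field_norm=0.0390625, coord=1 / 5,
)
print(score)
```

With the inputs for doc 4921 this yields roughly 0.0319, matching the listed score up to floating-point rounding. Only `freq`, `docFreq`, and the document id differ between the entries on this page.
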
Abstract

Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.

Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: A comparative study of citations and links in document classification (2006) 0.02
```
0.022577403 = product of:
  0.11288701 = sum of:
    0.11288701 = weight(_text_:link in 2531) [ClassicSimilarity], result of:
      0.11288701 = score(doc=2531,freq=4.0), product of:
        0.2711644 = queryWeight, product of:
          5.3287 = idf(docFreq=582, maxDocs=44218)
          0.05088753 = queryNorm
        0.4163047 = fieldWeight in 2531, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.3287 = idf(docFreq=582, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2531)
  0.2 = coord(1/5)
```
Abstract

It is well known that links are an important source of information when dealing with Web collections. However, the question remains whether the same techniques that are used on the Web can be applied to collections of documents containing citations between scientific papers. In this work we present a comparative study of digital library citations and Web links, in the context of automatic text classification. We show that there are in fact differences between citations and links in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains up to 37% over text-based classifiers, while measures based on bibliographic coupling perform better in a digital library. We also propose a simple and effective way of combining a traditional text-based classifier with a citation-link-based classifier. This combination is based on the notion of classifier reliability and presented gains of up to 14% in micro-averaged F1 in the Web collection. However, no significant gain was obtained in the digital library. Finally, a user study was performed to further investigate the causes for these results. We discovered that misclassifications by the citation-link-based classifiers are in fact difficult cases, hard to classify even for humans.
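
The two link-based measures named in the abstract can be illustrated in their simplest, unnormalized form. This is a sketch of the textbook definitions only, not the authors' exact measures, which the abstract does not specify; the function names and the toy link graph are hypothetical.

```python
def build_inlinks(outlinks):
    """Invert a mapping doc -> set of cited docs into doc -> set of citing docs."""
    inlinks = {}
    for source, targets in outlinks.items():
        for target in targets:
            inlinks.setdefault(target, set()).add(source)
    return inlinks


def cocitation(inlinks, a, b):
    """Count documents that cite (link to) both a and b."""
    return len(inlinks.get(a, set()) & inlinks.get(b, set()))


def bibliographic_coupling(outlinks, a, b):
    """Count documents that both a and b cite (link to)."""
    return len(outlinks.get(a, set()) & outlinks.get(b, set()))


# Toy graph: p1 and p2 both cite a and b; a and b share one reference, y.
outlinks = {
    "p1": {"a", "b"},
    "p2": {"a", "b"},
    "a": {"x", "y"},
    "b": {"y"},
}
inlinks = build_inlinks(outlinks)
print(cocitation(inlinks, "a", "b"))               # 2
print(bibliographic_coupling(outlinks, "a", "b"))  # 1
```

Co-citation depends on incoming links, and bibliographic coupling on outgoing ones, so each needs a collection with enough links in the relevant direction.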

Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P.: A general multiview framework for assessing the quality of collaboratively created content on web 2.0 (2017) 0.01

```
0.006894558 = product of:
  0.03447279 = sum of:
    0.03447279 = weight(_text_:22 in 3343) [ClassicSimilarity], result of:
      0.03447279 = score(doc=3343,freq=2.0), product of:
        0.17819946 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.05088753 = queryNorm
        0.19345059 = fieldWeight in 3343, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3343)
  0.2 = coord(1/5)
```

Date: 16.11.2017 13:04:22

Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: A survey on tag recommendation methods : a review (2017) 0.01

```
0.006894558 = product of:
  0.03447279 = sum of:
    0.03447279 = weight(_text_:22 in 3524) [ClassicSimilarity], result of:
      0.03447279 = score(doc=3524,freq=2.0), product of:
        0.17819946 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.05088753 = queryNorm
        0.19345059 = fieldWeight in 3524, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3524)
  0.2 = coord(1/5)
```

Date: 16.11.2017 13:30:22
