Search (2 results, page 1 of 1)

Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006) 0.04
```
0.03930773 = sum of:
  0.021199638 = product of:
    0.08479855 = sum of:
      0.08479855 = weight(_text_:authors in 4921) [ClassicSimilarity], result of:
        0.08479855 = score(doc=4921,freq=4.0), product of:
          0.23809293 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.052226946 = queryNorm
          0.35615736 = fieldWeight in 4921, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4921)
    0.25 = coord(1/4)
  0.01810809 = product of:
    0.03621618 = sum of:
      0.03621618 = weight(_text_:b in 4921) [ClassicSimilarity], result of:
        0.03621618 = score(doc=4921,freq=2.0), product of:
          0.18503809 = queryWeight, product of:
            3.542962 = idf(docFreq=3476, maxDocs=44218)
            0.052226946 = queryNorm
          0.19572285 = fieldWeight in 4921, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.542962 = idf(docFreq=3476, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4921)
    0.5 = coord(1/2)
```
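The explain tree above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative, and `QUERY_NORM`, the `fieldNorm` values, and the coord factors are copied from the output rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(termFreq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm           # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm      # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output for doc 4921 (not derived here).
QUERY_NORM = 0.052226946
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

# Term "authors": freq=4, docFreq=1258, scaled by coord(1/4)
authors_part = classic_term_score(4.0, 1258, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.25
# Term "b": freq=2, docFreq=3476, scaled by coord(1/2)
b_part = classic_term_score(2.0, 3476, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5

total = authors_part + b_part
print(total)  # the explain output reports 0.03930773 for this document
```

Small differences in the last digits are expected, since Lucene computes these values in single precision.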
Abstract

Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.
Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.02
```
0.015365225 = product of:
  0.03073045 = sum of:
    0.03073045 = product of:
      0.0614609 = sum of:
        0.0614609 = weight(_text_:b in 4119) [ClassicSimilarity], result of:
          0.0614609 = score(doc=4119,freq=4.0), product of:
            0.18503809 = queryWeight, product of:
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.052226946 = queryNorm
            0.3321527 = fieldWeight in 4119, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.542962 = idf(docFreq=3476, maxDocs=44218)
              0.046875 = fieldNorm(doc=4119)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account best blocks. Our results suggest that our block-weighting ranking method is superior to all baselines across all collections we used, with average gains in precision ranging from 5 to 20%.
