Search (3 results, page 1 of 1)

Li, L.; Shang, Y.; Zhang, W.: Improvement of HITS-based algorithms on Web documents 0.06
```
0.058921922 = sum of:
  0.054862697 = product of:
    0.21945079 = sum of:
      0.21945079 = weight(_text_:3a in 2514) [ClassicSimilarity], result of:
        0.21945079 = score(doc=2514,freq=2.0), product of:
          0.39046928 = queryWeight, product of:
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046056706 = queryNorm
          0.56201804 = fieldWeight in 2514, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.478011 = idf(docFreq=24, maxDocs=44218)
            0.046875 = fieldNorm(doc=2514)
    0.25 = coord(1/4)
  0.0040592253 = product of:
    0.008118451 = sum of:
      0.008118451 = weight(_text_:a in 2514) [ClassicSimilarity], result of:
        0.008118451 = score(doc=2514,freq=8.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.15287387 = fieldWeight in 2514, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046875 = fieldNorm(doc=2514)
    0.5 = coord(1/2)
```
Abstract

In this paper, we present two ways to improve the precision of HITS-based algorithms onWeb documents. First, by analyzing the limitations of current HITS-based algorithms, we propose a new weighted HITS-based method that assigns appropriate weights to in-links of root documents. Then, we combine content analysis with HITS-based algorithms and study the effects of four representative relevance scoring methods, VSM, Okapi, TLS, and CDR, using a set of broad topic queries. Our experimental results show that our weighted HITS-based method performs significantly better than Bharat's improved HITS algorithm. When we combine our weighted HITS-based method or Bharat's HITS algorithm with any of the four relevance scoring methods, the combined methods are only marginally better than our weighted HITS-based method. Between the four relevance scoring methods, there is no significant quality difference when they are combined with a HITS-based algorithm.

Content

Vgl.: http%3A%2F%2Fdelab.csd.auth.gr%2F~dimitris%2Fcourses%2Fir_spring06%2Fpage_rank_computing%2Fp527-li.pdf. Vgl. auch: http://www2002.org/CDROM/refereed/643/.

Type

a
Zhang, W.; Yoshida, T.; Tang, X.: ¬A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.00
```
0.001757696 = product of:
  0.003515392 = sum of:
    0.003515392 = product of:
      0.007030784 = sum of:
        0.007030784 = weight(_text_:a in 1165) [ClassicSimilarity], result of:
          0.007030784 = score(doc=1165,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.13239266 = fieldWeight in 1165, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1165)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intellegent information processing. Generally, text representation inludes two tasks: indexing and weighting. This paper has comparatively studied TF*IDF, LSI and multi-word for text representation. We used a Chinese and an English document collection to respectively evaluate the three methods in information retreival and text categorization. Experimental results have demonstrated that in text categorization, LSI has better performance than other methods in both document collections. Also, LSI has produced the best performance in retrieving English documents. This outcome has shown that LSI has both favorable semantic and statistical quality and is different with the claim that LSI can not produce discriminative power for indexing.

Type

a

Zhang, W.; Korf, R.E.: Performance of linear-space search algorithms (1995) 0.00

0.0016913437 = product of:
  0.0033826875 = sum of:
    0.0033826875 = product of:
      0.006765375 = sum of:
        0.006765375 = weight(_text_:a in 4744) [ClassicSimilarity], result of:
          0.006765375 = score(doc=4744,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.12739488 = fieldWeight in 4744, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=4744)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Search (3 results, page 1 of 1)

Authors

Years

Themes