Search (5 results, page 1 of 1)

Zhang, J.; Wolfram, D.: Visualization of term discrimination analysis (2001) 0.03
```
0.02949708 = product of:
  0.05899416 = sum of:
    0.05899416 = product of:
      0.11798832 = sum of:
        0.11798832 = weight(_text_:y in 5210) [ClassicSimilarity], result of:
          0.11798832 = score(doc=5210,freq=6.0), product of:
            0.25623685 = queryWeight, product of:
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.053245123 = queryNorm
            0.46046585 = fieldWeight in 5210, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5210)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
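The arithmetic in the explain tree above can be reproduced with a short sketch. It assumes the standard Lucene ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and parameter names are illustrative and not part of the search system. The inputs `queryNorm`, `fieldNorm`, and the `coord` factors are copied from the tree rather than derived.

```python
import math

def classic_tf(freq: float) -> float:
    # ClassicSimilarity term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    # queryWeight = idf * queryNorm
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    # fieldWeight = tf * idf * fieldNorm
    field_weight = classic_tf(freq) * idf * field_norm
    score = query_weight * field_weight
    # Each enclosing "product of" node multiplies in its coord factor.
    for coord in coords:
        score *= coord
    return score

# Values taken from the explain tree for doc 5210 (term "y", freq=6.0).
score = explain_score(
    freq=6.0,
    doc_freq=976,
    max_docs=44218,
    query_norm=0.053245123,
    field_norm=0.0390625,
    coords=(0.5, 0.5),
)
print(f"{score:.6f}")
```

Lucene computes these values in single precision, so the last digits may differ slightly from the listing.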
Abstract

Zhang and Wolfram compute the discrimination value of a term as the difference between the centroid value of all terms in the corpus and that value without the term in question, and suggest that selection be made by comparing density changes with a visualization tool. The Distance Angle Retrieval Environment (DARE) visually projects a document or term space by presenting distance similarity on the X axis and angular similarity on the Y axis. Thus a document icon appearing close to the X axis is relevant to the reference points in terms of a distance similarity measure, while one close to the Y axis is relevant to the reference points in terms of an angle-based measure. Using 450 Associated Press news reports indexed by 44 distinct terms, the removal of the term "Yeltsin" causes the cluster to fall on the Y axis, indicating a good discriminator. For an angular measure such as cosine, movement along the X axis to the left signals good discrimination, while movement to the right signals poor discrimination. A term density space could also be used. Most terms are shown to be indifferent discriminators. Different measures result in different choices of good and poor discriminators, as does the use of a term space rather than a document space. The visualization approach is clearly feasible, and provides some additional insights not found in the computation of a discrimination value.
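As a rough illustration of the discrimination-value idea described in this abstract, the sketch below uses one common formulation: space density is the average cosine similarity between each document and the corpus centroid, and a term's discrimination value is the density without the term minus the density with it. The abstract does not give the exact formula, similarity measure, or sign convention, so all of these are assumptions, and the tiny corpus and function names are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length weight vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def space_density(docs):
    # Average similarity of every document to the corpus centroid.
    n_terms = len(docs[0])
    centroid = [sum(d[t] for d in docs) / len(docs) for t in range(n_terms)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)

def discrimination_value(docs, term):
    # Assumed convention: density after removing the term minus density before.
    # A positive value means the space becomes denser without the term,
    # i.e. the term was helping to keep documents apart.
    without = [[w for t, w in enumerate(d) if t != term] for d in docs]
    return space_density(without) - space_density(docs)

# Hypothetical corpus: term 0 occurs in every document, terms 1 and 2 in half.
docs = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
]

for term in range(3):
    print(term, round(discrimination_value(docs, term), 4))
```

Under this convention the term shared by all documents gets a negative value (poor discriminator), while the terms that split the corpus get positive values.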

Zhang, J.; Chen, Y.; Zhao, Y.; Wolfram, D.; Ma, F.: Public health and social media : a study of Zika virus-related posts on Yahoo! Answers (2020) 0.02

```
0.024084264 = product of:
  0.04816853 = sum of:
    0.04816853 = product of:
      0.09633706 = sum of:
        0.09633706 = weight(_text_:y in 5672) [ClassicSimilarity], result of:
          0.09633706 = score(doc=5672,freq=4.0), product of:
            0.25623685 = queryWeight, product of:
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.053245123 = queryNorm
            0.37596878 = fieldWeight in 5672, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5672)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Zhang, J.; Wolfram, D.; Wang, P.; Hong, Y.; Gillis, R.: Visualization of health-subject analysis based on query term co-occurrences (2008) 0.02

```
0.017030146 = product of:
  0.034060292 = sum of:
    0.034060292 = product of:
      0.068120584 = sum of:
        0.068120584 = weight(_text_:y in 2376) [ClassicSimilarity], result of:
          0.068120584 = score(doc=2376,freq=2.0), product of:
            0.25623685 = queryWeight, product of:
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.053245123 = queryNorm
            0.26585007 = fieldWeight in 2376, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2376)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Wolfram, D.; Zhang, J.: The influence of indexing practices and weighting algorithms on document spaces (2008) 0.01
```
0.012967757 = product of:
  0.025935514 = sum of:
    0.025935514 = product of:
      0.103742056 = sum of:
        0.103742056 = weight(_text_:authors in 1963) [ClassicSimilarity], result of:
          0.103742056 = score(doc=1963,freq=4.0), product of:
            0.24273461 = queryWeight, product of:
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.053245123 = queryNorm
            0.42738882 = fieldWeight in 1963, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.046875 = fieldNorm(doc=1963)
      0.25 = coord(1/4)
  0.5 = coord(1/2)
```
Abstract

Index modeling and computer simulation techniques are used to examine the influence of indexing frequency distributions, indexing exhaustivity distributions, and three weighting methods on hypothetical document spaces in a vector-based information retrieval (IR) system. The way documents are indexed plays an important role in retrieval. The authors demonstrate the influence of different indexing characteristics on document space density (DSD) changes and document space discriminative capacity for IR. Document environments that contain a relatively higher percentage of infrequently occurring terms provide lower density outcomes than do environments where a higher percentage of frequently occurring terms exists. Different indexing exhaustivity levels, however, have little influence on the document space densities. A weighting algorithm that favors higher weights for infrequently occurring terms results in the lowest overall document space densities, which allows documents to be more readily differentiated from one another. This in turn can positively influence IR. The authors also discuss the influence on outcomes using two methods of normalization of term weights (i.e., means and ranges) for the different weighting methods.

Zhang, J.; Wolfram, D.; Wang, P.: Analysis of query keywords of sports-related queries using visualization and clustering (2009) 0.01
```
0.0076413243 = product of:
  0.015282649 = sum of:
    0.015282649 = product of:
      0.061130594 = sum of:
        0.061130594 = weight(_text_:authors in 2947) [ClassicSimilarity], result of:
          0.061130594 = score(doc=2947,freq=2.0), product of:
            0.24273461 = queryWeight, product of:
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.053245123 = queryNorm
            0.25184128 = fieldWeight in 2947, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2947)
      0.25 = coord(1/4)
  0.5 = coord(1/2)
```
Abstract

The authors investigated 11 sports-related query keywords extracted from a public search engine query log to better understand sports-related information seeking on the Internet. After the query log contents were cleaned and query data were parsed, popular sports-related keywords were identified, along with frequently co-occurring query terms associated with the identified keywords. Relationships among each sports-related focus keyword and its related keywords were characterized and grouped using multidimensional scaling (MDS) in combination with traditional hierarchical clustering methods. The two approaches were synthesized in a visual context by highlighting the results of the hierarchical clustering analysis in the visual MDS configuration. Important events, people, subjects, merchandise, and so on related to a sport were illustrated, and relationships among the sports were analyzed. A small-scale comparative study of sports searches with and without term assistance was conducted. Searches that used search term assistance by relying on previous query term relationships outperformed the searches without the search term assistance. The findings of this study provide insights into sports information seeking behavior on the Internet. The developed method also may be applied to other query log subject areas.
