Search (6 results, page 1 of 1)

Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.02
```
0.01774153 = product of:
  0.07096612 = sum of:
    0.07096612 = sum of:
      0.040352322 = weight(_text_:methods in 1605) [ClassicSimilarity], result of:
        0.040352322 = score(doc=1605,freq=2.0), product of:
          0.18168657 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.045191016 = queryNorm
          0.22209854 = fieldWeight in 1605, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1605)
      0.030613795 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
        0.030613795 = score(doc=1605,freq=2.0), product of:
          0.15825124 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.045191016 = queryNorm
          0.19345059 = fieldWeight in 1605, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1605)
  0.25 = coord(1/4)
```
Abstract

Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.

Source

Journal of the Association for Information Science and Technology. 66(2015) no.1, pp. 13-22

Vaughan, L.: Statistical methods for the information professional : a practical, painless approach to understanding, using, and interpreting statistics (2001) 0.01

```
0.014123313 = product of:
  0.056493253 = sum of:
    0.056493253 = product of:
      0.112986505 = sum of:
        0.112986505 = weight(_text_:methods in 4684) [ClassicSimilarity], result of:
          0.112986505 = score(doc=4684,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.62187594 = fieldWeight in 4684, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.109375 = fieldNorm(doc=4684)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.01
```
0.0071333502 = product of:
  0.028533401 = sum of:
    0.028533401 = product of:
      0.057066802 = sum of:
        0.057066802 = weight(_text_:methods in 4279) [ClassicSimilarity], result of:
          0.057066802 = score(doc=4279,freq=4.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.31409478 = fieldWeight in 4279, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.
Leydesdorff, L.; Vaughan, L.: Co-occurrence matrices and their applications in information science : extending ACA to the Web environment (2006) 0.01
```
0.0071333502 = product of:
  0.028533401 = sum of:
    0.028533401 = product of:
      0.057066802 = sum of:
        0.057066802 = weight(_text_:methods in 6113) [ClassicSimilarity], result of:
          0.057066802 = score(doc=6113,freq=4.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.31409478 = fieldWeight in 6113, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6113)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Co-occurrence matrices, such as cocitation, coword, and colink matrices, have been used widely in the information sciences. However, confusion and controversy have hindered the proper statistical analysis of these data. The underlying problem, in our opinion, involved understanding the nature of various types of matrices. This article discusses the difference between a symmetrical cocitation matrix and an asymmetrical citation matrix as well as the appropriate statistical techniques that can be applied to each of these matrices, respectively. Similarity measures (such as the Pearson correlation coefficient or the cosine) should not be applied to the symmetrical cocitation matrix but can be applied to the asymmetrical citation matrix to derive the proximity matrix. The argument is illustrated with examples. The study then extends the application of co-occurrence matrices to the Web environment, in which the nature of the available data and thus data collection methods are different from those of traditional databases such as the Science Citation Index. A set of data collected with the Google Scholar search engine is analyzed by using both the traditional methods of multivariate analysis and the new visualization software Pajek, which is based on social network analysis and graph theory.
Romero-Frías, E.; Vaughan, L.: Exploring the relationships between media and political parties through web hyperlink analysis : the case of Spain (2012) 0.01
```
0.0060528484 = product of:
  0.024211394 = sum of:
    0.024211394 = product of:
      0.048422787 = sum of:
        0.048422787 = weight(_text_:methods in 239) [ClassicSimilarity], result of:
          0.048422787 = score(doc=239,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.26651827 = fieldWeight in 239, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.046875 = fieldNorm(doc=239)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The study focuses on the web presence of the main Spanish media and seeks to determine whether hyperlink analysis of media and political parties can provide insight into their political orientation. The research included all major national media and political parties in Spain. Inlink and co-link data about these organizations were collected and analyzed using multidimensional scaling (MDS) and other statistical methods. In the MDS map, media are clustered based on their political orientation. There are significantly more co-links between media and parties with the same political orientation than there are between those with different political orientations. Findings from the study suggest the potential of using link analysis to gain new insights into the interactions among media and political parties.
Ninkov, A.; Vaughan, L.: A webometric analysis of the online vaccination debate (2017) 0.01
```
0.0050440403 = product of:
  0.020176161 = sum of:
    0.020176161 = product of:
      0.040352322 = sum of:
        0.040352322 = weight(_text_:methods in 3605) [ClassicSimilarity], result of:
          0.040352322 = score(doc=3605,freq=2.0), product of:
            0.18168657 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.045191016 = queryNorm
            0.22209854 = fieldWeight in 3605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3605)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Webometrics research methods can be effectively used to measure and analyze information on the web. One topic discussed vehemently online that could benefit from this type of analysis is vaccines. We carried out a study analyzing the web presence of both sides of this debate. We collected a variety of webometric data and analyzed the data both quantitatively and qualitatively. The study found far more anti- than pro-vaccine web domains. The anti and pro sides had similar web visibility as measured by the number of links coming from general websites and Tweets. However, the links to the pro domains were of higher quality measured by PageRank scores. The result from the qualitative content analysis confirmed this finding. The analysis of site ages revealed that the battle between the two sides had a long history and is still ongoing. The web scene was polarized with either pro or anti views and little neutral ground. The study suggests ways that professional information can be promoted more effectively on the web. The study demonstrates that webometrics analysis is effective in studying online information dissemination. This kind of analysis can be used to study not only health information but other information as well.
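
The score breakdowns listed with each result follow the TF-IDF arithmetic of Lucene's ClassicSimilarity: each matching term contributes queryWeight × fieldWeight, the term weights are summed, and the sum is scaled by the coord factors. The sketch below reproduces that arithmetic for the first two results using only the values printed in their trees. The function names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption chosen because it reproduces the listed idf values; it is not taken from the search system's configuration.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed form: 1 + ln(maxDocs / (docFreq + 1)); it matches the listed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    # weight = queryWeight * fieldWeight, as in the explain trees.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


def final_score(weights: list[float], coords: list[float]) -> float:
    # Sum the matching term weights, then apply every coord factor.
    total = sum(weights)
    for coord in coords:
        total *= coord
    return total


QUERY_NORM = 0.045191016   # queryNorm, as listed in every tree
MAX_DOCS = 44218           # maxDocs, as listed in every tree

# Doc 1605: terms "methods" (docFreq=2156) and "22" (docFreq=3622), coord(1/4).
w_methods = term_weight(2.0, 2156, MAX_DOCS, QUERY_NORM, 0.0390625)
w_22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0390625)
score_1605 = final_score([w_methods, w_22], [0.25])

# Doc 4684: term "methods" only, coord(1/2) then coord(1/4).
w_4684 = term_weight(2.0, 2156, MAX_DOCS, QUERY_NORM, 0.109375)
score_4684 = final_score([w_4684], [0.5, 0.25])

print(f"doc 1605: {score_1605:.8f} (listed: 0.01774153)")
print(f"doc 4684: {score_4684:.8f} (listed: 0.014123313)")
```

Small differences in the last digits are expected, since the listed values were computed in single precision while the sketch uses double precision. The comparison also shows why doc 4684 outranks the longer records on the same term: its fieldNorm (0.109375) is larger, which in ClassicSimilarity corresponds to a shorter indexed field.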
