Search (53 results, page 1 of 3)

  • author_ss:"Thelwall, M."
  • language_ss:"e"
  • type_ss:"a"
  1. Thelwall, M.; Wilkinson, D.: Finding similar academic Web sites with links, bibliometric couplings and colinks (2004) 0.02
    0.015666807 = product of:
      0.054833822 = sum of:
        0.032765217 = weight(_text_:retrieval in 2571) [ClassicSimilarity], result of:
          0.032765217 = score(doc=2571,freq=4.0), product of:
            0.11553899 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03819578 = queryNorm
            0.2835858 = fieldWeight in 2571, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2571)
        0.022068607 = weight(_text_:internet in 2571) [ClassicSimilarity], result of:
          0.022068607 = score(doc=2571,freq=2.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.1957077 = fieldWeight in 2571, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.046875 = fieldNorm(doc=2571)
      0.2857143 = coord(2/7)
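The scoring tree above is standard Lucene ClassicSimilarity explain output. As a minimal sketch of how its terms compose (all constants copied from the tree for doc 2571, not recomputed from a corpus), each term contributes queryWeight × fieldWeight, the weights are summed, and the sum is scaled by the coordination factor:

```python
import math

# Constants copied from the explain tree for result 1 (doc 2571).
def term_score(freq, idf, query_norm, field_norm):
    """(idf * queryNorm) * (sqrt(freq) * idf * fieldNorm),
    i.e. queryWeight * fieldWeight in Lucene ClassicSimilarity."""
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight

QUERY_NORM = 0.03819578
w_retrieval = term_score(4.0, 3.024915, QUERY_NORM, 0.046875)   # ~0.0327652
w_internet = term_score(2.0, 2.9522398, QUERY_NORM, 0.046875)   # ~0.0220686

# coord(2/7): two of seven query terms matched.
score = (w_retrieval + w_internet) * (2 / 7)                    # ~0.0156668
```

The reproduced value matches the 0.015666807 shown at the head of the tree.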
    
    Abstract
    A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage, in terms of a much larger number of pairs of sites being connected by these measures, rather than increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
    Theme
    Internet
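The three link-based similarity signals compared in the abstract of result 1 reduce to set intersections. A minimal sketch (the function names and toy URL sets are illustrative assumptions, not the paper's dataset):

```python
def colink_count(inlinks_a, inlinks_b):
    """Colinks: third-party pages that link to BOTH sites."""
    return len(inlinks_a & inlinks_b)

def coupling_count(outlinks_a, outlinks_b):
    """Couplings: link targets shared by both sites
    (the Web analogue of bibliographic coupling)."""
    return len(outlinks_a & outlinks_b)

# (The third signal, direct links, is simply the count of pages of
# one site that link into the other.)
a_in, b_in = {"p1", "p2", "p3"}, {"p2", "p3", "p4"}   # who links to each site
a_out, b_out = {"t1", "t2"}, {"t2", "t3"}             # where each site links
# colink_count(a_in, b_in) -> 2, coupling_count(a_out, b_out) -> 1
```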
  2. Thelwall, M.; Sud, P.: ¬A comparison of methods for collecting web citation data for academic organizations (2011) 0.01
    0.013055672 = product of:
      0.04569485 = sum of:
        0.027304346 = weight(_text_:retrieval in 4626) [ClassicSimilarity], result of:
          0.027304346 = score(doc=4626,freq=4.0), product of:
            0.11553899 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03819578 = queryNorm
            0.23632148 = fieldWeight in 4626, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4626)
        0.018390507 = weight(_text_:internet in 4626) [ClassicSimilarity], result of:
          0.018390507 = score(doc=4626,freq=2.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.16308975 = fieldWeight in 4626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4626)
      0.2857143 = coord(2/7)
    
    Abstract
    The primary webometric method for estimating the online impact of an organization is to count links to its website. Link counts have been available from commercial search engines for over a decade but this was set to end by early 2012 and so a replacement is needed. This article compares link counts to two alternative methods: URL citations and organization title mentions. New variations of these methods are also introduced. The three methods are compared against each other using Yahoo!. Two of the three methods (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy.
    Theme
    Internet
  3. Thelwall, M.: Directing students to new information types : a new role for Google in literature searches? (2005) 0.01
    0.0108375205 = product of:
      0.03793132 = sum of:
        0.025746709 = weight(_text_:internet in 364) [ClassicSimilarity], result of:
          0.025746709 = score(doc=364,freq=2.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.22832564 = fieldWeight in 364, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.0546875 = fieldNorm(doc=364)
        0.0121846115 = product of:
          0.036553834 = sum of:
            0.036553834 = weight(_text_:29 in 364) [ClassicSimilarity], result of:
              0.036553834 = score(doc=364,freq=2.0), product of:
                0.13436082 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03819578 = queryNorm
                0.27205724 = fieldWeight in 364, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=364)
          0.33333334 = coord(1/3)
      0.2857143 = coord(2/7)
    
    Date
    3. 6.2007 16:37:29
    Series
    Internet reference services quarterly. 10(2005) nos.3/4
  4. Vaughan, L.; Thelwall, M.: Scholarly use of the Web : what are the key inducers of links to journal Web sites? (2003) 0.01
    0.009917542 = product of:
      0.0347114 = sum of:
        0.026008105 = weight(_text_:internet in 1236) [ClassicSimilarity], result of:
          0.026008105 = score(doc=1236,freq=4.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.23064373 = fieldWeight in 1236, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1236)
        0.008703294 = product of:
          0.026109882 = sum of:
            0.026109882 = weight(_text_:29 in 1236) [ClassicSimilarity], result of:
              0.026109882 = score(doc=1236,freq=2.0), product of:
                0.13436082 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03819578 = queryNorm
                0.19432661 = fieldWeight in 1236, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1236)
          0.33333334 = coord(1/3)
      0.2857143 = coord(2/7)
    
    Abstract
    Web links have been studied by information scientists for at least six years but it is only in the past two that clear evidence has emerged to show that counts of links to scholarly Web spaces (universities and departments) can correlate significantly with research measures, giving some credence to their use for the investigation of scholarly communication. This paper reports on a study to investigate the factors that influence the creation of links to journal Web sites. An empirical approach is used: collecting data and testing for significant patterns. The specific questions addressed are whether site age and site content are inducers of links to a journal's Web site, as measured by the ratio of link counts to Journal Impact Factors, two variables previously discovered to be related. A new methodology for data collection is also introduced that uses the Internet Archive to obtain an earliest known creation date for Web sites. The results show that both site age and site content are significant factors for the disciplines studied: library and information science, and law. Comparisons between the two fields also show disciplinary differences in Web site characteristics. Scholars and publishers should be particularly aware that richer content on a journal's Web site tends to generate links and thus traffic to the site.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.1, S.29-38
    Theme
    Internet
  5. Thelwall, M.; Buckley, K.; Paltoglou, G.: Sentiment in Twitter events (2011) 0.01
    0.00926246 = product of:
      0.03241861 = sum of:
        0.022068607 = weight(_text_:internet in 4345) [ClassicSimilarity], result of:
          0.022068607 = score(doc=4345,freq=2.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.1957077 = fieldWeight in 4345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.046875 = fieldNorm(doc=4345)
        0.010350002 = product of:
          0.031050006 = sum of:
            0.031050006 = weight(_text_:22 in 4345) [ClassicSimilarity], result of:
              0.031050006 = score(doc=4345,freq=2.0), product of:
                0.13375512 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03819578 = queryNorm
                0.23214069 = fieldWeight in 4345, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4345)
          0.33333334 = coord(1/3)
      0.2857143 = coord(2/7)
    
    Date
    22. 1.2011 14:27:06
    Theme
    Internet
  6. Thelwall, M.; Prabowo, R.; Fairclough, R.: Are raw RSS feeds suitable for broad issue scanning? : a science concern case study (2006) 0.01
    0.0077410867 = product of:
      0.027093802 = sum of:
        0.018390507 = weight(_text_:internet in 6116) [ClassicSimilarity], result of:
          0.018390507 = score(doc=6116,freq=2.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.16308975 = fieldWeight in 6116, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6116)
        0.008703294 = product of:
          0.026109882 = sum of:
            0.026109882 = weight(_text_:29 in 6116) [ClassicSimilarity], result of:
              0.026109882 = score(doc=6116,freq=2.0), product of:
                0.13436082 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03819578 = queryNorm
                0.19432661 = fieldWeight in 6116, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6116)
          0.33333334 = coord(1/3)
      0.2857143 = coord(2/7)
    
    Abstract
    Broad issue scanning is the task of identifying important public debates arising in a given broad issue; really simple syndication (RSS) feeds are a natural information source for investigating broad issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: Monitoring a large number may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). An attempt to identify genuine science concern debates from the corpus through investigating the top 1,000 "burst" words found only two genuine debates, however. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue.
    Date
    21.10.2006 19:29:49
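The burst-identification idea in the abstract of result 6 can be sketched as flagging words whose daily frequency jumps relative to a trailing baseline. This is a hedged illustration only: the paper's exact coword frequency method, thresholds, and windowing are not given here, so all parameters below are assumptions.

```python
from collections import Counter

def burst_words(daily_tokens, day, window=7, ratio=3.0, min_count=5):
    """daily_tokens: list of token lists, one per day. Flag words on `day`
    whose count is at least `ratio` times their mean daily count over the
    preceding `window` days (and at least `min_count` in absolute terms)."""
    today = Counter(daily_tokens[day])
    history = daily_tokens[day - window:day]
    base = Counter()
    for tokens in history:
        base.update(tokens)
    bursts = []
    for word, count in today.items():
        baseline = base[word] / max(len(history), 1)
        if count >= min_count and count >= ratio * max(baseline, 1.0):
            bursts.append(word)
    return sorted(bursts)

# Seven quiet days, then a spike in mentions on day 7:
daily = [["science"]] * 7 + [["science"] * 6 + ["gm"] * 5]
# burst_words(daily, 7) -> ['gm', 'science']
```

A real pipeline would also need the data cleansing the abstract calls for, since a few pathological feeds can dominate raw counts.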
  7. Shifman, L.; Thelwall, M.: Assessing global diffusion with Web memetics : the spread and evolution of a popular joke (2009) 0.01
    0.0054605645 = product of:
      0.03822395 = sum of:
        0.03822395 = weight(_text_:internet in 3303) [ClassicSimilarity], result of:
          0.03822395 = score(doc=3303,freq=6.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.33897567 = fieldWeight in 3303, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.046875 = fieldNorm(doc=3303)
      0.14285715 = coord(1/7)
    
    Abstract
    Memes are small units of culture, analogous to genes, which flow from person to person by copying or imitation. More than any previous medium, the Internet has the technical capabilities for global meme diffusion. Yet, to spread globally, memes need to negotiate their way through cultural and linguistic borders. This article introduces a new broad method, Web memetics, comprising extensive Web searches and combined quantitative and qualitative analyses, to identify and assess: (a) the different versions of a meme, (b) its evolution online, and (c) its Web presence and translation into common Internet languages. This method is demonstrated through one extensively circulated joke about men, women, and computers. The results show that the joke has mutated into several different versions and is widely translated, and that translations incorporate small, local adaptations while retaining the English versions' fundamental components. In conclusion, Web memetics has demonstrated its ability to identify and track the evolution and spread of memes online, with interesting results, albeit for only one case study.
    Theme
    Internet
  8. Thelwall, M.; Buckley, K.; Paltoglou, G.; Cai, D.; Kappas, A.: Sentiment strength detection in short informal text (2010) 0.00
    0.004950942 = product of:
      0.03465659 = sum of:
        0.03465659 = product of:
          0.051984888 = sum of:
            0.026109882 = weight(_text_:29 in 4200) [ClassicSimilarity], result of:
              0.026109882 = score(doc=4200,freq=2.0), product of:
                0.13436082 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03819578 = queryNorm
                0.19432661 = fieldWeight in 4200, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4200)
            0.025875006 = weight(_text_:22 in 4200) [ClassicSimilarity], result of:
              0.025875006 = score(doc=4200,freq=2.0), product of:
                0.13375512 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03819578 = queryNorm
                0.19345059 = fieldWeight in 4200, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4200)
          0.6666667 = coord(2/3)
      0.14285715 = coord(1/7)
    
    Date
    22. 1.2011 14:29:23
  9. Thelwall, M.; Thelwall, S.: ¬A thematic analysis of highly retweeted early COVID-19 tweets : consensus, information, dissent and lockdown life (2020) 0.00
    0.004950942 = product of:
      0.03465659 = sum of:
        0.03465659 = product of:
          0.051984888 = sum of:
            0.026109882 = weight(_text_:29 in 178) [ClassicSimilarity], result of:
              0.026109882 = score(doc=178,freq=2.0), product of:
                0.13436082 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03819578 = queryNorm
                0.19432661 = fieldWeight in 178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=178)
            0.025875006 = weight(_text_:22 in 178) [ClassicSimilarity], result of:
              0.025875006 = score(doc=178,freq=2.0), product of:
                0.13375512 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03819578 = queryNorm
                0.19345059 = fieldWeight in 178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=178)
          0.6666667 = coord(2/3)
      0.14285715 = coord(1/7)
    
    Abstract
    Purpose Public attitudes towards COVID-19 and social distancing are critical in reducing its spread. It is therefore important to understand public reactions and information dissemination in all major forms, including on social media. This article investigates important issues reflected on Twitter in the early stages of the public reaction to COVID-19. Design/methodology/approach A thematic analysis of the most retweeted English-language tweets mentioning COVID-19 during March 10-29, 2020. Findings The main themes identified for the 87 qualifying tweets accounting for 14 million retweets were: lockdown life; attitude towards social restrictions; politics; safety messages; people with COVID-19; support for key workers; work; and COVID-19 facts/news. Research limitations/implications Twitter played many positive roles, mainly through unofficial tweets. Users shared social distancing information, helped build support for social distancing, criticised government responses, expressed support for key workers and helped each other cope with social isolation. A few popular tweets not supporting social distancing show that government messages sometimes failed. Practical implications Public health campaigns in future may consider encouraging grass roots social web activity to support campaign goals. At a methodological level, analysing retweet counts emphasised politics and ignored practical implementation issues. Originality/value This is the first qualitative analysis of general COVID-19-related retweeting.
    Date
    20. 1.2015 18:30:22
  10. Thelwall, M.: Assessing web search engines : a webometric approach (2011) 0.00
    0.0046807453 = product of:
      0.032765217 = sum of:
        0.032765217 = weight(_text_:retrieval in 10) [ClassicSimilarity], result of:
          0.032765217 = score(doc=10,freq=4.0), product of:
            0.11553899 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03819578 = queryNorm
            0.2835858 = fieldWeight in 10, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=10)
      0.14285715 = coord(1/7)
    
    Abstract
    Information Retrieval (IR) research typically evaluates search systems in terms of the standard precision and recall measures, together with F-measures that weight the relative importance of precision and recall (e.g. van Rijsbergen, 1979). All of these assess the extent to which the system returns good matches for a query. In contrast, webometric measures are designed specifically for web search engines: they monitor changes in results over time and various aspects of the internal logic of the way in which search engines select the results to be returned. This chapter introduces a range of webometric measurements and illustrates them with case studies of Google, Bing and Yahoo! This is a very fertile area for simple and complex new investigations into search engine results.
    Source
    Innovations in information retrieval: perspectives for theory and practice. Eds.: A. Foster, u. P. Rafferty
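The precision/recall/F-measure evaluation framework mentioned in the abstract of result 10 can be made concrete. A minimal sketch of van Rijsbergen's F-measure (the example figures are invented for illustration):

```python
def f_measure(precision, recall, beta=1.0):
    """van Rijsbergen's F-measure: beta > 1 favours recall,
    beta < 1 favours precision, beta = 1 is the harmonic mean."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A system returning 8 relevant pages in its top 10 (P = 0.8) when
# 16 relevant pages exist in the collection (R = 0.5):
f1 = f_measure(0.8, 0.5)  # 2PR/(P+R) ~= 0.615
```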
  11. Thelwall, M.: ¬A comparison of sources of links for academic Web impact factor calculations (2002) 0.00
    0.004458532 = product of:
      0.031209724 = sum of:
        0.031209724 = weight(_text_:internet in 4474) [ClassicSimilarity], result of:
          0.031209724 = score(doc=4474,freq=4.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.27677247 = fieldWeight in 4474, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.046875 = fieldNorm(doc=4474)
      0.14285715 = coord(1/7)
    
    Abstract
    There has been much recent interest in extracting information from collections of Web links. One tool that has been used is Ingwersen's Web impact factor. It has been demonstrated that several versions of this metric can produce results that correlate with research ratings of British universities showing that, despite being a measure of a purely Internet phenomenon, the results are susceptible to a wider interpretation. This paper addresses the question of which is the best possible domain to count backlinks from, if research is the focus of interest. WIFs for British universities calculated from several different source domains are compared, primarily the .edu, .ac.uk and .uk domains, and the entire Web. The results show that all four areas produce WIFs that correlate strongly with research ratings, but that none produce incontestably superior figures. It was also found that the WIF was less able to differentiate in more homogeneous subsets of universities, although positive results are still possible.
    Theme
    Internet
  12. Thelwall, M.; Vaughan, L.: Webometrics : an introduction to the special issue (2004) 0.00
    0.0042035445 = product of:
      0.02942481 = sum of:
        0.02942481 = weight(_text_:internet in 2908) [ClassicSimilarity], result of:
          0.02942481 = score(doc=2908,freq=2.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.2609436 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.0625 = fieldNorm(doc=2908)
      0.14285715 = coord(1/7)
    
    Theme
    Internet
  13. Thelwall, M.: ¬A layered approach for investigating the topological structure of communities in the Web (2003) 0.00
    0.003900621 = product of:
      0.027304346 = sum of:
        0.027304346 = weight(_text_:retrieval in 4450) [ClassicSimilarity], result of:
          0.027304346 = score(doc=4450,freq=4.0), product of:
            0.11553899 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03819578 = queryNorm
            0.23632148 = fieldWeight in 4450, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4450)
      0.14285715 = coord(1/7)
    
    Abstract
    A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully-automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified was dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country's universities.
  14. Thelwall, M.: Extracting macroscopic information from Web links (2001) 0.00
    0.0037154437 = product of:
      0.026008105 = sum of:
        0.026008105 = weight(_text_:internet in 6851) [ClassicSimilarity], result of:
          0.026008105 = score(doc=6851,freq=4.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.23064373 = fieldWeight in 6851, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6851)
      0.14285715 = coord(1/7)
    
    Abstract
    Much has been written about the potential and pitfalls of macroscopic Web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the Web that correlate with phenomena external to the Internet. This article attempts to provide such evidence through an evaluation of Ingwersen's (1998) proposed external Web Impact Factor (WIF) for the original use of the Web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of Web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications
    Theme
    Internet
  15. Thelwall, M.: Results from a web impact factor crawler (2001) 0.00
    0.0037154437 = product of:
      0.026008105 = sum of:
        0.026008105 = weight(_text_:internet in 4490) [ClassicSimilarity], result of:
          0.026008105 = score(doc=4490,freq=4.0), product of:
            0.11276311 = queryWeight, product of:
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.03819578 = queryNorm
            0.23064373 = fieldWeight in 4490, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.9522398 = idf(docFreq=6276, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4490)
      0.14285715 = coord(1/7)
    
    Abstract
    Web impact factors, the proposed web equivalent of impact factors for journals, can be calculated by using search engines. It has been found that the results are problematic because of the variable coverage of search engines as well as their ability to give significantly different results over short periods of time. The fundamental problem is that although some search engines provide a functionality that is capable of being used for impact calculations, this is not their primary task and therefore they do not give guarantees as to performance in this respect. In this paper, a bespoke web crawler designed specifically for the calculation of reliable WIFs is presented. This crawler was used to calculate WIFs for a number of UK universities, and the results of these calculations are discussed. The principal findings were that with certain restrictions, WIFs can be calculated reliably, but do not correlate with accepted research rankings owing to the variety of material hosted on university servers. Changes to the calculations to improve the fit of the results to research rankings are proposed, but there are still inherent problems undermining the reliability of the calculation. These problems still apply if the WIF scores are taken on their own as indicators of the general impact of any area of the Internet, but with care would not apply to online journals.
    Theme
    Internet
  16. Payne, N.; Thelwall, M.: Mathematical models for academic webs : linear relationship or non-linear power law? (2005) 0.00
    Theme
    Internet
  17. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.00
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method, and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text matching techniques in order to achieve the quality of information retrieval results provided by Google.
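    The finding that internal links dominate the top ranks can be reproduced with a plain power-iteration PageRank on a toy graph; this is a minimal sketch of the standard algorithm, not the paper's implementation, and the tiny graph is invented for illustration:

    ```python
    def pagerank(links, damping=0.85, iterations=50):
        """Plain PageRank by power iteration over an adjacency dict
        mapping each page to the pages it links to."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / n for p in pages}
            for p, outs in links.items():
                if outs:
                    share = damping * rank[p] / len(outs)
                    for q in outs:
                        if q in new:
                            new[q] += share
                else:
                    # Dangling page: spread its rank over all pages.
                    for q in pages:
                        new[q] += damping * rank[p] / n
            rank = new
        return rank

    # Tiny illustrative site: navigation pages all link back to the
    # home page, so internal links alone push "home" to the top rank.
    graph = {
        "home": ["dept", "staff"],
        "dept": ["home"],
        "staff": ["home"],
    }
    ranks = pagerank(graph)
    top = max(ranks, key=ranks.get)  # "home" dominates via internal links
    ```

    Here the home page tops the ranking purely because of internal navigation links, mirroring the "superficial reasons" for high scores noted in the abstract.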
  18. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.00
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms based on these alternatives were used to rank four sets of Web pages, and the ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites, but did not work well in ranking pages from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
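    The aggregation step behind the alternative document models can be sketched by collapsing a page-level link graph into a domain-level one before ranking. This is a minimal illustration of the idea, assuming links between pages are given as URL lists; the sample URLs are invented:

    ```python
    from urllib.parse import urlparse

    def aggregate_by_domain(page_links):
        """Collapse a page-level link graph into a domain-level graph,
        discarding links between pages of the same domain."""
        domains = {}
        for src_page, targets in page_links.items():
            src = urlparse(src_page).netloc
            domains.setdefault(src, set())
            for target_page in targets:
                dst = urlparse(target_page).netloc
                domains.setdefault(dst, set())
                if dst != src:
                    domains[src].add(dst)
        return domains

    pages = {
        "http://a.ac.uk/p1": ["http://a.ac.uk/p2", "http://b.ac.uk/x"],
        "http://a.ac.uk/p2": ["http://b.ac.uk/y"],
        "http://b.ac.uk/x": ["http://a.ac.uk/p1"],
    }
    graph = aggregate_by_domain(pages)
    # Multiple page-level links between two domains collapse to one edge,
    # and internal a.ac.uk links disappear entirely.
    ```

    A PageRank-style algorithm can then be run on `graph` instead of the raw page graph, which is the essence of the domain document model described above.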
  19. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.00
    Theme
    Internet
  20. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.00
    Abstract
    The nature of the contents of academic Web sites is of direct relevance to the new field of scientific Web intelligence, and for search engine and topic-specific crawler designers. We analyze word frequencies in national academic Webs using the Web sites of three English-speaking nations: Australia, New Zealand, and the United Kingdom. Strong regularities were found in page size and word frequency distributions, but with significant anomalies. At least 26% of pages contain no words. High frequency words include university names and acronyms, Internet terminology, and computing product names: not always words in common usage away from the Web. A minority of low frequency words are spelling mistakes, with other common types including nonwords, proper names, foreign-language terms, and computer science variable names. Based upon these findings, recommendations for data cleansing and filtering are made, particularly for clustering applications.
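    The kind of frequency counting and filtering recommended above can be sketched in a few lines. This is a simplified illustration under assumed rules (alphabetic tokens of two or more characters count as words), not the paper's actual cleansing pipeline, and the sample texts are invented:

    ```python
    import re
    from collections import Counter

    def word_frequencies(pages):
        """Count lowercase word frequencies across page texts,
        keeping only alphabetic tokens of two or more characters,
        and tally pages with no extractable words."""
        counts = Counter()
        empty_pages = 0
        for text in pages:
            words = [w for w in re.findall(r"[a-z]+", text.lower())
                     if len(w) >= 2]
            if not words:
                empty_pages += 1  # e.g. image-only or script-only pages
            counts.update(words)
        return counts, empty_pages

    sample = [
        "University of Wolverhampton home page",
        "",                        # a page with no extractable words
        "university news and university events",
    ]
    freqs, empties = word_frequencies(sample)
    # freqs["university"] == 3; one page contributed no words at all
    ```

    Extending the filter with a dictionary lookup would separate the nonwords, proper names, and variable names the abstract identifies among low-frequency terms.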