Search (113 results, page 2 of 6)

  • author_ss:"Thelwall, M."
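Every hit in the list below carries Lucene explain output for its single matching query term, `information`, scored with ClassicSimilarity (TF-IDF). The score after each title is that value rounded to two decimals, which is why every hit shows 0.00. The following minimal Python sketch recomputes those scores from the printed factors. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the idf form that reproduces the printed 1.7554779. queryNorm and fieldNorm are copied from the output rather than derived, and the helper names (`idf`, `tf`, `explain`) are illustrative, not Lucene's API.

```python
import math

# Query-level constants copied from the explain output on this page.
MAX_DOCS = 44218           # maxDocs
DOC_FREQ = 20772           # docFreq of the term "information"
QUERY_NORM = 0.050415643   # depends on every query clause, so taken as given


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency in the form that reproduces the printed idf."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """ClassicSimilarity term frequency: the square root of the raw count."""
    return math.sqrt(freq)


def explain(freq: float, field_norm: float, matched: int = 1, clauses: int = 4) -> float:
    """Recompute one hit's score as coord * queryWeight * fieldWeight."""
    term_idf = idf(DOC_FREQ, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM             # "queryWeight, product of"
    field_weight = tf(freq) * term_idf * field_norm  # "fieldWeight, product of"
    coord = matched / clauses                        # "coord(1/4)": 1 of 4 clauses matched
    return coord * query_weight * field_weight


if __name__ == "__main__":
    # (freq, fieldNorm) pairs taken from hits 1, 4, 6 and 15 on this page
    for freq, norm in [(6.0, 0.046875), (8.0, 0.0390625), (4.0, 0.0546875), (4.0, 0.0390625)]:
        print(f"freq={freq:g} fieldNorm={norm}: {explain(freq, norm):.10f}")
```

Run as a script, it prints one score per (freq, fieldNorm) pair. These should agree with the top-level values for hits 1, 4, 6 and 15 to about six significant figures, since Lucene computes in 32-bit floats. The sketch also shows why hits with the same term frequency can rank differently: between hits 6 and 15 only fieldNorm varies, and that factor shrinks as the field gets longer.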
  1. Thelwall, M.; Wilkinson, D.: Finding similar academic Web sites with links, bibliometric couplings and colinks (2004) 0.00
    0.0044597755 = product of:
      0.017839102 = sum of:
        0.017839102 = weight(_text_:information in 2571) [ClassicSimilarity], result of:
          0.017839102 = score(doc=2571,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.20156369 = fieldWeight in 2571, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2571)
      0.25 = coord(1/4)
    
    Abstract
    A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a slightly increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage in terms of a much larger number of pairs of sites being connected by these measures, instead of increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
    Source
    Information processing and management. 40(2004) no.3, pp.515-526
  2. Thelwall, M.; Vann, K.; Fairclough, R.: Web issue analysis : an integrated water resource management case study (2006) 0.00
    0.0044597755 = product of:
      0.017839102 = sum of:
        0.017839102 = weight(_text_:information in 5906) [ClassicSimilarity], result of:
          0.017839102 = score(doc=5906,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.20156369 = fieldWeight in 5906, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=5906)
      0.25 = coord(1/4)
    
    Abstract
    In this article Web issue analysis is introduced as a new technique to investigate an issue as reflected on the Web. The issue chosen, integrated water resource management (IWRM), is a United Nations-initiated paradigm for managing water resources in an international context, particularly in developing nations. As with many international governmental initiatives, there is a considerable body of online information about it: 41,381 hypertext markup language (HTML) pages and 28,735 PDF documents mentioning the issue were downloaded. A page uniform resource locator (URL) and link analysis revealed the international and sectoral spread of IWRM. A noun and noun phrase occurrence analysis was used to identify the issues most commonly discussed, revealing some unexpected topics such as private sector and economic growth. Although the complexity of the methods required to produce meaningful statistics from the data is disadvantageous to easy interpretation, it was still possible to produce data that could be subject to a reasonably intuitive interpretation. Hence Web issue analysis is claimed to be a useful new technique for information science.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.10, pp.1303-1314
  3. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.00
    0.0044597755 = product of:
      0.017839102 = sum of:
        0.017839102 = weight(_text_:information in 674) [ClassicSimilarity], result of:
          0.017839102 = score(doc=674,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.20156369 = fieldWeight in 674, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
      0.25 = coord(1/4)
    
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it did not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
  4. Kousha, K.; Thelwall, M.: Google book search : citation analysis for social science and the humanities (2009) 0.00
    0.0042914203 = product of:
      0.017165681 = sum of:
        0.017165681 = weight(_text_:information in 2946) [ClassicSimilarity], result of:
          0.017165681 = score(doc=2946,freq=8.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.19395474 = fieldWeight in 2946, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2946)
      0.25 = coord(1/4)
    
    Abstract
    In both the social sciences and the humanities, books and monographs play significant roles in research communication. The absence of citations from most books and monographs from the Thomson Reuters/Institute for Scientific Information databases (ISI) has been criticized, but attempts to include citations from or to books in the research evaluation of the social sciences and humanities have not led to widespread adoption. This article assesses whether Google Book Search (GBS) can partially fill this gap by comparing citations from books with citations from journal articles to journal articles in 10 science, social science, and humanities disciplines. Book citations were 31% to 212% of ISI citations and, hence, numerous enough to supplement ISI citations in the social sciences and humanities covered, but not in the sciences (3%-5%), except for computing (46%), due to numerous published conference proceedings. A case study was also made of all 1,923 articles in the 51 information science and library science ISI-indexed journals published in 2003. Within this set, highly book-cited articles tended to receive many ISI citations, indicating a significant relationship between the two types of citation data, but with important exceptions that point to the additional information provided by book citations. In summary, GBS is clearly a valuable new source of citation data for the social sciences and humanities. One practical implication is that book-oriented scholars should consult it for additional citations to their work when applying for promotion and tenure.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.8, pp.1537-1549
  5. Kousha, K.; Thelwall, M.: News stories as evidence for research? : BBC citations from articles, Books, and Wikipedia (2017) 0.00
    0.0042914203 = product of:
      0.017165681 = sum of:
        0.017165681 = weight(_text_:information in 3760) [ClassicSimilarity], result of:
          0.017165681 = score(doc=3760,freq=8.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.19395474 = fieldWeight in 3760, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3760)
      0.25 = coord(1/4)
    
    Abstract
    Although news stories target the general public and are sometimes inaccurate, they can serve as sources of real-world information for researchers. This article investigates the extent to which academics exploit journalism using content and citation analyses of online BBC News stories cited by Scopus articles. A total of 27,234 Scopus-indexed publications have cited at least one BBC News story, with a steady annual increase. Citations from the arts and humanities (2.8% of publications in 2015) and social sciences (1.5%) were more likely than citations from medicine (0.1%) and science (<0.1%). Surprisingly, half of the sampled Scopus-cited science and technology (53%) and medicine and health (47%) stories were based on academic research, rather than otherwise unpublished information, suggesting that researchers have chosen a lower-quality secondary source for their citations. Nevertheless, the BBC News stories that were most frequently cited by Scopus, Google Books, and Wikipedia introduced new information from many different topics, including politics, business, economics, statistics, and reports about events. Thus, news stories are mediating real-world knowledge into the academic domain, a potential cause for concern.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.8, pp.2017-2028
  6. Thelwall, M.: Bibliometrics to webometrics (2009) 0.00
    0.00424829 = product of:
      0.01699316 = sum of:
        0.01699316 = weight(_text_:information in 4239) [ClassicSimilarity], result of:
          0.01699316 = score(doc=4239,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1920054 = fieldWeight in 4239, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4239)
      0.25 = coord(1/4)
    
    Abstract
    Bibliometrics has changed out of all recognition since 1958, becoming established as a field, being taught widely in library and information science schools, and being at the core of a number of science evaluation research groups around the world. This was all made possible by the work of Eugene Garfield and his Science Citation Index. This article reviews the distance that bibliometrics has travelled since 1958 by comparing early bibliometrics with current practice, and by giving an overview of a range of recent developments, such as patent analysis, national research evaluation exercises, visualization techniques, new applications, online citation indexes, and the creation of digital libraries. Webometrics, a modern, fast-growing offshoot of bibliometrics, is reviewed in detail. Finally, future prospects are discussed with regard to both bibliometrics and webometrics.
    Source
    Information science in transition, Ed.: A. Gilchrist
  7. Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.00
    0.00424829 = product of:
      0.01699316 = sum of:
        0.01699316 = weight(_text_:information in 6098) [ClassicSimilarity], result of:
          0.01699316 = score(doc=6098,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.1920054 = fieldWeight in 6098, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6098)
      0.25 = coord(1/4)
    
    Abstract
    Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.13, pp.1771-1779
  8. Vaughan, L.; Thelwall, M.: Scholarly use of the Web : what are the key inducers of links to journal Web sites? (2003) 0.00
    0.0037164795 = product of:
      0.014865918 = sum of:
        0.014865918 = weight(_text_:information in 1236) [ClassicSimilarity], result of:
          0.014865918 = score(doc=1236,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16796975 = fieldWeight in 1236, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1236)
      0.25 = coord(1/4)
    
    Abstract
    Web links have been studied by information scientists for at least six years but it is only in the past two that clear evidence has emerged to show that counts of links to scholarly Web spaces (universities and departments) can correlate significantly with research measures, giving some credence to their use for the investigation of scholarly communication. This paper reports a study to investigate the factors that influence the creation of links to journal Web sites. An empirical approach is used: collecting data and testing for significant patterns. The specific questions addressed are whether site age and site content are inducers of links to a journal's Web site as measured by the ratio of link counts to Journal Impact Factors, two variables previously discovered to be related. A new methodology for data collection is also introduced that uses the Internet Archive to obtain an earliest known creation date for Web sites. The results show that both site age and site content are significant factors for the disciplines studied: library and information science, and law. Comparisons between the two fields also show disciplinary differences in Web site characteristics. Scholars and publishers should be particularly aware that richer content on a journal's Web site tends to generate links and thus traffic to the site.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.1, pp.29-38
  9. Barjak, F.; Thelwall, M.: A statistical analysis of the web presences of European life sciences research teams (2008) 0.00
    0.0037164795 = product of:
      0.014865918 = sum of:
        0.014865918 = weight(_text_:information in 1383) [ClassicSimilarity], result of:
          0.014865918 = score(doc=1383,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16796975 = fieldWeight in 1383, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1383)
      0.25 = coord(1/4)
    
    Abstract
    Web links have been used for around ten years to explore the online impact of academic information and information producers. Nevertheless, few studies have attempted to relate link counts to relevant offline attributes of the owners of the targeted Web sites, with the exception of research productivity. This article reports the results of a study to relate site inlink counts to relevant owner characteristics for over 400 European life-science research group Web sites. The analysis confirmed that research-group size and Web-presence size were important for attracting Web links, although research productivity was not. Little evidence was found for significant influence of any of an array of factors, including research-group leader gender and industry connections. In addition, the choice of search engine for link data created a surprising international difference in the results, with Google perhaps giving unreliable results. Overall, the data collection, statistical analysis and results interpretation were all complex and it seems that we still need to know more about search engines, hyperlinks, and their function in science before we can draw conclusions on their usefulness and role in the canon of science and technology indicators.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.4, pp.628-643
  10. Mohammadi, E.; Thelwall, M.: Mendeley readership altmetrics for the social sciences and humanities : research evaluation and knowledge flows (2014) 0.00
    0.0037164795 = product of:
      0.014865918 = sum of:
        0.014865918 = weight(_text_:information in 2190) [ClassicSimilarity], result of:
          0.014865918 = score(doc=2190,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16796975 = fieldWeight in 2190, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2190)
      0.25 = coord(1/4)
    
    Abstract
    Although there is evidence that counting the readers of an article in the social reference site, Mendeley, may help to capture its research impact, the extent to which this is true for different scientific fields is unknown. In this study, we compare Mendeley readership counts with citations for different social sciences and humanities disciplines. The overall correlation between Mendeley readership counts and citations for the social sciences was higher than for the humanities. Low and medium correlations between Mendeley bookmarks and citation counts in all the investigated disciplines suggest that these measures reflect different aspects of research impact. Mendeley data were also used to discover patterns of information flow between scientific fields. Comparing information flows based on Mendeley bookmarking data and cross-disciplinary citation analysis for the disciplines revealed substantial similarities and some differences. Thus, the evidence from this study suggests that Mendeley readership data could be used to help capture knowledge transfer across scientific disciplines, especially for people that read but do not author articles, as well as giving impact evidence at an earlier stage than is possible with citation counts.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, pp.1627-1638
  11. Thelwall, M.; Kousha, K.: SlideShare presentations, citations, users, and trends : a professional site with academic and educational uses (2017) 0.00
    0.0037164795 = product of:
      0.014865918 = sum of:
        0.014865918 = weight(_text_:information in 3766) [ClassicSimilarity], result of:
          0.014865918 = score(doc=3766,freq=6.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16796975 = fieldWeight in 3766, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3766)
      0.25 = coord(1/4)
    
    Abstract
    SlideShare is a free social website that aims to help users distribute and find presentations. Owned by LinkedIn since 2012, it targets a professional audience but may give value to scholarship through creating a long-term record of the content of talks. This article tests this hypothesis by analyzing sets of general and scholarly related SlideShare documents using content and citation analysis and popularity statistics reported on the site. The results suggest that academics, students, and teachers are a minority of SlideShare uploaders, especially since 2010, with most documents not being directly related to scholarship or teaching. About two thirds of uploaded SlideShare documents are presentation slides, with the remainder often being files associated with presentations or video recordings of talks. SlideShare is therefore a presentation-centered site with a predominantly professional user base. Although a minority of the uploaded SlideShare documents are cited by, or cite, academic publications, probably too few articles are cited by SlideShare to consider extracting SlideShare citations for research evaluation. Nevertheless, scholars should consider SlideShare to be a potential source of academic and nonacademic information, particularly in library and information science, education, and business.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.8, pp.1989-2003
  12. Vaughan, L.; Thelwall, M.: Search engine coverage bias : evidence and possible causes (2004) 0.00
    0.0036413912 = product of:
      0.014565565 = sum of:
        0.014565565 = weight(_text_:information in 2536) [ClassicSimilarity], result of:
          0.014565565 = score(doc=2536,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16457605 = fieldWeight in 2536, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2536)
      0.25 = coord(1/4)
    
    Abstract
    Commercial search engines are now playing an increasingly important role in Web information dissemination and access. Of particular interest to business and national governments is whether the big engines have coverage biased towards the US or other countries. In our study we tested for national biases in three major search engines and found significant differences in their coverage of commercial Web sites. The US sites were much better covered than the others in the study: sites from China, Taiwan and Singapore. We then examined the possible technical causes of the differences and found that the language of a site does not affect its coverage by search engines. However, the visibility of a site, measured by the number of links to it, affects its chance to be covered by search engines. We conclude that the coverage bias does exist but that it is not due to deliberate choices by the search engines; rather, it occurs as a natural result of cumulative advantage effects of US sites on the Web. Nevertheless, the bias remains a cause for international concern.
    Source
    Information processing and management. 40(2004) no.4, pp.693-708
  13. Thelwall, M.: Assessing web search engines : a webometric approach (2011) 0.00
    0.0036413912 = product of:
      0.014565565 = sum of:
        0.014565565 = weight(_text_:information in 10) [ClassicSimilarity], result of:
          0.014565565 = score(doc=10,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16457605 = fieldWeight in 10, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=10)
      0.25 = coord(1/4)
    
    Abstract
    Information Retrieval (IR) research typically evaluates search systems in terms of the standard precision, recall and F-measures to weight the relative importance of precision and recall (e.g. van Rijsbergen, 1979). All of these assess the extent to which the system returns good matches for a query. In contrast, webometric measures are designed specifically for web search engines and are intended to monitor changes in results over time and various aspects of the internal logic of the way in which search engines select the results to be returned. This chapter introduces a range of webometric measurements and illustrates them with case studies of Google, Bing and Yahoo! This is a very fertile area for simple and complex new investigations into search engine results.
    Source
    Innovations in information retrieval: perspectives for theory and practice. Eds.: A. Foster and P. Rafferty
  14. Maflahi, N.; Thelwall, M.: When are readership counts as useful as citation counts? : Scopus versus Mendeley for LIS journals (2016) 0.00
    0.0036413912 = product of:
      0.014565565 = sum of:
        0.014565565 = weight(_text_:information in 2495) [ClassicSimilarity], result of:
          0.014565565 = score(doc=2495,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16457605 = fieldWeight in 2495, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2495)
      0.25 = coord(1/4)
    
    Abstract
    In theory, articles can attract readers on the social reference sharing site Mendeley before they can attract citations, so Mendeley altmetrics could provide early indications of article impact. This article investigates the influence of time on the number of Mendeley readers of an article through a theoretical discussion and an investigation into the relationship between counts of readers of, and citations to, 4 general library and information science (LIS) journals. For this discipline, it takes about 7 years for articles to attract as many Scopus citations as Mendeley readers, and after this the Spearman correlation between readers and citers is stable at about 0.6 for all years. This suggests that Mendeley readership counts may be useful impact indicators for both newer and older articles. The lack of dates for individual Mendeley article readers and an unknown bias toward more recent articles mean that readership data should be normalized individually by year, however, before making any comparisons between articles published in different years.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, pp.191-199
  15. Thelwall, M.: Extracting macroscopic information from Web links (2001) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 6851) [ClassicSimilarity], result of:
          0.01213797 = score(doc=6851,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 6851, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6851)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.13, pp.1157-1168
  16. Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 4279) [ClassicSimilarity], result of:
          0.01213797 = score(doc=4279,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 4279, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
      0.25 = coord(1/4)
    
    Abstract
    Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.
    Source
    Annual review of information science and technology. 39(2005), pp.81-138
  17. Thelwall, M.: ¬A layered approach for investigating the topological structure of communities in the Web (2003) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 4450) [ClassicSimilarity], result of:
          0.01213797 = score(doc=4450,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 4450, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4450)
      0.25 = coord(1/4)
    
    Abstract
    A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: applying alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; removing internal site links; and adapting a new fast algorithm to allow fully automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder in one huge connected component, the exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of the communities identified depended on the algorithm's parameter, with very different results obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and of applying the technique directly to partially automate Web metrics tasks, such as finding all pages related to a given subject hosted by a single country's universities.
  18. Thelwall, M.: Quantitative comparisons of search engine results (2008) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 2350) [ClassicSimilarity], result of:
          0.01213797 = score(doc=2350,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 2350, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
      0.25 = coord(1/4)
    
    Abstract
    Search engines are normally used to find information or Web sites, but Webometric investigations use them for quantitative data such as the number of pages matching a query and the international spread of those pages. For this type of application, the accuracy of the hit count estimates and the range of URLs in the full results are important. Here, we compare the application programming interfaces of Google, Yahoo!, and Live Search for 1,587 single-word searches. The hit count estimates were broadly consistent, but Yahoo! and Google reported 5-6 times more hits than Live Search. Yahoo! tended to return slightly more matching URLs than Google, with Live Search returning significantly fewer. Yahoo!'s result URLs included a significantly wider range of domains and sites than the other two, and there was little consistency between the three engines in the number of different domains. In contrast, the three engines were reasonably consistent in the number of different top-level domains represented in the result URLs, although Yahoo! tended to return the most. In conclusion, quantitative results from the three search engines are mostly consistent but with unexpected types of inconsistency that users should be aware of. Google is recommended for hit count estimates but Yahoo! is recommended for all other Webometric purposes.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.11, S.1702-1710
  19. Kousha, K.; Thelwall, M.: Assessing the impact of disciplinary research on teaching : an automatic analysis of online syllabuses (2008) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 2383) [ClassicSimilarity], result of:
          0.01213797 = score(doc=2383,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 2383, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2383)
      0.25 = coord(1/4)
    
    Abstract
    The impact of published academic research in the sciences and social sciences, when measured, is commonly estimated by counting citations from journal articles. The Web has now introduced new potential sources of quantitative data online that could be used to measure aspects of research impact. In this article we assess the extent to which citations from online syllabuses could be a valuable source of evidence about the educational utility of research. An analysis of online syllabus citations to 70,700 articles published in 2003 in the journals of 12 subjects indicates that online syllabus citations were sufficiently numerous to be a useful impact indicator in some social sciences, including political science and information and library science, but not in others, nor in any sciences. This result was consistent with current social science research having, in general, more educational value than current science research. Moreover, articles frequently cited in online syllabuses were not necessarily highly cited by other articles. Hence it seems that online syllabus citations provide a valuable additional source of evidence about the impact of journals, scholars, and research articles in some social sciences.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.13, S.2060-2069
  20. Kousha, K.; Thelwall, M.; Rezaie, S.: Can the impact of scholarly images be assessed online? : an exploratory study using image identification technology (2010) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 3966) [ClassicSimilarity], result of:
          0.01213797 = score(doc=3966,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 3966, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3966)
      0.25 = coord(1/4)
    
    Abstract
    The web contains a huge number of digital pictures. For scholars publishing such images it is important to know how much their images are used, but no method seems to have been developed for monitoring the value of academic images. In particular, can the impact of scientific or artistic images be assessed through identifying images copied or reused on the Internet? This article explores a case study of 260 NASA images to investigate whether the TinEye search engine could theoretically help to provide this information. The results show that the selected pictures had a median of 11 online copies each. However, a classification of 210 of these copies reveals that only 1.4% were explicitly used in academic publications, reflecting research impact, and that the most common use of the NASA pictures was informal scholarly (or educational) communication (37%). Additional analyses of world-famous paintings and scientific images about pathology and molecular structures suggest that image contents are important for the type and extent of image use. Although it is reasonable to use statistics derived from TinEye for assessing image reuse value, the extent of its image indexing is not known.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.9, S.1734-1744
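    The relevance figures printed under each entry are Lucene ClassicSimilarity (TF-IDF) explain trees. The Python sketch below recomputes them from the values shown, assuming the Lucene 4.x to 6.x formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the printed idf of 1.7554779. queryNorm and fieldNorm are taken as given inputs rather than derived, because queryNorm depends on every clause of the full query and fieldNorm is a lossy one-byte encoding of field length. The function names are hypothetical, for illustration only.

    ```python
    import math


    def classic_idf(doc_freq: int, max_docs: int) -> float:
        # Assumed Lucene 4.x-6.x ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
        return 1.0 + math.log(max_docs / (doc_freq + 1))


    def classic_tf(freq: float) -> float:
        # ClassicSimilarity tf is the square root of the raw term frequency
        return math.sqrt(freq)


    def term_score(freq: float, doc_freq: int, max_docs: int,
                   query_norm: float, field_norm: float, boost: float = 1.0) -> float:
        """weight(term in doc) = queryWeight * fieldWeight, as in the explain tree."""
        idf = classic_idf(doc_freq, max_docs)
        query_weight = idf * boost * query_norm          # "queryWeight, product of"
        field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight, product of"
        return query_weight * field_weight


    def document_score(term_scores: list[float], max_overlap: int) -> float:
        """Sum of matching clause scores scaled by coord(matched / total clauses)."""
        coord = len(term_scores) / max_overlap
        return sum(term_scores) * coord


    if __name__ == "__main__":
        # Entry 17 (doc 4450): "information" occurs 4 times, 1 of 4 query clauses matched
        s = term_score(4.0, 20772, 44218, query_norm=0.050415643, field_norm=0.0390625)
        print(f"{document_score([s], max_overlap=4):.10f}")
    ```

    Because coord(1/4) multiplies the summed weight, every entry on this page that matched only the `information` clause is scaled down by a factor of four; a document matching more of the four clauses would receive a larger coord and rank higher. Small differences in the last digits can arise because Lucene computes in 32-bit floats while Python uses 64-bit doubles.
    
    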