Search (44 results, page 1 of 3)

  • author_ss:"Thelwall, M."
  1. Thelwall, M.: Assessing web search engines : a webometric approach (2011) 0.05
    0.046684146 = product of:
      0.12449105 = sum of:
        0.05077526 = weight(_text_:case in 10) [ClassicSimilarity], result of:
          0.05077526 = score(doc=10,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.29144385 = fieldWeight in 10, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=10)
        0.04182736 = weight(_text_:studies in 10) [ClassicSimilarity], result of:
          0.04182736 = score(doc=10,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 10, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=10)
        0.031888437 = product of:
          0.06377687 = sum of:
            0.06377687 = weight(_text_:area in 10) [ClassicSimilarity], result of:
              0.06377687 = score(doc=10,freq=2.0), product of:
                0.1952553 = queryWeight, product of:
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.03962768 = queryNorm
                0.32663327 = fieldWeight in 10, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.046875 = fieldNorm(doc=10)
          0.5 = coord(1/2)
      0.375 = coord(3/8)
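    The explain tree above is Lucene ClassicSimilarity (TF-IDF) output; a minimal sketch reproducing the "case" term's contribution from the figures shown:

```python
# Sketch of Lucene ClassicSimilarity (TF-IDF) scoring, reproducing the
# "case" term contribution shown in the explain tree above.
import math

def classic_similarity(freq, idf, query_norm, field_norm):
    """Per-term score = queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                  # 1.4142135 for freq=2.0
    query_weight = idf * query_norm       # 0.1742197
    field_weight = tf * idf * field_norm  # 0.29144385
    return query_weight * field_weight

score = classic_similarity(freq=2.0, idf=4.3964143,
                           query_norm=0.03962768, field_norm=0.046875)
# The per-term scores are then summed and scaled by coord(3/8) = 0.375.
```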
    
    Abstract
    Information Retrieval (IR) research typically evaluates search systems in terms of the standard precision, recall and F-measures, the last of which weights the relative importance of precision and recall (e.g. van Rijsbergen, 1979). All of these assess the extent to which the system returns good matches for a query. In contrast, webometric measures are designed specifically for web search engines: they monitor changes in results over time and probe aspects of the internal logic by which search engines select the results to be returned. This chapter introduces a range of webometric measurements and illustrates them with case studies of Google, Bing and Yahoo! This is a very fertile area for simple and complex new investigations into search engine results.
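    The standard measures named in the abstract can be made concrete; a minimal sketch of van Rijsbergen's F-measure, with beta weighting the relative importance of recall against precision (the example numbers are hypothetical):

```python
# Minimal sketch of van Rijsbergen's F-measure; beta weights the
# relative importance of recall against precision (hypothetical numbers).
def f_measure(precision, recall, beta=1.0):
    """F_beta: beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: 8 of the top 10 results are relevant, out of 20 relevant overall.
precision, recall = 8 / 10, 8 / 20
```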
  2. Zuccala, A.; Thelwall, M.; Oppenheim, C.; Dhiensa, R.: Web intelligence analyses of digital libraries : a case study of the National electronic Library for Health (NeLH) (2007) 0.04
    0.038431544 = product of:
      0.102484114 = sum of:
        0.026727835 = weight(_text_:libraries in 838) [ClassicSimilarity], result of:
          0.026727835 = score(doc=838,freq=4.0), product of:
            0.13017908 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.03962768 = queryNorm
            0.2053159 = fieldWeight in 838, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
        0.047871374 = weight(_text_:case in 838) [ClassicSimilarity], result of:
          0.047871374 = score(doc=838,freq=4.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.2747759 = fieldWeight in 838, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
        0.027884906 = weight(_text_:studies in 838) [ClassicSimilarity], result of:
          0.027884906 = score(doc=838,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.17634688 = fieldWeight in 838, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
      0.375 = coord(3/8)
    
    Abstract
    Purpose - The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH). Design/methodology/approach - The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data. Findings - Link data, when analysed together with user transaction log files (i.e. Web referring domains), can provide insights into who is using a digital library and when, and who could be using the digital library if they are "surfing" a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the .edu domain, representing American universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non-electronic sources. Originality/value - A number of studies concerning digital library users have been carried out using log file analysis as a research tool.
    Log files focus on real-time user transactions, while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.
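    LexiURL itself is a separate program, but its basic report type described above (listing and counting the domain names in a list of links) can be sketched as follows; the URLs are hypothetical:

```python
# Illustrative sketch (not LexiURL itself) of its basic report type:
# listing and counting the distinct domain names in a list of link URLs.
from collections import Counter
from urllib.parse import urlparse

def domain_counts(urls):
    """Map each domain name to the number of URLs pointing at it."""
    return Counter(urlparse(u).netloc.lower() for u in urls)

links = ["http://www.nelh.nhs.uk/about", "http://example.edu/lib",
         "http://example.edu/health", "http://news.example.org/item"]
counts = domain_counts(links)
```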
  3. Thelwall, M.: ¬A comparison of link and URL citation counting (2011) 0.03
    0.025671326 = product of:
      0.1026853 = sum of:
        0.042312715 = weight(_text_:case in 4533) [ClassicSimilarity], result of:
          0.042312715 = score(doc=4533,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.24286987 = fieldWeight in 4533, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4533)
        0.06037259 = weight(_text_:studies in 4533) [ClassicSimilarity], result of:
          0.06037259 = score(doc=4533,freq=6.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.3818022 = fieldWeight in 4533, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4533)
      0.25 = coord(2/8)
    
    Abstract
    Purpose - Link analysis is an established topic within webometrics. It normally uses counts of links between sets of web sites or to sets of web sites. These link counts are derived from web crawlers or commercial search engines with the latter being the only alternative for some investigations. This paper compares link counts with URL citation counts in order to assess whether the latter could be a replacement for the former if the major search engines withdraw their advanced hyperlink search facilities. Design/methodology/approach - URL citation counts are compared with link counts for a variety of data sets used in previous webometric studies. Findings - The results show a high degree of correlation between the two but with URL citations being much less numerous, at least outside academia and business. Research limitations/implications - The results cover a small selection of 15 case studies and so the findings are only indicative. Significant differences between results indicate that the difference between link counts and URL citation counts will vary between webometric studies. Practical implications - Should link searches be withdrawn, then link analyses of less well linked non-academic, non-commercial sites would be seriously weakened, although citations based on e-mail addresses could help to make citations more numerous than links for some business and academic contexts. Originality/value - This is the first systematic study of the difference between link counts and URL citation counts in a variety of contexts and it shows that there are significant differences between the two.
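    The comparison described amounts to rank-correlating two count vectors over the same set of sites; a minimal Spearman sketch (ties ignored for brevity; the counts are hypothetical, not the paper's data):

```python
# Sketch of the kind of comparison described: Spearman rank correlation
# between link counts and URL citation counts for the same sites.
# Ties are ignored for brevity; the counts are hypothetical.
def _ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = float(rank)
    return ranks

def spearman(x, y):
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

link_counts = [120, 45, 300, 16, 78]   # hypothetical per-site link counts
url_citations = [40, 12, 95, 3, 30]    # typically far less numerous
```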
  4. Thelwall, M.; Klitkou, A.; Verbeek, A.; Stuart, D.; Vincent, C.: Policy-relevant Webometrics for individual scientific fields (2010) 0.02
    0.023150655 = product of:
      0.09260262 = sum of:
        0.05077526 = weight(_text_:case in 3574) [ClassicSimilarity], result of:
          0.05077526 = score(doc=3574,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.29144385 = fieldWeight in 3574, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=3574)
        0.04182736 = weight(_text_:studies in 3574) [ClassicSimilarity], result of:
          0.04182736 = score(doc=3574,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 3574, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=3574)
      0.25 = coord(2/8)
    
    Abstract
    Despite over 10 years of research there is no agreement on the most suitable roles for Webometric indicators in support of research policy and almost no field-based Webometrics. This article partly fills these gaps by analyzing the potential of policy-relevant Webometrics for individual scientific fields with the help of 4 case studies. Although Webometrics cannot provide robust indicators of knowledge flows or research impact, it can provide some evidence of networking and mutual awareness. The scope of Webometrics is also relatively wide, including not only research organizations and firms but also intermediary groups like professional associations, Web portals, and government agencies. Webometrics can, therefore, provide evidence about the research process to complement peer review, bibliometric, and patent indicators: tracking the early, mainly prepublication development of new fields and research funding initiatives, assessing the role and impact of intermediary organizations and the need for new ones, and monitoring the extent of mutual awareness in particular research areas.
  5. Thelwall, M.: Extracting macroscopic information from Web links (2001) 0.02
    0.022901682 = product of:
      0.09160673 = sum of:
        0.042312715 = weight(_text_:case in 6851) [ClassicSimilarity], result of:
          0.042312715 = score(doc=6851,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.24286987 = fieldWeight in 6851, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6851)
        0.049294014 = weight(_text_:studies in 6851) [ClassicSimilarity], result of:
          0.049294014 = score(doc=6851,freq=4.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.3117402 = fieldWeight in 6851, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6851)
      0.25 = coord(2/8)
    
    Abstract
    Much has been written about the potential and pitfalls of macroscopic Web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the Web that correlate with phenomena external to the Internet. This article attempts to provide such evidence through an evaluation of Ingwersen's (1998) proposed external Web Impact Factor (WIF) for the original use of the Web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of Web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications
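    The best-performing WIF described above is a simple ratio; a hedged sketch with hypothetical university data:

```python
# Sketch of the WIF variant the abstract reports as correlating best with
# research rankings: pages with links pointing at research-based pages,
# divided by faculty numbers. The university figures are hypothetical.
def web_impact_factor(linking_pages, faculty):
    """WIF = pages with links to research-based pages / faculty numbers."""
    return linking_pages / faculty

universities = {"A": (5200, 1300), "B": (900, 450)}
wifs = {name: web_impact_factor(pages, staff)
        for name, (pages, staff) in universities.items()}
```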
  6. Thelwall, M.: Directing students to new information types : a new role for Google in literature searches? (2005) 0.02
    0.020994222 = product of:
      0.08397689 = sum of:
        0.04677371 = weight(_text_:libraries in 364) [ClassicSimilarity], result of:
          0.04677371 = score(doc=364,freq=4.0), product of:
            0.13017908 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.03962768 = queryNorm
            0.35930282 = fieldWeight in 364, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.0546875 = fieldNorm(doc=364)
        0.037203178 = product of:
          0.074406356 = sum of:
            0.074406356 = weight(_text_:area in 364) [ClassicSimilarity], result of:
              0.074406356 = score(doc=364,freq=2.0), product of:
                0.1952553 = queryWeight, product of:
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.03962768 = queryNorm
                0.38107216 = fieldWeight in 364, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=364)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Conducting a literature review is an important activity for postgraduates and many undergraduates. Librarians can play an important role, directing students to digital libraries, compiling online subject resource lists, and educating about the need to evaluate the quality of online resources. However, in some subjects an effective literature search in a new area first requires basic topic knowledge, including specialist vocabularies. Google's link-based page ranking algorithm makes this search engine an ideal tool for finding specialist topic introductory material, particularly in computer science, and so librarians should be teaching this as part of a strategic literature review approach.
    Source
    Libraries and Google. Eds.: Miller, W. u. R.M. Pellen
  7. Didegah, F.; Thelwall, M.: Co-saved, co-tweeted, and co-cited networks (2018) 0.01
    0.01111404 = product of:
      0.04445616 = sum of:
        0.02834915 = weight(_text_:libraries in 4291) [ClassicSimilarity], result of:
          0.02834915 = score(doc=4291,freq=2.0), product of:
            0.13017908 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.03962768 = queryNorm
            0.2177704 = fieldWeight in 4291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.046875 = fieldNorm(doc=4291)
        0.01610701 = product of:
          0.03221402 = sum of:
            0.03221402 = weight(_text_:22 in 4291) [ClassicSimilarity], result of:
              0.03221402 = score(doc=4291,freq=2.0), product of:
                0.13876937 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03962768 = queryNorm
                0.23214069 = fieldWeight in 4291, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4291)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Counts of tweets and Mendeley user libraries have been proposed as altmetric alternatives to citation counts for the impact assessment of articles. Although both have been investigated to discover whether they correlate with article citations, it is not known whether users tend to tweet or save (in Mendeley) the same kinds of articles that they cite. In response, this article compares pairs of articles that are tweeted, saved to a Mendeley library, or cited by the same user, but possibly a different user for each source. The study analyzes 1,131,318 articles published in 2012, with minimum tweeted (10), saved to Mendeley (100), and cited (10) thresholds. The results show surprisingly minor overall overlaps between the three phenomena. The importance of journals for Twitter and the presence of many bots at different levels of activity suggest that this site has little value for impact altmetrics. The moderate differences between patterns of saving and citation suggest that Mendeley can be used for some types of impact assessments, but sensitivity is needed for underlying differences.
    Date
    28. 7.2018 10:00:22
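    One way to quantify the overlap between the three phenomena discussed in the abstract above is Jaccard overlap between sets of article pairs; a sketch with hypothetical pairs (the paper's own measure may differ):

```python
# Sketch (illustrative measure, possibly differing from the paper's) of
# quantifying overlap between sets of co-tweeted, co-saved, or co-cited
# article pairs, using Jaccard overlap. The pairs are hypothetical.
def jaccard(a, b):
    """Jaccard overlap of two sets of (article, article) pairs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

co_tweeted = {(1, 2), (1, 3), (2, 4)}
co_saved = {(1, 2), (2, 4), (5, 6)}
```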
  8. Li, X.; Thelwall, M.; Kousha, K.: ¬The role of arXiv, RePEc, SSRN and PMC in formal scholarly communication (2015) 0.01
    0.009999052 = product of:
      0.07999241 = sum of:
        0.07999241 = sum of:
          0.0531474 = weight(_text_:area in 2593) [ClassicSimilarity], result of:
            0.0531474 = score(doc=2593,freq=2.0), product of:
              0.1952553 = queryWeight, product of:
                4.927245 = idf(docFreq=870, maxDocs=44218)
                0.03962768 = queryNorm
              0.27219442 = fieldWeight in 2593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.927245 = idf(docFreq=870, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2593)
          0.026845016 = weight(_text_:22 in 2593) [ClassicSimilarity], result of:
            0.026845016 = score(doc=2593,freq=2.0), product of:
              0.13876937 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03962768 = queryNorm
              0.19345059 = fieldWeight in 2593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2593)
      0.125 = coord(1/8)
    
    Abstract
    Purpose - The four major Subject Repositories (SRs), arXiv, Research Papers in Economics (RePEc), Social Science Research Network (SSRN) and PubMed Central (PMC), are all important within their disciplines but no previous study has systematically compared how often they are cited in academic publications. In response, the purpose of this paper is to report an analysis of citations to SRs from Scopus publications, 2000-2013. Design/methodology/approach - Scopus searches were used to count the number of documents citing the four SRs in each year. A random sample of 384 documents citing the four SRs was then visited to investigate the nature of the citations. Findings - Each SR was most cited within its own subject area but attracted substantial citations from other subject areas, suggesting that they are open to interdisciplinary uses. The proportion of documents citing each SR is continuing to increase rapidly, and the SRs all seem to attract substantial numbers of citations from more than one discipline. Research limitations/implications - Scopus does not cover all publications, and most citations to documents found in the four SRs presumably cite the published version, when one exists, rather than the repository version. Practical implications - SRs are continuing to grow and do not seem to be threatened by institutional repositories and so research managers should encourage their continued use within their core disciplines, including for research that aims at an audience in other disciplines. Originality/value - This is the first simultaneous analysis of Scopus citations to the four most popular SRs.
    Date
    20. 1.2015 18:30:22
  9. Thelwall, M.; Prabowo, R.; Fairclough, R.: Are raw RSS feeds suitable for broad issue scanning? : a science concern case study (2006) 0.01
    0.007479902 = product of:
      0.059839215 = sum of:
        0.059839215 = weight(_text_:case in 6116) [ClassicSimilarity], result of:
          0.059839215 = score(doc=6116,freq=4.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.34346986 = fieldWeight in 6116, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6116)
      0.125 = coord(1/8)
    
    Abstract
    Broad issue scanning is the task of identifying important public debates arising in a given broad issue; really simple syndication (RSS) feeds are a natural information source for investigating broad issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: Monitoring a large number may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). An attempt to identify genuine science concern debates from the corpus through investigating the top 1,000 "burst" words found only two genuine debates, however. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue.
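    The coword frequency burst idea can be sketched as follows; the exact burst criterion used here (a day's count exceeding a multiple of the running mean) is an assumption for illustration, not the paper's method:

```python
# Sketch of a word frequency burst detector over daily RSS feed text.
# The burst criterion (a day's count exceeding a multiple of the running
# mean) is an assumption for illustration, not the paper's exact method.
from collections import Counter

def bursts(docs_per_day, word, ratio=3.0):
    """Return the day indices on which `word` bursts."""
    counts = [Counter(doc.lower().split())[word] for doc in docs_per_day]
    flagged = []
    for day in range(1, len(counts)):
        baseline = sum(counts[:day]) / day
        if counts[day] >= ratio * max(baseline, 1.0):
            flagged.append(day)
    return flagged

days = ["science funding news", "weather report",
        "gm crops gm crops gm debate gm"]
```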
  10. Thelwall, M.; Li, X.; Barjak, F.; Robinson, S.: Assessing the international web connectivity of research groups (2008) 0.01
    0.007479902 = product of:
      0.059839215 = sum of:
        0.059839215 = weight(_text_:case in 1401) [ClassicSimilarity], result of:
          0.059839215 = score(doc=1401,freq=4.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.34346986 = fieldWeight in 1401, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1401)
      0.125 = coord(1/8)
    
    Abstract
    Purpose - The purpose of this paper is to claim that it is useful to assess the web connectivity of research groups, describe hyperlink-based techniques to achieve this and present brief details of European life sciences research groups as a case study. Design/methodology/approach - A commercial search engine was harnessed to deliver hyperlink data via its automatic query submission interface. A special purpose link analysis tool, LexiURL, then summarised and graphed the link data in appropriate ways. Findings - Webometrics can provide a wide range of descriptive information about the international connectivity of research groups. Research limitations/implications - Only one field was analysed, data was taken from only one search engine, and the results were not validated. Practical implications - Web connectivity seems to be particularly important for attracting overseas job applicants and to promote research achievements and capabilities, and hence we contend that it can be useful for national and international governments to use webometrics to ensure that the web is being used effectively by research groups. Originality/value - This is the first paper to make a case for the value of using a range of webometric techniques to evaluate the web presences of research groups within a field, and possibly the first "applied" webometrics study produced for an external contract.
  11. Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.01
    0.0074047255 = product of:
      0.059237804 = sum of:
        0.059237804 = weight(_text_:case in 6098) [ClassicSimilarity], result of:
          0.059237804 = score(doc=6098,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.34001783 = fieldWeight in 6098, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6098)
      0.125 = coord(1/8)
    
    Abstract
    Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.
  12. Vaughan, L.; Thelwall, M.: ¬A modelling approach to uncover hyperlink patterns : the case of Canadian universities (2005) 0.01
    0.0074047255 = product of:
      0.059237804 = sum of:
        0.059237804 = weight(_text_:case in 1014) [ClassicSimilarity], result of:
          0.059237804 = score(doc=1014,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.34001783 = fieldWeight in 1014, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1014)
      0.125 = coord(1/8)
    
  13. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid words (2006) 0.01
    0.0063469075 = product of:
      0.05077526 = sum of:
        0.05077526 = weight(_text_:case in 5896) [ClassicSimilarity], result of:
          0.05077526 = score(doc=5896,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.29144385 = fieldWeight in 5896, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=5896)
      0.125 = coord(1/8)
    
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
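    The search for hybrid words sharing a common letter sequence can be sketched as filtering web-harvested tokens against a known-word list; the stem and token lists below are hypothetical:

```python
# Sketch of the hybrid word search: keep web-harvested tokens that
# contain a target letter sequence but are absent from a known-word
# list, i.e. candidate new hybrid forms. Stem and tokens are hypothetical.
def hybrid_candidates(tokens, stem, known_words):
    stem = stem.lower()
    return sorted({t.lower() for t in tokens
                   if stem in t.lower() and t.lower() not in known_words})

tokens = ["blogosphere", "blog", "warblog", "weblog", "blogging"]
known = {"blog", "weblog", "blogging"}
candidates = hybrid_candidates(tokens, "blog", known)
```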
  14. Thelwall, M.; Vann, K.; Fairclough, R.: Web issue analysis : an integrated water resource management case study (2006) 0.01
    0.0063469075 = product of:
      0.05077526 = sum of:
        0.05077526 = weight(_text_:case in 5906) [ClassicSimilarity], result of:
          0.05077526 = score(doc=5906,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.29144385 = fieldWeight in 5906, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=5906)
      0.125 = coord(1/8)
    
  15. Thelwall, M.: Extracting accurate and complete results from search engines : case study Windows Live (2008)
    
  16. Shifman, L.; Thelwall, M.: Assessing global diffusion with Web memetics : the spread and evolution of a popular joke (2009)
    
    Abstract
    Memes are small units of culture, analogous to genes, which flow from person to person by copying or imitation. More than any previous medium, the Internet has the technical capabilities for global meme diffusion. Yet, to spread globally, memes need to negotiate their way through cultural and linguistic borders. This article introduces a new broad method, Web memetics, comprising extensive Web searches and combined quantitative and qualitative analyses, to identify and assess: (a) the different versions of a meme, (b) its evolution online, and (c) its Web presence and translation into common Internet languages. This method is demonstrated through one extensively circulated joke about men, women, and computers. The results show that the joke has mutated into several different versions and is widely translated, and that translations incorporate small, local adaptations while retaining the English versions' fundamental components. In conclusion, Web memetics has demonstrated its ability to identify and track the evolution and spread of memes online, with interesting results, albeit for only one case study.
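The first step of such an analysis, identifying the different versions of a meme, amounts to grouping near-duplicate texts. A minimal sketch, assuming a greedy grouping by string similarity (the 0.6 threshold and the sample joke fragments are invented for illustration, not taken from the study):

```python
import difflib

def group_variants(texts, threshold=0.6):
    """Greedily group near-duplicate texts (meme versions): each text
    joins the first group whose exemplar it sufficiently resembles."""
    groups = []  # list of (exemplar, members) pairs
    for t in texts:
        for exemplar, members in groups:
            sim = difflib.SequenceMatcher(None, exemplar.lower(), t.lower()).ratio()
            if sim >= threshold:
                members.append(t)
                break
        else:
            groups.append((t, [t]))
    return groups

versions = [
    "Women want men who are rich; computers want more memory.",
    "Women want men who are rich and computers want more memory!",
    "A completely different joke about cats.",
]
groups = group_variants(versions)
print(len(groups))  # 2 groups: two joke variants merge, the outlier stays apart
```

Translated versions would additionally need language detection and translation before similarity comparison.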
  17. Thelwall, M.; Delgado, M.M.: Arts and humanities research evaluation : no metrics please, just data (2015)
    
    Abstract
    Purpose: The purpose of this paper is to make an explicit case for the use of data with contextual information as evidence in arts and humanities research evaluations rather than systematic metrics.
    Design/methodology/approach: A survey of the strengths and limitations of citation-based indicators is combined with evidence about existing uses of wider impact data in the arts and humanities, with particular reference to the 2014 UK Research Excellence Framework.
    Findings: Data are already used as impact evidence in the arts and humanities but this practice should become more widespread.
    Practical implications: Arts and humanities researchers should be encouraged to think creatively about the kinds of data that they may be able to generate in support of the value of their research and should not rely upon standardised metrics.
    Originality/value: This paper combines practices emerging in the arts and humanities with research evaluation from a scientometric perspective to generate new recommendations.
  18. Kousha, K.; Thelwall, M.; Rezaie, S.: Assessing the citation impact of books : the role of Google Books, Google Scholar, and Scopus (2011)
    
    Abstract
    Citation indicators are increasingly used in some subject areas to support peer review in the evaluation of researchers and departments. Nevertheless, traditional journal-based citation indexes may be inadequate for the citation impact assessment of book-based disciplines. This article examines whether online citations from Google Books and Google Scholar can provide alternative sources of citation evidence. To investigate this, we compared the citation counts to 1,000 books submitted to the 2008 U.K. Research Assessment Exercise (RAE) from Google Books and Google Scholar with Scopus citations across seven book-based disciplines (archaeology; law; politics and international studies; philosophy; sociology; history; and communication, cultural, and media studies). Google Books and Google Scholar citations to books were 1.4 and 3.2 times more common than were Scopus citations, and their medians were more than twice and three times as high as were Scopus median citations, respectively. This large number of citations is evidence that in book-oriented disciplines in the social sciences, arts, and humanities, online book citations may be sufficiently numerous to support peer review for research evaluation, at least in the United Kingdom.
  19. Payne, N.; Thelwall, M.: Mathematical models for academic webs : linear relationship or non-linear power law? (2005)
    
    Abstract
    Previous studies of academic web interlinking have tended to hypothesise that the relationship between the research of a university and links to or from its web site should follow a linear trend, yet the typical distribution of web data, in general, seems to be a non-linear power law. This paper assesses whether a linear trend or a power law is the most appropriate method with which to model the relationship between research and web site size or outlinks. Following linear regression, analysis of the confidence intervals for the logarithmic graphs, and analysis of the outliers, the results suggest that a linear trend is more appropriate than a non-linear power law.
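The comparison of the two models can be sketched as follows: fit a straight line to the raw data for the linear model, and fit a straight line in log-log space for the power law, then compare goodness of fit. The data values here are invented for illustration (roughly linear by construction), not taken from the paper.

```python
import numpy as np

# Hypothetical data: university research scores vs. outlink counts.
research = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
outlinks = np.array([12.0, 25.0, 35.0, 49.0, 60.0, 74.0])

# Linear model: outlinks = a * research + b
a, b = np.polyfit(research, outlinks, 1)
lin_pred = a * research + b
ss_tot = np.sum((outlinks - outlinks.mean()) ** 2)
lin_r2 = 1 - np.sum((outlinks - lin_pred) ** 2) / ss_tot

# Power law: outlinks = c * research**k, fitted as a line in log-log space
k, logc = np.polyfit(np.log(research), np.log(outlinks), 1)
pow_pred = np.exp(logc) * research ** k
pow_r2 = 1 - np.sum((outlinks - pow_pred) ** 2) / ss_tot

print(f"linear R^2 = {lin_r2:.3f}, power-law R^2 = {pow_r2:.3f}, k = {k:.2f}")
```

Note that the log-log regression minimises error in log space, so comparing R-squared on the raw scale (as above) is only one of several reasonable model-selection criteria; the paper also examines confidence intervals and outliers.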
  20. Thelwall, M.; Harries, G.: Do the Web Sites of Higher Rated Scholars Have Significantly More Online Impact? (2004)
    
    Abstract
    The quality and impact of academic Web sites is of interest to many audiences, including the scholars who use them and Web educators who need to identify best practice. Several large-scale European Union research projects have been funded to build new indicators for online scientific activity, reflecting recognition of the importance of the Web for scholarly communication. In this paper we address the key question of whether higher rated scholars produce higher impact Web sites, using the United Kingdom as a case study and measuring scholars' quality in terms of university-wide average research ratings. Methodological issues concerning the measurement of the online impact are discussed, leading to the adoption of counts of links to a university's constituent single domain Web sites from an aggregated counting metric. The findings suggest that universities with higher rated scholars produce significantly more Web content but with a similar average online impact. Higher rated scholars therefore attract more total links from their peers, but only by being more prolific, refuting earlier suggestions. It can be surmised that general Web publications are very different from scholarly journal articles and conference papers, for which scholarly quality does associate with citation impact. This has important implications for the construction of new Web indicators, for example that online impact should not be used to assess the quality of small groups of scholars, even within a single discipline.