Search (113 results, page 1 of 6)

  • Active filter: author_ss:"Thelwall, M."
  1. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.03
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
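    For readers who want to experiment with the comparison made in this paper, the following is a minimal Python sketch of PageRank by power iteration set against raw inlink counts. The toy graph, the damping factor of 0.85 and the iteration count are illustrative defaults, not parameters from the study.

      # Minimal PageRank via power iteration, compared with raw inlink counts.
      # The graph below is invented purely to illustrate the two metrics.
      def pagerank(graph, d=0.85, iters=50):
          """graph: dict mapping each page to the list of pages it links to."""
          pages = list(graph)
          n = len(pages)
          rank = {p: 1.0 / n for p in pages}
          for _ in range(iters):
              new = {p: (1.0 - d) / n for p in pages}
              for p, outlinks in graph.items():
                  if outlinks:
                      share = d * rank[p] / len(outlinks)
                      for q in outlinks:
                          new[q] += share
                  else:  # dangling page: spread its rank evenly
                      for q in pages:
                          new[q] += d * rank[p] / n
              rank = new
          return rank

      graph = {
          "home": ["about", "research"],
          "about": ["home"],
          "research": ["home", "paper"],
          "paper": ["home"],
      }
      inlinks = {p: sum(p in out for out in graph.values()) for p in graph}
      print(pagerank(graph))  # the two orderings need not agree in general
      print(inlinks)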
  2. Thelwall, M.; Wilson, P.: Does research with statistics have more impact? : the citation rank advantage of structural equation modeling (2016) 0.02
    Abstract
    Statistics are essential to many areas of research and individual statistical techniques may change the ways in which problems are addressed as well as the types of problems that can be tackled. Hence, specific techniques may tend to generate high-impact findings within science. This article estimates the citation advantage of a technique by calculating the average citation rank of articles using it in the issue of the journal in which they were published. Applied to structural equation modeling (SEM) and four related techniques in 3 broad fields, the results show citation advantages that vary by technique and broad field. For example, SEM seems to be more influential in all broad fields than the 4 simpler methods, with one exception, and hence seems to be particularly worth adding to statistical curricula. In contrast, Pearson correlation apparently has the highest average impact in medicine but the least in psychology. In conclusion, the results suggest that the importance of a statistical technique may vary by discipline and that even simple techniques can help to generate high-impact research in some contexts.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.5, S.1233-1244
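    The average citation rank calculation described in the abstract above is easy to prototype once articles are grouped by journal issue. Below is a minimal Python sketch assuming ranks are normalised within each issue (1 = most cited, 0 = least); the field names and data are invented, and the paper's exact rank normalisation may differ.

      from collections import defaultdict

      # Average the within-issue citation rank of articles using a technique.
      # Hypothetical data model: each article records its issue, its citation
      # count and the set of statistical techniques it uses.
      def issue_rank_advantage(articles, technique):
          by_issue = defaultdict(list)
          for a in articles:
              by_issue[a["issue"]].append(a)
          ranks = []
          for arts in by_issue.values():
              arts.sort(key=lambda a: a["citations"], reverse=True)
              n = len(arts)
              for pos, a in enumerate(arts, start=1):
                  if technique in a["techniques"] and n > 1:
                      ranks.append(1 - (pos - 1) / (n - 1))  # 1 = top of issue
          return sum(ranks) / len(ranks) if ranks else None

      articles = [
          {"issue": "J-2015-5", "citations": 40, "techniques": {"SEM"}},
          {"issue": "J-2015-5", "citations": 10, "techniques": {"correlation"}},
          {"issue": "J-2015-5", "citations": 5, "techniques": set()},
      ]
      print(issue_rank_advantage(articles, "SEM"))  # 1.0: top of its issue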
  3. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word family members (2006) 0.02
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.10, S.1326-1337
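    The word-identification step described above can be crudely approximated with a regular expression that finds words embedding a known component string and filters out already-known forms. The component "nano" and the sample text below are illustrative only; the paper's actual procedure drew candidates from Web searches and applied further heuristics.

      import re
      from collections import Counter

      # Find candidate hybrid words that contain a known component sequence.
      def hybrid_candidates(text, component, known_words):
          pattern = re.compile(r"\b\w*" + re.escape(component) + r"\w*\b",
                               re.IGNORECASE)
          words = (w.lower() for w in pattern.findall(text))
          return Counter(w for w in words
                         if w != component and w not in known_words)

      text = "Reports on nanotech and nanomedicine now mention nanobots."
      print(hybrid_candidates(text, "nano", known_words={"nanotechnology"}))
      # Counter({'nanotech': 1, 'nanomedicine': 1, 'nanobots': 1})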
  4. Thelwall, M.: Bibliometrics to webometrics (2009) 0.02
    Abstract
    Bibliometrics has changed out of all recognition since 1958, becoming established as a field, being taught widely in library and information science schools, and being at the core of a number of science evaluation research groups around the world. This was all made possible by the work of Eugene Garfield and his Science Citation Index. This article reviews the distance that bibliometrics has travelled since 1958 by comparing early bibliometrics with current practice, and by giving an overview of a range of recent developments, such as patent analysis, national research evaluation exercises, visualization techniques, new applications, online citation indexes, and the creation of digital libraries. Webometrics, a modern, fast-growing offshoot of bibliometrics, is reviewed in detail. Finally, future prospects are discussed with regard to both bibliometrics and webometrics.
    Source
    Information science in transition, Ed.: A. Gilchrist
  5. Thelwall, M.: Webometrics (2009) 0.02
    Abstract
    Webometrics is an information science field concerned with measuring aspects of the World Wide Web (WWW) for a variety of information science research goals. It came into existence about five years after the Web was formed and has since grown to become a significant aspect of information science, at least in terms of published research. Although some webometrics research has focused on the structure or evolution of the Web itself or the performance of commercial search engines, most has used data from the Web to shed light on information provision or online communication in various contexts. Most prominently, techniques have been developed to track, map, and assess Web-based informal scholarly communication, for example, in terms of the hyperlinks between academic Web sites or the online impact of digital repositories. In addition, a range of nonacademic issues and groups of Web users have also been analyzed.
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  6. Thelwall, M.; Li, X.; Barjak, F.; Robinson, S.: Assessing the international web connectivity of research groups (2008) 0.01
    Abstract
    Purpose - The purpose of this paper is to claim that it is useful to assess the web connectivity of research groups, describe hyperlink-based techniques to achieve this, and present brief details of European life sciences research groups as a case study.
    Design/methodology/approach - A commercial search engine was harnessed to deliver hyperlink data via its automatic query submission interface. A special purpose link analysis tool, LexiURL, then summarised and graphed the link data in appropriate ways.
    Findings - Webometrics can provide a wide range of descriptive information about the international connectivity of research groups.
    Research limitations/implications - Only one field was analysed, data were taken from only one search engine, and the results were not validated.
    Practical implications - Web connectivity seems to be particularly important for attracting overseas job applicants and for promoting research achievements and capabilities; hence we contend that national and international governments could usefully apply webometrics to ensure that the web is being used effectively by research groups.
    Originality/value - This is the first paper to make a case for the value of using a range of webometric techniques to evaluate the web presences of research groups within a field, and possibly the first "applied" webometrics study produced for an external contract.
  7. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.01
    Abstract
    The graph structures of three national university publicly indexable Webs from Australia, New Zealand, and the UK were analyzed. Strong scale-free regularities for page indegrees, outdegrees, and connected component sizes were in evidence, resulting in power laws similar to those previously identified for individual university Web sites and for the AltaVista-indexed Web. Anomalies were also discovered in most distributions and were tracked down to root causes. As a result, resource driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from using techniques to minimize or eliminate the impact of such anomalies.
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.8, S.706-712
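    The scale-free regularities reported here can be checked on any indegree list with a one-line maximum-likelihood estimate of the power-law exponent (the continuous approximation popularised by Newman). The sketch below uses an invented degree distribution; serious use would also require choosing x_min carefully and testing goodness of fit.

      import math

      # Continuous-approximation MLE for a power-law exponent:
      #   alpha = 1 + n / sum(ln(x_i / x_min))  over all x_i >= x_min.
      # The indegree data below is invented, purely to show the calculation.
      def powerlaw_alpha(degrees, x_min=2):
          xs = [d for d in degrees if d >= x_min]
          return 1 + len(xs) / sum(math.log(d / x_min) for d in xs)

      indegrees = [1] * 500 + [2] * 120 + [5] * 30 + [20] * 8 + [200] * 1
      print(round(powerlaw_alpha(indegrees), 2))  # rough exponent estimate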
  8. Thelwall, M.; Prabowo, R.; Fairclough, R.: Are raw RSS feeds suitable for broad issue scanning? : a science concern case study (2006) 0.01
    Abstract
    Broad issue scanning is the task of identifying important public debates arising in a given broad issue; really simple syndication (RSS) feeds are a natural information source for investigating broad issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: Monitoring a large number may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). An attempt to identify genuine science concern debates from the corpus through investigating the top 1,000 "burst" words found only two genuine debates, however. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.12, S.1644-1654
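    The burst-detection idea at the heart of the method can be approximated by flagging words whose frequency in a time window far exceeds their background rate in earlier feed text. This Python sketch is a simplified stand-in for the paper's coword frequency method; the smoothing, thresholds and data are invented.

      from collections import Counter

      # Flag "burst" words: terms whose windowed frequency greatly exceeds
      # their background rate. Thresholds and sample texts are invented.
      def burst_words(window_texts, background_texts, min_count=3, ratio=5.0):
          win = Counter(w for t in window_texts for w in t.lower().split())
          bg = Counter(w for t in background_texts for w in t.lower().split())
          win_total = max(sum(win.values()), 1)
          bg_total = max(sum(bg.values()), 1)
          return sorted(
              w for w, c in win.items()
              if c >= min_count
              and c / win_total >= ratio * (bg[w] + 1) / bg_total  # +1 smoothing
          )

      background = ["science funding news"] * 50
      window = ["gm crops debate", "gm crops risk", "gm crops ban gm"]
      print(burst_words(window, background))  # ['crops', 'gm']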
  9. Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.01
    Abstract
    Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.
    Source
    Annual review of information science and technology. 39(2005), S.81-138
  10. Thelwall, M.; Wilkinson, D.: Finding similar academic Web sites with links, bibliometric couplings and colinks (2004) 0.01
    Abstract
    A common task in both webometrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web were taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage in terms of a much larger number of pairs of sites being connected by these measures, instead of increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
    Source
    Information processing and management. 40(2004) no.3, S.515-526
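    In link terminology: a direct link joins sites A and B; a coupling is a third site that both A and B link to; a colink is a third site that links to both. Given a set of inter-site (source, target) pairs, all three reduce to set operations, as in this sketch (site names invented):

      # Connection measures for a pair of sites from inter-site link pairs.
      def connection_measures(links, a, b):
          def out(s):
              return {t for (src, t) in links if src == s}
          def inn(s):
              return {src for (src, t) in links if t == s}
          return {
              "direct_link": (a, b) in links or (b, a) in links,
              "couplings": len((out(a) & out(b)) - {a, b}),  # shared targets
              "colinks": len((inn(a) & inn(b)) - {a, b}),    # shared sources
          }

      links = {("uniA", "lib"), ("uniB", "lib"),
               ("dir", "uniA"), ("dir", "uniB"), ("uniA", "uniB")}
      print(connection_measures(links, "uniA", "uniB"))
      # {'direct_link': True, 'couplings': 1, 'colinks': 1}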
  11. Thelwall, M.: Assessing web search engines : a webometric approach (2011) 0.01
    Abstract
    Information Retrieval (IR) research typically evaluates search systems using the standard measures of precision and recall, with F-measures weighting their relative importance (e.g. van Rijsbergen, 1979). All of these assess the extent to which a system returns good matches for a query. In contrast, webometric measures are designed specifically for web search engines: they monitor changes in results over time and probe aspects of the internal logic by which search engines select the results to be returned. This chapter introduces a range of webometric measurements and illustrates them with case studies of Google, Bing and Yahoo! This is a very fertile area for simple and complex new investigations into search engine results.
    Source
    Innovations in information retrieval: perspectives for theory and practice. Eds.: A. Foster, u. P. Rafferty
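    For reference, the standard measures contrasted here are: precision P (relevant retrieved / retrieved), recall R (relevant retrieved / relevant), and F_beta = (1 + beta^2)PR / (beta^2 * P + R), where beta weights recall relative to precision. A minimal sketch with invented result sets:

      # Standard IR evaluation measures; the document IDs below are invented.
      def precision_recall_f(retrieved, relevant, beta=1.0):
          hits = len(set(retrieved) & set(relevant))
          p = hits / len(retrieved) if retrieved else 0.0
          r = hits / len(relevant) if relevant else 0.0
          f = (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0
          return p, r, f

      retrieved = ["d1", "d2", "d3", "d4"]
      relevant = ["d1", "d3", "d9"]
      print(precision_recall_f(retrieved, relevant))
      # (0.5, 0.6666..., 0.5714...)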
  12. Zuccala, A.; Thelwall, M.; Oppenheim, C.; Dhiensa, R.: Web intelligence analyses of digital libraries : a case study of the National electronic Library for Health (NeLH) (2007) 0.01
    Abstract
    Purpose - The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH).
    Design/methodology/approach - The Web intelligence techniques in this study combine link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data.
    Findings - Link data, when analysed together with user transaction log files (i.e. Web referring domains), can provide insights into who is using a digital library and when, and who could be using the digital library if they are "surfing" a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the .edu domain, representing American universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non-electronic sources.
    Originality/value - A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real-time user transactions, while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.
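    The kind of summary report described for LexiURL above, listing and counting the domains behind a set of link URLs, needs only the standard library to prototype. This is a generic sketch of the idea, not LexiURL's actual code, and the URLs are invented.

      from collections import Counter
      from urllib.parse import urlparse

      # Count referring domains and top-level domains in a list of link URLs.
      def domain_report(urls):
          domains = Counter(urlparse(u).hostname or "" for u in urls)
          tlds = Counter(d.rsplit(".", 1)[-1]
                         for d in domains.elements() if d)
          return domains, tlds

      urls = [
          "http://www.example.ac.uk/library/links.html",
          "http://dept.example.edu/reading-list",
          "http://dept.example.edu/courses/health",
      ]
      domains, tlds = domain_report(urls)
      print(domains.most_common())
      # [('dept.example.edu', 2), ('www.example.ac.uk', 1)]
      print(tlds)  # Counter({'edu': 2, 'uk': 1})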
  13. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.01
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
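    The alternative document models amount to collapsing the page-level link graph before ranking: each URL is mapped to its directory, domain or site, and links between pages in the same unit are dropped. A sketch of the domain-level case with invented URLs; the aggregated graph could then be ranked by any standard PageRank implementation (such as the sketch under item 1):

      from urllib.parse import urlparse

      # Collapse a page-level link graph to domain level, dropping
      # site-internal links. URLs are invented.
      def aggregate_by_domain(page_links):
          agg = {}
          for src, dst in page_links:
              s, d = urlparse(src).hostname, urlparse(dst).hostname
              if s != d:
                  agg.setdefault(s, set()).add(d)
          return agg

      page_links = [
          ("http://a.ac.uk/p1", "http://b.ac.uk/x"),
          ("http://a.ac.uk/p2", "http://b.ac.uk/y"),
          ("http://a.ac.uk/p1", "http://a.ac.uk/p2"),  # internal: dropped
      ]
      print(aggregate_by_domain(page_links))  # {'a.ac.uk': {'b.ac.uk'}}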
  14. Thelwall, M.: ¬A layered approach for investigating the topological structure of communities in the Web (2003) 0.01
    Abstract
    A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully-automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified was dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country's universities.
  15. Thelwall, M.; Sud, P.: ¬A comparison of methods for collecting web citation data for academic organizations (2011) 0.01
    Abstract
    The primary webometric method for estimating the online impact of an organization is to count links to its website. Link counts have been available from commercial search engines for over a decade but this was set to end by early 2012 and so a replacement is needed. This article compares link counts to two alternative methods: URL citations and organization title mentions. New variations of these methods are also introduced. The three methods are compared against each other using Yahoo!. Two of the three methods (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.8, S.1488-1497
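    The two alternatives to link counts compared in this paper translate into simple queries: a URL citation search looks for the site's URL as text while excluding the site's own pages, and a title mention search does the same with the organization's name. The sketch below builds such query strings; the "-site:" exclusion operator follows common Yahoo!/Bing syntax of the period, and the example names are invented.

      # Build webometric query strings for an organization's web impact.
      def url_citation_query(domain):
          return f'"{domain}" -site:{domain}'

      def title_mention_query(name, domain):
          return f'"{name}" -site:{domain}'

      print(url_citation_query("example.ac.uk"))
      # "example.ac.uk" -site:example.ac.uk
      print(title_mention_query("Example University", "example.ac.uk"))
      # "Example University" -site:example.ac.uk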
  16. Levitt, J.M.; Thelwall, M.: Citation levels and collaboration within library and information science (2009) 0.01
    Abstract
    Collaboration is a major research policy objective, but does it deliver higher quality research? This study uses citation analysis to examine the Web of Science (WoS) Information Science & Library Science subject category (IS&LS) to ascertain whether, in general, more highly cited articles are more highly collaborative than other articles. It consists of two investigations. The first investigation is a longitudinal comparison of the degree and proportion of collaboration in five strata of citation; it found that collaboration in the highest four citation strata (all in the most highly cited 22%) increased in unison over time, whereas collaboration in the lowest citation stratum (un-cited articles) remained low and stable. Given that over 40% of the articles were un-cited, it seems important to take into account the differences found between un-cited articles and relatively highly cited articles when investigating collaboration in IS&LS. The second investigation compares collaboration for 35 influential information scientists; it found that their more highly cited articles on average were not more highly collaborative than their less highly cited articles. In summary, although collaborative research is conducive to high citation in general, collaboration has apparently not tended to be essential to the success of current and former elite information scientists.
    Date
    22. 3.2009 12:43:51
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.3, S.434-442
  17. Thelwall, M.; Ruschenburg, T.: Grundlagen und Forschungsfelder der Webometrie [Foundations and research fields of webometrics] (2006) 0.01
    Date
    4.12.2006 12:12:22
    Source
    Information - Wissenschaft und Praxis. 57(2006) H.8, S.401-406
  18. Thelwall, M.: Web indicators for research evaluation : a practical guide (2016) 0.01
    Abstract
    In recent years there has been an increasing demand for research evaluation within universities and other research-based organisations. In parallel, there has been an increasing recognition that traditional citation-based indicators are not able to reflect the societal impacts of research and are slow to appear. This has led to the creation of new indicators for different types of research impact as well as timelier indicators, mainly derived from the Web. These indicators have been called altmetrics, webometrics or just web metrics. This book describes and evaluates a range of web indicators for aspects of societal or scholarly impact, discusses the theory and practice of using and evaluating web indicators for research assessment and outlines practical strategies for obtaining many web indicators. In addition to describing impact indicators for traditional scholarly outputs, such as journal articles and monographs, it also covers indicators for videos, datasets, software and other non-standard scholarly outputs. The book describes strategies to analyse web indicators for individual publications as well as to compare the impacts of groups of publications. The practical part of the book includes descriptions of how to use the free software Webometric Analyst to gather and analyse web data. This book is written for information science undergraduate and Master's students who are learning about alternative indicators or scientometrics, as well as Ph.D. students and other researchers and practitioners using indicators to help assess research impact or to study scholarly communication.
    Series
    Synthesis lectures on information concepts, retrieval, and services; 52
  19. Thelwall, M.; Maflahi, N.: Guideline references and academic citations as evidence of the clinical value of health research (2016) 0.01
    Abstract
    This article introduces a new source of evidence of the value of medical-related research: citations from clinical guidelines. These give evidence that research findings have been used to inform the day-to-day practice of medical staff. To identify whether citations from guidelines can give different information from that of traditional citation counts, this article assesses the extent to which references in clinical guidelines tend to be highly cited in the academic literature and highly read in Mendeley. Using evidence from the United Kingdom, references associated with the UK's National Institute for Health and Clinical Excellence (NICE) guidelines tended to be substantially more cited than comparable articles, unless they had been published in the most recent 3 years. Citation counts also seemed to be stronger indicators than Mendeley readership altmetrics. Hence, although presence in guidelines may be particularly useful to highlight the contributions of recently published articles, for older articles citation counts may already be sufficient to recognize their contributions to health in society.
    Date
    19. 3.2016 12:22:00
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.4, S.960-966
  20. Kousha, K.; Thelwall, M.: How is science cited on the Web? : a classification of google unique Web citations (2007) 0.01
    Abstract
    Although the analysis of citations in the scholarly literature is now an established and relatively well understood part of information science, not enough is known about citations that can be found on the Web. In particular, are there new Web citation types, and if so, are these trivial or potentially useful for studying or evaluating research communication? We sought evidence based upon a sample of 1,577 Web citations of the URLs or titles of research articles in 64 open-access journals from biology, physics, chemistry, and computing. Only 25% represented intellectual impact, from references of Web documents (23%) and other informal scholarly sources (2%). Many of the Web/URL citations were created for general or subject-specific navigation (45%) or for self-publicity (22%). Additional analyses revealed significant disciplinary differences in the types of Google unique Web/URL citations as well as some characteristics of scientific open-access publishing on the Web. We conclude that the Web provides access to a new and different type of citation information, one that may therefore enable us to measure different aspects of research, and the research process in particular; but to obtain good information, the different types should be separated.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.11, S.1631-1644