Search (75 results, page 2 of 4)

  Filter: author_ss:"Thelwall, M."
  1. Thelwall, M.: Conceptualizing documentation on the Web : an evaluation of different heuristic-based models for counting links between university Web sites (2002) 0.02
    0.016509846 = product of:
      0.049529534 = sum of:
        0.049529534 = product of:
          0.09905907 = sum of:
            0.09905907 = weight(_text_:web in 978) [ClassicSimilarity], result of:
              0.09905907 = score(doc=978,freq=22.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.59793836 = fieldWeight in 978, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=978)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
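
The nested numeric breakdowns under each hit are Lucene ClassicSimilarity "explain" output. As a worked check, the first tree's figures can be reproduced from Lucene's classic tf-idf formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1))); the Python sketch below is an editorial illustration, not part of the catalogue record:

```python
import math

# Figures from the explain tree above (term "web" in doc 978)
freq, doc_freq, max_docs = 22.0, 4597, 44218
query_norm, field_norm = 0.050763648, 0.0390625

tf = math.sqrt(freq)                            # 4.690416
idf = 1 + math.log(max_docs / (doc_freq + 1))   # 3.2635105
query_weight = idf * query_norm                 # 0.1656677
field_weight = tf * idf * field_norm            # 0.59793836
raw_score = query_weight * field_weight         # 0.09905907
final = raw_score * 0.5 * (1 / 3)               # coord(1/2) * coord(1/3) = 0.016509846
print(tf, idf, query_weight, field_weight, raw_score, final)
```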
    
    Abstract
    All known previous Web link studies have used the Web page as the primary indivisible source document for counting purposes. Arguments are presented to explain why this is not necessarily optimal and why other alternatives have the potential to produce better results. This is despite the fact that individual Web files are often the only choice if search engines are used for raw data and are the easiest basic Web unit to identify. The central issue is that of defining the Web "document": that which should comprise the single indissoluble unit of coherent material. Three alternative heuristics are defined for the educational arena based upon the directory, the domain and the whole university site. These are then compared by implementing them on a set of 108 UK university institutional Web sites under the assumption that a more effective heuristic will tend to produce results that correlate more highly with institutional research productivity. It was discovered that the domain and directory models were able to successfully reduce the impact of anomalous linking behavior between pairs of Web sites, with the latter being the method of choice. Reasons are then given as to why a document model on its own cannot eliminate all anomalies in Web linking behavior. Finally, the results from all models give a clear confirmation of the very strong association between the research productivity of a UK university and the number of incoming links from its peers' Web sites.
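
A minimal sketch of the paper's central idea: collapsing link URLs to page, directory, domain or whole-site units before counting, so that a single counting unit cannot contribute thousands of raw links. The canonicalisation rules below are illustrative assumptions, not the paper's exact heuristics:

```python
from urllib.parse import urlsplit

def document_unit(url: str, model: str) -> str:
    """Collapse a URL to its counting unit under one document model."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if model == "page":        # each Web file is its own document
        return host + parts.path
    if model == "directory":   # everything in one directory counts once
        return host + parts.path.rsplit("/", 1)[0] + "/"
    if model == "domain":      # e.g. www.scit.wlv.ac.uk
        return host
    if model == "site":        # whole university; crude ac.uk assumption
        return ".".join(host.split(".")[-3:])
    raise ValueError(model)

# Two links into the same directory count as one under the coarser models
links = ["http://www.scit.wlv.ac.uk/a/b.html", "http://www.scit.wlv.ac.uk/a/c.html"]
print({m: len({document_unit(u, m) for u in links})
       for m in ("page", "directory", "domain", "site")})
```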
  2. Kousha, K.; Thelwall, M.: Google Scholar citations and Google Web/URL citations : a multi-discipline exploratory analysis (2007) 0.02
    0.01574152 = product of:
      0.047224555 = sum of:
        0.047224555 = product of:
          0.09444911 = sum of:
            0.09444911 = weight(_text_:web in 337) [ClassicSimilarity], result of:
              0.09444911 = score(doc=337,freq=20.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.5701118 = fieldWeight in 337, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    We use a new data gathering method, "Web/URL citation," and Google Scholar to compare traditional and Web-based citation patterns across multiple disciplines (biology, chemistry, physics, computing, sociology, economics, psychology, and education) based upon a sample of 1,650 articles from 108 open access (OA) journals published in 2001. A Web/URL citation of an online journal article is a Web mention of its title, URL, or both. For each discipline, except psychology, we found significant correlations between Thomson Scientific (formerly Thomson ISI, here: ISI) citations and both Google Scholar and Google Web/URL citations. Google Scholar citations correlated more highly with ISI citations than did Google Web/URL citations, indicating that the Web/URL method measures a broader type of citation phenomenon. Google Scholar citations were more numerous than ISI citations in computer science and the four social science disciplines, suggesting that Google Scholar is more comprehensive for social sciences and perhaps also when conference articles are valued and published online. We also found large disciplinary differences in the percentage overlap between ISI and Google Scholar citation sources. Finally, although we found many significant trends, there were also numerous exceptions, suggesting that replacing traditional citation sources with the Web or Google Scholar for research impact calculations would be problematic.
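
The cross-source comparison rests on correlating per-article citation counts. A sketch with invented counts, assuming Spearman correlation (a common choice for skewed citation data; the abstract does not name the exact statistic):

```python
from scipy.stats import spearmanr

# Hypothetical per-article citation counts for one discipline
isi            = [12, 3, 0, 25, 7, 1, 14, 6]
google_scholar = [20, 5, 1, 40, 9, 2, 30, 8]
web_url        = [35, 2, 4, 60, 15, 0, 50, 7]

for name, counts in [("Google Scholar", google_scholar), ("Web/URL", web_url)]:
    rho, p = spearmanr(isi, counts)
    print(f"ISI vs {name}: rho={rho:.2f}, p={p:.3f}")
```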
  3. Vaughan, L.; Thelwall, M.: Scholarly use of the Web : what are the key inducers of links to journal Web sites? (2003) 0.01
    0.014933716 = product of:
      0.044801146 = sum of:
        0.044801146 = product of:
          0.08960229 = sum of:
            0.08960229 = weight(_text_:web in 1236) [ClassicSimilarity], result of:
              0.08960229 = score(doc=1236,freq=18.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.5408555 = fieldWeight in 1236, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1236)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Web links have been studied by information scientists for at least six years but it is only in the past two that clear evidence has emerged to show that counts of links to scholarly Web spaces (universities and departments) can correlate significantly with research measures, giving some credence to their use for the investigation of scholarly communication. This paper reports on a study to investigate the factors that influence the creation of links to journal Web sites. An empirical approach is used: collecting data and testing for significant patterns. The specific questions addressed are whether site age and site content are inducers of links to a journal's Web site as measured by the ratio of link counts to Journal Impact Factors, two variables previously discovered to be related. A new methodology for data collection is also introduced that uses the Internet Archive to obtain an earliest known creation date for Web sites. The results show that both site age and site content are significant factors for the disciplines studied: library and information science, and law. Comparisons between the two fields also show disciplinary differences in Web site characteristics. Scholars and publishers should be particularly aware that richer content on a journal's Web site tends to generate links and thus traffic to the site.
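
The methodological novelty is dating a Web site by its earliest known capture in the Internet Archive. A sketch using today's Wayback Machine CDX API, which postdates the paper; the endpoint and parameters are as currently documented, and the queried site is a placeholder:

```python
import urllib.request

def earliest_known_date(site: str) -> str:
    """Return the earliest Wayback Machine capture date (YYYYMMDD) for a site."""
    url = ("http://web.archive.org/cdx/search/cdx"
           f"?url={site}&fl=timestamp&filter=statuscode:200&limit=1")
    with urllib.request.urlopen(url, timeout=30) as resp:
        lines = resp.read().decode().strip().splitlines()
    return lines[0][:8] if lines else "unknown"

print(earliest_known_date("example-journal.org"))  # placeholder journal site
```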
  4. Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.01
    0.014933716 = product of:
      0.044801146 = sum of:
        0.044801146 = product of:
          0.08960229 = sum of:
            0.08960229 = weight(_text_:web in 4279) [ClassicSimilarity], result of:
              0.08960229 = score(doc=4279,freq=18.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.5408555 = fieldWeight in 4279, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4279)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.
  5. Thelwall, M.: Web indicators for research evaluation : a practical guide (2016) 0.01
    0.014079643 = product of:
      0.04223893 = sum of:
        0.04223893 = product of:
          0.08447786 = sum of:
            0.08447786 = weight(_text_:web in 3384) [ClassicSimilarity], result of:
              0.08447786 = score(doc=3384,freq=16.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.5099235 = fieldWeight in 3384, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3384)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    In recent years there has been an increasing demand for research evaluation within universities and other research-based organisations. In parallel, there has been an increasing recognition that traditional citation-based indicators are not able to reflect the societal impacts of research and are slow to appear. This has led to the creation of new indicators for different types of research impact as well as timelier indicators, mainly derived from the Web. These indicators have been called altmetrics, webometrics or just web metrics. This book describes and evaluates a range of web indicators for aspects of societal or scholarly impact, discusses the theory and practice of using and evaluating web indicators for research assessment and outlines practical strategies for obtaining many web indicators. In addition to describing impact indicators for traditional scholarly outputs, such as journal articles and monographs, it also covers indicators for videos, datasets, software and other non-standard scholarly outputs. The book describes strategies to analyse web indicators for individual publications as well as to compare the impacts of groups of publications. The practical part of the book includes descriptions of how to use the free software Webometric Analyst to gather and analyse web data. This book is written for information science undergraduate and Master's students who are learning about alternative indicators or scientometrics as well as Ph.D. students and other researchers and practitioners using indicators to help assess research impact or to study scholarly communication.
  6. Payne, N.; Thelwall, M.: Mathematical models for academic webs : linear relationship or non-linear power law? (2005) 0.01
    0.0139381355 = product of:
      0.041814405 = sum of:
        0.041814405 = product of:
          0.08362881 = sum of:
            0.08362881 = weight(_text_:web in 1066) [ClassicSimilarity], result of:
              0.08362881 = score(doc=1066,freq=8.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.50479853 = fieldWeight in 1066, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1066)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Previous studies of academic web interlinking have tended to hypothesise that the relationship between the research of a university and links to or from its web site should follow a linear trend, yet the typical distribution of web data, in general, seems to be a non-linear power law. This paper assesses whether a linear trend or a power law is the most appropriate method with which to model the relationship between research and web site size or outlinks. Following linear regression, analysis of the confidence intervals for the logarithmic graphs, and analysis of the outliers, the results suggest that a linear trend is more appropriate than a non-linear power law.
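
The model comparison boils down to fitting a straight line in raw space versus in log-log space and comparing the fits. A sketch over invented data points (the paper's actual analysis also examined confidence intervals and outliers):

```python
import numpy as np

# Hypothetical (research score, outlink count) pairs for universities
research = np.array([1.0, 2.5, 4.0, 5.5, 7.0, 10.0, 15.0, 20.0])
outlinks = np.array([120, 300, 480, 700, 850, 1300, 1900, 2600])

# Linear model: outlinks = a * research + b
a, b = np.polyfit(research, outlinks, 1)
lin_sse = np.sum((outlinks - (a * research + b)) ** 2)

# Power law: outlinks = c * research**d, fitted as a line in log-log space
d, log_c = np.polyfit(np.log(research), np.log(outlinks), 1)
pow_sse = np.sum((outlinks - np.exp(log_c) * research ** d) ** 2)

print("linear SSE:", lin_sse, " power-law SSE:", pow_sse)
```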
  7. Thelwall, M.: Web impact factors and search engine coverage (2000) 0.01
    0.0137951765 = product of:
      0.041385528 = sum of:
        0.041385528 = product of:
          0.082771055 = sum of:
            0.082771055 = weight(_text_:web in 4539) [ClassicSimilarity], result of:
              0.082771055 = score(doc=4539,freq=6.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.49962097 = fieldWeight in 4539, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4539)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Search engines index only a proportion of the web and this proportion is not determined randomly but by following algorithms that take into account the properties that impact factors measure. A survey was conducted in order to test the coverage of search engines and to decide whether their partial coverage is indeed an obstacle to using them to calculate web impact factors. The results indicate that search engine coverage, even of large national domains, is extremely uneven and would be likely to lead to misleading calculations.
  8. Zuccala, A.; Thelwall, M.; Oppenheim, C.; Dhiensa, R.: Web intelligence analyses of digital libraries : a case study of the National electronic Library for Health (NeLH) (2007) 0.01
    0.0137951765 = product of:
      0.041385528 = sum of:
        0.041385528 = product of:
          0.082771055 = sum of:
            0.082771055 = weight(_text_:web in 838) [ClassicSimilarity], result of:
              0.082771055 = score(doc=838,freq=24.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.49962097 = fieldWeight in 838, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=838)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH). Design/methodology/approach - The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data. Findings - Link data, when analysed together with user transaction log files (i.e. Web referring domains) can provide insights into who is using a digital library and when, and who could be using the digital library if they are "surfing" a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the .edu domain, representing American Universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non-electronic sources. Originality/value - A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real-time user transactions; while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.
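
LexiURL's simplest report, as described here, lists and counts the different domain names in a set of links or URLs. A minimal re-creation (the URLs are invented; LexiURL itself is a separate program):

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical referring URLs from log files or link searches
urls = [
    "http://www.dept.example.edu/links.html",
    "http://lib.example.edu/health/",
    "http://www.nhs.uk/portal/",
]

domains = Counter(urlsplit(u).netloc.lower() for u in urls)
tlds = Counter(d.rsplit(".", 1)[-1] for d in domains)
print(domains.most_common())
print(tlds.most_common())
```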
  9. Thelwall, M.; Wilkinson, D.: Finding similar academic Web sites with links, bibliometric couplings and colinks (2004) 0.01
    0.013357123 = product of:
      0.04007137 = sum of:
        0.04007137 = product of:
          0.08014274 = sum of:
            0.08014274 = weight(_text_:web in 2571) [ClassicSimilarity], result of:
              0.08014274 = score(doc=2571,freq=10.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.48375595 = fieldWeight in 2571, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2571)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage in terms of a much larger number of pairs of sites being connected by these measures, instead of increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
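
The three connection measures have crisp definitions over a directed link graph: a direct link, a coupling (shared outlink targets) and a colink (shared linking sources). A toy sketch with an invented graph:

```python
# Toy directed link graph: site -> set of sites it links to
links = {
    "a.ac.uk": {"b.ac.uk", "c.ac.uk"},
    "b.ac.uk": {"c.ac.uk"},
    "d.ac.uk": {"a.ac.uk", "b.ac.uk"},
}

def direct(u, v):    # u links to v, or v links to u
    return int(v in links.get(u, set()) or u in links.get(v, set()))

def coupling(u, v):  # number of common outlink targets
    return len(links.get(u, set()) & links.get(v, set()))

def colinks(u, v):   # number of common sources linking to both
    return sum(1 for targets in links.values() if u in targets and v in targets)

u, v = "a.ac.uk", "b.ac.uk"
print(direct(u, v), coupling(u, v), colinks(u, v))
```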
  10. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.01
    0.013357123 = product of:
      0.04007137 = sum of:
        0.04007137 = product of:
          0.08014274 = sum of:
            0.08014274 = weight(_text_:web in 3463) [ClassicSimilarity], result of:
              0.08014274 = score(doc=3463,freq=10.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.48375595 = fieldWeight in 3463, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3463)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The nature of the contents of academic Web sites is of direct relevance to the new field of scientific Web intelligence, and for search engine and topic-specific crawler designers. We analyze word frequencies in national academic Webs using the Web sites of three English-speaking nations: Australia, New Zealand, and the United Kingdom. Strong regularities were found in page size and word frequency distributions, but with significant anomalies. At least 26% of pages contain no words. High frequency words include university names and acronyms, Internet terminology, and computing product names: not always words in common usage away from the Web. A minority of low frequency words are spelling mistakes, with other common types including nonwords, proper names, foreign language terms or computer science variable names. Based upon these findings, recommendations for data cleansing and filtering are made, particularly for clustering applications.
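
The reported regularities come from plain word-frequency counting over crawled page text. A sketch of the counting step and of flagging low-frequency words as cleansing candidates (page texts invented):

```python
import re
from collections import Counter

# Hypothetical page texts from an academic crawl ("" = a page with no words)
pages = ["University of Example homepage", "", "contact email intranet login", ""]

empty = sum(1 for p in pages if not p.strip())
print(f"{empty / len(pages):.0%} of pages contain no words")

freq = Counter(w for p in pages for w in re.findall(r"[a-z]+", p.lower()))
low_freq = [w for w, n in freq.items() if n == 1]  # typos, names, variable names...
print(freq.most_common(3), low_freq[:5])
```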
  11. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.01
    0.013357123 = product of:
      0.04007137 = sum of:
        0.04007137 = product of:
          0.08014274 = sum of:
            0.08014274 = weight(_text_:web in 4457) [ClassicSimilarity], result of:
              0.08014274 = score(doc=4457,freq=10.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.48375595 = fieldWeight in 4457, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4457)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text-matching techniques in order to get the quality of information retrieval results provided by Google.
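
For reference, a minimal PageRank power iteration over an adjacency dict. This is the textbook algorithm, not Google's production system, and it skips dangling-node handling for brevity:

```python
def pagerank(out, d=0.85, iters=50):
    """Textbook PageRank by power iteration; `out` maps node -> outlink list."""
    nodes = set(out) | {v for targets in out.values() for v in targets}
    n = len(nodes)
    pr = {u: 1 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in nodes}
        for u, targets in out.items():
            for v in targets:
                nxt[v] += d * pr[u] / len(targets)
        pr = nxt
    return pr

site = {"home": ["dept", "news"], "dept": ["home"], "news": ["home"]}
print(sorted(pagerank(site).items(), key=lambda kv: -kv[1]))
```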
  12. Shifman, L.; Thelwall, M.: Assessing global diffusion with Web memetics : the spread and evolution of a popular joke (2009) 0.01
    0.013357123 = product of:
      0.04007137 = sum of:
        0.04007137 = product of:
          0.08014274 = sum of:
            0.08014274 = weight(_text_:web in 3303) [ClassicSimilarity], result of:
              0.08014274 = score(doc=3303,freq=10.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.48375595 = fieldWeight in 3303, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3303)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Memes are small units of culture, analogous to genes, which flow from person to person by copying or imitation. More than any previous medium, the Internet has the technical capabilities for global meme diffusion. Yet, to spread globally, memes need to negotiate their way through cultural and linguistic borders. This article introduces a new broad method, Web memetics, comprising extensive Web searches and combined quantitative and qualitative analyses, to identify and assess: (a) the different versions of a meme, (b) its evolution online, and (c) its Web presence and translation into common Internet languages. This method is demonstrated through one extensively circulated joke about men, women, and computers. The results show that the joke has mutated into several different versions and is widely translated, and that translations incorporate small, local adaptations while retaining the English versions' fundamental components. In conclusion, Web memetics has demonstrated its ability to identify and track the evolution and spread of memes online, with interesting results, albeit for only one case study.
  13. Thelwall, M.: Extracting macroscopic information from Web links (2001) 0.01
    0.012193328 = product of:
      0.03657998 = sum of:
        0.03657998 = product of:
          0.07315996 = sum of:
            0.07315996 = weight(_text_:web in 6851) [ClassicSimilarity], result of:
              0.07315996 = score(doc=6851,freq=12.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.4416067 = fieldWeight in 6851, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6851)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Much has been written about the potential and pitfalls of macroscopic Web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the Web that correlate with phenomena external to the Internet. This article attempts to provide such evidence through an evaluation of Ingwersen's (1998) proposed external Web Impact Factor (WIF) for the original use of the Web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of Web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications.
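
The best-performing WIF variant reported is a simple ratio, sketched here with invented figures:

```python
# WIF = pages with links pointing at research-based pages / faculty numbers
universities = {
    "univ-a": {"linking_pages": 1200, "faculty": 400},
    "univ-b": {"linking_pages": 300, "faculty": 150},
}
for name, x in universities.items():
    print(name, "WIF =", x["linking_pages"] / x["faculty"])
```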
  14. Thelwall, M.: ¬A layered approach for investigating the topological structure of communities in the Web (2003) 0.01
    0.012193328 = product of:
      0.03657998 = sum of:
        0.03657998 = product of:
          0.07315996 = sum of:
            0.07315996 = weight(_text_:web in 4450) [ClassicSimilarity], result of:
              0.07315996 = score(doc=4450,freq=12.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.4416067 = fieldWeight in 4450, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4450)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully-automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified was dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country's universities.
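
Before community identification, each layer's topology is examined via connected components on the aggregated graph with internal site links removed. A sketch of the aggregation step and a DFS component finder; Flake's max-flow community algorithm itself is not reproduced here:

```python
from collections import defaultdict

def components(edges):
    """Connected components of an undirected graph via iterative DFS."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Page-level links aggregated to the domain layer, internal links dropped
page_links = [("a.wlv.ac.uk/x", "b.ox.ac.uk/y"), ("a.wlv.ac.uk/z", "a.wlv.ac.uk/w")]
domain = lambda page: page.split("/")[0]
edges = {(domain(u), domain(v)) for u, v in page_links if domain(u) != domain(v)}
print(components(edges))
```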
  15. Barjak, F.; Thelwall, M.: ¬A statistical analysis of the web presences of European life sciences research teams (2008) 0.01
    0.012193328 = product of:
      0.03657998 = sum of:
        0.03657998 = product of:
          0.07315996 = sum of:
            0.07315996 = weight(_text_:web in 1383) [ClassicSimilarity], result of:
              0.07315996 = score(doc=1383,freq=12.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.4416067 = fieldWeight in 1383, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1383)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Web links have been used for around ten years to explore the online impact of academic information and information producers. Nevertheless, few studies have attempted to relate link counts to relevant offline attributes of the owners of the targeted Web sites, with the exception of research productivity. This article reports the results of a study to relate site inlink counts to relevant owner characteristics for over 400 European life-science research group Web sites. The analysis confirmed that research-group size and Web-presence size were important for attracting Web links, although research productivity was not. Little evidence was found for significant influence of any of an array of factors, including research-group leader gender and industry connections. In addition, the choice of search engine for link data created a surprising international difference in the results, with Google perhaps giving unreliable results. Overall, the data collection, statistical analysis and results interpretation were all complex and it seems that we still need to know more about search engines, hyperlinks, and their function in science before we can draw conclusions on their usefulness and role in the canon of science and technology indicators.
  16. Kousha, K.; Thelwall, M.: Disseminating research with web CV hyperlinks (2014) 0.01
    0.012193328 = product of:
      0.03657998 = sum of:
        0.03657998 = product of:
          0.07315996 = sum of:
            0.07315996 = weight(_text_:web in 1331) [ClassicSimilarity], result of:
              0.07315996 = score(doc=1331,freq=12.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.4416067 = fieldWeight in 1331, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1331)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Some curricula vitae (web CVs) of academics on the web, including homepages and publication lists, link to open-access (OA) articles, resources, abstracts in publishers' websites, or academic discussions, helping to disseminate research. To assess how common such practices are and whether they vary by discipline, gender, and country, the authors conducted a large-scale e-mail survey of astronomy and astrophysics, public health, environmental engineering, and philosophy across 15 European countries and analyzed hyperlinks from web CVs of academics. About 60% of the 2,154 survey responses reported having a web CV or something similar, and there were differences between disciplines, genders, and countries. A follow-up outlink analysis of 2,700 web CVs found that a third had at least one outlink to an OA target, typically a public eprint archive or an individual self-archived file. This proportion was considerably higher in astronomy (48%) and philosophy (37%) than in environmental engineering (29%) and public health (21%). There were also differences in linking to publishers' websites, resources, and discussions. Perhaps most important, however, the amount of linking to OA publications seems to be much lower than allowed by publishers and journals, suggesting that many opportunities for disseminating full-text research online are being missed, especially in disciplines without established repositories. Moreover, few academics seem to be exploiting their CVs to link to discussions, resources, or article abstracts, which seems to be another missed opportunity for publicizing research.
  17. Price, L.; Thelwall, M.: ¬The clustering power of low frequency words in academic webs (2005) 0.01
    0.012070779 = product of:
      0.036212336 = sum of:
        0.036212336 = product of:
          0.07242467 = sum of:
            0.07242467 = weight(_text_:web in 3561) [ClassicSimilarity], result of:
              0.07242467 = score(doc=3561,freq=6.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.43716836 = fieldWeight in 3561, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3561)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The value of low frequency words for subject-based academic Web site clustering is assessed. A new technique is introduced to compare the relative clustering power of different vocabularies. The technique is designed for word frequency tests in large document clustering exercises. Results for the Australian and New Zealand academic Web spaces indicate that low frequency words are useful for clustering academic Web sites along subject lines; removing low frequency words results in sites becoming, on average, less dissimilar to sites from other subjects.
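
The clustering-power idea can be illustrated by measuring site similarity with and without low-frequency words: removing them should make sites from different subjects look more alike. A toy sketch using cosine similarity over word counts (vocabulary and threshold invented; the paper's technique is more elaborate):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

bio1 = Counter("gene protein sequencing lab research staff".split())
bio2 = Counter("protein genome sequencing research students".split())
law1 = Counter("contract tort statute law research".split())

# Drop words below a collection-wide frequency threshold
total = bio1 + bio2 + law1
keep = {w for w, n in total.items() if n >= 2}
trim = lambda c: Counter({w: n for w, n in c.items() if w in keep})

print("same subject:", cosine(bio1, bio2), "->", cosine(trim(bio1), trim(bio2)))
print("diff subject:", cosine(bio1, law1), "->", cosine(trim(bio1), trim(law1)))
```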
  18. Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.01
    0.012070779 = product of:
      0.036212336 = sum of:
        0.036212336 = product of:
          0.07242467 = sum of:
            0.07242467 = weight(_text_:web in 6098) [ClassicSimilarity], result of:
              0.07242467 = score(doc=6098,freq=6.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.43716836 = fieldWeight in 6098, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6098)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.
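
One concrete outcome of the cost-benefit stance is a crawler that honours robots.txt and rate-limits itself. A minimal standard-library sketch (host, agent name and delay are placeholders, not a recommended policy):

```python
import time
import urllib.request
import urllib.robotparser

robots = urllib.robotparser.RobotFileParser()
robots.set_url("http://www.example.ac.uk/robots.txt")  # placeholder host
robots.read()

def polite_fetch(url, agent="research-crawler", delay=5.0):
    """Fetch a page only if robots.txt allows it, pausing to bound site cost."""
    if not robots.can_fetch(agent, url):
        return None
    time.sleep(delay)  # crude rate limit; real crawlers track per-host timestamps
    req = urllib.request.Request(url, headers={"User-Agent": agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```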
  19. Vaughan, L.; Thelwall, M.: ¬A modelling approach to uncover hyperlink patterns : the case of Canadian universities (2005) 0.01
    0.012070779 = product of:
      0.036212336 = sum of:
        0.036212336 = product of:
          0.07242467 = sum of:
            0.07242467 = weight(_text_:web in 1014) [ClassicSimilarity], result of:
              0.07242467 = score(doc=1014,freq=6.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.43716836 = fieldWeight in 1014, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1014)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Hyperlink patterns between Canadian university Web sites were analyzed by a mathematical modeling approach. A multiple regression model was developed which shows that faculty quality and the language of the university are important predictors for links to a university Web site. Higher faculty quality means more links. French universities received lower numbers of links to their Web sites than comparable English universities. Analysis of interlinking between pairs of universities also showed that English universities are advantaged. Universities are more likely to link to each other when the geographical distance between them is less than 3000 km, possibly reflecting the east vs. west divide that exists in Canadian society.
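
The regression behind such findings can be sketched as ordinary least squares with a language dummy variable (rows invented; the published model and diagnostics are richer):

```python
import numpy as np

# Hypothetical rows: (faculty quality score, is_english, inlink count)
data = np.array([
    [3.2, 1, 540], [2.1, 1, 300], [4.0, 1, 820],
    [3.0, 0, 310], [2.5, 0, 220], [3.8, 0, 500],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, quality and English-language coefficients:", coef)
```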
  20. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.01
    0.011946972 = product of:
      0.035840917 = sum of:
        0.035840917 = product of:
          0.071681835 = sum of:
            0.071681835 = weight(_text_:web in 1681) [ClassicSimilarity], result of:
              0.071681835 = score(doc=1681,freq=8.0), product of:
                0.1656677 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.050763648 = queryNorm
                0.43268442 = fieldWeight in 1681, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1681)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The graph structures of three national university publicly indexable Webs from Australia, New Zealand, and the UK were analyzed. Strong scale-free regularities for page indegrees, outdegrees, and connected component sizes were in evidence, resulting in power laws similar to those previously identified for individual university Web sites and for the AltaVista-indexed Web. Anomalies were also discovered in most distributions and were tracked down to root causes. As a result, resource-driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from using techniques to minimize or eliminate the impact of such anomalies.
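
Checking for a power law starts with the degree distribution and a slope estimate in log-log space. A crude sketch (indegrees invented; careful studies fit CCDFs or use maximum likelihood rather than least squares on log counts):

```python
import math
from collections import Counter

# Hypothetical page indegrees from a national academic Web crawl
indegrees = [0, 1, 1, 1, 2, 2, 3, 5, 8, 21, 1, 1, 2, 4, 13]

pairs = [(k, n) for k, n in sorted(Counter(indegrees).items()) if k > 0]
xs = [math.log(k) for k, _ in pairs]
ys = [math.log(n) for _, n in pairs]

# Least-squares slope of log(count) vs log(degree) ~ power-law exponent
m = len(xs)
slope = (m * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
        (m * sum(x * x for x in xs) - sum(xs) ** 2)
print("fitted exponent ~", slope)
```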