Search (43 results, page 1 of 3)

  • author_ss:"Thelwall, M."
  1. Thelwall, M.; Wilkinson, D.: Finding similar academic Web sites with links, bibliometric couplings and colinks (2004) 0.07
    0.06716367 = product of:
      0.201491 = sum of:
        0.201491 = weight(_text_:sites in 2571) [ClassicSimilarity], result of:
          0.201491 = score(doc=2571,freq=10.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.7749018 = fieldWeight in 2571, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=2571)
      0.33333334 = coord(1/3)
    
    Abstract
    A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web were taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage in terms of a much larger number of pairs of sites being connected by these measures, instead of increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
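The score tree for result 1 can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and do not appear in the listing, and queryNorm and fieldNorm are copied from the tree rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # ClassicSimilarity term frequency: square root of the raw count
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

def coord(matched, total):
    # Fraction of query clauses that matched the document
    return matched / total

# Values taken from the explain tree for doc 2571 (term "sites")
weight_sites = term_score(freq=10.0, doc_freq=644, max_docs=44218,
                          query_norm=0.049739745, field_norm=0.046875)
score_2571 = coord(1, 3) * weight_sites
print(score_2571)
```

The result should land close to the 0.06716367 shown in the listing. Small differences in the last digits are expected because Lucene computes in single precision while Python uses doubles.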
  2. Thelwall, M.; Buckley, K.; Paltoglou, G.; Cai, D.; Kappas, A.: Sentiment strength detection in short informal text (2010) 0.06
    0.06129259 = product of:
      0.09193888 = sum of:
        0.075091265 = weight(_text_:sites in 4200) [ClassicSimilarity], result of:
          0.075091265 = score(doc=4200,freq=2.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.28878886 = fieldWeight in 4200, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4200)
        0.016847622 = product of:
          0.033695243 = sum of:
            0.033695243 = weight(_text_:22 in 4200) [ClassicSimilarity], result of:
              0.033695243 = score(doc=4200,freq=2.0), product of:
                0.1741801 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049739745 = queryNorm
                0.19345059 = fieldWeight in 4200, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4200)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1-5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches.
    Date
    22. 1.2011 14:29:23
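Result 2 matches two clauses, so its tree nests a second coord factor inside the sum. The following sketch recomputes it under the same assumed ClassicSimilarity formulas. The helper and constant names are hypothetical, and all numeric inputs are copied from the tree for doc 4200.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)
    term_idf = idf(doc_freq, max_docs)
    return (term_idf * query_norm) * (math.sqrt(freq) * term_idf * field_norm)

QUERY_NORM = 0.049739745
MAX_DOCS = 44218
FIELD_NORM = 0.0390625  # fieldNorm(doc=4200)

# Clause 1: term "sites", freq=2, docFreq=644
sites_part = term_score(2.0, 644, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Clause 2: nested query where 1 of 2 sub-clauses matched (term "22", docFreq=3622)
inner_part = (1 / 2) * term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Outer coord: 2 of 3 top-level clauses matched
score_4200 = (2 / 3) * (sites_part + inner_part)
print(score_4200)
```

Because doc 57 in result 3 has the same frequencies and fieldNorm, the same computation applies there, which is consistent with both records showing 0.06129259.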
  3. Thelwall, M.; Sud, P.; Wilkinson, D.: Link and co-inlink network diagrams with URL citations or title mentions (2012) 0.06
    0.06129259 = product of:
      0.09193888 = sum of:
        0.075091265 = weight(_text_:sites in 57) [ClassicSimilarity], result of:
          0.075091265 = score(doc=57,freq=2.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.28878886 = fieldWeight in 57, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=57)
        0.016847622 = product of:
          0.033695243 = sum of:
            0.033695243 = weight(_text_:22 in 57) [ClassicSimilarity], result of:
              0.033695243 = score(doc=57,freq=2.0), product of:
                0.1741801 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049739745 = queryNorm
                0.19345059 = fieldWeight in 57, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=57)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Webometric network analyses have been used to map the connectivity of groups of websites to identify clusters, important sites or overall structure. Such analyses have mainly been based upon hyperlink counts, the number of hyperlinks between a pair of websites, although some have used title mentions or URL citations instead. The ability to automatically gather hyperlink counts from Yahoo! ceased in April 2011 and the ability to manually gather such counts was due to cease by early 2012, creating a need for alternatives. This article assesses URL citations and title mentions as possible replacements for hyperlinks in both binary and weighted direct link and co-inlink network diagrams. It also assesses three different types of data for the network connections: hit count estimates, counts of matching URLs, and filtered counts of matching URLs. Results from analyses of U.S. library and information science departments and U.K. universities give evidence that metrics based upon URLs or titles can be appropriate replacements for metrics based upon hyperlinks for both binary and weighted networks, although filtered counts of matching URLs are necessary to give the best results for co-title mention and co-URL citation network diagrams.
    Date
    6. 4.2012 18:16:22
  4. Price, L.; Thelwall, M.: ¬The clustering power of low frequency words in academic webs (2005) 0.06
    0.060695544 = product of:
      0.18208663 = sum of:
        0.18208663 = weight(_text_:sites in 3561) [ClassicSimilarity], result of:
          0.18208663 = score(doc=3561,freq=6.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.7002758 = fieldWeight in 3561, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3561)
      0.33333334 = coord(1/3)
    
    Abstract
    The value of low frequency words for subject-based academic Web site clustering is assessed. A new technique is introduced to compare the relative clustering power of different vocabularies. The technique is designed for word frequency tests in large document clustering exercises. Results for the Australian and New Zealand academic Web spaces indicate that low frequency words are useful for clustering academic Web sites along subject lines; removing low frequency words results in sites becoming, on average, less dissimilar to sites from other subjects.
  5. Vaughan, L.; Thelwall, M.: Search engine coverage bias : evidence and possible causes (2004) 0.06
    0.060073014 = product of:
      0.18021904 = sum of:
        0.18021904 = weight(_text_:sites in 2536) [ClassicSimilarity], result of:
          0.18021904 = score(doc=2536,freq=8.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.6930933 = fieldWeight in 2536, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=2536)
      0.33333334 = coord(1/3)
    
    Abstract
    Commercial search engines are now playing an increasingly important role in Web information dissemination and access. Of particular interest to business and national governments is whether the big engines have coverage biased towards the US or other countries. In our study we tested for national biases in three major search engines and found significant differences in their coverage of commercial Web sites. The US sites were much better covered than the others in the study: sites from China, Taiwan and Singapore. We then examined the possible technical causes of the differences and found that the language of a site does not affect its coverage by search engines. However, the visibility of a site, measured by the number of links to it, affects its chance to be covered by search engines. We conclude that the coverage bias does exist but this is due not to deliberate choices of the search engines but occurs as a natural result of cumulative advantage effects of US sites on the Web. Nevertheless, the bias remains a cause for international concern.
  6. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.05
    0.05202476 = product of:
      0.15607427 = sum of:
        0.15607427 = weight(_text_:sites in 3463) [ClassicSimilarity], result of:
          0.15607427 = score(doc=3463,freq=6.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.6002364 = fieldWeight in 3463, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=3463)
      0.33333334 = coord(1/3)
    
    Abstract
    The nature of the contents of academic Web sites is of direct relevance to the new field of scientific Web intelligence, and for search engine and topic-specific crawler designers. We analyze word frequencies in national academic Webs using the Web sites of three English-speaking nations: Australia, New Zealand, and the United Kingdom. Strong regularities were found in page size and word frequency distributions, but with significant anomalies. At least 26% of pages contain no words. High frequency words include university names and acronyms, Internet terminology, and computing product names: not always words in common usage away from the Web. A minority of low frequency words are spelling mistakes, with other common types including nonwords, proper names, foreign language terms or computer science variable names. Based upon these findings, recommendations for data cleansing and filtering are made, particularly for clustering applications.
  7. Thelwall, M.: Conceptualizing documentation on the Web : an evaluation of different heuristic-based models for counting links between university Web sites (2002) 0.05
    0.050060846 = product of:
      0.15018253 = sum of:
        0.15018253 = weight(_text_:sites in 978) [ClassicSimilarity], result of:
          0.15018253 = score(doc=978,freq=8.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.5775777 = fieldWeight in 978, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
      0.33333334 = coord(1/3)
    
    Abstract
    All known previous Web link studies have used the Web page as the primary indivisible source document for counting purposes. Arguments are presented to explain why this is not necessarily optimal and why other alternatives have the potential to produce better results. This is despite the fact that individual Web files are often the only choice if search engines are used for raw data and are the easiest basic Web unit to identify. The central issue is of defining the Web "document": that which should comprise the single indissoluble unit of coherent material. Three alternative heuristics are defined for the educational arena based upon the directory, the domain and the whole university site. These are then compared by implementing them on a set of 108 UK university institutional Web sites under the assumption that a more effective heuristic will tend to produce results that correlate more highly with institutional research productivity. It was discovered that the domain and directory models were able to successfully reduce the impact of anomalous linking behavior between pairs of Web sites, with the latter being the method of choice. Reasons are then given as to why a document model on its own cannot eliminate all anomalies in Web linking behavior. Finally, the results from all models give a clear confirmation of the very strong association between the research productivity of a UK university and the number of incoming links from its peers' Web sites.
  8. Thelwall, M.; Harries, G.: Do the Web Sites of Higher Rated Scholars Have Significantly More Online Impact? (2004) 0.05
    0.050060846 = product of:
      0.15018253 = sum of:
        0.15018253 = weight(_text_:sites in 2123) [ClassicSimilarity], result of:
          0.15018253 = score(doc=2123,freq=8.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.5775777 = fieldWeight in 2123, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
      0.33333334 = coord(1/3)
    
    Abstract
    The quality and impact of academic Web sites is of interest to many audiences, including the scholars who use them and Web educators who need to identify best practice. Several large-scale European Union research projects have been funded to build new indicators for online scientific activity, reflecting recognition of the importance of the Web for scholarly communication. In this paper we address the key question of whether higher rated scholars produce higher impact Web sites, using the United Kingdom as a case study and measuring scholars' quality in terms of university-wide average research ratings. Methodological issues concerning the measurement of the online impact are discussed, leading to the adoption of counts of links to a university's constituent single domain Web sites from an aggregated counting metric. The findings suggest that universities with higher rated scholars produce significantly more Web content but with a similar average online impact. Higher rated scholars therefore attract more total links from their peers, but only by being more prolific, refuting earlier suggestions. It can be surmised that general Web publications are very different from scholarly journal articles and conference papers, for which scholarly quality does associate with citation impact. This has important implications for the construction of new Web indicators, for example that online impact should not be used to assess the quality of small groups of scholars, even within a single discipline.
  9. Vaughan, L.; Thelwall, M.: ¬A modelling approach to uncover hyperlink patterns : the case of Canadian universities (2005) 0.05
    0.04955771 = product of:
      0.14867312 = sum of:
        0.14867312 = weight(_text_:sites in 1014) [ClassicSimilarity], result of:
          0.14867312 = score(doc=1014,freq=4.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.57177275 = fieldWeight in 1014, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1014)
      0.33333334 = coord(1/3)
    
    Abstract
    Hyperlink patterns between Canadian university Web sites were analyzed by a mathematical modeling approach. A multiple regression model was developed which shows that faculty quality and the language of the university are important predictors for links to a university Web site. Higher faculty quality means more links. French universities received lower numbers of links to their Web sites than comparable English universities. Analysis of interlinking between pairs of universities also showed that English universities are advantaged. Universities are more likely to link to each other when the geographical distance between them is less than 3000 km, possibly reflecting the east vs. west divide that exists in Canadian society.
  10. Vaughan, L.; Thelwall, M.: Scholarly use of the Web : what are the key inducers of links to journal Web sites? (2003) 0.04
    0.04335396 = product of:
      0.13006188 = sum of:
        0.13006188 = weight(_text_:sites in 1236) [ClassicSimilarity], result of:
          0.13006188 = score(doc=1236,freq=6.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.500197 = fieldWeight in 1236, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1236)
      0.33333334 = coord(1/3)
    
    Abstract
    Web links have been studied by information scientists for at least six years but it is only in the past two that clear evidence has emerged to show that counts of links to scholarly Web spaces (universities and departments) can correlate significantly with research measures, giving some credence to their use for the investigation of scholarly communication. This paper reports on a study to investigate the factors that influence the creation of links to journal Web sites. An empirical approach is used: collecting data and testing for significant patterns. The specific questions addressed are whether site age and site content are inducers of links to a journal's Web site as measured by the ratio of link counts to Journal Impact Factors, two variables previously discovered to be related. A new methodology for data collection is also introduced that uses the Internet Archive to obtain an earliest known creation date for Web sites. The results show that both site age and site content are significant factors for the disciplines studied: library and information science, and law. Comparisons between the two fields also show disciplinary differences in Web site characteristics. Scholars and publishers should be particularly aware that richer content on a journal's Web site tends to generate links and thus the traffic to the site.
  11. Thelwall, M.: ¬A comparison of link and URL citation counting (2011) 0.04
    0.04335396 = product of:
      0.13006188 = sum of:
        0.13006188 = weight(_text_:sites in 4533) [ClassicSimilarity], result of:
          0.13006188 = score(doc=4533,freq=6.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.500197 = fieldWeight in 4533, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4533)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - Link analysis is an established topic within webometrics. It normally uses counts of links between sets of web sites or to sets of web sites. These link counts are derived from web crawlers or commercial search engines with the latter being the only alternative for some investigations. This paper compares link counts with URL citation counts in order to assess whether the latter could be a replacement for the former if the major search engines withdraw their advanced hyperlink search facilities. Design/methodology/approach - URL citation counts are compared with link counts for a variety of data sets used in previous webometric studies. Findings - The results show a high degree of correlation between the two but with URL citations being much less numerous, at least outside academia and business. Research limitations/implications - The results cover a small selection of 15 case studies and so the findings are only indicative. Significant differences between results indicate that the difference between link counts and URL citation counts will vary between webometric studies. Practical implications - Should link searches be withdrawn, then link analyses of less well linked non-academic, non-commercial sites would be seriously weakened, although citations based on e-mail addresses could help to make citations more numerous than links for some business and academic contexts. Originality/value - This is the first systematic study of the difference between link counts and URL citation counts in a variety of contexts and it shows that there are significant differences between the two.
  12. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.04
    0.042478036 = product of:
      0.1274341 = sum of:
        0.1274341 = weight(_text_:sites in 1681) [ClassicSimilarity], result of:
          0.1274341 = score(doc=1681,freq=4.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.49009097 = fieldWeight in 1681, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=1681)
      0.33333334 = coord(1/3)
    
    Abstract
    The graph structures of three national university publicly indexable Webs from Australia, New Zealand, and the UK were analyzed. Strong scale-free regularities for page indegrees, outdegrees, and connected component sizes were in evidence, resulting in power laws similar to those previously identified for individual university Web sites and for the AltaVista-indexed Web. Anomalies were also discovered in most distributions and were tracked down to root causes. As a result, resource driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from using techniques to minimize or eliminate the impact of such anomalies.
  13. Thelwall, M.; Wilkinson, D.: Public dialogs in social network sites : What is their purpose? (2010) 0.04
    0.042478036 = product of:
      0.1274341 = sum of:
        0.1274341 = weight(_text_:sites in 3327) [ClassicSimilarity], result of:
          0.1274341 = score(doc=3327,freq=4.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.49009097 = fieldWeight in 3327, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=3327)
      0.33333334 = coord(1/3)
    
    Abstract
    Social network sites (SNSs) such as MySpace and Facebook are important venues for interpersonal communication, especially among youth. One way in which members can communicate is to write public messages on each other's profile, but how is this unusual means of communication used in practice? An analysis of 2,293 public comment exchanges extracted from large samples of U.S. and U.K. MySpace members found them to be relatively rapid, but rarely used for prolonged exchanges. They seem to fulfill two purposes: making initial contact and keeping in touch occasionally such as at birthdays and other important dates. Although about half of the dialogs seem to exchange some gossip, the dialogs seem typically too short to play the role of gossip-based social grooming for typical pairs of Friends, but close Friends may still communicate extensively in SNSs with other methods.
  14. Barjak, F.; Thelwall, M.: ¬A statistical analysis of the web presences of European life sciences research teams (2008) 0.04
    0.035398364 = product of:
      0.106195085 = sum of:
        0.106195085 = weight(_text_:sites in 1383) [ClassicSimilarity], result of:
          0.106195085 = score(doc=1383,freq=4.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.40840912 = fieldWeight in 1383, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1383)
      0.33333334 = coord(1/3)
    
    Abstract
    Web links have been used for around ten years to explore the online impact of academic information and information producers. Nevertheless, few studies have attempted to relate link counts to relevant offline attributes of the owners of the targeted Web sites, with the exception of research productivity. This article reports the results of a study to relate site inlink counts to relevant owner characteristics for over 400 European life-science research group Web sites. The analysis confirmed that research-group size and Web-presence size were important for attracting Web links, although research productivity was not. Little evidence was found for significant influence of any of an array of factors, including research-group leader gender and industry connections. In addition, the choice of search engine for link data created a surprising international difference in the results, with Google perhaps giving unreliable results. Overall, the data collection, statistical analysis and results interpretation were all complex and it seems that we still need to know more about search engines, hyperlinks, and their function in science before we can draw conclusions on their usefulness and role in the canon of science and technology indicators.
  15. Thelwall, M.: Quantitative comparisons of search engine results (2008) 0.04
    0.035398364 = product of:
      0.106195085 = sum of:
        0.106195085 = weight(_text_:sites in 2350) [ClassicSimilarity], result of:
          0.106195085 = score(doc=2350,freq=4.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.40840912 = fieldWeight in 2350, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
      0.33333334 = coord(1/3)
    
    Abstract
    Search engines are normally used to find information or Web sites, but Webometric investigations use them for quantitative data such as the number of pages matching a query and the international spread of those pages. For this type of application, the accuracy of the hit count estimates and range of URLs in the full results are important. Here, we compare the applications programming interfaces of Google, Yahoo!, and Live Search for 1,587 single word searches. The hit count estimates were broadly consistent but with Yahoo! and Google reporting 5-6 times more hits than Live Search. Yahoo! tended to return slightly more matching URLs than Google, with Live Search returning significantly fewer. Yahoo!'s result URLs included a significantly wider range of domains and sites than the other two, and there was little consistency between the three engines in the number of different domains. In contrast, the three engines were reasonably consistent in the number of different top-level domains represented in the result URLs, although Yahoo! tended to return the most. In conclusion, quantitative results from the three search engines are mostly consistent but with unexpected types of inconsistency that users should be aware of. Google is recommended for hit count estimates but Yahoo! is recommended for all other Webometric purposes.
  16. Thelwall, M.; Kousha, K.: Academia.edu : Social network or Academic Network? (2014) 0.04
    0.035398364 = product of:
      0.106195085 = sum of:
        0.106195085 = weight(_text_:sites in 1234) [ClassicSimilarity], result of:
          0.106195085 = score(doc=1234,freq=4.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.40840912 = fieldWeight in 1234, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1234)
      0.33333334 = coord(1/3)
    
    Abstract
    Academic social network sites Academia.edu and ResearchGate, and reference sharing sites Mendeley, Bibsonomy, Zotero, and CiteULike, give scholars the ability to publicize their research outputs and connect with each other. With millions of users, these are a significant addition to the scholarly communication and academic information-seeking eco-structure. There is thus a need to understand the role that they play and the changes, if any, that they can make to the dynamics of academic careers. This article investigates attributes of philosophy scholars on Academia.edu, introducing a median-based, time-normalizing method to adjust for time delays in joining the site. In comparison to students, faculty tend to attract more profile views but female philosophers did not attract more profile views than did males, suggesting that academic capital drives philosophy uses of the site more than does friendship and networking. Secondary analyses of law, history, and computer science confirmed the faculty advantage (in terms of higher profile views) except for females in law and females in computer science. There was also a female advantage for both faculty and students in law and computer science as well as for history students. Hence, Academia.edu overall seems to reflect a hybrid of scholarly norms (the faculty advantage) and a female advantage that is suggestive of general social networking norms. Finally, traditional bibliometric measures did not correlate with any Academia.edu metrics for philosophers, perhaps because more senior academics use the site less extensively or because of the range of informal scholarly activities that cannot be measured by bibliometric methods.
  17. Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.04
    0.03504259 = product of:
      0.105127774 = sum of:
        0.105127774 = weight(_text_:sites in 6098) [ClassicSimilarity], result of:
          0.105127774 = score(doc=6098,freq=2.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.40430441 = fieldWeight in 6098, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6098)
      0.33333334 = coord(1/3)
    
    Abstract
    Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.
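    The score breakdown for this entry can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm), which agree with the figures shown in the tree; the function names are hypothetical and not part of Lucene. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming the ClassicSimilarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float, coord: float) -> float:
    """Recompute a single-term score from the values in an explain tree."""
    tf = math.sqrt(freq)                      # tf(freq=...)
    idf = classic_idf(doc_freq, max_docs)     # idf(docFreq=..., maxDocs=...)
    query_weight = idf * query_norm           # queryWeight
    field_weight = tf * idf * field_norm      # fieldWeight
    return query_weight * field_weight * coord


# Values copied from the explain tree for doc=6098 (entry 17).
score = classic_score(freq=2.0, doc_freq=644, max_docs=44218,
                      query_norm=0.049739745, field_norm=0.0546875,
                      coord=1 / 3)
print(round(score, 5))  # close to the 0.03504259 reported in the listing
```

    Because idf, queryNorm, and coord are identical for every hit of this query, the ranking differences between entries come only from the term frequency and the field norm.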
  18. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.03
    0.030036507 = product of:
      0.09010952 = sum of:
        0.09010952 = weight(_text_:sites in 4457) [ClassicSimilarity], result of:
          0.09010952 = score(doc=4457,freq=2.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.34654665 = fieldWeight in 4457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=4457)
      0.33333334 = coord(1/3)
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
  19. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.03
    0.030036507 = product of:
      0.09010952 = sum of:
        0.09010952 = weight(_text_:sites in 674) [ClassicSimilarity], result of:
          0.09010952 = score(doc=674,freq=2.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.34654665 = fieldWeight in 674, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
      0.33333334 = coord(1/3)
    
    Abstract
    Introduces several new versions of PageRank (the link based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
  20. Angus, E.; Thelwall, M.; Stuart, D.: General patterns of tag usage among university groups in Flickr (2008) 0.03
    0.030036507 = product of:
      0.09010952 = sum of:
        0.09010952 = weight(_text_:sites in 2554) [ClassicSimilarity], result of:
          0.09010952 = score(doc=2554,freq=2.0), product of:
            0.26002133 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.049739745 = queryNorm
            0.34654665 = fieldWeight in 2554, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=2554)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The purpose of this research is to investigate general patterns of tag usage and determine the usefulness of the tags used within university image groups to the wider Flickr community. There has been a significant rise in the use of Web 2.0 social network web sites and online applications in recent years. One of the most popular is Flickr, an online image management application. Design/methodology/approach - This study uses webometric data collection, classification and informetric analysis. Findings - The results show that members of university image groups tend to tag in a manner that is of use to users of the system as a whole rather than merely for the tag creator. Originality/value - This paper gives a valuable insight into the tagging practices of image groups in Flickr.