Search (26 results, page 1 of 2)

  • × year_i:[2000 TO 2010}
  • × author_ss:"Thelwall, M."
  1. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.06
    0.060694948 = product of:
      0.21243231 = sum of:
        0.12759283 = weight(_text_:sites in 3463) [ClassicSimilarity], result of:
          0.12759283 = score(doc=3463,freq=6.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.6002364 = fieldWeight in 3463, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=3463)
        0.08483948 = weight(_text_:united in 3463) [ClassicSimilarity], result of:
          0.08483948 = score(doc=3463,freq=2.0), product of:
            0.22812355 = queryWeight, product of:
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.04066292 = queryNorm
            0.37190145 = fieldWeight in 3463, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.046875 = fieldNorm(doc=3463)
      0.2857143 = coord(2/7)
    
    Abstract
    The nature of the contents of academic Web sites is of direct relevance to the new field of scientific Web intelligence, and for search engine and topic-specific crawler designers. We analyze word frequencies in national academic Webs using the Web sites of three Englishspeaking nations: Australia, New Zealand, and the United Kingdom. Strong regularities were found in page size and word frequency distributions, but with significant anomalies. At least 26% of pages contain no words. High frequency words include university names and acronyms, Internet terminology, and computing product names: not always words in common usage away from the Web. A minority of low frequency words are spelling mistakes, with other common types including nonwords, proper names, foreign language terms or computer science variable names. Based upon these findings, recommendations for data cleansing and filtering are made, particularly for clustering applications.
  2. Thelwall, M.; Harries, G.: Do the Web Sites of Higher Rated Scholars Have Significantly More Online Impact? (2004) 0.06
    0.055278808 = product of:
      0.19347581 = sum of:
        0.122776255 = weight(_text_:sites in 2123) [ClassicSimilarity], result of:
          0.122776255 = score(doc=2123,freq=8.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.5775777 = fieldWeight in 2123, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
        0.070699565 = weight(_text_:united in 2123) [ClassicSimilarity], result of:
          0.070699565 = score(doc=2123,freq=2.0), product of:
            0.22812355 = queryWeight, product of:
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.04066292 = queryNorm
            0.30991787 = fieldWeight in 2123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
      0.2857143 = coord(2/7)
    
    Abstract
    The quality and impact of academic Web sites is of interest to many audiences, including the scholars who use them and Web educators who need to identify best practice. Several large-scale European Union research projects have been funded to build new indicators for online scientific activity, reflecting recognition of the importance of the Web for scholarly communication. In this paper we address the key question of whether higher rated scholars produce higher impact Web sites, using the United Kingdom as a case study and measuring scholars' quality in terms of university-wide average research ratings. Methodological issues concerning the measurement of the online impact are discussed, leading to the adoption of counts of links to a university's constituent single domain Web sites from an aggregated counting metric. The findings suggest that universities with higher rated scholars produce significantly more Web content but with a similar average online impact. Higher rated scholars therefore attract more total links from their peers, but only by being more prolific, refuting earlier suggestions. It can be surmised that general Web publications are very different from scholarly journal articles and conference papers, for which scholarly quality does associate with citation impact. This has important implications for the construction of new Web indicators, for example that online impact should not be used to assess the quality of small groups of scholars, even within a single discipline.
  3. Angus, E.; Thelwall, M.; Stuart, D.: General patterns of tag usage among university groups in Flickr (2008) 0.03
    0.026491161 = product of:
      0.09271906 = sum of:
        0.07366575 = weight(_text_:sites in 2554) [ClassicSimilarity], result of:
          0.07366575 = score(doc=2554,freq=2.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.34654665 = fieldWeight in 2554, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=2554)
        0.01905331 = product of:
          0.03810662 = sum of:
            0.03810662 = weight(_text_:design in 2554) [ClassicSimilarity], result of:
              0.03810662 = score(doc=2554,freq=2.0), product of:
                0.15288728 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.04066292 = queryNorm
                0.24924651 = fieldWeight in 2554, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2554)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Purpose - The purpose of this research is to investigate general patterns of tag usage and determines the usefulness of the tags used within university image groups to the wider Flickr community. There has been a significant rise in the use of Web 2.0 social network web sites and online applications in recent years. One of the most popular is Flickr, an online image management application. Design/methodology/approach - This study uses a webometric data collection, classification and informetric analysis. Findings - The results show that members of university image groups tend to tag in a manner that is of use to users of the system as a whole rather than merely for the tag creator. Originality/value - This paper gives a valuable insight into the tagging practices of image groups in Flickr.
  4. Thelwall, M.; Wilkinson, D.: Finding similar academic Web sites with links, bibliometric couplings and colinks (2004) 0.02
    0.023531662 = product of:
      0.16472162 = sum of:
        0.16472162 = weight(_text_:sites in 2571) [ClassicSimilarity], result of:
          0.16472162 = score(doc=2571,freq=10.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.7749018 = fieldWeight in 2571, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=2571)
      0.14285715 = coord(1/7)
    
    Abstract
    A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web were taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage in terms of a much larger number of pairs of sites being connected by these measures, instead of increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
  5. Zuccala, A.; Thelwall, M.; Oppenheim, C.; Dhiensa, R.: Web intelligence analyses of digital libraries : a case study of the National electronic Library for Health (NeLH) (2007) 0.02
    0.023472842 = product of:
      0.082154945 = sum of:
        0.06945274 = weight(_text_:sites in 838) [ClassicSimilarity], result of:
          0.06945274 = score(doc=838,freq=4.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.3267273 = fieldWeight in 838, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
        0.012702207 = product of:
          0.025404414 = sum of:
            0.025404414 = weight(_text_:design in 838) [ClassicSimilarity], result of:
              0.025404414 = score(doc=838,freq=2.0), product of:
                0.15288728 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.04066292 = queryNorm
                0.16616434 = fieldWeight in 838, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.03125 = fieldNorm(doc=838)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Purpose - The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH). Design/methodology/approach - The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data. Findings - Link data, when analysed together with user transaction log files (i.e. Web referring domains) can provide insights into who is using a digital library and when, and who could be using the digital library if they are "surfing" a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the.edu domain, representing American Universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non-electronic sources. Originality/value - A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real-time user transactions; while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.
  6. Price, L.; Thelwall, M.: ¬The clustering power of low frequency words in academic webs (2005) 0.02
    0.021265471 = product of:
      0.1488583 = sum of:
        0.1488583 = weight(_text_:sites in 3561) [ClassicSimilarity], result of:
          0.1488583 = score(doc=3561,freq=6.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.7002758 = fieldWeight in 3561, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3561)
      0.14285715 = coord(1/7)
    
    Abstract
    The value of low frequency words for subject-based academic Web site clustering is assessed. A new technique is introduced to compare the relative clustering power of different vocabularies. The technique is designed for word frequency tests in large document clustering exercises. Results for the Australian and New Zealand academic Web spaces indicate that low frequency words are useful for clustering academic Web sites along subject lines; removing low frequency words results in sites becoming, an average, less dissimilar to sites from other subjects.
  7. Vaughan, L.; Thelwall, M.: Search engine coverage bias : evidence and possible causes (2004) 0.02
    0.02104736 = product of:
      0.1473315 = sum of:
        0.1473315 = weight(_text_:sites in 2536) [ClassicSimilarity], result of:
          0.1473315 = score(doc=2536,freq=8.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.6930933 = fieldWeight in 2536, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=2536)
      0.14285715 = coord(1/7)
    
    Abstract
    Commercial search engines are now playing an increasingly important role in Web information dissemination and access. Of particular interest to business and national governments is whether the big engines have coverage biased towards the US or other countries. In our study we tested for national biases in three major search engines and found significant differences in their coverage of commercial Web sites. The US sites were much better covered than the others in the study: sites from China, Taiwan and Singapore. We then examined the possible technical causes of the differences and found that the language of a site does not affect its coverage by search engines. However, the visibility of a site, measured by the number of links to it, affects its chance to be covered by search engines. We conclude that the coverage bias does exist but this is due not to deliberate choices of the search engines but occurs as a natural result of cumulative advantage effects of US sites on the Web. Nevertheless, the bias remains a cause for international concern.
  8. Thelwall, M.: Conceptualizing documentation on the Web : an evaluation of different heuristic-based models for counting links between university Web sites (2002) 0.02
    0.017539466 = product of:
      0.122776255 = sum of:
        0.122776255 = weight(_text_:sites in 978) [ClassicSimilarity], result of:
          0.122776255 = score(doc=978,freq=8.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.5775777 = fieldWeight in 978, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
      0.14285715 = coord(1/7)
    
    Abstract
    All known previous Web link studies have used the Web page as the primary indivisible source document for counting purposes. Arguments are presented to explain why this is not necessarily optimal and why other alternatives have the potential to produce better results. This is despite the fact that individual Web files are often the only choice if search engines are used for raw data and are the easiest basic Web unit to identify. The central issue is of defining the Web "document": that which should comprise the single indissoluble unit of coherent material. Three alternative heuristics are defined for the educational arena based upon the directory, the domain and the whole university site. These are then compared by implementing them an a set of 108 UK university institutional Web sites under the assumption that a more effective heuristic will tend to produce results that correlate more highly with institutional research productivity. It was discovered that the domain and directory models were able to successfully reduce the impact of anomalous linking behavior between pairs of Web sites, with the latter being the method of choice. Reasons are then given as to why a document model an its own cannot eliminate all anomalies in Web linking behavior. Finally, the results from all models give a clear confirmation of the very strong association between the research productivity of a UK university and the number of incoming links from its peers' Web sites.
  9. Vaughan, L.; Thelwall, M.: ¬A modelling approach to uncover hyperlink patterns : the case of Canadian universities (2005) 0.02
    0.017363185 = product of:
      0.12154229 = sum of:
        0.12154229 = weight(_text_:sites in 1014) [ClassicSimilarity], result of:
          0.12154229 = score(doc=1014,freq=4.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.57177275 = fieldWeight in 1014, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1014)
      0.14285715 = coord(1/7)
    
    Abstract
    Hyperlink patterns between Canadian university Web sites were analyzed by a mathematical modeling approach. A multiple regression model was developed which shows that faculty quality and the language of the university are important predictors for links to a university Web site. Higher faculty quality means more links. French universities received lower numbers of links to their Web sites than comparable English universities. Analysis of interlinking between pairs of universities also showed that English universities are advantaged. Universities are more likely to link to each other when the geographical distance between them is less than 3000 km, possibly reflecting the east vs. west divide that exists in Canadian society.
  10. Vaughan, L.; Thelwall, M.: Scholarly use of the Web : what are the key inducers of links to journal Web sites? (2003) 0.02
    0.0151896225 = product of:
      0.106327355 = sum of:
        0.106327355 = weight(_text_:sites in 1236) [ClassicSimilarity], result of:
          0.106327355 = score(doc=1236,freq=6.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.500197 = fieldWeight in 1236, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1236)
      0.14285715 = coord(1/7)
    
    Abstract
    Web links have been studied by information scientists for at least six years but it is only in the past two that clear evidence has emerged to show that counts of links to scholarly Web spaces (universities and departments) can correlate significantly with research measures, giving some credence to their use for the investigation of scholarly communication. This paper reports an a study to investigate the factors that influence the creation of links to journal Web sites. An empirical approach is used: collecting data and testing for significant patterns. The specific questions addressed are whether site age and site content are inducers of links to a journal's Web site as measured by the ratio of link counts to Journal Impact Factors, two variables previously discovered to be related. A new methodology for data collection is also introduced that uses the Internet Archive to obtain an earliest known creation date for Web sites. The results show that both site age and site content are significant factors for the disciplines studied: library and information science, and law. Comparisons between the two fields also show disciplinary differences in Web site characteristics. Scholars and publishers should be particularly aware that richer content an a journal's Web site tends to generate links and thus the traffic to the site.
  11. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.01
    0.01488273 = product of:
      0.10417911 = sum of:
        0.10417911 = weight(_text_:sites in 1681) [ClassicSimilarity], result of:
          0.10417911 = score(doc=1681,freq=4.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.49009097 = fieldWeight in 1681, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=1681)
      0.14285715 = coord(1/7)
    
    Abstract
    The graph structures of three national university publicly indexable Webs from Australia, New Zealand, and the UK were analyzed. Strong scale-free regularities for page indegrees, outdegrees, and connected component sizes were in evidence, resulting in power laws similar to those previously identified for individual university Web sites and for the AItaVista-indexed Web. Anomalies were also discovered in most distributions and were tracked down to root causes. As a result, resource driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from using techniques to minimize or eliminate the impact of such anomalies.
  12. Barjak, F.; Thelwall, M.: ¬A statistical analysis of the web presences of European life sciences research teams (2008) 0.01
    0.012402276 = product of:
      0.08681592 = sum of:
        0.08681592 = weight(_text_:sites in 1383) [ClassicSimilarity], result of:
          0.08681592 = score(doc=1383,freq=4.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.40840912 = fieldWeight in 1383, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1383)
      0.14285715 = coord(1/7)
    
    Abstract
    Web links have been used for around ten years to explore the online impact of academic information and information producers. Nevertheless, few studies have attempted to relate link counts to relevant offline attributes of the owners of the targeted Web sites, with the exception of research productivity. This article reports the results of a study to relate site inlink counts to relevant owner characteristics for over 400 European life-science research group Web sites. The analysis confirmed that research-group size and Web-presence size were important for attracting Web links, although research productivity was not. Little evidence was found for significant influence of any of an array of factors, including research-group leader gender and industry connections. In addition, the choice of search engine for link data created a surprising international difference in the results, with Google perhaps giving unreliable results. Overall, the data collection, statistical analysis and results interpretation were all complex and it seems that we still need to know more about search engines, hyperlinks, and their function in science before we can draw conclusions on their usefulness and role in the canon of science and technology indicators.
  13. Thelwall, M.: Quantitative comparisons of search engine results (2008) 0.01
    0.012402276 = product of:
      0.08681592 = sum of:
        0.08681592 = weight(_text_:sites in 2350) [ClassicSimilarity], result of:
          0.08681592 = score(doc=2350,freq=4.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.40840912 = fieldWeight in 2350, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
      0.14285715 = coord(1/7)
    
    Abstract
    Search engines are normally used to find information or Web sites, but Webometric investigations use them for quantitative data such as the number of pages matching a query and the international spread of those pages. For this type of application, the accuracy of the hit count estimates and range of URLs in the full results are important. Here, we compare the applications programming interfaces of Google, Yahoo!, and Live Search for 1,587 single word searches. The hit count estimates were broadly consistent but with Yahoo! and Google, reporting 5-6 times more hits than Live Search. Yahoo! tended to return slightly more matching URLs than Google, with Live Search returning significantly fewer. Yahoo!'s result URLs included a significantly wider range of domains and sites than the other two, and there was little consistency between the three engines in the number of different domains. In contrast, the three engines were reasonably consistent in the number of different top-level domains represented in the result URLs, although Yahoo! tended to return the most. In conclusion, quantitative results from the three search engines are mostly consistent but with unexpected types of inconsistency that users should be aware of. Google is recommended for hit count estimates but Yahoo! is recommended for all other Webometric purposes.
  14. Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.01
    0.012277626 = product of:
      0.08594338 = sum of:
        0.08594338 = weight(_text_:sites in 6098) [ClassicSimilarity], result of:
          0.08594338 = score(doc=6098,freq=2.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.40430441 = fieldWeight in 6098, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6098)
      0.14285715 = coord(1/7)
    
    Abstract
    Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.
  15. Thelwall, M.; Vann, K.; Fairclough, R.: Web issue analysis : an integrated water resource management case study (2006) 0.01
    0.012119926 = product of:
      0.08483948 = sum of:
        0.08483948 = weight(_text_:united in 5906) [ClassicSimilarity], result of:
          0.08483948 = score(doc=5906,freq=2.0), product of:
            0.22812355 = queryWeight, product of:
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.04066292 = queryNorm
            0.37190145 = fieldWeight in 5906, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.046875 = fieldNorm(doc=5906)
      0.14285715 = coord(1/7)
    
    Abstract
    In this article Web issue analysis is introduced as a new technique to investigate an issue as reflected on the Web. The issue chosen, integrated water resource management (IWRM), is a United Nations-initiated paradigm for managing water resources in an international context, particularly in developing nations. As with many international governmental initiatives, there is a considerable body of online information about it: 41.381 hypertext markup language (HTML) pages and 28.735 PDF documents mentioning the issue were downloaded. A page uniform resource locator (URL) and link analysis revealed the international and sectoral spread of IWRM. A noun and noun phrase occurrence analysis was used to identify the issues most commonly discussed, revealing some unexpected topics such as private sector and economic growth. Although the complexity of the methods required to produce meaningful statistics from the data is disadvantageous to easy interpretation, it was still possible to produce data that could be subject to a reasonably intuitive interpretation. Hence Web issue analysis is claimed to be a useful new technique for information science.
  16. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.01
    0.01052368 = product of:
      0.07366575 = sum of:
        0.07366575 = weight(_text_:sites in 4457) [ClassicSimilarity], result of:
          0.07366575 = score(doc=4457,freq=2.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.34654665 = fieldWeight in 4457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=4457)
      0.14285715 = coord(1/7)
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with a powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
  17. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.01
    0.01052368 = product of:
      0.07366575 = sum of:
        0.07366575 = weight(_text_:sites in 674) [ClassicSimilarity], result of:
          0.07366575 = score(doc=674,freq=2.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.34654665 = fieldWeight in 674, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
      0.14285715 = coord(1/7)
    
    Abstract
    Introduces several new versions of PageRank (the link based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
  18. Thelwall, M.: Homophily in MySpace (2009) 0.01
    0.01052368 = product of:
      0.07366575 = sum of:
        0.07366575 = weight(_text_:sites in 2706) [ClassicSimilarity], result of:
          0.07366575 = score(doc=2706,freq=2.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.34654665 = fieldWeight in 2706, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=2706)
      0.14285715 = coord(1/7)
    
    Abstract
    Social network sites like MySpace are increasingly important environments for expressing and maintaining interpersonal connections, but does online communication exacerbate or ameliorate the known tendency for offline friendships to form between similar people (homophily)? This article reports an exploratory study of the similarity between the reported attributes of pairs of active MySpace Friends based upon a systematic sample of 2,567 members joining on June 18, 2007 and Friends who commented on their profile. The results showed no evidence of gender homophily but significant evidence of homophily for ethnicity, religion, age, country, marital status, attitude towards children, sexual orientation, and reason for joining MySpace. There were also some imbalances: women and the young were disproportionately commenters, and commenters tended to have more Friends than commentees. Overall, it seems that although traditional sources of homophily are thriving in MySpace networks of active public connections, gender homophily has completely disappeared. Finally, the method used has wide potential for investigating and partially tracking homophily in society, providing early warning of socially divisive trends.
  19. Thelwall, M.; Wilkinson, D.; Uppal, S.: Data mining emotion in social network communication : gender differences in MySpace (2009) 0.01
    0.01052368 = product of:
      0.07366575 = sum of:
        0.07366575 = weight(_text_:sites in 3322) [ClassicSimilarity], result of:
          0.07366575 = score(doc=3322,freq=2.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.34654665 = fieldWeight in 3322, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=3322)
      0.14285715 = coord(1/7)
    
    Abstract
    Despite the rapid growth in social network sites and in data mining for emotion (sentiment analysis), little research has tied the two together, and none has had social science goals. This article examines the extent to which emotion is present in MySpace comments, using a combination of data mining and content analysis, and exploring age and gender. A random sample of 819 public comments to or from U.S. users was manually classified for strength of positive and negative emotion. Two thirds of the comments expressed positive emotion, but a minority (20%) contained negative emotion, confirming that MySpace is an extraordinarily emotion-rich environment. Females are likely to give and receive more positive comments than are males, but there is no difference for negative comments. It is thus possible that females are more successful social network site users partly because of their greater ability to textually harness positive affect.
  20. Thelwall, M.: Webometrics (2009) 0.01
    0.01052368 = product of:
      0.07366575 = sum of:
        0.07366575 = weight(_text_:sites in 3906) [ClassicSimilarity], result of:
          0.07366575 = score(doc=3906,freq=2.0), product of:
            0.21257097 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.04066292 = queryNorm
            0.34654665 = fieldWeight in 3906, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=3906)
      0.14285715 = coord(1/7)
    
    Abstract
    Webometrics is an information science field concerned with measuring aspects of the World Wide Web (WWW) for a variety of information science research goals. It came into existence about five years after the Web was formed and has since grown to become a significant aspect of information science, at least in terms of published research. Although some webometrics research has focused on the structure or evolution of the Web itself or the performance of commercial search engines, most has used data from the Web to shed light on information provision or online communication in various contexts. Most prominently, techniques have been developed to track, map, and assess Web-based informal scholarly communication, for example, in terms of the hyperlinks between academic Web sites or the online impact of digital repositories. In addition, a range of nonacademic issues and groups of Web users have also been analyzed.