Search (71 results, page 1 of 4)

  • × author_ss:"Thelwall, M."
  1. Thelwall, M.: Web impact factors and search engine coverage (2000) 0.21
    0.20596267 = product of:
      0.2746169 = sum of:
        0.08061194 = weight(_text_:web in 4539) [ClassicSimilarity], result of:
          0.08061194 = score(doc=4539,freq=6.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.49962097 = fieldWeight in 4539, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0625 = fieldNorm(doc=4539)
        0.105578996 = weight(_text_:search in 4539) [ClassicSimilarity], result of:
          0.105578996 = score(doc=4539,freq=8.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.6144187 = fieldWeight in 4539, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=4539)
        0.08842595 = product of:
          0.1768519 = sum of:
            0.1768519 = weight(_text_:engine in 4539) [ClassicSimilarity], result of:
              0.1768519 = score(doc=4539,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.6686872 = fieldWeight in 4539, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4539)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Search engines index only a proportion of the web, and this proportion is not determined randomly but by algorithms that take into account the properties that impact factors measure. A survey was conducted to test the coverage of search engines and to decide whether their partial coverage is indeed an obstacle to using them to calculate web impact factors. The results indicate that search engine coverage, even of large national domains, is extremely uneven and would be likely to lead to misleading calculations.
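
    The indented tree above is Lucene's ClassicSimilarity (TF-IDF) explain output. As a check on the arithmetic, the sketch below re-derives this result's score from the values shown; the tf, idf and coord conventions are Lucene's documented ClassicSimilarity defaults, and every constant comes from the tree itself.

        import math

        # Re-derivation of result 1's score (doc 4539) from the explain tree:
        #   tf  = sqrt(termFreq)
        #   idf = 1 + ln(maxDocs / (docFreq + 1))
        #   per-term weight = (idf * queryNorm) * (tf * idf * fieldNorm)
        QUERY_NORM = 0.049439456
        FIELD_NORM = 0.0625
        MAX_DOCS = 44218

        def idf(doc_freq):
            return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

        def weight(freq, doc_freq):
            query_weight = idf(doc_freq) * QUERY_NORM                     # queryWeight
            field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM   # fieldWeight
            return query_weight * field_weight

        web    = weight(6, 4597)        # ~0.08061194
        search = weight(8, 3718)        # ~0.105578996
        engine = weight(4, 570) * 0.5   # inner coord(1/2) -> ~0.08842595

        print((web + search + engine) * 0.75)  # outer coord(3/4) -> ~0.20596267

    The outer coord(3/4) indicates that the query had four clauses of which this document matched three; the displayed 0.21 is this value rounded.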
  2. Vaughan, L.; Thelwall, M.: Search engine coverage bias : evidence and possible causes (2004) 0.15
    0.1532508 = product of:
      0.2043344 = sum of:
        0.060458954 = weight(_text_:web in 2536) [ClassicSimilarity], result of:
          0.060458954 = score(doc=2536,freq=6.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.37471575 = fieldWeight in 2536, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2536)
        0.0969805 = weight(_text_:search in 2536) [ClassicSimilarity], result of:
          0.0969805 = score(doc=2536,freq=12.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.5643796 = fieldWeight in 2536, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2536)
        0.04689494 = product of:
          0.09378988 = sum of:
            0.09378988 = weight(_text_:engine in 2536) [ClassicSimilarity], result of:
              0.09378988 = score(doc=2536,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35462496 = fieldWeight in 2536, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2536)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Commercial search engines now play an increasingly important role in Web information dissemination and access. Of particular interest to businesses and national governments is whether the big engines have coverage biased towards the US or other countries. In our study we tested for national biases in three major search engines and found significant differences in their coverage of commercial Web sites. The US sites were much better covered than the others in the study: sites from China, Taiwan and Singapore. We then examined possible technical causes of the differences and found that the language of a site does not affect its coverage by search engines. However, the visibility of a site, measured by the number of links to it, affects its chance of being covered by search engines. We conclude that the coverage bias does exist, but that it stems not from deliberate choices by the search engines but arises naturally from the cumulative advantage of US sites on the Web. Nevertheless, the bias remains a cause for international concern.
  3. Thelwall, M.: Assessing web search engines : a webometric approach (2011) 0.15
    0.15316099 = product of:
      0.20421466 = sum of:
        0.049364526 = weight(_text_:web in 10) [ClassicSimilarity], result of:
          0.049364526 = score(doc=10,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.3059541 = fieldWeight in 10, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=10)
        0.08853068 = weight(_text_:search in 10) [ClassicSimilarity], result of:
          0.08853068 = score(doc=10,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.51520574 = fieldWeight in 10, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=10)
        0.06631946 = product of:
          0.13263892 = sum of:
            0.13263892 = weight(_text_:engine in 10) [ClassicSimilarity], result of:
              0.13263892 = score(doc=10,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.5015154 = fieldWeight in 10, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=10)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Information Retrieval (IR) research typically evaluates search systems in terms of the standard precision, recall and F-measures, the last weighting the relative importance of precision and recall (e.g. van Rijsbergen, 1979). All of these assess the extent to which a system returns good matches for a query. In contrast, webometric measures are designed specifically for web search engines: they monitor changes in results over time and probe aspects of the internal logic by which a search engine selects the results to be returned. This chapter introduces a range of webometric measurements and illustrates them with case studies of Google, Bing and Yahoo! This is a very fertile area for simple and complex new investigations into search engine results.
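
    For reference, the F-measure mentioned above combines precision P and recall R, with beta weighting their relative importance (beta > 1 favours recall); in LaTeX form:

        F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}, \qquad F_1 = \frac{2PR}{P+R}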
  4. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.14
    0.13891208 = product of:
      0.18521611 = sum of:
        0.09872905 = weight(_text_:web in 674) [ClassicSimilarity], result of:
          0.09872905 = score(doc=674,freq=16.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.6119082 = fieldWeight in 674, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
        0.03959212 = weight(_text_:search in 674) [ClassicSimilarity], result of:
          0.03959212 = score(doc=674,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 674, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
        0.04689494 = product of:
          0.09378988 = sum of:
            0.09378988 = weight(_text_:engine in 674) [ClassicSimilarity], result of:
              0.09378988 = score(doc=674,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35462496 = fieldWeight in 674, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=674)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms based on these alternatives were used to rank four sets of Web pages, and the rankings were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that included pages from different Web sites, but not for ranking pages from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
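
    For orientation, the sketch below is the standard PageRank power iteration over a link graph; it is not the authors' exact variants, whose nodes are directory- or domain-level aggregates rather than single pages, and the damping factor and toy graph are illustrative assumptions.

        def pagerank(links, d=0.85, iters=50):
            """links: dict mapping each node to the list of nodes it links to."""
            nodes = set(links) | {t for ts in links.values() for t in ts}
            n = len(nodes)
            pr = {v: 1.0 / n for v in nodes}
            for _ in range(iters):
                nxt = {v: (1 - d) / n for v in nodes}
                for src in nodes:
                    targets = links.get(src, [])
                    if targets:
                        for t in targets:
                            nxt[t] += d * pr[src] / len(targets)
                    else:  # dangling node: share its rank with every node
                        for v in nodes:
                            nxt[v] += d * pr[src] / n
                pr = nxt
            return pr

        # Page-level graph here; under the domain model, keys would be whole domains.
        print(pagerank({"a.example": ["b.example"],
                        "b.example": ["a.example", "c.example"],
                        "c.example": ["a.example"]}))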
  5. Thelwall, M.; Sud, P.: A comparison of methods for collecting web citation data for academic organizations (2011) 0.13
    0.12791318 = product of:
      0.17055091 = sum of:
        0.029088326 = weight(_text_:web in 4626) [ClassicSimilarity], result of:
          0.029088326 = score(doc=4626,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 4626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4626)
        0.07377557 = weight(_text_:search in 4626) [ClassicSimilarity], result of:
          0.07377557 = score(doc=4626,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.4293381 = fieldWeight in 4626, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4626)
        0.06768702 = product of:
          0.13537404 = sum of:
            0.13537404 = weight(_text_:engine in 4626) [ClassicSimilarity], result of:
              0.13537404 = score(doc=4626,freq=6.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.51185703 = fieldWeight in 4626, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4626)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The primary webometric method for estimating the online impact of an organization is to count links to its website. Link counts have been available from commercial search engines for over a decade but this was set to end by early 2012 and so a replacement is needed. This article compares link counts to two alternative methods: URL citations and organization title mentions. New variations of these methods are also introduced. The three methods are compared against each other using Yahoo!. Two of the three methods (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy.
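
    The three methods correspond to simple query patterns. The templates below are a hedged illustration of the forms typically used in this line of work (linkdomain: is the historical Yahoo! operator, and the site and title are placeholders, not taken from the article):

        SITE = "example.ac.uk"
        TITLE = "Example University"

        link_query          = f"linkdomain:{SITE} -site:{SITE}"  # pages linking to the site
        url_citation_query  = f'"{SITE}" -site:{SITE}'           # mentions of its URL
        title_mention_query = f'"{TITLE}" -site:{SITE}'          # mentions of its name

        for q in (link_query, url_citation_query, title_mention_query):
            print(q)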
  6. Thelwall, M.; Li, X.; Barjak, F.; Robinson, S.: Assessing the international web connectivity of research groups (2008) 0.13
    0.1252271 = product of:
      0.16696946 = sum of:
        0.06504348 = weight(_text_:web in 1401) [ClassicSimilarity], result of:
          0.06504348 = score(doc=1401,freq=10.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.40312994 = fieldWeight in 1401, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1401)
        0.046659768 = weight(_text_:search in 1401) [ClassicSimilarity], result of:
          0.046659768 = score(doc=1401,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.27153727 = fieldWeight in 1401, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1401)
        0.05526622 = product of:
          0.11053244 = sum of:
            0.11053244 = weight(_text_:engine in 1401) [ClassicSimilarity], result of:
              0.11053244 = score(doc=1401,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.41792953 = fieldWeight in 1401, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1401)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Purpose - The purpose of this paper is to claim that it is useful to assess the web connectivity of research groups, describe hyperlink-based techniques to achieve this and present brief details of European life sciences research groups as a case study. Design/methodology/approach - A commercial search engine was harnessed to deliver hyperlink data via its automatic query submission interface. A special purpose link analysis tool, LexiURL, then summarised and graphed the link data in appropriate ways. Findings - Webometrics can provide a wide range of descriptive information about the international connectivity of research groups. Research limitations/implications - Only one field was analysed, data was taken from only one search engine, and the results were not validated. Practical implications - Web connectivity seems to be particularly important for attracting overseas job applicants and to promote research achievements and capabilities, and hence we contend that it can be useful for national and international governments to use webometrics to ensure that the web is being used effectively by research groups. Originality/value - This is the first paper to make a case for the value of using a range of webometric techniques to evaluate the web presences of research groups within a field, and possibly the first "applied" webometrics study produced for an external contract.
  7. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.12
    0.12340443 = product of:
      0.16453923 = sum of:
        0.07805218 = weight(_text_:web in 3463) [ClassicSimilarity], result of:
          0.07805218 = score(doc=3463,freq=10.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.48375595 = fieldWeight in 3463, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3463)
        0.03959212 = weight(_text_:search in 3463) [ClassicSimilarity], result of:
          0.03959212 = score(doc=3463,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 3463, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3463)
        0.04689494 = product of:
          0.09378988 = sum of:
            0.09378988 = weight(_text_:engine in 3463) [ClassicSimilarity], result of:
              0.09378988 = score(doc=3463,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35462496 = fieldWeight in 3463, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3463)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The nature of the contents of academic Web sites is of direct relevance to the new field of scientific Web intelligence, and for search engine and topic-specific crawler designers. We analyze word frequencies in national academic Webs using the Web sites of three English-speaking nations: Australia, New Zealand, and the United Kingdom. Strong regularities were found in page size and word frequency distributions, but with significant anomalies. At least 26% of pages contain no words. High-frequency words include university names and acronyms, Internet terminology, and computing product names: not always words in common usage away from the Web. A minority of low-frequency words are spelling mistakes, with other common types including nonwords, proper names, foreign-language terms and computer science variable names. Based upon these findings, recommendations for data cleansing and filtering are made, particularly for clustering applications.
  8. Barjak, F.; Thelwall, M.: A statistical analysis of the web presences of European life sciences research teams (2008) 0.12
    0.11774283 = product of:
      0.15699044 = sum of:
        0.07125156 = weight(_text_:web in 1383) [ClassicSimilarity], result of:
          0.07125156 = score(doc=1383,freq=12.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.4416067 = fieldWeight in 1383, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1383)
        0.046659768 = weight(_text_:search in 1383) [ClassicSimilarity], result of:
          0.046659768 = score(doc=1383,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.27153727 = fieldWeight in 1383, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1383)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 1383) [ClassicSimilarity], result of:
              0.07815824 = score(doc=1383,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 1383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1383)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Web links have been used for around ten years to explore the online impact of academic information and information producers. Nevertheless, few studies have attempted to relate link counts to relevant offline attributes of the owners of the targeted Web sites, with the exception of research productivity. This article reports the results of a study to relate site inlink counts to relevant owner characteristics for over 400 European life-science research group Web sites. The analysis confirmed that research-group size and Web-presence size were important for attracting Web links, although research productivity was not. Little evidence was found for significant influence of any of an array of factors, including research-group leader gender and industry connections. In addition, the choice of search engine for link data created a surprising international difference in the results, with Google perhaps giving unreliable results. Overall, the data collection, statistical analysis and results interpretation were all complex and it seems that we still need to know more about search engines, hyperlinks, and their function in science before we can draw conclusions on their usefulness and role in the canon of science and technology indicators.
  9. Zuccala, A.; Thelwall, M.; Oppenheim, C.; Dhiensa, R.: Web intelligence analyses of digital libraries : a case study of the National electronic Library for Health (NeLH) (2007) 0.11
    0.11190228 = product of:
      0.14920305 = sum of:
        0.08061194 = weight(_text_:web in 838) [ClassicSimilarity], result of:
          0.08061194 = score(doc=838,freq=24.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.49962097 = fieldWeight in 838, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
        0.03732781 = weight(_text_:search in 838) [ClassicSimilarity], result of:
          0.03732781 = score(doc=838,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.21722981 = fieldWeight in 838, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
        0.031263296 = product of:
          0.06252659 = sum of:
            0.06252659 = weight(_text_:engine in 838) [ClassicSimilarity], result of:
              0.06252659 = score(doc=838,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.23641664 = fieldWeight in 838, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.03125 = fieldNorm(doc=838)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Purpose - The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH). Design/methodology/approach - The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data. Findings - Link data, when analysed together with user transaction log files (i.e. Web referring domains), can provide insights into who is using a digital library and when, and who could be using the digital library if they are "surfing" a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the .edu domain, representing American universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non-electronic sources. Originality/value - A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real-time user transactions, while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.
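
    As a toy version of the domain-counting report described above (the input URLs are made up, and LexiURL's real output format is not reproduced):

        from collections import Counter
        from urllib.parse import urlsplit

        links = ["http://www.nhs.uk/a", "http://lib.example.edu/b",
                 "http://www.nhs.uk/c"]
        print(Counter(urlsplit(u).netloc for u in links))
        # Counter({'www.nhs.uk': 2, 'lib.example.edu': 1})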
  10. Thelwall, M.: Quantitative comparisons of search engine results (2008) 0.11
    0.1117384 = product of:
      0.14898454 = sum of:
        0.029088326 = weight(_text_:web in 2350) [ClassicSimilarity], result of:
          0.029088326 = score(doc=2350,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 2350, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
        0.08081709 = weight(_text_:search in 2350) [ClassicSimilarity], result of:
          0.08081709 = score(doc=2350,freq=12.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.47031635 = fieldWeight in 2350, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 2350) [ClassicSimilarity], result of:
              0.07815824 = score(doc=2350,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 2350, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2350)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Search engines are normally used to find information or Web sites, but Webometric investigations use them for quantitative data such as the number of pages matching a query and the international spread of those pages. For this type of application, the accuracy of the hit count estimates and the range of URLs in the full results are important. Here, we compare the applications programming interfaces of Google, Yahoo!, and Live Search for 1,587 single-word searches. The hit count estimates were broadly consistent, but with Yahoo! and Google reporting 5-6 times more hits than Live Search. Yahoo! tended to return slightly more matching URLs than Google, with Live Search returning significantly fewer. Yahoo!'s result URLs included a significantly wider range of domains and sites than the other two, and there was little consistency between the three engines in the number of different domains. In contrast, the three engines were reasonably consistent in the number of different top-level domains represented in the result URLs, although Yahoo! tended to return the most. In conclusion, quantitative results from the three search engines are mostly consistent but with unexpected types of inconsistency that users should be aware of. Google is recommended for hit count estimates but Yahoo! is recommended for all other Webometric purposes.
  11. Thelwall, M.; Binns, R.; Harries, G.; Page-Kennedy, T.; Price, L.; Wilkinson, D.: Custom interfaces for advanced queries in search engines (2001) 0.11
    0.106457256 = product of:
      0.14194301 = sum of:
        0.029088326 = weight(_text_:web in 697) [ClassicSimilarity], result of:
          0.029088326 = score(doc=697,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 697, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=697)
        0.07377557 = weight(_text_:search in 697) [ClassicSimilarity], result of:
          0.07377557 = score(doc=697,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.4293381 = fieldWeight in 697, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=697)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 697) [ClassicSimilarity], result of:
              0.07815824 = score(doc=697,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=697)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Those seeking information from the Internet often start from a search engine, using either its organised directory structure or its text query facility. In response to the difficulty of identifying the most relevant pages for some information needs, many search engines offer Boolean text matching and some, including Google, AltaVista and HotBot, offer the facility to integrate additional information into a more advanced request. Amongst web users, however, the employment of complex queries is known to be far from universal, with very short queries being the norm. It is demonstrated that the gap between the provision of advanced search facilities and their use can be bridged, for specific information needs, by constructing a simple interface in the form of a website that automatically formulates the necessary requests. It is argued that this kind of resource, perhaps employing additional domain-specific knowledge, is one that could be useful for websites or portals of common interest groups. The approach is illustrated by a website that enables a user to search the individual websites of university-level institutions in European Union associated countries.
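
    A minimal sketch of the idea, assuming a Google-style q parameter: a fixed page turns a user's short query into the advanced, site-restricted request they would be unlikely to type themselves. The engine URL, parameter and domains are illustrative, not the paper's implementation.

        from urllib.parse import urlencode

        UNIVERSITY_SITES = ["uni-a.example", "uni-b.example"]  # placeholder domains

        def advanced_query_url(keywords: str) -> str:
            sites = " OR ".join(f"site:{d}" for d in UNIVERSITY_SITES)
            return "https://www.google.com/search?" + urlencode({"q": f"{keywords} ({sites})"})

        print(advanced_query_url("information retrieval"))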
  12. Orduna-Malea, E.; Thelwall, M.; Kousha, K.: Web citations in patents : evidence of technological impact? (2017) 0.10
    0.101888694 = product of:
      0.13585159 = sum of:
        0.049364526 = weight(_text_:web in 3764) [ClassicSimilarity], result of:
          0.049364526 = score(doc=3764,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.3059541 = fieldWeight in 3764, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3764)
        0.03959212 = weight(_text_:search in 3764) [ClassicSimilarity], result of:
          0.03959212 = score(doc=3764,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 3764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3764)
        0.04689494 = product of:
          0.09378988 = sum of:
            0.09378988 = weight(_text_:engine in 3764) [ClassicSimilarity], result of:
              0.09378988 = score(doc=3764,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35462496 = fieldWeight in 3764, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3764)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Patents sometimes cite webpages either as general background to the problem being addressed or to identify prior publications that limit the scope of the patent granted. Counts of the number of patents citing an organization's website may therefore provide an indicator of its technological capacity or relevance. This article introduces methods to extract URL citations from patents and evaluates the usefulness of counts of patent web citations as a technology indicator. An analysis of patents citing 200 US universities or 177 UK universities found computer science and engineering departments to be frequently cited, as well as research-related webpages, such as Wikipedia, YouTube, or the Internet Archive. Overall, however, patent URL citations seem to be frequent enough to be useful for ranking major US and the top few UK universities if popular hosted subdomains are filtered out, but the hit count estimates on the first search engine results page should not be relied upon for accuracy.
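
    One plausible reading of extracting URL citations from patent text is a pattern match over the full text; a naive sketch (real patent citations are messier, with line breaks and spelled-out dots, so the regex is an assumption):

        import re

        URL_RE = re.compile(r'(?:https?://|www\.)[^\s,;)"]+', re.IGNORECASE)

        text = 'Prior art at www.wikipedia.org and http://web.archive.org/web/2005/x.'
        print([u.rstrip(".") for u in URL_RE.findall(text)])  # strip trailing periods
        # ['www.wikipedia.org', 'http://web.archive.org/web/2005/x']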
  13. Kousha, K.; Thelwall, M.; Rezaie, S.: Can the impact of scholarly images be assessed online? : an exploratory study using image identification technology (2010) 0.08
    0.07587066 = product of:
      0.101160884 = sum of:
        0.029088326 = weight(_text_:web in 3966) [ClassicSimilarity], result of:
          0.029088326 = score(doc=3966,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 3966, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3966)
        0.032993436 = weight(_text_:search in 3966) [ClassicSimilarity], result of:
          0.032993436 = score(doc=3966,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 3966, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3966)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 3966) [ClassicSimilarity], result of:
              0.07815824 = score(doc=3966,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 3966, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3966)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The web contains a huge number of digital pictures. For scholars publishing such images it is important to know how widely used their images are, but no method seems to have been developed for monitoring the value of academic images. In particular, can the impact of scientific or artistic images be assessed by identifying images copied or reused on the Internet? This article explores a case study of 260 NASA images to investigate whether the TinEye search engine could, in principle, help to provide this information. The results show that the selected pictures had a median of 11 online copies each. However, a classification of 210 of these copies reveals that only 1.4% were explicitly used in academic publications, reflecting research impact, and that the majority of the NASA pictures were used for informal scholarly (or educational) communication (37%). Additional analyses of world-famous paintings and scientific images about pathology and molecular structures suggest that image contents are important for the type and extent of image use. Although it is reasonable to use statistics derived from TinEye for assessing image reuse value, the extent of its image indexing is not known.
  14. Thelwall, M.: Webometrics (2009) 0.07
    0.06597235 = product of:
      0.1319447 = sum of:
        0.09235258 = weight(_text_:web in 3906) [ClassicSimilarity], result of:
          0.09235258 = score(doc=3906,freq=14.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.57238775 = fieldWeight in 3906, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3906)
        0.03959212 = weight(_text_:search in 3906) [ClassicSimilarity], result of:
          0.03959212 = score(doc=3906,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 3906, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3906)
      0.5 = coord(2/4)
    
    Abstract
    Webometrics is an information science field concerned with measuring aspects of the World Wide Web (WWW) for a variety of information science research goals. It came into existence about five years after the Web was formed and has since grown to become a significant aspect of information science, at least in terms of published research. Although some webometrics research has focused on the structure or evolution of the Web itself or the performance of commercial search engines, most has used data from the Web to shed light on information provision or online communication in various contexts. Most prominently, techniques have been developed to track, map, and assess Web-based informal scholarly communication, for example, in terms of the hyperlinks between academic Web sites or the online impact of digital repositories. In addition, a range of nonacademic issues and groups of Web users have also been analyzed.
  15. Thelwall, M.: Conceptualizing documentation on the Web : an evaluation of different heuristic-based models for counting links between university Web sites (2002) 0.06
    0.06473425 = product of:
      0.1294685 = sum of:
        0.09647507 = weight(_text_:web in 978) [ClassicSimilarity], result of:
          0.09647507 = score(doc=978,freq=22.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.59793836 = fieldWeight in 978, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
        0.032993436 = weight(_text_:search in 978) [ClassicSimilarity], result of:
          0.032993436 = score(doc=978,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 978, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
      0.5 = coord(2/4)
    
    Abstract
    All known previous Web link studies have used the Web page as the primary indivisible source document for counting purposes. Arguments are presented to explain why this is not necessarily optimal and why other alternatives have the potential to produce better results. This is despite the fact that individual Web files are often the only choice if search engines are used for raw data, and are the easiest basic Web unit to identify. The central issue is defining the Web "document": that which should comprise the single indissoluble unit of coherent material. Three alternative heuristics are defined for the educational arena, based upon the directory, the domain and the whole university site. These are then compared by implementing them on a set of 108 UK university institutional Web sites, under the assumption that a more effective heuristic will tend to produce results that correlate more highly with institutional research productivity. It was discovered that the domain and directory models were able to successfully reduce the impact of anomalous linking behavior between pairs of Web sites, with the latter being the method of choice. Reasons are then given as to why a document model on its own cannot eliminate all anomalies in Web linking behavior. Finally, the results from all models give a clear confirmation of the very strong association between the research productivity of a UK university and the number of incoming links from its peers' Web sites.
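
    The three heuristics can be read as URL-normalisation functions that collapse pages into larger units before links are counted, so that multiple links between the same pair of "documents" count once. A sketch, with the normalisation details (e.g. how the institutional site is identified) as assumptions:

        from urllib.parse import urlsplit

        def page(url):       # the traditional unit: the individual file
            s = urlsplit(url)
            return s.netloc + s.path

        def directory(url):  # everything in the same directory is one document
            s = urlsplit(url)
            return s.netloc + s.path.rsplit("/", 1)[0] + "/"

        def domain(url):     # everything on the same host is one document
            return urlsplit(url).netloc

        def site(url):       # the whole university site, e.g. wlv.ac.uk
            return ".".join(urlsplit(url).netloc.split(".")[-3:])

        url = "http://www.scit.wlv.ac.uk/research/groups/index.html"
        print(page(url), directory(url), domain(url), site(url), sep="\n")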
  16. Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.06
    0.060129207 = product of:
      0.12025841 = sum of:
        0.08726498 = weight(_text_:web in 4279) [ClassicSimilarity], result of:
          0.08726498 = score(doc=4279,freq=18.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.5408555 = fieldWeight in 4279, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
        0.032993436 = weight(_text_:search in 4279) [ClassicSimilarity], result of:
          0.032993436 = score(doc=4279,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 4279, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
      0.5 = coord(2/4)
    
    Abstract
    Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.
  17. Thelwall, M.: Directing students to new information types : a new role for Google in literature searches? (2005) 0.06
    0.06001722 = product of:
      0.12003444 = sum of:
        0.06532367 = weight(_text_:search in 364) [ClassicSimilarity], result of:
          0.06532367 = score(doc=364,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.38015217 = fieldWeight in 364, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=364)
        0.05471077 = product of:
          0.10942154 = sum of:
            0.10942154 = weight(_text_:engine in 364) [ClassicSimilarity], result of:
              0.10942154 = score(doc=364,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.41372913 = fieldWeight in 364, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=364)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Conducting a literature review is an important activity for postgraduates and many undergraduates. Librarians can play an important role, directing students to digital libraries, compiling online subject resource lists, and educating about the need to evaluate the quality of online resources. In order to conduct an effective literature search in a new area, however, in some subjects it is necessary to gain basic topic knowledge, including specialist vocabularies. Google's link-based page ranking algorithm makes this search engine an ideal tool for finding specialist topic introductory material, particularly in computer science, and so librarians should be teaching this as part of a strategic literature review approach.
  18. Thelwall, M.: Results from a web impact factor crawler (2001) 0.06
    0.05766148 = product of:
      0.11532296 = sum of:
        0.05817665 = weight(_text_:web in 4490) [ClassicSimilarity], result of:
          0.05817665 = score(doc=4490,freq=8.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.36057037 = fieldWeight in 4490, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4490)
        0.057146307 = weight(_text_:search in 4490) [ClassicSimilarity], result of:
          0.057146307 = score(doc=4490,freq=6.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.33256388 = fieldWeight in 4490, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4490)
      0.5 = coord(2/4)
    
    Abstract
    Web impact factors (WIFs), the proposed web equivalent of impact factors for journals, can be calculated by using search engines. It has been found that the results are problematic because of the variable coverage of search engines as well as their ability to give significantly different results over short periods of time. The fundamental problem is that although some search engines provide functionality that is capable of being used for impact calculations, this is not their primary task and therefore they give no guarantees as to performance in this respect. In this paper, a bespoke web crawler designed specifically for the calculation of reliable WIFs is presented. This crawler was used to calculate WIFs for a number of UK universities, and the results of these calculations are discussed. The principal findings were that, with certain restrictions, WIFs can be calculated reliably, but do not correlate with accepted research rankings owing to the variety of material hosted on university servers. Changes to the calculations to improve the fit of the results to research rankings are proposed, but there are still inherent problems undermining the reliability of the calculation. These problems still apply if the WIF scores are taken on their own as indicators of the general impact of any area of the Internet, but with care would not apply to online journals.
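
    In its usual form (a hedged reconstruction; the paper's revised calculations differ in detail), the web impact factor of a site or area s is the number of pages linking into s from outside, divided by the number of pages in s; in LaTeX form:

        \mathrm{WIF}(s) = \frac{\lvert \{\, p \notin s : p \text{ links to a page in } s \,\} \rvert}{\lvert s \rvert}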
  19. Thelwall, M.: Extracting accurate and complete results from search engines : case study windows live (2008) 0.06
    0.057045117 = product of:
      0.114090234 = sum of:
        0.03490599 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
          0.03490599 = score(doc=1338,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.21634221 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=1338)
        0.07918424 = weight(_text_:search in 1338) [ClassicSimilarity], result of:
          0.07918424 = score(doc=1338,freq=8.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.460814 = fieldWeight in 1338, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=1338)
      0.5 = coord(2/4)
    
    Abstract
    Although commercial search engines are designed for general Web searching, Webometrics and related research also use them to produce estimated hit counts or lists of URLs matching a query. Unfortunately, however, they do not return all matching URLs for a search and their hit count estimates are unreliable. In this article, we assess whether it is possible to obtain complete lists of matching URLs from Windows Live, and whether any of its hit count estimates are robust. As part of this, we introduce two new methods to extract extra URLs from search engines: automated query splitting and automated domain and TLD searching. Both methods successfully identify additional matching URLs, but the findings suggest that there is no way to get complete lists of matching URLs or accurate hit counts from Windows Live, although some estimating suggestions are provided.
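
    Of the two methods named, automated query splitting can be sketched directly: partition a capped query q into disjoint subqueries by requiring and then excluding a word, recursing on the still-capped remainder. The splitting words below are illustrative assumptions; a real implementation would split only while the hit count stays at the cap.

        def split_query(q, words=("the", "of", "and")):
            """Partition q into subqueries with disjoint, covering results:
            results(q) = results(q +w) | results(q -w), applied recursively."""
            if not words:
                return [q]
            w, rest = words[0], words[1:]
            return [f"{q} +{w}"] + split_query(f"{q} -{w}", rest)

        print(split_query('"web impact factor"'))
        # ['"web impact factor" +the', '"web impact factor" -the +of',
        #  '"web impact factor" -the -of +and', '"web impact factor" -the -of -and']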
  20. Kousha, K.; Thelwall, M.: How is science cited on the Web? : a classification of google unique Web citations (2007) 0.05
    0.05436564 = product of:
      0.10873128 = sum of:
        0.09198537 = weight(_text_:web in 586) [ClassicSimilarity], result of:
          0.09198537 = score(doc=586,freq=20.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.5701118 = fieldWeight in 586, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=586)
        0.01674591 = product of:
          0.03349182 = sum of:
            0.03349182 = weight(_text_:22 in 586) [ClassicSimilarity], result of:
              0.03349182 = score(doc=586,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.19345059 = fieldWeight in 586, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=586)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Although the analysis of citations in the scholarly literature is now an established and relatively well understood part of information science, not enough is known about citations that can be found on the Web. In particular, are there new Web citation types, and if so, are these trivial or potentially useful for studying or evaluating research communication? We sought evidence based upon a sample of 1,577 Web citations of the URLs or titles of research articles in 64 open-access journals from biology, physics, chemistry, and computing. Only 25% represented intellectual impact, from references in Web documents (23%) and other informal scholarly sources (2%). Many of the Web/URL citations were created for general or subject-specific navigation (45%) or for self-publicity (22%). Additional analyses revealed significant disciplinary differences in the types of Google unique Web/URL citations, as well as some characteristics of scientific open-access publishing on the Web. We conclude that the Web provides access to a new and different type of citation information, one that may enable us to measure different aspects of research, and the research process in particular; but to obtain good information, the different types should be separated.