Search (147 results, page 1 of 8)

  • × theme_ss:"Informetrie"
  • × year_i:[2000 TO 2010}
  1. Zhang, Y.; Jansen, B.J.; Spink, A.: Identification of factors predicting clickthrough in Web searching using neural network analysis (2009) 0.24
    0.24043433 = product of:
      0.3205791 = sum of:
        0.049364526 = weight(_text_:web in 2742) [ClassicSimilarity], result of:
          0.049364526 = score(doc=2742,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.3059541 = fieldWeight in 2742, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2742)
        0.068575576 = weight(_text_:search in 2742) [ClassicSimilarity], result of:
          0.068575576 = score(doc=2742,freq=6.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.39907667 = fieldWeight in 2742, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2742)
        0.20263903 = sum of:
          0.16244885 = weight(_text_:engine in 2742) [ClassicSimilarity], result of:
            0.16244885 = score(doc=2742,freq=6.0), product of:
              0.26447627 = queryWeight, product of:
                5.349498 = idf(docFreq=570, maxDocs=44218)
                0.049439456 = queryNorm
              0.6142285 = fieldWeight in 2742, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.349498 = idf(docFreq=570, maxDocs=44218)
                0.046875 = fieldNorm(doc=2742)
          0.04019018 = weight(_text_:22 in 2742) [ClassicSimilarity], result of:
            0.04019018 = score(doc=2742,freq=2.0), product of:
              0.17312855 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049439456 = queryNorm
              0.23214069 = fieldWeight in 2742, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2742)
      0.75 = coord(3/4)
    
    Abstract
    In this research, we aim to identify factors that significantly affect the clickthrough of Web searchers. Our underlying goal is to determine more efficient methods to optimize the clickthrough rate. We devise a clickthrough metric for measuring customer satisfaction of search engine results using the number of links visited, number of queries a user submits, and rank of clicked links. We use a neural network to detect the significant influence of searching characteristics on future user clickthrough. Our results show that high occurrences of query reformulation, lengthy searching duration, longer query length, and the higher ranking of prior clicked links correlate positively with future clickthrough. We provide recommendations for leveraging these findings for improving the performance of search engine retrieval and result ranking, along with implications for search engine marketing.
    Date
    22. 3.2009 17:49:11
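The explain tree above follows Lucene's ClassicSimilarity (tf-idf) formula: each clause score is queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = √freq × idf × fieldNorm, with idf = 1 + ln(maxDocs / (docFreq + 1)). A minimal sketch reproducing the first "web" clause, using the figures printed in the tree:

```python
import math

# Figures copied from the explain tree for the "web" clause (doc 2742)
doc_freq, max_docs = 4597, 44218
freq = 4.0
query_norm = 0.049439456
field_norm = 0.046875

idf = 1 + math.log(max_docs / (doc_freq + 1))  # ~ 3.2635105
tf = math.sqrt(freq)                           # 2.0 = tf(freq=4.0)
query_weight = idf * query_norm                # ~ 0.16134618
field_weight = tf * idf * field_norm           # ~ 0.3059541
score = query_weight * field_weight            # ~ 0.049364526
```

The per-document total (0.3205791) is the sum of the three matching clause scores, and the displayed 0.24043433 is that sum scaled by coord(3/4) = 0.75, because three of the four query clauses matched this document.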
  2. Thelwall, M.: Web impact factors and search engine coverage (2000) 0.21
    0.20596267 = product of:
      0.2746169 = sum of:
        0.08061194 = weight(_text_:web in 4539) [ClassicSimilarity], result of:
          0.08061194 = score(doc=4539,freq=6.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.49962097 = fieldWeight in 4539, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0625 = fieldNorm(doc=4539)
        0.105578996 = weight(_text_:search in 4539) [ClassicSimilarity], result of:
          0.105578996 = score(doc=4539,freq=8.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.6144187 = fieldWeight in 4539, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=4539)
        0.08842595 = product of:
          0.1768519 = sum of:
            0.1768519 = weight(_text_:engine in 4539) [ClassicSimilarity], result of:
              0.1768519 = score(doc=4539,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.6686872 = fieldWeight in 4539, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4539)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Search engines index only a proportion of the web, and this proportion is not determined randomly but by following algorithms that take into account the properties that impact factors measure. A survey was conducted in order to test the coverage of search engines and to decide whether their partial coverage is indeed an obstacle to using them to calculate web impact factors. The results indicate that search engine coverage, even of large national domains, is extremely uneven and would be likely to lead to misleading calculations.
  3. Jepsen, E.T.; Seiden, P.; Ingwersen, P.; Björneborn, L.; Borlund, P.: Characteristics of scientific Web publications : preliminary data gathering and analysis (2004) 0.17
    0.1678026 = product of:
      0.22373681 = sum of:
        0.08227421 = weight(_text_:web in 3091) [ClassicSimilarity], result of:
          0.08227421 = score(doc=3091,freq=16.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.5099235 = fieldWeight in 3091, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3091)
        0.07377557 = weight(_text_:search in 3091) [ClassicSimilarity], result of:
          0.07377557 = score(doc=3091,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.4293381 = fieldWeight in 3091, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3091)
        0.06768702 = product of:
          0.13537404 = sum of:
            0.13537404 = weight(_text_:engine in 3091) [ClassicSimilarity], result of:
              0.13537404 = score(doc=3091,freq=6.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.51185703 = fieldWeight in 3091, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3091)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Because of the increasing presence of scientific publications on the Web, combined with the existing difficulties in easily verifying and retrieving these publications, research on techniques and methods for retrieval of scientific Web publications is called for. In this article, we report on the initial steps taken toward the construction of a test collection of scientific Web publications within the subject domain of plant biology. The steps reported are those of data gathering and data analysis aiming at identifying characteristics of scientific Web publications. The data used in this article were generated based on specifically selected domain topics that are searched for in three publicly accessible search engines (Google, AllTheWeb, and AltaVista). A sample of the retrieved hits was analyzed with regard to how various publication attributes correlated with the scientific quality of the content and whether this information could be employed to harvest, filter, and rank Web publications. The attributes analyzed were inlinks, outlinks, bibliographic references, file format, language, search engine overlap, structural position (according to site structure), and the occurrence of various types of metadata. As could be expected, the ranked output differs between the three search engines. Apparently, this is caused by differences in ranking algorithms rather than the databases themselves. In fact, because scientific Web content in this subject domain receives few inlinks, both AltaVista and AllTheWeb retrieved a higher degree of accessible scientific content than Google. Because of the search engine cutoffs of accessible URLs, the feasibility of using search engine output for Web content analysis is also discussed.
  4. Bar-Ilan, J.; Peritz, B.C.: ¬A method for measuring the evolution of a topic on the Web : the case of "informetrics" (2009) 0.14
    0.13815016 = product of:
      0.1842002 = sum of:
        0.08227421 = weight(_text_:web in 3089) [ClassicSimilarity], result of:
          0.08227421 = score(doc=3089,freq=16.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.5099235 = fieldWeight in 3089, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3089)
        0.046659768 = weight(_text_:search in 3089) [ClassicSimilarity], result of:
          0.046659768 = score(doc=3089,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.27153727 = fieldWeight in 3089, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3089)
        0.05526622 = product of:
          0.11053244 = sum of:
            0.11053244 = weight(_text_:engine in 3089) [ClassicSimilarity], result of:
              0.11053244 = score(doc=3089,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.41792953 = fieldWeight in 3089, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3089)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The universe of information has been enriched by the creation of the World Wide Web, which has become an indispensable source for research. Since this source is growing at an enormous speed, an in-depth look at its performance, to create a method for its evaluation, has become necessary; however, growth is not the only process that influences the evolution of the Web. During their lifetime, Web pages may change their content and links to/from other Web pages, be duplicated or moved to a different URL, be removed from the Web either temporarily or permanently, and be temporarily inaccessible due to server and/or communication failures. To obtain a better understanding of these processes, we developed a method for tracking topics on the Web for long periods of time, without the need to employ a crawler and relying only on publicly available resources. The multiple data-collection methods used allow us to discover new pages related to the topic, to identify changes to existing pages, and to detect previously existing pages that have been removed or whose content is not relevant anymore to the specified topic. The method is demonstrated through monitoring Web pages that contain the term informetrics for a period of 8 years. The data-collection method also allowed us to analyze the dynamic changes in search engine coverage, illustrated here on Google, the search engine used for the longest period of time for data collection in this project.
  5. Thelwall, M.; Li, X.; Barjak, F.; Robinson, S.: Assessing the international web connectivity of research groups (2008) 0.13
    0.1252271 = product of:
      0.16696946 = sum of:
        0.06504348 = weight(_text_:web in 1401) [ClassicSimilarity], result of:
          0.06504348 = score(doc=1401,freq=10.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.40312994 = fieldWeight in 1401, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1401)
        0.046659768 = weight(_text_:search in 1401) [ClassicSimilarity], result of:
          0.046659768 = score(doc=1401,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.27153727 = fieldWeight in 1401, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1401)
        0.05526622 = product of:
          0.11053244 = sum of:
            0.11053244 = weight(_text_:engine in 1401) [ClassicSimilarity], result of:
              0.11053244 = score(doc=1401,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.41792953 = fieldWeight in 1401, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1401)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Purpose - The purpose of this paper is to claim that it is useful to assess the web connectivity of research groups, describe hyperlink-based techniques to achieve this and present brief details of European life sciences research groups as a case study. Design/methodology/approach - A commercial search engine was harnessed to deliver hyperlink data via its automatic query submission interface. A special purpose link analysis tool, LexiURL, then summarised and graphed the link data in appropriate ways. Findings - Webometrics can provide a wide range of descriptive information about the international connectivity of research groups. Research limitations/implications - Only one field was analysed, data was taken from only one search engine, and the results were not validated. Practical implications - Web connectivity seems to be particularly important for attracting overseas job applicants and to promote research achievements and capabilities, and hence we contend that it can be useful for national and international governments to use webometrics to ensure that the web is being used effectively by research groups. Originality/value - This is the first paper to make a case for the value of using a range of webometric techniques to evaluate the web presences of research groups within a field, and possibly the first "applied" webometrics study produced for an external contract.
  6. Vaughan, L.; Shaw, D.: Web citation data for impact assessment : a comparison of four science disciplines (2005) 0.12
    0.12304345 = product of:
      0.16405793 = sum of:
        0.09198537 = weight(_text_:web in 3880) [ClassicSimilarity], result of:
          0.09198537 = score(doc=3880,freq=20.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.5701118 = fieldWeight in 3880, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3880)
        0.032993436 = weight(_text_:search in 3880) [ClassicSimilarity], result of:
          0.032993436 = score(doc=3880,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 3880, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3880)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 3880) [ClassicSimilarity], result of:
              0.07815824 = score(doc=3880,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 3880, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3880)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The number and type of Web citations to journal articles in four areas of science are examined: biology, genetics, medicine, and multidisciplinary sciences. For a sample of 5,972 articles published in 114 journals, the median Web citation counts per journal article range from 6.2 in medicine to 10.4 in genetics. About 30% of Web citations in each area indicate intellectual impact (citations from articles or class readings, in contrast to citations from bibliographic services or the author's or journal's home page). Journals receiving more Web citations also have higher percentages of citations indicating intellectual impact. There is significant correlation between the number of citations reported in the databases from the Institute for Scientific Information (ISI, now Thomson Scientific) and the number of citations retrieved using the Google search engine (Web citations). The correlation is much weaker for journals published outside the United Kingdom or United States and for multidisciplinary journals. Web citation numbers are higher than ISI citation counts, suggesting that Web searches might be conducted for an earlier or a more fine-grained assessment of an article's impact. The Web-evident impact of non-UK/USA publications might provide a balance to the geographic or cultural biases observed in ISI's data, although the stability of Web citation counts is debatable.
  7. H-Index auch im Web of Science (2008) 0.11
    0.11184722 = product of:
      0.14912963 = sum of:
        0.060458954 = weight(_text_:web in 590) [ClassicSimilarity], result of:
          0.060458954 = score(doc=590,freq=6.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.37471575 = fieldWeight in 590, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=590)
        0.068575576 = weight(_text_:search in 590) [ClassicSimilarity], result of:
          0.068575576 = score(doc=590,freq=6.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.39907667 = fieldWeight in 590, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=590)
        0.02009509 = product of:
          0.04019018 = sum of:
            0.04019018 = weight(_text_:22 in 590) [ClassicSimilarity], result of:
              0.04019018 = score(doc=590,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.23214069 = fieldWeight in 590, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=590)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
    "Zur Kurzmitteilung "Latest enhancements in Scopus: ... h-Index incorporated in Scopus" in den letzten Online-Mitteilungen (Online-Mitteilungen 92, S.31) ist zu korrigieren, dass der h-Index sehr wohl bereits im Web of Science enthalten ist. Allerdings findet man/frau diese Information nicht in der "cited ref search", sondern neben der Trefferliste einer Quick Search, General Search oder einer Suche über den Author Finder in der rechten Navigationsleiste unter dem Titel "Citation Report". Der "Citation Report" bietet für die in der jeweiligen Trefferliste angezeigten Arbeiten: - Die Gesamtzahl der Zitierungen aller Arbeiten in der Trefferliste - Die mittlere Zitationshäufigkeit dieser Arbeiten - Die Anzahl der Zitierungen der einzelnen Arbeiten, aufgeschlüsselt nach Publikationsjahr der zitierenden Arbeiten - Die mittlere Zitationshäufigkeit dieser Arbeiten pro Jahr - Den h-Index (ein h-Index von x sagt aus, dass x Arbeiten der Trefferliste mehr als x-mal zitiert wurden; er ist gegenüber sehr hohen Zitierungen einzelner Arbeiten unempfindlicher als die mittlere Zitationshäufigkeit)."
    Date
    6. 4.2008 19:04:22
    Object
    Web of Science
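The h-index described in the note above can be sketched in a few lines (an illustrative sketch with toy citation counts, using the common "at least h citations" convention from Hirsch's definition):

```python
def h_index(citations):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank       # the rank-th work still has >= rank citations
        else:
            break
    return h

# Toy example: five works with these citation counts
print(h_index([10, 8, 5, 4, 3]))  # → 4
```

As the note points out, this statistic is less sensitive to a few very highly cited works than the mean citation frequency: adding 1,000 citations to the top work above would leave the h-index at 4.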
  8. Thelwall, M.: Quantitative comparisons of search engine results (2008) 0.11
    0.1117384 = product of:
      0.14898454 = sum of:
        0.029088326 = weight(_text_:web in 2350) [ClassicSimilarity], result of:
          0.029088326 = score(doc=2350,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 2350, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
        0.08081709 = weight(_text_:search in 2350) [ClassicSimilarity], result of:
          0.08081709 = score(doc=2350,freq=12.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.47031635 = fieldWeight in 2350, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 2350) [ClassicSimilarity], result of:
              0.07815824 = score(doc=2350,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 2350, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2350)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Search engines are normally used to find information or Web sites, but Webometric investigations use them for quantitative data such as the number of pages matching a query and the international spread of those pages. For this type of application, the accuracy of the hit count estimates and range of URLs in the full results are important. Here, we compare the applications programming interfaces of Google, Yahoo!, and Live Search for 1,587 single word searches. The hit count estimates were broadly consistent, but with Yahoo! and Google reporting 5-6 times more hits than Live Search. Yahoo! tended to return slightly more matching URLs than Google, with Live Search returning significantly fewer. Yahoo!'s result URLs included a significantly wider range of domains and sites than the other two, and there was little consistency between the three engines in the number of different domains. In contrast, the three engines were reasonably consistent in the number of different top-level domains represented in the result URLs, although Yahoo! tended to return the most. In conclusion, quantitative results from the three search engines are mostly consistent but with unexpected types of inconsistency that users should be aware of. Google is recommended for hit count estimates but Yahoo! is recommended for all other Webometric purposes.
  9. Leydesdorff, L.; Vaughan, L.: Co-occurrence matrices and their applications in information science : extending ACA to the Web environment (2006) 0.08
    0.08490725 = product of:
      0.113209665 = sum of:
        0.041137107 = weight(_text_:web in 6113) [ClassicSimilarity], result of:
          0.041137107 = score(doc=6113,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 6113, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6113)
        0.032993436 = weight(_text_:search in 6113) [ClassicSimilarity], result of:
          0.032993436 = score(doc=6113,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 6113, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6113)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 6113) [ClassicSimilarity], result of:
              0.07815824 = score(doc=6113,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 6113, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6113)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Co-occurrence matrices, such as cocitation, coword, and colink matrices, have been used widely in the information sciences. However, confusion and controversy have hindered the proper statistical analysis of these data. The underlying problem, in our opinion, involved understanding the nature of various types of matrices. This article discusses the difference between a symmetrical cocitation matrix and an asymmetrical citation matrix as well as the appropriate statistical techniques that can be applied to each of these matrices, respectively. Similarity measures (such as the Pearson correlation coefficient or the cosine) should not be applied to the symmetrical cocitation matrix but can be applied to the asymmetrical citation matrix to derive the proximity matrix. The argument is illustrated with examples. The study then extends the application of co-occurrence matrices to the Web environment, in which the nature of the available data and thus data collection methods are different from those of traditional databases such as the Science Citation Index. A set of data collected with the Google Scholar search engine is analyzed by using both the traditional methods of multivariate analysis and the new visualization software Pajek, which is based on social network analysis and graph theory.
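The distinction the abstract draws can be illustrated with a small sketch (toy data, not from the article): cosine similarity is applied to the columns of an asymmetrical citation matrix to derive a symmetrical proximity matrix, rather than being applied to a symmetrical cocitation matrix directly:

```python
import math

# Toy asymmetrical citation matrix:
# rows = citing documents, columns = cited authors
citation_matrix = [
    [3, 0, 1],
    [2, 1, 0],
    [0, 4, 1],
]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Proximity between cited authors = cosine over the matrix *columns*
cols = list(zip(*citation_matrix))
proximity = [[cosine(c1, c2) for c2 in cols] for c1 in cols]
```

The resulting proximity matrix is symmetrical with a unit diagonal, which is the form suitable for the multivariate and network-visualization methods the article discusses.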
  10. Bar-Ilan, J.: ¬The Web as an information source on informetrics? : A content analysis (2000) 0.08
    0.077360384 = product of:
      0.15472077 = sum of:
        0.09872905 = weight(_text_:web in 4587) [ClassicSimilarity], result of:
          0.09872905 = score(doc=4587,freq=16.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.6119082 = fieldWeight in 4587, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=4587)
        0.055991717 = weight(_text_:search in 4587) [ClassicSimilarity], result of:
          0.055991717 = score(doc=4587,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.3258447 = fieldWeight in 4587, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=4587)
      0.5 = coord(2/4)
    
    Abstract
    This article addresses the question of whether the Web can serve as an information source for research. Specifically, it analyzes by way of content analysis the Web pages retrieved by the major search engines on a particular date (June 7, 1998), as a result of the query 'informetrics OR informetric'. In 807 out of the 942 retrieved pages, the search terms were mentioned in the context of information science. Over 70% of the pages contained only indirect information on the topic, in the form of hypertext links and bibliographical references without annotation. The bibliographical references extracted from the Web pages were analyzed, and lists of most productive authors, most cited authors, works, and sources were compiled. The list of references obtained from the Web was also compared to data retrieved from commercial databases. For most cases, the list of references extracted from the Web outperformed the commercial, bibliographic databases. The results of these comparisons indicate that valuable, freely available data is hidden in the Web, waiting to be extracted from the millions of Web pages.
  11. Cothey, V.: Web-crawling reliability (2004) 0.08
    0.076967746 = product of:
      0.15393549 = sum of:
        0.10774467 = weight(_text_:web in 3089) [ClassicSimilarity], result of:
          0.10774467 = score(doc=3089,freq=14.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.6677857 = fieldWeight in 3089, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3089)
        0.046190813 = weight(_text_:search in 3089) [ClassicSimilarity], result of:
          0.046190813 = score(doc=3089,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.2688082 = fieldWeight in 3089, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3089)
      0.5 = coord(2/4)
    
    Abstract
    In this article, I investigate the reliability, in the social science sense, of collecting informetric data about the World Wide Web by Web crawling. The investigation includes a critical examination of the practice of Web crawling and contrasts the results of content crawling with the results of link crawling. It is shown that Web crawling by search engines is intentionally biased and selective. I also report the results of a large-scale experimental simulation of Web crawling that illustrates the effects of different crawling policies on data collection. It is concluded that the reliability of Web crawling as a data collection technique is improved by fuller reporting of relevant crawling policies.
  12. Bhavnani, S.K.: Why is it difficult to find comprehensive information? : implications of information scatter for search and design (2005) 0.08
    0.07603432 = product of:
      0.15206864 = sum of:
        0.07125156 = weight(_text_:web in 3684) [ClassicSimilarity], result of:
          0.07125156 = score(doc=3684,freq=12.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.4416067 = fieldWeight in 3684, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3684)
        0.08081709 = weight(_text_:search in 3684) [ClassicSimilarity], result of:
          0.08081709 = score(doc=3684,freq=12.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.47031635 = fieldWeight in 3684, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3684)
      0.5 = coord(2/4)
    
    Abstract
    The rapid development of Web sites providing extensive coverage of a topic, coupled with the development of powerful search engines (designed to help users find such Web sites), suggests that users can easily find comprehensive information about a topic. In domains such as consumer healthcare, finding comprehensive information about a topic is critical as it can improve a patient's judgment in making healthcare decisions, and can encourage higher compliance with treatment. However, recent studies show that despite using powerful search engines, many healthcare information seekers have difficulty finding comprehensive information even for narrow healthcare topics because the relevant information is scattered across many Web sites. To date, no studies have analyzed how facts related to a search topic are distributed across relevant Web pages and Web sites. In this study, the distribution of facts related to five common healthcare topics across high-quality sites is analyzed, and the reasons underlying those distributions are explored. The analysis revealed the existence of few pages that had many facts, many pages that had few facts, and no single page or site that provided all the facts. While such a distribution conforms to other information-related phenomena, a deeper analysis revealed that the distributions were caused by a trade-off between depth and breadth, leading to the existence of general, specialized, and sparse pages. Furthermore, the results helped to make explicit the knowledge needed by searchers to find comprehensive healthcare information, and suggested the motivation to explore distribution-conscious approaches for the development of future search systems, search interfaces, Web page designs, and training.
  13. Hong, T.: ¬The influence of structural and message features on Web site credibility (2006) 0.07
    0.07215504 = product of:
      0.14431009 = sum of:
        0.10471797 = weight(_text_:web in 5787) [ClassicSimilarity], result of:
          0.10471797 = score(doc=5787,freq=18.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.64902663 = fieldWeight in 5787, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5787)
        0.03959212 = weight(_text_:search in 5787) [ClassicSimilarity], result of:
          0.03959212 = score(doc=5787,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 5787, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=5787)
      0.5 = coord(2/4)
    
    Abstract
    This article explores the associations that message features and Web structural features have with perceptions of Web site credibility. In a within-subjects experiment, 84 participants actively located health-related Web sites on the basis of two tasks that differed in task specificity and complexity. Web sites that were deemed most credible were content analyzed for message features and structural features that have been found to be associated with perceptions of source credibility. Regression analyses indicated that message features predicted perceived Web site credibility for both searches when controlling for Internet experience and issue involvement. Advertisements and structural features had no significant effects on perceived Web site credibility. Institution-affiliated domain names (.gov, .org, .edu) predicted Web site credibility, but only in the general search, which was more difficult. Implications of the results are discussed in terms of online credibility research and Web site design.
  14. Vaughan, L.; Shaw, D.: Bibliographic and Web citations : what is the difference? (2003) 0.07
    0.06893644 = product of:
      0.13787287 = sum of:
        0.10487945 = weight(_text_:web in 5176) [ClassicSimilarity], result of:
          0.10487945 = score(doc=5176,freq=26.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.65002745 = fieldWeight in 5176, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5176)
        0.032993436 = weight(_text_:search in 5176) [ClassicSimilarity], result of:
          0.032993436 = score(doc=5176,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 5176, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5176)
      0.5 = coord(2/4)
    
    Abstract
    Vaughan and Shaw look at the relationship between traditional citations and Web citations (not hyperlinks, but textual mentions of published papers). Using the English-language research journals in the Information and Library Science category of ISI's 2000 Journal Citation Report, 1,209 full-length papers published in 1997 in 46 journals were identified. Each was searched in the Social Science Citation Index and on the Web using a Google phrase search, entering the title in quotation marks and adding, where necessary for disambiguation, subtitles, authors' names, and journal title words. After removing obvious false drops, the number of Web sites was recorded for comparison with the SSCI counts. A second sample, from 1992, was also collected for examination. There were a total of 16,371 Web citations to the selected papers. The four top- and bottom-ranked journals were then examined, and every third citation to every third paper was selected and classified by source type, domain, and country of origin. Web counts are much higher than ISI citation counts. Of the 46 journals from 1997, 26 showed a significant correlation between Web and traditional citation counts, as did 11 of the 15 in the 1992 sample. Journal impact factor in 1998 and 1999 correlated significantly, though at a low level, with average Web citations per journal in the 1997 data. Thirty percent of Web citations come from other papers posted on the Web, another 30 percent from listings of Web-based bibliographic services, and 12 percent from class reading lists. Journals with high Web citation counts often have Web-accessible tables of contents.
  15. Hayer, L.: Lazarsfeld zitiert : eine bibliometrische Analyse (2008) 0.07
    0.06815734 = product of:
      0.09087645 = sum of:
        0.041137107 = weight(_text_:web in 1934) [ClassicSimilarity], result of:
          0.041137107 = score(doc=1934,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 1934, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1934)
        0.032993436 = weight(_text_:search in 1934) [ClassicSimilarity], result of:
          0.032993436 = score(doc=1934,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 1934, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1934)
        0.01674591 = product of:
          0.03349182 = sum of:
            0.03349182 = weight(_text_:22 in 1934) [ClassicSimilarity], result of:
              0.03349182 = score(doc=1934,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.19345059 = fieldWeight in 1934, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1934)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    To approach an answer to the question of what significance the papers left behind by a scholar such as Paul F. Lazarsfeld (including numerous still unpublished writings) might have for current research, one can examine how frequently that scholar is cited. When an author is cited, he is also being used; if he is used often over a long period, engaging with his unpublished papers is presumably also of use. Citation data can furthermore show which parts of a scholar's life's work appear relevant to current research, from which the most pressing questions for processing his papers can be derived. The task of the following study was therefore: How often is Paul F. Lazarsfeld cited? Also of interest: Who cites him, and where? The study was carried out with the meta-database "ISI Web of Knowledge", searching the "Web of Science" with the "Cited Reference Search" tool for the cited author "Lazarsfeld P*". This search yielded 1,535 references; selecting all references leads to 4,839 results. The databases SCI-Expanded, SSCI, and A&HCI were used, covering the publication years 1941-2008. Before 1956, however, only very few citations were found: five in 1946, otherwise at most three, and none at all in 1942-1944 and 1949. Moreover, the year 2008 is far from over. (Yet even before the end of March there were already 24 citations!)
    Date
    22. 6.2008 12:54:12
  16. Menczer, F.: Lexical and semantic clustering by Web links (2004) 0.07
    0.06597235 = product of:
      0.1319447 = sum of:
        0.09235258 = weight(_text_:web in 3090) [ClassicSimilarity], result of:
          0.09235258 = score(doc=3090,freq=14.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.57238775 = fieldWeight in 3090, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3090)
        0.03959212 = weight(_text_:search in 3090) [ClassicSimilarity], result of:
          0.03959212 = score(doc=3090,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 3090, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3090)
      0.5 = coord(2/4)
    
    Abstract
    Recent Web-searching and -mining tools are combining text and link analysis to improve ranking and crawling algorithms. The central assumption behind such approaches is that there is a correlation between the graph structure of the Web and the text and meaning of pages. Here I formalize and empirically evaluate two general conjectures drawing connections from link information to lexical and semantic Web content. The link-content conjecture states that a page is similar to the pages that link to it, and the link-cluster conjecture that pages about the same topic are clustered together. These conjectures are often simply assumed to hold, and Web search tools are built on such assumptions. The present quantitative confirmation sheds light on the connection between the success of the latest Web-mining techniques and the small-world topology of the Web, with encouraging implications for the design of better crawling algorithms.
  17. Thelwall, M.: Webometrics (2009) 0.07
    0.06597235 = product of:
      0.1319447 = sum of:
        0.09235258 = weight(_text_:web in 3906) [ClassicSimilarity], result of:
          0.09235258 = score(doc=3906,freq=14.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.57238775 = fieldWeight in 3906, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3906)
        0.03959212 = weight(_text_:search in 3906) [ClassicSimilarity], result of:
          0.03959212 = score(doc=3906,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 3906, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3906)
      0.5 = coord(2/4)
    
    Abstract
    Webometrics is an information science field concerned with measuring aspects of the World Wide Web (WWW) for a variety of information science research goals. It came into existence about five years after the Web was formed and has since grown to become a significant aspect of information science, at least in terms of published research. Although some webometrics research has focused on the structure or evolution of the Web itself or the performance of commercial search engines, most has used data from the Web to shed light on information provision or online communication in various contexts. Most prominently, techniques have been developed to track, map, and assess Web-based informal scholarly communication, for example, in terms of the hyperlinks between academic Web sites or the online impact of digital repositories. In addition, a range of nonacademic issues and groups of Web users have also been analyzed.
  18. Thelwall, M.: Conceptualizing documentation on the Web : an evaluation of different heuristic-based models for counting links between university Web sites (2002) 0.06
    0.06473425 = product of:
      0.1294685 = sum of:
        0.09647507 = weight(_text_:web in 978) [ClassicSimilarity], result of:
          0.09647507 = score(doc=978,freq=22.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.59793836 = fieldWeight in 978, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
        0.032993436 = weight(_text_:search in 978) [ClassicSimilarity], result of:
          0.032993436 = score(doc=978,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 978, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
      0.5 = coord(2/4)
    
    Abstract
    All known previous Web link studies have used the Web page as the primary indivisible source document for counting purposes. Arguments are presented to explain why this is not necessarily optimal and why other alternatives have the potential to produce better results. This is despite the fact that individual Web files are often the only choice if search engines are used for raw data, and are the easiest basic Web unit to identify. The central issue is defining the Web "document": that which should comprise the single indissoluble unit of coherent material. Three alternative heuristics are defined for the educational arena, based upon the directory, the domain, and the whole university site. These are then compared by implementing them on a set of 108 UK university institutional Web sites, under the assumption that a more effective heuristic will tend to produce results that correlate more highly with institutional research productivity. It was discovered that the domain and directory models were able to successfully reduce the impact of anomalous linking behavior between pairs of Web sites, with the latter being the method of choice. Reasons are then given as to why a document model on its own cannot eliminate all anomalies in Web linking behavior. Finally, the results from all models give a clear confirmation of the very strong association between the research productivity of a UK university and the number of incoming links from its peers' Web sites.
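The directory, domain, and site heuristics described in this abstract amount to truncating link-target URLs at different depths before counting unique targets. A minimal sketch of the idea (the example host and the simple ".ac.uk" rule are illustrative assumptions, not the paper's exact aggregation rules):

```python
from urllib.parse import urlsplit

def link_units(url):
    """Map a URL to the four alternative counting units compared in the
    study: individual page (file), directory, domain, and whole site."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    directory = parts.path.rsplit("/", 1)[0] + "/"   # strip the file name
    # Crude "whole university site" rule: keep three host labels for
    # *.ac.uk hosts (e.g. wlv.ac.uk), two otherwise -- an assumption
    # for illustration only.
    labels = host.split(".")
    site = ".".join(labels[-3:]) if host.endswith(".ac.uk") else ".".join(labels[-2:])
    return {"page": host + parts.path,
            "directory": host + directory,
            "domain": host,
            "site": site}

print(link_units("http://www.scit.wlv.ac.uk/staff/papers/index.html"))
```

Counting each distinct truncated value once, instead of each raw page, is what damps the effect of a single author or script creating thousands of near-identical links between two sites.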
  19. Lawrence, S.: Online or Invisible? (2001) 0.06
    0.060827456 = product of:
      0.12165491 = sum of:
        0.057001244 = weight(_text_:web in 1063) [ClassicSimilarity], result of:
          0.057001244 = score(doc=1063,freq=12.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.35328537 = fieldWeight in 1063, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=1063)
        0.064653665 = weight(_text_:search in 1063) [ClassicSimilarity], result of:
          0.064653665 = score(doc=1063,freq=12.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.37625307 = fieldWeight in 1063, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=1063)
      0.5 = coord(2/4)
    
    Content
    The volume of scientific literature typically far exceeds the ability of scientists to identify and utilize all relevant information in their research. Improvements to the accessibility of scientific literature, allowing scientists to locate more relevant research within a given time, have the potential to dramatically improve communication and progress in science. With the web, scientists now have very convenient access to an increasing amount of literature that previously required trips to the library, inter-library loan delays, or substantial effort in locating the source. Evidence shows that usage increases when access is more convenient, and maximizing the usage of the scientific record benefits all of society. Although availability varies greatly by discipline, over a million research articles are freely available on the web. Some journals and conferences provide free access online, others allow authors to post articles on the web, and others allow authors to purchase the right to post their articles on the web. In this article we investigate the impact of free online availability by analyzing citation rates. We do not discuss methods of creating free online availability, such as time-delayed release or publication/membership/conference charges. Online availability of an article may not be expected to greatly improve access and impact by itself. For example, efficient means of locating articles via web search engines or specialized search services is required, and a substantial percentage of the literature needs to be indexed by these search services before it is worthwhile for many scientists to use them. Computer science is a forerunner in web availability -- a substantial percentage of the literature is online and available through search engines such as Google (google.com), or specialized services such as ResearchIndex (researchindex.org). 
Even so, the greatest impact of the online availability of computer science literature is likely yet to come, because comprehensive search services and more powerful search methods have only become available recently. We analyzed 119,924 conference articles in computer science and related disciplines, obtained from DBLP (dblp.uni-trier.de). In computer science, conference articles are typically formal publications and are often more prestigious than journal articles, with acceptance rates at some conferences below 10%. Citation counts and online availability were estimated using ResearchIndex. The analysis excludes self-citations, where a citation is considered to be a self-citation if one or more of the citing and cited authors match.
  20. Thelwall, M.; Vaughan, L.; Björneborn, L.: Webometrics (2004) 0.06
    0.060129207 = product of:
      0.12025841 = sum of:
        0.08726498 = weight(_text_:web in 4279) [ClassicSimilarity], result of:
          0.08726498 = score(doc=4279,freq=18.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.5408555 = fieldWeight in 4279, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
        0.032993436 = weight(_text_:search in 4279) [ClassicSimilarity], result of:
          0.032993436 = score(doc=4279,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.19200584 = fieldWeight in 4279, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4279)
      0.5 = coord(2/4)
    
    Abstract
    Webometrics, the quantitative study of Web-related phenomena, emerged from the realization that methods originally designed for bibliometric analysis of scientific journal article citation patterns could be applied to the Web, with commercial search engines providing the raw data. Almind and Ingwersen (1997) defined the field and gave it its name. Other pioneers included Rodriguez Gairin (1997) and Aguillo (1998). Larson (1996) undertook exploratory link structure analysis, as did Rousseau (1997). Webometrics encompasses research from fields beyond information science such as communication studies, statistical physics, and computer science. In this review we concentrate on link analysis, but also cover other aspects of webometrics, including Web log file analysis. One theme that runs through this chapter is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results, for instance, from the automatic creation or replication of links. The loose connection between top-level domain specifications (e.g., com, edu, and org) and their actual content is also a frustrating problem. For example, many .com sites contain noncommercial content, although com is ostensibly the main commercial top-level domain. Indeed, a skeptical researcher could claim that obstacles of this kind are so great that all Web analyses lack value. As will be seen, one response to this view, a view shared by critics of evaluative bibliometrics, is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random. A practical response has been to develop increasingly sophisticated data cleansing techniques and multiple data analysis methods.

Languages

  • e 136
  • d 11

Types

  • a 144
  • el 3
  • m 2
  • s 1