Search (145 results, page 1 of 8)

  • theme_ss:"Informetrie"
  1. Liu, D.-R.; Shih, M.-J.: Hybrid-patent classification based on patent-network analysis (2011) 0.09
    0.08966473 = product of:
      0.13449709 = sum of:
        0.11778076 = weight(_text_:query in 4189) [ClassicSimilarity], result of:
          0.11778076 = score(doc=4189,freq=8.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5134957 = fieldWeight in 4189, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4189)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 4189) [ClassicSimilarity], result of:
              0.03343265 = score(doc=4189,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 4189, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4189)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
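
    The tree above is standard Lucene ClassicSimilarity (TF-IDF) explain output. As a cross-check, here is a minimal Python sketch that recombines the printed factors into the document score; every constant is copied from the tree, and the helper name field_weight is ours:

      import math

      def field_weight(freq, idf, field_norm):
          # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
          return math.sqrt(freq) * idf * field_norm

      # Factors copied from the explain tree for doc 4189
      query_norm, field_norm = 0.049352113, 0.0390625
      idf_q, idf_22 = 4.6476326, 3.5018296

      w_q = (idf_q * query_norm) * field_weight(8.0, idf_q, field_norm)            # 0.11778076
      w_22 = (idf_22 * query_norm) * field_weight(2.0, idf_22, field_norm) * 0.5   # coord(1/2)

      print((w_q + w_22) * 2 / 3)   # coord(2/3) -> ~0.0896647, the top score above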
    
    Abstract
    Effective patent management is essential for organizations to maintain their competitive advantage. The classification of patents is a critical part of patent management and industrial analysis. This study proposes a hybrid patent-classification approach that combines a novel patent-network-based classification method with three conventional classification methods to analyze query patents and predict their classes. The novel patent network contains various types of nodes that represent different features extracted from patent documents. The nodes are connected based on relationship metrics derived from the patent metadata. The proposed classification method predicts a query patent's class by analyzing all reachable nodes in the patent network and calculating their relevance to the query patent. It then classifies the query patent with a modified k-nearest-neighbor classifier. To further improve the approach, we combine it with content-based, citation-based, and metadata-based classification methods to develop a hybrid-classification approach. We evaluate the performance of the hybrid approach on a test dataset of patent documents obtained from the U.S. Patent and Trademark Office, and compare its performance with that of the three conventional methods. The results demonstrate that the proposed hybrid approach yields more accurate class predictions than the conventional methods alone.
    Date
    22. 1.2011 13:04:21
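
    The network construction itself is specific to the paper; purely as an illustration of the final step the abstract describes, a modified k-nearest-neighbor vote over precomputed query-to-patent relevance scores might look like the sketch below (the relevance values, class labels, and choice of k are placeholders, not the authors' definitions):

      from collections import Counter

      def knn_classify(relevance, labels, k=3):
          """Majority class among the k patents most relevant to the query
          patent; relevance maps patent id -> network-derived score."""
          neighbors = sorted(relevance, key=relevance.get, reverse=True)[:k]
          return Counter(labels[p] for p in neighbors).most_common(1)[0][0]

      rel = {"p1": 0.9, "p2": 0.7, "p3": 0.4, "p4": 0.2, "p5": 0.1}
      cls = {"p1": "H04L", "p2": "H04L", "p3": "G06F", "p4": "G06F", "p5": "H04L"}
      print(knn_classify(rel, cls))   # -> "H04L"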
  2. Zhang, Y.; Jansen, B.J.; Spink, A.: Identification of factors predicting clickthrough in Web searching using neural network analysis (2009) 0.08
    0.07999992 = product of:
      0.119999886 = sum of:
        0.09994029 = weight(_text_:query in 2742) [ClassicSimilarity], result of:
          0.09994029 = score(doc=2742,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.43571556 = fieldWeight in 2742, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=2742)
        0.020059591 = product of:
          0.040119182 = sum of:
            0.040119182 = weight(_text_:22 in 2742) [ClassicSimilarity], result of:
              0.040119182 = score(doc=2742,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.23214069 = fieldWeight in 2742, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2742)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this research, we aim to identify factors that significantly affect the clickthrough of Web searchers. Our underlying goal is to determine more efficient methods to optimize the clickthrough rate. We devise a clickthrough metric for measuring customer satisfaction with search engine results using the number of links visited, the number of queries a user submits, and the rank of clicked links. We use a neural network to detect the significant influence of searching characteristics on future user clickthrough. Our results show that high occurrences of query reformulation, lengthy searching duration, longer query length, and higher ranking of previously clicked links correlate positively with future clickthrough. We provide recommendations for leveraging these findings to improve the performance of search engine retrieval and result ranking, along with implications for search engine marketing.
    Date
    22. 3.2009 17:49:11
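
    The paper's actual network architecture is not given in this listing; as a toy illustration of predicting clickthrough from the four searching characteristics the abstract names, a single logistic unit trained by gradient descent is sketched below (features, labels, learning rate, and iteration count are all invented for the example):

      import numpy as np

      # Columns: reformulations, duration (min), query length, best prior click rank
      X = np.array([[5, 20, 4, 1], [0, 2, 1, 9], [3, 15, 3, 2], [1, 4, 2, 8]], float)
      y = np.array([1.0, 0.0, 1.0, 0.0])   # 1 = future clickthrough occurred

      X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature
      w, b = np.zeros(4), 0.0
      for _ in range(2000):                      # plain batch gradient descent
          p = 1 / (1 + np.exp(-(X @ w + b)))     # predicted clickthrough probability
          w -= 0.1 * X.T @ (p - y) / len(y)
          b -= 0.1 * (p - y).mean()
      # First three weights come out positive, the rank weight negative
      # (rank 1 = top of the list), mirroring the reported correlations.
      print(np.round(w, 2))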
  3. Ridenour, L.: Boundary objects : measuring gaps and overlap between research areas (2016) 0.06
    0.06048537 = product of:
      0.09072805 = sum of:
        0.07066846 = weight(_text_:query in 2835) [ClassicSimilarity], result of:
          0.07066846 = score(doc=2835,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.30809742 = fieldWeight in 2835, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=2835)
        0.020059591 = product of:
          0.040119182 = sum of:
            0.040119182 = weight(_text_:22 in 2835) [ClassicSimilarity], result of:
              0.040119182 = score(doc=2835,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.23214069 = fieldWeight in 2835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2835)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The aim of this paper is to develop a methodology to determine the conceptual overlap between research areas. It investigates patterns of terminology usage in scientific abstracts as boundary objects between research specialties. Research specialties were determined by the high-level classifications assigned by Thomson Reuters in their Essential Science Indicators file, which provides a strictly hierarchical classification of journals into 22 categories. Results for the query "network theory" were downloaded from the Web of Science. From this file, two top-level groups, economics and social sciences, were selected and topically analyzed to provide a baseline of similarity on which to run an informetric analysis. The Places & Spaces Map of Science (Klavans and Boyack 2007) was used to determine the proximity of disciplines to one another in order to select the two disciplines used in the analysis. The groups analyzed share common theories and goals; however, the groups used different language to describe their research. It was found that 61% of term words were shared between the two groups.
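
    As a minimal sketch of the vocabulary-overlap measurement reported above, under one plausible reading (a Jaccard-style shared-term ratio; the 61% figure comes from the authors' data, and the toy term sets below are placeholders):

      def shared_term_ratio(terms_a, terms_b):
          """Fraction of the combined vocabulary used by both groups."""
          a, b = set(terms_a), set(terms_b)
          return len(a & b) / len(a | b)

      econ = {"network", "theory", "equilibrium", "agent", "graph"}
      socsci = {"network", "theory", "actor", "tie", "graph"}
      print(f"{shared_term_ratio(econ, socsci):.0%}")   # 43% on this toy data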
  4. Niemi, T.; Hirvonen, L.; Järvelin, K.: Multidimensional data model and query language for informetrics (2003) 0.05
    0.052673157 = product of:
      0.15801947 = sum of:
        0.15801947 = weight(_text_:query in 1753) [ClassicSimilarity], result of:
          0.15801947 = score(doc=1753,freq=10.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.68892676 = fieldWeight in 1753, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=1753)
      0.33333334 = coord(1/3)
    
    Abstract
    Multidimensional data analysis, or On-line analytical processing (OLAP), offers a single subject-oriented source for analyzing summary data based on various dimensions. We demonstrate that the OLAP approach gives a promising starting point for advanced analysis and comparison among summary data in informetrics applications. At the moment there is no single precise, commonly accepted logical/conceptual model for multidimensional analysis, because the requirements of applications vary considerably. We develop a conceptual/logical multidimensional model for supporting the complex and unpredictable needs of informetrics. Summary data are considered with respect to some dimensions, and by changing dimensions the user may construct other views on the same summary data. We develop a multidimensional query language whose basic idea is to support the definition of views in a way that is natural and intuitive for lay users in the informetrics area. We show that this view-oriented query language has great expressive power and that its degree of declarativity is greater than that of contemporary operation-oriented or SQL (Structured Query Language)-like OLAP query languages.
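
    The authors' query language itself is not reproduced in this listing; the underlying idea, re-slicing the same summary data by choosing different dimensions, can be sketched with a pivot table (the pandas library and the toy publication data are our stand-ins, not the paper's model):

      import pandas as pd

      pubs = pd.DataFrame({
          "author":  ["A", "A", "B", "B", "B"],
          "year":    [2001, 2002, 2001, 2002, 2002],
          "journal": ["JASIS", "JDoc", "JASIS", "JASIS", "JDoc"],
      })

      # One view: publication counts per author by year. Swapping the
      # dimension lists yields another view on the same summary data.
      print(pubs.pivot_table(index="author", columns="year",
                             aggfunc="size", fill_value=0))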
  5. Bar-Ilan, J.: On the overlap, the precision and estimated recall of search engines : a case study of the query 'Erdös' (1998) 0.04
    0.03886567 = product of:
      0.116597004 = sum of:
        0.116597004 = weight(_text_:query in 3753) [ClassicSimilarity], result of:
          0.116597004 = score(doc=3753,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5083348 = fieldWeight in 3753, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3753)
      0.33333334 = coord(1/3)
    
    Abstract
    Investigates the retrieval capabilities of 6 Internet search engines on a simple query. Existing work on search engine evaluation considers only the first 10 or 20 results returned by the search engine. In this work, all documents that the search engines pointed at were retrieved and thoroughly examined, so that the precision of the whole retrieval process could be calculated, the overlap between the results of the engines studied, and an estimate of the recall of the searches obtained. The precision of the engines is high, recall is very low, and the overlap is minimal.
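
    A short sketch of the three quantities the case study computes, over toy result sets (the document ids and judgments are invented; estimating recall against the pooled relevant documents found by all engines is one common reading of the method):

      def precision(retrieved, relevant):
          return len(retrieved & relevant) / len(retrieved)

      def overlap(a, b):
          return len(a & b) / len(a | b)

      e1 = {"d1", "d2", "d3", "d4"}         # engine 1 results
      e2 = {"d3", "d5"}                     # engine 2 results
      relevant = {"d1", "d3", "d5", "d9"}   # judged relevant
      pool = (e1 | e2) & relevant           # relevant docs any engine found
      print(precision(e1, relevant))        # 0.5
      print(overlap(e1, e2))                # 0.2
      print(len(e1 & relevant) / len(pool)) # relative recall of engine 1: ~0.67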
  6. Braun, T.; Glanzel, W.; Grupp, H.: ¬The scientometric weight of 50 nations in 27 scientific areas, 1989-1993 : Pt.1: All fields combined, mathematics, engineering, chemistry and physics (1995) 0.03
    0.034373984 = product of:
      0.103121944 = sum of:
        0.103121944 = product of:
          0.20624389 = sum of:
            0.20624389 = weight(_text_:page in 761) [ClassicSimilarity], result of:
              0.20624389 = score(doc=761,freq=6.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.74820316 = fieldWeight in 761, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=761)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Attempts some new approaches to the presentation of bibliometric macro level indicators. Mathematics, engineering, physics and chemistry subfields are assigned to 13 science areas. Each science area then appears on 1 table (left page) and 2 graphs (right page). The 1st graph shows the main citation rates with respect to the world average on a relational chart. The countries are represented by letter codes that can be found in the corresponding table on the facing page. The 2nd graph visualizes the countries' relative research activity in the given science areas as compared to the world standard
  7. Koehler, W.: Web page change and persistence : a four-year longitudinal study (2002) 0.03
    0.03402142 = product of:
      0.10206425 = sum of:
        0.10206425 = product of:
          0.2041285 = sum of:
            0.2041285 = weight(_text_:page in 203) [ClassicSimilarity], result of:
              0.2041285 = score(doc=203,freq=8.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.74052906 = fieldWeight in 203, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.046875 = fieldNorm(doc=203)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Changes in the topography of the Web can be expressed in at least four ways: (1) more sites on more servers in more places, (2) more pages and objects added to existing sites and pages, (3) changes in traffic, and (4) modifications to existing text, graphic, and other Web objects. This article does not address the first three factors (more sites, more pages, more traffic) in the growth of the Web. It focuses instead on changes to an existing set of Web documents. The article documents changes to an aging set of Web pages, first identified and "collected" in December 1996 and followed weekly thereafter. Results are reported through February 2001. The article addresses two related phenomena: (1) the life cycle of Web objects, and (2) changes to Web objects. These data reaffirm that the half-life of a Web page is approximately 2 years. There is variation among Web pages by top-level domain and by page type (navigation, content). Web page content appears to stabilize over time; aging pages change less often than they once did.
  8. Alonso, S.; Cabrerizo, F.J.; Herrera-Viedma, E.; Herrera, F.: WoS query partitioner : a tool to retrieve very large numbers of items from the Web of Science using different source-based partitioning approaches (2010) 0.03
    0.03400038 = product of:
      0.10200114 = sum of:
        0.10200114 = weight(_text_:query in 3701) [ClassicSimilarity], result of:
          0.10200114 = score(doc=3701,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.44470036 = fieldWeight in 3701, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3701)
      0.33333334 = coord(1/3)
    
    Abstract
    Thomson Reuters' Web of Science (WoS) is undoubtedly a great tool for scientometric purposes. It allows one to retrieve and compute different measures, such as the total number of papers that satisfy a particular condition; however, it is also well known that this tool imposes several restrictions that make obtaining certain results difficult. One of those constraints is that the tool does not offer the total count of documents in a dataset if it is larger than 100,000 items. In this article, we propose and analyze different approaches that involve partitioning the search space (using the Source field) to retrieve item counts for very large datasets from the WoS. The proposed techniques improve on previous approaches: they do not need any extra information about the retrieved dataset (thus allowing completely automatic procedures to retrieve the results), they are designed to avoid many of the restrictions imposed by the WoS, and they can be easily applied to almost any query. Finally, a description of WoS Query Partitioner, a freely available online interactive tool that implements those techniques, is presented.
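
    As a rough illustration of the source-based partitioning idea (the count function below stands in for a WoS query returning a hit count and is entirely hypothetical; 100,000 is the cap the abstract describes):

      def partitioned_count(query, sources, count, cap=100_000):
          """Split the query over source ranges until each partition's
          hit count is below the cap, then sum the partial counts."""
          total, stack = 0, [(0, len(sources))]
          while stack:
              lo, hi = stack.pop()
              q = f'({query}) AND SO=({" OR ".join(sources[lo:hi])})'
              n = count(q)                    # hypothetical WoS call
              if n < cap or hi - lo == 1:
                  total += n
              else:                           # bisect the source range
                  mid = (lo + hi) // 2
                  stack += [(lo, mid), (mid, hi)]
          return total

      fake_hits = {"J1": 60_000, "J2": 70_000, "J3": 10_000}   # toy data
      count = lambda q: sum(v for k, v in fake_hits.items() if k in q)
      print(partitioned_count("TS=informetrics", list(fake_hits), count))  # 140000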
  9. Bucy, E.P.; Lang, A.; Potter, R.F.; Grabe, M.E.: Formal features of cyberspace : relationships between Web page complexity and site traffic (1999) 0.03
    0.02835118 = product of:
      0.08505354 = sum of:
        0.08505354 = product of:
          0.17010708 = sum of:
            0.17010708 = weight(_text_:page in 4060) [ClassicSimilarity], result of:
              0.17010708 = score(doc=4060,freq=8.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.6171075 = fieldWeight in 4060, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4060)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Although the Internet is not without its critics, many popular and academic writers are particularly effusive in their praise of the WWW's interactive features. A content analysis of the formal features of 496 Web sites, drawn randomly from a sample of the top 5,000 most visited sites as determined by 100hot.com, was performed to explore whether the capabilities of the WWW are being exploited by Web page designers to the extent that the literature suggests they are. Specifically, the study examines the differences between the formal features of commercial versus non-commercial sites as well as the relationship between Web page complexity and the amount of traffic a site receives. Findings indicate that, although most pages at this stage of the Web's development remain technologically simple and noninteractive, there are significant relationships between site traffic and home-page structure for Web sites in the commercial (.com) as well as educational (.edu) domains. As the Web continues to expand and the amount of information redundancy increases, it is argued that a site's information packaging will become increasingly important in gaining users' attention and interest.
  10. Tsai, B.-s.: Information landscaping : information mapping, charting, querying and reporting techniques for total quality knowledge management (2003) 0.03
    0.027761191 = product of:
      0.08328357 = sum of:
        0.08328357 = weight(_text_:query in 1079) [ClassicSimilarity], result of:
          0.08328357 = score(doc=1079,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 1079, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1079)
      0.33333334 = coord(1/3)
    
    Abstract
    Information landscaping, an integration of information mapping, charting, querying and reporting techniques, has been developed to enable the construction of a total-quality knowledge management system focusing on a particular subject information field. The techniques apply five major parameters of the Fuzzy commonality model (FCM), namely unionization, quantity, continuity or stability, changeability, and critical probability, to construct a series of information maps (infomaps) and a set of chronological-statistical charts (infocharts). The infomaps and infocharts are used as the blueprints and navigation agents for building and developing a Web-based subject-experts depository and query-report system. Focusing on subject experts and expertise, this system enables a researcher to expedite a query search through infomaps (qualitative reference) and infocharts (quantitative reference). The entropy measure and the entropy constant (the square root of the average entropy measure) are calculated and compared with the critical probability of the FCM. This leads to a set of regression lines and the establishment of an information oscillogram. The tropics (upper limit, middle range, lower limit) and the potential/solstitial population and its growth rate within a subject information domain during a particular time period can be determined. These can effectively and efficiently guide librarians and information professionals in the construction and continuous development of an electronic collection. A virtual learning and referencing environment can also be cultivated by utilizing these data.
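
    The FCM machinery is beyond this summary, but the entropy constant the abstract defines (the square root of the average entropy measure) is easy to sketch; the per-expert topic distributions below are placeholders, not the paper's data:

      import math

      def shannon_entropy(probs):
          return -sum(p * math.log2(p) for p in probs if p > 0)

      distributions = [            # toy distributions over subject topics
          [0.5, 0.5],
          [0.9, 0.1],
          [0.25, 0.25, 0.25, 0.25],
      ]
      entropies = [shannon_entropy(d) for d in distributions]
      entropy_constant = math.sqrt(sum(entropies) / len(entropies))
      print(round(entropy_constant, 3))   # ~1.075, the value the paper compares
                                          # against the FCM's critical probability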
  11. Egghe, L.; Ravichandra Rao, I.K.: ¬The influence of the broadness of a query of a topic on its h-index : models and examples of the h-index of n-grams (2008) 0.03
    0.027761191 = product of:
      0.08328357 = sum of:
        0.08328357 = weight(_text_:query in 2009) [ClassicSimilarity], result of:
          0.08328357 = score(doc=2009,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 2009, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2009)
      0.33333334 = coord(1/3)
    
    Abstract
    The article studies the influence of the query formulation of a topic on its h-index. In order to generate pure random sets of documents, we used N-grams (N variable) to measure this influence: strings of zeros, truncated at the end. The databases used are WoS and Scopus. The formula h = T**(1/alpha), proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and alpha is Lotka's exponent, is confirmed to be a concavely increasing function of T. We also give a formula for the relation between h and N, the length of the N-gram: h = D * 10**(-N/alpha), where D is a constant; this convexly decreasing function is confirmed in our experiments. Nonlinear regression on h = T**(1/alpha) gives an estimate of alpha, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): h = S**(1/alpha), where S is the total number of documents in the database.
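
    A quick numeric check of the two formulas as restored above; the values of alpha, T, D, and N are arbitrary:

      alpha, D = 2.0, 500.0

      # h = T**(1/alpha): concavely increasing in the number of documents T
      for T in (100, 10_000, 1_000_000):
          print(T, T ** (1 / alpha))                   # 10.0, 100.0, 1000.0

      # h = D * 10**(-N/alpha): convexly decreasing in the N-gram length N
      for N in (1, 2, 3):
          print(N, round(D * 10 ** (-N / alpha), 1))   # 158.1, 50.0, 15.8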
  12. Della Mea, V.; Demartini, G.; Di Gaspero, L.; Mizzaro, S.: Measuring retrieval effectiveness with Average Distance Measure (ADM) (2006) 0.03
    0.027482178 = product of:
      0.08244653 = sum of:
        0.08244653 = weight(_text_:query in 774) [ClassicSimilarity], result of:
          0.08244653 = score(doc=774,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.35944697 = fieldWeight in 774, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0546875 = fieldNorm(doc=774)
      0.33333334 = coord(1/3)
    
    Abstract
    Most common effectiveness measures for information retrieval systems are based on the assumptions of binary relevance (either a document is relevant to a given query or it is not) and binary retrieval (either a document is retrieved or it is not). In this paper, we describe an information retrieval effectiveness measure named ADM (Average Distance Measure) that questions these assumptions. We compare ADM with other measures, discuss it from a conceptual point of view, and report some experimental results. Both conceptual analysis and experimental evidence demonstrate ADM's adequacy in measuring the effectiveness of information retrieval systems.
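
    The paper's formal definition is not reproduced here; on the common reading of ADM as one minus the average absolute distance between the system's and the user's non-binary relevance estimates, a sketch looks like this (document ids and scores are invented):

      def adm(system_scores, user_scores):
          """Average Distance Measure over the shared documents; both
          dicts hold relevance estimates in [0, 1]."""
          docs = system_scores.keys() & user_scores.keys()
          dist = sum(abs(system_scores[d] - user_scores[d]) for d in docs)
          return 1.0 - dist / len(docs)

      sre = {"d1": 0.9, "d2": 0.4, "d3": 0.1}   # system relevance estimates
      ure = {"d1": 1.0, "d2": 0.5, "d3": 0.0}   # user relevance estimates
      print(round(adm(sre, ure), 2))            # 0.9: close agreement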
  13. Simkin, M.V.; Roychowdhury, V.P.: Why does attention to web articles fall with Time? (2015) 0.02
    0.024056775 = product of:
      0.072170325 = sum of:
        0.072170325 = product of:
          0.14434065 = sum of:
            0.14434065 = weight(_text_:page in 2163) [ClassicSimilarity], result of:
              0.14434065 = score(doc=2163,freq=4.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.5236331 = fieldWeight in 2163, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2163)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    We analyze access statistics of 150 blog entries and news articles for periods of up to 3 years. The access rate falls as an inverse power of the time passed since publication. The power law holds for periods of up to 1,000 days. The exponents are different for different blogs and are distributed between 0.6 and 3.2. We argue that the decay of attention to a web article is caused by the link to it first dropping down the list of links on the website's front page, then disappearing from the front page, and subsequently moving further into the background. The other proposed explanations, which use a novelty factor decaying with time or some intricate theory of human dynamics, cannot explain all of the experimental observations.
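
    The decay exponent of such a power law can be recovered from access data with a log-log least-squares fit, sketched below on synthetic data (the true exponent 1.2 is chosen arbitrarily from the reported 0.6-3.2 range):

      import math

      days = [1, 3, 10, 30, 100, 300, 1000]
      rate = [t ** -1.2 for t in days]          # synthetic access rates

      xs = [math.log(t) for t in days]
      ys = [math.log(r) for r in rate]
      mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
      slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
               / sum((x - mx) ** 2 for x in xs))
      print(round(-slope, 2))   # 1.2, the recovered decay exponent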
  14. Bar-Ilan, J.: ¬The Web as an information source on informetrics? : A content analysis (2000) 0.02
    0.023556154 = product of:
      0.07066846 = sum of:
        0.07066846 = weight(_text_:query in 4587) [ClassicSimilarity], result of:
          0.07066846 = score(doc=4587,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.30809742 = fieldWeight in 4587, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=4587)
      0.33333334 = coord(1/3)
    
    Abstract
    This article addresses the question of whether the Web can serve as an information source for research. Specifically, it analyzes by way of content analysis the Web pages retrieved by the major search engines on a particular date (June 7, 1998) as a result of the query 'informetrics OR informetric'. In 807 out of the 942 retrieved pages, the search terms were mentioned in the context of information science. Over 70% of the pages contained only indirect information on the topic, in the form of hypertext links and bibliographical references without annotation. The bibliographical references extracted from the Web pages were analyzed, and lists of the most productive authors, most cited authors, works, and sources were compiled. The list of references obtained from the Web was also compared to data retrieved from commercial databases. In most cases, the list of references extracted from the Web outperformed the commercial bibliographic databases. The results of these comparisons indicate that valuable, freely available data are hidden in the Web, waiting to be extracted from the millions of Web pages.
  15. Bhavnani, S.K.: Why is it difficult to find comprehensive information? : implications of information scatter for search and design (2005) 0.02
    0.02004731 = product of:
      0.060141932 = sum of:
        0.060141932 = product of:
          0.120283864 = sum of:
            0.120283864 = weight(_text_:page in 3684) [ClassicSimilarity], result of:
              0.120283864 = score(doc=3684,freq=4.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.4363609 = fieldWeight in 3684, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3684)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The rapid development of Web sites providing extensive coverage of a topic, coupled with the development of powerful search engines (designed to help users find such Web sites), suggests that users can easily find comprehensive information about a topic. In domains such as consumer healthcare, finding comprehensive information about a topic is critical as it can improve a patient's judgment in making healthcare decisions, and can encourage higher compliance with treatment. However, recent studies show that despite using powerful search engines, many healthcare information seekers have difficulty finding comprehensive information even for narrow healthcare topics because the relevant information is scattered across many Web sites. To date, no studies have analyzed how facts related to a search topic are distributed across relevant Web pages and Web sites. In this study, the distribution of facts related to five common healthcare topics across high-quality sites is analyzed, and the reasons underlying those distributions are explored. The analysis revealed the existence of few pages that had many facts, many pages that had few facts, and no single page or site that provided all the facts. While such a distribution conforms to other information-related phenomena, a deeper analysis revealed that the distributions were caused by a trade-off between depth and breadth, leading to the existence of general, specialized, and sparse pages. Furthermore, the results helped to make explicit the knowledge needed by searchers to find comprehensive healthcare information, and suggested the motivation to explore distribution-conscious approaches for the development of future search systems, search interfaces, Web page designs, and training.
  16. Cheng, S.; YunTao, P.; JunPeng, Y.; Hong, G.; ZhengLu, Y.; ZhiYu, H.: PageRank, HITS and impact factor for journal ranking (2009) 0.02
    0.02004731 = product of:
      0.060141932 = sum of:
        0.060141932 = product of:
          0.120283864 = sum of:
            0.120283864 = weight(_text_:page in 2513) [ClassicSimilarity], result of:
              0.120283864 = score(doc=2513,freq=4.0), product of:
                0.27565226 = queryWeight, product of:
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.049352113 = queryNorm
                0.4363609 = fieldWeight in 2513, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.5854197 = idf(docFreq=450, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2513)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Journal citation measures are among the most widely used bibliometric tools. The best-known measure is the ISI Impact Factor: under the standard definition, the impact factor of journal j in a given year is the average number of citations received by papers published in journal j in the previous two years. However, the impact factor has intrinsic limitations: it is a ranking measure based fundamentally on a pure count of the in-degrees of nodes in the citation network, and its calculation does not take into account the "impact" or "prestige" of the journals in which the citations appear. Google's PageRank algorithm and Kleinberg's HITS method are webpage-ranking algorithms; they compute the scores of webpages based on a combination of the number of hyperlinks that point to a page and the status of the pages those hyperlinks originate from: a page is important if it is pointed to by other important pages. We demonstrate how the popular webpage-ranking algorithms PageRank and HITS can be used to rank journals, compare the ISI Impact Factor, PageRank, and HITS for journal ranking (computing PageRank and HITS both with and without self-citations), and discuss the merits, shortcomings, and scope of application of the various algorithms for journal ranking.
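
    A minimal power-iteration PageRank over a toy journal citation graph; the damping factor 0.85 is the conventional choice, and nothing below reproduces the paper's data or its HITS comparison:

      def pagerank(links, damping=0.85, iters=50):
          """links maps each journal to the list of journals it cites."""
          nodes = list(links)
          rank = {n: 1 / len(nodes) for n in nodes}
          for _ in range(iters):
              new = {n: (1 - damping) / len(nodes) for n in nodes}
              for n, cited in links.items():
                  for m in cited:   # a citation passes prestige to the cited journal
                      new[m] += damping * rank[n] / len(cited)
              rank = new
          return rank

      cites = {"J1": ["J2"], "J2": ["J1", "J3"], "J3": ["J1"]}
      ranks = pagerank(cites)
      print(max(ranks, key=ranks.get))   # "J1": cited by both other journals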
  17. Thelwall, M.; Li, X.; Barjak, F.; Robinson, S.: Assessing the international web connectivity of research groups (2008) 0.02
    0.019630127 = product of:
      0.05889038 = sum of:
        0.05889038 = weight(_text_:query in 1401) [ClassicSimilarity], result of:
          0.05889038 = score(doc=1401,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 1401, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1401)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The purpose of this paper is to claim that it is useful to assess the web connectivity of research groups, describe hyperlink-based techniques to achieve this and present brief details of European life sciences research groups as a case study. Design/methodology/approach - A commercial search engine was harnessed to deliver hyperlink data via its automatic query submission interface. A special purpose link analysis tool, LexiURL, then summarised and graphed the link data in appropriate ways. Findings - Webometrics can provide a wide range of descriptive information about the international connectivity of research groups. Research limitations/implications - Only one field was analysed, data was taken from only one search engine, and the results were not validated. Practical implications - Web connectivity seems to be particularly important for attracting overseas job applicants and to promote research achievements and capabilities, and hence we contend that it can be useful for national and international governments to use webometrics to ensure that the web is being used effectively by research groups. Originality/value - This is the first paper to make a case for the value of using a range of webometric techniques to evaluate the web presences of research groups within a field, and possibly the first "applied" webometrics study produced for an external contract.
  18. Thelwall, M.: Quantitative comparisons of search engine results (2008) 0.02
    0.019630127 = product of:
      0.05889038 = sum of:
        0.05889038 = weight(_text_:query in 2350) [ClassicSimilarity], result of:
          0.05889038 = score(doc=2350,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 2350, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2350)
      0.33333334 = coord(1/3)
    
    Abstract
    Search engines are normally used to find information or Web sites, but Webometric investigations use them for quantitative data such as the number of pages matching a query and the international spread of those pages. For this type of application, the accuracy of the hit count estimates and the range of URLs in the full results are important. Here, we compare the application programming interfaces of Google, Yahoo!, and Live Search for 1,587 single-word searches. The hit count estimates were broadly consistent, but with Yahoo! and Google reporting 5-6 times more hits than Live Search. Yahoo! tended to return slightly more matching URLs than Google, with Live Search returning significantly fewer. Yahoo!'s result URLs included a significantly wider range of domains and sites than the other two, and there was little consistency between the three engines in the number of different domains. In contrast, the three engines were reasonably consistent in the number of different top-level domains represented in the result URLs, although Yahoo! tended to return the most. In conclusion, quantitative results from the three search engines are mostly consistent, but with unexpected types of inconsistency that users should be aware of. Google is recommended for hit count estimates but Yahoo! is recommended for all other Webometric purposes.
  19. Thelwall, M.; Sud, P.: ¬A comparison of methods for collecting web citation data for academic organizations (2011) 0.02
    0.019630127 = product of:
      0.05889038 = sum of:
        0.05889038 = weight(_text_:query in 4626) [ClassicSimilarity], result of:
          0.05889038 = score(doc=4626,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 4626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4626)
      0.33333334 = coord(1/3)
    
    Abstract
    The primary webometric method for estimating the online impact of an organization is to count links to its website. Link counts have been available from commercial search engines for over a decade but this was set to end by early 2012 and so a replacement is needed. This article compares link counts to two alternative methods: URL citations and organization title mentions. New variations of these methods are also introduced. The three methods are compared against each other using Yahoo!. Two of the three methods (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy.
  20. Delgado-Quirós, L.; Aguillo, I.F.; Martín-Martín, A.; López-Cózar, E.D.; Orduña-Malea, E.; Ortega, J.L.: Why are these publications missing? : uncovering the reasons behind the exclusion of documents in free-access scholarly databases (2024) 0.02
    0.019630127 = product of:
      0.05889038 = sum of:
        0.05889038 = weight(_text_:query in 1201) [ClassicSimilarity], result of:
          0.05889038 = score(doc=1201,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 1201, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1201)
      0.33333334 = coord(1/3)
    
    Abstract
    This study analyses the coverage of seven free-access bibliographic databases (Crossref, Dimensions (non-subscription version), Google Scholar, Lens, Microsoft Academic, Scilit, and Semantic Scholar) to identify the potential reasons that might cause the exclusion of scholarly documents and how they could influence coverage. To do this, 116k randomly selected bibliographic records from Crossref were used as a baseline. API endpoints and web scraping were used to query each database. The results show that coverage differences are mainly caused by the way each service builds its database. While classic bibliographic databases ingest almost exactly the same content from Crossref (Lens and Scilit miss 0.1% and 0.2% of the records, respectively), academic search engines present lower coverage (Google Scholar misses 9.8% of the records, Semantic Scholar 10%, and Microsoft Academic 12%). Coverage differences are mainly attributed to external factors, such as web accessibility and robot exclusion policies (39.2%-46%), and internal requirements that exclude secondary content (6.5%-11.6%). In the case of Dimensions, the classic bibliographic database with the lowest coverage (7.6%), internal selection criteria, such as the indexing of full books instead of book chapters (65%) and the exclusion of secondary content (15%), are the main reasons for missing publications.

Languages

  • e 136
  • d 8
  • ro 1

Types

  • a 143
  • m 2
  • el 1
  • s 1