Search (25 results, page 1 of 2)

  • author_ss:"Wolfram, D."
  1. Ajiferuke, I.; Lu, K.; Wolfram, D.: A comparison of citer and citation-based measure outcomes for multiple disciplines (2010) 0.03
    0.028611436 = product of:
      0.042917155 = sum of:
        0.02263261 = weight(_text_:on in 4000) [ClassicSimilarity], result of:
          0.02263261 = score(doc=4000,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.20619515 = fieldWeight in 4000, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=4000)
        0.020284547 = product of:
          0.040569093 = sum of:
            0.040569093 = weight(_text_:22 in 4000) [ClassicSimilarity], result of:
              0.040569093 = score(doc=4000,freq=2.0), product of:
                0.1747608 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04990557 = queryNorm
                0.23214069 = fieldWeight in 4000, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4000)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
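The explain tree above can be reproduced with the TF-IDF formulas of Lucene's ClassicSimilarity, in the form that matches the numbers printed here:

- tf = sqrt(freq)
- idf = 1 + ln(maxDocs/(docFreq+1))
- queryWeight = idf × queryNorm
- fieldWeight = tf × idf × fieldNorm
- each term score = queryWeight × fieldWeight
- coord(m/n) scales a Boolean clause by the fraction of its sub-clauses that matched

The sketch below is a minimal Python reconstruction under stated assumptions. It assumes query boosts of 1. It takes queryNorm and fieldNorm as constants copied from the output rather than recomputing them. The helper names are illustrative, not Lucene's API. It recomputes the 0.028611436 score of result 1.

```python
import math

def tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw count in the field
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that reproduces the values shown
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight (query boost assumed to be 1)
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

def coord(overlap: int, max_overlap: int) -> float:
    # Fraction of a Boolean query's clauses that matched
    return overlap / max_overlap

MAX_DOCS = 44218
QUERY_NORM = 0.04990557  # copied from the explain output, not recomputed
FIELD_NORM = 0.046875    # fieldNorm(doc=4000), copied from the explain output

on_score = term_score(4.0, 13325, MAX_DOCS, QUERY_NORM, FIELD_NORM)
n22_score = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# "22" sits in a nested Boolean clause where 1 of 2 sub-clauses matched
nested = n22_score * coord(1, 2)
# At the top level, 2 of 3 clauses matched
total = (on_score + nested) * coord(2, 3)

print(f"on={on_score:.7f} 22={n22_score:.7f} total={total:.7f}")
```

The recomputed values agree with the explain output to about six significant figures. The small residual comes from Lucene's single-precision arithmetic.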
    
    Abstract
    Author research impact was examined based on citer analysis (the number of citers as opposed to the number of citations) for 90 highly cited authors grouped into three broad subject areas. Citer-based outcome measures were also compared with more traditional citation-based measures for levels of association. The authors found that there are significant differences in citer-based outcomes among the three broad subject areas examined and that there is a high degree of correlation between citer and citation-based measures for all measures compared, except for two outcomes calculated for the social sciences. Citer-based measures do produce slightly different rankings of authors based on citer counts when compared to more traditional citation counts. Examples are provided. Citation measures may not adequately address the influence, or reach, of an author because citations usually do not address the origin of the citation beyond self-citations.
    Date
    28. 9.2010 12:54:22
  2. Castanha, R.C.G.; Wolfram, D.: The domain of knowledge organization : a bibliometric analysis of prolific authors and their intellectual space (2018) 0.02
    0.020160122 = product of:
      0.030240182 = sum of:
        0.013336393 = weight(_text_:on in 4150) [ClassicSimilarity], result of:
          0.013336393 = score(doc=4150,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.121501654 = fieldWeight in 4150, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4150)
        0.01690379 = product of:
          0.03380758 = sum of:
            0.03380758 = weight(_text_:22 in 4150) [ClassicSimilarity], result of:
              0.03380758 = score(doc=4150,freq=2.0), product of:
                0.1747608 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04990557 = queryNorm
                0.19345059 = fieldWeight in 4150, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4150)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The domain of knowledge organization (KO) represents a foundational area of information science. One way to better understand the intellectual structure of the KO domain is to apply bibliometric methods to key contributors to the literature. This study analyzes the most prolific contributing authors to the journal Knowledge Organization, the sources they cite and the citations they receive for the period 1993 to 2016. The analyses were conducted using visualization outcomes of citation, co-citation and author bibliographic coupling analysis to reveal theoretical points of reference among authors and the most prominent research themes that constitute this scientific community. Birger Hjørland was the most cited author, and was situated at or near the middle of each of the maps based on different citation relationships. The proximities between authors resulting from the different citation relationships demonstrate how authors situate themselves intellectually through the citations they give and how other authors situate them through the citations received. There is also a consistent core of theoretical references among the most productive authors. We observed a close network of scholarly communication between the authors cited in this core, which indicates the actual role of the journal Knowledge Organization as a space for knowledge construction in the area of knowledge organization.
    Source
    Knowledge organization. 45(2018) no.1, pp.13-22
  3. Wolfram, D.; Zhang, J.: The influence of indexing practices and weighting algorithms on document spaces (2008) 0.01
    0.011928434 = product of:
      0.0357853 = sum of:
        0.0357853 = weight(_text_:on in 1963) [ClassicSimilarity], result of:
          0.0357853 = score(doc=1963,freq=10.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.32602316 = fieldWeight in 1963, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1963)
      0.33333334 = coord(1/3)
    
    Abstract
    Index modeling and computer simulation techniques are used to examine the influence of indexing frequency distributions, indexing exhaustivity distributions, and three weighting methods on hypothetical document spaces in a vector-based information retrieval (IR) system. The way documents are indexed plays an important role in retrieval. The authors demonstrate the influence of different indexing characteristics on document space density (DSD) changes and document space discriminative capacity for IR. Document environments that contain a relatively higher percentage of infrequently occurring terms provide lower density outcomes than do environments where a higher percentage of frequently occurring terms exists. Different indexing exhaustivity levels, however, have little influence on the document space densities. A weighting algorithm that favors higher weights for infrequently occurring terms results in the lowest overall document space densities, which allows documents to be more readily differentiated from one another. This in turn can positively influence IR. The authors also discuss the influence on outcomes using two methods of normalization of term weights (i.e., means and ranges) for the different weighting methods.
  4. Olson, H.A.; Wolfram, D.: Syntagmatic relationships and indexing consistency on a larger scale (2008) 0.01
    0.011761595 = product of:
      0.035284784 = sum of:
        0.035284784 = weight(_text_:on in 2214) [ClassicSimilarity], result of:
          0.035284784 = score(doc=2214,freq=14.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.3214632 = fieldWeight in 2214, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2214)
      0.33333334 = coord(1/3)
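Results 3 and 4 show the effect of fieldNorm. In ClassicSimilarity, fieldNorm is the length normalization factor. Without index-time boosts it is roughly 1/sqrt(number of terms in the field), and it is stored with reduced precision. Document 2214 contains "on" 14 times, against 10 in document 1963. It still scores lower because its field is longer, so its fieldNorm is smaller.

The minimal sketch below recomputes both final scores. It assumes the same formulas as the reconstruction of result 1. The constants are copied from the two explain trees, and the function name is illustrative.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.04990557  # copied from the explain output
ON_DOC_FREQ = 13325      # docFreq of "on", copied from the explain output

def on_clause_score(freq: float, field_norm: float) -> float:
    # queryWeight * fieldWeight for the single term "on" (boost assumed 1)
    idf = 1.0 + math.log(MAX_DOCS / (ON_DOC_FREQ + 1))
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# (doc id, freq of "on", fieldNorm), copied from results 3 and 4 above
docs = [(1963, 10.0, 0.046875), (2214, 14.0, 0.0390625)]

# Only the "on" clause matched in either document, hence coord(1/3)
scores = {doc: on_clause_score(freq, norm) * (1 / 3) for doc, freq, norm in docs}

for doc, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(doc, f"{score:.9f}")
```

Because tf grows only with the square root of freq, the 40% higher count in document 2214 raises tf by about 18%. Its roughly 17% smaller fieldNorm almost cancels that gain, leaving document 2214 about 1.4% below document 1963. idf and queryNorm are identical for both documents.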
    
    Abstract
    Purpose - The purpose of this article is to examine interindexer consistency on a larger scale than other studies have done to determine if group consensus is reached by larger numbers of indexers and what, if any, relationships emerge between assigned terms. Design/methodology/approach - In total, 64 MLIS students were recruited to assign up to five terms to a document. The authors applied basic data modeling and the exploratory statistical techniques of multi-dimensional scaling (MDS) and hierarchical cluster analysis to determine whether relationships exist in indexing consistency and the cooccurrence of assigned terms. Findings - Consistency in the assignment of indexing terms to a document follows an inverse shape although, unlike many other social phenomena, it is not strictly based on a power law. The exploratory techniques revealed that groups of terms clustered together. The resulting term cooccurrence relationships were largely syntagmatic. Research limitations/implications - The results are based on the indexing of one article by non-expert indexers and are, thus, not generalizable. Based on the study findings, along with the growing popularity of folksonomies and the apparent authority of communally developed information resources, communally developed indexes based on group consensus may have merit. Originality/value - Consistency in the assignment of indexing terms has been studied primarily on a small scale. Few studies have examined indexing on a larger scale with more than a handful of indexers. Recognition of the differences in indexing assignment has implications for the development of public information systems, especially those that do not use a controlled vocabulary and those tagged by end-users. In such cases, multiple access points that accommodate the different ways that users interpret content are needed so that searchers may be guided to relevant content despite using different terminology.
  5. Wolfram, D.; Wang, P.; Zhang, J.: Identifying Web search session patterns using cluster analysis : a comparison of three search environments (2009) 0.01
    0.010669115 = product of:
      0.032007344 = sum of:
        0.032007344 = weight(_text_:on in 2796) [ClassicSimilarity], result of:
          0.032007344 = score(doc=2796,freq=8.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.29160398 = fieldWeight in 2796, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=2796)
      0.33333334 = coord(1/3)
    
    Abstract
    Session characteristics taken from large transaction logs of three Web search environments (academic Web site, public search engine, consumer health information portal) were modeled using cluster analysis to determine if coherent session groups emerged for each environment and whether the types of session groups are similar across the three environments. The analysis revealed three distinct clusters of session behaviors common to each environment: hit and run sessions on focused topics, relatively brief sessions on popular topics, and sustained sessions using obscure terms with greater query modification. The findings also revealed shifts in session characteristics over time for one of the datasets, away from hit and run sessions toward more popular search topics. A better understanding of session characteristics can help system designers to develop more responsive systems to support search features that cater to identifiable groups of searchers based on their search behaviors. For example, the system may identify struggling searchers based on session behaviors that match those identified in the current study to provide context sensitive help.
  6. Wittig, C.; Wolfram, D.: A survey of networking education in North American library schools (1994) 0.01
    0.010058938 = product of:
      0.030176813 = sum of:
        0.030176813 = weight(_text_:on in 750) [ClassicSimilarity], result of:
          0.030176813 = score(doc=750,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.27492687 = fieldWeight in 750, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=750)
      0.33333334 = coord(1/3)
    
    Abstract
    Reports results of a survey of US library schools to investigate the adoption, impact, and role of networking concepts and resources, such as the Internet, in the library and information science curriculum. Findings indicate that, to a large degree, educators have kept up with recent trends and tools in networking in a variety of courses. There was overwhelming consensus on the importance of networked information resources and access tools but less agreement on their places in the library and information science curriculum.
  7. Zhang, J.; Wolfram, D.; Wang, P.; Hong, Y.; Gillis, R.: Visualization of health-subject analysis based on query term co-occurrences (2008) 0.01
    0.009940362 = product of:
      0.029821085 = sum of:
        0.029821085 = weight(_text_:on in 2376) [ClassicSimilarity], result of:
          0.029821085 = score(doc=2376,freq=10.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.271686 = fieldWeight in 2376, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2376)
      0.33333334 = coord(1/3)
    
    Abstract
    A multidimensional-scaling approach is used to analyze frequently used medical-topic terms in queries submitted to a Web-based consumer health information system. Based on a year-long transaction log file, five medical focus keywords (stomach, hip, stroke, depression, and cholesterol) and their co-occurring query terms are analyzed. An overlap-coefficient similarity measure and a conversion measure are used to calculate the proximity of terms to one another based on their co-occurrences in queries. The impact of the dimensionality of the visual configuration, the cutoff point of term co-occurrence for inclusion in the analysis, and the Minkowski metric power k on the stress value are discussed. A visual clustering of groups of terms based on the proximity within each focus-keyword group is also conducted. Term distributions within each visual configuration are characterized and are compared with formal medical vocabulary. This investigation reveals that there are significant differences between consumer health query-term usage and more formal medical terminology used by medical professionals when describing the same medical subject. Future directions are discussed.
  8. Dimitroff, A.; Wolfram, D.: Searcher response in a hypertext-based bibliographic information retrieval system (1995) 0.01
    0.009015355 = product of:
      0.027046064 = sum of:
        0.027046064 = product of:
          0.054092128 = sum of:
            0.054092128 = weight(_text_:22 in 187) [ClassicSimilarity], result of:
              0.054092128 = score(doc=187,freq=2.0), product of:
                0.1747608 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04990557 = queryNorm
                0.30952093 = fieldWeight in 187, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=187)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Journal of the American Society for Information Science. 46(1995) no.1, pp.22-29
  9. Zhang, J.; Wolfram, D.; Wang, P.: Analysis of query keywords of sports-related queries using visualization and clustering (2009) 0.01
    0.008890929 = product of:
      0.026672786 = sum of:
        0.026672786 = weight(_text_:on in 2947) [ClassicSimilarity], result of:
          0.026672786 = score(doc=2947,freq=8.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.24300331 = fieldWeight in 2947, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2947)
      0.33333334 = coord(1/3)
    
    Abstract
    The authors investigated 11 sports-related query keywords extracted from a public search engine query log to better understand sports-related information seeking on the Internet. After the query log contents were cleaned and query data were parsed, popular sports-related keywords were identified, along with frequently co-occurring query terms associated with the identified keywords. Relationships among each sports-related focus keyword and its related keywords were characterized and grouped using multidimensional scaling (MDS) in combination with traditional hierarchical clustering methods. The two approaches were synthesized in a visual context by highlighting the results of the hierarchical clustering analysis in the visual MDS configuration. Important events, people, subjects, merchandise, and so on related to a sport were illustrated, and relationships among the sports were analyzed. A small-scale comparative study of sports searches with and without term assistance was conducted. Searches that used search term assistance by relying on previous query term relationships outperformed the searches without the search term assistance. The findings of this study provide insights into sports information seeking behavior on the Internet. The developed method also may be applied to other query log subject areas.
  10. Lu, K.; Wolfram, D.: Measuring author research relatedness : a comparison of word-based, topic-based, and author cocitation approaches (2012) 0.01
    0.008890929 = product of:
      0.026672786 = sum of:
        0.026672786 = weight(_text_:on in 453) [ClassicSimilarity], result of:
          0.026672786 = score(doc=453,freq=8.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.24300331 = fieldWeight in 453, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=453)
      0.33333334 = coord(1/3)
    
    Abstract
    Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on latent Dirichlet allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map.
  11. Wolfram, D.; Dimitroff, A.: Preliminary findings on searcher performance and perceptions of performance in a hypertext bibliographic retrieval system (1997) 0.01
    0.008801571 = product of:
      0.026404712 = sum of:
        0.026404712 = weight(_text_:on in 1857) [ClassicSimilarity], result of:
          0.026404712 = score(doc=1857,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.24056101 = fieldWeight in 1857, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1857)
      0.33333334 = coord(1/3)
    
    Abstract
    Reports on research examining the relationship between searcher performance and perception of performance, particularly for hypertext-based information retrieval systems for bibliographic data. Employs a prototype hypertext bibliographic retrieval system called HyperLynx. Evaluates its use by 83 subjects at the School of Library and Information Science and the Golda Meir Library at the University of Wisconsin-Milwaukee, USA. Measures of system usage indicate that there is no significant relationship between confidence and the number of record pages visited, although confident searchers searched for shorter time periods. The reality check measures show that both novice and experienced searchers were overconfident in their performance.
  12. Zhang, J.; Wolfram, D.: Visualization of term discrimination analysis (2001) 0.01
    0.0076997704 = product of:
      0.02309931 = sum of:
        0.02309931 = weight(_text_:on in 5210) [ClassicSimilarity], result of:
          0.02309931 = score(doc=5210,freq=6.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.21044704 = fieldWeight in 5210, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5210)
      0.33333334 = coord(1/3)
    
    Abstract
    Zhang and Wolfram compute the discrimination value for terms as the difference between the centroid value of all terms in the corpus and that value without the term in question, and suggest that selection be made by comparing density changes with a visualization tool. The Distance Angle Retrieval Environment (DARE) visually projects a document or term space by presenting distance similarity on the X axis and angular similarity on the Y axis. Thus a document icon appearing close to the X axis would be relevant to reference points in terms of a distance similarity measure, while those close to the Y axis are relevant to reference points in terms of an angle-based measure. Using 450 Associated Press news reports indexed by 44 distinct terms, the removal of the term "Yeltsin" causes the cluster to fall on the Y axis, indicating a good discriminator. For an angular measure such as the cosine, movement along the X axis to the left signals good discrimination, while movement to the right signals poor discrimination. A term density space could also be used. Most terms are shown to be indifferent discriminators. Different measures result in different choices of good and poor discriminators, as does the use of a term space rather than a document space. The visualization approach is clearly feasible, and provides some additional insights not found in the computation of a discrimination value.
  13. Wolfram, D.; Zhang, J.: An investigation of the influence of indexing exhaustivity and term distributions on a document space (2002) 0.01
    0.0076997704 = product of:
      0.02309931 = sum of:
        0.02309931 = weight(_text_:on in 5238) [ClassicSimilarity], result of:
          0.02309931 = score(doc=5238,freq=6.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.21044704 = fieldWeight in 5238, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5238)
      0.33333334 = coord(1/3)
    
    Abstract
    Wolfram and Zhang are interested in the effect of different indexing exhaustivity, by which they mean the number of terms chosen, and of different index term distributions and different term weighting methods on the resulting document cluster organization. The Distance Angle Retrieval Environment (DARE), which provides a two-dimensional display of retrieved documents, was used to represent the document clusters based upon a document's distance from the searcher's main interest, and on the angle formed by the document, a point representing a minor interest, and the point representing the main interest. If the centroid and the origin of the document space are assigned as the major and minor points, the average distance between documents and the centroid can be measured, providing an indication of cluster organization in the form of a size-normalized similarity measure. Using 500 records from NTIS and nine models created by intersecting low, observed, and high exhaustivity levels (based upon a negative binomial distribution) with shallow, observed, and steep term distributions (based upon a Zipf distribution), simulation runs were performed using inverse document frequency, inter-document term frequency, and inverse document frequency based upon both inter- and intra-document frequencies. Low exhaustivity and shallow distributions result in a denser document space and less effective retrieval. High exhaustivity and steeper distributions result in a more diffuse space.
  14. Zhang, J.; Chen, Y.; Zhao, Y.; Wolfram, D.; Ma, F.: Public health and social media : a study of Zika virus-related posts on Yahoo! Answers (2020) 0.01
    0.0076997704 = product of:
      0.02309931 = sum of:
        0.02309931 = weight(_text_:on in 5672) [ClassicSimilarity], result of:
          0.02309931 = score(doc=5672,freq=6.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.21044704 = fieldWeight in 5672, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5672)
      0.33333334 = coord(1/3)
    
    Abstract
    This study investigates the content of questions and responses about the Zika virus on Yahoo! Answers as a recent example of how public concerns regarding an international health issue are reflected in social media. We investigate the contents of posts about the Zika virus on Yahoo! Answers, identify and reveal subject patterns about the Zika virus, and analyze the temporal changes of the revealed subject topics over 4 defined periods of the Zika virus outbreak. Multidimensional scaling analysis, temporal analysis, and inferential statistical analysis approaches were used in the study. A resulting 2-layer Zika virus schema, and term connections and relationships are presented. The results indicate that consumers' concerns changed over the 4 defined periods. Consumers paid more attention to the basic information about the Zika virus, and the prevention and protection from the Zika virus at the beginning of the outbreak of the Zika virus. During the later periods, consumers became more interested in the role that the government and health organizations played in the public health emergency.
  15. Ross, N.C.M.; Wolfram, D.: End user searching on the Internet : an analysis of term pair topics submitted to the Excite search engine (2000) 0.01
    0.0075442037 = product of:
      0.02263261 = sum of:
        0.02263261 = weight(_text_:on in 4998) [ClassicSimilarity], result of:
          0.02263261 = score(doc=4998,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.20619515 = fieldWeight in 4998, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=4998)
      0.33333334 = coord(1/3)
    
    Abstract
    Queries submitted to the Excite search engine were analyzed for subject content based on the cooccurrence of terms within multiterm queries. More than 1000 of the most frequently cooccurring term pairs were categorized into one or more of 30 developed subject areas. Subject area frequencies and their cooccurrences with one another were tallied and analyzed using hierarchical cluster analysis and multidimensional scaling. The cluster analyses revealed several anticipated and a few unanticipated groupings of subjects, resulting in several well-defined high-level clusters of broad subject areas. Multidimensional scaling of subject cooccurrences revealed similar relationships among the different subject categories. Applications that arise from a better understanding of the topics users search and their relationships are discussed.
  16. Wolfram, D.; Olson, H.A.; Bloom, R.: Measuring consistency for multiple taggers using vector space modeling (2009) 0.01
    0.0075442037 = product of:
      0.02263261 = sum of:
        0.02263261 = weight(_text_:on in 3113) [ClassicSimilarity], result of:
          0.02263261 = score(doc=3113,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.20619515 = fieldWeight in 3113, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=3113)
      0.33333334 = coord(1/3)
    
    Abstract
    A longstanding area of study in indexing is the identification of factors affecting vocabulary usage and consistency. This topic has seen a recent resurgence with a focus on social tagging. Tagging data for scholarly articles made available by the social bookmarking Website CiteULike (www.citeulike.org) were used to test inter-indexer/tagger consistency density values, based on a method developed by the authors, by comparing calculations for highly tagged documents representing three subject areas (Science, Social Science, Social Software). The analysis revealed that the developed method is viable for a large dataset. The findings also indicated that there were no significant differences in tagging consistency among the three topic areas, demonstrating that vocabulary usage in a relatively new subject area like social software is no more inconsistent than the more established subject areas investigated. The implications of the method used and the findings are discussed.
  17. Lu, K.; Cai, X.; Ajiferuke, I.; Wolfram, D.: Vocabulary size and its effect on topic representation (2017) 0.01
    0.0075442037 = product of:
      0.02263261 = sum of:
        0.02263261 = weight(_text_:on in 3414) [ClassicSimilarity], result of:
          0.02263261 = score(doc=3414,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.20619515 = fieldWeight in 3414, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=3414)
      0.33333334 = coord(1/3)
    
    Abstract
    This study investigates how computational overhead for topic model training may be reduced by selectively removing terms from the vocabulary of the text corpora being modeled. We compare the impact of removing singly occurring terms; the top 0.5%, 1%, and 5% most frequently occurring terms; and both the top 0.5% most frequent terms and the singly occurring terms. We also vary the number of topics modeled (10, 20, 30, 40, 50, 100) across three datasets. Four outcome measures are compared. The removal of singly occurring terms has little impact on outcomes for all of the measures tested. Document discriminative capacity, as measured by the document space density, is reduced by the removal of frequently occurring terms but increases with higher numbers of topics. Vocabulary size does not greatly influence entropy, but entropy is affected by the number of topics. Finally, topic similarity, as measured by pairwise topic similarity and Jensen-Shannon divergence, decreases with the removal of frequent terms. The findings have implications for information science research in information retrieval and informetrics that makes use of topic modeling.
  18. Xie, H.I.; Wolfram, D.: State digital library usability: contributing organizational factors (2002) 0.01
    0.0062868367 = product of:
      0.01886051 = sum of:
        0.01886051 = weight(_text_:on in 5221) [ClassicSimilarity], result of:
          0.01886051 = score(doc=5221,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.1718293 = fieldWeight in 5221, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5221)
      0.33333334 = coord(1/3)
    
    Abstract
    In this issue Xie and Wolfram study the Wisconsin state digital library BadgerLink to determine the organizational factors that lead to different use requirements, the degree to which these requirements are met, and the impact on physical libraries. To this end, usage data from EBSCOhost and ProQuest logs for BadgerLink were analyzed. In addition, 313 Wisconsin libraries of all types were surveyed (76% response rate), and their responses were analyzed along with 81 responses to a voluntary web survey of end users. The heaviest users were K-12 schools and institutions of higher education. The heaviest use sites were the two largest state universities and the state's largest public library. Small libraries were infrequent users. Web survey respondents were mature working professionals. Sixty percent searched for specific information, but 46% reported browsing in subject areas. Libraries with dedicated Internet access reported more frequent usage than those with a dial-up connection. Those who accessed the service from libraries reported more frequent use than those at work or at home. Libraries that trained end users reported more use, but the majority of the web survey respondents reported themselves as self-taught. The logs confirm the reported subject interests. Three surrogates were requested for every full-text document, yet full-text availability is reported as the reason for use by 30% of users. Availability has led to the cancellation of subscriptions in many libraries that are important promoters of the service. A model will need to include interactions based on the influence of each participant on the others. It will also need to include the extension of one participant's activities to other participant organizations and the communication among these organizations.
  19. Wang, F.; Wolfram, D.: Assessment of journal similarity based on citing discipline analysis (2015) 0.01
    0.0062868367 = product of:
      0.01886051 = sum of:
        0.01886051 = weight(_text_:on in 1849) [ClassicSimilarity], result of:
          0.01886051 = score(doc=1849,freq=4.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.1718293 = fieldWeight in 1849, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1849)
      0.33333334 = coord(1/3)
    
    Abstract
    This study compares the range of disciplines of citing journal articles to determine how closely related the journals assigned to the same Web of Science research area are. The frequency distribution of disciplines among citing articles provides a signature for a cited journal, which permits it to be compared with other journals using similarity comparison techniques. As an initial exploration, citing discipline data for 40 high-impact-factor journals assigned to the "information science and library science" category of the Web of Science were compared across 5 time periods. Similarity relationships were determined using multidimensional scaling and hierarchical cluster analysis to compare the outcomes produced by the proposed citing discipline method and the established cocitation method. The maps and clustering outcomes reveal that a number of journals in allied areas of the information science and library science category may not be very closely related to each other or may not be appropriately situated in the category studied. The citing discipline similarity data produced outcomes similar to those of the cocitation data, with some notable differences. Because the citing discipline method relies on a citing perspective different from cocitations, it may provide a complementary, less labor-intensive way than cocitation analysis to compare journal similarity.
  20. Wolfram, D.; Volz, A.; Dimitroff, A.: ¬The effect of linkage structure on retrieval performance in a hypertext-based bibliographic retrieval system (1996) 0.01
    0.00622365 = product of:
      0.01867095 = sum of:
        0.01867095 = weight(_text_:on in 6622) [ClassicSimilarity], result of:
          0.01867095 = score(doc=6622,freq=2.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.17010231 = fieldWeight in 6622, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6622)
      0.33333334 = coord(1/3)