Search (8 results, page 1 of 1)

  • author_ss:"Wolfram, D."
  • year_i:[2000 TO 2010}
  1. Ajiferuke, I.; Wolfram, D.: Analysis of Web page image tag distribution characteristics (2005) 0.01
    0.012998324 = product of:
      0.025996648 = sum of:
        0.025996648 = product of:
          0.10398659 = sum of:
            0.10398659 = weight(_text_:authors in 1059) [ClassicSimilarity], result of:
              0.10398659 = score(doc=1059,freq=4.0), product of:
                0.24330677 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05337063 = queryNorm
                0.42738882 = fieldWeight in 1059, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1059)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
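The indented tree under each result is Lucene's explain output for ClassicSimilarity (TF-IDF) scoring. As a minimal sketch, assuming tf = sqrt(freq) and the idf form that reproduces the printed values, idf = 1 + ln(maxDocs / (docFreq + 1)), the Python below rebuilds the score of result 1 (doc 1059) from the numbers in its tree. queryNorm and fieldNorm are copied from the output rather than recomputed: queryNorm depends on every clause of the query, which is not shown here, and fieldNorm is stored as a lossy one-byte encoding of the field-length norm. The helper names are illustrative and are not Lucene's API.

```python
import math
from typing import Sequence

def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw in-document count."""
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency in the form that matches this explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def explain_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float,
                  coords: Sequence[float]) -> float:
    """Rebuild a single-term explain tree (hypothetical helper).

    coords are the coord(overlap/maxOverlap) factors applied on the way up.
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # "queryWeight" node
    field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight" node
    score = query_weight * field_weight                 # "weight(_text_:...)" node
    for c in coords:
        score *= c
    return score

# Inputs copied from the explain tree of doc 1059 (term "authors", freq=4).
score_1059 = explain_score(
    freq=4.0, doc_freq=1258, max_docs=44218,
    query_norm=0.05337063, field_norm=0.046875,
    coords=[1 / 4, 1 / 2],
)
print(f"{score_1059:.9f}")
```

The result agrees with the top-level 0.012998324 up to float32 rounding in Lucene; the listing shows it rounded to 0.01.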
    
    Abstract
    The authors investigate the frequency distribution of the use of image tags in Web pages. Using data sampled from top level Web pages across five top level domains and from sample pages within individual websites, the authors model observed patterns in the frequency of image tag usage by fitting collected data distributions to different theoretical models used in informetrics. Models tested include the modified power law (MPL), Mandelbrot (MDB), generalized Waring (GW), generalized inverse Gaussian-Poisson (GIGP), and generalized negative binomial (GNB) distributions. The GIGP provided the best fit for data sets for top level pages across the top level domains tested. The poor fits of the models to the observed data distributions from specific websites were due to the multimodal nature of the observed data sets. Mixtures of the tested models for the data sets provided better fits. The ability to effectively model Web page attributes, such as the distribution of the number of image tags used per page, is needed for accurate simulation models of Web page content, and makes it possible to estimate the number of requests needed to display the complete content of Web pages.
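The abstract describes fitting observed frequency distributions to candidate informetric models but does not name the goodness-of-fit statistic. As an illustrative sketch only, assuming Pearson's chi-square as that statistic, the Python below tabulates how many pages carry exactly n image tags and scores a candidate model supplied as a probability mass function. The page sample and the uniform `toy_model` are hypothetical stand-ins; none of the MPL, MDB, GW, GIGP, or GNB distributions is implemented here.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

def frequency_distribution(tags_per_page: List[int]) -> Dict[int, int]:
    """Map n -> number of pages that contain exactly n image tags."""
    return dict(sorted(Counter(tags_per_page).items()))

def chi_square(observed: Dict[int, int], pmf: Callable[[int], float]) -> float:
    """Pearson chi-square of observed counts against a candidate model's pmf.

    Values of n that were never observed are pooled into one tail cell
    whose observed count is 0.
    """
    total = sum(observed.values())
    stat, covered = 0.0, 0.0
    for n, obs in observed.items():
        p = pmf(n)
        if p <= 0.0:
            return math.inf  # the model rules out a value that was observed
        covered += p
        expected = total * p
        stat += (obs - expected) ** 2 / expected
    tail = total * (1.0 - covered)  # expected count for all unobserved n
    if tail > 1e-12:
        stat += tail  # equals (0 - tail) ** 2 / tail
    return stat

def toy_model(n: int) -> float:
    """Hypothetical stand-in model: uniform over 0..3 tags per page."""
    return 0.25 if 0 <= n <= 3 else 0.0

pages = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]  # hypothetical tags-per-page sample
observed = frequency_distribution(pages)
print(observed, chi_square(observed, toy_model))
```

Lower values indicate a closer fit. A mixture of models, as mentioned in the abstract, could be passed the same way as a weighted sum of component pmfs.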
  2. Wolfram, D.; Zhang, J.: The influence of indexing practices and weighting algorithms on document spaces (2008) 0.01
    0.012998324 = product of:
      0.025996648 = sum of:
        0.025996648 = product of:
          0.10398659 = sum of:
            0.10398659 = weight(_text_:authors in 1963) [ClassicSimilarity], result of:
              0.10398659 = score(doc=1963,freq=4.0), product of:
                0.24330677 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05337063 = queryNorm
                0.42738882 = fieldWeight in 1963, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1963)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Index modeling and computer simulation techniques are used to examine the influence of indexing frequency distributions, indexing exhaustivity distributions, and three weighting methods on hypothetical document spaces in a vector-based information retrieval (IR) system. The way documents are indexed plays an important role in retrieval. The authors demonstrate the influence of different indexing characteristics on document space density (DSD) changes and document space discriminative capacity for IR. Document environments that contain a relatively higher percentage of infrequently occurring terms provide lower density outcomes than do environments where a higher percentage of frequently occurring terms exists. Different indexing exhaustivity levels, however, have little influence on the document space densities. A weighting algorithm that favors higher weights for infrequently occurring terms results in the lowest overall document space densities, which allows documents to be more readily differentiated from one another. This in turn can positively influence IR. The authors also discuss how two methods of normalizing term weights (i.e., means and ranges) influence the outcomes of the different weighting methods.
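The abstract does not give a formula for document space density. A minimal sketch, assuming the common centroid-based definition (the mean cosine similarity between each document vector and the centroid of the space) and a plain log(N/df) idf as an example of a weighting that favors infrequently occurring terms, shows the direction of the reported effect on a hypothetical three-document space. This is not the authors' simulation; the vectors and helper names are invented for illustration.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(docs: List[Vector]) -> Vector:
    """Component-wise mean of the document vectors."""
    c: Vector = {}
    for d in docs:
        for t, w in d.items():
            c[t] = c.get(t, 0.0) + w / len(docs)
    return c

def space_density(docs: List[Vector]) -> float:
    """Mean document-to-centroid cosine; higher means a denser space."""
    c = centroid(docs)
    return sum(cosine(d, c) for d in docs) / len(docs)

def idf_weight(docs: List[Vector]) -> List[Vector]:
    """Multiply each weight by log(N / df), boosting rare terms."""
    df: Dict[str, int] = {}
    for d in docs:
        for t in d:
            df[t] = df.get(t, 0) + 1
    n = len(docs)
    return [{t: w * math.log(n / df[t]) for t, w in d.items()} for d in docs]

# Hypothetical space: one term shared by all documents, one unique term each.
toy_docs = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "c": 1.0}, {"a": 1.0, "d": 1.0}]
print(space_density(toy_docs), space_density(idf_weight(toy_docs)))
```

In this toy space the idf-weighted density comes out lower than the raw-frequency density, consistent with the abstract's statement that favoring infrequent terms spreads documents apart.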
  3. Xie, H.I.; Wolfram, D.: State digital library usability contributing organizational factors (2002) 0.01
    0.009392901 = product of:
      0.018785803 = sum of:
        0.018785803 = product of:
          0.037571605 = sum of:
            0.037571605 = weight(_text_:k in 5221) [ClassicSimilarity], result of:
              0.037571605 = score(doc=5221,freq=2.0), product of:
                0.1905213 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.05337063 = queryNorm
                0.19720423 = fieldWeight in 5221, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5221)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
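Results 3 and 4 match on the term "k" rather than "authors", and two factors differ. "k" occurs in 3384 of the 44218 documents (versus 1258 for "authors"), so its idf is smaller; on the other hand, its inner clause is scaled by coord(1/2) instead of coord(1/4). A sketch under the same assumptions as for result 1 (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm and fieldNorm copied rather than recomputed) compares the two idf values and rebuilds the score of doc 5221:

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), the form matching this output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

MAX_DOCS = 44218
QUERY_NORM = 0.05337063  # copied from the explain output, not recomputed

idf_authors = classic_idf(1258, MAX_DOCS)
idf_k = classic_idf(3384, MAX_DOCS)

# Doc 5221: freq=2, fieldNorm=0.0390625, coord(1/2) at both levels.
query_weight = idf_k * QUERY_NORM
field_weight = math.sqrt(2.0) * idf_k * 0.0390625
score_5221 = query_weight * field_weight * (1 / 2) * (1 / 2)

print(f"idf(authors)={idf_authors:.6f} idf(k)={idf_k:.6f} score={score_5221:.9f}")
```

The larger coord factor is why results 3 and 4 (0.00939) rank above results 5 to 8, even though "k" has the lower idf.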
    
    Abstract
    In this issue Xie and Wolfram study the Wisconsin state digital library BadgerLink to determine the organizational factors that lead to different use requirements, the degree to which these requirements are met, and BadgerLink's impact on physical libraries. To this end, usage data from EBSCOhost and ProQuest logs for BadgerLink were analyzed, 313 Wisconsin libraries of all types were surveyed (76% response rate), and the library survey results were analyzed along with 81 responses to a voluntary web survey of end users. Heaviest users were K-12 schools and institutions of higher education. Heaviest use sites were the two largest state universities and the state's largest public library. Small libraries were infrequent users. Web survey respondents were mature working professionals. Sixty percent searched for specific information, but 46% reported browsing in subject areas. Libraries with dedicated Internet access reported more frequent usage than those with dial-up connections. Those who accessed BadgerLink from libraries reported more frequent use than those at work or at home. Libraries that trained end users reported more use, but the majority of the web survey respondents reported themselves as self-taught. Logs confirm reported subject interests. Three surrogates were requested for every full-text document, but full-text availability is reported as the reason for use by 30% of users. Availability has led to the cancellation of subscriptions in many libraries that are important promoters of the service. A model will need to include interactions based upon the influence of each involved participant on the others. It will also need to include the extension of the activities of one participant to other participant organizations and the communication among these organizations.
  4. Zhang, J.; Wolfram, D.; Wang, P.; Hong, Y.; Gillis, R.: Visualization of health-subject analysis based on query term co-occurrences (2008) 0.01
    0.009392901 = product of:
      0.018785803 = sum of:
        0.018785803 = product of:
          0.037571605 = sum of:
            0.037571605 = weight(_text_:k in 2376) [ClassicSimilarity], result of:
              0.037571605 = score(doc=2376,freq=2.0), product of:
                0.1905213 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.05337063 = queryNorm
                0.19720423 = fieldWeight in 2376, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2376)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A multidimensional-scaling approach is used to analyze frequently used medical-topic terms in queries submitted to a Web-based consumer health information system. Based on a year-long transaction log file, five medical focus keywords (stomach, hip, stroke, depression, and cholesterol) and their co-occurring query terms are analyzed. An overlap-coefficient similarity measure and a conversion measure are used to calculate the proximity of terms to one another based on their co-occurrences in queries. The impacts of the dimensionality of the visual configuration, the cutoff point of term co-occurrence for inclusion in the analysis, and the Minkowski metric power k on the stress value are discussed. A visual clustering of groups of terms based on the proximity within each focus-keyword group is also conducted. Term distributions within each visual configuration are characterized and are compared with formal medical vocabulary. This investigation reveals that there are significant differences between consumer health query-term usage and more formal medical terminology used by medical professionals when describing the same medical subject. Future directions are discussed.
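Two of the measures named in this abstract have standard definitions that can be sketched directly: the overlap coefficient |A ∩ B| / min(|A|, |B|), applied here to the sets of queries that contain each term, and the Minkowski distance with power k used when laying out the MDS configuration. The conversion measure that turns similarities into proximities is not specified in the abstract, so it is not attempted. The query log and helper names below are hypothetical.

```python
from typing import Dict, Sequence, Set

def term_query_sets(queries: Sequence[str]) -> Dict[str, Set[int]]:
    """Map each lower-cased term to the IDs of the queries that contain it."""
    index: Dict[str, Set[int]] = {}
    for qid, query in enumerate(queries):
        for term in query.lower().split():
            index.setdefault(term, set()).add(qid)
    return index

def overlap_coefficient(a: Set[int], b: Set[int]) -> float:
    """|A intersect B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def minkowski(x: Sequence[float], y: Sequence[float], k: float) -> float:
    """Minkowski distance with power k (k=1 city-block, k=2 Euclidean)."""
    return sum(abs(xi - yi) ** k for xi, yi in zip(x, y)) ** (1.0 / k)

log = ["stomach pain", "stomach flu", "hip pain", "stomach pain relief"]  # hypothetical
idx = term_query_sets(log)
print(overlap_coefficient(idx["stomach"], idx["pain"]))  # share of co-occurring queries
print(minkowski((0, 0), (3, 4), 2), minkowski((0, 0), (3, 4), 1))
```

Here "stomach" and "pain" each appear in three queries and co-occur in two of them, giving an overlap of 2/3.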
  5. Wolfram, D.; Olson, H.A.; Bloom, R.: Measuring consistency for multiple taggers using vector space modeling (2009) 0.01
    0.009191203 = product of:
      0.018382406 = sum of:
        0.018382406 = product of:
          0.07352962 = sum of:
            0.07352962 = weight(_text_:authors in 3113) [ClassicSimilarity], result of:
              0.07352962 = score(doc=3113,freq=2.0), product of:
                0.24330677 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05337063 = queryNorm
                0.30220953 = fieldWeight in 3113, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3113)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    A longstanding area of study in indexing is the identification of factors affecting vocabulary usage and consistency. This topic has seen a recent resurgence with a focus on social tagging. Tagging data for scholarly articles made available by the social bookmarking Website CiteULike (www.citeulike.org) were used to test inter-indexer/tagger consistency density values, based on a method developed by the authors, by comparing calculations for highly tagged documents representing three subject areas (Science, Social Science, Social Software). The analysis revealed that the developed method is viable for a large dataset. The findings also indicated that there were no significant differences in tagging consistency among the three topic areas, demonstrating that vocabulary usage in a relatively new subject area like social software is no more inconsistent than the more established subject areas investigated. The implications of the method used and the findings are discussed.
  6. Wolfram, D.; Xie, H.I.: Traditional IR for web users : a context for general audience digital libraries (2002) 0.01
    0.007659336 = product of:
      0.015318672 = sum of:
        0.015318672 = product of:
          0.06127469 = sum of:
            0.06127469 = weight(_text_:authors in 2589) [ClassicSimilarity], result of:
              0.06127469 = score(doc=2589,freq=2.0), product of:
                0.24330677 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05337063 = queryNorm
                0.25184128 = fieldWeight in 2589, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2589)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    The emergence of general audience digital libraries (GADLs) defines a context that represents a hybrid of both "traditional" IR, using primarily bibliographic resources provided by database vendors, and "popular" IR, exemplified by public search systems available on the World Wide Web. Findings of a study investigating end-user searching and response to a GADL are reported. Data collected from a Web-based end-user survey and data logs of resource usage for a Web-based GADL were analyzed for user characteristics, patterns of access and use, and user feedback. Cross-tabulations using respondent demographics revealed several key differences in how the system was used and valued by users of different age groups. Older users valued the service more than younger users and engaged in different searching and viewing behaviors. The GADL more closely resembles traditional retrieval systems in terms of content and purpose of use, but is more similar to popular IR systems in terms of user behavior and accessibility. A model that defines the dual context of the GADL environment is derived from the data analysis and existing IR models in general and other specific contexts. The authors demonstrate the distinguishing characteristics of this IR context, and discuss implications for the development and evaluation of future GADLs to accommodate a variety of user needs and expectations.
  7. Olson, H.A.; Wolfram, D.: Syntagmatic relationships and indexing consistency on a larger scale (2008) 0.01
    0.007659336 = product of:
      0.015318672 = sum of:
        0.015318672 = product of:
          0.06127469 = sum of:
            0.06127469 = weight(_text_:authors in 2214) [ClassicSimilarity], result of:
              0.06127469 = score(doc=2214,freq=2.0), product of:
                0.24330677 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05337063 = queryNorm
                0.25184128 = fieldWeight in 2214, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2214)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this article is to examine interindexer consistency on a larger scale than other studies have done to determine if group consensus is reached by larger numbers of indexers and what, if any, relationships emerge between assigned terms. Design/methodology/approach - In total, 64 MLIS students were recruited to assign up to five terms to a document. The authors applied basic data modeling and the exploratory statistical techniques of multi-dimensional scaling (MDS) and hierarchical cluster analysis to determine whether relationships exist in indexing consistency and the co-occurrence of assigned terms. Findings - Consistency in the assignment of indexing terms to a document follows an inverse shape, although, unlike many other social phenomena, it is not strictly power-law based. The exploratory techniques revealed that groups of terms clustered together. The resulting term co-occurrence relationships were largely syntagmatic. Research limitations/implications - The results are based on the indexing of one article by non-expert indexers and are, thus, not generalizable. Based on the study findings, along with the growing popularity of folksonomies and the apparent authority of communally developed information resources, communally developed indexes based on group consensus may have merit. Originality/value - Consistency in the assignment of indexing terms has been studied primarily on a small scale. Few studies have examined indexing on a larger scale with more than a handful of indexers. Recognition of the differences in indexing assignment has implications for the development of public information systems, especially those that do not use a controlled vocabulary and those tagged by end-users. In such cases, multiple access points that accommodate the different ways that users interpret content are needed so that searchers may be guided to relevant content despite using different terminology.
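The study's MDS and cluster analysis start from how often each term was assigned and which terms were assigned together. As a hedged sketch, assuming each indexer's assignment is recorded as a set of up to five terms, the Python below ranks terms by the number of indexers who assigned them (the distribution whose inverse shape the abstract describes) and counts how often each unordered term pair was assigned by the same indexer. The sample assignments and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

def term_assignment_counts(assignments: List[Set[str]]) -> List[Tuple[str, int]]:
    """Rank terms by how many indexers assigned them (most frequent first)."""
    counts = Counter(term for terms in assignments for term in terms)
    return counts.most_common()

def cooccurrence_counts(assignments: List[Set[str]]) -> Counter:
    """Count how often each unordered term pair was assigned by one indexer."""
    pairs: Counter = Counter()
    for terms in assignments:
        for a, b in combinations(sorted(terms), 2):  # sorted -> canonical pair order
            pairs[(a, b)] += 1
    return pairs

# Hypothetical assignments from three indexers.
assignments = [
    {"indexing", "consistency"},
    {"indexing", "folksonomies"},
    {"indexing", "consistency", "tagging"},
]
print(term_assignment_counts(assignments))
print(cooccurrence_counts(assignments).most_common(1))
```

The pair counts can then be turned into a similarity or distance matrix for MDS or hierarchical clustering; how the authors did that conversion is not stated in the abstract.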
  8. Zhang, J.; Wolfram, D.; Wang, P.: Analysis of query keywords of sports-related queries using visualization and clustering (2009) 0.01
    0.007659336 = product of:
      0.015318672 = sum of:
        0.015318672 = product of:
          0.06127469 = sum of:
            0.06127469 = weight(_text_:authors in 2947) [ClassicSimilarity], result of:
              0.06127469 = score(doc=2947,freq=2.0), product of:
                0.24330677 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05337063 = queryNorm
                0.25184128 = fieldWeight in 2947, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2947)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    The authors investigated 11 sports-related query keywords extracted from a public search engine query log to better understand sports-related information seeking on the Internet. After the query log contents were cleaned and query data were parsed, popular sports-related keywords were identified, along with frequently co-occurring query terms associated with the identified keywords. Relationships among each sports-related focus keyword and its related keywords were characterized and grouped using multidimensional scaling (MDS) in combination with traditional hierarchical clustering methods. The two approaches were synthesized in a visual context by highlighting the results of the hierarchical clustering analysis in the visual MDS configuration. Important events, people, subjects, merchandise, and so on related to a sport were illustrated, and relationships among the sports were analyzed. A small-scale comparative study of sports searches with and without term assistance was conducted. Searches that used search term assistance by relying on previous query term relationships outperformed the searches without the search term assistance. The findings of this study provide insights into sports information seeking behavior on the Internet. The developed method also may be applied to other query log subject areas.
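The abstract does not say which hierarchical clustering linkage was used. A minimal sketch, assuming single linkage on a precomputed symmetric distance matrix (for example, distances derived from keyword co-occurrence or read off an MDS configuration), repeatedly merges the two clusters whose closest members are nearest until the requested number of clusters remains. The keywords and coordinates are hypothetical, and the MDS step itself is omitted.

```python
from typing import List, Sequence

def agglomerate(dist: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Single-linkage agglomerative clustering on a symmetric distance matrix.

    Returns the clusters as sorted lists of item indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None  # (linkage distance, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

labels = ["football", "nfl", "golf", "pga"]  # hypothetical focus/related keywords
positions = [0.0, 1.0, 10.0, 11.0]           # hypothetical 1-D coordinates
dist = [[abs(p - q) for q in positions] for p in positions]
for cluster in agglomerate(dist, 2):
    print([labels[i] for i in cluster])
```

Highlighting these clusters on the MDS plot, as the abstract describes, would then amount to coloring each point by its cluster index.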