Search (80 results, page 1 of 4)

  • language_ss:"e"
  • theme_ss:"Internet"
  • year_i:[2010 TO 2020}
  1. Zimmer, M.; Proferes, N.J.: A topology of Twitter research : disciplines, methods, and ethics (2014) 0.04
    0.04214679 = product of:
      0.08429358 = sum of:
        0.06843241 = weight(_text_:data in 1622) [ClassicSimilarity], result of:
          0.06843241 = score(doc=1622,freq=14.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.46216056 = fieldWeight in 1622, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1622)
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 1622) [ClassicSimilarity], result of:
              0.03172234 = score(doc=1622,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 1622, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1622)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - The purpose of this paper is to engage in a systematic analysis of academic research that relies on the collection and use of Twitter data, creating a topology of Twitter research that details the disciplines and methods of analysis, the number of tweets and users under analysis, the methods used to collect Twitter data, and accounts of ethical considerations related to these projects. Design/methodology/approach - Content analysis of 382 academic publications from 2006 to 2012 that used Twitter as their primary platform for data collection and analysis. Findings - The analysis of over 380 scholarly publications utilizing Twitter data reveals noteworthy trends related to the growth of Twitter-based research overall, the disciplines engaged in such research, the methods of acquiring Twitter data for analysis, and emerging ethical considerations of such research. Research limitations/implications - The findings provide a benchmark analysis that must be updated with the continued growth of Twitter-based research. Originality/value - The research is the first full-text systematic analysis of Twitter-based research projects, focussing on the growth in disciplines and methods as well as its ethical implications. It is of value for the broader research community currently engaged in social media-based research, and will prompt reflexive evaluation of what research is occurring, how it is occurring, what is being done with Twitter data, and how researchers are addressing the ethics of Twitter-based research.
    Date
    20. 1.2015 18:30:22
    Series
    Special Issue: Twitter data analytics
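  The scores attached to each record are Lucene ClassicSimilarity explain trees (products of per-term weights and coordination factors). As a sanity check, the figures for record 1 can be reproduced with a few lines of Python; the constants are copied from the explain tree above, and the helper name is ours, not Lucene's.

      import math

      # Constants copied from the explain tree of record 1 (doc 1622).
      QUERY_NORM = 0.046827413
      FIELD_NORM = 0.0390625            # fieldNorm(doc=1622)

      def term_weight(freq, idf):
          tf = math.sqrt(freq)                  # ClassicSimilarity: tf = sqrt(termFreq)
          query_weight = idf * QUERY_NORM       # queryWeight = idf * queryNorm
          field_weight = tf * idf * FIELD_NORM  # fieldWeight = tf * idf * fieldNorm
          return query_weight * field_weight

      w_data = term_weight(14.0, 3.1620505)     # -> 0.06843241, matching the tree
      w_22 = term_weight(2.0, 3.5018296) * 0.5  # inner coord(1/2) -> 0.01586117
      score = (w_data + w_22) * 0.5             # outer coord(2/4) -> 0.04214679
      print(f"{score:.8f}")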
  2. Yang, S.; Han, R.; Ding, J.; Song, Y.: The distribution of Web citations (2012) 0.04
    0.03959743 = product of:
      0.07919486 = sum of:
        0.053759433 = weight(_text_:data in 2735) [ClassicSimilarity], result of:
          0.053759433 = score(doc=2735,freq=6.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.3630661 = fieldWeight in 2735, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2735)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 2735) [ClassicSimilarity], result of:
              0.05087085 = score(doc=2735,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 2735, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2735)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    A substantial amount of research has focused on the persistence or availability of Web citations. The present study analyzes Web citation distributions. Web citations are defined as the mentions of the URLs of Web pages (Web resources) as references in academic papers. The present paper primarily focuses on the analysis of the URLs of Web citations and uses three sets of data, namely, Set 1 from the Humanities and Social Science Index in China (CSSCI, 1998-2009), Set 2 from the publications of two international computer science societies, Communications of the ACM and IEEE Computer (1995-1999), and Set 3 from the medical science database, MEDLINE, of the National Library of Medicine (1994-2006). Web citation distributions are investigated based on Web site types, Web page types, URL frequencies, URL depths, URL lengths, and year of article publication. Results show significant differences in the Web citation distributions among the three data sets. However, when the URLs of Web citations with the same hostnames are aggregated, the distributions in the three data sets are consistent with the power law (the Lotka function).
    Source
    Information processing and management. 48(2012) no.4, S.779-790
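  The Lotka function mentioned in the abstract is a power law of the form n(x) = C / x^a. As an illustrative sketch (the counts below are hypothetical, not the authors' data), the exponent can be estimated by a linear fit in log-log space:

      import numpy as np

      # Hypothetical Lotka-style data: n_hosts[i] = number of hostnames
      # that received exactly x[i] Web citations.
      x = np.array([1, 2, 3, 4, 5, 10, 20])
      n_hosts = np.array([5120, 1290, 560, 330, 200, 52, 13])

      # n(x) = C / x**a  =>  log n = log C - a * log x, a straight line.
      slope, intercept = np.polyfit(np.log(x), np.log(n_hosts), 1)
      print(f"a = {-slope:.2f}, C = {np.exp(intercept):.0f}")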
  3. Hwang, S.-Y.; Yang, W.-S.; Ting, K.-D.: Automatic index construction for multimedia digital libraries (2010) 0.03
    0.03466491 = product of:
      0.06932982 = sum of:
        0.043894395 = weight(_text_:data in 4228) [ClassicSimilarity], result of:
          0.043894395 = score(doc=4228,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.29644224 = fieldWeight in 4228, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4228)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 4228) [ClassicSimilarity], result of:
              0.05087085 = score(doc=4228,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 4228, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4228)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Indexing remains one of the most popular tools provided by digital libraries to help users identify and understand the characteristics of the information they need. Despite extensive studies of the problem of automatic index construction for text-based digital libraries, index construction for multimedia digital libraries continues to represent a challenge, because multimedia objects usually lack sufficient text information to ensure reliable index learning. This research attempts to tackle the problem of automatic index construction for multimedia objects by employing Web usage logs and limited keywords pertaining to multimedia objects. Tests of the two proposed algorithms use two data sets with different amounts of textual information. Web usage logs offer valuable information for building indexes of multimedia digital libraries with limited textual information. The proposed methods generally yield better indexes, especially for the artwork data set.
    Source
    Information processing and management. 46(2010) no.3, S.295-307
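  The general idea of usage-log-based index construction can be sketched in a few lines of Python; this is a simplified stand-in for the paper's algorithms, and the log format is hypothetical. Terms from the queries that led users to a multimedia object are aggregated into index terms for that object.

      from collections import Counter, defaultdict

      # Hypothetical usage log: (search terms entered, multimedia object viewed).
      usage_log = [
          ("van gogh sunflowers", "artwork_017"),
          ("sunflowers painting", "artwork_017"),
          ("starry night painting", "artwork_003"),
      ]

      index_terms = defaultdict(Counter)
      for query, object_id in usage_log:
          index_terms[object_id].update(query.split())

      # The most frequent usage-derived terms become index entries per object.
      for object_id, counts in sorted(index_terms.items()):
          print(object_id, [term for term, _ in counts.most_common(3)])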
  4. Barrio, P.; Gravano, L.: Sampling strategies for information extraction over the deep web (2017) 0.03
    0.029316615 = product of:
      0.05863323 = sum of:
        0.029262928 = weight(_text_:data in 3412) [ClassicSimilarity], result of:
          0.029262928 = score(doc=3412,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.19762816 = fieldWeight in 3412, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=3412)
        0.029370302 = product of:
          0.058740605 = sum of:
            0.058740605 = weight(_text_:processing in 3412) [ClassicSimilarity], result of:
              0.058740605 = score(doc=3412,freq=6.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.30987173 = fieldWeight in 3412, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3412)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Information extraction systems discover structured information in natural language text. Having information in structured form enables much richer querying and data mining than possible over the natural language text. However, information extraction is a computationally expensive task, and hence improving the efficiency of the extraction process over large text collections is of critical interest. In this paper, we focus on an especially valuable family of text collections, namely, the so-called deep-web text collections, whose contents are not crawlable and are only available via querying. Important steps for efficient information extraction over deep-web text collections (e.g., selecting the collections on which to focus the extraction effort, based on their contents; or learning which documents within these collections-and in which order-to process, based on their words and phrases) require having a representative document sample from each collection. These document samples have to be collected by querying the deep-web text collections, an expensive process that renders impractical the existing sampling approaches developed for other data scenarios. In this paper, we systematically study the space of query-based document sampling techniques for information extraction over the deep web. Specifically, we consider (i) alternative query execution schedules, which vary on how they account for the query effectiveness, and (ii) alternative document retrieval and processing schedules, which vary on how they distribute the extraction effort over documents. We report the results of the first large-scale experimental evaluation of sampling techniques for information extraction over the deep web. Our results show the merits and limitations of the alternative query execution and document retrieval and processing strategies, and provide a roadmap for addressing this critically important building block for efficient, scalable information extraction.
    Source
    Information processing and management. 53(2017) no.2, S.309-331
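  A rough sketch of query-based document sampling over a deep-web collection follows; the search callable and document fields are hypothetical, and this naive word-picking policy merely stands in for the far more careful query and processing schedules the paper compares.

      import random

      def sample_collection(search, seed_queries, target_size):
          # search(q) is a hypothetical callable that queries the collection's
          # search interface and returns documents as {"id": ..., "text": ...}.
          sample, seen, queue = [], set(), list(seed_queries)
          while queue and len(sample) < target_size:
              for doc in search(queue.pop(0)):
                  if doc["id"] not in seen:
                      seen.add(doc["id"])
                      sample.append(doc)
                      # Derive a follow-up query from a sampled document, one
                      # naive stand-in for the scheduling policies studied.
                      queue.append(random.choice(doc["text"].split()))
          return sample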
  5. Oguz, F.; Koehler, W.: URL decay at year 20 : a research note (2016) 0.03
    0.029208332 = product of:
      0.058416665 = sum of:
        0.036211025 = weight(_text_:data in 2651) [ClassicSimilarity], result of:
          0.036211025 = score(doc=2651,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24455236 = fieldWeight in 2651, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2651)
        0.022205638 = product of:
          0.044411276 = sum of:
            0.044411276 = weight(_text_:22 in 2651) [ClassicSimilarity], result of:
              0.044411276 = score(doc=2651,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.2708308 = fieldWeight in 2651, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2651)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    All text is ephemeral. Some texts are more ephemeral than others. The web has proved to be among the most ephemeral and changeable of information vehicles. This research note revisits Koehler's original data set roughly 20 years after it was first collected. By late 2013, the number of URLs responding to a query had fallen to 1.6% of the original sample. A query of the 6 remaining URLs in February 2015 showed only 2 still responding.
    Date
    22. 1.2016 14:37:14
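  A study of this kind boils down to re-requesting old URLs and counting the responses. A minimal sketch using the Python requests library (not the authors' instrumentation):

      import requests

      def still_responding(urls, timeout=10):
          """Return the subset of previously collected URLs that still answer
          with an HTTP status below 400 (everything else counts as decayed)."""
          alive = []
          for url in urls:
              try:
                  r = requests.head(url, timeout=timeout, allow_redirects=True)
                  if r.status_code < 400:
                      alive.append(url)
              except requests.RequestException:
                  pass  # DNS failure, timeout, refused connection: decayed
          return alive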
  6. Borlund, P.; Dreier, S.: An investigation of the search behaviour associated with Ingwersen's three types of information needs (2014) 0.03
    0.028236724 = product of:
      0.05647345 = sum of:
        0.031038022 = weight(_text_:data in 2691) [ClassicSimilarity], result of:
          0.031038022 = score(doc=2691,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 2691, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2691)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 2691) [ClassicSimilarity], result of:
              0.05087085 = score(doc=2691,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 2691, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2691)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    We report a naturalistic interactive information retrieval (IIR) study of 18 ordinary users aged 20-25 who carry out everyday-life information seeking (ELIS) on the Internet with respect to the three types of information needs identified by Ingwersen (1986): the verificative information need (VIN), the conscious topical information need (CIN), and the muddled topical information need (MIN). The searches took place in the private homes of the users in order to ensure searching that was as realistic as possible. Ingwersen (1996) associates a given search behaviour with each of the three types of information needs; these associations are analytically deduced, but had not yet been empirically tested. Thus the objective of the study is to investigate whether empirical data does, or does not, conform to the predictions derived from the three types of information needs. The main conclusion is that the analytically deduced information search behaviour characteristics by Ingwersen are positively corroborated for this group of test participants, who search the Internet as part of ELIS.
    Source
    Information processing and management. 50(2014) no.4, S.493-507
  7. Huvila, I.: Mining qualitative data on human information behaviour from the Web (2010) 0.03
    0.025605064 = product of:
      0.102420256 = sum of:
        0.102420256 = weight(_text_:data in 4676) [ClassicSimilarity], result of:
          0.102420256 = score(doc=4676,freq=16.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.69169855 = fieldWeight in 4676, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4676)
      0.25 = coord(1/4)
    
    Abstract
    This paper discusses an approach to collecting qualitative data on human information behaviour that is based on mining web data using search engines. The approach is technically the same as that used for some time in webometric research to make statistical inferences from web data, but the present paper shows how the same tools and data-collection methods can be used to gather data for qualitative analysis of human information behaviour.
    Theme
    Data Mining
  8. Arbelaitz, O.; Martínez-Otzeta. J.M.; Muguerza, J.: User modeling in a social network for cognitively disabled people (2016) 0.03
    0.025035713 = product of:
      0.050071426 = sum of:
        0.031038022 = weight(_text_:data in 2639) [ClassicSimilarity], result of:
          0.031038022 = score(doc=2639,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 2639, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2639)
        0.019033402 = product of:
          0.038066804 = sum of:
            0.038066804 = weight(_text_:22 in 2639) [ClassicSimilarity], result of:
              0.038066804 = score(doc=2639,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.23214069 = fieldWeight in 2639, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2639)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Online communities are becoming an important tool in the communication and participation processes in our society. However, the most widespread applications are difficult to use for people with disabilities, or may involve some risks if no previous training has been undertaken. This work describes a novel social network for cognitively disabled people along with a clustering-based method for modeling the activity and socialization processes of its users in a noninvasive way. This closed social network, called Guremintza, is designed specifically for people with cognitive disabilities and provides the network administrators (e.g., social workers) with two types of reports: summary statistics of network usage and behavior patterns discovered by a data mining process. Experiments made in an initial stage of the network show that the discovered patterns are meaningful to the social workers, who find them useful in monitoring the progress of the users.
    Date
    22. 1.2016 12:02:26
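  As a sketch of the clustering-based user modeling described above (the feature choices and values are hypothetical, and scikit-learn's k-means stands in for whatever algorithm the authors actually used):

      import numpy as np
      from sklearn.cluster import KMeans

      # Hypothetical per-user activity features from the network's logs:
      # [logins per week, messages sent, messages received, distinct contacts]
      X = np.array([
          [1, 0, 2, 1],
          [7, 15, 12, 6],
          [2, 1, 1, 1],
          [6, 20, 18, 9],
      ])

      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
      print(labels)  # cluster id per user, e.g. low- vs. high-activity profiles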
  9. Hannemann, J.; Kett, J.: Linked data for libraries (2010) 0.02
    0.021947198 = product of:
      0.08778879 = sum of:
        0.08778879 = weight(_text_:data in 3964) [ClassicSimilarity], result of:
          0.08778879 = score(doc=3964,freq=16.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.5928845 = fieldWeight in 3964, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3964)
      0.25 = coord(1/4)
    
    Abstract
    The Semantic Web in general and the Linking Open Data initiative in particular encourage institutions to publish, share and interlink their data. This has considerable potential for libraries, which can complement their data by linking it to other, external data sources. This paper details the first linked open data service of the German National Library. The focus is on the challenges met during the inception of this service. Extrapolating from our experiences, the paper further discusses the German National Library's perspective on the future of library data exchange and the potential for the creation of globally interlinked library data. We outline how this process can be facilitated and how new services can be offered based on these growing metadata collections.
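  A minimal sketch of what "linked data for libraries" looks like in practice, using Python's rdflib; the URIs and vocabulary choices are illustrative, not the German National Library's actual data model.

      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import RDF

      DC = Namespace("http://purl.org/dc/elements/1.1/")
      g = Graph()
      book = URIRef("http://example.org/resource/book/123")
      g.add((book, RDF.type, URIRef("http://purl.org/ontology/bibo/Book")))
      g.add((book, DC.title, Literal("Linked data for libraries")))
      g.add((book, DC.creator, URIRef("http://example.org/authority/person/42")))
      print(g.serialize(format="turtle"))  # external URIs make the record linkable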
  10. Dufour, C.; Bartlett, J.C.; Toms, E.G.: Understanding how webcasts are used as sources of information (2011) 0.02
    0.020863095 = product of:
      0.04172619 = sum of:
        0.02586502 = weight(_text_:data in 4195) [ClassicSimilarity], result of:
          0.02586502 = score(doc=4195,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 4195, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4195)
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 4195) [ClassicSimilarity], result of:
              0.03172234 = score(doc=4195,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 4195, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4195)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Webcasting systems were developed to provide remote access in real-time to live events. Today, these systems have an additional requirement: to accommodate the "second life" of webcasts as archival information objects. Research to date has focused on facilitating the production and storage of webcasts as well as the development of more interactive and collaborative multimedia tools to support the event, but research has not examined how people interact with a webcasting system to access and use the contents of those archived events. Using an experimental design, this study examined how 16 typical users interact with a webcasting system to respond to a set of information tasks: selecting a webcast, searching for specific information, and extracting the gist of a webcast. Using several data sources that included user actions, user perceptions, and user explanations of their actions and decisions, the study also examined the strategies employed to complete the tasks. The results revealed distinctive system-use patterns for each task and provided insights into the types of tools needed to make webcasting systems better suited for also using the webcasts as information objects.
    Date
    22. 1.2011 14:16:14
  11. Bhatia, S.; Biyani, P.; Mitra, P.: Identifying the role of individual user messages in an online discussion and its use in thread retrieval (2016) 0.02
    0.020863095 = product of:
      0.04172619 = sum of:
        0.02586502 = weight(_text_:data in 2650) [ClassicSimilarity], result of:
          0.02586502 = score(doc=2650,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 2650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2650)
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 2650) [ClassicSimilarity], result of:
              0.03172234 = score(doc=2650,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 2650, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2650)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Online discussion forums have become a popular medium for users to discuss issues with, and seek information from, other users having similar interests. A typical discussion thread consists of a sequence of posts from multiple users. Each post in a thread serves a different purpose, providing different types of information, and thus may not be equally useful for all applications. Identifying the purpose and nature of each post in a discussion thread is thus an interesting research problem as it can help in improving information extraction and intelligent assistance techniques. We study the problem of classifying a given post as per its purpose in the discussion thread and employ features based on the post's content, structure of the thread, behavior of the participating users, and sentiment analysis of the post's content. We evaluate our approach on two forum data sets belonging to different genres and achieve strong classification performance. We also analyze the relative importance of different features used for the post classification task. Next, as a use case, we describe how the post class information can help in thread retrieval by incorporating this information in a state-of-the-art thread retrieval model.
    Date
    22. 1.2016 11:50:46
  12. Bizer, C.; Mendes, P.N.; Jentzsch, A.: Topology of the Web of Data (2012) 0.02
    0.018651532 = product of:
      0.07460613 = sum of:
        0.07460613 = weight(_text_:data in 425) [ClassicSimilarity], result of:
          0.07460613 = score(doc=425,freq=26.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.50385493 = fieldWeight in 425, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=425)
      0.25 = coord(1/4)
    
    Abstract
    The degree of structure of Web content is the determining factor for the types of functionality that search engines can provide. The more well structured the Web content is, the easier it is for search engines to understand Web content and provide advanced functionality, such as faceted filtering or the aggregation of content from multiple Web sites, based on this understanding. Today, most Web sites are generated from structured data that is stored in relational databases. Thus, it does not require too much extra effort for Web sites to publish this structured data directly on the Web in addition to HTML pages, and thus help search engines to understand Web content and provide improved functionality. An early approach to realize this idea and help search engines understand Web content is Microformats, a technique for marking up structured data about specific types of entities - such as tags, blog posts, people, or reviews - within HTML pages. As Microformats are focused on a few entity types, the World Wide Web Consortium (W3C) started in 2004 to standardize RDFa as an alternative, more generic language for embedding any type of data into HTML pages. Today, major search engines such as Google, Yahoo, and Bing extract Microformat and RDFa data describing products, reviews, persons, events, and recipes from Web pages and use the extracted data to improve the user's search experience. The search engines have started to aggregate structured data from different Web sites and augment their search results with these aggregated information units in the form of rich snippets which combine, for instance, data from multiple sources. This chapter gives an overview of the topology of the Web of Data that has been created by publishing data on the Web using the Microformats, RDFa, Microdata and Linked Data publishing techniques.
    Series
    Data-centric systems and applications
  13. Griesbaum, J.; Mahrholz, N.; Kiedrowski, K. von Löwe; Rittberger, M.: Knowledge generation in online forums : a case study in the German educational domain (2015) 0.02
    0.016690476 = product of:
      0.03338095 = sum of:
        0.020692015 = weight(_text_:data in 4440) [ClassicSimilarity], result of:
          0.020692015 = score(doc=4440,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.1397442 = fieldWeight in 4440, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=4440)
        0.012688936 = product of:
          0.025377871 = sum of:
            0.025377871 = weight(_text_:22 in 4440) [ClassicSimilarity], result of:
              0.025377871 = score(doc=4440,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.15476047 = fieldWeight in 4440, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4440)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - The purpose of this paper is to get a first approximation of the usefulness of online forums with regard to information seeking and knowledge generation. Design/methodology/approach - This study captures the characteristics of knowledge generation by examining the pragmatics and types of information needs of posted questions and by investigating knowledge-related characteristics of discussion posts as well as the success of communication. Three online forums were examined. The data set consists of 55 threads containing 533 posts, which were categorized manually by two researchers. Findings - Results show that questioners often ask for personal estimations. Information needs often aim for actionable insights or uncertainty reduction. With regard to answers, factual information is the dominant content type and has the highest knowledge value, as it is the strongest predictor with regard to the generation of new knowledge. Opinions are also relevant, but in a rather subsequent and complementary way. Emotional aspects are scarcely observed. Overall, results indicate that knowledge creation predominantly follows a socio-cultural paradigm of knowledge exchange. Research limitations/implications - Although the investigation captures important aspects of knowledge building processes, the measurement of the forums' knowledge value is still rather limited. Success is only partly measurable with the current scheme. The central coding category "new topical knowledge" is only of nominal value and therefore does not allow comparison of different kinds of knowledge gains in the course of a discussion. Originality/value - The investigation goes beyond studies that do not consider that the role and relevance of posts depend on the state of the discussion. Furthermore, the paper integrates two perspectives of knowledge value: the success of the questioner with regard to the expressed information need and the knowledge building value for communicants and readers.
    Date
    20. 1.2015 18:30:22
  14. Danowski, P.: Step one: blow up the silo! : Open bibliographic data, the first step towards Linked Open Data (2010) 0.02
    0.015519011 = product of:
      0.062076043 = sum of:
        0.062076043 = weight(_text_:data in 3962) [ClassicSimilarity], result of:
          0.062076043 = score(doc=3962,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.4192326 = fieldWeight in 3962, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3962)
      0.25 = coord(1/4)
    
    Abstract
    More and more libraries are starting semantic web projects. The question of the license of the data is often not discussed, or the discussion is deferred to the end of the project. This paper discusses why the question of the license is so important in the context of the semantic web that it should be one of the first aspects addressed in a semantic web project. It also shows why a public domain waiver is the only solution that fulfills the special requirements of the semantic web and guarantees the reusability of semantic library data, and with it the sustainability of such projects.
  15. Huang, C.; Fu, T.; Chen, H.: Text-based video content classification for online video-sharing sites (2010) 0.01
    0.014458986 = product of:
      0.057835944 = sum of:
        0.057835944 = weight(_text_:data in 3452) [ClassicSimilarity], result of:
          0.057835944 = score(doc=3452,freq=10.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.39059696 = fieldWeight in 3452, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3452)
      0.25 = coord(1/4)
    
    Abstract
    With the emergence of Web 2.0, sharing personal content, communicating ideas, and interacting with other online users in Web 2.0 communities have become daily routines for online users. User-generated data from Web 2.0 sites provide rich personal information (e.g., personal preferences and interests) and can be utilized to obtain insight about cyber communities and their social networks. Many studies have focused on leveraging user-generated information to analyze blogs and forums, but few studies have applied this approach to video-sharing Web sites. In this study, we propose a text-based framework for video content classification of online-video sharing Web sites. Different types of user-generated data (e.g., titles, descriptions, and comments) were used as proxies for online videos, and three types of text features (lexical, syntactic, and content-specific features) were extracted. Three feature-based classification techniques (C4.5, Naïve Bayes, and Support Vector Machine) were used to classify videos. To evaluate the proposed framework, user-generated data from candidate videos, which were identified by searching user-given keywords on YouTube, were first collected. Then, a subset of the collected data was randomly selected and manually tagged by users as our experiment data. The experimental results showed that the proposed approach was able to classify online videos based on users' interests with accuracy rates up to 87.2%, and all three types of text features contributed to discriminating videos. Support Vector Machine outperformed C4.5 and Naïve Bayes techniques in our experiments. In addition, our case study further demonstrated that accurate video-classification results are very useful for identifying implicit cyber communities on video-sharing Web sites.
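  The classification pipeline in the abstract can be sketched with scikit-learn. The toy texts and labels below are invented, TF-IDF stands in for the paper's richer lexical, syntactic, and content-specific features, and a linear SVM matches the best-performing classifier family reported.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # Hypothetical proxies for videos: concatenated titles, descriptions,
      # and comments, each with a manually assigned category label.
      texts = [
          "guitar cover acoustic song lyrics",
          "football highlights league goals",
          "unboxing new phone camera review",
          "match preview team lineup injuries",
      ]
      labels = ["music", "sports", "tech", "sports"]

      clf = make_pipeline(TfidfVectorizer(), LinearSVC())
      clf.fit(texts, labels)
      print(clf.predict(["full match goals highlights"]))  # expected: ['sports']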
  16. Oliveira Machado, L.M.; Souza, R.R.; Simões, M. da Graça: Semantic web or web of data? : a diachronic study (1999 to 2017) of the publications of Tim Berners-Lee and the World Wide Web Consortium (2019) 0.01
    0.014458986 = product of:
      0.057835944 = sum of:
        0.057835944 = weight(_text_:data in 5300) [ClassicSimilarity], result of:
          0.057835944 = score(doc=5300,freq=10.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.39059696 = fieldWeight in 5300, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5300)
      0.25 = coord(1/4)
    
    Abstract
    The web has been, in recent decades, the place where information retrieval achieved its maximum importance, given its ubiquity and the sheer volume of information. However, its exponential growth has made the retrieval task increasingly hard, with its effectiveness relying on idiosyncratic and somewhat biased ranking algorithms. To deal with this problem, a "new" web, called the Semantic Web (SW), was proposed, bringing along concepts like "Web of Data" and "Linked Data," although the definitions of, and connections among, these concepts are often unclear. Based on a qualitative approach built over a literature review, a definition of the SW is presented, and the related concepts sometimes used as synonyms are discussed. The study concludes that the SW is a comprehensive and ambitious construct that includes the great purpose of making the web a global database. It also encompasses the specifications developed for and/or associated with its operationalization and the procedures necessary for connecting data in an open format on the web. The goals of this comprehensive SW are the union of two outcomes that remain only tenuously connected: the virtually unlimited possibility of connections between data - the web domain - and the potential for automated inference by "intelligent" systems - the semantic component.
  17. Stuart, D.: Web metrics for library and information professionals (2014) 0.01
    0.013579135 = product of:
      0.05431654 = sum of:
        0.05431654 = weight(_text_:data in 2274) [ClassicSimilarity], result of:
          0.05431654 = score(doc=2274,freq=18.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.36682853 = fieldWeight in 2274, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2274)
      0.25 = coord(1/4)
    
    Abstract
    This is a practical guide to using web metrics to measure impact and demonstrate value. The web provides an opportunity to collect a host of different metrics, from those associated with social media accounts and websites to more traditional research outputs. This book is a clear guide for library and information professionals as to what web metrics are available and how to assess and use them to make informed decisions and demonstrate value. As individuals and organizations increasingly use the web in addition to traditional publishing avenues and formats, this book provides the tools to unlock web metrics and evaluate the impact of this content. The key topics covered include: bibliometrics, webometrics and web metrics; data collection tools; evaluating impact on the web; evaluating social media impact; investigating relationships between actors; exploring traditional publications in a new environment; web metrics and the web of data; the future of web metrics and the library and information professional. The book will provide a practical introduction to web metrics for a wide range of library and information professionals, from the bibliometrician wanting to demonstrate the wider impact of a researcher's work than can be demonstrated through traditional citations databases, to the reference librarian wanting to measure how successfully they are engaging with their users on Twitter. It will be a valuable tool for anyone who wants to not only understand the impact of content, but demonstrate this impact to others within the organization and beyond.
    Content
    1. Introduction. Metrics -- Indicators -- Web metrics and Ranganathan's laws of library science -- Web metrics for the library and information professional -- The aim of this book -- The structure of the rest of this book -- 2. Bibliometrics, webometrics and web metrics. Web metrics -- Information science metrics -- Web analytics -- Relational and evaluative metrics -- Evaluative web metrics -- Relational web metrics -- Validating the results -- 3. Data collection tools. The anatomy of a URL, web links and the structure of the web -- Search engines 1.0 -- Web crawlers -- Search engines 2.0 -- Post search engine 2.0: fragmentation -- 4. Evaluating impact on the web. Websites -- Blogs -- Wikis -- Internal metrics -- External metrics -- A systematic approach to content analysis -- 5. Evaluating social media impact. Aspects of social network sites -- Typology of social network sites -- Research and tools for specific sites and services -- Other social network sites -- URL shorteners: web analytic links on any site -- General social media impact -- Sentiment analysis -- 6. Investigating relationships between actors. Social network analysis methods -- Sources for relational network analysis -- 7. Exploring traditional publications in a new environment. More bibliographic items -- Full text analysis -- Greater context -- 8. Web metrics and the web of data. The web of data -- Building the semantic web -- Implications of the web of data for web metrics -- Investigating the web of data today -- SPARQL -- Sindice -- LDSpider: an RDF web crawler -- 9. The future of web metrics and the library and information professional. How far we have come -- The future of web metrics -- The future of the library and information professional and web metrics.
    LCSH
    Data mining
    Subject
    Data mining
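  The book's web-of-data chapters cover SPARQL; a small query against a public endpoint can be issued with the Python SPARQLWrapper library. DBpedia is used here purely as an illustrative endpoint; adjust the query to the data set of interest.

      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("https://dbpedia.org/sparql")
      sparql.setQuery("""
          SELECT ?label WHERE {
            ?book a <http://dbpedia.org/ontology/Book> ;
                  <http://www.w3.org/2000/01/rdf-schema#label> ?label .
            FILTER (lang(?label) = "en")
          } LIMIT 5
      """)
      sparql.setReturnFormat(JSON)
      for row in sparql.query().convert()["results"]["bindings"]:
          print(row["label"]["value"])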
  18. Elsweiler, D.; Harvey, M.: Engaging and maintaining a sense of being informed : understanding the tasks motivating twitter search (2015) 0.01
    0.01293251 = product of:
      0.05173004 = sum of:
        0.05173004 = weight(_text_:data in 1635) [ClassicSimilarity], result of:
          0.05173004 = score(doc=1635,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34936053 = fieldWeight in 1635, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1635)
      0.25 = coord(1/4)
    
    Abstract
    Micro-blogging services such as Twitter represent constantly evolving, user-generated sources of information. Previous studies show that users search such content regularly but are often dissatisfied with current search facilities. We argue that an enhanced understanding of the motivations for search would aid the design of improved search systems, better reflecting what people need. Building on previous research, we present qualitative analyses of two sources of data regarding how and why people search Twitter. The first, a diary study (p?=?68), provides descriptions of Twitter information needs (n?=?117) and important meta-data from active study participants. The second data set was established by collecting first-person descriptions of search behavior (n?=?388) tweeted by twitter users themselves (p?=?381) and complements the first data set by providing similar descriptions from a more plentiful source. The results of our analyses reveal numerous characteristics of Twitter search that differentiate it from more commonly studied search domains, such as web search. The findings also shed light on some of the difficulties users encounter. By highlighting examples that go beyond those previously published, this article adds to the understanding of how and why people search such content. Based on these new insights, we conclude with a discussion of possible design implications for search systems that index micro-blogging content.
  19. Wang, C.; Zhao, S.; Kalra, A.; Borcea, C.; Chen, Y.: Predictive models and analysis for webpage depth-level dwell time (2018) 0.01
    0.01293251 = product of:
      0.05173004 = sum of:
        0.05173004 = weight(_text_:data in 4370) [ClassicSimilarity], result of:
          0.05173004 = score(doc=4370,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34936053 = fieldWeight in 4370, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4370)
      0.25 = coord(1/4)
    
    Abstract
    Half of online display ads are not rendered viewable because the users do not scroll deep enough or spend sufficient time at the page depth where the ads are placed. In order to increase the marketing efficiency and ad effectiveness, there is a strong demand for viewability prediction from both advertisers and publishers. This paper aims to predict the dwell time for a given (user, page, depth) triplet based on historic data collected by publishers. This problem is difficult because of user behavior variability and data sparsity. To solve it, we propose predictive models based on Factorization Machines and Field-aware Factorization Machines in order to overcome the data sparsity issue and provide flexibility to add auxiliary information such as the visible area of a user's browser. In addition, we leverage the prior dwell time behavior of the user within the current page view, that is, time series information, to further improve the proposed models. Experimental results using data from a large web publisher demonstrate that the proposed models outperform comparison models. Also, the results show that adding time series information further improves the performance.
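  For reference, a Factorization Machine scores a feature vector as y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and the pairwise term can be computed in linear time. A small numpy sketch with invented toy parameters (not the paper's trained model):

      import numpy as np

      def fm_predict(x, w0, w, V):
          # Pairwise interactions via Rendle's identity:
          # sum_{i<j} <v_i,v_j> x_i x_j = 0.5 * sum_f ((x@V)_f**2 - (x**2 @ V**2)_f)
          linear = w0 + x @ w
          pairwise = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
          return linear + pairwise

      rng = np.random.default_rng(0)
      # Toy instance: one-hot user and page indicators plus a scaled viewport height.
      x = np.array([1.0, 0.0, 1.0, 0.0, 0.3])
      w0, w = 0.1, rng.normal(scale=0.01, size=5)
      V = rng.normal(scale=0.01, size=(5, 2))   # k=2 latent factors per feature
      print(fm_predict(x, w0, w, V))            # predicted dwell time (arbitrary units)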
  20. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.01
    0.012802532 = product of:
      0.051210128 = sum of:
        0.051210128 = weight(_text_:data in 2338) [ClassicSimilarity], result of:
          0.051210128 = score(doc=2338,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34584928 = fieldWeight in 2338, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.25 = coord(1/4)
    
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.
    Theme
    Data Mining

Types

  • a 74
  • el 5
  • m 3