Search (156 results, page 1 of 8)

  • language_ss:"e"
  • theme_ss:"Internet"
  • year_i:[2010 TO 2020}
  1. Fu, T.; Abbasi, A.; Chen, H.: A focused crawler for Dark Web forums (2010) 0.01
    0.009314393 = product of:
      0.043467164 = sum of:
        0.02688897 = weight(_text_:system in 3471) [ClassicSimilarity], result of:
          0.02688897 = score(doc=3471,freq=8.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.3479797 = fieldWeight in 3471, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3471)
        0.004176737 = weight(_text_:information in 3471) [ClassicSimilarity], result of:
          0.004176737 = score(doc=3471,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.09697737 = fieldWeight in 3471, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3471)
        0.012401459 = weight(_text_:retrieval in 3471) [ClassicSimilarity], result of:
          0.012401459 = score(doc=3471,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.16710453 = fieldWeight in 3471, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3471)
      0.21428572 = coord(3/14)
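    The score breakdowns shown for each hit follow Lucene's ClassicSimilarity (TF-IDF). Read bottom-up, the tree above instantiates the following per-document score, written here in LaTeX as a reading aid (the symbol names mirror the labels in the tree):

      % score(q,d) as displayed in the explain tree, summed over matching terms t:
      \[
        \mathrm{score}(q,d) = \mathrm{coord}(q,d)\sum_{t \in q}
          \underbrace{\mathrm{idf}(t)\,\mathrm{queryNorm}}_{\text{queryWeight}}
          \cdot
          \underbrace{\sqrt{\mathrm{tf}(t,d)}\;\mathrm{idf}(t)\,\mathrm{fieldNorm}(d)}_{\text{fieldWeight}}
      \]
      % with tf shown as sqrt(freq), e.g. 2.828427 = sqrt(8), and
      % idf(t) = 1 + ln(maxDocs / (docFreq + 1)), e.g. 3.1495528 = 1 + ln(44218/5153).

    Here coord(3/14) scales the sum because 3 of the query's 14 clauses matched this document; the same structure repeats in every breakdown below.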
    
    Abstract
    The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities.
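    A minimal sketch of the incremental-update idea with a recall-improvement re-queue (not the authors' system: the recency-based revisit priority, the truncation test, and the fetch/parse_posts placeholders are all assumptions for illustration):

      import heapq
      import time

      def change_score(page):
          # Assumed recency heuristic: pages that changed recently are revisited first.
          return 1.0 / (1.0 + time.time() - page.get("last_changed", 0.0))

      def incremental_pass(pages, fetch, parse_posts, budget):
          # One incremental-update pass over already-discovered forum pages.
          frontier = [(-change_score(p), p["url"], p) for p in pages]
          heapq.heapify(frontier)
          while frontier and budget > 0:
              _, url, page = heapq.heappop(frontier)
              posts = parse_posts(fetch(url))
              budget -= 1
              if len(posts) < page.get("expected_posts", 0) and page.get("retries", 0) < 2:
                  # Recall-improvement step (assumed): a fetch returning fewer posts
                  # than previously seen looks truncated, so re-queue it at top
                  # priority instead of accepting the loss.
                  page["retries"] = page.get("retries", 0) + 1
                  heapq.heappush(frontier, (float("-inf"), url, page))
                  continue
              if posts != page.get("posts"):
                  page["last_changed"] = time.time()
              page["posts"] = posts
              page["expected_posts"] = max(len(posts), page.get("expected_posts", 0))
              page["retries"] = 0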
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1213-1231
  2. Dufour, C.; Bartlett, J.C.; Toms, E.G.: Understanding how webcasts are used as sources of information (2011) 0.01
    0.008963037 = product of:
      0.041827507 = sum of:
        0.02328653 = weight(_text_:system in 4195) [ClassicSimilarity], result of:
          0.02328653 = score(doc=4195,freq=6.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.30135927 = fieldWeight in 4195, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4195)
        0.010230875 = weight(_text_:information in 4195) [ClassicSimilarity], result of:
          0.010230875 = score(doc=4195,freq=12.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.23754507 = fieldWeight in 4195, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4195)
        0.008310104 = product of:
          0.016620208 = sum of:
            0.016620208 = weight(_text_:22 in 4195) [ClassicSimilarity], result of:
              0.016620208 = score(doc=4195,freq=2.0), product of:
                0.085914485 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02453417 = queryNorm
                0.19345059 = fieldWeight in 4195, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4195)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Webcasting systems were developed to provide remote access in real-time to live events. Today, these systems have an additional requirement: to accommodate the "second life" of webcasts as archival information objects. Research to date has focused on facilitating the production and storage of webcasts as well as the development of more interactive and collaborative multimedia tools to support the event, but research has not examined how people interact with a webcasting system to access and use the contents of those archived events. Using an experimental design, this study examined how 16 typical users interact with a webcasting system to respond to a set of information tasks: selecting a webcast, searching for specific information, and making a gist of a webcast. Using several data sources that included user actions, user perceptions, and user explanations of their actions and decisions, the study also examined the strategies employed to complete the tasks. The results revealed distinctive system-use patterns for each task and provided insights into the types of tools needed to make webcasting systems better suited for also using the webcasts as information objects.
    Date
    22. 1.2011 14:16:14
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.2, S.343-362
  3. Bhatia, S.; Biyani, P.; Mitra, P.: Identifying the role of individual user messages in an online discussion and its use in thread retrieval (2016) 0.01
    0.008575914 = product of:
      0.040020935 = sum of:
        0.010230875 = weight(_text_:information in 2650) [ClassicSimilarity], result of:
          0.010230875 = score(doc=2650,freq=12.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.23754507 = fieldWeight in 2650, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2650)
        0.021479957 = weight(_text_:retrieval in 2650) [ClassicSimilarity], result of:
          0.021479957 = score(doc=2650,freq=6.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.28943354 = fieldWeight in 2650, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2650)
        0.008310104 = product of:
          0.016620208 = sum of:
            0.016620208 = weight(_text_:22 in 2650) [ClassicSimilarity], result of:
              0.016620208 = score(doc=2650,freq=2.0), product of:
                0.085914485 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02453417 = queryNorm
                0.19345059 = fieldWeight in 2650, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2650)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Online discussion forums have become a popular medium for users to discuss with, and seek information from, other users with similar interests. A typical discussion thread consists of a sequence of posts posted by multiple users. Each post in a thread serves a different purpose, providing different types of information, and thus may not be equally useful for all applications. Identifying the purpose and nature of each post in a discussion thread is therefore an interesting research problem, as it can help in improving information extraction and intelligent assistance techniques. We study the problem of classifying a given post as per its purpose in the discussion thread and employ features based on the post's content, the structure of the thread, the behavior of the participating users, and sentiment analysis of the post's content. We evaluate our approach on two forum data sets belonging to different genres and achieve strong classification performance. We also analyze the relative importance of the different features used for the post classification task. Next, as a use case, we describe how the post class information can help in thread retrieval by incorporating this information in a state-of-the-art thread retrieval model.
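    A minimal sketch of this classification setup, with one illustrative feature per group named in the abstract (the concrete features, field names, and model choice are assumptions, not the paper's):

      from sklearn.feature_extraction import DictVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      def post_features(post, thread):
          # One toy feature per feature group mentioned in the abstract.
          return {
              "len_words": len(post["text"].split()),      # post content
              "position": thread["posts"].index(post),     # thread structure
              "author_posts": post["author_post_count"],   # user behaviour
              "sentiment": post["sentiment_score"],        # sentiment analysis
              "has_question": "?" in post["text"],         # post content
          }

      clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
      # X = [post_features(p, t) for t in threads for p in t["posts"]]
      # y = [p["role"] for t in threads for p in t["posts"]]  # e.g. question/answer
      # clf.fit(X, y)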
    Date
    22. 1.2016 11:50:46
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.2, S.276-288
  4. Johnson, E.H.: S R Ranganathan in the Internet age (2019) 0.01
    0.0077201175 = product of:
      0.036027215 = sum of:
        0.016133383 = weight(_text_:system in 5406) [ClassicSimilarity], result of:
          0.016133383 = score(doc=5406,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.20878783 = fieldWeight in 5406, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=5406)
        0.0050120843 = weight(_text_:information in 5406) [ClassicSimilarity], result of:
          0.0050120843 = score(doc=5406,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.116372846 = fieldWeight in 5406, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=5406)
        0.014881751 = weight(_text_:retrieval in 5406) [ClassicSimilarity], result of:
          0.014881751 = score(doc=5406,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.20052543 = fieldWeight in 5406, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5406)
      0.21428572 = coord(3/14)
    
    Abstract
    S R Ranganathan's ideas have influenced library classification since the inception of his Colon Classification in 1933. His address at Elsinore, "Library Classification Through a Century", was his grand vision of the century of progress in classification from 1876 to 1975, and looked to the future of faceted classification as the means to provide a cohesive system to organize the world's information. Fifty years later, the internet and its achievements, social ecology, and consequences present a far more complicated picture, in which the library as he knew it is a very small part and the problems that he confronted are now greatly exacerbated. The systematic nature of Ranganathan's canons, principles, postulates, and devices suggests that modern semantic algorithms could guide automatic subject tagging. The vision presented here is one of internet-wide faceted classification and retrieval, implemented as open, distributed facets providing unified faceted searching across all web sites.
  5. Keikha, M.; Crestani, F.; Carman, M.J.: Employing document dependency in blog search (2012) 0.01
    0.0064334315 = product of:
      0.03002268 = sum of:
        0.013444485 = weight(_text_:system in 4987) [ClassicSimilarity], result of:
          0.013444485 = score(doc=4987,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.17398985 = fieldWeight in 4987, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4987)
        0.004176737 = weight(_text_:information in 4987) [ClassicSimilarity], result of:
          0.004176737 = score(doc=4987,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.09697737 = fieldWeight in 4987, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4987)
        0.012401459 = weight(_text_:retrieval in 4987) [ClassicSimilarity], result of:
          0.012401459 = score(doc=4987,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.16710453 = fieldWeight in 4987, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4987)
      0.21428572 = coord(3/14)
    
    Abstract
    The goal in blog search is to rank blogs according to their recurrent relevance to the topic of the query. State-of-the-art approaches view it as an expert search or resource selection problem. We investigate the effect of content-based similarity between posts on the performance of the retrieval system. We test two different approaches for smoothing (regularizing) relevance scores of posts based on their dependencies. In the first approach, we smooth term distributions describing posts by performing a random walk over a document-term graph in which similar posts are highly connected. In the second, we directly smooth scores for posts using a regularization framework that aims to minimize the discrepancy between scores for similar documents. We then extend these approaches to consider the time interval between the posts in smoothing the scores. The idea is that if two posts are temporally close, then they are good sources for smoothing each other's relevance scores. We compare these methods with the state-of-the-art approaches in blog search that employ Language Modeling-based resource selection algorithms and fusion-based methods for aggregating post relevance scores. We show performance gains over the baseline techniques which do not take advantage of the relation between posts for smoothing relevance estimates.
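    The second smoothing approach has a standard compact form; a generic version of such a regularization objective (not necessarily the authors' exact formulation) is:

      % \hat{s}: original relevance scores, W_{ij}: similarity of posts i and j,
      % \mu: trade-off between fitting \hat{s} and smoothness over similar posts.
      \[
        s^{*} = \arg\min_{s}\; \sum_{i,j} W_{ij}\,(s_i - s_j)^2
                + \mu \sum_{i} (s_i - \hat{s}_i)^2
      \]

    The first term pulls the scores of similar (and, in the temporal extension, close-in-time) posts toward one another; the second anchors them to the original relevance estimates.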
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.2, S.354-365
  6. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.01
    0.0058806646 = product of:
      0.04116465 = sum of:
        0.008353474 = weight(_text_:information in 2335) [ClassicSimilarity], result of:
          0.008353474 = score(doc=2335,freq=8.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.19395474 = fieldWeight in 2335, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2335)
        0.032811176 = weight(_text_:retrieval in 2335) [ClassicSimilarity], result of:
          0.032811176 = score(doc=2335,freq=14.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.442117 = fieldWeight in 2335, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2335)
      0.14285715 = coord(2/14)
    
    Abstract
    Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent, and the sequence of these blocks captures changing discourse. Previous work shows that exploiting the structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features, derived from the blocks of text and their combinations, is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore the question of whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves comparable performance with a supervised method using manually tagged Tweets. Topic-related specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
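    A minimal sketch of the block-based feature idea (the block inventory comes from the abstract; the regexes and feature names are assumptions):

      import re

      BLOCKS = {
          "hashtag": re.compile(r"#\w+"),
          "mention": re.compile(r"@\w+"),
          "link": re.compile(r"https?://\S+"),
      }

      def block_features(tweet):
          # Count-based features over the structural blocks of a tweet.
          feats = {name: len(rx.findall(tweet)) for name, rx in BLOCKS.items()}
          plain = tweet
          for rx in BLOCKS.values():
              plain = rx.sub(" ", plain)
          feats["plain_words"] = len(re.findall(r"\w+", plain))
          # One simple combination feature for a learning-to-rank input vector.
          feats["hashtag_ratio"] = feats["hashtag"] / max(1, feats["plain_words"])
          return feats

      print(block_features("New #IR paper via @acm: https://example.org #NLP"))
      # {'hashtag': 2, 'mention': 1, 'link': 1, 'plain_words': 3, 'hashtag_ratio': 0.66...}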
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2522-2539
  7. Wijnhoven, F.: The Hegelian inquiring system and a critical triangulation tool for the Internet information slave : a design science study (2012) 0.01
    0.005739774 = product of:
      0.04017842 = sum of:
        0.022816047 = weight(_text_:system in 254) [ClassicSimilarity], result of:
          0.022816047 = score(doc=254,freq=4.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.29527056 = fieldWeight in 254, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=254)
        0.01736237 = weight(_text_:information in 254) [ClassicSimilarity], result of:
          0.01736237 = score(doc=254,freq=24.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.40312737 = fieldWeight in 254, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=254)
      0.14285715 = coord(2/14)
    
    Abstract
    This article discusses people's understanding of reality by representations from the Internet. The Hegelian inquiring system is used here to explain the nature of informing on the Internet as activities of information masters to influence information slaves' opinions and as activities of information slaves to become well informed. The key assumption of Hegelianism regarding information is that information has no value independent of the interests and worldviews (theses) it supports. As part of the dialectic process of generating syntheses, we propose a role for information science in offering methods to critically evaluate the master's information, and thereby develop an opinion (thesis) independent of the master's power. For this we offer multiple methods for information criticism, named triangulation, which may help users to evaluate a master's evidence. This article also presents a prototype of a Hegelian information triangulator tool for information slaves (i.e., nonexperts). The article concludes with suggestions for further research on informative triangulation.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.6, S.1168-1182
  8. Bhattacharya, S.; Yang, C.; Srinivasan, P.; Boynton, B.: Perceptions of presidential candidates' personalities in twitter (2016) 0.01
    0.005333207 = product of:
      0.0248883 = sum of:
        0.004176737 = weight(_text_:information in 2635) [ClassicSimilarity], result of:
          0.004176737 = score(doc=2635,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.09697737 = fieldWeight in 2635, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2635)
        0.012401459 = weight(_text_:retrieval in 2635) [ClassicSimilarity], result of:
          0.012401459 = score(doc=2635,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.16710453 = fieldWeight in 2635, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2635)
        0.008310104 = product of:
          0.016620208 = sum of:
            0.016620208 = weight(_text_:22 in 2635) [ClassicSimilarity], result of:
              0.016620208 = score(doc=2635,freq=2.0), product of:
                0.085914485 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02453417 = queryNorm
                0.19345059 = fieldWeight in 2635, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2635)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Political sentiment analysis using social media, especially Twitter, has attracted wide interest in recent years. In such research, opinions about politicians are typically divided into positive, negative, or neutral. In our research, the goal is to mine political opinion from social media at a higher resolution by assessing statements of opinion related to the personality traits of politicians; this is an angle that has not yet been considered in social media research. A second goal is to contribute a novel retrieval-based approach for tracking public perception of personality using Gough and Heilbrun's Adjective Check List (ACL) of 110 terms describing key traits. This is in contrast to the typical lexical and machine-learning approaches used in sentiment analysis. High-precision search templates developed from the ACL were run on an 18-month span of Twitter posts mentioning Obama and Romney and these retrieved more than half a million tweets. For example, the results indicated that Romney was perceived as more of an achiever and Obama was perceived as somewhat more friendly. The traits were also aggregated into 14 broad personality dimensions. For example, Obama rated far higher than Romney on the Moderation dimension and lower on the Machiavellianism dimension. The temporal variability of such perceptions was explored.
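    As a toy illustration of the template idea (the two ACL stand-in terms and the pattern shapes are invented for this sketch; the paper's actual templates are not reproduced here):

      import re

      ACL_TERMS = ["friendly", "moderate"]  # stand-ins for the 110 ACL adjectives

      def templates(candidate, trait):
          # High-precision statement-of-opinion patterns (assumed for this sketch).
          return [
              re.compile(rf"\b{candidate}\s+is\s+(?:(?:so|very|really)\s+)?{trait}\b", re.I),
              re.compile(rf"\b{trait}\s+{candidate}\b", re.I),
          ]

      def perceived_traits(tweet, candidate):
          return [t for t in ACL_TERMS
                  if any(p.search(tweet) for p in templates(candidate, t))]

      print(perceived_traits("Obama is really friendly in person", "Obama"))  # ['friendly']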
    Date
    22. 1.2016 11:25:47
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.2, S.249-267
  9. Borlund, P.; Dreier, S.: An investigation of the search behaviour associated with Ingwersen's three types of information needs (2014) 0.00
    0.0045007076 = product of:
      0.03150495 = sum of:
        0.016623203 = weight(_text_:information in 2691) [ClassicSimilarity], result of:
          0.016623203 = score(doc=2691,freq=22.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.38596505 = fieldWeight in 2691, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2691)
        0.014881751 = weight(_text_:retrieval in 2691) [ClassicSimilarity], result of:
          0.014881751 = score(doc=2691,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.20052543 = fieldWeight in 2691, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2691)
      0.14285715 = coord(2/14)
    
    Abstract
    We report a naturalistic interactive information retrieval (IIR) study of 18 ordinary users aged 20-25 who carry out everyday-life information seeking (ELIS) on the Internet with respect to the three types of information needs identified by Ingwersen (1986): the verificative information need (VIN), the conscious topical information need (CIN), and the muddled topical information need (MIN). The searches took place in the private homes of the users in order to ensure searching as realistic as possible. Ingwersen (1996) associates a given search behaviour with each of the three types of information needs, which are analytically deduced but not yet empirically tested. Thus the objective of the study is to investigate whether empirical data does, or does not, conform to the predictions derived from the three types of information needs. The main conclusion is that the information search behaviour characteristics analytically deduced by Ingwersen are positively corroborated for this group of test participants, who search the Internet as part of ELIS.
    Source
    Information processing and management. 50(2014) no.4, S.493-507
  10. Yang, M.; Kiang, M.; Chen, H.; Li, Y.: Artificial immune system for illicit content identification in social media (2012) 0.00
    0.004360122 = product of:
      0.030520853 = sum of:
        0.02328653 = weight(_text_:system in 4980) [ClassicSimilarity], result of:
          0.02328653 = score(doc=4980,freq=6.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.30135927 = fieldWeight in 4980, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4980)
        0.0072343214 = weight(_text_:information in 4980) [ClassicSimilarity], result of:
          0.0072343214 = score(doc=4980,freq=6.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.16796975 = fieldWeight in 4980, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4980)
      0.14285715 = coord(2/14)
    
    Abstract
    Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties in the illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance.
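    A stripped-down version of the partially supervised labeling idea, using a simple centroid-similarity rule to pick reliable negatives from unlabeled data (the paper's immune-inspired heuristic is more elaborate; all names and thresholds here are assumptions):

      import numpy as np

      def label_from_unlabeled(pos, unlabeled, frac=0.2):
          # Reliable negatives: unlabeled vectors least similar to the
          # positive (illicit) class centroid; likely positives: most similar.
          centroid = pos.mean(axis=0)
          sims = (unlabeled @ centroid) / (
              np.linalg.norm(unlabeled, axis=1) * np.linalg.norm(centroid) + 1e-12)
          k = max(1, int(frac * len(unlabeled)))
          order = np.argsort(sims)
          return unlabeled[order[-k:]], unlabeled[order[:k]]  # (likely_pos, reliable_neg)

      rng = np.random.default_rng(0)
      pos = rng.normal(1.0, 0.3, size=(20, 5))    # labeled positive examples
      unl = rng.normal(0.3, 0.6, size=(200, 5))   # unlabeled content vectors
      likely_pos, reliable_neg = label_from_unlabeled(pos, unl)

    The extracted examples would then seed a conventional classifier, sidestepping the cost of labeling illicit content by hand.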
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.2, S.256-269
  11. Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 0.00
    0.0043430096 = product of:
      0.030401066 = sum of:
        0.0058474317 = weight(_text_:information in 4225) [ClassicSimilarity], result of:
          0.0058474317 = score(doc=4225,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.13576832 = fieldWeight in 4225, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4225)
        0.024553634 = weight(_text_:retrieval in 4225) [ClassicSimilarity], result of:
          0.024553634 = score(doc=4225,freq=4.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.33085006 = fieldWeight in 4225, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4225)
      0.14285715 = coord(2/14)
    
    Abstract
    The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback-Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.
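    A generic form of the KL-based weighting (a sketch of the general idea, not necessarily the paper's exact estimator):

      % p(t | N_q): probability of subjective term t in windows around query-term
      % occurrences; p(t | C): probability of t in the collection as a whole.
      \[
        w(t) = p(t \mid N_q)\,\log\frac{p(t \mid N_q)}{p(t \mid C)}
      \]

    A term weighs heavily when it is both frequent near the query terms and more frequent there than in the collection at large.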
    Source
    Information processing and management. 46(2010) no.1, S.71-88
  12. Sugimoto, C.R.; Work, S.; Larivière, V.; Haustein, S.: Scholarly use of social media and altmetrics : A review of the literature (2017) 0.00
    0.0039754473 = product of:
      0.02782813 = sum of:
        0.022816047 = weight(_text_:system in 3781) [ClassicSimilarity], result of:
          0.022816047 = score(doc=3781,freq=4.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.29527056 = fieldWeight in 3781, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=3781)
        0.0050120843 = weight(_text_:information in 3781) [ClassicSimilarity], result of:
          0.0050120843 = score(doc=3781,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.116372846 = fieldWeight in 3781, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3781)
      0.14285715 = coord(2/14)
    
    Abstract
    Social media has become integrated into the fabric of the scholarly communication system in fundamental ways, principally through scholarly use of social media platforms and the promotion of new indicators on the basis of interactions with these platforms. Research and scholarship in this area has accelerated since the coining of, and subsequent advocacy for, altmetrics, that is, research indicators based on social media activity. This review provides an extensive account of the state of the art in both scholarly use of social media and altmetrics. The review consists of two main parts: the first examines the use of social media in academia, reviewing the various functions these platforms have in the scholarly communication process and the factors that affect this use. The second part reviews empirical studies of altmetrics, discussing the various interpretations of altmetrics, data collection and methodological limitations, and differences according to platform. The review ends with a critical discussion of the implications of this transformation in the scholarly communication system.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.9, S.2037-2062
  13. Burford, S.: Complexity and the practice of web information architecture (2011) 0.00
    0.0039058207 = product of:
      0.027340744 = sum of:
        0.016133383 = weight(_text_:system in 4772) [ClassicSimilarity], result of:
          0.016133383 = score(doc=4772,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.20878783 = fieldWeight in 4772, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=4772)
        0.011207362 = weight(_text_:information in 4772) [ClassicSimilarity], result of:
          0.011207362 = score(doc=4772,freq=10.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.2602176 = fieldWeight in 4772, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4772)
      0.14285715 = coord(2/14)
    
    Abstract
    This article describes the outcomes of research that examined the practice of web information architecture (IA) in large organizations. Using a grounded theory approach, seven large organizations were investigated and the data were analyzed for emerging themes and concepts. The research finds that the practice of web IA is characterized by unpredictability, multiple perspectives, and a need for responsiveness, agility, and negotiation. This article claims that web IA occurs in a complex environment and has emergent, self-organizing properties. There is value in examining the practice as a complex adaptive system. Using this metaphor, a pre-determined, structured methodology that delivers a documented, enduring information design for the web is found inadequate; dominant and traditional thinking and practice in the organization of information are challenged.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.10, S.2024-2037
  14. Almeida Mariz, A.C.; Melo, R.O.; Almeida Mariz, T.: Challenges of organization and retrieval of photographs on social networks on the Internet (2018) 0.00
    0.0037893022 = product of:
      0.026525114 = sum of:
        0.006682779 = weight(_text_:information in 4830) [ClassicSimilarity], result of:
          0.006682779 = score(doc=4830,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.1551638 = fieldWeight in 4830, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=4830)
        0.019842334 = weight(_text_:retrieval in 4830) [ClassicSimilarity], result of:
          0.019842334 = score(doc=4830,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.26736724 = fieldWeight in 4830, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=4830)
      0.14285715 = coord(2/14)
    
    Source
    Challenges and opportunities for knowledge organization in the digital age: proceedings of the Fifteenth International ISKO Conference, 9-11 July 2018, Porto, Portugal / organized by: International Society for Knowledge Organization (ISKO), ISKO Spain and Portugal Chapter, University of Porto - Faculty of Arts and Humanities, Research Centre in Communication, Information and Digital Culture (CIC.digital) - Porto. Eds.: F. Ribeiro u. M.E. Cerveira
  15. Oliveira Machado, L.M.; Souza, R.R.; Simões, M. da Graça: Semantic web or web of data? : a diachronic study (1999 to 2017) of the publications of Tim Berners-Lee and the World Wide Web Consortium (2019) 0.00
    0.0035389478 = product of:
      0.024772633 = sum of:
        0.0072343214 = weight(_text_:information in 5300) [ClassicSimilarity], result of:
          0.0072343214 = score(doc=5300,freq=6.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.16796975 = fieldWeight in 5300, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5300)
        0.017538311 = weight(_text_:retrieval in 5300) [ClassicSimilarity], result of:
          0.017538311 = score(doc=5300,freq=4.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.23632148 = fieldWeight in 5300, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5300)
      0.14285715 = coord(2/14)
    
    Abstract
    The web has been, in the last decades, the place where information retrieval achieved its maximum importance, given its ubiquity and the sheer volume of information. However, its exponential growth made the retrieval task increasingly hard, relying for its effectiveness on idiosyncratic and somewhat biased ranking algorithms. To deal with this problem, a "new" web, called the Semantic Web (SW), was proposed, bringing along concepts like "Web of Data" and "Linked Data," although the definitions and connections among these concepts are often unclear. Based on a qualitative approach built over a literature review, a definition of SW is presented, discussing the related concepts sometimes used as synonyms. It concludes that the SW is a comprehensive and ambitious construct that includes the great purpose of making the web a global database. It also follows the specifications developed and/or associated with its operationalization and the necessary procedures for the connection of data in an open format on the web. The goals of this comprehensive SW are the union of two outcomes still tenuously connected: the virtually unlimited possibility of connections between data (the web domain) with the potentiality of the automated inference of "intelligent" systems (the semantic component).
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.7, S.701-714
  16. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.00
    0.0035242445 = product of:
      0.02466971 = sum of:
        0.018822279 = weight(_text_:system in 2338) [ClassicSimilarity], result of:
          0.018822279 = score(doc=2338,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.2435858 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.0058474317 = weight(_text_:information in 2338) [ClassicSimilarity], result of:
          0.0058474317 = score(doc=2338,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.13576832 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.14285715 = coord(2/14)
    
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.
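    A minimal sketch of the prediction setup (the three early-growth features, the toy labels, and the model are assumptions; the paper explores a richer feature space):

      from sklearn.ensemble import RandomForestClassifier

      def early_features(counts):
          # Features from a hashtag's first hours of activity.
          # counts: tweets per hour, e.g. [3, 5, 12, 30].
          growth = counts[-1] - counts[0]
          accel = counts[-1] - 2 * counts[-2] + counts[-3] if len(counts) >= 3 else 0
          return [sum(counts), growth, accel]

      X = [early_features(c) for c in [[1, 1, 2, 2], [3, 5, 12, 30], [2, 2, 1, 1]]]
      y = [0, 1, 0]  # 1 = went on to burst (toy labels for the sketch)
      clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
      print(clf.predict([early_features([2, 4, 9, 25])]))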
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2566-2579
  17. Barrio, P.; Gravano, L.: Sampling strategies for information extraction over the deep web (2017) 0.00
    0.0035138645 = product of:
      0.02459705 = sum of:
        0.010566402 = weight(_text_:information in 3412) [ClassicSimilarity], result of:
          0.010566402 = score(doc=3412,freq=20.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.2453355 = fieldWeight in 3412, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=3412)
        0.014030648 = weight(_text_:retrieval in 3412) [ClassicSimilarity], result of:
          0.014030648 = score(doc=3412,freq=4.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.18905719 = fieldWeight in 3412, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=3412)
      0.14285715 = coord(2/14)
    
    Abstract
    Information extraction systems discover structured information in natural language text. Having information in structured form enables much richer querying and data mining than possible over the natural language text. However, information extraction is a computationally expensive task, and hence improving the efficiency of the extraction process over large text collections is of critical interest. In this paper, we focus on an especially valuable family of text collections, namely, the so-called deep-web text collections, whose contents are not crawlable and are only available via querying. Important steps for efficient information extraction over deep-web text collections (e.g., selecting the collections on which to focus the extraction effort, based on their contents; or learning which documents within these collections, and in which order, to process, based on their words and phrases) require having a representative document sample from each collection. These document samples have to be collected by querying the deep-web text collections, an expensive process that renders impractical the existing sampling approaches developed for other data scenarios. In this paper, we systematically study the space of query-based document sampling techniques for information extraction over the deep web. Specifically, we consider (i) alternative query execution schedules, which vary in how they account for the query effectiveness, and (ii) alternative document retrieval and processing schedules, which vary in how they distribute the extraction effort over documents. We report the results of the first large-scale experimental evaluation of sampling techniques for information extraction over the deep web. Our results show the merits and limitations of the alternative query execution and document retrieval and processing strategies, and provide a roadmap for addressing this critically important building block for efficient, scalable information extraction.
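    A bare-bones sketch of query-based document sampling with an adaptive query schedule (measuring a query's effectiveness as its yield of new documents is one plausible criterion; the search placeholder and all names are assumptions):

      def sample_collection(search, query_pool, target_size):
          # Build a document sample from a collection reachable only via queries.
          sample = {}
          stats = {q: 1.0 for q in query_pool}  # optimistic effectiveness priors
          while len(sample) < target_size and any(stats.values()):
              # Query schedule: prefer queries that have been yielding new documents.
              q = max(stats, key=stats.get)
              docs = search(q)  # top-k results from the deep-web search interface
              new = [d for d in docs if d["id"] not in sample]
              for d in new:
                  sample[d["id"]] = d
              stats[q] = len(new) / max(1, len(docs))  # observed effectiveness
          return list(sample.values())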
    Source
    Information processing and management. 53(2017) no.2, S.309-331
  18. Bodoff, D.; Raban, D.: User models as revealed in web-based research services (2012) 0.00
    0.0033661337 = product of:
      0.023562934 = sum of:
        0.008681185 = weight(_text_:information in 76) [ClassicSimilarity], result of:
          0.008681185 = score(doc=76,freq=6.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.20156369 = fieldWeight in 76, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=76)
        0.014881751 = weight(_text_:retrieval in 76) [ClassicSimilarity], result of:
          0.014881751 = score(doc=76,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.20052543 = fieldWeight in 76, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=76)
      0.14285715 = coord(2/14)
    
    Abstract
    The user-centered approach to information retrieval emphasizes the importance of a user model in determining what information will be most useful to a particular user, given their context. Mediated search provides an opportunity to elaborate on this idea, as an intermediary's elicitations reveal what aspects of the user model they think are worth inquiring about. However, empirical evidence is divided over whether intermediaries actually work to develop a broadly conceived user model. Our research revisits the issue in a web research services setting, whose characteristics are expected to result in more thorough user modeling on the part of intermediaries. Our empirical study confirms that intermediaries engage in rich user modeling. While intermediaries behave differently across settings, our interpretation is that the underlying user model characteristics that intermediaries inquire about in our setting are applicable to other settings as well.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.3, S.584-599
  19. Thelwall, M.; Sud, P.: A comparison of methods for collecting web citation data for academic organizations (2011) 0.00
    0.0033493014 = product of:
      0.023445109 = sum of:
        0.005906798 = weight(_text_:information in 4626) [ClassicSimilarity], result of:
          0.005906798 = score(doc=4626,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.13714671 = fieldWeight in 4626, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4626)
        0.017538311 = weight(_text_:retrieval in 4626) [ClassicSimilarity], result of:
          0.017538311 = score(doc=4626,freq=4.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.23632148 = fieldWeight in 4626, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4626)
      0.14285715 = coord(2/14)
    
    Abstract
    The primary webometric method for estimating the online impact of an organization is to count links to its website. Link counts have been available from commercial search engines for over a decade but this was set to end by early 2012 and so a replacement is needed. This article compares link counts to two alternative methods: URL citations and organization title mentions. New variations of these methods are also introduced. The three methods are compared against each other using Yahoo!. Two of the three methods (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy.
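    The two alternative methods translate into simple query strings; a sketch of the query construction (the -site: exclusion operator is standard search syntax, though engines' support for it varies, and the example domain is illustrative):

      def url_citation_query(domain):
          # Pages that mention the site's URL without being hosted on it.
          return f'"{domain}" -site:{domain}'

      def title_mention_query(name, domain):
          # Pages that mention the organization by name, outside its own site.
          return f'"{name}" -site:{domain}'

      print(url_citation_query("wlv.ac.uk"))
      print(title_mention_query("University of Wolverhampton", "wlv.ac.uk"))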
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.8, S.1488-1497
  20. Kang, H.; Plaisant, C.; Elsayed, T.; Oard, D.W.: Making sense of archived e-mail : exploring the Enron collection with NetLens (2010) 0.00
    0.003317363 = product of:
      0.023221541 = sum of:
        0.016133383 = weight(_text_:system in 3446) [ClassicSimilarity], result of:
          0.016133383 = score(doc=3446,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.20878783 = fieldWeight in 3446, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=3446)
        0.0070881573 = weight(_text_:information in 3446) [ClassicSimilarity], result of:
          0.0070881573 = score(doc=3446,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.16457605 = fieldWeight in 3446, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3446)
      0.14285715 = coord(2/14)
    
    Abstract
    Informal communications media pose new challenges for information-systems design, but the nature of informal interaction offers new opportunities as well. This paper describes NetLens-E-mail, a system designed to support exploration of the content-actor network in large e-mail collections. Unique features of NetLens-E-mail include close coupling of orientation, specification, restriction, and expansion, and introduction and incorporation of a novel capability for iterative projection between content and actor networks within the same collection. Scenarios are presented to illustrate the intended employment of NetLens-E-mail, and design walkthroughs with two domain experts provide an initial basis for assessment of the suitability of the design by scholars and analysts.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.723-744

Types

  • a 145
  • m 8
  • el 7