Search (83 results, page 1 of 5)

  • theme_ss:"Data Mining"
  1. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.06
    0.057023626 = product of:
      0.095039375 = sum of:
        0.022488397 = weight(_text_:technology in 5011) [ClassicSimilarity], result of:
          0.022488397 = score(doc=5011,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.16453418 = fieldWeight in 5011, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
        0.057007212 = weight(_text_:social in 5011) [ClassicSimilarity], result of:
          0.057007212 = score(doc=5011,freq=4.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.3115296 = fieldWeight in 5011, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
        0.015543767 = product of:
          0.031087535 = sum of:
            0.031087535 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
              0.031087535 = score(doc=5011,freq=2.0), product of:
                0.16070013 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04589033 = queryNorm
                0.19345059 = fieldWeight in 5011, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5011)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    The present challenge faced by scientists working with Big Data comes in the overwhelming volume and level of detail provided by current data sets. Exceeding traditional empirical approaches, Big Data opens a new perspective on scientific work in which data comes to play a role in the development of the scientific problematic to be developed. Addressing this reconfiguration of our relationship with data through readings of Wittgenstein, Macherey, and Popper, we propose a picture of science that encourages scientists to engage with the data in a direct way, using the data itself as an instrument for scientific investigation. Using GIS as a theme, we develop the concept of cyber-human systems of thought and understanding to bridge the divide between representative (theoretical) thinking and (non-theoretical) data-driven science. At the foundation of these systems, we invoke the concept of the "semantic pixel" to establish a logical and virtual space linking data and the work of scientists. It is with this discussion of the relationship between analysts in their pursuit of knowledge and the rise of Big Data that this present discussion of the philosophical foundations of Big Data addresses the central questions raised by social informatics research.
    Date
    7. 3.2019 16:32:22
    Footnote
    Contribution to a special issue on social informatics of knowledge
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.4, pp. 402-411
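
The score breakdown for result 1 can be checked by hand. The sketch below is a minimal reconstruction, assuming the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the idf values listed in the breakdown. The function names are hypothetical; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed classic formula; it matches the listed values, e.g. 2.978387 for docFreq=6114.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the breakdown above.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the breakdown for doc 5011.
MAX_DOCS = 44218
QUERY_NORM = 0.04589033
FIELD_NORM = 0.0390625

technology = term_score(2.0, 6114, MAX_DOCS, QUERY_NORM, FIELD_NORM)
social = term_score(4.0, 2228, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "22" clause sits in a nested query, so it carries its own coord(1/2).
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)

# Three of five query clauses matched, hence the outer coord(3/5).
total = (technology + social + term_22) * (3 / 5)
print(f"{total:.9f}")  # the listing reports 0.057023626
```

Small differences in the last digits are expected, since the listing reflects single-precision arithmetic while Python uses double precision.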
  2. Sun, X.; Lin, H.: Topical community detection from mining user tagging behavior and interest (2013) 0.05
    0.05405986 = product of:
      0.13514964 = sum of:
        0.026986076 = weight(_text_:technology in 605) [ClassicSimilarity], result of:
          0.026986076 = score(doc=605,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.19744103 = fieldWeight in 605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=605)
        0.10816357 = weight(_text_:social in 605) [ClassicSimilarity], result of:
          0.10816357 = score(doc=605,freq=10.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.59108585 = fieldWeight in 605, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=605)
      0.4 = coord(2/5)
    
    Abstract
    With the development of Web2.0, social tagging systems in which users can freely choose tags to annotate resources according to their interests have attracted much attention. In particular, literature on the emergence of collective intelligence in social tagging systems has increased. In this article, we propose a probabilistic generative model to detect latent topical communities among users. Social tags and resource contents are leveraged to model user interest in two similar and correlated ways. Our primary goal is to capture user tagging behavior and interest and discover the emergent topical community structure. The communities should be groups of users with frequent social interactions as well as similar topical interests, which would have important research implications for personalized information services. Experimental results on two real social tagging data sets with different genres have shown that the proposed generative model more accurately models user interest and detects high-quality and meaningful topical communities.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.2, pp. 321-333
  3. Thelwall, M.; Wilkinson, D.; Uppal, S.: Data mining emotion in social network communication : gender differences in MySpace (2009) 0.05
    0.049492206 = product of:
      0.12373052 = sum of:
        0.026986076 = weight(_text_:technology in 3322) [ClassicSimilarity], result of:
          0.026986076 = score(doc=3322,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.19744103 = fieldWeight in 3322, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=3322)
        0.09674444 = weight(_text_:social in 3322) [ClassicSimilarity], result of:
          0.09674444 = score(doc=3322,freq=8.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.52868325 = fieldWeight in 3322, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=3322)
      0.4 = coord(2/5)
    
    Abstract
    Despite the rapid growth in social network sites and in data mining for emotion (sentiment analysis), little research has tied the two together, and none has had social science goals. This article examines the extent to which emotion is present in MySpace comments, using a combination of data mining and content analysis, and exploring age and gender. A random sample of 819 public comments to or from U.S. users was manually classified for strength of positive and negative emotion. Two thirds of the comments expressed positive emotion, but a minority (20%) contained negative emotion, confirming that MySpace is an extraordinarily emotion-rich environment. Females are likely to give and receive more positive comments than are males, but there is no difference for negative comments. It is thus possible that females are more successful social network site users partly because of their greater ability to textually harness positive affect.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.1, pp. 190-199
  4. Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.05
    0.049492206 = product of:
      0.12373052 = sum of:
        0.026986076 = weight(_text_:technology in 4286) [ClassicSimilarity], result of:
          0.026986076 = score(doc=4286,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.19744103 = fieldWeight in 4286, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=4286)
        0.09674444 = weight(_text_:social in 4286) [ClassicSimilarity], result of:
          0.09674444 = score(doc=4286,freq=8.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.52868325 = fieldWeight in 4286, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=4286)
      0.4 = coord(2/5)
    
    Abstract
    Geolocated social media data provide a powerful source of information about places and regional human behavior. Because only a small amount of social media data have been geolocation-annotated, inference techniques play a substantial role to increase the volume of annotated data. Conventional research in this area has been based on the text content of posts from a given user or the social network of the user, with some recent crossovers between the text- and network-based approaches. This paper proposes a novel approach to categorize highly-mentioned users (celebrities) into Local and Global types, and consequently use Local celebrities as location indicators. A label propagation algorithm is then used over the refined social network for geolocation inference. Finally, we propose a hybrid approach by merging a text-based method as a back-off strategy into our network-based approach. Empirical experiments over three standard Twitter benchmark data sets demonstrate that our approach outperforms state-of-the-art user geolocation methods.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.7, pp. 879-889
  5. Kulathuramaiyer, N.; Maurer, H.: Implications of emerging data mining (2009) 0.05
    0.046059962 = product of:
      0.1151499 = sum of:
        0.04674126 = weight(_text_:technology in 3144) [ClassicSimilarity], result of:
          0.04674126 = score(doc=3144,freq=6.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.34197792 = fieldWeight in 3144, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=3144)
        0.068408646 = weight(_text_:social in 3144) [ClassicSimilarity], result of:
          0.068408646 = score(doc=3144,freq=4.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.3738355 = fieldWeight in 3144, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=3144)
      0.4 = coord(2/5)
    
    Abstract
    Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increases, users and organizations become more willing to compromise privacy. The huge data stores of the 'master miners' allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.
    Source
    Social Semantic Web: Web 2.0, was nun? Ed. by A. Blumauer and T. Pellegrini
  6. Thelwall, M.; Wilkinson, D.: Public dialogs in social network sites : What is their purpose? (2010) 0.04
    0.04430769 = product of:
      0.11076923 = sum of:
        0.026986076 = weight(_text_:technology in 3327) [ClassicSimilarity], result of:
          0.026986076 = score(doc=3327,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.19744103 = fieldWeight in 3327, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=3327)
        0.08378315 = weight(_text_:social in 3327) [ClassicSimilarity], result of:
          0.08378315 = score(doc=3327,freq=6.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.45785317 = fieldWeight in 3327, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=3327)
      0.4 = coord(2/5)
    
    Abstract
    Social network sites (SNSs) such as MySpace and Facebook are important venues for interpersonal communication, especially among youth. One way in which members can communicate is to write public messages on each other's profile, but how is this unusual means of communication used in practice? An analysis of 2,293 public comment exchanges extracted from large samples of U.S. and U.K. MySpace members found them to be relatively rapid, but rarely used for prolonged exchanges. They seem to fulfill two purposes: making initial contact and keeping in touch occasionally such as at birthdays and other important dates. Although about half of the dialogs seem to exchange some gossip, the dialogs seem typically too short to play the role of gossip-based social grooming for typical pairs of Friends, but close Friends may still communicate extensively in SNSs with other methods.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.2, pp. 392-404
  7. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.03
    0.034876682 = product of:
      0.0871917 = sum of:
        0.062321678 = weight(_text_:technology in 1737) [ClassicSimilarity], result of:
          0.062321678 = score(doc=1737,freq=6.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.45597056 = fieldWeight in 1737, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0625 = fieldNorm(doc=1737)
        0.024870027 = product of:
          0.049740054 = sum of:
            0.049740054 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.049740054 = score(doc=1737,freq=2.0), product of:
                0.16070013 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04589033 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Defines digital libraries and discusses the effects of new technology on librarians. Examines the different viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web-enabled search technology to find information, data visualization and data mining to visualize it, and use of SGML as an information standard to store it.
    Date
    22.11.1998 18:57:22
  8. Mining text data (2012) 0.03
    0.032994803 = product of:
      0.08248701 = sum of:
        0.017990718 = weight(_text_:technology in 362) [ClassicSimilarity], result of:
          0.017990718 = score(doc=362,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.13162735 = fieldWeight in 362, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.03125 = fieldNorm(doc=362)
        0.06449629 = weight(_text_:social in 362) [ClassicSimilarity], result of:
          0.06449629 = score(doc=362,freq=8.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.3524555 = fieldWeight in 362, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.03125 = fieldNorm(doc=362)
      0.4 = coord(2/5)
    
    Abstract
    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath in topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.
    Content
    Contents: An Introduction to Text Mining.- Information Extraction from Text.- A Survey of Text Summarization Techniques.- A Survey of Text Clustering Algorithms.- Dimensionality Reduction and Topic Modeling.- A Survey of Text Classification Algorithms.- Transfer Learning for Text Mining.- Probabilistic Models for Text Mining.- Mining Text Streams.- Translingual Mining from Text Data.- Text Mining in Multimedia.- Text Analytics in Social Media.- A Survey of Opinion Mining and Sentiment Analysis.- Biomedical Text Mining: A Survey of Recent Progress.- Index.
  9. Borgman, C.L.; Wofford, M.F.; Golshan, M.S.; Darch, P.T.: Collaborative qualitative research at scale : reflections on 20 years of acquiring global data and making data global (2021) 0.03
    0.031798244 = product of:
      0.07949561 = sum of:
        0.022488397 = weight(_text_:technology in 239) [ClassicSimilarity], result of:
          0.022488397 = score(doc=239,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.16453418 = fieldWeight in 239, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=239)
        0.057007212 = weight(_text_:social in 239) [ClassicSimilarity], result of:
          0.057007212 = score(doc=239,freq=4.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.3115296 = fieldWeight in 239, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=239)
      0.4 = coord(2/5)
    
    Abstract
    A 5-year project to study scientific data uses in geography, starting in 1999, evolved into 20 years of research on data practices in sensor networks, environmental sciences, biology, seismology, undersea science, biomedicine, astronomy, and other fields. By emulating the "team science" approaches of the scientists studied, the UCLA Center for Knowledge Infrastructures accumulated a comprehensive collection of qualitative data about how scientists generate, manage, use, and reuse data across domains. Building upon Paul N. Edwards's model of "making global data" (collecting signals via consistent methods, technologies, and policies) to "make data global" (comparing and integrating those data), the research team has managed and exploited these data as a collaborative resource. This article reflects on the social, technical, organizational, economic, and policy challenges the team has encountered in creating new knowledge from data old and new. We reflect on continuity over generations of students and staff, transitions between grants, transfer of legacy data between software tools, research methods, and the role of professional data managers in the social sciences.
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.6, pp. 667-682
  10. Bella, A. La; Fronzetti Colladon, A.; Battistoni, E.; Castellan, S.; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining (2018) 0.03
    0.030143319 = product of:
      0.075358294 = sum of:
        0.026986076 = weight(_text_:technology in 2400) [ClassicSimilarity], result of:
          0.026986076 = score(doc=2400,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.19744103 = fieldWeight in 2400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=2400)
        0.04837222 = weight(_text_:social in 2400) [ClassicSimilarity], result of:
          0.04837222 = score(doc=2400,freq=2.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.26434162 = fieldWeight in 2400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=2400)
      0.4 = coord(2/5)
    
    Abstract
    We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as appearing to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000, out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholders' reactions.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, pp. 21-31
  11. Organisciak, P.; Schmidt, B.M.; Downie, J.S.: Giving shape to large digital libraries through exploratory data analysis (2022) 0.03
    0.030143319 = product of:
      0.075358294 = sum of:
        0.026986076 = weight(_text_:technology in 473) [ClassicSimilarity], result of:
          0.026986076 = score(doc=473,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.19744103 = fieldWeight in 473, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=473)
        0.04837222 = weight(_text_:social in 473) [ClassicSimilarity], result of:
          0.04837222 = score(doc=473,freq=2.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.26434162 = fieldWeight in 473, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=473)
      0.4 = coord(2/5)
    
    Abstract
    The emergence of large multi-institutional digital libraries has opened the door to aggregate-level examinations of the published word. Such large-scale analysis offers a new way to pursue traditional problems in the humanities and social sciences, using digital methods to ask routine questions of large corpora. However, inquiry into multiple centuries of books is constrained by the burdens of scale, where statistical inference is technically complex and limited by hurdles to access and flexibility. This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets. We present one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion work HathiTrust Digital Library, and center it in the broader space of scholarly tools for exploratory data analysis.
    Source
    Journal of the Association for Information Science and Technology. 73(2022) no.2, pp. 317-332
  12. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.03
    0.02511943 = product of:
      0.062798575 = sum of:
        0.022488397 = weight(_text_:technology in 967) [ClassicSimilarity], result of:
          0.022488397 = score(doc=967,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.16453418 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
        0.04031018 = weight(_text_:social in 967) [ClassicSimilarity], result of:
          0.04031018 = score(doc=967,freq=2.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.22028469 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.4 = coord(2/5)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, pp. 1399-1410
  13. Li, D.; Tang, J.; Ding, Y.; Shuai, X.; Chambers, T.; Sun, G.; Luo, Z.; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using tencent microblogging (2015) 0.03
    0.02511943 = product of:
      0.062798575 = sum of:
        0.022488397 = weight(_text_:technology in 2345) [ClassicSimilarity], result of:
          0.022488397 = score(doc=2345,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.16453418 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2345)
        0.04031018 = weight(_text_:social in 2345) [ClassicSimilarity], result of:
          0.04031018 = score(doc=2345,freq=2.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.22028469 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2345)
      0.4 = coord(2/5)
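
The explain tree above can be checked by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the idf values listed. The function names `idf` and `term_score` are illustrative only and are not part of any library API; the constants are copied from the tree for doc=2345.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)                       # tf(freq=2.0) = 1.4142135
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # idf * queryNorm
    field_weight = tf * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values taken from the explain tree for doc=2345.
QUERY_NORM = 0.04589033
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

technology = term_score(2.0, 6114, MAX_DOCS, QUERY_NORM, FIELD_NORM)
social = term_score(2.0, 2228, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/5): two of the five query clauses matched this document.
doc_score = (technology + social) * (2 / 5)
print(f"technology={technology:.9f} social={social:.9f} total={doc_score:.8f}")
```

Because Python uses double precision while the listing shows single-precision values, the results should agree with the tree only up to small rounding differences.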
    
    Abstract
    Text mining has been widely used in multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that a user's opinion is influenced by that user's neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both topic factor and opinion influence factor into a unified probabilistic model. We evaluate our model in one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and F1-measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2657-2673
  14. Jones, K.M.L.; Rubel, A.; LeClere, E.: ¬A matter of trust : higher education institutions as information fiduciaries in an age of educational data mining and learning analytics (2020) 0.03
    0.02511943 = product of:
      0.062798575 = sum of:
        0.022488397 = weight(_text_:technology in 5968) [ClassicSimilarity], result of:
          0.022488397 = score(doc=5968,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.16453418 = fieldWeight in 5968, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5968)
        0.04031018 = weight(_text_:social in 5968) [ClassicSimilarity], result of:
          0.04031018 = score(doc=5968,freq=2.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.22028469 = fieldWeight in 5968, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5968)
      0.4 = coord(2/5)
    
    Abstract
    Higher education institutions are mining and analyzing student data to effect educational, political, and managerial outcomes. Done under the banner of "learning analytics," this work can (and often does) surface sensitive data and information about, inter alia, a student's demographics, academic performance, offline and online movements, physical fitness, mental wellbeing, and social network. With these data, institutions and third parties are able to describe student life, predict future behaviors, and intervene to address academic or other barriers to student success (however defined). Learning analytics, consequently, raise serious issues concerning student privacy, autonomy, and the appropriate flow of student data. We argue that issues around privacy lead to valid questions about the degree to which students should trust their institution to use learning analytics data and other artifacts (algorithms, predictive scores) with their interests in mind. We argue that higher education institutions are paradigms of information fiduciaries. As such, colleges and universities have a special responsibility to their students. In this article, we use the information fiduciary concept to analyze cases when learning analytics violate an institution's responsibility to its students.
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.10, S.1227-1241
  15. Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.02
    0.023224084 = product of:
      0.05806021 = sum of:
        0.026986076 = weight(_text_:technology in 1205) [ClassicSimilarity], result of:
          0.026986076 = score(doc=1205,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.19744103 = fieldWeight in 1205, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=1205)
        0.031074135 = product of:
          0.06214827 = sum of:
            0.06214827 = weight(_text_:aspects in 1205) [ClassicSimilarity], result of:
              0.06214827 = score(doc=1205,freq=2.0), product of:
                0.20741826 = queryWeight, product of:
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.04589033 = queryNorm
                0.29962775 = fieldWeight in 1205, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1205)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
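
This tree differs from the previous ones in that the "aspects" term sits inside a nested clause with its own coord(1/2) factor, which is applied before the outer coord(2/5). A minimal sketch of that arithmetic follows, under the same assumed ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The helper names `term_score` and `clause_score` are hypothetical; the constants are copied from the tree for doc=1205.

```python
import math

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight * fieldWeight for a single term, as in the explain tree.
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

def clause_score(child_scores, matched, total):
    # A boolean clause sums its matching children, then applies coord(matched/total).
    return sum(child_scores) * (matched / total)

# Values taken from the explain tree for doc=1205.
QUERY_NORM, MAX_DOCS, FIELD_NORM = 0.04589033, 44218, 0.046875

technology = term_score(2.0, 6114, MAX_DOCS, QUERY_NORM, FIELD_NORM)
aspects = term_score(2.0, 1308, MAX_DOCS, QUERY_NORM, FIELD_NORM)

inner = clause_score([aspects], 1, 2)            # nested clause, coord(1/2)
outer = clause_score([technology, inner], 2, 5)  # top level, coord(2/5)
print(f"inner={inner:.9f} outer={outer:.9f}")
```

The nested coord halves the contribution of "aspects" before it is added to the "technology" score, which is why this entry ranks below entries 13 and 14 despite the higher idf of "aspects".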
    
    Abstract
    Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. Our article contributes a systematic study of the potential of company mandatory disclosures using a computational viewpoint in the following aspects: (a) We characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods in predicting change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and speculations on analysts having access to more information in producing earnings forecast.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.2, S.400-413
  16. Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.02
    0.020095546 = product of:
      0.050238863 = sum of:
        0.017990718 = weight(_text_:technology in 354) [ClassicSimilarity], result of:
          0.017990718 = score(doc=354,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.13162735 = fieldWeight in 354, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.03125 = fieldNorm(doc=354)
        0.032248147 = weight(_text_:social in 354) [ClassicSimilarity], result of:
          0.032248147 = score(doc=354,freq=2.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.17622775 = fieldWeight in 354, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.03125 = fieldNorm(doc=354)
      0.4 = coord(2/5)
    
    Abstract
    Web mining aims to discover useful information and knowledge from the Web hyperlink structure, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semistructured and unstructured nature of the Web data and its heterogeneity. It has also developed many of its own algorithms and techniques. Liu has written a comprehensive text on Web data mining. Key topics of structure mining, content mining, and usage mining are covered both in breadth and in depth. His book brings together all the essential concepts and algorithms from related areas such as data mining, machine learning, and text processing to form an authoritative and coherent text. The book offers a rich blend of theory and practice, addressing seminal research ideas, as well as examining the technology from a practical point of view. It is suitable for students, researchers and practitioners interested in Web mining both as a learning text and a reference book. Lecturers can readily use it for classes on data mining, Web mining, and Web search. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
    Content
    Contents: 1. Introduction 2. Association Rules and Sequential Patterns 3. Supervised Learning 4. Unsupervised Learning 5. Partially Supervised Learning 6. Information Retrieval and Web Search 7. Social Network Analysis 8. Web Crawling 9. Structured Data Extraction: Wrapper Generation 10. Information Integration
  17. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.02
    0.019353405 = product of:
      0.04838351 = sum of:
        0.022488397 = weight(_text_:technology in 605) [ClassicSimilarity], result of:
          0.022488397 = score(doc=605,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.16453418 = fieldWeight in 605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=605)
        0.025895113 = product of:
          0.051790226 = sum of:
            0.051790226 = weight(_text_:aspects in 605) [ClassicSimilarity], result of:
              0.051790226 = score(doc=605,freq=2.0), product of:
                0.20741826 = queryWeight, product of:
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.04589033 = queryNorm
                0.2496898 = fieldWeight in 605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=605)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to make purchases. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions from spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged in different granularities from words, sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve f-measure 62.16% at the sentence level and 74.37% at the document level.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1838-1850
  18. O'Brien, H.L.; Lebow, M.: Mixed-methods approach to measuring user experience in online news interactions (2013) 0.02
    0.019353405 = product of:
      0.04838351 = sum of:
        0.022488397 = weight(_text_:technology in 1001) [ClassicSimilarity], result of:
          0.022488397 = score(doc=1001,freq=2.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.16453418 = fieldWeight in 1001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1001)
        0.025895113 = product of:
          0.051790226 = sum of:
            0.051790226 = weight(_text_:aspects in 1001) [ClassicSimilarity], result of:
              0.051790226 = score(doc=1001,freq=2.0), product of:
                0.20741826 = queryWeight, product of:
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.04589033 = queryNorm
                0.2496898 = fieldWeight in 1001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (User Engagement, Cognitive Absorption, System Usability Scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electromyogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings and revealed that increased time on tasks, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.8, S.1543-1556
  19. Wongthontham, P.; Abu-Salih, B.: Ontology-based approach for semantic data extraction from social big data : state-of-the-art and research directions (2018) 0.02
    0.019348888 = product of:
      0.09674444 = sum of:
        0.09674444 = weight(_text_:social in 4097) [ClassicSimilarity], result of:
          0.09674444 = score(doc=4097,freq=8.0), product of:
            0.18299131 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.04589033 = queryNorm
            0.52868325 = fieldWeight in 4097, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046875 = fieldNorm(doc=4097)
      0.2 = coord(1/5)
    
    Abstract
    The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on semantic analysis of textual data. We propose an ontology-based approach to extract semantics of textual data and define the domain of data. In other words, we semantically analyse the social data at two levels, i.e., the entity level and the domain level. We have chosen Twitter as the social channel for proof of concept. Domain knowledge is captured in ontologies, which are then used to enrich the semantics of tweets with specific semantic conceptual representations of the entities that appear in the tweets. Case studies are used to demonstrate this approach. We experiment with and evaluate our proposed approach on a public dataset collected from Twitter in the politics domain. The ontology-based approach leverages entity extraction and concept mappings in terms of quantity and accuracy of concept identification.
  20. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.02
    0.018938866 = product of:
      0.047347162 = sum of:
        0.031803396 = weight(_text_:technology in 1605) [ClassicSimilarity], result of:
          0.031803396 = score(doc=1605,freq=4.0), product of:
            0.13667917 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.04589033 = queryNorm
            0.23268649 = fieldWeight in 1605, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.015543767 = product of:
          0.031087535 = sum of:
            0.031087535 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.031087535 = score(doc=1605,freq=2.0), product of:
                0.16070013 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04589033 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22

Years

Languages

  • e 76
  • d 7

Types

  • a 74
  • m 7
  • s 5
  • el 4