Search (48 results, page 1 of 3)

  • × theme_ss:"Data Mining"
  • × year_i:[2010 TO 2020}
  1. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.07
    0.066608444 = product of:
      0.099912666 = sum of:
        0.029910497 = weight(_text_:of in 5011) [ClassicSimilarity], result of:
          0.029910497 = score(doc=5011,freq=36.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.36650562 = fieldWeight in 5011, product of:
              6.0 = tf(freq=36.0), with freq of:
                36.0 = termFreq=36.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
        0.07000217 = sum of:
          0.03464816 = weight(_text_:science in 5011) [ClassicSimilarity], result of:
            0.03464816 = score(doc=5011,freq=6.0), product of:
              0.13747036 = queryWeight, product of:
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.05218836 = queryNorm
              0.25204095 = fieldWeight in 5011, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5011)
          0.03535401 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
            0.03535401 = score(doc=5011,freq=2.0), product of:
              0.18275474 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05218836 = queryNorm
              0.19345059 = fieldWeight in 5011, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5011)
      0.6666667 = coord(2/3)
    
    Abstract
    The present challenge faced by scientists working with Big Data comes in the overwhelming volume and level of detail provided by current data sets. Exceeding traditional empirical approaches, Big Data opens a new perspective on scientific work in which data comes to play a role in the development of the scientific problematic to be developed. Addressing this reconfiguration of our relationship with data through readings of Wittgenstein, Macherey, and Popper, we propose a picture of science that encourages scientists to engage with the data in a direct way, using the data itself as an instrument for scientific investigation. Using GIS as a theme, we develop the concept of cyber-human systems of thought and understanding to bridge the divide between representative (theoretical) thinking and (non-theoretical) data-driven science. At the foundation of these systems, we invoke the concept of the "semantic pixel" to establish a logical and virtual space linking data and the work of scientists. It is with this discussion of the relationship between analysts in their pursuit of knowledge and the rise of Big Data that this present discussion of the philosophical foundations of Big Data addresses the central questions raised by social informatics research.
    Date
    7. 3.2019 16:32:22
    Footnote
    Beitrag eines Special issue on social informatics of knowledge
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.4, S.402-411
  2. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.05
    0.053186636 = product of:
      0.07977995 = sum of:
        0.02442182 = weight(_text_:of in 1605) [ClassicSimilarity], result of:
          0.02442182 = score(doc=1605,freq=24.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.2992506 = fieldWeight in 1605, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.055358134 = sum of:
          0.020004123 = weight(_text_:science in 1605) [ClassicSimilarity], result of:
            0.020004123 = score(doc=1605,freq=2.0), product of:
              0.13747036 = queryWeight, product of:
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.05218836 = queryNorm
              0.1455159 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
          0.03535401 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
            0.03535401 = score(doc=1605,freq=2.0), product of:
              0.18275474 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05218836 = queryNorm
              0.19345059 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
      0.6666667 = coord(2/3)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
  3. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.05
    0.051768064 = product of:
      0.0776521 = sum of:
        0.022293966 = weight(_text_:of in 668) [ClassicSimilarity], result of:
          0.022293966 = score(doc=668,freq=20.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.27317715 = fieldWeight in 668, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=668)
        0.055358134 = sum of:
          0.020004123 = weight(_text_:science in 668) [ClassicSimilarity], result of:
            0.020004123 = score(doc=668,freq=2.0), product of:
              0.13747036 = queryWeight, product of:
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.05218836 = queryNorm
              0.1455159 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
          0.03535401 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
            0.03535401 = score(doc=668,freq=2.0), product of:
              0.18275474 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05218836 = queryNorm
              0.19345059 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
      0.6666667 = coord(2/3)
    
    Abstract
    20th century massification of higher education and research in academia is said to have produced structurally stratified higher education systems in many countries. Most manifestly, the research mission of universities appears to be divisive. Authors have claimed that the Swedish system, while formally unified, has developed into a binary state, and statistics seem to support this conclusion. This article makes use of a comprehensive statistical data source on Swedish higher education institutions to illustrate stratification, and uses literature on Swedish research policy history to contextualize the statistics. Highlighting the opportunities as well as constraints of the data, the article argues that there is great merit in combining statistics with a qualitative analysis when studying the structural characteristics of national higher education systems. Not least the article shows that it is an over-simplification to describe the Swedish system as binary; the stratification is more complex. On basis of the analysis, the article also argues that while global trends certainly influence national developments, higher education systems have country-specific features that may enrich the understanding of how systems evolve and therefore should be analyzed as part of a broader study of the increasingly globalized academic system.
    Date
    22. 3.2013 19:43:01
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.3, S.574-586
  4. Carter, D.; Sholler, D.: Data science on the ground : hype, criticism, and everyday work (2016) 0.04
    0.03858435 = product of:
      0.057876524 = sum of:
        0.023928396 = weight(_text_:of in 3111) [ClassicSimilarity], result of:
          0.023928396 = score(doc=3111,freq=16.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.2932045 = fieldWeight in 3111, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3111)
        0.033948127 = product of:
          0.067896254 = sum of:
            0.067896254 = weight(_text_:science in 3111) [ClassicSimilarity], result of:
              0.067896254 = score(doc=3111,freq=16.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.49389738 = fieldWeight in 3111, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3111)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Modern organizations often employ data scientists to improve business processes using diverse sets of data. Researchers and practitioners have both touted the benefits and warned of the drawbacks associated with data science and big data approaches, but few studies investigate how data science is carried out "on the ground." In this paper, we first review the hype and criticisms surrounding data science and big data approaches. We then present the findings of semistructured interviews with 18 data analysts from various industries and organizational roles. Using qualitative coding techniques, we evaluated these interviews in light of the hype and criticisms surrounding data science in the popular discourse. We found that although the data analysts we interviewed were sensitive to both the allure and the potential pitfalls of data science, their motivations and evaluations of their work were more nuanced. We conclude by reflecting on the relationship between data analysts' work and the discourses around data science and big data, suggesting how future research can better account for the everyday practices of this profession.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.10, S.2309-2319
  5. Leydesdorff, L.; Persson, O.: Mapping the geography of science : distribution patterns and networks of relations among cities and institutes (2010) 0.03
    0.03470899 = product of:
      0.052063484 = sum of:
        0.028058534 = weight(_text_:of in 3704) [ClassicSimilarity], result of:
          0.028058534 = score(doc=3704,freq=22.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.34381276 = fieldWeight in 3704, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3704)
        0.02400495 = product of:
          0.0480099 = sum of:
            0.0480099 = weight(_text_:science in 3704) [ClassicSimilarity], result of:
              0.0480099 = score(doc=3704,freq=8.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.34923816 = fieldWeight in 3704, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3704)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Using Google Earth, Google Maps, and/or network visualization programs such as Pajek, one can overlay the network of relations among addresses in scientific publications onto the geographic map. The authors discuss the pros and cons of various options, and provide software (freeware) for bridging existing gaps between the Science Citation Indices (Thomson Reuters) and Scopus (Elsevier), on the one hand, and these various visualization tools on the other. At the level of city names, the global map can be drawn reliably on the basis of the available address information. At the level of the names of organizations and institutes, there are problems of unification both in the ISI databases and with Scopus. Pajek enables a combination of visualization and statistical analysis, whereas the Google Maps and its derivatives provide superior tools on the Internet.
    Object
    Science Citation Index
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.8, S.1622-1634
  6. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.03
    0.032128956 = product of:
      0.048193432 = sum of:
        0.034190547 = weight(_text_:of in 2338) [ClassicSimilarity], result of:
          0.034190547 = score(doc=2338,freq=24.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.41895083 = fieldWeight in 2338, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.0140028875 = product of:
          0.028005775 = sum of:
            0.028005775 = weight(_text_:science in 2338) [ClassicSimilarity], result of:
              0.028005775 = score(doc=2338,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.20372227 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2566-2579
  7. Blake, C.: Text mining (2011) 0.03
    0.031830467 = product of:
      0.047745697 = sum of:
        0.01973992 = weight(_text_:of in 1599) [ClassicSimilarity], result of:
          0.01973992 = score(doc=1599,freq=2.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.24188137 = fieldWeight in 1599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.109375 = fieldNorm(doc=1599)
        0.028005775 = product of:
          0.05601155 = sum of:
            0.05601155 = weight(_text_:science in 1599) [ClassicSimilarity], result of:
              0.05601155 = score(doc=1599,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.40744454 = fieldWeight in 1599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1599)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Source
    Annual review of information science and technology. 45(2011) no.1, S.121-155
  8. Frické, M.: Big data and its epistemology (2015) 0.03
    0.0317073 = product of:
      0.04756095 = sum of:
        0.020722598 = weight(_text_:of in 1811) [ClassicSimilarity], result of:
          0.020722598 = score(doc=1811,freq=12.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.25392252 = fieldWeight in 1811, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1811)
        0.026838351 = product of:
          0.053676702 = sum of:
            0.053676702 = weight(_text_:science in 1811) [ClassicSimilarity], result of:
              0.053676702 = score(doc=1811,freq=10.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.39046016 = fieldWeight in 1811, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1811)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The article considers whether Big Data, in the form of data-driven science, will enable the discovery, or appraisal, of universal scientific theories, instrumentalist tools, or inductive inferences. It points out, initially, that such aspirations are similar to the now-discredited inductivist approach to science. On the positive side, Big Data may permit larger sample sizes, cheaper and more extensive testing of theories, and the continuous assessment of theories. On the negative side, data-driven science encourages passive data collection, as opposed to experimentation and testing, and hornswoggling ("unsound statistical fiddling"). The roles of theory and data in inductive algorithms, statistical modeling, and scientific discoveries are analyzed, and it is argued that theory is needed at every turn. Data-driven science is a chimera.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.4, S.651-661
  9. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.03
    0.028235972 = product of:
      0.042353958 = sum of:
        0.025379896 = weight(_text_:of in 3015) [ClassicSimilarity], result of:
          0.025379896 = score(doc=3015,freq=18.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.3109903 = fieldWeight in 3015, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
        0.016974064 = product of:
          0.033948127 = sum of:
            0.033948127 = weight(_text_:science in 3015) [ClassicSimilarity], result of:
              0.033948127 = score(doc=3015,freq=4.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.24694869 = fieldWeight in 3015, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use-both individually and collectively-over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1668-1678
  10. Bella, A. La; Fronzetti Colladon, A.; Battistoni, E.; Castellan, S.; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining (2018) 0.03
    0.025836824 = product of:
      0.038755234 = sum of:
        0.02675276 = weight(_text_:of in 2400) [ClassicSimilarity], result of:
          0.02675276 = score(doc=2400,freq=20.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.32781258 = fieldWeight in 2400, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2400)
        0.012002475 = product of:
          0.02400495 = sum of:
            0.02400495 = weight(_text_:science in 2400) [ClassicSimilarity], result of:
              0.02400495 = score(doc=2400,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.17461908 = fieldWeight in 2400, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2400)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as appearing to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000-out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholder's reactions.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.21-31
  11. O'Brien, H.L.; Lebow, M.: Mixed-methods approach to measuring user experience in online news interactions (2013) 0.03
    0.025467968 = product of:
      0.03820195 = sum of:
        0.028199887 = weight(_text_:of in 1001) [ClassicSimilarity], result of:
          0.028199887 = score(doc=1001,freq=32.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.34554482 = fieldWeight in 1001, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1001)
        0.010002062 = product of:
          0.020004123 = sum of:
            0.020004123 = weight(_text_:science in 1001) [ClassicSimilarity], result of:
              0.020004123 = score(doc=1001,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.1455159 = fieldWeight in 1001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (User Engagement, Cognitive Absorption, System Usability Scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electrocmytogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings and revealed that increased time on tasks, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.8, S.1543-1556
  12. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.03
    0.025131106 = product of:
      0.03769666 = sum of:
        0.020722598 = weight(_text_:of in 3464) [ClassicSimilarity], result of:
          0.020722598 = score(doc=3464,freq=12.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.25392252 = fieldWeight in 3464, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
        0.016974064 = product of:
          0.033948127 = sum of:
            0.033948127 = weight(_text_:science in 3464) [ClassicSimilarity], result of:
              0.033948127 = score(doc=3464,freq=4.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.24694869 = fieldWeight in 3464, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1105-1119
  13. Ekbia, H.; Mattioli, M.; Kouper, I.; Arave, G.; Ghazinejad, A.; Bowman, T.; Suri, V.R.; Tsou, A.; Weingart, S.; Sugimoto, C.R.: Big data, bigger dilemmas : a critical review (2015) 0.03
    0.025018109 = product of:
      0.037527163 = sum of:
        0.02338211 = weight(_text_:of in 2155) [ClassicSimilarity], result of:
          0.02338211 = score(doc=2155,freq=22.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.28651062 = fieldWeight in 2155, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2155)
        0.014145052 = product of:
          0.028290104 = sum of:
            0.028290104 = weight(_text_:science in 2155) [ClassicSimilarity], result of:
              0.028290104 = score(doc=2155,freq=4.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.20579056 = fieldWeight in 2155, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2155)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The recent interest in Big Data has generated a broad range of new academic, corporate, and policy practices along with an evolving debate among its proponents, detractors, and skeptics. While the practices draw on a common set of tools, techniques, and technologies, most contributions to the debate come either from a particular disciplinary perspective or with a focus on a domain-specific issue. A close examination of these contributions reveals a set of common problematics that arise in various guises and in different places. It also demonstrates the need for a critical synthesis of the conceptual and practical dilemmas surrounding Big Data. The purpose of this article is to provide such a synthesis by drawing on relevant writings in the sciences, humanities, policy, and trade literature. In bringing these diverse literatures together, we aim to shed light on the common underlying issues that concern and affect all of these areas. By contextualizing the phenomenon of Big Data within larger socioeconomic developments, we also seek to provide a broader understanding of its drivers, barriers, and challenges. This approach allows us to identify attributes of Big Data that require more attention-autonomy, opacity, generativity, disparity, and futurity-leading to questions and ideas for moving beyond dilemmas.
    Series
    Advances in information science
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.8, S.1523-1545
  14. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.02
    0.024921581 = product of:
      0.03738237 = sum of:
        0.025379896 = weight(_text_:of in 1326) [ClassicSimilarity], result of:
          0.025379896 = score(doc=1326,freq=18.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.3109903 = fieldWeight in 1326, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
        0.012002475 = product of:
          0.02400495 = sum of:
            0.02400495 = weight(_text_:science in 1326) [ClassicSimilarity], result of:
              0.02400495 = score(doc=1326,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.17461908 = fieldWeight in 1326, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1326)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1597-1614
  15. Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.02
    0.023953915 = product of:
      0.035930872 = sum of:
        0.023928396 = weight(_text_:of in 4286) [ClassicSimilarity], result of:
          0.023928396 = score(doc=4286,freq=16.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.2932045 = fieldWeight in 4286, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=4286)
        0.012002475 = product of:
          0.02400495 = sum of:
            0.02400495 = weight(_text_:science in 4286) [ClassicSimilarity], result of:
              0.02400495 = score(doc=4286,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.17461908 = fieldWeight in 4286, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4286)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Geolocated social media data provide a powerful source of information about places and regional human behavior. Because only a small amount of social media data have been geolocation-annotated, inference techniques play a substantial role to increase the volume of annotated data. Conventional research in this area has been based on the text content of posts from a given user or the social network of the user, with some recent crossovers between the text- and network-based approaches. This paper proposes a novel approach to categorize highly-mentioned users (celebrities) into Local and Global types, and consequently use Local celebrities as location indicators. A label propagation algorithm is then used over the refined social network for geolocation inference. Finally, we propose a hybrid approach by merging a text-based method as a back-off strategy into our network-based approach. Empirical experiments over three standard Twitter benchmark data sets demonstrate that our approach outperforms state-of-the-art user geolocation methods.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.7, S.879-889
  16. Derek Doran, D.; Gokhale, S.S.: ¬A classification framework for web robots (2012) 0.02
    0.023693837 = product of:
      0.035540756 = sum of:
        0.019537456 = weight(_text_:of in 505) [ClassicSimilarity], result of:
          0.019537456 = score(doc=505,freq=6.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.23940048 = fieldWeight in 505, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=505)
        0.0160033 = product of:
          0.0320066 = sum of:
            0.0320066 = weight(_text_:science in 505) [ClassicSimilarity], result of:
              0.0320066 = score(doc=505,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.23282544 = fieldWeight in 505, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0625 = fieldNorm(doc=505)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.12, S.2549-2554,
  17. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.02
    0.022949254 = product of:
      0.03442388 = sum of:
        0.02442182 = weight(_text_:of in 3682) [ClassicSimilarity], result of:
          0.02442182 = score(doc=3682,freq=24.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.2992506 = fieldWeight in 3682, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
        0.010002062 = product of:
          0.020004123 = sum of:
            0.020004123 = weight(_text_:science in 3682) [ClassicSimilarity], result of:
              0.020004123 = score(doc=3682,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.1455159 = fieldWeight in 3682, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3682)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.7, S.1671-1686
  18. Thelwall, M.; Wilkinson, D.: Public dialogs in social network sites : What is their purpose? (2010) 0.02
    0.02292363 = product of:
      0.034385443 = sum of:
        0.022382967 = weight(_text_:of in 3327) [ClassicSimilarity], result of:
          0.022382967 = score(doc=3327,freq=14.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.2742677 = fieldWeight in 3327, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3327)
        0.012002475 = product of:
          0.02400495 = sum of:
            0.02400495 = weight(_text_:science in 3327) [ClassicSimilarity], result of:
              0.02400495 = score(doc=3327,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.17461908 = fieldWeight in 3327, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3327)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Social network sites (SNSs) such as MySpace and Facebook are important venues for interpersonal communication, especially among youth. One way in which members can communicate is to write public messages on each other's profile, but how is this unusual means of communication used in practice? An analysis of 2,293 public comment exchanges extracted from large samples of U.S. and U.K. MySpace members found them to be relatively rapid, but rarely used for prolonged exchanges. They seem to fulfill two purposes: making initial contact and keeping in touch occasionally such as at birthdays and other important dates. Although about half of the dialogs seem to exchange some gossip, the dialogs seem typically too short to play the role of gossip-based social grooming for typical pairs of Friends, but close Friends may still communicate extensively in SNSs with other methods.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.2, S.392-404
  19. Tonkin, E.L.; Tourte, G.J.L.: Working with text. tools, techniques and approaches for text mining (2016) 0.02
    0.022723591 = product of:
      0.034085386 = sum of:
        0.019940332 = weight(_text_:of in 4019) [ClassicSimilarity], result of:
          0.019940332 = score(doc=4019,freq=16.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.24433708 = fieldWeight in 4019, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4019)
        0.014145052 = product of:
          0.028290104 = sum of:
            0.028290104 = weight(_text_:science in 4019) [ClassicSimilarity], result of:
              0.028290104 = score(doc=4019,freq=4.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.20579056 = fieldWeight in 4019, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4019)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
  20. Tu, Y.-N.; Hsu, S.-L.: Constructing conceptual trajectory maps to trace the development of research fields (2016) 0.02
    0.022256117 = product of:
      0.033384174 = sum of:
        0.02338211 = weight(_text_:of in 3059) [ClassicSimilarity], result of:
          0.02338211 = score(doc=3059,freq=22.0), product of:
            0.08160993 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.05218836 = queryNorm
            0.28651062 = fieldWeight in 3059, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3059)
        0.010002062 = product of:
          0.020004123 = sum of:
            0.020004123 = weight(_text_:science in 3059) [ClassicSimilarity], result of:
              0.020004123 = score(doc=3059,freq=2.0), product of:
                0.13747036 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.05218836 = queryNorm
                0.1455159 = fieldWeight in 3059, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3059)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This study proposes a new method to construct and trace the trajectory of conceptual development of a research field by combining main path analysis, citation analysis, and text-mining techniques. Main path analysis, a method used commonly to trace the most critical path in a citation network, helps describe the developmental trajectory of a research field. This study extends the main path analysis method and applies text-mining techniques in the new method, which reflects the trajectory of conceptual development in an academic research field more accurately than citation frequency, which represents only the articles examined. Articles can be merged based on similarity of concepts, and by merging concepts the history of a research field can be described more precisely. The new method was applied to the "h-index" and "text mining" fields. The precision, recall, and F-measures of the h-index were 0.738, 0.652, and 0.658 and those of text-mining were 0.501, 0.653, and 0.551, respectively. Last, this study not only establishes the conceptual trajectory map of a research field, but also recommends keywords that are more precise than those used currently by researchers. These precise keywords could enable researchers to gather related works more quickly than before.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.8, S.2016-2031

Languages

  • e 47
  • d 1
  • More… Less…

Types

  • a 46
  • el 7
  • m 2
  • s 1
  • More… Less…