Search (54 results, page 1 of 3)

  • × theme_ss:"Data Mining"
  • × year_i:[2010 TO 2020}
  1. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.02
    0.019754423 = product of:
      0.049386058 = sum of:
        0.0068111527 = weight(_text_:a in 1605) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=1605,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 1605, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.042574905 = sum of:
          0.011163551 = weight(_text_:information in 1605) [ClassicSimilarity], result of:
            0.011163551 = score(doc=1605,freq=4.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.13714671 = fieldWeight in 1605, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
          0.031411353 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
            0.031411353 = score(doc=1605,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.19345059 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
      0.4 = coord(2/5)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
    Type
    a
  2. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.02
    0.019326193 = product of:
      0.048315484 = sum of:
        0.009010308 = weight(_text_:a in 5011) [ClassicSimilarity], result of:
          0.009010308 = score(doc=5011,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1685276 = fieldWeight in 5011, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
        0.039305177 = sum of:
          0.007893822 = weight(_text_:information in 5011) [ClassicSimilarity], result of:
            0.007893822 = score(doc=5011,freq=2.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.09697737 = fieldWeight in 5011, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5011)
          0.031411353 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
            0.031411353 = score(doc=5011,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.19345059 = fieldWeight in 5011, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5011)
      0.4 = coord(2/5)
    
    Abstract
    The present challenge faced by scientists working with Big Data comes in the overwhelming volume and level of detail provided by current data sets. Exceeding traditional empirical approaches, Big Data opens a new perspective on scientific work in which data comes to play a role in the development of the scientific problematic to be developed. Addressing this reconfiguration of our relationship with data through readings of Wittgenstein, Macherey, and Popper, we propose a picture of science that encourages scientists to engage with the data in a direct way, using the data itself as an instrument for scientific investigation. Using GIS as a theme, we develop the concept of cyber-human systems of thought and understanding to bridge the divide between representative (theoretical) thinking and (non-theoretical) data-driven science. At the foundation of these systems, we invoke the concept of the "semantic pixel" to establish a logical and virtual space linking data and the work of scientists. It is with this discussion of the relationship between analysts in their pursuit of knowledge and the rise of Big Data that this present discussion of the philosophical foundations of Big Data addresses the central questions raised by social informatics research.
    Date
    7. 3.2019 16:32:22
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.4, S.402-411
    Type
    a
  3. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.02
    0.018768111 = product of:
      0.046920277 = sum of:
        0.0076151006 = weight(_text_:a in 668) [ClassicSimilarity], result of:
          0.0076151006 = score(doc=668,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14243183 = fieldWeight in 668, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=668)
        0.039305177 = sum of:
          0.007893822 = weight(_text_:information in 668) [ClassicSimilarity], result of:
            0.007893822 = score(doc=668,freq=2.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.09697737 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
          0.031411353 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
            0.031411353 = score(doc=668,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.19345059 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
      0.4 = coord(2/5)
    
    Abstract
    20th century massification of higher education and research in academia is said to have produced structurally stratified higher education systems in many countries. Most manifestly, the research mission of universities appears to be divisive. Authors have claimed that the Swedish system, while formally unified, has developed into a binary state, and statistics seem to support this conclusion. This article makes use of a comprehensive statistical data source on Swedish higher education institutions to illustrate stratification, and uses literature on Swedish research policy history to contextualize the statistics. Highlighting the opportunities as well as constraints of the data, the article argues that there is great merit in combining statistics with a qualitative analysis when studying the structural characteristics of national higher education systems. Not least the article shows that it is an over-simplification to describe the Swedish system as binary; the stratification is more complex. On basis of the analysis, the article also argues that while global trends certainly influence national developments, higher education systems have country-specific features that may enrich the understanding of how systems evolve and therefore should be analyzed as part of a broader study of the increasingly globalized academic system.
    Date
    22. 3.2013 19:43:01
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.3, S.574-586
    Type
    a
  4. Blake, C.: Text mining (2011) 0.01
    0.008234787 = product of:
      0.020586967 = sum of:
        0.009535614 = weight(_text_:a in 1599) [ClassicSimilarity], result of:
          0.009535614 = score(doc=1599,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17835285 = fieldWeight in 1599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=1599)
        0.011051352 = product of:
          0.022102704 = sum of:
            0.022102704 = weight(_text_:information in 1599) [ClassicSimilarity], result of:
              0.022102704 = score(doc=1599,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27153665 = fieldWeight in 1599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1599)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Annual review of information science and technology. 45(2011) no.1, S.121-155
    Type
    a
  5. Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.01
    0.007848554 = product of:
      0.019621385 = sum of:
        0.012923255 = weight(_text_:a in 4286) [ClassicSimilarity], result of:
          0.012923255 = score(doc=4286,freq=20.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.24171482 = fieldWeight in 4286, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4286)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 4286) [ClassicSimilarity], result of:
              0.013396261 = score(doc=4286,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 4286, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4286)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Geolocated social media data provide a powerful source of information about places and regional human behavior. Because only a small amount of social media data have been geolocation-annotated, inference techniques play a substantial role to increase the volume of annotated data. Conventional research in this area has been based on the text content of posts from a given user or the social network of the user, with some recent crossovers between the text- and network-based approaches. This paper proposes a novel approach to categorize highly-mentioned users (celebrities) into Local and Global types, and consequently use Local celebrities as location indicators. A label propagation algorithm is then used over the refined social network for geolocation inference. Finally, we propose a hybrid approach by merging a text-based method as a back-off strategy into our network-based approach. Empirical experiments over three standard Twitter benchmark data sets demonstrate that our approach outperforms state-of-the-art user geolocation methods.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.7, S.879-889
    Type
    a
  6. O'Brien, H.L.; Lebow, M.: Mixed-methods approach to measuring user experience in online news interactions (2013) 0.01
    0.007471291 = product of:
      0.018678227 = sum of:
        0.009010308 = weight(_text_:a in 1001) [ClassicSimilarity], result of:
          0.009010308 = score(doc=1001,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1685276 = fieldWeight in 1001, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1001)
        0.009667919 = product of:
          0.019335838 = sum of:
            0.019335838 = weight(_text_:information in 1001) [ClassicSimilarity], result of:
              0.019335838 = score(doc=1001,freq=12.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23754507 = fieldWeight in 1001, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (User Engagement, Cognitive Absorption, System Usability Scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electrocmytogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings and revealed that increased time on tasks, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.8, S.1543-1556
    Type
    a
  7. Chardonnens, A.; Hengchen, S.: Text mining for cultural heritage institutions : a 5-step method for cultural heritage institutions (2017) 0.01
    0.0073474604 = product of:
      0.01836865 = sum of:
        0.009437811 = weight(_text_:a in 646) [ClassicSimilarity], result of:
          0.009437811 = score(doc=646,freq=6.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17652355 = fieldWeight in 646, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=646)
        0.0089308405 = product of:
          0.017861681 = sum of:
            0.017861681 = weight(_text_:information in 646) [ClassicSimilarity], result of:
              0.017861681 = score(doc=646,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.21943474 = fieldWeight in 646, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=646)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Everything changes, everything stays the same? - Understanding information spaces : Proceedings of the 15th International Symposium of Information Science (ISI 2017), Berlin/Germany, 13th - 15th March 2017. Eds.: M. Gäde, V. Trkulja u. V. Petras
    Type
    a
  8. Ekbia, H.; Mattioli, M.; Kouper, I.; Arave, G.; Ghazinejad, A.; Bowman, T.; Suri, V.R.; Tsou, A.; Weingart, S.; Sugimoto, C.R.: Big data, bigger dilemmas : a critical review (2015) 0.01
    0.00732971 = product of:
      0.018324275 = sum of:
        0.0127425 = weight(_text_:a in 2155) [ClassicSimilarity], result of:
          0.0127425 = score(doc=2155,freq=28.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.23833402 = fieldWeight in 2155, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2155)
        0.0055817757 = product of:
          0.011163551 = sum of:
            0.011163551 = weight(_text_:information in 2155) [ClassicSimilarity], result of:
              0.011163551 = score(doc=2155,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13714671 = fieldWeight in 2155, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2155)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The recent interest in Big Data has generated a broad range of new academic, corporate, and policy practices along with an evolving debate among its proponents, detractors, and skeptics. While the practices draw on a common set of tools, techniques, and technologies, most contributions to the debate come either from a particular disciplinary perspective or with a focus on a domain-specific issue. A close examination of these contributions reveals a set of common problematics that arise in various guises and in different places. It also demonstrates the need for a critical synthesis of the conceptual and practical dilemmas surrounding Big Data. The purpose of this article is to provide such a synthesis by drawing on relevant writings in the sciences, humanities, policy, and trade literature. In bringing these diverse literatures together, we aim to shed light on the common underlying issues that concern and affect all of these areas. By contextualizing the phenomenon of Big Data within larger socioeconomic developments, we also seek to provide a broader understanding of its drivers, barriers, and challenges. This approach allows us to identify attributes of Big Data that require more attention-autonomy, opacity, generativity, disparity, and futurity-leading to questions and ideas for moving beyond dilemmas.
    Series
    Advances in information science
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.8, S.1523-1545
    Type
    a
  9. Varathan, K.D.; Giachanou, A.; Crestani, F.: Comparative opinion mining : a review (2017) 0.01
    0.0072525083 = product of:
      0.018131271 = sum of:
        0.01129502 = weight(_text_:a in 3540) [ClassicSimilarity], result of:
          0.01129502 = score(doc=3540,freq=22.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.21126054 = fieldWeight in 3540, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3540)
        0.006836252 = product of:
          0.013672504 = sum of:
            0.013672504 = weight(_text_:information in 3540) [ClassicSimilarity], result of:
              0.013672504 = score(doc=3540,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16796975 = fieldWeight in 3540, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3540)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Opinion mining refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyze public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining which deals with identifying and extracting information that is expressed in a comparative form (e.g., "paper X is better than the Y"). Comparative opinion mining plays a very important role when one tries to evaluate something because it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review that cover specifically this topic as all previous reviews dealt mostly with general opinion mining. This survey covers comparative opinion mining from two different angles. One from the perspective of techniques and the other from the perspective of comparative opinion elements. It also incorporates preprocessing tools as well as data set that were used by past researchers that can be useful to future researchers in the field of comparative opinion mining.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.4, S.811-829
    Type
    a
  10. Derek Doran, D.; Gokhale, S.S.: ¬A classification framework for web robots (2012) 0.01
    0.0068851607 = product of:
      0.017212901 = sum of:
        0.010897844 = weight(_text_:a in 505) [ClassicSimilarity], result of:
          0.010897844 = score(doc=505,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20383182 = fieldWeight in 505, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=505)
        0.006315058 = product of:
          0.012630116 = sum of:
            0.012630116 = weight(_text_:information in 505) [ClassicSimilarity], result of:
              0.012630116 = score(doc=505,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1551638 = fieldWeight in 505, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=505)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.12, S.2549-2554,
    Type
    a
  11. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.01
    0.0068817483 = product of:
      0.01720437 = sum of:
        0.011678694 = weight(_text_:a in 2338) [ClassicSimilarity], result of:
          0.011678694 = score(doc=2338,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.21843673 = fieldWeight in 2338, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.005525676 = product of:
          0.011051352 = sum of:
            0.011051352 = weight(_text_:information in 2338) [ClassicSimilarity], result of:
              0.011051352 = score(doc=2338,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13576832 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2566-2579
    Type
    a
  12. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.01
    0.0067985477 = product of:
      0.016996369 = sum of:
        0.012260076 = weight(_text_:a in 4226) [ClassicSimilarity], result of:
          0.012260076 = score(doc=4226,freq=18.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.22931081 = fieldWeight in 4226, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4226)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 4226) [ClassicSimilarity], result of:
              0.009472587 = score(doc=4226,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 4226, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4226)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
    Source
    Information processing and management. 46(2010) no.1, S.1-10
    Type
    a
  13. Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.01
    0.0066833766 = product of:
      0.016708441 = sum of:
        0.0100103095 = weight(_text_:a in 1205) [ClassicSimilarity], result of:
          0.0100103095 = score(doc=1205,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.18723148 = fieldWeight in 1205, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1205)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 1205) [ClassicSimilarity], result of:
              0.013396261 = score(doc=1205,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 1205, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1205)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. Our article contributes a systematic study of the potential of company mandatory disclosures using a computational viewpoint in the following aspects: (a) We characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods in predicting change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and speculations on analysts having access to more information in producing earnings forecast.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.2, S.400-413
    Type
    a
  14. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.01
    0.006550755 = product of:
      0.016376887 = sum of:
        0.008173384 = weight(_text_:a in 92) [ClassicSimilarity], result of:
          0.008173384 = score(doc=92,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 92, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=92)
        0.008203502 = product of:
          0.016407004 = sum of:
            0.016407004 = weight(_text_:information in 92) [ClassicSimilarity], result of:
              0.016407004 = score(doc=92,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.20156369 = fieldWeight in 92, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=92)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    In this paper the authors will present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules induced a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors will show how this combination can improve the process of information retrieval.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis, u.a
    Type
    a
  15. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.01
    0.006550755 = product of:
      0.016376887 = sum of:
        0.008173384 = weight(_text_:a in 1326) [ClassicSimilarity], result of:
          0.008173384 = score(doc=1326,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 1326, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
        0.008203502 = product of:
          0.016407004 = sum of:
            0.016407004 = weight(_text_:information in 1326) [ClassicSimilarity], result of:
              0.016407004 = score(doc=1326,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.20156369 = fieldWeight in 1326, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1326)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1597-1614
    Type
    a
  16. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.01
    0.006334501 = product of:
      0.015836252 = sum of:
        0.009138121 = weight(_text_:a in 3464) [ClassicSimilarity], result of:
          0.009138121 = score(doc=3464,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1709182 = fieldWeight in 3464, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 3464) [ClassicSimilarity], result of:
              0.013396261 = score(doc=3464,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 3464, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1105-1119
    Type
    a
  17. Huvila, I.: Mining qualitative data on human information behaviour from the Web (2010) 0.01
    0.0063276635 = product of:
      0.015819158 = sum of:
        0.004767807 = weight(_text_:a in 4676) [ClassicSimilarity], result of:
          0.004767807 = score(doc=4676,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.089176424 = fieldWeight in 4676, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4676)
        0.011051352 = product of:
          0.022102704 = sum of:
            0.022102704 = weight(_text_:information in 4676) [ClassicSimilarity], result of:
              0.022102704 = score(doc=4676,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.27153665 = fieldWeight in 4676, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4676)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This paper discusses an approach of collecting qualitative data on human information behaviour that is based on mining web data using search engines. The approach is technically the same that has been used for some time in webometric research to make statistical inferences on web data, but the present paper shows how the same tools and data collecting methods can be used to gather data for qualitative data analysis on human information behaviour.
    Source
    Information und Wissen: global, sozial und frei? Proceedings des 12. Internationalen Symposiums für Informationswissenschaft (ISI 2011) ; Hildesheim, 9. - 11. März 2011. Hrsg.: J. Griesbaum, T. Mandl u. C. Womser-Hacker
    Type
    a
  18. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.01
    0.0062546856 = product of:
      0.015636714 = sum of:
        0.0068111527 = weight(_text_:a in 4367) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=4367,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 4367, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4367)
        0.008825562 = product of:
          0.017651124 = sum of:
            0.017651124 = weight(_text_:information in 4367) [ClassicSimilarity], result of:
              0.017651124 = score(doc=4367,freq=10.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.21684799 = fieldWeight in 4367, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4367)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.4, S.727-737
    Type
    a
  19. Jäger, L.: Von Big Data zu Big Brother (2018) 0.01
    0.0061156014 = product of:
      0.015289003 = sum of:
        0.002724461 = weight(_text_:a in 5234) [ClassicSimilarity], result of:
          0.002724461 = score(doc=5234,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.050957955 = fieldWeight in 5234, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=5234)
        0.012564542 = product of:
          0.025129084 = sum of:
            0.025129084 = weight(_text_:22 in 5234) [ClassicSimilarity], result of:
              0.025129084 = score(doc=5234,freq=2.0), product of:
                0.16237405 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046368346 = queryNorm
                0.15476047 = fieldWeight in 5234, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5234)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    22. 1.2018 11:33:49
    Type
    a
  20. Mandl, T.: Text mining und data minig (2013) 0.01
    0.00588199 = product of:
      0.014704974 = sum of:
        0.0068111527 = weight(_text_:a in 713) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=713,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 713, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=713)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 713) [ClassicSimilarity], result of:
              0.015787644 = score(doc=713,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 713, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.078125 = fieldNorm(doc=713)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Grundlagen der praktischen Information und Dokumentation. Handbuch zur Einführung in die Informationswissenschaft und -praxis. 6., völlig neu gefaßte Ausgabe. Hrsg. von R. Kuhlen, W. Semar u. D. Strauch. Begründet von Klaus Laisiepen, Ernst Lutterbeck, Karl-Heinrich Meyer-Uhlenried
    Type
    a

Languages

  • e 47
  • d 7

Types

  • a 52
  • el 11
  • m 2
  • s 1
  • More… Less…