Search (9 results, page 1 of 1)

  • theme_ss:"Data Mining"
  • year_i:[2010 TO 2020}
  1. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.03
    0.030387513 = sum of:
      0.013938631 = product of:
        0.055754524 = sum of:
          0.055754524 = weight(_text_:authors in 668) [ClassicSimilarity], result of:
            0.055754524 = score(doc=668,freq=2.0), product of:
              0.22138755 = queryWeight, product of:
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.04856253 = queryNorm
              0.25184128 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
        0.25 = coord(1/4)
      0.016448881 = product of:
        0.032897763 = sum of:
          0.032897763 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
            0.032897763 = score(doc=668,freq=2.0), product of:
              0.17005771 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04856253 = queryNorm
              0.19345059 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
        0.5 = coord(1/2)
    
    Abstract
    The 20th-century massification of higher education and research in academia is said to have produced structurally stratified higher education systems in many countries. Most manifestly, the research mission of universities appears to be divisive. Authors have claimed that the Swedish system, while formally unified, has developed into a binary state, and statistics seem to support this conclusion. This article makes use of a comprehensive statistical data source on Swedish higher education institutions to illustrate stratification, and uses literature on Swedish research policy history to contextualize the statistics. Highlighting the opportunities as well as the constraints of the data, the article argues that there is great merit in combining statistics with qualitative analysis when studying the structural characteristics of national higher education systems. Not least, the article shows that it is an over-simplification to describe the Swedish system as binary; the stratification is more complex. On the basis of the analysis, the article also argues that while global trends certainly influence national developments, higher education systems have country-specific features that may enrich the understanding of how systems evolve and should therefore be analyzed as part of a broader study of the increasingly globalized academic system.
    Date
    22. 3.2013 19:43:01
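  The score breakdowns accompanying each result are Lucene ClassicSimilarity "explain" trees: each matching query term contributes queryWeight × fieldWeight, scaled by a coord factor for the fraction of query clauses matched. A minimal Python sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1))); the helper names are illustrative, not the Lucene API:

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity idf(t) = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
          # One query term's contribution: queryWeight * fieldWeight * coord
          tf = math.sqrt(freq)  # tf(freq) = sqrt(termFreq)
          query_weight = idf(doc_freq, max_docs) * query_norm
          field_weight = tf * idf(doc_freq, max_docs) * field_norm
          return query_weight * field_weight * coord

      # Values read off the first breakdown (doc 668):
      s_authors = term_score(2.0, 1258, 44218, 0.04856253, 0.0390625, 0.25)  # _text_:authors
      s_22      = term_score(2.0, 3622, 44218, 0.04856253, 0.0390625, 0.5)   # _text_:22
      print(s_authors + s_22)  # ~0.0303875, matching the 0.03 shown above

  The same decomposition applies to every breakdown below; only freq, docFreq, fieldNorm, and coord change from document to document.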
  2. Miao, Q.; Li, Q.; Zeng, D.: Fine-grained opinion mining by integrating multiple review sources (2010) 0.01
    0.01379854 = product of:
      0.02759708 = sum of:
        0.02759708 = product of:
          0.11038832 = sum of:
            0.11038832 = weight(_text_:authors in 4104) [ClassicSimilarity], result of:
              0.11038832 = score(doc=4104,freq=4.0), product of:
                0.22138755 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.04856253 = queryNorm
                0.49862027 = fieldWeight in 4104, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4104)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    With the rapid development of Web 2.0, online reviews have become extremely valuable sources for mining customers' opinions. Fine-grained opinion mining has attracted increasing attention from both applied and theoretical research. In this article, the authors study how to automatically mine product features and opinions from multiple review sources. Specifically, they propose an integration strategy to solve the issue. Within the integration strategy, the authors mine domain knowledge from semistructured reviews and then exploit that knowledge to assist product feature extraction and sentiment orientation identification in unstructured reviews. Finally, feature-opinion tuples are generated. Experimental results on real-world datasets show that the proposed approach is effective.
  3. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.01
    0.011827321 = product of:
      0.023654642 = sum of:
        0.023654642 = product of:
          0.09461857 = sum of:
            0.09461857 = weight(_text_:authors in 92) [ClassicSimilarity], result of:
              0.09461857 = score(doc=92,freq=4.0), product of:
                0.22138755 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.04856253 = queryNorm
                0.42738882 = fieldWeight in 92, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=92)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    In this paper the authors present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has long been a focus of interest for many researchers. However, its results take the form of lists of words (classes) that users often do not know what to do with. The use of maximal association rules brings a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden, often relevant, knowledge from a large volume of data. The authors show how this combination can improve the process of information retrieval.
  4. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.01
    0.011827321 = product of:
      0.023654642 = sum of:
        0.023654642 = product of:
          0.09461857 = sum of:
            0.09461857 = weight(_text_:authors in 1326) [ClassicSimilarity], result of:
              0.09461857 = score(doc=1326,freq=4.0), product of:
                0.22138755 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.04856253 = queryNorm
                0.42738882 = fieldWeight in 1326, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1326)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, the less-than-satisfying effectiveness of state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors address this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
  5. Leydesdorff, L.; Persson, O.: Mapping the geography of science : distribution patterns and networks of relations among cities and institutes (2010) 0.01
    0.008363178 = product of:
      0.016726356 = sum of:
        0.016726356 = product of:
          0.066905424 = sum of:
            0.066905424 = weight(_text_:authors in 3704) [ClassicSimilarity], result of:
              0.066905424 = score(doc=3704,freq=2.0), product of:
                0.22138755 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.04856253 = queryNorm
                0.30220953 = fieldWeight in 3704, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3704)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Using Google Earth, Google Maps, and/or network visualization programs such as Pajek, one can overlay the network of relations among addresses in scientific publications onto the geographic map. The authors discuss the pros and cons of various options, and provide software (freeware) for bridging existing gaps between the Science Citation Indices (Thomson Reuters) and Scopus (Elsevier), on the one hand, and these various visualization tools, on the other. At the level of city names, the global map can be drawn reliably on the basis of the available address information. At the level of the names of organizations and institutes, there are problems of unification both in the ISI databases and in Scopus. Pajek enables a combination of visualization and statistical analysis, whereas Google Maps and its derivatives provide superior tools on the Internet.
  6. Berry, M.W.; Esau, R.; Kiefer, B.: ¬The use of text mining techniques in electronic discovery for legal matters (2012) 0.01
    0.008363178 = product of:
      0.016726356 = sum of:
        0.016726356 = product of:
          0.066905424 = sum of:
            0.066905424 = weight(_text_:authors in 91) [ClassicSimilarity], result of:
              0.066905424 = score(doc=91,freq=2.0), product of:
                0.22138755 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.04856253 = queryNorm
                0.30220953 = fieldWeight in 91, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=91)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document, and as a result the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools such as classifiers, latent semantic analysis, and non-negative matrix factorization deal with the nuances of the collection process.
  7. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.01
    0.008224441 = product of:
      0.016448881 = sum of:
        0.016448881 = product of:
          0.032897763 = sum of:
            0.032897763 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.032897763 = score(doc=1605,freq=2.0), product of:
                0.17005771 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04856253 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
  8. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.01
    0.008224441 = product of:
      0.016448881 = sum of:
        0.016448881 = product of:
          0.032897763 = sum of:
            0.032897763 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
              0.032897763 = score(doc=5011,freq=2.0), product of:
                0.17005771 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04856253 = queryNorm
                0.19345059 = fieldWeight in 5011, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5011)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    7. 3.2019 16:32:22
  9. Jäger, L.: Von Big Data zu Big Brother (2018) 0.01
    0.006579553 = product of:
      0.013159106 = sum of:
        0.013159106 = product of:
          0.026318211 = sum of:
            0.026318211 = weight(_text_:22 in 5234) [ClassicSimilarity], result of:
              0.026318211 = score(doc=5234,freq=2.0), product of:
                0.17005771 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04856253 = queryNorm
                0.15476047 = fieldWeight in 5234, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5234)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2018 11:33:49