Search (141 results, page 1 of 8)

  Active filters:
  • type_ss:"a"
  • theme_ss:"Data Mining"
  1. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.08
    0.08185689 = product of:
      0.16371378 = sum of:
        0.08959906 = weight(_text_:data in 1605) [ClassicSimilarity], result of:
          0.08959906 = score(doc=1605,freq=24.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.60511017 = fieldWeight in 1605, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.07411472 = sum of:
          0.042392377 = weight(_text_:processing in 1605) [ClassicSimilarity], result of:
            0.042392377 = score(doc=1605,freq=2.0), product of:
              0.18956426 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.046827413 = queryNorm
              0.22363065 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
          0.03172234 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
            0.03172234 = score(doc=1605,freq=2.0), product of:
              0.16398162 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046827413 = queryNorm
              0.19345059 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
      0.5 = coord(2/4)
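    The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output. A minimal Python sketch, using the constants from the tree and Lucene's documented ClassicSimilarity formulas, reproduces this hit's score; the same arithmetic applies to every score breakdown in this result list.
      import math

      # Lucene ClassicSimilarity, as displayed in the explain tree:
      #   idf(t)      = 1 + ln(maxDocs / (docFreq + 1))
      #   tf(freq)    = sqrt(freq)
      #   fieldWeight = tf * idf * fieldNorm
      #   weight(t)   = queryWeight * fieldWeight, with queryWeight = idf * queryNorm
      MAX_DOCS = 44218
      QUERY_NORM = 0.046827413

      def idf(doc_freq):
          return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

      def term_weight(freq, doc_freq, field_norm):
          query_weight = idf(doc_freq) * QUERY_NORM
          field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
          return query_weight * field_weight

      w_data = term_weight(24, 5088, 0.0390625)        # ~0.08959906
      w_processing = term_weight(2, 2097, 0.0390625)   # ~0.042392377
      w_22 = term_weight(2, 3622, 0.0390625)           # ~0.03172234

      # coord(2/4): two of the four top-level query clauses matched; the second
      # clause is itself the sum of the "processing" and "22" weights.
      print(round((w_data + w_processing + w_22) * 0.5, 8))  # ~0.08185689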
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries, but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documentation and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated, and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability: Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China: Google's user base in many countries is smaller than in China, so search volume data related to those countries could suffer from the same issue.
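    Illustrative of the paper's central measurement, the correlation of two query-volume series; a minimal sketch on invented placeholder numbers, not the study's data:
      import numpy as np

      # Hypothetical weekly search-volume indices for one query term.
      google_trends = np.array([43, 55, 51, 60, 72, 69, 80], dtype=float)
      baidu_index = np.array([40, 52, 50, 63, 70, 66, 78], dtype=float)

      # Pearson correlation; values near 1.0 mean the two sources track closely.
      r = np.corrcoef(google_trends, baidu_index)[0, 1]
      print(f"Pearson r = {r:.3f}")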
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
    Theme
    Data Mining
  2. Saggi, M.K.; Jain, S.: ¬A survey towards an integration of big data analytics to big insights for value-creation (2018) 0.07
    0.07135947 = product of:
      0.14271894 = sum of:
        0.112743005 = weight(_text_:data in 5053) [ClassicSimilarity], result of:
          0.112743005 = score(doc=5053,freq=38.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.7614136 = fieldWeight in 5053, product of:
              6.164414 = tf(freq=38.0), with freq of:
                38.0 = termFreq=38.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5053)
        0.029975938 = product of:
          0.059951875 = sum of:
            0.059951875 = weight(_text_:processing in 5053) [ClassicSimilarity], result of:
              0.059951875 = score(doc=5053,freq=4.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3162615 = fieldWeight in 5053, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5053)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Big Data Analytics (BDA) is an increasingly common practice that generates enormous amounts of data and provides new opportunities for decision-making. Developments in Big Data Analytics provide a new paradigm and solutions for big data sources, storage, and advanced analytics. BDA offers a nuanced view of big data development and insights into how it can create value for firms and customers. This article presents a comprehensive and realistic analysis of deploying big data analytics successfully in companies. It provides an overview of the BDA architecture and its six components, namely: (i) data generation, (ii) data acquisition, (iii) data storage, (iv) advanced data analytics, (v) data visualization, and (vi) decision-making for value-creation. The seven V's of BDA, namely Volume, Velocity, Variety, Valence, Veracity, Variability, and Value, are explored, and the various big data analytics tools, techniques, and technologies are described. Furthermore, the paper presents a methodical analysis of the use of Big Data Analytics in applications such as agriculture, healthcare, cyber security, and smart cities. It also highlights previous research, challenges, the current status, and future directions of big data analytics for these application platforms. The overview centers on three issues: (i) the concepts, characteristics, and processing paradigms of Big Data Analytics; (ii) the state-of-the-art framework for decision-making in BDA that lets companies derive value-creating insights; and (iii) the current challenges of Big Data Analytics as well as possible future directions.
    Footnote
    Contribution to a special issue: 'In (Big) Data we trust: Value creation in knowledge organizations'.
    Source
    Information processing and management. 54(2018) no.5, S.758-790
    Theme
    Data Mining
  3. Dang, X.H.; Ong, K.-L.: Knowledge discovery in data streams (2009) 0.06
    0.059274744 = product of:
      0.11854949 = sum of:
        0.09311406 = weight(_text_:data in 3829) [ClassicSimilarity], result of:
          0.09311406 = score(doc=3829,freq=18.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.6288489 = fieldWeight in 3829, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3829)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 3829) [ClassicSimilarity], result of:
              0.05087085 = score(doc=3829,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 3829, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3829)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Knowing what to do with the massive amounts of data collected has always been an ongoing issue for many organizations. While data mining has been touted as the solution, it has failed to deliver this impact despite its successes in many areas. One reason is that data mining algorithms were not designed for the real world: they usually assume a static view of the data and a stable execution environment where resources are abundant. The reality, however, is that data are constantly changing and the execution environment is dynamic. Hence, it becomes difficult for data mining to truly deliver timely and relevant results. Recently, the processing of stream data has received much attention. What is interesting is that the methodology used to design stream-based algorithms may well be the solution to the above problem. In this entry, we discuss this issue and present an overview of recent work.
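    A classic example of the design constraint described here (one pass over the stream, bounded memory) is reservoir sampling; a sketch for illustration, not the entry's own algorithm:
      import random

      def reservoir_sample(stream, k):
          # Keep a uniform random sample of k items from a stream of unknown
          # length, in a single pass and O(k) memory.
          sample = []
          for i, item in enumerate(stream):
              if i < k:
                  sample.append(item)
              else:
                  j = random.randint(0, i)  # item kept with probability k/(i+1)
                  if j < k:
                      sample[j] = item
          return sample

      print(reservoir_sample(range(1_000_000), 5))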
    Theme
    Data Mining
  4. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.06
    0.058416665 = product of:
      0.11683333 = sum of:
        0.07242205 = weight(_text_:data in 4577) [ClassicSimilarity], result of:
          0.07242205 = score(doc=4577,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.48910472 = fieldWeight in 4577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.109375 = fieldNorm(doc=4577)
        0.044411276 = product of:
          0.08882255 = sum of:
            0.08882255 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.08882255 = score(doc=4577,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    2. 4.2000 18:01:22
    Theme
    Data Mining
  5. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.05
    0.052730113 = product of:
      0.10546023 = sum of:
        0.08959906 = weight(_text_:data in 5011) [ClassicSimilarity], result of:
          0.08959906 = score(doc=5011,freq=24.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.60511017 = fieldWeight in 5011, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
              0.03172234 = score(doc=5011,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 5011, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5011)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The present challenge faced by scientists working with Big Data comes in the overwhelming volume and level of detail provided by current data sets. Exceeding traditional empirical approaches, Big Data opens a new perspective on scientific work in which data comes to play a role in the development of the scientific problematic to be developed. Addressing this reconfiguration of our relationship with data through readings of Wittgenstein, Macherey, and Popper, we propose a picture of science that encourages scientists to engage with the data in a direct way, using the data itself as an instrument for scientific investigation. Using GIS as a theme, we develop the concept of cyber-human systems of thought and understanding to bridge the divide between representative (theoretical) thinking and (non-theoretical) data-driven science. At the foundation of these systems, we invoke the concept of the "semantic pixel" to establish a logical and virtual space linking data and the work of scientists. It is with this discussion of the relationship between analysts in their pursuit of knowledge and the rise of Big Data that this present discussion of the philosophical foundations of Big Data addresses the central questions raised by social informatics research.
    Date
    7. 3.2019 16:32:22
    Theme
    Data Mining
  6. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.05
    0.04852856 = product of:
      0.09705712 = sum of:
        0.07167925 = weight(_text_:data in 1737) [ClassicSimilarity], result of:
          0.07167925 = score(doc=1737,freq=6.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.48408815 = fieldWeight in 1737, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=1737)
        0.025377871 = product of:
          0.050755743 = sum of:
            0.050755743 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.050755743 = score(doc=1737,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Defines digital libraries and discusses the effects of new technology on librarians. Examines the different viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web-enabled search technology to find information, data visualization and data mining to visualize it, and the use of SGML as an information standard to store it.
    Date
    22.11.1998 18:57:22
    Theme
    Data Mining
  7. Galal, G.M.; Cook, D.J.; Holder, L.B.: Exploiting parallelism in a structural scientific discovery system to improve scalability (1999) 0.05
    0.047419276 = product of:
      0.09483855 = sum of:
        0.06940313 = weight(_text_:data in 2952) [ClassicSimilarity], result of:
          0.06940313 = score(doc=2952,freq=10.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.46871632 = fieldWeight in 2952, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2952)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 2952) [ClassicSimilarity], result of:
              0.05087085 = score(doc=2952,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 2952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2952)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. In particular, scientific discovery systems focus on the utilization of richer data representation, sometimes without regard for scalability. This research investigates approaches for scaling a particular knowledge discovery in databases (KDD) system, SUBDUE, using parallel and distributed resources. SUBDUE has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate scalability of parallel versions of the SUBDUE system are performed using CAD circuit databases and artificially-generated databases, and potential achievements and obstacles are discussed
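    A rough sketch of the static-partitioning idea (workers evaluate their own slice of the database concurrently); score_partition is a placeholder workload, not SUBDUE's actual evaluation code:
      from multiprocessing import Pool

      def score_partition(partition):
          # Stand-in for the expensive substructure evaluation on one partition.
          return sum(len(graph) for graph in partition)

      if __name__ == "__main__":
          # Four artificial partitions of ten toy "graphs" each.
          partitions = [[list(range(100)) for _ in range(10)] for _ in range(4)]
          with Pool(processes=4) as pool:
              print(sum(pool.map(score_partition, partitions)))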
    Theme
    Data Mining
  8. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.05
    0.04727441 = product of:
      0.09454882 = sum of:
        0.057835944 = weight(_text_:data in 4367) [ClassicSimilarity], result of:
          0.057835944 = score(doc=4367,freq=10.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.39059696 = fieldWeight in 4367, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4367)
        0.036712877 = product of:
          0.073425755 = sum of:
            0.073425755 = weight(_text_:processing in 4367) [ClassicSimilarity], result of:
              0.073425755 = score(doc=4367,freq=6.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.38733965 = fieldWeight in 4367, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4367)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique were also analyzed: one processing unlabeled data partitions sequentially, the other processing them in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.
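    The bootstrapping loop described can be sketched as follows; the model object and its predict_with_confidence method are a hypothetical interface standing in for a CRF implementation:
      def bootstrap(model, X, y, unlabeled_partitions, threshold=0.9):
          # Sequential strategy: consume unlabeled partitions one after another.
          for partition in unlabeled_partitions:
              model.fit(X, y)
              for sentence in partition:
                  tags, confidence = model.predict_with_confidence(sentence)
                  if confidence >= threshold:  # keep high-confidence pseudo-labels
                      X.append(sentence)
                      y.append(tags)
          model.fit(X, y)  # final model on labeled plus pseudo-labeled data
          return model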
    Theme
    Data Mining
  9. Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.04
    0.041951865 = product of:
      0.08390373 = sum of:
        0.058525857 = weight(_text_:data in 1270) [ClassicSimilarity], result of:
          0.058525857 = score(doc=1270,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.3952563 = fieldWeight in 1270, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=1270)
        0.025377871 = product of:
          0.050755743 = sum of:
            0.050755743 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
              0.050755743 = score(doc=1270,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.30952093 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Current algorithms for finding associations among the attributes describing data in a database have a number of shortcomings. Presents a novel method for association generation that addresses all of these desiderata. The method is different from all existing algorithms and especially suited to textual databases with binary attributes. Uses subword trees for quick indexing into the required database statistics. Tests the algorithm on the Reuters-22173 database with satisfactory results.
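    Support and confidence, the basic measures any association-generation method computes, shown on invented toy transactions (this does not reproduce the paper's subword-tree indexing):
      transactions = [{"acquisition", "merger"}, {"acquisition", "stock"},
                      {"acquisition", "merger", "stock"}, {"merger"}]

      def support(itemset):
          # Fraction of transactions containing every item of the itemset.
          return sum(itemset <= t for t in transactions) / len(transactions)

      # Rule {acquisition} -> {merger}
      s = support({"acquisition", "merger"})
      c = s / support({"acquisition"})
      print(f"support={s:.2f}, confidence={c:.2f}")  # 0.50, 0.67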
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
    Theme
    Data Mining
  10. Gaizauskas, R.; Wilks, Y.: Information extraction : beyond document retrieval (1998) 0.04
    0.03993276 = product of:
      0.07986552 = sum of:
        0.043894395 = weight(_text_:data in 4716) [ClassicSimilarity], result of:
          0.043894395 = score(doc=4716,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.29644224 = fieldWeight in 4716, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4716)
        0.035971127 = product of:
          0.071942255 = sum of:
            0.071942255 = weight(_text_:processing in 4716) [ClassicSimilarity], result of:
              0.071942255 = score(doc=4716,freq=4.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3795138 = fieldWeight in 4716, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4716)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE), whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s to the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
    Theme
    Data Mining
  11. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.04
    0.03959743 = product of:
      0.07919486 = sum of:
        0.053759433 = weight(_text_:data in 3464) [ClassicSimilarity], result of:
          0.053759433 = score(doc=3464,freq=6.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.3630661 = fieldWeight in 3464, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 3464) [ClassicSimilarity], result of:
              0.05087085 = score(doc=3464,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 3464, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
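    A minimal sketch of the fusion idea: combine a text-based and a citation-based similarity matrix with source weights before clustering. The matrices and weights here are random placeholders, not the paper's data or weighting scheme:
      import numpy as np
      from sklearn.cluster import SpectralClustering

      rng = np.random.default_rng(0)
      n = 20
      A = rng.random((n, n)); S_text = (A + A.T) / 2   # toy symmetric similarities
      B = rng.random((n, n)); S_cite = (B + B.T) / 2
      np.fill_diagonal(S_text, 1.0); np.fill_diagonal(S_cite, 1.0)

      w_text, w_cite = 0.6, 0.4                        # assumed source weights
      S = w_text * S_text + w_cite * S_cite

      labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                                  random_state=0).fit_predict(S)
      print(labels)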
    Theme
    Data Mining
  12. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.04
    0.03959743 = product of:
      0.07919486 = sum of:
        0.053759433 = weight(_text_:data in 4226) [ClassicSimilarity], result of:
          0.053759433 = score(doc=4226,freq=6.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.3630661 = fieldWeight in 4226, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4226)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 4226) [ClassicSimilarity], result of:
              0.05087085 = score(doc=4226,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 4226, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4226)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
    Source
    Information processing and management. 46(2010) no.1, S.1-10
    Theme
    Data Mining
  13. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.04
    0.03959743 = product of:
      0.07919486 = sum of:
        0.053759433 = weight(_text_:data in 3015) [ClassicSimilarity], result of:
          0.053759433 = score(doc=3015,freq=6.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.3630661 = fieldWeight in 3015, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 3015) [ClassicSimilarity], result of:
              0.05087085 = score(doc=3015,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 3015, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
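    A miniature version of the pipeline described (n-gram feature extraction plus automatic text classification); the four snippets and labels are invented stand-ins for the scitex corpus:
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression

      texts = ["the parser derives syntactic structure",
               "gene sequences are aligned and annotated",
               "the grammar induces dependency trees",
               "protein annotation uses sequence alignment"]
      labels = ["comp_ling", "bioinf", "comp_ling", "bioinf"]

      vectorizer = CountVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
      X = vectorizer.fit_transform(texts)
      clf = LogisticRegression(max_iter=1000).fit(X, labels)
      print(clf.predict(vectorizer.transform(["dependency grammar parsing"])))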
    Theme
    Data Mining
  14. Cardie, C.: Empirical methods in information extraction (1997) 0.04
    0.03764897 = product of:
      0.07529794 = sum of:
        0.04138403 = weight(_text_:data in 3246) [ClassicSimilarity], result of:
          0.04138403 = score(doc=3246,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2794884 = fieldWeight in 3246, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3246)
        0.033913903 = product of:
          0.067827806 = sum of:
            0.067827806 = weight(_text_:processing in 3246) [ClassicSimilarity], result of:
              0.067827806 = score(doc=3246,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.35780904 = fieldWeight in 3246, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3246)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Footnote
    Contribution to a special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation
    Theme
    Data Mining
  15. Haravu, L.J.; Neelameghan, A.: Text mining and data mining in knowledge organization and discovery : the making of knowledge-based products (2003) 0.04
    0.036463115 = product of:
      0.07292623 = sum of:
        0.05173004 = weight(_text_:data in 5653) [ClassicSimilarity], result of:
          0.05173004 = score(doc=5653,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34936053 = fieldWeight in 5653, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5653)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 5653) [ClassicSimilarity], result of:
              0.042392377 = score(doc=5653,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 5653, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5653)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Discusses the importance of knowledge organization in the context of the information overload caused by the vast quantities of data and information accessible on internal and external networks of an organization. Defines the characteristics of a knowledge-based product. Elaborates on the techniques and applications of text mining in developing knowledge products. Presents two approaches, as case studies, to the making of knowledge products: (1) steps and processes in the planning, designing and development of a composite multilingual multimedia CD product, with potential international, inter-cultural end users in view, and (2) application of natural language processing software in text mining. Using text mining software, it is possible to link concept terms from a processed text to a related thesaurus, glossary, the schedules of a classification scheme, and facet-structured subject representations. Concludes that the products of text mining and data mining could be made more useful if the features of a faceted scheme for subject classification were incorporated into text mining techniques and products.
    Theme
    Data Mining
  16. Tonkin, E.L.; Tourte, G.J.L.: Working with text : tools, techniques and approaches for text mining (2016) 0.04
    0.036463115 = product of:
      0.07292623 = sum of:
        0.05173004 = weight(_text_:data in 4019) [ClassicSimilarity], result of:
          0.05173004 = score(doc=4019,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34936053 = fieldWeight in 4019, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4019)
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 4019) [ClassicSimilarity], result of:
              0.042392377 = score(doc=4019,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 4019, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4019)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
    LCSH
    Data mining
    Subject
    Data mining
    Theme
    Data Mining
  17. Keim, D.A.: Data Mining mit bloßem Auge (2002) 0.03
    0.034701563 = product of:
      0.13880625 = sum of:
        0.13880625 = weight(_text_:data in 1086) [ClassicSimilarity], result of:
          0.13880625 = score(doc=1086,freq=10.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.93743265 = fieldWeight in 1086, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=1086)
      0.25 = coord(1/4)
    
    Abstract
    Visualization, the most instructive possible graphical representation of data, is an essential component of data mining.
    Footnote
    Part of a special issue on 'Data Mining'.
    Series
    Data Mining
    Theme
    Data Mining
  18. Sánchez, D.; Chamorro-Martínez, J.; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval (2003) 0.03
    0.03466491 = product of:
      0.06932982 = sum of:
        0.043894395 = weight(_text_:data in 1067) [ClassicSimilarity], result of:
          0.043894395 = score(doc=1067,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.29644224 = fieldWeight in 1067, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1067)
        0.025435425 = product of:
          0.05087085 = sum of:
            0.05087085 = weight(_text_:processing in 1067) [ClassicSimilarity], result of:
              0.05087085 = score(doc=1067,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.26835677 = fieldWeight in 1067, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1067)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval, with the advantage that it is not necessary to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture the user's subjectivity. For that purpose, fuzzy set theory is employed to measure the user's assessments about the fulfillment of a concept by an image.
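    A toy fuzzy membership function for a perceptual feature such as edge orientation; the triangular shape and the "horizontal" label are illustrative assumptions, not the paper's model:
      def triangular(x, a, b, c):
          # Classic triangular membership: rises from a to b, falls from b to c.
          if x <= a or x >= c:
              return 0.0
          return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

      # Degree to which an image with dominant edge angle 15 deg is "horizontal".
      mu = triangular(15.0, -30.0, 0.0, 30.0)
      print(f"membership(horizontal) = {mu:.2f}")  # 0.50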
    Source
    Information processing and management. 39(2003) no.2, S.251-266
    Theme
    Data Mining
  19. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.03
    0.033795606 = product of:
      0.06759121 = sum of:
        0.05173004 = weight(_text_:data in 668) [ClassicSimilarity], result of:
          0.05173004 = score(doc=668,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34936053 = fieldWeight in 668, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=668)
        0.01586117 = product of:
          0.03172234 = sum of:
            0.03172234 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
              0.03172234 = score(doc=668,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.19345059 = fieldWeight in 668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=668)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    20th century massification of higher education and research in academia is said to have produced structurally stratified higher education systems in many countries. Most manifestly, the research mission of universities appears to be divisive. Authors have claimed that the Swedish system, while formally unified, has developed into a binary state, and statistics seem to support this conclusion. This article makes use of a comprehensive statistical data source on Swedish higher education institutions to illustrate stratification, and uses literature on Swedish research policy history to contextualize the statistics. Highlighting the opportunities as well as the constraints of the data, the article argues that there is great merit in combining statistics with a qualitative analysis when studying the structural characteristics of national higher education systems. Not least, the article shows that it is an over-simplification to describe the Swedish system as binary; the stratification is more complex. On the basis of the analysis, the article also argues that while global trends certainly influence national developments, higher education systems have country-specific features that may enrich the understanding of how systems evolve and should therefore be analyzed as part of a broader study of the increasingly globalized academic system.
    Date
    22. 3.2013 19:43:01
    Theme
    Data Mining
  20. Kraker, P.; Kittel, C.; Enkhbayar, A.: Open Knowledge Maps : creating a visual interface to the world's scientific knowledge based on natural language processing (2016) 0.03
    0.033504575 = product of:
      0.06700915 = sum of:
        0.031038022 = weight(_text_:data in 3205) [ClassicSimilarity], result of:
          0.031038022 = score(doc=3205,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 3205, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3205)
        0.035971127 = product of:
          0.071942255 = sum of:
            0.071942255 = weight(_text_:processing in 3205) [ClassicSimilarity], result of:
              0.071942255 = score(doc=3205,freq=4.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3795138 = fieldWeight in 3205, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3205)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The goal of Open Knowledge Maps is to create a visual interface to the world's scientific knowledge. The base for this visual interface consists of so-called knowledge maps, which enable the exploration of existing knowledge and the discovery of new knowledge. Our open source knowledge mapping software applies a mixture of summarization techniques and similarity measures on article metadata, which are iteratively chained together. After processing, the representation is saved in a database for use in a web visualization. In the future, we want to create a space for collective knowledge mapping that brings together individuals and communities involved in exploration and discovery. We want to enable people to guide each other in their discovery by collaboratively annotating and modifying the automatically created maps.
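    One building block of such a knowledge map, pairwise similarity of article metadata, sketched on invented titles (not Open Knowledge Maps code):
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      titles = ["mapping scientific knowledge with text similarity",
                "visual interfaces for literature discovery",
                "deep learning for protein folding"]
      X = TfidfVectorizer().fit_transform(titles)
      print(cosine_similarity(X).round(2))  # similar articles cluster in the map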
    Theme
    Data Mining

Languages

  • e 114
  • d 26
  • sp 1
