Search (155 results, page 4 of 8)

  • × theme_ss:"Data Mining"
  1. Deogun, J.S.: Feature selection and effective classifiers (1998) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2911) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2911,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2911, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2911)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Develops and analyzes 4 algorithms for feature selection in the context of rough set methodology. Develops the notion of accuracy of classification that can be used for upper or lower classification methods and defines the feature selection problem. Presents a discussion of upper classifiers and develops 4 features selection heuristics and discusses the family of stepwise backward selection algorithms. Analyzes the worst case time complexity in all algorithms presented. Discusses details of the experiments and results of using a family of stepwise backward selection learning data sets and a duodenal ulcer data set. Includes the experimental setup and results of comparison of lower classifiers and upper classiers on the duodenal ulcer data set. Discusses exteded decision tables
    Footnote
    Contribution to a special issue devoted to knowledge discovery and data mining
    Type
    a
  2. Galal, G.M.; Cook, D.J.; Holder, L.B.: Exploiting parallelism in a structural scientific discovery system to improve scalability (1999) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2952) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2952,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2952, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2952)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. In particular, scientific discovery systems focus on the utilization of richer data representation, sometimes without regard for scalability. This research investigates approaches for scaling a particular knowledge discovery in databases (KDD) system, SUBDUE, using parallel and distributed resources. SUBDUE has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate scalability of parallel versions of the SUBDUE system are performed using CAD circuit databases and artificially-generated databases, and potential achievements and obstacles are discussed
    Type
    a
  3. Loh, S.; Oliveira, J.P.M. de; Gastal, F.L.: Knowledge discovery in textual documentation : qualitative and quantitative analyses (2001) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 4482) [ClassicSimilarity], result of:
              0.009076704 = score(doc=4482,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 4482, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4482)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents an approach for performing knowledge discovery in texts through qualitative and quantitative analyses of high-level textual characteristics. Instead of applying mining techniques on attribute values, terms or keywords extracted from texts, the discovery process works over conceptss identified in texts. Concepts represent real world events and objects, and they help the user to understand ideas, trends, thoughts, opinions and intentions present in texts. The approach combines a quasi-automatic categorisation task (for qualitative analysis) with a mining process (for quantitative analysis). The goal is to find new and useful knowledge inside a textual collection through the use of mining techniques applied over concepts (representing text content). In this paper, an application of the approach to medical records of a psychiatric hospital is presented. The approach helps physicians to extract knowledge about patients and diseases. This knowledge may be used for epidemiological studies, for training professionals and it may be also used to support physicians to diagnose and evaluate diseases.
    Type
    a
  4. Sánchez, D.; Chamorro-Martínez, J.; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval (2003) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 1067) [ClassicSimilarity], result of:
              0.009076704 = score(doc=1067,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 1067, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1067)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval with the advantage that it is not needed to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture user's subjectivity. For that purpose, fuzzy sets theory is employed to measure user's assessments about the fulfillment of a concept by an image.
    Type
    a
  5. Srinivasan, P.: Text mining in biomedicine : challenges and opportunities (2006) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 1497) [ClassicSimilarity], result of:
              0.009076704 = score(doc=1497,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 1497, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1497)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Text mining is about making serendipity more likely. Serendipity, the chance discovery of interesting ideas, has been responsible for many discoveries in science. Text mining systems strive to explore large text collections, separate the potentially meaningfull connections from a vast and mostly noisy background of random associations. In this paper we provide a summary of our text mining approach and also illustrate briefly some of the experiments we have conducted with this approach. In particular we use a profile-based text mining method. We have used these profiles to explore the global distribution of disease research, replicate discoveries made by others and propose new hypotheses. Text mining holds much potential that has yet to be tapped.
    Source
    Knowledge organization, information systems and other essays: Professor A. Neelameghan Festschrift. Ed. by K.S. Raghavan and K.N. Prasad
    Type
    a
  6. Li, J.; Zhang, P.; Cao, J.: External concept support for group support systems through Web mining (2009) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2806) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2806,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2806, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2806)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    External information plays an important role in group decision-making processes, yet research about external information support for Group Support Systems (GSS) has been lacking. In this study, we propose an approach to build a concept space to provide external concept support for GSS users. Built on a Web mining algorithm, the approach can mine a concept space from the Web and retrieve related concepts from the concept space based on users' comments in a real-time manner. We conduct two experiments to evaluate the quality of the proposed approach and the effectiveness of the external concept support provided by this approach. The experiment results indicate that the concept space mined from the Web contained qualified concepts to stimulate divergent thinking. The results also demonstrate that external concept support in GSS greatly enhanced group productivity for idea generation tasks.
    Type
    a
  7. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 3464) [ClassicSimilarity], result of:
              0.009076704 = score(doc=3464,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 3464, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Type
    a
  8. Mohr, J.W.; Bogdanov, P.: Topic models : what they are and why they matter (2013) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 1142) [ClassicSimilarity], result of:
              0.009076704 = score(doc=1142,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 1142, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1142)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We provide a brief, non-technical introduction to the text mining methodology known as "topic modeling." We summarize the theory and background of the method and discuss what kinds of things are found by topic models. Using a text corpus comprised of the eight articles from the special issue of Poetics on the subject of topic models, we run a topic model on these articles, both as a way to introduce the methodology and also to help summarize some of the ways in which social and cultural scientists are using topic models. We review some of the critiques and debates over the use of the method and finally, we link these developments back to some of the original innovations in the field of content analysis that were pioneered by Harold D. Lasswell and colleagues during and just after World War II.
    Type
    a
  9. Bella, A. La; Fronzetti Colladon, A.; Battistoni, E.; Castellan, S.; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining (2018) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2400) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2400,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2400, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2400)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as appearing to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000-out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholder's reactions.
    Type
    a
  10. Wongthontham, P.; Abu-Salih, B.: Ontology-based approach for semantic data extraction from social big data : state-of-the-art and research directions (2018) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 4097) [ClassicSimilarity], result of:
              0.009076704 = score(doc=4097,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 4097, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4097)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academic and industry. To address this challenge, semantic analysis of textual data is focused in this paper. We propose an ontology-based approach to extract semantics of textual data and define the domain of data. In other words, we semantically analyse the social data at two levels i.e. the entity level and the domain level. We have chosen Twitter as a social channel challenge for a purpose of concept proof. Domain knowledge is captured in ontologies which are then used to enrich the semantics of tweets provided with specific semantic conceptual representation of entities that appear in the tweets. Case studies are used to demonstrate this approach. We experiment and evaluate our proposed approach with a public dataset collected from Twitter and from the politics domain. The ontology-based approach leverages entity extraction and concept mappings in terms of quantity and accuracy of concept identification.
    Type
    a
  11. O'Brien, H.L.; Lebow, M.: Mixed-methods approach to measuring user experience in online news interactions (2013) 0.00
    0.0022374375 = product of:
      0.004474875 = sum of:
        0.004474875 = product of:
          0.00894975 = sum of:
            0.00894975 = weight(_text_:a in 1001) [ClassicSimilarity], result of:
              0.00894975 = score(doc=1001,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1685276 = fieldWeight in 1001, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (User Engagement, Cognitive Absorption, System Usability Scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electrocmytogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings and revealed that increased time on tasks, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences.
    Type
    a
  12. Mining text data (2012) 0.00
    0.0021393995 = product of:
      0.004278799 = sum of:
        0.004278799 = product of:
          0.008557598 = sum of:
            0.008557598 = weight(_text_:a in 362) [ClassicSimilarity], result of:
              0.008557598 = score(doc=362,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.16114321 = fieldWeight in 362, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=362)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have lead to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath in topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.
    Content
    Inhalt: An Introduction to Text Mining.- Information Extraction from Text.- A Survey of Text Summarization Techniques.- A Survey of Text Clustering Algorithms.- Dimensionality Reduction and Topic Modeling.- A Survey of Text Classification Algorithms.- Transfer Learning for Text Mining.- Probabilistic Models for Text Mining.- Mining Text Streams.- Translingual Mining from Text Data.- Text Mining in Multimedia.- Text Analytics in Social Media.- A Survey of Opinion Mining and Sentiment Analysis.- Biomedical Text Mining: A Survey of Recent Progress.- Index.
  13. Hereth, J.; Stumme, G.; Wille, R.; Wille, U.: Conceptual knowledge discovery and data analysis (2000) 0.00
    0.0020714647 = product of:
      0.0041429293 = sum of:
        0.0041429293 = product of:
          0.008285859 = sum of:
            0.008285859 = weight(_text_:a in 5083) [ClassicSimilarity], result of:
              0.008285859 = score(doc=5083,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15602624 = fieldWeight in 5083, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5083)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin
    Type
    a
  14. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.00
    0.0020714647 = product of:
      0.0041429293 = sum of:
        0.0041429293 = product of:
          0.008285859 = sum of:
            0.008285859 = weight(_text_:a in 604) [ClassicSimilarity], result of:
              0.008285859 = score(doc=604,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15602624 = fieldWeight in 604, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=604)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. On the other hand, the Romanized pinyin of Chinese language indicates boundaries of words in the text. Our algorithm is the first to utilize the Romanized pinyin to segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
    Type
    a
  15. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
    0.0020714647 = product of:
      0.0041429293 = sum of:
        0.0041429293 = product of:
          0.008285859 = sum of:
            0.008285859 = weight(_text_:a in 967) [ClassicSimilarity], result of:
              0.008285859 = score(doc=967,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15602624 = fieldWeight in 967, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=967)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
    Type
    a
  16. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.00
    0.0020714647 = product of:
      0.0041429293 = sum of:
        0.0041429293 = product of:
          0.008285859 = sum of:
            0.008285859 = weight(_text_:a in 3682) [ClassicSimilarity], result of:
              0.008285859 = score(doc=3682,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15602624 = fieldWeight in 3682, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3682)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Type
    a
  17. Raghavan, V.V.; Deogun, J.S.; Sever, H.: Knowledge discovery and data mining : introduction (1998) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 2899) [ClassicSimilarity], result of:
              0.008202582 = score(doc=2899,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 2899, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2899)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Defines knowledge discovery and database mining. The challenge for knowledge discovery in databases (KDD) is to automatically process large quantities of raw data, identifying the most significant and meaningful patterns, and present these as as knowledge appropriate for achieving a user's goals. Data mining is the process of deriving useful knowledge from real world databases through the application of pattern extraction techniques. Explains the goals of, and motivation for, research work on data mining. Discusses the nature of database contents, along with problems within the field of data mining
    Footnote
    Contribution to a special issue devoted to knowledge discovery and data mining
    Type
    a
  18. Search tools (1997) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 3834) [ClassicSimilarity], result of:
              0.008202582 = score(doc=3834,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 3834, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3834)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Offers brief accounts of Internet search tools. Covers the Lycos revamp; the new navigation service produced jointly by Excite and Netscape, delivering a language specific, locally relevant Web guide for Japan, Germany, France, the UK and Australia; InfoWatcher, a combination offline browser, search engine and push product from Carvelle Inc., USA; Alexa by Alexa Internet and WBI from IBM which are free and provide users with information on how others have used the Web sites which they are visiting; and Concept Explorer from Knowledge Discovery Systems, Inc., California which performs data mining from the Web, Usenet groups, MEDLINE and the US Patent and Trademark Office patent abstracts
    Type
    a
  19. Knowledge discovery and data mining (1998) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 2898) [ClassicSimilarity], result of:
              0.008118451 = score(doc=2898,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 2898, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2898)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Footnote
    A special issue devoted to knowledge discovery and data mining
  20. Keim, D.A.: Data Mining mit bloßem Auge (2002) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 1086) [ClassicSimilarity], result of:
              0.008118451 = score(doc=1086,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 1086, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1086)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a

Years

Languages

  • e 125
  • d 29
  • sp 1
  • More… Less…

Types

  • a 141
  • el 15
  • m 10
  • s 9
  • More… Less…