Search (43 results, page 1 of 3)

  • × theme_ss:"Data Mining"
  1. Wu, T.; Pottenger, W.M.: A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.04
    Abstract
    In this article we present a semi-supervised active learning algorithm for pattern discovery in information extraction from textual data. The patterns are reduced regular expressions composed of various characteristics of features useful in information extraction. Our major contribution is a semi-supervised learning algorithm that extracts information from a set of examples labeled as relevant or irrelevant to a given attribute. The approach is semi-supervised because it does not require precise labeling of the exact location of features in the training data. This significantly reduces the effort needed to develop a training set. An active learning algorithm is used to assist the semi-supervised learning algorithm to further reduce the training set development effort. The active learning algorithm is seeded with a single positive example of a given attribute. The context of the seed is used to automatically identify candidates for additional positive examples of the given attribute. Candidate examples are manually pruned during the active learning phase, and our semi-supervised learning algorithm automatically discovers reduced regular expressions for each attribute. We have successfully applied this learning technique in the extraction of textual features from police incident reports, university crime reports, and patents. The performance of our algorithm compares favorably with competitive extraction systems being used in criminal justice information systems.
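    Code sketch
    A minimal sketch of the seeding-and-generalization idea described above, not the authors' algorithm: a single positive seed example is generalized into a reduced regular expression by collapsing letter and digit runs, and its matches in the corpus become candidates for manual pruning. The corpus, seed, and generalization rule are invented for illustration (Python, standard library only).
      import re

      # Collapse letter runs, then digit runs, into character classes; a crude
      # stand-in for the paper's "reduced regular expressions". Punctuation is
      # left literal, which is good enough for this toy input.
      def generalize(snippet):
          pattern = re.sub(r"[A-Za-z]+", "[A-Za-z]+", snippet)  # letters first
          return re.sub(r"\d+", r"\\d+", pattern)               # then digits

      corpus = "Incident on Jan 12, 2005 at Elm St; another on Feb 3, 2004."
      seed = "Jan 12, 2005"                # the single positive seed example
      pattern = generalize(seed)           # -> '[A-Za-z]+ \d+, \d+'
      print(re.findall(pattern, corpus))   # candidate examples for pruning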
  2. Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.03
    Abstract
    Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. Our article contributes a systematic study of the potential of company mandatory disclosures using a computational viewpoint in the following aspects: (a) We characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods in predicting change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and speculations on analysts having access to more information in producing earnings forecasts.
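    Code sketch
    A hedged sketch of the kind of supervised set-up the study describes: classify an annual report's text as signalling improved or worsened year-over-year performance. scikit-learn is assumed; the report snippets and labels are invented placeholders, not the paper's data.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      reports = ["revenue grew strongly across segments",
                 "impairment charges and declining margins",
                 "record cash flow and expanding markets",
                 "restructuring costs weighed on results"]
      labels = ["up", "down", "up", "down"]   # change in performance

      # Bag-of-words features plus a linear classifier: a common baseline for
      # document-level prediction tasks like this one.
      model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
      model.fit(reports, labels)
      print(model.predict(["margins declining amid weak demand"]))  # likely ['down']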
  3. Cardie, C.: Empirical methods in information extraction (1997) 0.03
    Abstract
    Surveys the use of empirical, machine-learning methods for information extraction. Presents a generic architecture for information extraction systems and surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture
  4. Jones, K.M.L.; Rubel, A.; LeClere, E.: A matter of trust : higher education institutions as information fiduciaries in an age of educational data mining and learning analytics (2020) 0.03
    Abstract
    Higher education institutions are mining and analyzing student data to effect educational, political, and managerial outcomes. Done under the banner of "learning analytics," this work can, and often does, surface sensitive data and information about, inter alia, a student's demographics, academic performance, offline and online movements, physical fitness, mental wellbeing, and social network. With these data, institutions and third parties are able to describe student life, predict future behaviors, and intervene to address academic or other barriers to student success (however defined). Learning analytics, consequently, raise serious issues concerning student privacy, autonomy, and the appropriate flow of student data. We argue that issues around privacy lead to valid questions about the degree to which students should trust their institution to use learning analytics data and other artifacts (algorithms, predictive scores) with their interests in mind. We argue that higher education institutions are paradigms of information fiduciaries. As such, colleges and universities have a special responsibility to their students. In this article, we use the information fiduciary concept to analyze cases when learning analytics violate an institution's responsibility to its students.
  5. Fayyad, U.M.: Data mining and knowledge discovery : making sense out of data (1996) 0.03
    Abstract
    Defines knowledge discovery and data mining (KDD) as the overall process of extracting high level knowledge from low level data. Outlines the KDD process. Explains how KDD is related to the fields of: statistics, pattern recognition, machine learning, artificial intelligence, databases and data warehouses
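    Code sketch
    The process outlined here can be made concrete as a pipeline skeleton. The stage names follow the standard selection, preprocessing, transformation, mining, and interpretation chain; the stage bodies are placeholders, not an implementation.
      def select(raw):        return raw                 # choose target data
      def preprocess(data):   return data                # clean, handle noise
      def transform(data):    return data                # reduce/project features
      def mine(data):         return {"patterns": data}  # e.g. clustering, rules
      def interpret(model):   return model               # evaluate, report

      # High-level knowledge emerges from chaining the stages over low-level data.
      knowledge = interpret(mine(transform(preprocess(select([1, 2, 3])))))
      print(knowledge)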
  6. Maaten, L. van den: Learning a parametric embedding by preserving local structure (2009) 0.03
    Abstract
    The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space. Parametric t-SNE learns the parametric mapping in such a way that the local structure of the data is preserved as well as possible in the latent space. We evaluate the performance of parametric t-SNE in experiments on three datasets, in which we compare it to the performance of two other unsupervised parametric dimensionality reduction techniques. The results of experiments illustrate the strong performance of parametric t-SNE, in particular, in learning settings in which the dimensionality of the latent space is relatively low.
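    Code sketch
    A crude stand-in for the paper's idea rather than its algorithm: parametric t-SNE trains the network directly on the t-SNE objective, whereas this sketch fits ordinary t-SNE and then regresses a small network onto its output, which likewise yields an explicit mapping that can embed unseen points. scikit-learn is assumed.
      from sklearn.datasets import load_digits
      from sklearn.manifold import TSNE
      from sklearn.neural_network import MLPRegressor

      X = load_digits().data[:300]
      Y = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

      net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
      net.fit(X, Y)              # learn an explicit mapping f: X -> 2D
      print(net.predict(X[:2]))  # f can now place new high-dimensional points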
  7. Maaten, L. van den: Accelerating t-SNE using Tree-Based Algorithms (2014) 0.03
    Abstract
    The paper investigates the acceleration of t-SNE, an embedding technique commonly used for visualizing high-dimensional data in scatter plots, using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
    Source
    Journal of machine learning research. 15(2014), pp.3221-3245
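    Code sketch
    scikit-learn ships the Barnes-Hut variant discussed in this paper; in the usage example below, angle is the theta trade-off parameter of the Barnes-Hut approximation, and method="exact" would select the original O(N^2) gradient instead.
      from sklearn.datasets import load_digits
      from sklearn.manifold import TSNE

      X = load_digits().data   # 1797 samples, 64 features
      emb = TSNE(method="barnes_hut", angle=0.5, random_state=0).fit_transform(X)
      print(emb.shape)         # (1797, 2)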
  8. Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.03
    Abstract
    Web mining aims to discover useful information and knowledge from the Web hyperlink structure, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semistructured and unstructured nature of the Web data and its heterogeneity. It has also developed many of its own algorithms and techniques. Liu has written a comprehensive text on Web data mining. Key topics of structure mining, content mining, and usage mining are covered both in breadth and in depth. His book brings together all the essential concepts and algorithms from related areas such as data mining, machine learning, and text processing to form an authoritative and coherent text. The book offers a rich blend of theory and practice, addressing seminal research ideas, as well as examining the technology from a practical point of view. It is suitable for students, researchers and practitioners interested in Web mining both as a learning text and a reference book. Lecturers can readily use it for classes on data mining, Web mining, and Web search. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
    Content
    Contents: 1. Introduction 2. Association Rules and Sequential Patterns 3. Supervised Learning 4. Unsupervised Learning 5. Partially Supervised Learning 6. Information Retrieval and Web Search 7. Social Network Analysis 8. Web Crawling 9. Structured Data Extraction: Wrapper Generation 10. Information Integration
  9. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.02
    Abstract
    Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.
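    Code sketch
    A minimal sketch of the sequential bootstrapping variant described above, assuming the third-party sklearn_crfsuite package as the CRF implementation: train on the labeled seed set, tag an unlabeled partition, treat the predictions as labels, and retrain. The features and sentences are toy stand-ins, not the article's set-up.
      import sklearn_crfsuite

      def featurize(tokens):
          return [{"word": w.lower(), "is_title": w.istitle()} for w in tokens]

      labeled = [(["Influenza", "spreads", "fast"], ["B-DIS", "O", "O"])]
      unlabeled = [["Measles", "cases", "rose"], ["Rain", "fell", "today"]]

      X = [featurize(tokens) for tokens, _ in labeled]
      y = [tags for _, tags in labeled]

      for partition in unlabeled:   # process unlabeled partitions sequentially
          crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
          crf.fit(X, y)
          X.append(featurize(partition))                    # self-labeled data
          y.append(crf.predict([featurize(partition)])[0])  # joins training set
      print(y)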
  10. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.02
    Date
    2. 4.2000 18:01:22
  11. Wong, M.L.; Leung, K.S.; Cheng, J.C.Y.: Discovering knowledge from noisy databases using genetic programming (2000) 0.02
    Abstract
    In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limiting attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework that combines genetic programming and inductive logic programming to induce knowledge represented in various knowledge representation formalisms from noisy databases (LOGENPRO). Moreover, the system is applied to one real-life medical database. The knowledge discovered provides insights to and allows better understanding of the medical domains
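    Code sketch
    A very reduced sketch standing in for the evolutionary rule induction described above (LOGENPRO itself evolves programs in a grammar combining genetic programming with inductive logic programming, which this does not attempt): a rule is a set of attribute=value tests, fitness is accuracy on the noisy examples, and the population evolves by mutation and selection. All data is invented.
      import random
      random.seed(0)

      # (attributes, class) examples; the last record is deliberate noise
      data = [({"fever": "yes", "cough": "yes"}, 1),
              ({"fever": "yes", "cough": "no"}, 1),
              ({"fever": "no", "cough": "yes"}, 0),
              ({"fever": "no", "cough": "no"}, 0),
              ({"fever": "yes", "cough": "yes"}, 0)]

      def fitness(rule):   # accuracy of "rule fires <=> class 1"
          return sum((all(ex[a] == v for a, v in rule) == bool(label))
                     for ex, label in data) / len(data)

      def mutate(rule):
          tests = dict(rule)
          tests[random.choice(["fever", "cough"])] = random.choice(["yes", "no"])
          return tuple(sorted(tests.items()))

      population = [(("fever", "yes"),)] * 8
      for _ in range(30):   # generations: mutate, pool, keep the fittest
          population = sorted(set(population) | {mutate(r) for r in population},
                              key=fitness, reverse=True)[:8]
      print(population[0], fitness(population[0]))   # noise caps accuracy at 0.8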
  12. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.02
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
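    Code sketch
    A sketch of the starting point such a tool automates, assuming scikit-learn: cluster the retrieved documents' text, then let the user re-assign items, turning the automatic grouping into an editable domain model. The documents are invented.
      from sklearn.cluster import KMeans
      from sklearn.feature_extraction.text import TfidfVectorizer

      docs = ["citation networks and bibliometrics",
              "interactive data mining tools",
              "mining citation graphs",
              "user interfaces for sense-making"]
      X = TfidfVectorizer().fit_transform(docs)
      labels = list(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))

      labels[3] = labels[1]   # the user's constructive step: re-assign a document
      print(labels)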
  13. Fayyad, U.M.; Djorgovski, S.G.; Weir, N.: From digitized images to online catalogs : data mining a sky survey (1996) 0.02
    Abstract
    Offers a data mining approach based on machine learning classification methods to the problem of automated cataloguing of online databases of digital images resulting from sky surveys. The SKICAT system automates the reduction and analysis of 3 terabytes of images expected to contain about 2 billion sky objects. It offers a solution to problems associated with the analysis of large data sets in science
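    Code sketch
    SKICAT-style classification in miniature, assuming scikit-learn: a decision tree over per-object image features. The real system learned from roughly 40 measured attributes per sky object; the three features and all numbers below are invented stand-ins.
      from sklearn.tree import DecisionTreeClassifier

      # columns: area, ellipticity, peak brightness
      objects = [[120, 0.05, 0.9], [15, 0.40, 0.2],
                 [130, 0.07, 0.8], [12, 0.35, 0.3]]
      classes = ["galaxy", "star", "galaxy", "star"]

      tree = DecisionTreeClassifier(random_state=0).fit(objects, classes)
      print(tree.predict([[110, 0.06, 0.85]]))   # -> ['galaxy']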
  14. KDD : techniques and applications (1998) 0.02
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
  15. Wong, S.K.M.; Butz, C.J.; Xiang, X.: Automated database schema design using mined data dependencies (1998) 0.02
    Abstract
    Data dependencies are used in database schema design to enforce the correctness of a database as well as to reduce redundant data. These dependencies are usually determined from the semantics of the attributes and are then enforced upon the relations. Describes a bottom-up procedure for discovering multivalued dependencies in observed data without knowing a priori the relationships among the attributes. The proposed algorithm is an application of the technique designed for learning conditional independencies in probabilistic reasoning. A prototype system for automated database schema design has been implemented. Experiments were carried out to demonstrate both the effectiveness and efficiency of the method
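    Code sketch
    A toy verifier for the kind of dependency being mined: a multivalued dependency X ->> Y holds in a relation whose attributes split into (X, Y, Z) exactly when, for every X-value, the observed (Y, Z) combinations form the full cross product of the observed Y-values and Z-values. The article's procedure discovers such dependencies bottom-up from data; this sketch only checks a given one.
      from collections import defaultdict
      from itertools import product

      def holds_mvd(rows, x, y, z):   # column indices for X, Y, Z
          groups = defaultdict(set)
          for row in rows:
              groups[row[x]].add((row[y], row[z]))
          for pairs in groups.values():
              ys = {a for a, _ in pairs}
              zs = {b for _, b in pairs}
              if set(product(ys, zs)) != pairs:   # cross-product test
                  return False
          return True

      # course ->> teacher (book as the remaining attribute) holds here
      r = [("c1", "t1", "b1"), ("c1", "t1", "b2"),
           ("c1", "t2", "b1"), ("c1", "t2", "b2")]
      print(holds_mvd(r, 0, 1, 2))   # True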
  16. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.02
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.
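    Code sketch
    An illustrative sketch of the feature-based prediction set-up, assuming scikit-learn: each hashtag is summarized by simple early-life-cycle features computed from its hourly counts, and a classifier predicts whether it will burst. The features, counts, and labels are invented, not the article's.
      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier

      def features(hourly_counts):
          c = np.asarray(hourly_counts, dtype=float)
          return [c.sum(), c.max(), c[-1] - c[0], (np.diff(c) > 0).mean()]

      hashtags = ([1, 2, 8, 30], [3, 3, 2, 3], [0, 5, 9, 40], [2, 1, 2, 2])
      X = [features(h) for h in hashtags]
      y = [1, 0, 1, 0]   # 1 = burst later, 0 = did not

      clf = GradientBoostingClassifier().fit(X, y)
      print(clf.predict([features([1, 4, 10, 25])]))   # likely [1]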
  17. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.02
  18. Chakrabarti, S.: Mining the Web : discovering knowledge from hypertext data (2003) 0.02
    Footnote
    Review in: JASIST 55(2004) no.3, pp.275-276 (C. Chen): "This is a book about finding significant statistical patterns on the Web, in particular patterns that are associated with hypertext documents, topics, hyperlinks, and queries. The term pattern in this book refers to dependencies among such items. On the one hand, the Web contains useful information on just about every topic under the sun. On the other hand, just like searching for a needle in a haystack, one would need powerful tools to locate useful information on the vast land of the Web. Soumen Chakrabarti's book focuses on a wide range of techniques for machine learning and data mining on the Web. The goal of the book is to provide both the technical background and tools and tricks of the trade of Web content mining. Much of the technical content reflects the state of the art between 1995 and 2002. The targeted audience is researchers and innovative developers in this area, as well as newcomers who intend to enter this area. The book begins with an introduction chapter. The introduction chapter explains fundamental concepts such as crawling and indexing as well as clustering and classification. The remaining eight chapters are organized into three parts: i) infrastructure, ii) learning and iii) applications.
    Part I, Infrastructure, has two chapters: Chapter 2 on crawling the Web and Chapter 3 on Web search and information retrieval. The second part of the book, containing chapters 4, 5, and 6, is the centerpiece. This part specifically focuses on machine learning in the context of hypertext. Part III is a collection of applications that utilize the techniques described in earlier chapters. Chapter 7 is on social network analysis. Chapter 8 is on resource discovery. Chapter 9 is on the future of Web mining. Overall, this is a valuable reference book for researchers and developers in the field of Web mining. It should be particularly useful for those who would like to design and probably code their own computer programs out of the equations and pseudocodes on most of the pages. For a student, the most valuable feature of the book is perhaps the formal and consistent treatments of concepts across the board. For what is behind and beyond the technical details, one has to either dig deeper into the bibliographic notes at the end of each chapter, or resort to more in-depth analysis of relevant subjects in the literature. If you are looking for successful stories about Web mining or hard-way-learned lessons of failures, this is not the book."
  19. Mining text data (2012) 0.02
    Abstract
    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.
    Content
    Contents: An Introduction to Text Mining.- Information Extraction from Text.- A Survey of Text Summarization Techniques.- A Survey of Text Clustering Algorithms.- Dimensionality Reduction and Topic Modeling.- A Survey of Text Classification Algorithms.- Transfer Learning for Text Mining.- Probabilistic Models for Text Mining.- Mining Text Streams.- Translingual Mining from Text Data.- Text Mining in Multimedia.- Text Analytics in Social Media.- A Survey of Opinion Mining and Sentiment Analysis.- Biomedical Text Mining: A Survey of Recent Progress.- Index.
  20. Information visualization in data mining and knowledge discovery (2002) 0.02
    Date
    23. 3.2008 19:10:22
    Footnote
    Review in: JASIST 54(2003) no.9, pp.905-906 (C.A. Badurek): "Visual approaches for knowledge discovery in very large databases are a prime research need for information scientists focused on extracting meaningful information from the ever growing stores of data from a variety of domains, including business, the geosciences, and satellite and medical imagery. This work presents a summary of research efforts in the fields of data mining, knowledge discovery, and data visualization with the goal of aiding the integration of research approaches and techniques from these major fields. The editors, leading computer scientists from academia and industry, present a collection of 32 papers from contributors who are incorporating visualization and data mining techniques through academic research as well as application development in industry and government agencies. Information Visualization focuses upon techniques to enhance the natural abilities of humans to visually understand data, in particular, large-scale data sets. It is primarily concerned with developing interactive graphical representations to enable users to more intuitively make sense of multidimensional data as part of the data exploration process. It includes research from computer science, psychology, human-computer interaction, statistics, and information science. Knowledge Discovery in Databases (KDD) most often refers to the process of mining databases for previously unknown patterns and trends in data. Data mining refers to the particular computational methods or algorithms used in this process. The data mining research field is most related to computational advances in database theory, artificial intelligence and machine learning. This work compiles research summaries from these main research areas in order to provide "a reference work containing the collection of thoughts and ideas of noted researchers from the fields of data mining and data visualization" (p. 8). It addresses these areas in three main sections: the first on data visualization, the second on KDD and model visualization, and the last on using visualization in the knowledge discovery process. The seven chapters of Part One focus upon methodologies and successful techniques from the field of Data Visualization. Hoffman and Grinstein (Chapter 2) give a particularly good overview of the field of data visualization and its potential application to data mining. An introduction to the terminology of data visualization, relation to perceptual and cognitive science, and discussion of the major visualization display techniques are presented. Discussion and illustration explain the usefulness and proper context of such data visualization techniques as scatter plots, 2D and 3D isosurfaces, glyphs, parallel coordinates, and radial coordinate visualizations. Remaining chapters present the need for standardization of visualization methods, discussion of user requirements in the development of tools, and examples of using information visualization in addressing research problems.

Languages

  • e 36
  • d 7
