Search (29 results, page 1 of 2)

  • × theme_ss:"Data Mining"
  • × year_i:[2000 TO 2010}
  1. Budzik, J.; Hammond, K.J.; Birnbaum, L.: Information access in context (2001) 0.06
    0.056041002 = product of:
      0.1401025 = sum of:
        0.1129755 = weight(_text_:context in 3835) [ClassicSimilarity], result of:
          0.1129755 = score(doc=3835,freq=2.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.64109284 = fieldWeight in 3835, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.109375 = fieldNorm(doc=3835)
        0.027127001 = product of:
          0.081381 = sum of:
            0.081381 = weight(_text_:29 in 3835) [ClassicSimilarity], result of:
              0.081381 = score(doc=3835,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.5441145 = fieldWeight in 3835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3835)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Date
    29. 3.2002 17:31:17
  2. Information visualization in data mining and knowledge discovery (2002) 0.03
    0.026763054 = product of:
      0.044605087 = sum of:
        0.022824498 = weight(_text_:context in 1789) [ClassicSimilarity], result of:
          0.022824498 = score(doc=1789,freq=4.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.12952031 = fieldWeight in 1789, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.015625 = fieldNorm(doc=1789)
        0.017940165 = weight(_text_:index in 1789) [ClassicSimilarity], result of:
          0.017940165 = score(doc=1789,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.09655905 = fieldWeight in 1789, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.015625 = fieldNorm(doc=1789)
        0.0038404248 = product of:
          0.011521274 = sum of:
            0.011521274 = weight(_text_:22 in 1789) [ClassicSimilarity], result of:
              0.011521274 = score(doc=1789,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.07738023 = fieldWeight in 1789, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.015625 = fieldNorm(doc=1789)
          0.33333334 = coord(1/3)
      0.6 = coord(3/5)
    
    Date
    23. 3.2008 19:10:22
    Footnote
    Rez. in: JASIST 54(2003) no.9, S.905-906 (C.A. Badurek): "Visual approaches for knowledge discovery in very large databases are a prime research need for information scientists focused an extracting meaningful information from the ever growing stores of data from a variety of domains, including business, the geosciences, and satellite and medical imagery. This work presents a summary of research efforts in the fields of data mining, knowledge discovery, and data visualization with the goal of aiding the integration of research approaches and techniques from these major fields. The editors, leading computer scientists from academia and industry, present a collection of 32 papers from contributors who are incorporating visualization and data mining techniques through academic research as well application development in industry and government agencies. Information Visualization focuses upon techniques to enhance the natural abilities of humans to visually understand data, in particular, large-scale data sets. It is primarily concerned with developing interactive graphical representations to enable users to more intuitively make sense of multidimensional data as part of the data exploration process. It includes research from computer science, psychology, human-computer interaction, statistics, and information science. Knowledge Discovery in Databases (KDD) most often refers to the process of mining databases for previously unknown patterns and trends in data. Data mining refers to the particular computational methods or algorithms used in this process. The data mining research field is most related to computational advances in database theory, artificial intelligence and machine learning. This work compiles research summaries from these main research areas in order to provide "a reference work containing the collection of thoughts and ideas of noted researchers from the fields of data mining and data visualization" (p. 8). It addresses these areas in three main sections: the first an data visualization, the second an KDD and model visualization, and the last an using visualization in the knowledge discovery process. The seven chapters of Part One focus upon methodologies and successful techniques from the field of Data Visualization. Hoffman and Grinstein (Chapter 2) give a particularly good overview of the field of data visualization and its potential application to data mining. An introduction to the terminology of data visualization, relation to perceptual and cognitive science, and discussion of the major visualization display techniques are presented. Discussion and illustration explain the usefulness and proper context of such data visualization techniques as scatter plots, 2D and 3D isosurfaces, glyphs, parallel coordinates, and radial coordinate visualizations. Remaining chapters present the need for standardization of visualization methods, discussion of user requirements in the development of tools, and examples of using information visualization in addressing research problems.
    In 13 chapters, Part Two provides an introduction to KDD, an overview of data mining techniques, and examples of the usefulness of data model visualizations. The importance of visualization throughout the KDD process is stressed in many of the chapters. In particular, the need for measures of visualization effectiveness, benchmarking for identifying best practices, and the use of standardized sample data sets is convincingly presented. Many of the important data mining approaches are discussed in this complementary context. Cluster and outlier detection, classification techniques, and rule discovery algorithms are presented as the basic techniques common to the KDD process. The potential effectiveness of using visualization in the data modeling process are illustrated in chapters focused an using visualization for helping users understand the KDD process, ask questions and form hypotheses about their data, and evaluate the accuracy and veracity of their results. The 11 chapters of Part Three provide an overview of the KDD process and successful approaches to integrating KDD, data mining, and visualization in complementary domains. Rhodes (Chapter 21) begins this section with an excellent overview of the relation between the KDD process and data mining techniques. He states that the "primary goals of data mining are to describe the existing data and to predict the behavior or characteristics of future data of the same type" (p. 281). These goals are met by data mining tasks such as classification, regression, clustering, summarization, dependency modeling, and change or deviation detection. Subsequent chapters demonstrate how visualization can aid users in the interactive process of knowledge discovery by graphically representing the results from these iterative tasks. Finally, examples of the usefulness of integrating visualization and data mining tools in the domain of business, imagery and text mining, and massive data sets are provided. This text concludes with a thorough and useful 17-page index and lengthy yet integrating 17-page summary of the academic and industrial backgrounds of the contributing authors. A 16-page set of color inserts provide a better representation of the visualizations discussed, and a URL provided suggests that readers may view all the book's figures in color on-line, although as of this submission date it only provides access to a summary of the book and its contents. The overall contribution of this work is its focus an bridging two distinct areas of research, making it a valuable addition to the Morgan Kaufmann Series in Database Management Systems. The editors of this text have met their main goal of providing the first textbook integrating knowledge discovery, data mining, and visualization. Although it contributes greatly to our under- standing of the development and current state of the field, a major weakness of this text is that there is no concluding chapter to discuss the contributions of the sum of these contributed papers or give direction to possible future areas of research. "Integration of expertise between two different disciplines is a difficult process of communication and reeducation. Integrating data mining and visualization is particularly complex because each of these fields in itself must draw an a wide range of research experience" (p. 300). Although this work contributes to the crossdisciplinary communication needed to advance visualization in KDD, a more formal call for an interdisciplinary research agenda in a concluding chapter would have provided a more satisfying conclusion to a very good introductory text.
  3. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.01
    0.012685614 = product of:
      0.06342807 = sum of:
        0.06342807 = weight(_text_:index in 5997) [ClassicSimilarity], result of:
          0.06342807 = score(doc=5997,freq=4.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.3413878 = fieldWeight in 5997, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
      0.2 = coord(1/5)
    
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
  4. Peters, G.; Gaese, V.: ¬Das DocCat-System in der Textdokumentation von G+J (2003) 0.01
    0.01052821 = product of:
      0.026320525 = sum of:
        0.018639674 = weight(_text_:system in 1507) [ClassicSimilarity], result of:
          0.018639674 = score(doc=1507,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.13919188 = fieldWeight in 1507, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03125 = fieldNorm(doc=1507)
        0.0076808496 = product of:
          0.023042548 = sum of:
            0.023042548 = weight(_text_:22 in 1507) [ClassicSimilarity], result of:
              0.023042548 = score(doc=1507,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.15476047 = fieldWeight in 1507, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1507)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Date
    22. 4.2003 11:45:36
  5. Chen, C.-C.; Chen, A.-P.: Using data mining technology to provide a recommendation service in the digital library (2007) 0.01
    0.008071217 = product of:
      0.04035608 = sum of:
        0.04035608 = weight(_text_:system in 2533) [ClassicSimilarity], result of:
          0.04035608 = score(doc=2533,freq=6.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.30135927 = fieldWeight in 2533, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2533)
      0.2 = coord(1/5)
    
    Abstract
    Purpose - Since library storage has been increasing day by day, it is difficult for readers to find the books which interest them as well as representative booklists. How to utilize meaningful information effectively to improve the service quality of the digital library appears to be very important. The purpose of this paper is to provide a recommendation system architecture to promote digital library services in electronic libraries. Design/methodology/approach - In the proposed architecture, a two-phase data mining process used by association rule and clustering methods is designed to generate a recommendation system. The process considers not only the relationship of a cluster of users but also the associations among the information accessed. Findings - The process considered not only the relationship of a cluster of users but also the associations among the information accessed. With the advanced filter, the recommendation supported by the proposed system architecture would be closely served to meet users' needs. Originality/value - This paper not only constructs a recommendation service for readers to search books from the web but takes the initiative in finding the most suitable books for readers as well. Furthermore, library managers are expected to purchase core and hot books from a limited budget to maintain and satisfy the requirements of readers along with promoting digital library services.
  6. Wu, T.; Pottenger, W.M.: ¬A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.01
    0.008069678 = product of:
      0.040348392 = sum of:
        0.040348392 = weight(_text_:context in 3237) [ClassicSimilarity], result of:
          0.040348392 = score(doc=3237,freq=2.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.22896172 = fieldWeight in 3237, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3237)
      0.2 = coord(1/5)
    
    Abstract
    In this article we present a semi-supervised active learning algorithm for pattern discovery in information extraction from textual data. The patterns are reduced regular expressions composed of various characteristics of features useful in information extraction. Our major contribution is a semi-supervised learning algorithm that extracts information from a set of examples labeled as relevant or irrelevant to a given attribute. The approach is semi-supervised because it does not require precise labeling of the exact location of features in the training data. This significantly reduces the effort needed to develop a training set. An active learning algorithm is used to assist the semi-supervised learning algorithm to further reduce the training set development effort. The active learning algorithm is seeded with a Single positive example of a given attribute. The context of the seed is used to automatically identify candidates for additional positive examples of the given attribute. Candidate examples are manually pruned during the active learning phase, and our semi-supervised learning algorithm automatically discovers reduced regular expressions for each attribute. We have successfully applied this learning technique in the extraction of textual features from police incident reports, university crime reports, and patents. The performance of our algorithm compares favorably with competitive extraction systems being used in criminal justice information systems.
  7. Haravu, L.J.; Neelameghan, A.: Text mining and data mining in knowledge organization and discovery : the making of knowledge-based products (2003) 0.01
    0.008069678 = product of:
      0.040348392 = sum of:
        0.040348392 = weight(_text_:context in 5653) [ClassicSimilarity], result of:
          0.040348392 = score(doc=5653,freq=2.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.22896172 = fieldWeight in 5653, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5653)
      0.2 = coord(1/5)
    
    Abstract
    Discusses the importance of knowledge organization in the context of the information overload caused by the vast quantities of data and information accessible on internal and external networks of an organization. Defines the characteristics of a knowledge-based product. Elaborates on the techniques and applications of text mining in developing knowledge products. Presents two approaches, as case studies, to the making of knowledge products: (1) steps and processes in the planning, designing and development of a composite multilingual multimedia CD product, with the potential international, inter-cultural end users in view, and (2) application of natural language processing software in text mining. Using a text mining software, it is possible to link concept terms from a processed text to a related thesaurus, glossary, schedules of a classification scheme, and facet structured subject representations. Concludes that the products of text mining and data mining could be made more useful if the features of a faceted scheme for subject classification are incorporated into text mining techniques and products.
  8. Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007) 0.01
    0.007908144 = product of:
      0.03954072 = sum of:
        0.03954072 = weight(_text_:system in 916) [ClassicSimilarity], result of:
          0.03954072 = score(doc=916,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.29527056 = fieldWeight in 916, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=916)
      0.2 = coord(1/5)
    
    Abstract
    In this paper, we present a topic discovery system aimed to reveal the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topic/subtopics, where each topic contains the set of documents that are related to it and a summary extracted from these documents. Summaries so built are useful to browse and select topics of interest from the generated hierarchies. Our proposal consists of a new incremental hierarchical clustering algorithm, which combines both partitional and agglomerative approaches, taking the main benefits from them. Finally, a new summarization method based on Testor Theory has been proposed to build the topic summaries. Experimental results in the TDT2 collection demonstrate its usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.
  9. Medien-Informationsmanagement : Archivarische, dokumentarische, betriebswirtschaftliche, rechtliche und Berufsbild-Aspekte ; [Frühjahrstagung der Fachgruppe 7 im Jahr 2000 in Weimar und Folgetagung 2001 in Köln] (2003) 0.01
    0.007896158 = product of:
      0.019740393 = sum of:
        0.013979756 = weight(_text_:system in 1833) [ClassicSimilarity], result of:
          0.013979756 = score(doc=1833,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.104393914 = fieldWeight in 1833, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1833)
        0.005760637 = product of:
          0.01728191 = sum of:
            0.01728191 = weight(_text_:22 in 1833) [ClassicSimilarity], result of:
              0.01728191 = score(doc=1833,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.116070345 = fieldWeight in 1833, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=1833)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Content
    Enthält u.a. die Beiträge (Dokumentarische Aspekte): Günter Perers/Volker Gaese: Das DocCat-System in der Textdokumentation von Gr+J (Weimar 2000) Thomas Gerick: Finden statt suchen. Knowledge Retrieval in Wissensbanken. Mit organisiertem Wissen zu mehr Erfolg (Weimar 2000) Winfried Gödert: Aufbereitung und Rezeption von Information (Weimar 2000) Elisabeth Damen: Klassifikation als Ordnungssystem im elektronischen Pressearchiv (Köln 2001) Clemens Schlenkrich: Aspekte neuer Regelwerksarbeit - Multimediales Datenmodell für ARD und ZDF (Köln 2001) Josef Wandeler: Comprenez-vous only Bahnhof'? - Mehrsprachigkeit in der Mediendokumentation (Köln 200 1)
    Date
    11. 5.2008 19:49:22
  10. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 0.01
    0.0071760663 = product of:
      0.03588033 = sum of:
        0.03588033 = weight(_text_:index in 2121) [ClassicSimilarity], result of:
          0.03588033 = score(doc=2121,freq=2.0), product of:
            0.18579477 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.04251826 = queryNorm
            0.1931181 = fieldWeight in 2121, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=2121)
      0.2 = coord(1/5)
    
    Abstract
    Natural Language Processing (NLP) techniques have been successfully used to automatically extract information from unstructured text through a detailed analysis of their content, often to satisfy particular information needs. In this paper, an automatic concept map construction technique, Fuzzy Association Concept Mapping (FACM), is proposed for the conversion of abstracted short texts into concept maps. The approach consists of a linguistic module and a recommendation module. The linguistic module is a text mining method that does not require the use to have any prior knowledge about using NLP techniques. It incorporates rule-based reasoning (RBR) and case based reasoning (CBR) for anaphoric resolution. It aims at extracting the propositions in text so as to construct a concept map automatically. The recommendation module is arrived at by adopting fuzzy set theories. It is an interactive process which provides suggestions of propositions for further human refinement of the automatically generated concept maps. The suggested propositions are relationships among the concepts which are not explicitly found in the paragraphs. This technique helps to stimulate individual reflection and generate new knowledge. Evaluation was carried out by using the Science Citation Index (SCI) abstract database and CNET News as test data, which are well known databases and the quality of the text is assured. Experimental results show that the automatically generated concept maps conform to the outputs generated manually by domain experts, since the degree of difference between them is proportionally small. The method provides users with the ability to convert scientific and short texts into a structured format which can be easily processed by computer. Moreover, it provides knowledge workers with extra time to re-think their written text and to view their knowledge from another angle.
  11. Hereth, J.; Stumme, G.; Wille, R.; Wille, U.: Conceptual knowledge discovery and data analysis (2000) 0.01
    0.0065901205 = product of:
      0.032950602 = sum of:
        0.032950602 = weight(_text_:system in 5083) [ClassicSimilarity], result of:
          0.032950602 = score(doc=5083,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.24605882 = fieldWeight in 5083, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5083)
      0.2 = coord(1/5)
    
    Abstract
    In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin
  12. Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.01
    0.0065901205 = product of:
      0.032950602 = sum of:
        0.032950602 = weight(_text_:system in 606) [ClassicSimilarity], result of:
          0.032950602 = score(doc=606,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.24605882 = fieldWeight in 606, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=606)
      0.2 = coord(1/5)
    
    Abstract
    With more and more information available on the Internet, the task of making personalized recommendations to assist the user's navigation has become increasingly important. Considering there might be millions of users with different backgrounds accessing a Web site everyday, it is infeasible to build a separate recommendation system for each user. To address this problem, clustering techniques can first be employed to discover user groups. Then, user navigation patterns for each group can be discovered, to allow the adaptation of a Web site to the interest of each individual group. In this paper, we propose to model user access sequences as stochastic processes, and a mixture of Markov models based approach is taken to cluster users and to capture the sequential relationships inherent in user access histories. Several important issues that arise in constructing the Markov models are also addressed. The first issue lies in the complexity of the mixture of Markov models. To improve the efficiency of building/maintaining the mixture of Markov models, we develop a lightweight adapt-ive algorithm to update the model parameters without recomputing model parameters from scratch. The second issue concerns the proper selection of training data for building the mixture of Markov models. We investigate two different training data selection strategies and perform extensive experiments to compare their effectiveness on a real dataset that is generated by a Web-based knowledge management system, Livelink.
  13. Liu, W.; Weichselbraun, A.; Scharl, A.; Chang, E.: Semi-automatic ontology extension using spreading activation (2005) 0.01
    0.006523886 = product of:
      0.03261943 = sum of:
        0.03261943 = weight(_text_:system in 3028) [ClassicSimilarity], result of:
          0.03261943 = score(doc=3028,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.2435858 = fieldWeight in 3028, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3028)
      0.2 = coord(1/5)
    
    Abstract
    This paper describes a system to semi-automatically extend and refine ontologies by mining textual data from the Web sites of international online media. Expanding a seed ontology creates a semantic network through co-occurrence analysis, trigger phrase analysis, and disambiguation based on the WordNet lexical dictionary. Spreading activation then processes this semantic network to find the most probable candidates for inclusion in an extended ontology. Approaches to identifying hierarchical relationships such as subsumption, head noun analysis and WordNet consultation are used to confirm and classify the found relationships. Using a seed ontology on "climate change" as an example, this paper demonstrates how spreading activation improves the result by naturally integrating the mentioned methods.
  14. Chakrabarti, S.: Mining the Web : discovering knowledge from hypertext data (2003) 0.01
    0.0064557428 = product of:
      0.032278713 = sum of:
        0.032278713 = weight(_text_:context in 2222) [ClassicSimilarity], result of:
          0.032278713 = score(doc=2222,freq=2.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.18316938 = fieldWeight in 2222, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.03125 = fieldNorm(doc=2222)
      0.2 = coord(1/5)
    
    Footnote
    Part I, Infrastructure, has two chapters: Chapter 2 on crawling the Web and Chapter 3 an Web search and information retrieval. The second part of the book, containing chapters 4, 5, and 6, is the centerpiece. This part specifically focuses an machine learning in the context of hypertext. Part III is a collection of applications that utilize the techniques described in earlier chapters. Chapter 7 is an social network analysis. Chapter 8 is an resource discovery. Chapter 9 is an the future of Web mining. Overall, this is a valuable reference book for researchers and developers in the field of Web mining. It should be particularly useful for those who would like to design and probably code their own Computer programs out of the equations and pseudocodes an most of the pages. For a student, the most valuable feature of the book is perhaps the formal and consistent treatments of concepts across the board. For what is behind and beyond the technical details, one has to either dig deeper into the bibliographic notes at the end of each chapter, or resort to more in-depth analysis of relevant subjects in the literature. lf you are looking for successful stories about Web mining or hard-way-learned lessons of failures, this is not the book."
  15. Wong, M.L.; Leung, K.S.; Cheng, J.C.Y.: Discovering knowledge from noisy databases using genetic programming (2000) 0.01
    0.0055919024 = product of:
      0.027959513 = sum of:
        0.027959513 = weight(_text_:system in 4863) [ClassicSimilarity], result of:
          0.027959513 = score(doc=4863,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.20878783 = fieldWeight in 4863, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=4863)
      0.2 = coord(1/5)
    
    Abstract
    In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limiting attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework that combines genetic programming and inductive logic programming to induce knowledge represented in various knowledge representation formalisms from noisy databases (LOGENPRO). Moreover, the system is applied to one real-life medical database. The knowledge discovered provides insights to and allows better understanding of the medical domains
  16. Fenstermacher, K.D.; Ginsburg, M.: Client-side monitoring for Web mining (2003) 0.01
    0.0055919024 = product of:
      0.027959513 = sum of:
        0.027959513 = weight(_text_:system in 1611) [ClassicSimilarity], result of:
          0.027959513 = score(doc=1611,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.20878783 = fieldWeight in 1611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=1611)
      0.2 = coord(1/5)
    
    Abstract
    "Garbage in, garbage out" is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user's actions ever reaches the Web server, analysts must rely an incomplete data. In this paper, we propose a client-side monitoring system that is unobtrusive and supports flexible data collection. Moreover, the proposed framework encompasses client-side applications beyond the Web browser. Expanding monitoring beyond the browser to incorporate standard office productivity tools enables analysts to derive a much richer and more accurate picture of user behavior an the Web.
  17. Sánchez, D.; Chamorro-Martínez, J.; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval (2003) 0.01
    0.0055919024 = product of:
      0.027959513 = sum of:
        0.027959513 = weight(_text_:system in 1067) [ClassicSimilarity], result of:
          0.027959513 = score(doc=1067,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.20878783 = fieldWeight in 1067, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=1067)
      0.2 = coord(1/5)
    
    Abstract
    In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval with the advantage that it is not needed to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture user's subjectivity. For that purpose, fuzzy sets theory is employed to measure user's assessments about the fulfillment of a concept by an image.
  18. Witten, I.H.; Frank, E.: Data Mining : Praktische Werkzeuge und Techniken für das maschinelle Lernen (2000) 0.00
    0.004650343 = product of:
      0.023251716 = sum of:
        0.023251716 = product of:
          0.069755144 = sum of:
            0.069755144 = weight(_text_:29 in 6833) [ClassicSimilarity], result of:
              0.069755144 = score(doc=6833,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.46638384 = fieldWeight in 6833, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6833)
          0.33333334 = coord(1/3)
      0.2 = coord(1/5)
    
    Date
    27. 1.1996 10:29:55
  19. Keim, D.A.: Data Mining mit bloßem Auge (2002) 0.00
    0.004650343 = product of:
      0.023251716 = sum of:
        0.023251716 = product of:
          0.069755144 = sum of:
            0.069755144 = weight(_text_:29 in 1086) [ClassicSimilarity], result of:
              0.069755144 = score(doc=1086,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.46638384 = fieldWeight in 1086, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1086)
          0.33333334 = coord(1/3)
      0.2 = coord(1/5)
    
    Date
    31.12.1996 19:29:41
  20. Kruse, R.; Borgelt, C.: Suche im Datendschungel (2002) 0.00
    0.004650343 = product of:
      0.023251716 = sum of:
        0.023251716 = product of:
          0.069755144 = sum of:
            0.069755144 = weight(_text_:29 in 1087) [ClassicSimilarity], result of:
              0.069755144 = score(doc=1087,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.46638384 = fieldWeight in 1087, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1087)
          0.33333334 = coord(1/3)
      0.2 = coord(1/5)
    
    Date
    31.12.1996 19:29:41

Languages

  • e 18
  • d 11

Types