Search (133 results, page 1 of 7)

  • theme_ss:"Data Mining"
  1. Information visualization in data mining and knowledge discovery (2002) 0.06
    0.060118683 = product of:
      0.108213626 = sum of:
        0.026920462 = weight(_text_:line in 1789) [ClassicSimilarity], result of:
          0.026920462 = score(doc=1789,freq=2.0), product of:
            0.21724595 = queryWeight, product of:
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.038739666 = queryNorm
            0.123916976 = fieldWeight in 1789, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.015625 = fieldNorm(doc=1789)
        0.009870647 = weight(_text_:information in 1789) [ClassicSimilarity], result of:
          0.009870647 = score(doc=1789,freq=28.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.14514244 = fieldWeight in 1789, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.015625 = fieldNorm(doc=1789)
        0.011077258 = weight(_text_:retrieval in 1789) [ClassicSimilarity], result of:
          0.011077258 = score(doc=1789,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.09452859 = fieldWeight in 1789, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.015625 = fieldNorm(doc=1789)
        0.055096574 = weight(_text_:techniques in 1789) [ClassicSimilarity], result of:
          0.055096574 = score(doc=1789,freq=22.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.32284945 = fieldWeight in 1789, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.015625 = fieldNorm(doc=1789)
        0.00524869 = product of:
          0.01049738 = sum of:
            0.01049738 = weight(_text_:22 in 1789) [ClassicSimilarity], result of:
              0.01049738 = score(doc=1789,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.07738023 = fieldWeight in 1789, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.015625 = fieldNorm(doc=1789)
          0.5 = coord(1/2)
      0.5555556 = coord(5/9)
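    The explanation tree above is Lucene's ClassicSimilarity (TF-IDF) scoring: for each matching term, tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the leaf weight is queryWeight * fieldWeight; the document score is the sum of the leaf weights scaled by the coordination factor coord(matching clauses / total clauses). A minimal Python sketch, using the values from the "line" leaf above, reproduces the numbers:

      import math

      def tf(freq):
          # ClassicSimilarity term-frequency factor
          return math.sqrt(freq)

      def idf(doc_freq, max_docs):
          # ClassicSimilarity inverse-document-frequency factor
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      query_norm = 0.038739666                  # queryNorm from the tree above
      field_norm = 0.015625                     # fieldNorm (encodes field length)

      i = idf(440, 44218)                       # 5.6078424
      query_weight = i * query_norm             # 0.21724595
      field_weight = tf(2.0) * i * field_norm   # 0.123916976
      print(query_weight * field_weight)        # 0.026920462, the "line" weight

      # Document score = sum of leaf weights * coord(5/9):
      # 0.108213626 * (5/9) = 0.060118683, the 0.06 shown beside the title.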
    
    Date
    23. 3.2008 19:10:22
    Footnote
    Review in: JASIST 54(2003) no.9, S.905-906 (C.A. Badurek): "Visual approaches for knowledge discovery in very large databases are a prime research need for information scientists focused on extracting meaningful information from the ever-growing stores of data from a variety of domains, including business, the geosciences, and satellite and medical imagery. This work presents a summary of research efforts in the fields of data mining, knowledge discovery, and data visualization with the goal of aiding the integration of research approaches and techniques from these major fields. The editors, leading computer scientists from academia and industry, present a collection of 32 papers from contributors who are incorporating visualization and data mining techniques through academic research as well as application development in industry and government agencies. Information Visualization focuses upon techniques to enhance the natural abilities of humans to visually understand data, in particular, large-scale data sets. It is primarily concerned with developing interactive graphical representations to enable users to more intuitively make sense of multidimensional data as part of the data exploration process. It includes research from computer science, psychology, human-computer interaction, statistics, and information science. Knowledge Discovery in Databases (KDD) most often refers to the process of mining databases for previously unknown patterns and trends in data. Data mining refers to the particular computational methods or algorithms used in this process. The data mining research field is most related to computational advances in database theory, artificial intelligence and machine learning. This work compiles research summaries from these main research areas in order to provide "a reference work containing the collection of thoughts and ideas of noted researchers from the fields of data mining and data visualization" (p. 8). It addresses these areas in three main sections: the first on data visualization, the second on KDD and model visualization, and the last on using visualization in the knowledge discovery process. The seven chapters of Part One focus upon methodologies and successful techniques from the field of Data Visualization. Hoffman and Grinstein (Chapter 2) give a particularly good overview of the field of data visualization and its potential application to data mining. An introduction to the terminology of data visualization, its relation to perceptual and cognitive science, and a discussion of the major visualization display techniques are presented. Discussion and illustration explain the usefulness and proper context of such data visualization techniques as scatter plots, 2D and 3D isosurfaces, glyphs, parallel coordinates, and radial coordinate visualizations. The remaining chapters present the need for standardization of visualization methods, discussion of user requirements in the development of tools, and examples of using information visualization in addressing research problems.
    In 13 chapters, Part Two provides an introduction to KDD, an overview of data mining techniques, and examples of the usefulness of data model visualizations. The importance of visualization throughout the KDD process is stressed in many of the chapters. In particular, the need for measures of visualization effectiveness, benchmarking for identifying best practices, and the use of standardized sample data sets is convincingly presented. Many of the important data mining approaches are discussed in this complementary context. Cluster and outlier detection, classification techniques, and rule discovery algorithms are presented as the basic techniques common to the KDD process. The potential effectiveness of using visualization in the data modeling process is illustrated in chapters focused on using visualization to help users understand the KDD process, ask questions and form hypotheses about their data, and evaluate the accuracy and veracity of their results. The 11 chapters of Part Three provide an overview of the KDD process and successful approaches to integrating KDD, data mining, and visualization in complementary domains. Rhodes (Chapter 21) begins this section with an excellent overview of the relation between the KDD process and data mining techniques. He states that the "primary goals of data mining are to describe the existing data and to predict the behavior or characteristics of future data of the same type" (p. 281). These goals are met by data mining tasks such as classification, regression, clustering, summarization, dependency modeling, and change or deviation detection. Subsequent chapters demonstrate how visualization can aid users in the interactive process of knowledge discovery by graphically representing the results from these iterative tasks. Finally, examples of the usefulness of integrating visualization and data mining tools in the domains of business, imagery and text mining, and massive data sets are provided. This text concludes with a thorough and useful 17-page index and a lengthy yet integrative 17-page summary of the academic and industrial backgrounds of the contributing authors. A 16-page set of color inserts provides a better representation of the visualizations discussed, and a URL provided suggests that readers may view all the book's figures in color on-line, although as of this submission date it only provides access to a summary of the book and its contents. The overall contribution of this work is its focus on bridging two distinct areas of research, making it a valuable addition to the Morgan Kaufmann Series in Database Management Systems. The editors of this text have met their main goal of providing the first textbook integrating knowledge discovery, data mining, and visualization. Although it contributes greatly to our understanding of the development and current state of the field, a major weakness of this text is that there is no concluding chapter to discuss the contributions of the sum of these contributed papers or to give direction to possible future areas of research. "Integration of expertise between two different disciplines is a difficult process of communication and reeducation. Integrating data mining and visualization is particularly complex because each of these fields in itself must draw on a wide range of research experience" (p. 300). Although this work contributes to the cross-disciplinary communication needed to advance visualization in KDD, a more formal call for an interdisciplinary research agenda in a concluding chapter would have provided a more satisfying conclusion to a very good introductory text.
    With contributors almost exclusively from the computer science field, the intended audience of this work is heavily slanted towards a computer science perspective. However, it is highly readable and provides introductory material that would be useful to information scientists from a variety of domains. Yet much interesting work in information visualization from other fields could have been included, which would have given the work more of an interdisciplinary perspective to complement its goal of integrating work in this area. Unfortunately, many of the application chapters are thin, shallow, and lack complementary illustrations of the visualization techniques or user interfaces used. However, they do provide insight into the many applications being developed in this rapidly expanding field. The authors have successfully put together a highly useful reference text for the data mining and information visualization communities. Those interested in a good introduction and overview of complementary research areas in these fields will be satisfied with this collection of papers. The focus upon integrating data visualization with data mining complements texts in each of these fields, such as Advances in Knowledge Discovery and Data Mining (Fayyad et al., MIT Press) and Readings in Information Visualization: Using Vision to Think (Card et al., Morgan Kaufmann). This unique work is a good starting point for future interaction between researchers in the fields of data visualization and data mining and makes a good accompaniment for a course focused on integrating these areas or to the main reference texts in these fields."
    LCSH
    Information visualization
    RSWK
    Information Retrieval (BVB)
    Subject
    Information Retrieval (BVB)
    Information visualization
  2. Saz, J.T.: Perspectivas en recuperacion y explotacion de informacion electronica : el 'data mining' (1997) 0.05
    0.045138486 = product of:
      0.13541545 = sum of:
        0.013190207 = weight(_text_:information in 3723) [ClassicSimilarity], result of:
          0.013190207 = score(doc=3723,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.19395474 = fieldWeight in 3723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=3723)
        0.03916402 = weight(_text_:retrieval in 3723) [ClassicSimilarity], result of:
          0.03916402 = score(doc=3723,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.33420905 = fieldWeight in 3723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=3723)
        0.08306122 = weight(_text_:techniques in 3723) [ClassicSimilarity], result of:
          0.08306122 = score(doc=3723,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4867139 = fieldWeight in 3723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.078125 = fieldNorm(doc=3723)
      0.33333334 = coord(3/9)
    
    Abstract
    Presents the concept and the techniques identified by the term data mining. Explains the principles and phases of developing a data mining process, and the main types of data mining tools
    Footnote
    Translation of the title: Perspectives on the retrieval and exploitation of electronic information: data mining
  3. Analytische Informationssysteme : Data Warehouse, On-Line Analytical Processing, Data Mining (1999) 0.04
    0.03831773 = product of:
      0.17242979 = sum of:
        0.16319664 = weight(_text_:line in 1381) [ClassicSimilarity], result of:
          0.16319664 = score(doc=1381,freq=6.0), product of:
            0.21724595 = queryWeight, product of:
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.038739666 = queryNorm
            0.7512068 = fieldWeight in 1381, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1381)
        0.009233146 = weight(_text_:information in 1381) [ClassicSimilarity], result of:
          0.009233146 = score(doc=1381,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.13576832 = fieldWeight in 1381, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1381)
      0.22222222 = coord(2/9)
    
    Abstract
    Alongside the operational information systems that support a company's day-to-day business, information systems for the analytical tasks of specialists and managers are increasingly coming to the fore. In almost all companies, terms and concepts such as data warehouse, on-line analytical processing, and data mining are currently being discussed and the corresponding products evaluated. Against this background, this collected volume aims to offer an up-to-date overview of technologies, products, and trends. The contributions from industry and academia can provide practitioners with valuable guidance as a basis for decisions when building and deploying such analytical information systems.
    Content
    Fundamentals.- Data Warehouse.- On-line Analytical Processing.- Data Mining.- Business and strategic aspects.
    Theme
    Information Resources Management
  4. Lihui, C.; Lian, C.W.: Using Web structure and summarisation techniques for Web content mining (2005) 0.04
    0.038022097 = product of:
      0.11406629 = sum of:
        0.011423056 = weight(_text_:information in 1046) [ClassicSimilarity], result of:
          0.011423056 = score(doc=1046,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16796975 = fieldWeight in 1046, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1046)
        0.01958201 = weight(_text_:retrieval in 1046) [ClassicSimilarity], result of:
          0.01958201 = score(doc=1046,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.16710453 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1046)
        0.08306122 = weight(_text_:techniques in 1046) [ClassicSimilarity], result of:
          0.08306122 = score(doc=1046,freq=8.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4867139 = fieldWeight in 1046, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1046)
      0.33333334 = coord(3/9)
    
    Abstract
    The dynamic nature and size of the Internet can result in difficulty finding relevant information. Most users typically express their information need via short queries to search engines and they often have to physically sift through the search results based on relevance ranking set by the search engines, making the process of relevance judgement time-consuming. In this paper, we describe a novel representation technique which makes use of the Web structure together with summarisation techniques to better represent knowledge in actual Web Documents. We name the proposed technique Semantic Virtual Document (SVD). We will discuss how the proposed SVD can be used together with a suitable clustering algorithm to achieve an automatic content-based categorization of similar Web Documents. The auto-categorization facility as well as a "Tree-like" Graphical User Interface (GUI) for post-retrieval document browsing enhances the relevance judgement process for Internet users. Furthermore, we will introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of short queries typically given by users. We will outline our experimental design to evaluate the effectiveness of the proposed SVD for representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.
    Source
    Information processing and management. 41(2005) no.5, S.1225-1242
  5. Analytische Informationssysteme : Data Warehouse, On-Line Analytical Processing, Data Mining (1998) 0.04
    0.036186066 = product of:
      0.1628373 = sum of:
        0.15228513 = weight(_text_:line in 1380) [ClassicSimilarity], result of:
          0.15228513 = score(doc=1380,freq=4.0), product of:
            0.21724595 = queryWeight, product of:
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.038739666 = queryNorm
            0.7009803 = fieldWeight in 1380, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.6078424 = idf(docFreq=440, maxDocs=44218)
              0.0625 = fieldNorm(doc=1380)
        0.010552166 = weight(_text_:information in 1380) [ClassicSimilarity], result of:
          0.010552166 = score(doc=1380,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.1551638 = fieldWeight in 1380, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=1380)
      0.22222222 = coord(2/9)
    
    Abstract
    Alongside the operational information systems, information systems for the analytical tasks of specialists and managers are increasingly coming to the fore. In almost all companies, terms and concepts such as data warehouse, on-line analytical processing, and data mining are currently being discussed and the corresponding products evaluated. Against this background, this collected volume aims to offer an up-to-date overview of technologies, products, and trends. The contributions from industry and academia can provide practitioners with valuable guidance as a basis for decisions when building and deploying such analytical information systems.
    Theme
    Information Resources Management
  6. Sánchez, D.; Chamorro-Martínez, J.; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval (2003) 0.03
    0.034915894 = product of:
      0.104747675 = sum of:
        0.007914125 = weight(_text_:information in 1067) [ClassicSimilarity], result of:
          0.007914125 = score(doc=1067,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.116372846 = fieldWeight in 1067, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1067)
        0.046996824 = weight(_text_:retrieval in 1067) [ClassicSimilarity], result of:
          0.046996824 = score(doc=1067,freq=8.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.40105087 = fieldWeight in 1067, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1067)
        0.049836725 = weight(_text_:techniques in 1067) [ClassicSimilarity], result of:
          0.049836725 = score(doc=1067,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 1067, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=1067)
      0.33333334 = coord(3/9)
    
    Abstract
    In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval, with the advantage that there is no need to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture the user's subjectivity. For that purpose, fuzzy set theory is employed to measure the user's assessments about the fulfillment of a concept by an image.
    Source
    Information processing and management. 39(2003) no.2, S.251-266
  7. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.03
    0.034748282 = product of:
      0.10424484 = sum of:
        0.013707667 = weight(_text_:information in 1326) [ClassicSimilarity], result of:
          0.013707667 = score(doc=1326,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.20156369 = fieldWeight in 1326, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
        0.040700447 = weight(_text_:retrieval in 1326) [ClassicSimilarity], result of:
          0.040700447 = score(doc=1326,freq=6.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.34732026 = fieldWeight in 1326, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
        0.049836725 = weight(_text_:techniques in 1326) [ClassicSimilarity], result of:
          0.049836725 = score(doc=1326,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 1326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
      0.33333334 = coord(3/9)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
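    The paper's context-sensitive model cannot be reconstructed from this abstract alone, but a typical pre-retrieval feature of the kind it combines is the average IDF of the query terms over a candidate collection: the more specific a query looks against a repository, the better it tends to perform there, which is the signal a query router needs. A minimal Python sketch over a hypothetical toy corpus:

      import math

      # Hypothetical toy corpus standing in for one domain-specific repository.
      corpus = [
          "contract law for software licensing",
          "patent litigation and prior art",
          "software patent claims in litigation",
      ]
      docs = [set(d.split()) for d in corpus]

      def idf(term):
          # Smoothed inverse document frequency over the toy collection.
          df = sum(term in d for d in docs)
          return math.log((len(docs) + 1) / (df + 1))

      def avg_idf(query):
          # One pre-retrieval query-performance feature: mean term rarity.
          terms = query.split()
          return sum(idf(t) for t in terms) / len(terms)

      # The more specific query scores higher, suggesting it would route well here:
      print(avg_idf("prior art"), avg_idf("software"))   # ~0.69 vs ~0.29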
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1597-1614
  8. Gaizauskas, R.; Wilks, Y.: Information extraction : beyond document retrieval (1998) 0.03
    0.032965586 = product of:
      0.09889675 = sum of:
        0.01582825 = weight(_text_:information in 4716) [ClassicSimilarity], result of:
          0.01582825 = score(doc=4716,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.23274569 = fieldWeight in 4716, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4716)
        0.033231772 = weight(_text_:retrieval in 4716) [ClassicSimilarity], result of:
          0.033231772 = score(doc=4716,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.2835858 = fieldWeight in 4716, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4716)
        0.049836725 = weight(_text_:techniques in 4716) [ClassicSimilarity], result of:
          0.049836725 = score(doc=4716,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 4716, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=4716)
      0.33333334 = coord(3/9)
    
    Abstract
    In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE), whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s to the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining
  9. Berry, M.W.; Esau, R.; Kiefer, B.: ¬The use of text mining techniques in electronic discovery for legal matters (2012) 0.03
    0.031420253 = product of:
      0.09426076 = sum of:
        0.011192262 = weight(_text_:information in 91) [ClassicSimilarity], result of:
          0.011192262 = score(doc=91,freq=4.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16457605 = fieldWeight in 91, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=91)
        0.033231772 = weight(_text_:retrieval in 91) [ClassicSimilarity], result of:
          0.033231772 = score(doc=91,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.2835858 = fieldWeight in 91, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=91)
        0.049836725 = weight(_text_:techniques in 91) [ClassicSimilarity], result of:
          0.049836725 = score(doc=91,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 91, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=91)
      0.33333334 = coord(3/9)
    
    Abstract
    Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.
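    Of the tools named above, non-negative matrix factorization is the easiest to illustrate: factoring a TF-IDF document-term matrix into non-negative document-topic and topic-term matrices groups documents by latent theme, which is how such methods help triage a collection for relevance review. A minimal scikit-learn sketch over a hypothetical four-document collection (not the chapter's own data):

      from sklearn.decomposition import NMF
      from sklearn.feature_extraction.text import TfidfVectorizer

      # Hypothetical miniature collection; real eDiscovery corpora are far larger.
      docs = [
          "merger agreement draft attached for review",
          "please review the attached merger terms",
          "team lunch scheduled for friday",
          "friday lunch moved to noon",
      ]

      X = TfidfVectorizer(stop_words="english").fit_transform(docs)
      W = NMF(n_components=2, random_state=0).fit_transform(X)  # doc-topic weights

      for doc, weights in zip(docs, W):
          print(weights.argmax(), doc)   # dominant latent topic per document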
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis, u.a
  10. Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: ¬A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.03
    0.030327542 = product of:
      0.09098262 = sum of:
        0.007914125 = weight(_text_:information in 521) [ClassicSimilarity], result of:
          0.007914125 = score(doc=521,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.116372846 = fieldWeight in 521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=521)
        0.033231772 = weight(_text_:retrieval in 521) [ClassicSimilarity], result of:
          0.033231772 = score(doc=521,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.2835858 = fieldWeight in 521, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=521)
        0.049836725 = weight(_text_:techniques in 521) [ClassicSimilarity], result of:
          0.049836725 = score(doc=521,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=521)
      0.33333334 = coord(3/9)
    
    Abstract
    Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although some previous studies focused on assisting professionals in the retrieval of related legal documents, they did not take into account the general public and their difficulty in describing legal problems in professional legal terms. Because this problem has not been addressed by previous research, this study aims to design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. The experimental results indicate that our method can help the general public, who are not familiar with professional legal terms, to acquire relevant criminal judgments more accurately and effectively.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.2, S.280-290
  11. Haravu, L.J.; Neelameghan, A.: Text mining and data mining in knowledge organization and discovery : the making of knowledge-based products (2003) 0.03
    0.029912738 = product of:
      0.08973821 = sum of:
        0.011423056 = weight(_text_:information in 5653) [ClassicSimilarity], result of:
          0.011423056 = score(doc=5653,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16796975 = fieldWeight in 5653, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5653)
        0.01958201 = weight(_text_:retrieval in 5653) [ClassicSimilarity], result of:
          0.01958201 = score(doc=5653,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.16710453 = fieldWeight in 5653, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5653)
        0.058733147 = weight(_text_:techniques in 5653) [ClassicSimilarity], result of:
          0.058733147 = score(doc=5653,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.34415868 = fieldWeight in 5653, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5653)
      0.33333334 = coord(3/9)
    
    Abstract
    Discusses the importance of knowledge organization in the context of the information overload caused by the vast quantities of data and information accessible on internal and external networks of an organization. Defines the characteristics of a knowledge-based product. Elaborates on the techniques and applications of text mining in developing knowledge products. Presents two approaches, as case studies, to the making of knowledge products: (1) steps and processes in the planning, designing and development of a composite multilingual multimedia CD product, with the potential international, inter-cultural end users in view, and (2) application of natural language processing software in text mining. Using text mining software, it is possible to link concept terms from a processed text to a related thesaurus, glossary, schedules of a classification scheme, and facet structured subject representations. Concludes that the products of text mining and data mining could be made more useful if the features of a faceted scheme for subject classification are incorporated into text mining techniques and products.
    Content
    Contribution to a special issue on "Knowledge organization and classification in international information retrieval"
  12. KDD : techniques and applications (1998) 0.03
    0.02914791 = product of:
      0.1311656 = sum of:
        0.09967345 = weight(_text_:techniques in 6783) [ClassicSimilarity], result of:
          0.09967345 = score(doc=6783,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.5840566 = fieldWeight in 6783, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.09375 = fieldNorm(doc=6783)
        0.03149214 = product of:
          0.06298428 = sum of:
            0.06298428 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.06298428 = score(doc=6783,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
  13. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 0.03
    0.027222618 = product of:
      0.081667855 = sum of:
        0.009138444 = weight(_text_:information in 2121) [ClassicSimilarity], result of:
          0.009138444 = score(doc=2121,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.1343758 = fieldWeight in 2121, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2121)
        0.046986517 = weight(_text_:techniques in 2121) [ClassicSimilarity], result of:
          0.046986517 = score(doc=2121,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.27532694 = fieldWeight in 2121, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.03125 = fieldNorm(doc=2121)
        0.025542893 = product of:
          0.051085785 = sum of:
            0.051085785 = weight(_text_:theories in 2121) [ClassicSimilarity], result of:
              0.051085785 = score(doc=2121,freq=2.0), product of:
                0.21161452 = queryWeight, product of:
                  5.4624767 = idf(docFreq=509, maxDocs=44218)
                  0.038739666 = queryNorm
                0.24140964 = fieldWeight in 2121, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4624767 = idf(docFreq=509, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2121)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    Natural Language Processing (NLP) techniques have been successfully used to automatically extract information from unstructured text through a detailed analysis of their content, often to satisfy particular information needs. In this paper, an automatic concept map construction technique, Fuzzy Association Concept Mapping (FACM), is proposed for the conversion of abstracted short texts into concept maps. The approach consists of a linguistic module and a recommendation module. The linguistic module is a text mining method that does not require the user to have any prior knowledge of NLP techniques. It incorporates rule-based reasoning (RBR) and case-based reasoning (CBR) for anaphoric resolution. It aims at extracting the propositions in text so as to construct a concept map automatically. The recommendation module is realized by adopting fuzzy set theory. It is an interactive process which provides suggestions of propositions for further human refinement of the automatically generated concept maps. The suggested propositions are relationships among the concepts which are not explicitly found in the paragraphs. This technique helps to stimulate individual reflection and generate new knowledge. Evaluation was carried out by using the Science Citation Index (SCI) abstract database and CNET News as test data, which are well-known databases whose text quality is assured. Experimental results show that the automatically generated concept maps conform to the outputs generated manually by domain experts, since the degree of difference between them is proportionally small. The method provides users with the ability to convert short scientific texts into a structured format which can be easily processed by computer. Moreover, it provides knowledge workers with extra time to re-think their written text and to view their knowledge from another angle.
    Source
    Information processing and management. 44(2008) no.5, S.1707-1719
  14. Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.03
    0.025286574 = product of:
      0.07585972 = sum of:
        0.0147471 = weight(_text_:information in 597) [ClassicSimilarity], result of:
          0.0147471 = score(doc=597,freq=10.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.21684799 = fieldWeight in 597, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=597)
        0.01958201 = weight(_text_:retrieval in 597) [ClassicSimilarity], result of:
          0.01958201 = score(doc=597,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.16710453 = fieldWeight in 597, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=597)
        0.04153061 = weight(_text_:techniques in 597) [ClassicSimilarity], result of:
          0.04153061 = score(doc=597,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.24335694 = fieldWeight in 597, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=597)
      0.33333334 = coord(3/9)
    
    Abstract
    With the overwhelming volume of information, the task of finding relevant information on a given topic on the Web is becoming increasingly difficult. Search engines have hence become one of the most popular solutions available on the Web. However, it has never been easy for novice users to organize and represent their information needs using simple queries. Users have to keep modifying their input queries until they get the expected results. Therefore, it is often desirable for search engines to give suggestions on related queries to users. Moreover, by identifying these related queries, search engines can potentially perform optimizations on their systems, such as query expansion and file indexing. In this work we propose a method that suggests a list of related queries given an initial input query. The related queries are drawn from the query log of queries previously submitted by human users and are identified using an enhanced model of association rules. Users can utilize the suggested related queries to tune or redirect the search process. Our method not only discovers the related queries, but also ranks them according to the degree of their relatedness. Unlike many other rival techniques, it also performs reasonably well on less frequent input queries.
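    The enhanced association-rule model is not spelled out in this abstract, but the baseline it improves on is straightforward: treat each user session in the query log as a transaction, count how often query pairs co-occur, and rank a query's candidates by confidence. A minimal Python sketch over hypothetical sessions:

      from collections import Counter
      from itertools import combinations

      # Hypothetical example sessions; a real query log has millions of these.
      sessions = [
          {"data mining", "knowledge discovery", "clustering"},
          {"data mining", "clustering"},
          {"data mining", "knowledge discovery"},
      ]

      query_support = Counter()
      pair_support = Counter()
      for s in sessions:
          query_support.update(s)
          pair_support.update(frozenset(p) for p in combinations(sorted(s), 2))

      def related(query, min_support=2):
          # Rank queries co-occurring with `query` by confidence = sup(pair)/sup(query).
          scored = []
          for pair, sup in pair_support.items():
              if query in pair and sup >= min_support:
                  other = next(q for q in pair if q != query)
                  scored.append((other, sup / query_support[query]))
          return sorted(scored, key=lambda x: -x[1])

      print(related("data mining"))   # both related queries at confidence 2/3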
    Footnote
    Contribution to a special issue on "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1871-1883
  15. Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.02
    0.02417856 = product of:
      0.07253568 = sum of:
        0.011423056 = weight(_text_:information in 606) [ClassicSimilarity], result of:
          0.011423056 = score(doc=606,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16796975 = fieldWeight in 606, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=606)
        0.01958201 = weight(_text_:retrieval in 606) [ClassicSimilarity], result of:
          0.01958201 = score(doc=606,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.16710453 = fieldWeight in 606, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=606)
        0.04153061 = weight(_text_:techniques in 606) [ClassicSimilarity], result of:
          0.04153061 = score(doc=606,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.24335694 = fieldWeight in 606, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=606)
      0.33333334 = coord(3/9)
    
    Abstract
    With more and more information available on the Internet, the task of making personalized recommendations to assist the user's navigation has become increasingly important. Considering there might be millions of users with different backgrounds accessing a Web site every day, it is infeasible to build a separate recommendation system for each user. To address this problem, clustering techniques can first be employed to discover user groups. Then, user navigation patterns for each group can be discovered, to allow the adaptation of a Web site to the interest of each individual group. In this paper, we propose to model user access sequences as stochastic processes, and an approach based on a mixture of Markov models is taken to cluster users and to capture the sequential relationships inherent in user access histories. Several important issues that arise in constructing the Markov models are also addressed. The first issue lies in the complexity of the mixture of Markov models. To improve the efficiency of building/maintaining the mixture of Markov models, we develop a lightweight adaptive algorithm to update the model parameters without recomputing model parameters from scratch. The second issue concerns the proper selection of training data for building the mixture of Markov models. We investigate two different training data selection strategies and perform extensive experiments to compare their effectiveness on a real dataset that is generated by a Web-based knowledge management system, Livelink.
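    The full method (a mixture of Markov models fitted per user cluster, with lightweight incremental updates) is beyond a short sketch, but its building block, a first-order Markov chain over page visits, is easy to show: estimate transition probabilities from observed sequences and recommend the most likely next pages. A minimal Python sketch over hypothetical access sequences:

      from collections import defaultdict

      # Hypothetical access sequences; the paper clusters users and fits one
      # Markov model per cluster, which this single-chain sketch omits.
      sequences = [
          ["home", "docs", "search", "docs"],
          ["home", "search", "docs"],
          ["home", "docs", "docs"],
      ]

      counts = defaultdict(lambda: defaultdict(int))
      for seq in sequences:
          for cur, nxt in zip(seq, seq[1:]):
              counts[cur][nxt] += 1

      def recommend(page, k=2):
          # Maximum-likelihood estimate of the k most probable next pages.
          total = sum(counts[page].values())
          ranked = sorted(counts[page].items(), key=lambda kv: -kv[1])
          return [(nxt, c / total) for nxt, c in ranked[:k]]

      print(recommend("home"))   # [('docs', 0.667), ('search', 0.333)]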
    Footnote
    Contribution to a special issue on "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1851-1870
  16. Chakrabarti, S.: Mining the Web : discovering knowledge from hypertext data (2003) 0.02
    0.02393019 = product of:
      0.07179057 = sum of:
        0.009138444 = weight(_text_:information in 2222) [ClassicSimilarity], result of:
          0.009138444 = score(doc=2222,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.1343758 = fieldWeight in 2222, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2222)
        0.015665608 = weight(_text_:retrieval in 2222) [ClassicSimilarity], result of:
          0.015665608 = score(doc=2222,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.13368362 = fieldWeight in 2222, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=2222)
        0.046986517 = weight(_text_:techniques in 2222) [ClassicSimilarity], result of:
          0.046986517 = score(doc=2222,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.27532694 = fieldWeight in 2222, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.03125 = fieldNorm(doc=2222)
      0.33333334 = coord(3/9)
    
    Footnote
    Review in: JASIST 55(2004) no.3, S.275-276 (C. Chen): "This is a book about finding significant statistical patterns on the Web - in particular, patterns that are associated with hypertext documents, topics, hyperlinks, and queries. The term pattern in this book refers to dependencies among such items. On the one hand, the Web contains useful information on just about every topic under the sun. On the other hand, just like searching for a needle in a haystack, one would need powerful tools to locate useful information on the vast land of the Web. Soumen Chakrabarti's book focuses on a wide range of techniques for machine learning and data mining on the Web. The goal of the book is to provide both the technical background and the tools and tricks of the trade of Web content mining. Much of the technical content reflects the state of the art between 1995 and 2002. The targeted audience is researchers and innovative developers in this area, as well as newcomers who intend to enter this area. The book begins with an introduction chapter. The introduction chapter explains fundamental concepts such as crawling and indexing as well as clustering and classification. The remaining eight chapters are organized into three parts: i) infrastructure, ii) learning and iii) applications.
    Part I, Infrastructure, has two chapters: Chapter 2 on crawling the Web and Chapter 3 on Web search and information retrieval. The second part of the book, containing chapters 4, 5, and 6, is the centerpiece. This part specifically focuses on machine learning in the context of hypertext. Part III is a collection of applications that utilize the techniques described in earlier chapters. Chapter 7 is on social network analysis. Chapter 8 is on resource discovery. Chapter 9 is on the future of Web mining. Overall, this is a valuable reference book for researchers and developers in the field of Web mining. It should be particularly useful for those who would like to design and probably code their own computer programs out of the equations and pseudocode on most of the pages. For a student, the most valuable feature of the book is perhaps the formal and consistent treatment of concepts across the board. For what is behind and beyond the technical details, one has to either dig deeper into the bibliographic notes at the end of each chapter, or resort to more in-depth analysis of relevant subjects in the literature. If you are looking for success stories about Web mining or lessons learned the hard way from failures, this is not the book."
  17. Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.02
    0.02393019 = product of:
      0.07179057 = sum of:
        0.009138444 = weight(_text_:information in 354) [ClassicSimilarity], result of:
          0.009138444 = score(doc=354,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.1343758 = fieldWeight in 354, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=354)
        0.015665608 = weight(_text_:retrieval in 354) [ClassicSimilarity], result of:
          0.015665608 = score(doc=354,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.13368362 = fieldWeight in 354, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=354)
        0.046986517 = weight(_text_:techniques in 354) [ClassicSimilarity], result of:
          0.046986517 = score(doc=354,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.27532694 = fieldWeight in 354, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.03125 = fieldNorm(doc=354)
      0.33333334 = coord(3/9)
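    The trace above is Lucene ClassicSimilarity explain output: each term's contribution is queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm with tf = sqrt(freq). A minimal Python sketch, using only the freq, idf, queryNorm, and fieldNorm values printed in the trace, reproduces the retrieval-term line for doc 354:

      import math

      # Values copied from the explain trace for weight(_text_:retrieval in 354).
      freq = 2.0               # term frequency of "retrieval" in doc 354
      idf = 3.024915           # from docFreq=5836, maxDocs=44218
      query_norm = 0.038739666
      field_norm = 0.03125

      tf = math.sqrt(freq)                  # 1.4142135
      query_weight = idf * query_norm       # 0.1171842
      field_weight = tf * idf * field_norm  # 0.13368362
      print(query_weight * field_weight)    # 0.015665608, as shown above

    The document score then sums the three term contributions (0.009138 + 0.015666 + 0.046987 = 0.071791) and multiplies by coord(3/9) = 0.33333334, the fraction of query terms matched, giving the displayed 0.02393019.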
    
    Abstract
    Web mining aims to discover useful information and knowledge from the Web hyperlink structure, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semistructured and unstructured nature of the Web data and its heterogeneity. It has also developed many of its own algorithms and techniques. Liu has written a comprehensive text on Web data mining. Key topics of structure mining, content mining, and usage mining are covered both in breadth and in depth. His book brings together all the essential concepts and algorithms from related areas such as data mining, machine learning, and text processing to form an authoritative and coherent text. The book offers a rich blend of theory and practice, addressing seminal research ideas, as well as examining the technology from a practical point of view. It is suitable for students, researchers and practitioners interested in Web mining both as a learning text and a reference book. Lecturers can readily use it for classes on data mining, Web mining, and Web search. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
    Content
    Contents: 1. Introduction 2. Association Rules and Sequential Patterns 3. Supervised Learning 4. Unsupervised Learning 5. Partially Supervised Learning 6. Information Retrieval and Web Search 7. Social Network Analysis 8. Web Crawling 9. Structured Data Extraction: Wrapper Generation 10. Information Integration
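    As an illustration of the hyperlink-structure mining the book covers (e.g., in the social network analysis chapter), the following is a minimal PageRank power-iteration sketch in Python; the toy graph and the damping factor 0.85 are illustrative assumptions, not material from the book:

      def pagerank(links, damping=0.85, iters=50):
          """links: page -> set of pages it links to."""
          pages = list(links)
          n = len(pages)
          rank = {p: 1.0 / n for p in pages}
          for _ in range(iters):
              # Each page receives a share of the rank of its in-neighbors.
              rank = {p: (1 - damping) / n
                         + damping * sum(rank[q] / len(links[q])
                                         for q in pages if p in links[q])
                      for p in pages}
          return rank

      # Toy Web graph: every page has an outlink, so there are no dangling nodes.
      print(pagerank({"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}))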
  18. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.02
    0.018057778 = product of:
      0.08126 = sum of:
        0.009326885 = weight(_text_:information in 3682) [ClassicSimilarity], result of:
          0.009326885 = score(doc=3682,freq=4.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.13714671 = fieldWeight in 3682, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
        0.07193312 = weight(_text_:techniques in 3682) [ClassicSimilarity], result of:
          0.07193312 = score(doc=3682,freq=6.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.42150658 = fieldWeight in 3682, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
      0.22222222 = coord(2/9)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.7, S.1671-1686
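    As a rough sketch of the paper's entry-level techniques, keyword-in-context (KWIC) listing and word-frequency counting can be done in a few lines of Python; the sample text and the four-word context window are illustrative assumptions, not material from the REF corpus:

      import re
      from collections import Counter

      def kwic(text, keyword, window=4):
          """Show each occurrence of keyword with `window` words of context."""
          words = re.findall(r"\w+", text.lower())
          return [" ".join(words[max(0, i - window):i + window + 1])
                  for i, w in enumerate(words) if w == keyword]

      text = ("The impact of the research was measured by citation counts. "
              "Research impact also reached policy makers, whose reports "
              "cite the underlying research.")
      print(Counter(re.findall(r"\w+", text.lower())).most_common(3))
      for line in kwic(text, "impact"):
          print(line)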
  19. Frické, M.: Big data and its epistemology (2015) 0.02
    0.016505891 = product of:
      0.074276514 = sum of:
        0.007914125 = weight(_text_:information in 1811) [ClassicSimilarity], result of:
          0.007914125 = score(doc=1811,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.116372846 = fieldWeight in 1811, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1811)
        0.06636239 = product of:
          0.13272478 = sum of:
            0.13272478 = weight(_text_:theories in 1811) [ClassicSimilarity], result of:
              0.13272478 = score(doc=1811,freq=6.0), product of:
                0.21161452 = queryWeight, product of:
                  5.4624767 = idf(docFreq=509, maxDocs=44218)
                  0.038739666 = queryNorm
                0.6272007 = fieldWeight in 1811, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.4624767 = idf(docFreq=509, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1811)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    The article considers whether Big Data, in the form of data-driven science, will enable the discovery, or appraisal, of universal scientific theories, instrumentalist tools, or inductive inferences. It points out, initially, that such aspirations are similar to the now-discredited inductivist approach to science. On the positive side, Big Data may permit larger sample sizes, cheaper and more extensive testing of theories, and the continuous assessment of theories. On the negative side, data-driven science encourages passive data collection, as opposed to experimentation and testing, and hornswoggling ("unsound statistical fiddling"). The roles of theory and data in inductive algorithms, statistical modeling, and scientific discoveries are analyzed, and it is argued that theory is needed at every turn. Data-driven science is a chimera.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.4, S.651-661
  20. Zhou, L.; Chaovalit, P.: Ontology-supported polarity mining (2008) 0.02
    0.016474472 = product of:
      0.074135125 = sum of:
        0.015992278 = weight(_text_:information in 1343) [ClassicSimilarity], result of:
          0.015992278 = score(doc=1343,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.23515764 = fieldWeight in 1343, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1343)
        0.05814285 = weight(_text_:techniques in 1343) [ClassicSimilarity], result of:
          0.05814285 = score(doc=1343,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.3406997 = fieldWeight in 1343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1343)
      0.22222222 = coord(2/9)
    
    Abstract
    Polarity mining provides an in-depth analysis of semantic orientations of text information. Motivated by its success in the area of topic mining, we propose an ontology-supported polarity mining (OSPM) approach. The approach aims to enhance polarity mining with ontology by providing detailed topic-specific information. OSPM was evaluated in the movie review domain using both supervised and unsupervised techniques. Results revealed that OSPM outperformed the baseline method without ontology support. The findings of this study not only advance the state of polarity mining research but also shed light on future research directions.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.1, S.98-110
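    The record does not spell out the OSPM algorithm itself, but the unsupervised baseline this kind of work is measured against, lexicon-based polarity scoring, can be sketched as follows; the tiny sentiment lexicon and the sample review are illustrative assumptions:

      # Minimal lexicon-based polarity scoring: sum per-word sentiment
      # scores and take the sign as the semantic orientation.
      POLARITY = {"great": 1, "moving": 1, "brilliant": 1,
                  "dull": -1, "predictable": -1, "weak": -1}

      def polarity(review):
          score = sum(POLARITY.get(w.strip(".,!?").lower(), 0)
                      for w in review.split())
          return ("positive" if score > 0
                  else "negative" if score < 0 else "neutral")

      print(polarity("A brilliant, moving film despite a weak final act."))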

Languages

  • e 116
  • d 16
  • sp 1

Types

  • a 114
  • m 15
  • s 13
  • el 9
  • x 1