Search (7 results, page 1 of 1)

Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.03
```
0.027947066 = product of:
  0.055894133 = sum of:
    0.055894133 = product of:
      0.111788265 = sum of:
        0.111788265 = weight(_text_:y in 602) [ClassicSimilarity], result of:
          0.111788265 = score(doc=602,freq=4.0), product of:
            0.24777827 = queryWeight, product of:
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.05148746 = queryNorm
            0.45116252 = fieldWeight in 602, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.046875 = fieldNorm(doc=602)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
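The score tree above can be re-derived by hand. The following is a minimal sketch, assuming the classic TF-IDF definitions `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which reproduce the printed values to within float rounding. The function names are hypothetical and not part of any library; all numbers are copied from the explanation for doc 602.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it matches the printed 4.8124003 up to float rounding.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)                   # 2.0 for freq=4.0
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm        # printed as 0.24777827
    field_weight = tf * idf * field_norm   # printed as 0.45116252
    return query_weight * field_weight

# Values taken from the explanation for doc 602.
term = classic_term_score(freq=4.0, doc_freq=976, max_docs=44218,
                          query_norm=0.05148746, field_norm=0.046875)

# The two enclosing "product of" nodes each apply coord(1/2).
score = term * 0.5 * 0.5
print(f"term weight = {term:.6f}, final score = {score:.6f}")
```

The two `coord(1/2)` factors halve the term weight twice, which is why the listed score (0.03 after rounding) is a quarter of the raw term weight.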
Abstract

We present an approach to enhancing information access through Web structure mining in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site helps characterize (flexible and expressive) interaction paradigms supported by a site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites and present several interface designs that can exploit such FDs to provide compelling user experiences.

Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.02

```
0.01976156 = product of:
  0.03952312 = sum of:
    0.03952312 = product of:
      0.07904624 = sum of:
        0.07904624 = weight(_text_:y in 2563) [ClassicSimilarity], result of:
          0.07904624 = score(doc=2563,freq=2.0), product of:
            0.24777827 = queryWeight, product of:
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.05148746 = queryNorm
            0.3190201 = fieldWeight in 2563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.046875 = fieldNorm(doc=2563)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.02

```
0.016467968 = product of:
  0.032935936 = sum of:
    0.032935936 = product of:
      0.06587187 = sum of:
        0.06587187 = weight(_text_:y in 606) [ClassicSimilarity], result of:
          0.06587187 = score(doc=606,freq=2.0), product of:
            0.24777827 = queryWeight, product of:
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.05148746 = queryNorm
            0.26585007 = fieldWeight in 606, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.0390625 = fieldNorm(doc=606)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.02

```
0.016467968 = product of:
  0.032935936 = sum of:
    0.032935936 = product of:
      0.06587187 = sum of:
        0.06587187 = weight(_text_:y in 607) [ClassicSimilarity], result of:
          0.06587187 = score(doc=607,freq=2.0), product of:
            0.24777827 = queryWeight, product of:
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.05148746 = queryNorm
            0.26585007 = fieldWeight in 607, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8124003 = idf(docFreq=976, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Information visualization in data mining and knowledge discovery (2002) 0.02
```
0.015335628 = sum of:
  0.008359788 = product of:
    0.033439152 = sum of:
      0.033439152 = weight(_text_:authors in 1789) [ClassicSimilarity], result of:
        0.033439152 = score(doc=1789,freq=4.0), product of:
          0.23472176 = queryWeight, product of:
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.05148746 = queryNorm
          0.14246294 = fieldWeight in 1789, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.558814 = idf(docFreq=1258, maxDocs=44218)
            0.015625 = fieldNorm(doc=1789)
    0.25 = coord(1/4)
  0.00697584 = product of:
    0.01395168 = sum of:
      0.01395168 = weight(_text_:22 in 1789) [ClassicSimilarity], result of:
        0.01395168 = score(doc=1789,freq=2.0), product of:
          0.18030031 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05148746 = queryNorm
          0.07738023 = fieldWeight in 1789, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.015625 = fieldNorm(doc=1789)
    0.5 = coord(1/2)
```
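This explanation differs from the single-term ones in that the top-level node is a sum of two clauses, each scaled by its own coord factor. A minimal sketch of that arithmetic follows, assuming `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`; the helper `clause_score` is a hypothetical name, and the constants are copied from the tree for doc 1789.

```python
import math

# Constants copied from the explanation for doc 1789.
QUERY_NORM = 0.05148746
MAX_DOCS = 44218
FIELD_NORM = 0.015625

def clause_score(freq, doc_freq, coord):
    # One clause: queryWeight * fieldWeight, scaled by that clause's coord.
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))  # assumed idf form
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight * coord

authors_part = clause_score(freq=4.0, doc_freq=1258, coord=0.25)  # _text_:authors
term_22_part = clause_score(freq=2.0, doc_freq=3622, coord=0.5)   # _text_:22
total = authors_part + term_22_part
print(f"{authors_part:.6f} + {term_22_part:.6f} = {total:.6f}")
```

The small `fieldNorm` (0.015625) reflects a long field, which is why a document matching two terms still scores below the shorter single-term matches listed earlier.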
Date

23. 3.2008 19:10:22

Footnote

In 13 chapters, Part Two provides an introduction to KDD, an overview of data mining techniques, and examples of the usefulness of data model visualizations. The importance of visualization throughout the KDD process is stressed in many of the chapters. In particular, the need for measures of visualization effectiveness, benchmarking for identifying best practices, and the use of standardized sample data sets is convincingly presented. Many of the important data mining approaches are discussed in this complementary context. Cluster and outlier detection, classification techniques, and rule discovery algorithms are presented as the basic techniques common to the KDD process. The potential effectiveness of using visualization in the data modeling process is illustrated in chapters focused on using visualization for helping users understand the KDD process, ask questions and form hypotheses about their data, and evaluate the accuracy and veracity of their results. The 11 chapters of Part Three provide an overview of the KDD process and successful approaches to integrating KDD, data mining, and visualization in complementary domains. Rhodes (Chapter 21) begins this section with an excellent overview of the relation between the KDD process and data mining techniques. He states that the "primary goals of data mining are to describe the existing data and to predict the behavior or characteristics of future data of the same type" (p. 281). These goals are met by data mining tasks such as classification, regression, clustering, summarization, dependency modeling, and change or deviation detection. Subsequent chapters demonstrate how visualization can aid users in the interactive process of knowledge discovery by graphically representing the results from these iterative tasks. Finally, examples of the usefulness of integrating visualization and data mining tools in the domain of business, imagery and text mining, and massive data sets are provided.
This text concludes with a thorough and useful 17-page index and lengthy yet integrating 17-page summary of the academic and industrial backgrounds of the contributing authors. A 16-page set of color inserts provides a better representation of the visualizations discussed, and a URL provided suggests that readers may view all the book's figures in color on-line, although as of this submission date it only provides access to a summary of the book and its contents. The overall contribution of this work is its focus on bridging two distinct areas of research, making it a valuable addition to the Morgan Kaufmann Series in Database Management Systems. The editors of this text have met their main goal of providing the first textbook integrating knowledge discovery, data mining, and visualization. Although it contributes greatly to our understanding of the development and current state of the field, a major weakness of this text is that there is no concluding chapter to discuss the contributions of the sum of these contributed papers or give direction to possible future areas of research. "Integration of expertise between two different disciplines is a difficult process of communication and reeducation. Integrating data mining and visualization is particularly complex because each of these fields in itself must draw on a wide range of research experience" (p. 300). Although this work contributes to the crossdisciplinary communication needed to advance visualization in KDD, a more formal call for an interdisciplinary research agenda in a concluding chapter would have provided a more satisfying conclusion to a very good introductory text.
With contributors almost exclusively from the computer science field, the intended audience of this work is heavily slanted towards a computer science perspective. However, it is highly readable and provides introductory material that would be useful to information scientists from a variety of domains. Yet, much interesting work in information visualization from other fields could have been included giving the work more of an interdisciplinary perspective to complement their goals of integrating work in this area. Unfortunately, many of the application chapters are terse, shallow, and lack complementary illustrations of visualization techniques or user interfaces used. However, they do provide insight into the many applications being developed in this rapidly expanding field. The authors have successfully put together a highly useful reference text for the data mining and information visualization communities. Those interested in a good introduction and overview of complementary research areas in these fields will be satisfied with this collection of papers. The focus upon integrating data visualization with data mining complements texts in each of these fields, such as Advances in Knowledge Discovery and Data Mining (Fayyad et al., MIT Press) and Readings in Information Visualization: Using Vision to Think (Card et al., Morgan Kaufmann). This unique work is a good starting point for future interaction between researchers in the fields of data visualization and data mining and makes a good accompaniment for a course focused on integrating these areas or to the main reference texts in these fields."

Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.01
```
0.008866894 = product of:
  0.017733788 = sum of:
    0.017733788 = product of:
      0.07093515 = sum of:
        0.07093515 = weight(_text_:authors in 4242) [ClassicSimilarity], result of:
          0.07093515 = score(doc=4242,freq=2.0), product of:
            0.23472176 = queryWeight, product of:
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.05148746 = queryNorm
            0.30220953 = fieldWeight in 4242, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.046875 = fieldNorm(doc=4242)
      0.25 = coord(1/4)
  0.5 = coord(1/2)
```
Abstract

With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information on the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, Web server logs' information about user access patterns can be used for information personalization or improving Web page design.

Schwartz, F.; Fang, Y.C.: Citation data analysis on hydrogeology (2007) 0.01
```
0.008359788 = product of:
  0.016719576 = sum of:
    0.016719576 = product of:
      0.066878304 = sum of:
        0.066878304 = weight(_text_:authors in 433) [ClassicSimilarity], result of:
          0.066878304 = score(doc=433,freq=4.0), product of:
            0.23472176 = queryWeight, product of:
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.05148746 = queryNorm
            0.28492588 = fieldWeight in 433, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.03125 = fieldNorm(doc=433)
      0.25 = coord(1/4)
  0.5 = coord(1/2)
```
Abstract

This article explores the status of research in hydrogeology using data mining techniques. First we try to explain what citation analysis is and review some of the previous work on citation analysis. The main idea in this article is to address some common issues about citation numbers and the use of these data. To validate the use of citation numbers, we compare the citation patterns for Water Resources Research papers in the 1980s with those in the 1990s. The citation growths for highly cited authors from the 1980s are used to examine whether it is possible to predict the citation patterns for highly-cited authors in the 1990s. If the citation data prove to be steady and stable, these numbers then can be used to explore the evolution of science in hydrogeology. The famous quotation, "If you are not the lead dog, the scenery never changes," attributed to Lee Iacocca, points to the importance of an entrepreneurial spirit in all forms of endeavor. In the case of hydrogeological research, impact analysis makes it clear how important it is to be a pioneer. Statistical correlation coefficients are used to retrieve papers among a collection of 2,847 papers before and after 1991 sharing the same topics with 273 papers in 1991 in Water Resources Research. The numbers of papers before and after 1991 are then plotted against various levels of citations for papers in 1991 to compare the distributions of paper population before and after that year. The similarity metrics based on word counts can ensure that the "before" papers are like ancestors and "after" papers are descendants in the same type of research. This exercise gives us an idea of how many papers are populated before and after 1991 (1991 is chosen based on balanced numbers of papers before and after that year). In addition, the impact of papers is measured in terms of citation presented as "percentile," a relative measure based on rankings in one year, in order to minimize the effect of time.
