Search (155 results, page 2 of 8)

  • × theme_ss:"Data Mining"
  1. Liu, W.; Weichselbraun, A.; Scharl, A.; Chang, E.: Semi-automatic ontology extension using spreading activation (2005) 0.00
    0.0031324127 = product of:
      0.0062648254 = sum of:
        0.0062648254 = product of:
          0.012529651 = sum of:
            0.012529651 = weight(_text_:a in 3028) [ClassicSimilarity], result of:
              0.012529651 = score(doc=3028,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.23593865 = fieldWeight in 3028, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3028)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper describes a system to semi-automatically extend and refine ontologies by mining textual data from the Web sites of international online media. Expanding a seed ontology creates a semantic network through co-occurrence analysis, trigger phrase analysis, and disambiguation based on the WordNet lexical dictionary. Spreading activation then processes this semantic network to find the most probable candidates for inclusion in an extended ontology. Approaches to identifying hierarchical relationships such as subsumption, head noun analysis and WordNet consultation are used to confirm and classify the found relationships. Using a seed ontology on "climate change" as an example, this paper demonstrates how spreading activation improves the result by naturally integrating the mentioned methods.
    Type
    a
  2. Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007) 0.00
    0.0030444188 = product of:
      0.0060888375 = sum of:
        0.0060888375 = product of:
          0.012177675 = sum of:
            0.012177675 = weight(_text_:a in 916) [ClassicSimilarity], result of:
              0.012177675 = score(doc=916,freq=18.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22931081 = fieldWeight in 916, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=916)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we present a topic discovery system aimed to reveal the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topic/subtopics, where each topic contains the set of documents that are related to it and a summary extracted from these documents. Summaries so built are useful to browse and select topics of interest from the generated hierarchies. Our proposal consists of a new incremental hierarchical clustering algorithm, which combines both partitional and agglomerative approaches, taking the main benefits from them. Finally, a new summarization method based on Testor Theory has been proposed to build the topic summaries. Experimental results in the TDT2 collection demonstrate its usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.
    Type
    a
  3. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.00
    0.0030444188 = product of:
      0.0060888375 = sum of:
        0.0060888375 = product of:
          0.012177675 = sum of:
            0.012177675 = weight(_text_:a in 4226) [ClassicSimilarity], result of:
              0.012177675 = score(doc=4226,freq=18.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22931081 = fieldWeight in 4226, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4226)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
    Type
    a
  4. Wong, S.K.M.; Butz, C.J.; Xiang, X.: Automated database schema design using mined data dependencies (1998) 0.00
    0.0029000505 = product of:
      0.005800101 = sum of:
        0.005800101 = product of:
          0.011600202 = sum of:
            0.011600202 = weight(_text_:a in 2897) [ClassicSimilarity], result of:
              0.011600202 = score(doc=2897,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21843673 = fieldWeight in 2897, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2897)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Data dependencies are used in database schema design to enforce the correctness of a database as well as to reduce redundant data. These dependencies are usually determined from the semantics of the attributes and are then enforced upon the relations. Describes a bottom-up procedure for discovering multivalued dependencies in observed data without knowing a priori the relationships among the attributes. The proposed algorithm is an application of the technique designed for learning conditional independencies in probabilistic reasoning. A prototype system for automated database schema design has been implemented. Experiments were carried out to demonstrate both the effectiveness and efficiency of the method
    Footnote
    Contribution to a special issue devoted to knowledge discovery and data mining
    Type
    a
  5. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.00
    0.0029000505 = product of:
      0.005800101 = sum of:
        0.005800101 = product of:
          0.011600202 = sum of:
            0.011600202 = weight(_text_:a in 2338) [ClassicSimilarity], result of:
              0.011600202 = score(doc=2338,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21843673 = fieldWeight in 2338, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.
    Type
    a
  6. Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 602) [ClassicSimilarity], result of:
              0.011481222 = score(doc=602,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 602, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=602)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We present an approach to enhancing information access through Web structure mining in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site help characterize (flexible and expressive) interaction paradigms supported by a site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites and present several interface designs that can exploit such FDs to provide compelling user experiences.
    Type
    a
  7. Kraker, P.; Kittel, C,; Enkhbayar, A.: Open Knowledge Maps : creating a visual interface to the world's scientific knowledge based on natural language processing (2016) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 3205) [ClassicSimilarity], result of:
              0.011481222 = score(doc=3205,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 3205, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3205)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The goal of Open Knowledge Maps is to create a visual interface to the world's scientific knowledge. The base for this visual interface consists of so-called knowledge maps, which enable the exploration of existing knowledge and the discovery of new knowledge. Our open source knowledge mapping software applies a mixture of summarization techniques and similarity measures on article metadata, which are iteratively chained together. After processing, the representation is saved in a database for use in a web visualization. In the future, we want to create a space for collective knowledge mapping that brings together individuals and communities involved in exploration and discovery. We want to enable people to guide each other in their discovery by collaboratively annotating and modifying the automatically created maps.
    Type
    a
  8. Maaten, L. van den; Hinton, G.: Visualizing non-metric similarities in multiple maps (2012) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 3884) [ClassicSimilarity], result of:
              0.011481222 = score(doc=3884,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 3884, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3884)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize "central" objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
    Type
    a
  9. Nicholson, S.: Bibliomining for automated collection development in a digital library setting : using data mining to discover Web-based scholarly research works (2003) 0.00
    0.0028047764 = product of:
      0.005609553 = sum of:
        0.005609553 = product of:
          0.011219106 = sum of:
            0.011219106 = weight(_text_:a in 1867) [ClassicSimilarity], result of:
              0.011219106 = score(doc=1867,freq=22.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21126054 = fieldWeight in 1867, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1867)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This research creates an intelligent agent for automated collection development in a digital library setting. It uses a predictive model based an facets of each Web page to select scholarly works. The criteria came from the academic library selection literature, and a Delphi study was used to refine the list to 41 criteria. A Perl program was designed to analyze a Web page for each criterion and applied to a large collection of scholarly and nonscholarly Web pages. Bibliomining, or data mining for libraries, was then used to create different classification models. Four techniques were used: logistic regression, nonparametric discriminant analysis, classification trees, and neural networks. Accuracy and return were used to judge the effectiveness of each model an test datasets. In addition, a set of problematic pages that were difficult to classify because of their similarity to scholarly research was gathered and classified using the models. The resulting models could be used in the selection process to automatically create a digital library of Webbased scholarly research works. In addition, the technique can be extended to create a digital library of any type of structured electronic information.
    Type
    a
  10. Varathan, K.D.; Giachanou, A.; Crestani, F.: Comparative opinion mining : a review (2017) 0.00
    0.0028047764 = product of:
      0.005609553 = sum of:
        0.005609553 = product of:
          0.011219106 = sum of:
            0.011219106 = weight(_text_:a in 3540) [ClassicSimilarity], result of:
              0.011219106 = score(doc=3540,freq=22.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21126054 = fieldWeight in 3540, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3540)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Opinion mining refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyze public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining which deals with identifying and extracting information that is expressed in a comparative form (e.g., "paper X is better than the Y"). Comparative opinion mining plays a very important role when one tries to evaluate something because it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review that cover specifically this topic as all previous reviews dealt mostly with general opinion mining. This survey covers comparative opinion mining from two different angles. One from the perspective of techniques and the other from the perspective of comparative opinion elements. It also incorporates preprocessing tools as well as data set that were used by past researchers that can be useful to future researchers in the field of comparative opinion mining.
    Type
    a
  11. Fayyad, U.M.; Djorgovski, S.G.; Weir, N.: From digitized images to online catalogs : data ming a sky server (1996) 0.00
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = product of:
          0.0108246 = sum of:
            0.0108246 = weight(_text_:a in 6625) [ClassicSimilarity], result of:
              0.0108246 = score(doc=6625,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20383182 = fieldWeight in 6625, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6625)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Offers a data mining approach based on machine learning classification methods to the problem of automated cataloguing of online databases of digital images resulting from sky surveys. The SKICAT system automates the reduction and analysis of 3 terabytes of images expected to contain about 2 billion sky objects. It offers a solution to problems associated with the analysis of large data sets in science
    Type
    a
  12. Chen, Z.: Knowledge discovery and system-user partnership : on a production 'adversarial partnership' approach (1994) 0.00
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = product of:
          0.0108246 = sum of:
            0.0108246 = weight(_text_:a in 6759) [ClassicSimilarity], result of:
              0.0108246 = score(doc=6759,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20383182 = fieldWeight in 6759, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Examines the relationship between systems and users from the knowledge discovery in databases or data mining perspecitives. A comprehensive study on knowledge discovery in human computer symbiosis is needed. Proposes a database-user adversarial partnership, which is general enough to cover knowledge discovery and security of issues related to databases and their users. It can be further generalized into system-user adversarial paertnership. Discusses opportunities provided by knowledge discovery techniques and potential social implications
    Type
    a
  13. Howlett, D.: Digging deep for treasure (1998) 0.00
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = product of:
          0.0108246 = sum of:
            0.0108246 = weight(_text_:a in 4544) [ClassicSimilarity], result of:
              0.0108246 = score(doc=4544,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20383182 = fieldWeight in 4544, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.125 = fieldNorm(doc=4544)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  14. Tunbridge, N.: Semiology put to data mining (1999) 0.00
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = product of:
          0.0108246 = sum of:
            0.0108246 = weight(_text_:a in 6782) [ClassicSimilarity], result of:
              0.0108246 = score(doc=6782,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20383182 = fieldWeight in 6782, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.125 = fieldNorm(doc=6782)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  15. Bath, P.A.: Data mining in health and medical information (2003) 0.00
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = product of:
          0.0108246 = sum of:
            0.0108246 = weight(_text_:a in 4263) [ClassicSimilarity], result of:
              0.0108246 = score(doc=4263,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20383182 = fieldWeight in 4263, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4263)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Data mining (DM) is part of a process by which information can be extracted from data or databases and used to inform decision making in a variety of contexts (Benoit, 2002; Michalski, Bratka & Kubat, 1997). DM includes a range of tools and methods for extractiog information; their use in the commercial sector for knowledge extraction and discovery has been one of the main driving forces in their development (Adriaans & Zantinge, 1996; Benoit, 2002). DM has been developed and applied in numerous areas. This review describes its use in analyzing health and medical information.
    Type
    a
  16. Derek Doran, D.; Gokhale, S.S.: ¬A classification framework for web robots (2012) 0.00
    0.00270615 = product of:
      0.0054123 = sum of:
        0.0054123 = product of:
          0.0108246 = sum of:
            0.0108246 = weight(_text_:a in 505) [ClassicSimilarity], result of:
              0.0108246 = score(doc=505,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20383182 = fieldWeight in 505, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=505)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
    Type
    a
  17. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 2563) [ClassicSimilarity], result of:
              0.010739701 = score(doc=2563,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 2563, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2563)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Topic discovery is an important means for marketing, e-Business and social science studies. As well, it can be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without using manual examination. Given a query, ATD returns with topics associated with the query and top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
    Type
    a
  18. Whittle, M.; Eaglestone, B.; Ford, N.; Gillet, V.J.; Madden, A.: Data mining of search engine logs (2007) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 1330) [ClassicSimilarity], result of:
              0.010739701 = score(doc=1330,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 1330, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1330)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article reports on the development of a novel method for the analysis of Web logs. The method uses techniques that look for similarities between queries and identify sequences of query transformation. It allows sequences of query transformations to be represented as graphical networks, thereby giving a richer view of search behavior than is possible with the usual sequential descriptions. We also perform a basic analysis to study the correlations between observed transformation codes, with results that appear to show evidence of behavior habits. The method was developed using transaction logs from the Excite search engine to provide a tool for an ongoing research project that is endeavoring to develop a greater understanding of Web-based searching by the general public.
    Type
    a
  19. Kulathuramaiyer, N.; Maurer, H.: Implications of emerging data mining (2009) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 3144) [ClassicSimilarity], result of:
              0.010739701 = score(doc=3144,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 3144, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3144)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increases, users and organizations become more willing to compromise privacy. The huge data stores of the 'master miners' allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.
    Source
    Social Semantic Web: Web 2.0, was nun? Hrsg.: A. Blumauer u. T. Pellegrini
    Type
    a
  20. Chen, C.-C.; Chen, A.-P.: Using data mining technology to provide a recommendation service in the digital library (2007) 0.00
    0.0026742492 = product of:
      0.0053484985 = sum of:
        0.0053484985 = product of:
          0.010696997 = sum of:
            0.010696997 = weight(_text_:a in 2533) [ClassicSimilarity], result of:
              0.010696997 = score(doc=2533,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20142901 = fieldWeight in 2533, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2533)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - Since library storage has been increasing day by day, it is difficult for readers to find the books which interest them as well as representative booklists. How to utilize meaningful information effectively to improve the service quality of the digital library appears to be very important. The purpose of this paper is to provide a recommendation system architecture to promote digital library services in electronic libraries. Design/methodology/approach - In the proposed architecture, a two-phase data mining process used by association rule and clustering methods is designed to generate a recommendation system. The process considers not only the relationship of a cluster of users but also the associations among the information accessed. Findings - The process considered not only the relationship of a cluster of users but also the associations among the information accessed. With the advanced filter, the recommendation supported by the proposed system architecture would be closely served to meet users' needs. Originality/value - This paper not only constructs a recommendation service for readers to search books from the web but takes the initiative in finding the most suitable books for readers as well. Furthermore, library managers are expected to purchase core and hot books from a limited budget to maintain and satisfy the requirements of readers along with promoting digital library services.
    Type
    a

Years

Languages

  • e 125
  • d 29
  • sp 1
  • More… Less…

Types

  • a 141
  • el 15
  • m 10
  • s 9
  • More… Less…