Search (59 results, page 3 of 3)

  • × theme_ss:"Data Mining"
  • × type_ss:"a"
  • × year_i:[2000 TO 2010}
  1. Wong, M.L.; Leung, K.S.; Cheng, J.C.Y.: Discovering knowledge from noisy databases using genetic programming (2000) 0.00
    0.0022989952 = product of:
      0.006896985 = sum of:
        0.006896985 = weight(_text_:a in 4863) [ClassicSimilarity], result of:
          0.006896985 = score(doc=4863,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.13239266 = fieldWeight in 4863, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4863)
      0.33333334 = coord(1/3)
    
    Abstract
    In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limiting attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present LOGENPRO, a framework that combines genetic programming and inductive logic programming to induce knowledge, represented in various knowledge representation formalisms, from noisy databases. The system has been applied to a real-life medical database; the knowledge discovered provides insights into, and allows a better understanding of, the medical domain.
    Type
    a
  2. Gluck, M.: Multimedia exploratory data analysis for geospatial data mining : the case for augmented seriation (2001) 0.00
    0.0022989952 = product of:
      0.006896985 = sum of:
        0.006896985 = weight(_text_:a in 5214) [ClassicSimilarity], result of:
          0.006896985 = score(doc=5214,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.13239266 = fieldWeight in 5214, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5214)
      0.33333334 = coord(1/3)
    
    Abstract
    To prevent type-one error, statisticians tend to accept the possibility of type-two error, which leads to the rejection of hypotheses later shown to be true. In both Exploratory Data Analysis (EDA) and data mining, the emphasis is more appropriately on the elimination of type-two error. Thus EDA methods, including their visualization tools, may be appropriate for data mining. Seriation creates a matrix of observations and variables, in which each cell contains an icon whose size represents its value, and permits the movement of rows and columns so that patterns can be discerned visually. Augmented Seriation, a method of data mining, adds computer graphics, sound, color, and extra dimensions to the matrix so that the analyst has different modalities for pattern observation. Gluck has developed software for such analysis.
    Type
    a
  3. Dang, X.H.; Ong, K.-L.: Knowledge discovery in data streams (2009) 0.00
    0.0022989952 = product of:
      0.006896985 = sum of:
        0.006896985 = weight(_text_:a in 3829) [ClassicSimilarity], result of:
          0.006896985 = score(doc=3829,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.13239266 = fieldWeight in 3829, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3829)
      0.33333334 = coord(1/3)
    
    Abstract
    Knowing what to do with the massive amount of data collected has long been an issue for many organizations. While data mining has been touted as the solution, it has failed to deliver its full impact despite successes in many areas. One reason is that data mining algorithms were not designed for the real world: they usually assume a static view of the data and a stable execution environment in which resources are abundant. The reality, however, is that data are constantly changing and the execution environment is dynamic. Hence, it becomes difficult for data mining to deliver truly timely and relevant results. Recently, the processing of stream data has received much attention. What is interesting is that the methodology for designing stream-based algorithms may well be the solution to the above problem. In this entry, we discuss this issue and present an overview of recent work.
    Type
    a
  4. Borgelt, C.; Kruse, R.: Unsicheres Wissen nutzen (2002) 0.00
    0.002212209 = product of:
      0.0066366266 = sum of:
        0.0066366266 = weight(_text_:a in 1104) [ClassicSimilarity], result of:
          0.0066366266 = score(doc=1104,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.12739488 = fieldWeight in 1104, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=1104)
      0.33333334 = coord(1/3)
    
    Type
    a
  5. Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.00
    0.002212209 = product of:
      0.0066366266 = sum of:
        0.0066366266 = weight(_text_:a in 597) [ClassicSimilarity], result of:
          0.0066366266 = score(doc=597,freq=8.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.12739488 = fieldWeight in 597, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=597)
      0.33333334 = coord(1/3)
    
    Abstract
    With the overwhelming volume of information, the task of finding relevant information on a given topic on the Web is becoming increasingly difficult. Web search engines have hence become one of the most popular solutions available on the Web. However, it has never been easy for novice users to organize and represent their information needs using simple queries. Users have to keep modifying their input queries until they get the expected results. Therefore, it is often desirable for search engines to suggest related queries to users. Moreover, by identifying those related queries, search engines can perform optimizations on their systems, such as query expansion and file indexing. In this work we propose a method that suggests a list of related queries given an initial input query. The related queries are derived from the query log of queries previously submitted by human users and can be identified using an enhanced model of association rules. Users can utilize the suggested related queries to tune or redirect the search process. Our method not only discovers the related queries, but also ranks them according to the degree of their relatedness. Unlike many other rival techniques, it also performs reasonably well on less frequent input queries. (A toy sketch of this query-log idea appears after the result list below.)
    Type
    a
  6. Baumgartner, R.: Methoden und Werkzeuge zur Webdatenextraktion (2006) 0.00
    0.0021899752 = product of:
      0.0065699257 = sum of:
        0.0065699257 = weight(_text_:a in 5808) [ClassicSimilarity], result of:
          0.0065699257 = score(doc=5808,freq=4.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.12611452 = fieldWeight in 5808, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5808)
      0.33333334 = coord(1/3)
    
    Source
    Semantic Web: Wege zur vernetzten Wissensgesellschaft. Eds.: T. Pellegrini and A. Blumauer
    Type
    a
  7. Lam, W.; Yang, C.C.; Menczer, F.: Introduction to the special topic section on mining Web resources for enhancing information retrieval (2007) 0.00
    0.0021899752 = product of:
      0.0065699257 = sum of:
        0.0065699257 = weight(_text_:a in 600) [ClassicSimilarity], result of:
          0.0065699257 = score(doc=600,freq=4.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.12611452 = fieldWeight in 600, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=600)
      0.33333334 = coord(1/3)
    
    Abstract
    The amount of information on the Web has been expanding at an enormous pace. There is a variety of Web documents in different genres, such as news, reports, and reviews. Traditionally, the information displayed on Web sites has been static. Recently, however, many Web sites have begun offering content that is dynamically generated and frequently updated. It is also common for Web sites to contain information in different languages, since many countries adopt more than one language. Moreover, content may exist in multimedia formats including text, images, video, and audio.
    Type
    a
  8. Cohen, D.J.: From Babel to knowledge : data mining large digital collections (2006) 0.00
    0.0021675134 = product of:
      0.00650254 = sum of:
        0.00650254 = weight(_text_:a in 1178) [ClassicSimilarity], result of:
          0.00650254 = score(doc=1178,freq=12.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.12482099 = fieldWeight in 1178, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=1178)
      0.33333334 = coord(1/3)
    
    Abstract
    In Jorge Luis Borges's curious short story The Library of Babel, the narrator describes an endless collection of books stored from floor to ceiling in a labyrinth of countless hexagonal rooms. The pages of the library's books seem to contain random sequences of letters and spaces; occasionally a few intelligible words emerge in the sea of paper and ink. Nevertheless, readers diligently, and exasperatingly, scan the shelves for coherent passages. The narrator himself has wandered numerous rooms in search of enlightenment, but with resignation he simply awaits his death and burial - which Borges explains (with signature dark humor) consists of being tossed unceremoniously over the library's banister.

    Borges's nightmare, of course, is a cursed vision of the research methods of disciplines such as literature, history, and philosophy, where the careful reading of books, one after the other, is supposed to lead inexorably to knowledge and understanding. Computer scientists would approach Borges's library far differently. Employing the information theory that forms the basis for search engines and other computerized techniques for assessing in one fell swoop large masses of documents, they would quickly realize the collection's incoherence through sampling and statistical methods - and wisely start looking for the library's exit. These computational methods, which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade.

    For the past three years I have been experimenting with how to provide such end-user tools - that is, tools that harness the power of vast electronic collections while hiding much of their complicated technical plumbing. In particular, I have made extensive use of the application programming interfaces (APIs) that the leading search engines provide for programmers to query their databases directly (from server to server without using their web interfaces). In addition, I have explored how one might extract information from large digital collections, from the well-curated lexicographic database WordNet to the democratic (and poorly curated) online reference work Wikipedia.

    While processing these digital corpuses is currently an imperfect science, even now useful tools can be created by combining various collections and methods for searching and analyzing them. And more importantly, these nascent services suggest a future in which information can be gleaned from, and sense can be made out of, even imperfect digital libraries of enormous scale. A brief examination of two approaches to data mining large digital collections hints at this future, while also providing some lessons about how to get there.
    Type
    a
  9. Sperlich, T.: ¬Die Zukunft hat schon begonnen : Visualisierungssoftware in der praktischen Anwendung (2000) 0.00
    0.0017697671 = product of:
      0.0053093014 = sum of:
        0.0053093014 = weight(_text_:a in 5059) [ClassicSimilarity], result of:
          0.0053093014 = score(doc=5059,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.10191591 = fieldWeight in 5059, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=5059)
      0.33333334 = coord(1/3)
    
    Type
    a
  10. Tiefschürfen in Datenbanken (2002) 0.00
    0.0017697671 = product of:
      0.0053093014 = sum of:
        0.0053093014 = weight(_text_:a in 996) [ClassicSimilarity], result of:
          0.0053093014 = score(doc=996,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.10191591 = fieldWeight in 996, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=996)
      0.33333334 = coord(1/3)
    
    Type
    a
  11. Brückner, T.; Dambeck, H.: Sortierautomaten : Grundlagen der Textklassifizierung (2003) 0.00
    0.0017697671 = product of:
      0.0053093014 = sum of:
        0.0053093014 = weight(_text_:a in 2398) [ClassicSimilarity], result of:
          0.0053093014 = score(doc=2398,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.10191591 = fieldWeight in 2398, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2398)
      0.33333334 = coord(1/3)
    
    Type
    a
  12. Schwartz, F.; Fang, Y.C.: Citation data analysis on hydrogeology (2007) 0.00
    0.0017697671 = product of:
      0.0053093014 = sum of:
        0.0053093014 = weight(_text_:a in 433) [ClassicSimilarity], result of:
          0.0053093014 = score(doc=433,freq=8.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.10191591 = fieldWeight in 433, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=433)
      0.33333334 = coord(1/3)
    
    Abstract
    This article explores the status of research in hydrogeology using data mining techniques. First, we explain what citation analysis is and review some of the previous work on it. The main idea in this article is to address some common issues about citation numbers and the use of these data. To validate the use of citation numbers, we compare the citation patterns for Water Resources Research papers in the 1980s with those in the 1990s. The citation growth for highly cited authors from the 1980s is used to examine whether it is possible to predict the citation patterns for highly cited authors in the 1990s. If the citation data prove to be steady and stable, these numbers can then be used to explore the evolution of science in hydrogeology. The famous quotation "If you are not the lead dog, the scenery never changes," attributed to Lee Iacocca, points to the importance of an entrepreneurial spirit in all forms of endeavor. In the case of hydrogeological research, impact analysis makes it clear how important it is to be a pioneer. Statistical correlation coefficients are used to retrieve, from a collection of 2,847 papers published before and after 1991, those sharing the same topics as 273 papers published in 1991 in Water Resources Research. The numbers of papers before and after 1991 are then plotted against various levels of citations for the 1991 papers to compare the distributions of the paper populations before and after that year. The similarity metrics, based on word counts, ensure that the "before" papers are like ancestors and the "after" papers like descendants of the same type of research. This exercise gives us an idea of how many papers appear before and after 1991 (1991 is chosen because the numbers of papers before and after that year are balanced). In addition, the impact of papers is measured in terms of citations, presented as a "percentile," a relative measure based on rankings within one year, in order to minimize the effect of time.
    Type
    a
  13. Schwartz, D.: Graphische Datenanalyse für digitale Bibliotheken : Leistungs- und Funktionsumfang moderner Analyse- und Visualisierungsinstrumente (2006) 0.00
    0.0015485462 = product of:
      0.0046456386 = sum of:
        0.0046456386 = weight(_text_:a in 30) [ClassicSimilarity], result of:
          0.0046456386 = score(doc=30,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.089176424 = fieldWeight in 30, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=30)
      0.33333334 = coord(1/3)
    
    Type
    a
  14. Zhou, L.; Chaovalit, P.: Ontology-supported polarity mining (2008) 0.00
    0.0015485462 = product of:
      0.0046456386 = sum of:
        0.0046456386 = weight(_text_:a in 1343) [ClassicSimilarity], result of:
          0.0046456386 = score(doc=1343,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.089176424 = fieldWeight in 1343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1343)
      0.33333334 = coord(1/3)
    
    Type
    a
  15. Ohly, H.P.: Bibliometric mining : added value from document analysis and retrieval (2008) 0.00
    0.0013273255 = product of:
      0.0039819763 = sum of:
        0.0039819763 = weight(_text_:a in 2386) [ClassicSimilarity], result of:
          0.0039819763 = score(doc=2386,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.07643694 = fieldWeight in 2386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2386)
      0.33333334 = coord(1/3)
    
    Type
    a
  16. Keim, D.A.: Datenvisualisierung und Data Mining (2004) 0.00
    0.0011061045 = product of:
      0.0033183133 = sum of:
        0.0033183133 = weight(_text_:a in 2931) [ClassicSimilarity], result of:
          0.0033183133 = score(doc=2931,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.06369744 = fieldWeight in 2931, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2931)
      0.33333334 = coord(1/3)
    
    Type
    a
  17. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.00
    0.0011061045 = product of:
      0.0033183133 = sum of:
        0.0033183133 = weight(_text_:a in 605) [ClassicSimilarity], result of:
          0.0033183133 = score(doc=605,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.06369744 = fieldWeight in 605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=605)
      0.33333334 = coord(1/3)
    
    Type
    a
  18. Klein, H.: Web Content Mining (2004) 0.00
    0.0008848836 = product of:
      0.0026546507 = sum of:
        0.0026546507 = weight(_text_:a in 3154) [ClassicSimilarity], result of:
          0.0026546507 = score(doc=3154,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.050957955 = fieldWeight in 3154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
      0.33333334 = coord(1/3)
    
    Type
    a
  19. Seidenfaden, U.: Schürfen in Datenbergen : Data-Mining soll möglichst viel Information zu Tage fördern (2001) 0.00
    0.0007742731 = product of:
      0.0023228193 = sum of:
        0.0023228193 = weight(_text_:a in 6923) [ClassicSimilarity], result of:
          0.0023228193 = score(doc=6923,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.044588212 = fieldWeight in 6923, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6923)
      0.33333334 = coord(1/3)
    
    Type
    a
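
Each result above carries a Lucene "explain" tree showing how its relevance score is assembled under ClassicSimilarity (TF-IDF): tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is idf * queryNorm, fieldWeight is tf * idf * fieldNorm, and the final score is queryWeight * fieldWeight scaled by the coordination factor. The following minimal Python sketch reproduces the numbers for result 1 (doc 4863); the helper name is illustrative, not part of Lucene's API.

    import math

    def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
        idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 1.153047 for docFreq=37942, maxDocs=44218
        tf = math.sqrt(freq)                             # 2.4494898 for freq=6.0
        query_weight = idf * query_norm                  # 0.05209492
        field_weight = tf * idf * field_norm             # 0.13239266 in doc 4863
        return query_weight * field_weight * coord       # 0.0022989952

    print(classic_similarity_score(freq=6.0, doc_freq=37942, max_docs=44218,
                                   query_norm=0.045180224, field_norm=0.046875,
                                   coord=1.0 / 3.0))

Running the sketch prints approximately 0.0022989952, matching the first explain tree; the other trees differ only in freq, fieldNorm, and the document concerned.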

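The "enhanced model of association rules" of entry 5 is not spelled out in its abstract, but the underlying idea - mining rules q1 -> q2 from a log of user sessions and ranking suggestions by confidence - can be illustrated with a plain support/confidence baseline. The function and the toy log below are hypothetical illustrations, not the authors' method.

    import itertools
    from collections import Counter

    def related_queries(sessions, query, min_support=2):
        # Rank queries that co-occur with `query` in the same session by
        # confidence(query -> other) = support(query, other) / support(query).
        single, pairs = Counter(), Counter()
        for session in sessions:
            qs = set(session)
            single.update(qs)
            pairs.update(frozenset(p) for p in itertools.combinations(sorted(qs), 2))
        ranked = []
        for pair, count in pairs.items():
            if query in pair and count >= min_support:
                other = next(iter(pair - {query}))
                ranked.append((other, count / single[query]))
        return sorted(ranked, key=lambda item: -item[1])

    # Hypothetical toy log: each inner list is one user session.
    log = [["data mining", "weka"], ["data mining", "clustering", "weka"],
           ["data mining", "clustering"], ["weka", "java"]]
    print(related_queries(log, "data mining"))  # both suggestions at confidence 2/3

A real implementation would, as the abstract notes, also have to handle rare input queries, for which simple co-occurrence counts are too sparse.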
Languages

  • e 43
  • d 16