Search (28 results, page 1 of 2)

Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.05

0.048416097 = product of:
  0.09683219 = sum of:
    0.09683219 = sum of:
      0.009471525 = weight(_text_:a in 4577) [ClassicSimilarity], result of:
        0.009471525 = score(doc=4577,freq=2.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.17835285 = fieldWeight in 4577, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.109375 = fieldNorm(doc=4577)
      0.087360665 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
        0.087360665 = score(doc=4577,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.5416616 = fieldWeight in 4577, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.109375 = fieldNorm(doc=4577)
  0.5 = coord(1/2)

Date: 2. 4.2000 18:01:22
Type: a
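
The score breakdown for this record follows the tf-idf pattern of Lucene's ClassicSimilarity, which the explain output names. The following is a minimal sketch that reproduces the arithmetic for doc 4577, not the search engine's actual code. The function names are hypothetical, `queryNorm` is copied from the output rather than recomputed, and the idf formula `1 + ln(maxDocs / (docFreq + 1))` is an assumption that happens to match the idf values shown.

```python
import math

QUERY_NORM = 0.046056706  # copied from the explain output, not recomputed
MAX_DOCS = 44218          # maxDocs as reported in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; assumed form, consistent with the values shown."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values for doc 4577, read off the explain tree above.
score_a = term_score(freq=2.0, doc_freq=37942, field_norm=0.109375)
score_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.109375)

# The two term scores are summed, then scaled by coord(1/2) = 0.5.
total = 0.5 * (score_a + score_22)
print(round(total, 6))
```

Because idf enters both queryWeight and fieldWeight, the rare term `22` contributes roughly nine times as much as the common term `a`, even though both occur twice in the record.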

KDD : techniques and applications (1998) 0.04

0.04149951 = product of:
  0.08299902 = sum of:
    0.08299902 = sum of:
      0.008118451 = weight(_text_:a in 6783) [ClassicSimilarity], result of:
        0.008118451 = score(doc=6783,freq=2.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.15287387 = fieldWeight in 6783, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.09375 = fieldNorm(doc=6783)
      0.07488057 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
        0.07488057 = score(doc=6783,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.46428138 = fieldWeight in 6783, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.09375 = fieldNorm(doc=6783)
  0.5 = coord(1/2)

Footnote: A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997

Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.03

0.031588875 = product of:
  0.06317775 = sum of:
    0.06317775 = sum of:
      0.013257373 = weight(_text_:a in 1270) [ClassicSimilarity], result of:
        0.013257373 = score(doc=1270,freq=12.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.24964198 = fieldWeight in 1270, product of:
            3.4641016 = tf(freq=12.0), with freq of:
              12.0 = termFreq=12.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0625 = fieldNorm(doc=1270)
      0.04992038 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
        0.04992038 = score(doc=1270,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.30952093 = fieldWeight in 1270, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=1270)
  0.5 = coord(1/2)

Abstract: Current algorithms for finding associations among the attributes describing data in a database have a number of shortcomings. Presents a novel method for association generation, that answers all desiderata. The method is different from all existing algorithms and especially suitable to textual databases with binary attributes. Uses subword trees for quick indexing into the required database statistics. Tests the algorithm on the Reuters-22173 database with satisfactory results
Source: Information systems. 22(1997) nos.5/6, S.333-347
Type: a

Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.03

0.028787265 = product of:
  0.05757453 = sum of:
    0.05757453 = sum of:
      0.007654148 = weight(_text_:a in 1737) [ClassicSimilarity], result of:
        0.007654148 = score(doc=1737,freq=4.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.14413087 = fieldWeight in 1737, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0625 = fieldNorm(doc=1737)
      0.04992038 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
        0.04992038 = score(doc=1737,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.30952093 = fieldWeight in 1737, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=1737)
  0.5 = coord(1/2)

Abstract: Defines digital libraries and discusses the effects of new technology on librarians. Examines the different viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web enabled search technology to find information, data visualization and data mining to visualize it and use of SGML as an information standard to store it
Date: 22.11.1998 18:57:22
Type: a

Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.03

0.026575929 = product of:
  0.053151857 = sum of:
    0.053151857 = sum of:
      0.009471525 = weight(_text_:a in 2908) [ClassicSimilarity], result of:
        0.009471525 = score(doc=2908,freq=8.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.17835285 = fieldWeight in 2908, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2908)
      0.043680333 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
        0.043680333 = score(doc=2908,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.2708308 = fieldWeight in 2908, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2908)
  0.5 = coord(1/2)

Abstract: Focuses on the information modelling side of conceptual modelling. Deals with the exploitation of fact verbalisations after finishing the actual information system. Verbalisations are used as input for the design of the so-called information model. Exploits these verbalisations in 4 directions: considers their use for a conceptual query language, the verbalisation of instances, the description of the contents of a database and for the verbalisation of queries in a computer supported query environment. Provides an example session with an envisioned tool for end user query formulations that exploits the verbalisation
Source: Information systems. 22(1997) nos.5/6, S.349-385
Type: a

Lusti, M.: Data Warehousing and Data Mining : Eine Einführung in entscheidungsunterstützende Systeme (1999) 0.01

0.012480095 = product of:
  0.02496019 = sum of:
    0.02496019 = product of:
      0.04992038 = sum of:
        0.04992038 = weight(_text_:22 in 4261) [ClassicSimilarity], result of:
          0.04992038 = score(doc=4261,freq=2.0), product of:
            0.16128273 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046056706 = queryNorm
            0.30952093 = fieldWeight in 4261, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4261)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Date: 17. 7.2002 19:22:06
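
In this record only the term `22` matched, and the explain tree applies `coord(1/2)` at two nesting levels, so the raw term score is quartered. The sketch below reproduces that arithmetic under the same assumptions as the explain output suggests (tf as the square root of the frequency, idf as `1 + ln(maxDocs / (docFreq + 1))`). The helper names are hypothetical and the query structure behind the two coord factors is inferred from the tree, not known.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, field_norm, query_norm):
    """queryWeight * fieldWeight for a single term (assumed ClassicSimilarity form)."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)


def apply_coords(raw_score, coords):
    """Scale a score by each (matched, total) coordination factor in turn."""
    for matched, total in coords:
        raw_score *= matched / total
    return raw_score


# Values for doc 4261, read off the explain tree above.
raw = classic_term_score(freq=2.0, doc_freq=3622, max_docs=44218,
                         field_norm=0.0625, query_norm=0.046056706)

# One coord(1/2) per nesting level: inner clause, then outer clause.
final = apply_coords(raw, [(1, 2), (1, 2)])
print(round(raw, 6), round(final, 6))
```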

Bell, D.A.; Guan, J.W.: Computational methods for rough classification and discovery (1998) 0.00
```
0.0031324127 = product of:
  0.0062648254 = sum of:
    0.0062648254 = product of:
      0.012529651 = sum of:
        0.012529651 = weight(_text_:a in 2909) [ClassicSimilarity], result of:
          0.012529651 = score(doc=2909,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.23593865 = fieldWeight in 2909, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2909)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: Rough set theory is a mathematical tool to deal with vagueness and uncertainty. To apply the theory, it needs to be associated with efficient and effective computational methods. A relation can be used to represent a decision table for use in decision making. By using this kind of table, rough set theory can be applied successfully to rough classification and knowledge discovery. Presents computational methods for using rough sets to identify classes in datasets, finding dependencies in relations, and discovering rules which are hidden in databases. Illustrates the methods with a running example from a database of car test results
Footnote: Contribution to a special issue devoted to knowledge discovery and data mining
Type: a

Wong, S.K.M.; Butz, C.J.; Xiang, X.: Automated database schema design using mined data dependencies (1998) 0.00
```
0.0029000505 = product of:
  0.005800101 = sum of:
    0.005800101 = product of:
      0.011600202 = sum of:
        0.011600202 = weight(_text_:a in 2897) [ClassicSimilarity], result of:
          0.011600202 = score(doc=2897,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.21843673 = fieldWeight in 2897, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2897)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: Data dependencies are used in database schema design to enforce the correctness of a database as well as to reduce redundant data. These dependencies are usually determined from the semantics of the attributes and are then enforced upon the relations. Describes a bottom-up procedure for discovering multivalued dependencies in observed data without knowing a priori the relationships among the attributes. The proposed algorithm is an application of the technique designed for learning conditional independencies in probabilistic reasoning. A prototype system for automated database schema design has been implemented. Experiments were carried out to demonstrate both the effectiveness and efficiency of the method
Footnote: Contribution to a special issue devoted to knowledge discovery and data mining
Type: a

Fayyad, U.M.; Djorgovski, S.G.; Weir, N.: From digitized images to online catalogs : data mining a sky server (1996) 0.00

0.00270615 = product of:
  0.0054123 = sum of:
    0.0054123 = product of:
      0.0108246 = sum of:
        0.0108246 = weight(_text_:a in 6625) [ClassicSimilarity], result of:
          0.0108246 = score(doc=6625,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.20383182 = fieldWeight in 6625, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=6625)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Offers a data mining approach based on machine learning classification methods to the problem of automated cataloguing of online databases of digital images resulting from sky surveys. The SKICAT system automates the reduction and analysis of 3 terabytes of images expected to contain about 2 billion sky objects. It offers a solution to problems associated with the analysis of large data sets in science
Type: a

Chen, Z.: Knowledge discovery and system-user partnership : on a production 'adversarial partnership' approach (1994) 0.00

0.00270615 = product of:
  0.0054123 = sum of:
    0.0054123 = product of:
      0.0108246 = sum of:
        0.0108246 = weight(_text_:a in 6759) [ClassicSimilarity], result of:
          0.0108246 = score(doc=6759,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.20383182 = fieldWeight in 6759, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=6759)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Examines the relationship between systems and users from the knowledge discovery in databases or data mining perspectives. A comprehensive study on knowledge discovery in human computer symbiosis is needed. Proposes a database-user adversarial partnership, which is general enough to cover knowledge discovery and security issues related to databases and their users. It can be further generalized into system-user adversarial partnership. Discusses opportunities provided by knowledge discovery techniques and potential social implications
Type: a

Howlett, D.: Digging deep for treasure (1998) 0.00

0.00270615 = product of:
  0.0054123 = sum of:
    0.0054123 = product of:
      0.0108246 = sum of:
        0.0108246 = weight(_text_:a in 4544) [ClassicSimilarity], result of:
          0.0108246 = score(doc=4544,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.20383182 = fieldWeight in 4544, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=4544)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Tunbridge, N.: Semiology put to data mining (1999) 0.00

0.00270615 = product of:
  0.0054123 = sum of:
    0.0054123 = product of:
      0.0108246 = sum of:
        0.0108246 = weight(_text_:a in 6782) [ClassicSimilarity], result of:
          0.0108246 = score(doc=6782,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.20383182 = fieldWeight in 6782, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=6782)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Lingras, P.J.; Yao, Y.Y.: Data mining using extensions of the rough set model (1998) 0.00
```
0.0026473717 = product of:
  0.0052947435 = sum of:
    0.0052947435 = product of:
      0.010589487 = sum of:
        0.010589487 = weight(_text_:a in 2910) [ClassicSimilarity], result of:
          0.010589487 = score(doc=2910,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.19940455 = fieldWeight in 2910, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2910)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: Examines basic issues of data mining using the theory of rough sets, which is a recent proposal for generalizing classical set theory. The Pawlak rough set model is based on the concept of an equivalence relation. A generalized rough set model need not be based on equivalence relation axioms. The Pawlak rough set model has been used for deriving deterministic as well as probabilistic rules from a complete database. Demonstrates that a generalised rough set model can be used for generating rules from incomplete databases. These rules are based on plausibility functions proposed by Shafer. Discusses the importance of rule extraction from incomplete databases in data mining
Footnote: Contribution to a special issue devoted to knowledge discovery and data mining
Type: a

Wu, X.: Rule induction with extension matrices (1998) 0.00
```
0.0024857575 = product of:
  0.004971515 = sum of:
    0.004971515 = product of:
      0.00994303 = sum of:
        0.00994303 = weight(_text_:a in 2912) [ClassicSimilarity], result of:
          0.00994303 = score(doc=2912,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18723148 = fieldWeight in 2912, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2912)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: Presents a heuristic, attribute-based, noise-tolerant data mining program, HCV (Version 2.0), based on the newly-developed extension matrix approach. Gives a simple example of attribute-based induction to show the difference between the rules in variable-valued logic produced by HCV, the decision tree generated by C4.5 and the decision tree's decompiled rules by C4.5 rules. Outlines the extension matrix approach for data mining. Describes the HCV algorithm in detail. Outlines techniques developed and implemented in the HCV program for noise handling and discretization of continuous domains respectively. Follows these with a performance comparison of HCV with famous ID3-like algorithms including C4.5 and C4.5 rules on a collection of standard databases including the famous MONK's problems
Footnote: Contribution to a special issue devoted to knowledge discovery and data mining
Type: a

Gaizauskas, R.; Wilks, Y.: Information extraction : beyond document retrieval (1998) 0.00
```
0.0024857575 = product of:
  0.004971515 = sum of:
    0.004971515 = product of:
      0.00994303 = sum of:
        0.00994303 = weight(_text_:a in 4716) [ClassicSimilarity], result of:
          0.00994303 = score(doc=4716,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18723148 = fieldWeight in 4716, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4716)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining
Type: a

Saz, J.T.: Perspectivas en recuperacion y explotacion de informacion electronica : el 'data mining' (1997) 0.00

0.0023919214 = product of:
  0.0047838427 = sum of:
    0.0047838427 = product of:
      0.009567685 = sum of:
        0.009567685 = weight(_text_:a in 3723) [ClassicSimilarity], result of:
          0.009567685 = score(doc=3723,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18016359 = fieldWeight in 3723, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=3723)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Presents the concept and the techniques identified by the term data mining. Explains the principles and phases of developing a data mining process, and the main types of data mining tools
Type: a

Schmid, J.: Data mining : wie finde ich in Datensammlungen entscheidungsrelevante Muster? (1999) 0.00

0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 4540) [ClassicSimilarity], result of:
          0.009471525 = score(doc=4540,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 4540, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4540)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Cardie, C.: Empirical methods in information extraction (1997) 0.00

0.0023435948 = product of:
  0.0046871896 = sum of:
    0.0046871896 = product of:
      0.009374379 = sum of:
        0.009374379 = weight(_text_:a in 3246) [ClassicSimilarity], result of:
          0.009374379 = score(doc=3246,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17652355 = fieldWeight in 3246, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3246)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Abstract: Surveys the use of empirical, machine-learning methods for information extraction. Presents a generic architecture for information extraction systems and surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture
Footnote: Contribution to a special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation
Type: a

Deogun, J.S.: Feature selection and effective classifiers (1998) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 2911) [ClassicSimilarity], result of:
          0.009076704 = score(doc=2911,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 2911, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2911)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: Develops and analyzes 4 algorithms for feature selection in the context of rough set methodology. Develops the notion of accuracy of classification that can be used for upper or lower classification methods and defines the feature selection problem. Presents a discussion of upper classifiers and develops 4 feature selection heuristics and discusses the family of stepwise backward selection algorithms. Analyzes the worst case time complexity in all algorithms presented. Discusses details of the experiments and results of using a family of stepwise backward selection learning data sets and a duodenal ulcer data set. Includes the experimental setup and results of comparison of lower classifiers and upper classifiers on the duodenal ulcer data set. Discusses extended decision tables
Footnote: Contribution to a special issue devoted to knowledge discovery and data mining
Type: a

Galal, G.M.; Cook, D.J.; Holder, L.B.: Exploiting parallelism in a structural scientific discovery system to improve scalability (1999) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 2952) [ClassicSimilarity], result of:
          0.009076704 = score(doc=2952,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 2952, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2952)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. In particular, scientific discovery systems focus on the utilization of richer data representation, sometimes without regard for scalability. This research investigates approaches for scaling a particular knowledge discovery in databases (KDD) system, SUBDUE, using parallel and distributed resources. SUBDUE has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate scalability of parallel versions of the SUBDUE system are performed using CAD circuit databases and artificially-generated databases, and potential achievements and obstacles are discussed
Type: a
