Search (45 results, page 1 of 3)

  • theme_ss:"Data Mining"
  1. Amir, A.; Feldman, R.; Kashi, R.: A new and versatile method for association generation (1997) 0.02
    0.02113474 = product of:
      0.06340422 = sum of:
        0.06340422 = product of:
          0.095106326 = sum of:
            0.047768015 = weight(_text_:29 in 1270) [ClassicSimilarity], result of:
              0.047768015 = score(doc=1270,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.31092256 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
            0.04733831 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
              0.04733831 = score(doc=1270,freq=2.0), product of:
                0.15294059 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04367448 = queryNorm
                0.30952093 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Date
    5. 4.1996 15:29:15
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
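
The indented breakdowns in this list are Lucene "explain" output for the classic TF-IDF similarity: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm with tf = sqrt(termFreq), and the coord() factors scale the sum by the fraction of query clauses that matched. A minimal sketch that reproduces the arithmetic of the first hit; the constants are copied from the breakdown above, and the helper name is ours, not Lucene's:

```python
import math

def term_score(freq, idf, query_norm, field_norm):
    """ClassicSimilarity contribution of one matching term:
    (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
    query_weight = idf * query_norm                      # queryWeight
    field_weight = math.sqrt(freq) * idf * field_norm    # fieldWeight
    return query_weight * field_weight

QUERY_NORM = 0.04367448  # queryNorm shared by all terms in the breakdown above

# Hit 1 (doc 1270): terms "29" and "22", freq = 2.0 each, fieldNorm = 0.0625
s29 = term_score(freq=2.0, idf=3.5176873, query_norm=QUERY_NORM, field_norm=0.0625)
s22 = term_score(freq=2.0, idf=3.5018296, query_norm=QUERY_NORM, field_norm=0.0625)

# coord(2/3): two of three query clauses matched; the outer coord(1/3) comes from
# the enclosing query structure
score = (s29 + s22) * (2.0 / 3.0) * (1.0 / 3.0)
print(s29, s22, score)  # ~0.047768015, ~0.04733831, ~0.02113474
```

The same formula accounts for every breakdown in this result list; only the matched terms, their freq and idf, and the document's fieldNorm change.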
  2. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.02
    0.018492898 = product of:
      0.055478692 = sum of:
        0.055478692 = product of:
          0.08321804 = sum of:
            0.041797012 = weight(_text_:29 in 2908) [ClassicSimilarity], result of:
              0.041797012 = score(doc=2908,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.27205724 = fieldWeight in 2908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2908)
            0.041421022 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
              0.041421022 = score(doc=2908,freq=2.0), product of:
                0.15294059 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04367448 = queryNorm
                0.2708308 = fieldWeight in 2908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2908)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Date
    5. 4.1996 15:29:15
    Source
    Information systems. 22(1997) nos.5/6, S.349-385
  3. Bell, D.A.; Guan, J.W.: Computational methods for rough classification and discovery (1998) 0.01
    0.011240869 = product of:
      0.033722606 = sum of:
        0.033722606 = product of:
          0.10116781 = sum of:
            0.10116781 = weight(_text_:theory in 2909) [ClassicSimilarity], result of:
              0.10116781 = score(doc=2909,freq=6.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.55704355 = fieldWeight in 2909, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2909)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Rough set theory is a mathematical tool to deal with vagueness and uncertainty. To apply the theory, it needs to be associated with efficient and effective computational methods. A relation can be used to represent a decision table for use in decision making. By using this kind of table, rough set theory can be applied successfully to rough classification and knowledge discovery. Presents computational methods for using rough sets to identify classes in datasets, find dependencies in relations, and discover rules which are hidden in databases. Illustrates the methods with a running example from a database of car test results.
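
The abstract rests on the two basic rough-set constructions: equivalence classes induced by the chosen condition attributes, and the lower/upper approximation of a decision class built from them. A minimal illustration under those standard definitions; the car rows and attribute names below are invented for the example, not taken from the paper:

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes induced by the chosen condition attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, target):
    """Pawlak lower/upper approximation of a target set of objects."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:
            lower |= block          # certainly in the class
        if block & target:
            upper |= block          # possibly in the class
    return lower, upper

# Toy decision table in the spirit of the car-test example (values invented)
cars = {
    "c1": {"engine": "diesel", "mileage": "high", "ok": True},
    "c2": {"engine": "diesel", "mileage": "high", "ok": False},
    "c3": {"engine": "petrol", "mileage": "low",  "ok": True},
    "c4": {"engine": "petrol", "mileage": "high", "ok": True},
}
passed = {o for o, r in cars.items() if r["ok"]}
low, up = approximations(cars, ["engine", "mileage"], passed)
print(low, up)  # lower and upper approximation of the "ok" class
```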
  4. Budzik, J.; Hammond, K.J.; Birnbaum, L.: Information access in context (2001) 0.01
    0.009288225 = product of:
      0.027864676 = sum of:
        0.027864676 = product of:
          0.083594024 = sum of:
            0.083594024 = weight(_text_:29 in 3835) [ClassicSimilarity], result of:
              0.083594024 = score(doc=3835,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.5441145 = fieldWeight in 3835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3835)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 3.2002 17:31:17
  5. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.01
    0.009204673 = product of:
      0.027614016 = sum of:
        0.027614016 = product of:
          0.082842045 = sum of:
            0.082842045 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.082842045 = score(doc=4577,freq=2.0), product of:
                0.15294059 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04367448 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    2. 4.2000 18:01:22
  6. Lingras, P.J.; Yao, Y.Y.: Data mining using extensions of the rough set model (1998) 0.01
    0.009178132 = product of:
      0.027534394 = sum of:
        0.027534394 = product of:
          0.08260318 = sum of:
            0.08260318 = weight(_text_:theory in 2910) [ClassicSimilarity], result of:
              0.08260318 = score(doc=2910,freq=4.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.45482418 = fieldWeight in 2910, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2910)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Examines basic issues of data mining using the theory of rough sets, which is a recent proposal for generalizing classical set theory. The Pawlak rough set model is based on the concept of an equivalence relation. A generalized rough set model need not be based on equivalence relation axioms. The Pawlak rough set model has been used for deriving deterministic as well as probabilistic rules from a complete database. Demonstrates that a generalized rough set model can be used for generating rules from incomplete databases. These rules are based on plausibility functions proposed by Shafer. Discusses the importance of rule extraction from incomplete databases in data mining.
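
The deterministic/probabilistic distinction mentioned here can be read straight off a decision table: an equivalence class whose objects all share a decision yields a deterministic rule, otherwise a probabilistic rule with a confidence. A small sketch of that Pawlak-style distinction only; the weather-style rows are invented, and the Shafer plausibility extension for incomplete data is not modelled:

```python
from collections import Counter, defaultdict

def extract_rules(rows, condition_attrs, decision_attr):
    """Derive rules from a decision table: deterministic when every object in an
    equivalence class shares the decision, probabilistic (with confidence) otherwise."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple((a, row[a]) for a in condition_attrs)].append(row[decision_attr])
    rules = []
    for conds, decisions in groups.items():
        value, count = Counter(decisions).most_common(1)[0]
        confidence = count / len(decisions)
        kind = "deterministic" if confidence == 1.0 else "probabilistic"
        rules.append((dict(conds), value, confidence, kind))
    return rules

# Hypothetical complete table for illustration
rows = [
    {"outlook": "sunny", "humidity": "high", "play": "no"},
    {"outlook": "sunny", "humidity": "high", "play": "yes"},
    {"outlook": "rain",  "humidity": "low",  "play": "yes"},
]
for rule in extract_rules(rows, ["outlook", "humidity"], "play"):
    print(rule)
```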
  7. Hereth, J.; Stumme, G.; Wille, R.; Wille, U.: Conceptual knowledge discovery and data analysis (2000) 0.01
    0.008029193 = product of:
      0.024087576 = sum of:
        0.024087576 = product of:
          0.07226273 = sum of:
            0.07226273 = weight(_text_:theory in 5083) [ClassicSimilarity], result of:
              0.07226273 = score(doc=5083,freq=6.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.39788827 = fieldWeight in 5083, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5083)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin
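
Formal Concept Analysis, the basis of the CKDD approach described above, builds concepts from the two derivation operators of a binary object-attribute context: a set of objects maps to the attributes they share, and a set of attributes maps back to the objects that have them all. A naive sketch that enumerates all formal concepts of a toy context; the marketing-flavoured objects and attributes are invented, and this is in no way the TOSCANA system:

```python
from itertools import chain, combinations

# Toy formal context: objects -> attributes they have (invented for illustration)
context = {
    "cust1": {"urban", "high_income"},
    "cust2": {"urban"},
    "cust3": {"rural", "high_income"},
}
objects = set(context)
attributes = set().union(*context.values())

def common_attributes(objs):   # derivation operator A -> A'
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def common_objects(attrs):     # derivation operator B -> B'
    return {o for o in objects if attrs <= context[o]}

def concepts():
    """All formal concepts (extent, intent), found by closing every attribute subset."""
    found = set()
    for attrs in chain.from_iterable(combinations(attributes, r)
                                     for r in range(len(attributes) + 1)):
        extent = common_objects(set(attrs))
        intent = common_attributes(extent)
        found.add((frozenset(extent), frozenset(intent)))
    return found

for extent, intent in sorted(concepts(), key=lambda c: len(c[0])):
    print(set(extent) or "{}", set(intent) or "{}")
```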
  8. Witten, I.H.; Frank, E.: Data Mining : Praktische Werkzeuge und Techniken für das maschinelle Lernen (2000) 0.01
    0.0079613365 = product of:
      0.02388401 = sum of:
        0.02388401 = product of:
          0.071652025 = sum of:
            0.071652025 = weight(_text_:29 in 6833) [ClassicSimilarity], result of:
              0.071652025 = score(doc=6833,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.46638384 = fieldWeight in 6833, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6833)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    27. 1.1996 10:29:55
  9. Keim, D.A.: Data Mining mit bloßem Auge (2002) 0.01
    0.0079613365 = product of:
      0.02388401 = sum of:
        0.02388401 = product of:
          0.071652025 = sum of:
            0.071652025 = weight(_text_:29 in 1086) [ClassicSimilarity], result of:
              0.071652025 = score(doc=1086,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.46638384 = fieldWeight in 1086, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1086)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    31.12.1996 19:29:41
  10. Kruse, R.; Borgelt, C.: Suche im Datendschungel (2002) 0.01
    0.0079613365 = product of:
      0.02388401 = sum of:
        0.02388401 = product of:
          0.071652025 = sum of:
            0.071652025 = weight(_text_:29 in 1087) [ClassicSimilarity], result of:
              0.071652025 = score(doc=1087,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.46638384 = fieldWeight in 1087, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1087)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    31.12.1996 19:29:41
  11. Wrobel, S.: Lern- und Entdeckungsverfahren (2002) 0.01
    0.0079613365 = product of:
      0.02388401 = sum of:
        0.02388401 = product of:
          0.071652025 = sum of:
            0.071652025 = weight(_text_:29 in 1105) [ClassicSimilarity], result of:
              0.071652025 = score(doc=1105,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.46638384 = fieldWeight in 1105, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1105)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    31.12.1996 19:29:41
  12. KDD : techniques and applications (1998) 0.01
    0.00788972 = product of:
      0.023669157 = sum of:
        0.023669157 = product of:
          0.07100747 = sum of:
            0.07100747 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.07100747 = score(doc=6783,freq=2.0), product of:
                0.15294059 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04367448 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
  13. Frické, M.: Big data and its epistemology (2015) 0.01
    0.00786697 = product of:
      0.02360091 = sum of:
        0.02360091 = product of:
          0.070802726 = sum of:
            0.070802726 = weight(_text_:theory in 1811) [ClassicSimilarity], result of:
              0.070802726 = score(doc=1811,freq=4.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.3898493 = fieldWeight in 1811, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1811)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The article considers whether Big Data, in the form of data-driven science, will enable the discovery, or appraisal, of universal scientific theories, instrumentalist tools, or inductive inferences. It points out, initially, that such aspirations are similar to the now-discredited inductivist approach to science. On the positive side, Big Data may permit larger sample sizes, cheaper and more extensive testing of theories, and the continuous assessment of theories. On the negative side, data-driven science encourages passive data collection, as opposed to experimentation and testing, and hornswoggling ("unsound statistical fiddling"). The roles of theory and data in inductive algorithms, statistical modeling, and scientific discoveries are analyzed, and it is argued that theory is needed at every turn. Data-driven science is a chimera.
  14. Borgelt, C.; Kruse, R.: Unsicheres Wissen nutzen (2002) 0.01
    0.0066344473 = product of:
      0.019903341 = sum of:
        0.019903341 = product of:
          0.059710022 = sum of:
            0.059710022 = weight(_text_:29 in 1104) [ClassicSimilarity], result of:
              0.059710022 = score(doc=1104,freq=2.0), product of:
                0.15363316 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04367448 = queryNorm
                0.38865322 = fieldWeight in 1104, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1104)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    31.12.1996 19:29:41
  15. Survey of text mining : clustering, classification, and retrieval (2004) 0.01
    0.006555808 = product of:
      0.019667422 = sum of:
        0.019667422 = product of:
          0.059002265 = sum of:
            0.059002265 = weight(_text_:theory in 804) [ClassicSimilarity], result of:
              0.059002265 = score(doc=804,freq=4.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.3248744 = fieldWeight in 804, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=804)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory. As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies (software) to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments. This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.
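
Of the approaches the blurb lists, the vector space model is the simplest to make concrete: documents and queries become TF-IDF vectors and are ranked by cosine similarity. A minimal sketch, assuming scikit-learn is available; the three documents and the query are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny invented collection; any list of strings works
docs = [
    "rough set theory for classification",
    "clustering and classification of text documents",
    "information retrieval with vector space models",
]
query = ["vector space retrieval"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # term-document TF-IDF matrix
query_vector = vectorizer.transform(query)

scores = cosine_similarity(query_vector, doc_vectors)[0]
for score, doc in sorted(zip(scores, docs), reverse=True):  # best match first
    print(f"{score:.3f}  {doc}")
```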
  16. Information visualization in data mining and knowledge discovery (2002) 0.01
    0.0063384315 = product of:
      0.019015294 = sum of:
        0.019015294 = product of:
          0.028522938 = sum of:
            0.016688362 = weight(_text_:theory in 1789) [ClassicSimilarity], result of:
              0.016688362 = score(doc=1789,freq=2.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.09188836 = fieldWeight in 1789, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.015625 = fieldNorm(doc=1789)
            0.011834578 = weight(_text_:22 in 1789) [ClassicSimilarity], result of:
              0.011834578 = score(doc=1789,freq=2.0), product of:
                0.15294059 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04367448 = queryNorm
                0.07738023 = fieldWeight in 1789, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.015625 = fieldNorm(doc=1789)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Date
    23. 3.2008 19:10:22
    Footnote
    Review in: JASIST 54(2003) no.9, S.905-906 (C.A. Badurek): "Visual approaches for knowledge discovery in very large databases are a prime research need for information scientists focused on extracting meaningful information from the ever-growing stores of data from a variety of domains, including business, the geosciences, and satellite and medical imagery. This work presents a summary of research efforts in the fields of data mining, knowledge discovery, and data visualization with the goal of aiding the integration of research approaches and techniques from these major fields. The editors, leading computer scientists from academia and industry, present a collection of 32 papers from contributors who are incorporating visualization and data mining techniques through academic research as well as application development in industry and government agencies. Information Visualization focuses upon techniques to enhance the natural abilities of humans to visually understand data, in particular, large-scale data sets. It is primarily concerned with developing interactive graphical representations to enable users to more intuitively make sense of multidimensional data as part of the data exploration process. It includes research from computer science, psychology, human-computer interaction, statistics, and information science. Knowledge Discovery in Databases (KDD) most often refers to the process of mining databases for previously unknown patterns and trends in data. Data mining refers to the particular computational methods or algorithms used in this process. The data mining research field is most related to computational advances in database theory, artificial intelligence and machine learning. This work compiles research summaries from these main research areas in order to provide "a reference work containing the collection of thoughts and ideas of noted researchers from the fields of data mining and data visualization" (p. 8). It addresses these areas in three main sections: the first on data visualization, the second on KDD and model visualization, and the last on using visualization in the knowledge discovery process. The seven chapters of Part One focus upon methodologies and successful techniques from the field of Data Visualization. Hoffman and Grinstein (Chapter 2) give a particularly good overview of the field of data visualization and its potential application to data mining. An introduction to the terminology of data visualization, relation to perceptual and cognitive science, and discussion of the major visualization display techniques are presented. Discussion and illustration explain the usefulness and proper context of such data visualization techniques as scatter plots, 2D and 3D isosurfaces, glyphs, parallel coordinates, and radial coordinate visualizations. Remaining chapters present the need for standardization of visualization methods, discussion of user requirements in the development of tools, and examples of using information visualization in addressing research problems."
  17. Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007) 0.01
    0.0055627874 = product of:
      0.016688362 = sum of:
        0.016688362 = product of:
          0.050065085 = sum of:
            0.050065085 = weight(_text_:theory in 916) [ClassicSimilarity], result of:
              0.050065085 = score(doc=916,freq=2.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.27566507 = fieldWeight in 916, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.046875 = fieldNorm(doc=916)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper, we present a topic discovery system aimed to reveal the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topic/subtopics, where each topic contains the set of documents that are related to it and a summary extracted from these documents. Summaries so built are useful to browse and select topics of interest from the generated hierarchies. Our proposal consists of a new incremental hierarchical clustering algorithm, which combines both partitional and agglomerative approaches, taking the main benefits from them. Finally, a new summarization method based on Testor Theory has been proposed to build the topic summaries. Experimental results in the TDT2 collection demonstrate its usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.
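
The agglomerative half of the clustering described above can be approximated with an off-the-shelf hierarchical clustering of TF-IDF document vectors; this is only a stand-in, not the authors' incremental algorithm and not the Testor-based summarizer. A sketch assuming scipy and scikit-learn, with invented news snippets in place of a real stream:

```python
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented news snippets standing in for a news stream
news = [
    "election results announced in the capital",
    "parliament debates the election outcome",
    "new vaccine approved by health authority",
    "hospital reports vaccine shortage",
]
X = TfidfVectorizer().fit_transform(news).toarray()

distances = pdist(X, metric="cosine")         # pairwise document distances
tree = linkage(distances, method="average")   # agglomerative merge tree (dendrogram)
topics = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into two topics

for label, text in zip(topics, news):
    print(label, text)
```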
  18. Sánchez, D.; Chamorro-Martínez, J.; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval (2003) 0.01
    0.0055627874 = product of:
      0.016688362 = sum of:
        0.016688362 = product of:
          0.050065085 = sum of:
            0.050065085 = weight(_text_:theory in 1067) [ClassicSimilarity], result of:
              0.050065085 = score(doc=1067,freq=2.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.27566507 = fieldWeight in 1067, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1067)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval with the advantage that there is no need to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture the user's subjectivity. For that purpose, fuzzy set theory is employed to measure the user's assessments about the fulfillment of a concept by an image.
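
Fuzzy association rules of the kind used here are commonly evaluated with a sigma-count support (the average of min-combined memberships, min acting as fuzzy AND) and the confidence derived from it. A minimal sketch under those standard definitions; the feature names and membership degrees are invented, not the paper's perceptual features:

```python
# Fuzzy association rule support/confidence over memberships in [0, 1].
# Each row: degree to which an image exhibits a feature (values invented).
images = [
    {"mostly_horizontal": 0.9, "high_contrast": 0.8, "landscape": 0.9},
    {"mostly_horizontal": 0.2, "high_contrast": 0.7, "landscape": 0.1},
    {"mostly_horizontal": 0.8, "high_contrast": 0.4, "landscape": 0.7},
]

def fuzzy_support(items, rows):
    """Sigma-count support: average of min-combined memberships (min = fuzzy AND)."""
    return sum(min(row[i] for i in items) for row in rows) / len(rows)

def fuzzy_confidence(antecedent, consequent, rows):
    return fuzzy_support(antecedent + consequent, rows) / fuzzy_support(antecedent, rows)

rule = (["mostly_horizontal"], ["landscape"])   # IF mostly_horizontal THEN landscape
print("support   ", fuzzy_support(rule[0] + rule[1], images))
print("confidence", fuzzy_confidence(rule[0], rule[1], images))
```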
  19. Mohr, J.W.; Bogdanov, P.: Topic models : what they are and why they matter (2013) 0.01
    0.0055627874 = product of:
      0.016688362 = sum of:
        0.016688362 = product of:
          0.050065085 = sum of:
            0.050065085 = weight(_text_:theory in 1142) [ClassicSimilarity], result of:
              0.050065085 = score(doc=1142,freq=2.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.27566507 = fieldWeight in 1142, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1142)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    We provide a brief, non-technical introduction to the text mining methodology known as "topic modeling." We summarize the theory and background of the method and discuss what kinds of things are found by topic models. Using a text corpus comprised of the eight articles from the special issue of Poetics on the subject of topic models, we run a topic model on these articles, both as a way to introduce the methodology and also to help summarize some of the ways in which social and cultural scientists are using topic models. We review some of the critiques and debates over the use of the method and finally, we link these developments back to some of the original innovations in the field of content analysis that were pioneered by Harold D. Lasswell and colleagues during and just after World War II.
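
A topic model of the kind introduced here can be fitted in a few lines with latent Dirichlet allocation; a sketch assuming scikit-learn, run on an invented stand-in corpus rather than the eight Poetics articles:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Stand-in corpus (the Poetics special-issue articles are not reproduced here)
docs = [
    "cultural sociology uses topic models to study meaning",
    "topic models summarize themes in large text corpora",
    "content analysis traditions from lasswell inform text mining",
    "gibbs sampling and variational inference fit topic models",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)          # document-term count matrix
vocab = vectorizer.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
for k, weights in enumerate(lda.components_):    # per-topic term weights
    top = weights.argsort()[::-1][:5]
    print(f"topic {k}:", ", ".join(vocab[i] for i in top))
```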
  20. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01
    0.0055627874 = product of:
      0.016688362 = sum of:
        0.016688362 = product of:
          0.050065085 = sum of:
            0.050065085 = weight(_text_:theory in 3015) [ClassicSimilarity], result of:
              0.050065085 = score(doc=3015,freq=2.0), product of:
                0.18161562 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.04367448 = queryNorm
                0.27566507 = fieldWeight in 3015, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
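
The feature-extraction-plus-classification pipeline described above can be sketched with word n-gram features and a linear classifier; a minimal illustration assuming scikit-learn, with invented snippets and period labels standing in for the scitex corpus (the POS-based and lexico-grammatical features are omitted):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented snippets labelled by period; the scitex corpus itself is not included
texts = [
    ("we present an algorithm for parsing natural language", "1970s"),
    ("the system is implemented in fortran on a mainframe", "1970s"),
    ("we evaluate our model on a large annotated corpus", "2000s"),
    ("results are reported as f-score over held-out data", "2000s"),
]
X, y = zip(*texts)

# Word n-grams as register features; a POS-based variant would swap the analyzer
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
).fit(X, y)

print(model.predict(["the parser is implemented in fortran"]))
```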

Languages

  • English (e): 31
  • German (d): 14
