Search (162 results, page 1 of 9)

  • theme_ss:"Data Mining"
  1. Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P.: From data mining to knowledge discovery in databases (1996) 0.07
    0.070293136 = product of:
      0.117155224 = sum of:
        0.06318085 = weight(_text_:g in 7458) [ClassicSimilarity], result of:
          0.06318085 = score(doc=7458,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.4149775 = fieldWeight in 7458, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.078125 = fieldNorm(doc=7458)
        0.048019946 = weight(_text_:u in 7458) [ClassicSimilarity], result of:
          0.048019946 = score(doc=7458,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.3617784 = fieldWeight in 7458, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.078125 = fieldNorm(doc=7458)
        0.0059544328 = weight(_text_:a in 7458) [ClassicSimilarity], result of:
          0.0059544328 = score(doc=7458,freq=2.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.12739488 = fieldWeight in 7458, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=7458)
      0.6 = coord(3/5)
    
    Type
    a
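
    The indented breakdowns shown with each result are Lucene's "explain" output for ClassicSimilarity (TF-IDF) scoring: every matching query term contributes queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = sqrt(termFreq) * idf * fieldNorm, and the displayed document score is the coord factor times the sum of these contributions. The following sketch recomputes the score of result 1 (doc 7458) from nothing but the constants printed in its explain tree; it is an illustration in Python, not part of the search output, and the variable names are ours.

        # Recompute the ClassicSimilarity score of result 1 (doc 7458)
        # from the constants shown in its explain tree above.
        query_norm = 0.040536046
        field_norm = 0.078125      # fieldNorm(doc=7458)
        coord = 3 / 5              # 3 of the 5 query terms matched

        terms = {                  # term: (idf, termFreq in the field)
            "g": (3.7559474, 2.0),
            "u": (3.2744443, 2.0),
            "a": (1.153047, 2.0),
        }

        total = 0.0
        for idf, freq in terms.values():
            tf = freq ** 0.5                      # ClassicSimilarity: tf = sqrt(freq)
            query_weight = idf * query_norm       # queryWeight, e.g. 0.15225126 for "g"
            field_weight = tf * idf * field_norm  # fieldWeight, e.g. 0.4149775 for "g"
            total += query_weight * field_weight

        print(coord * total)  # ~0.0702931, matching the displayed 0.070293136
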
  2. Heyer, G.; Läuter, M.; Quasthoff, U.; Wolff, C.: Texttechnologische Anwendungen am Beispiel Text Mining (2000) 0.04
    0.042175878 = product of:
      0.07029313 = sum of:
        0.037908506 = weight(_text_:g in 5565) [ClassicSimilarity], result of:
          0.037908506 = score(doc=5565,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.24898648 = fieldWeight in 5565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.046875 = fieldNorm(doc=5565)
        0.028811965 = weight(_text_:u in 5565) [ClassicSimilarity], result of:
          0.028811965 = score(doc=5565,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.21706703 = fieldWeight in 5565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.046875 = fieldNorm(doc=5565)
        0.0035726598 = weight(_text_:a in 5565) [ClassicSimilarity], result of:
          0.0035726598 = score(doc=5565,freq=2.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.07643694 = fieldWeight in 5565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5565)
      0.6 = coord(3/5)
    
    Type
    a
  3. Hereth, J.; Stumme, G.; Wille, R.; Wille, U.: Conceptual knowledge discovery and data analysis (2000) 0.04
    0.037735835 = product of:
      0.062893055 = sum of:
        0.031590424 = weight(_text_:g in 5083) [ClassicSimilarity], result of:
          0.031590424 = score(doc=5083,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.20748875 = fieldWeight in 5083, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5083)
        0.024009973 = weight(_text_:u in 5083) [ClassicSimilarity], result of:
          0.024009973 = score(doc=5083,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.1808892 = fieldWeight in 5083, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5083)
        0.0072926614 = weight(_text_:a in 5083) [ClassicSimilarity], result of:
          0.0072926614 = score(doc=5083,freq=12.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.15602624 = fieldWeight in 5083, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5083)
      0.6 = coord(3/5)
    
    Abstract
    In this paper, we discuss Conceptual Knowledge Discovery in Databases (CKDD) in its connection with Data Analysis. Our approach is based on Formal Concept Analysis, a mathematical theory which has been developed and proven useful during the last 20 years. Formal Concept Analysis has led to a theory of conceptual information systems which has been applied by using the management system TOSCANA in a wide range of domains. In this paper, we use such an application in database marketing to demonstrate how methods and procedures of CKDD can be applied in Data Analysis. In particular, we show the interplay and integration of data mining and data analysis techniques based on Formal Concept Analysis. The main concern of this paper is to explain how the transition from data to knowledge can be supported by a TOSCANA system. To clarify the transition steps we discuss their correspondence to the five levels of knowledge representation established by R. Brachman and to the steps of empirically grounded theory building proposed by A. Strauss and J. Corbin
    Type
    a
  4. Peters, G.; Gaese, V.: Das DocCat-System in der Textdokumentation von G+J (2003) 0.03
    0.030509992 = product of:
      0.050849985 = sum of:
        0.035740484 = weight(_text_:g in 1507) [ClassicSimilarity], result of:
          0.035740484 = score(doc=1507,freq=4.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.23474671 = fieldWeight in 1507, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.03125 = fieldNorm(doc=1507)
        0.0041253525 = weight(_text_:a in 1507) [ClassicSimilarity], result of:
          0.0041253525 = score(doc=1507,freq=6.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.088261776 = fieldWeight in 1507, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=1507)
        0.010984149 = product of:
          0.021968298 = sum of:
            0.021968298 = weight(_text_:22 in 1507) [ClassicSimilarity], result of:
              0.021968298 = score(doc=1507,freq=2.0), product of:
                0.14195032 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040536046 = queryNorm
                0.15476047 = fieldWeight in 1507, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1507)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    We will first present the basics of IBM's text mining system and then describe our project in more breadth and detail, since that is the part we know best. So there are two parts: Heidelberg and Hamburg. Once more on the technology: text mining is a technology developed by IBM that was configured and programmed in a special form for us. For a long time our project was called DocText Miner; for some time now, at IBM's suggestion, it has been called DocCat, which is meant to be short for Document Categoriser, a name that is both apt and descriptive. We begin with text mining as developed at IBM in Heidelberg. There, automatic indexing is understood as one instance, that is, one part, of text mining. The problems involved are shown; text mining is a method for structuring and searching large document collections, for extracting information and, its more ambitious claim, implicit relationships. Whether that last claim holds may be left open. IBM does this quantitatively, empirically, approximately, and fast, which really has to be said. The goal, and this was essential for our project, is not to understand the text; rather, the result of these procedures is what they call a bundle of words or a bag of words: a set of meaning-bearing terms extracted from a text by means of algorithms, that is, essentially by means of computation. There is a fair amount of preliminary linguistic work, and a little linguistics is involved, but it is not the foundation of the whole approach. What they did for us is annotate press texts for our press database. For those who do not know it yet: Gruner + Jahr maintains a text documentation unit that has run a database since the early 1970s; it currently contains about 6.5 million documents, of which somewhat over 1 million are full texts from 1993 onward. For a long time the principle was that we assigned descriptors to the documents stored in the database, and we continued this principle in a slimmed-down form when full text was introduced. These 6.5 million documents also include roughly 10 million facsimile pages, because we keep the facsimiles as a matter of course.
    Date
    22. 4.2003 11:45:36
    Type
    a
  5. Klein, H.: Web Content Mining (2004) 0.03
    0.028117254 = product of:
      0.04686209 = sum of:
        0.025272338 = weight(_text_:g in 3154) [ClassicSimilarity], result of:
          0.025272338 = score(doc=3154,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.165991 = fieldWeight in 3154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
        0.019207977 = weight(_text_:u in 3154) [ClassicSimilarity], result of:
          0.019207977 = score(doc=3154,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.14471136 = fieldWeight in 3154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
        0.0023817732 = weight(_text_:a in 3154) [ClassicSimilarity], result of:
          0.0023817732 = score(doc=3154,freq=2.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.050957955 = fieldWeight in 3154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
      0.6 = coord(3/5)
    
    Source
    Wissensorganisation und Edutainment: Wissen im Spannungsfeld von Gesellschaft, Gestaltung und Industrie. Proceedings der 7. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation, Berlin, 21.-23.3.2001. Hrsg.: C. Lehner, H.P. Ohly u. G. Rahmstorf
    Type
    a
  6. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02
    0.02224016 = product of:
      0.055600397 = sum of:
        0.031590424 = weight(_text_:g in 5997) [ClassicSimilarity], result of:
          0.031590424 = score(doc=5997,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.20748875 = fieldWeight in 5997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
        0.024009973 = weight(_text_:u in 5997) [ClassicSimilarity], result of:
          0.024009973 = score(doc=5997,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.1808892 = fieldWeight in 5997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
      0.4 = coord(2/5)
    
    Editor
    Gaul, W. u. G. Ritter
  7. Mandl, T.: Text mining und data mining (2013) 0.02
    0.021589752 = product of:
      0.05397438 = sum of:
        0.048019946 = weight(_text_:u in 713) [ClassicSimilarity], result of:
          0.048019946 = score(doc=713,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.3617784 = fieldWeight in 713, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.078125 = fieldNorm(doc=713)
        0.0059544328 = weight(_text_:a in 713) [ClassicSimilarity], result of:
          0.0059544328 = score(doc=713,freq=2.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.12739488 = fieldWeight in 713, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=713)
      0.4 = coord(2/5)
    
    Source
    Grundlagen der praktischen Information und Dokumentation. Handbuch zur Einführung in die Informationswissenschaft und -praxis. 6., völlig neu gefaßte Ausgabe. Hrsg. von R. Kuhlen, W. Semar u. D. Strauch. Begründet von Klaus Laisiepen, Ernst Lutterbeck, Karl-Heinrich Meyer-Uhlenried
    Type
    a
  8. Maaten, L. van den; Hinton, G.: Visualizing non-metric similarities in multiple maps (2012) 0.02
    0.019205404 = product of:
      0.048013512 = sum of:
        0.037908506 = weight(_text_:g in 3884) [ClassicSimilarity], result of:
          0.037908506 = score(doc=3884,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.24898648 = fieldWeight in 3884, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.046875 = fieldNorm(doc=3884)
        0.010105007 = weight(_text_:a in 3884) [ClassicSimilarity], result of:
          0.010105007 = score(doc=3884,freq=16.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.2161963 = fieldWeight in 3884, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3884)
      0.4 = coord(2/5)
    
    Abstract
    Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize "central" objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
    Type
    a
  9. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.02
    0.018712291 = product of:
      0.046780728 = sum of:
        0.008336206 = weight(_text_:a in 4577) [ClassicSimilarity], result of:
          0.008336206 = score(doc=4577,freq=2.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.17835285 = fieldWeight in 4577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4577)
        0.038444523 = product of:
          0.076889046 = sum of:
            0.076889046 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.076889046 = score(doc=4577,freq=2.0), product of:
                0.14195032 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040536046 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    2. 4.2000 18:01:22
    Type
    a
  10. Chardonnens, A.; Hengchen, S.: Text mining for cultural heritage institutions : a 5-step method for cultural heritage institutions (2017) 0.02
    0.018666664 = product of:
      0.04666666 = sum of:
        0.038415954 = weight(_text_:u in 646) [ClassicSimilarity], result of:
          0.038415954 = score(doc=646,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.28942272 = fieldWeight in 646, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.0625 = fieldNorm(doc=646)
        0.008250705 = weight(_text_:a in 646) [ClassicSimilarity], result of:
          0.008250705 = score(doc=646,freq=6.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.17652355 = fieldWeight in 646, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=646)
      0.4 = coord(2/5)
    
    Source
    Everything changes, everything stays the same? - Understanding information spaces : Proceedings of the 15th International Symposium of Information Science (ISI 2017), Berlin/Germany, 13th - 15th March 2017. Eds.: M. Gäde, V. Trkulja u. V. Petras
    Type
    a
  11. Benoit, G.: Data mining (2002) 0.02
    0.01802153 = product of:
      0.045053825 = sum of:
        0.037908506 = weight(_text_:g in 4296) [ClassicSimilarity], result of:
          0.037908506 = score(doc=4296,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.24898648 = fieldWeight in 4296, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.046875 = fieldNorm(doc=4296)
        0.0071453196 = weight(_text_:a in 4296) [ClassicSimilarity], result of:
          0.0071453196 = score(doc=4296,freq=8.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.15287387 = fieldWeight in 4296, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4296)
      0.4 = coord(2/5)
    
    Abstract
    Data mining (DM) is a multistaged process of extracting previously unanticipated knowledge from large databases, and applying the results to decision making. Data mining tools detect patterns from the data and infer associations and rules from them. The extracted information may then be applied to prediction or classification models by identifying relations within the data records or between databases. Those patterns and rules can then guide decision making and forecast the effects of those decisions. However, this definition may be applied equally to "knowledge discovery in databases" (KDD). Indeed, in the recent literature of DM and KDD, a source of confusion has emerged, making it difficult to determine the exact parameters of both. KDD is sometimes viewed as the broader discipline, of which data mining is merely a component, specifically pattern extraction, evaluation, and cleansing methods (Raghavan, Deogun, & Sever, 1998, p. 397). Thurasingham (1999, p. 2) remarked that "knowledge discovery," "pattern discovery," "data dredging," "information extraction," and "knowledge mining" are all employed as synonyms for DM. Trybula, in his ARIST chapter on text mining, observed that the "existing work [in KDD] is confusing because the terminology is inconsistent and poorly defined."
    Type
    a
  12. Heyer, G.; Quasthoff, U.; Wittig, T.: Text Mining : Wissensrohstoff Text. Konzepte, Algorithmen, Ergebnisse (2006) 0.02
    0.017792126 = product of:
      0.044480316 = sum of:
        0.025272338 = weight(_text_:g in 5218) [ClassicSimilarity], result of:
          0.025272338 = score(doc=5218,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.165991 = fieldWeight in 5218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.03125 = fieldNorm(doc=5218)
        0.019207977 = weight(_text_:u in 5218) [ClassicSimilarity], result of:
          0.019207977 = score(doc=5218,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.14471136 = fieldWeight in 5218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.03125 = fieldNorm(doc=5218)
      0.4 = coord(2/5)
    
  13. Tiefschürfen in Datenbanken (2002) 0.02
    0.0172718 = product of:
      0.0431795 = sum of:
        0.038415954 = weight(_text_:u in 996) [ClassicSimilarity], result of:
          0.038415954 = score(doc=996,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.28942272 = fieldWeight in 996, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.0625 = fieldNorm(doc=996)
        0.0047635464 = weight(_text_:a in 996) [ClassicSimilarity], result of:
          0.0047635464 = score(doc=996,freq=2.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.10191591 = fieldWeight in 996, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=996)
      0.4 = coord(2/5)
    
    Content
    Contains the contributions: Kruse, R., C. Borgelt: Suche im Datendschungel - Borgelt, C. u. R. Kruse: Unsicheres Wissen nutzen - Wrobel, S.: Lern- und Entdeckungsverfahren - Keim, D.A.: Data Mining mit bloßem Auge
    Type
    a
  14. Ekbia, H.; Mattioli, M.; Kouper, I.; Arave, G.; Ghazinejad, A.; Bowman, T.; Suri, V.R.; Tsou, A.; Weingart, S.; Sugimoto, C.R.: Big data, bigger dilemmas : a critical review (2015) 0.02
    0.01709206 = product of:
      0.04273015 = sum of:
        0.031590424 = weight(_text_:g in 2155) [ClassicSimilarity], result of:
          0.031590424 = score(doc=2155,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.20748875 = fieldWeight in 2155, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2155)
        0.0111397235 = weight(_text_:a in 2155) [ClassicSimilarity], result of:
          0.0111397235 = score(doc=2155,freq=28.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.23833402 = fieldWeight in 2155, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2155)
      0.4 = coord(2/5)
    
    Abstract
    The recent interest in Big Data has generated a broad range of new academic, corporate, and policy practices along with an evolving debate among its proponents, detractors, and skeptics. While the practices draw on a common set of tools, techniques, and technologies, most contributions to the debate come either from a particular disciplinary perspective or with a focus on a domain-specific issue. A close examination of these contributions reveals a set of common problematics that arise in various guises and in different places. It also demonstrates the need for a critical synthesis of the conceptual and practical dilemmas surrounding Big Data. The purpose of this article is to provide such a synthesis by drawing on relevant writings in the sciences, humanities, policy, and trade literature. In bringing these diverse literatures together, we aim to shed light on the common underlying issues that concern and affect all of these areas. By contextualizing the phenomenon of Big Data within larger socioeconomic developments, we also seek to provide a broader understanding of its drivers, barriers, and challenges. This approach allows us to identify attributes of Big Data that require more attention-autonomy, opacity, generativity, disparity, and futurity-leading to questions and ideas for moving beyond dilemmas.
    Type
    a
  15. KDD : techniques and applications (1998) 0.02
    0.016039107 = product of:
      0.040097766 = sum of:
        0.0071453196 = weight(_text_:a in 6783) [ClassicSimilarity], result of:
          0.0071453196 = score(doc=6783,freq=2.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.15287387 = fieldWeight in 6783, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=6783)
        0.032952446 = product of:
          0.06590489 = sum of:
            0.06590489 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.06590489 = score(doc=6783,freq=2.0), product of:
                0.14195032 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.040536046 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
  16. Maaten, L. van den; Hinton, G.: Visualizing data using t-SNE (2008) 0.02
    0.016004506 = product of:
      0.040011264 = sum of:
        0.031590424 = weight(_text_:g in 3888) [ClassicSimilarity], result of:
          0.031590424 = score(doc=3888,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.20748875 = fieldWeight in 3888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3888)
        0.00842084 = weight(_text_:a in 3888) [ClassicSimilarity], result of:
          0.00842084 = score(doc=3888,freq=16.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.18016359 = fieldWeight in 3888, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3888)
      0.4 = coord(2/5)
    
    Abstract
    We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.
    Type
    a
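
    The abstract above describes t-SNE as a way of giving each high-dimensional datapoint a location in a two- or three-dimensional map. A minimal usage sketch, assuming scikit-learn's TSNE implementation and the standard digits toy data set rather than the authors' own reference code:

        # Embed a 64-dimensional toy data set into a 2-D map with t-SNE
        # and plot it, coloured by class label.
        from sklearn.datasets import load_digits
        from sklearn.manifold import TSNE
        import matplotlib.pyplot as plt

        digits = load_digits()              # 1,797 samples, 64 features
        embedding = TSNE(
            n_components=2,                 # two-dimensional map
            perplexity=30,                  # effective neighbourhood size
            init="pca",
            random_state=0,
        ).fit_transform(digits.data)

        plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, s=5, cmap="tab10")
        plt.title("t-SNE map of the digits data set")
        plt.show()
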
  17. Baumgartner, R.: Methoden und Werkzeuge zur Webdatenextraktion (2006) 0.02
    0.01580342 = product of:
      0.03950855 = sum of:
        0.03361396 = weight(_text_:u in 5808) [ClassicSimilarity], result of:
          0.03361396 = score(doc=5808,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.25324488 = fieldWeight in 5808, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5808)
        0.0058945883 = weight(_text_:a in 5808) [ClassicSimilarity], result of:
          0.0058945883 = score(doc=5808,freq=4.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.12611452 = fieldWeight in 5808, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5808)
      0.4 = coord(2/5)
    
    Source
    Semantic Web: Wege zur vernetzten Wissensgesellschaft. Hrsg.: T. Pellegrini, u. A. Blumauer
    Type
    a
  18. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.02
    0.015553235 = product of:
      0.038883086 = sum of:
        0.031590424 = weight(_text_:g in 967) [ClassicSimilarity], result of:
          0.031590424 = score(doc=967,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.20748875 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
        0.0072926614 = weight(_text_:a in 967) [ClassicSimilarity], result of:
          0.0072926614 = score(doc=967,freq=12.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.15602624 = fieldWeight in 967, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.4 = coord(2/5)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
    Type
    a
  19. Kulathuramaiyer, N.; Maurer, H.: Implications of emerging data mining (2009) 0.02
    0.015305734 = product of:
      0.038264334 = sum of:
        0.028811965 = weight(_text_:u in 3144) [ClassicSimilarity], result of:
          0.028811965 = score(doc=3144,freq=2.0), product of:
            0.13273303 = queryWeight, product of:
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.040536046 = queryNorm
            0.21706703 = fieldWeight in 3144, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2744443 = idf(docFreq=4547, maxDocs=44218)
              0.046875 = fieldNorm(doc=3144)
        0.00945237 = weight(_text_:a in 3144) [ClassicSimilarity], result of:
          0.00945237 = score(doc=3144,freq=14.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.20223314 = fieldWeight in 3144, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3144)
      0.4 = coord(2/5)
    
    Abstract
    Data Mining describes a technology that discovers non-trivial hidden patterns in a large collection of data. Although this technology has a tremendous impact on our lives, the invaluable contributions of this invisible technology often go unnoticed. This paper discusses advances in data mining while focusing on the emerging data mining capability. Such data mining applications perform multidimensional mining on a wide variety of heterogeneous data sources, providing solutions to many unresolved problems. This paper also highlights the advantages and disadvantages arising from the ever-expanding scope of data mining. Data Mining augments human intelligence by equipping us with a wealth of knowledge and by empowering us to perform our daily tasks better. As the mining scope and capacity increases, users and organizations become more willing to compromise privacy. The huge data stores of the 'master miners' allow them to gain deep insights into individual lifestyles and their social and behavioural patterns. Data integration and analysis capability of combining business and financial trends together with the ability to deterministically track market changes will drastically affect our lives.
    Source
    Social Semantic Web: Web 2.0, was nun? Hrsg.: A. Blumauer u. T. Pellegrini
    Type
    a
  20. Li, D.; Tang, J.; Ding, Y.; Shuai, X.; Chambers, T.; Sun, G.; Luo, Z.; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using Tencent microblogging (2015) 0.02
    0.015299073 = product of:
      0.038247682 = sum of:
        0.031590424 = weight(_text_:g in 2345) [ClassicSimilarity], result of:
          0.031590424 = score(doc=2345,freq=2.0), product of:
            0.15225126 = queryWeight, product of:
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.040536046 = queryNorm
            0.20748875 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7559474 = idf(docFreq=2809, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2345)
        0.0066572586 = weight(_text_:a in 2345) [ClassicSimilarity], result of:
          0.0066572586 = score(doc=2345,freq=10.0), product of:
            0.046739966 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.040536046 = queryNorm
            0.14243183 = fieldWeight in 2345, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2345)
      0.4 = coord(2/5)
    
    Abstract
    Text mining has been widely used in multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that user opinion is influenced by its neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both topic factor and opinion influence factor into a unified probabilistic model. We evaluate our model in one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and f1-measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo.
    Type
    a

Languages

  • e 128
  • d 33
  • sp 1

Types