Search (53 results, page 1 of 3)

Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02

0.022668483 = product of:
  0.045336965 = sum of:
    0.032071784 = product of:
      0.06414357 = sum of:
        0.06414357 = weight(_text_:media in 5997) [ClassicSimilarity], result of:
          0.06414357 = score(doc=5997,freq=4.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.36592746 = fieldWeight in 5997, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
      0.5 = coord(1/2)
    0.013265181 = product of:
      0.026530363 = sum of:
        0.026530363 = weight(_text_:28 in 5997) [ClassicSimilarity], result of:
          0.026530363 = score(doc=5997,freq=2.0), product of:
            0.13406353 = queryWeight, product of:
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.037424255 = queryNorm
            0.19789396 = fieldWeight in 5997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
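
The arithmetic in the explain tree above can be checked by hand. The sketch below assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the figures shown. The queryNorm and fieldNorm values are copied from the output rather than derived, and the function names are illustrative, not part of any library.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.037424255  # copied from the explain output, not derived here


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def score_doc_5997():
    """Rebuild the nested coord structure shown for doc 5997."""
    media = term_score(freq=4.0, doc_freq=1110, field_norm=0.0390625) * 0.5   # coord(1/2)
    num_28 = term_score(freq=2.0, doc_freq=3342, field_norm=0.0390625) * 0.5  # coord(1/2)
    return (media + num_28) * 0.5                                             # coord(2/4)


if __name__ == "__main__":
    print(f"{score_doc_5997():.9f}")
```

The result should agree with the 0.022668483 reported for doc 5997 up to rounding, since the explain output is computed in single precision while Python uses double precision.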

Content: Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
Date: 26. 9.2006 18:02:28

Heyer, G.; Quasthoff, U.; Wittig, T.: Text Mining : Wissensrohstoff Text. Konzepte, Algorithmen, Ergebnisse (2006) 0.02
```
0.019412445 = product of:
  0.03882489 = sum of:
    0.010612145 = product of:
      0.02122429 = sum of:
        0.02122429 = weight(_text_:28 in 5218) [ClassicSimilarity], result of:
          0.02122429 = score(doc=5218,freq=2.0), product of:
            0.13406353 = queryWeight, product of:
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.037424255 = queryNorm
            0.15831517 = fieldWeight in 5218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.03125 = fieldNorm(doc=5218)
      0.5 = coord(1/2)
    0.028212745 = product of:
      0.08463823 = sum of:
        0.08463823 = weight(_text_:ermittelt in 5218) [ClassicSimilarity], result of:
          0.08463823 = score(doc=5218,freq=2.0), product of:
            0.26771787 = queryWeight, product of:
              7.1535926 = idf(docFreq=93, maxDocs=44218)
              0.037424255 = queryNorm
            0.31614712 = fieldWeight in 5218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.1535926 = idf(docFreq=93, maxDocs=44218)
              0.03125 = fieldNorm(doc=5218)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)
```
Abstract

A large part of the world's knowledge exists in the form of digital texts on the Internet or in intranets. Today's search engines exploit this raw material of knowledge only rudimentarily: they can recognize semantic relationships only to a limited extent. Everyone is waiting for the Semantic Web, in which the authors of texts add the semantics themselves. But that will take a long time. There is, however, a technology that already makes it possible to analyze and prepare semantic relationships in raw text. The research field of text mining uses statistical and pattern-based methods to extract, process, and use knowledge from texts. This lays the foundation for the search engines of the future. The first German textbook on a groundbreaking technology. What comes to mind when you hear the word "Stich"? Some think of tennis, others of the card game Skat. Text mining can determine these different contexts automatically and display them as word networks. Which terms occur most often to the left and right of the word "Festplatte" (hard disk)? Which word forms and proper names have newly entered the German language since 2001?
Text mining answers these and many other questions. Immerse yourself with this textbook in a new, fascinating scientific discipline and discover new, previously unknown relationships and perspectives. See how the raw material of text becomes knowledge. This textbook is aimed at both students and practitioners with a focus on computer science, business informatics, and/or linguistics who want to learn about the foundations, methods, and applications of text mining and are looking for ideas for implementing their own applications. It is based on work carried out in recent years at the Abteilung Automatische Sprachverarbeitung (Automatic Language Processing department) of the Institute of Computer Science at Universität Leipzig under the direction of Prof. Dr. Heyer. A wealth of practical examples of text mining concepts and algorithms gives the reader a comprehensive yet detailed understanding of the foundations and applications of text mining. Topics covered: knowledge and text; foundations of meaning analysis; text databases; language statistics; clustering; pattern analysis; hybrid methods; example applications; appendices on statistics and linguistic foundations. 360 pages, 54 figures, 58 tables, and 95 glossary terms. Includes a free e-learning course, "Schnelleinstieg: Sprachstatistik". In addition to the book, an online certificate course with mentor and tutor support will be available shortly.

Date

19. 7.2006 20:28:27

Methodologies for knowledge discovery and data mining : Third Pacific-Asia Conference, PAKDD'99, Beijing, China, April 26-28, 1999, Proceedings (1999) 0.02

0.015254872 = product of:
  0.030509744 = sum of:
    0.018571254 = product of:
      0.037142508 = sum of:
        0.037142508 = weight(_text_:28 in 3821) [ClassicSimilarity], result of:
          0.037142508 = score(doc=3821,freq=2.0), product of:
            0.13406353 = queryWeight, product of:
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.037424255 = queryNorm
            0.27705154 = fieldWeight in 3821, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3821)
      0.5 = coord(1/2)
    0.01193849 = product of:
      0.03581547 = sum of:
        0.03581547 = weight(_text_:29 in 3821) [ClassicSimilarity], result of:
          0.03581547 = score(doc=3821,freq=2.0), product of:
            0.13164683 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.037424255 = queryNorm
            0.27205724 = fieldWeight in 3821, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3821)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Abstract: The 29 revised full papers presented together with 37 short papers were carefully selected from a total of 158 submissions. The book is divided into sections on emerging KDD technology; association rules; feature selection and generation; mining in semi-unstructured data; interestingness, surprisingness, and exceptions; rough sets, fuzzy logic, and neural networks; induction, classification, and clustering; visualization, causal models and graph-based methods; agent-based and distributed data mining; and advanced topics and new methodologies.

Wiegmann, S.: Hättest du die Titanic überlebt? : Eine kurze Einführung in das Data Mining mit freier Software (2023) 0.02

0.015254872 = product of:
  0.030509744 = sum of:
    0.018571254 = product of:
      0.037142508 = sum of:
        0.037142508 = weight(_text_:28 in 876) [ClassicSimilarity], result of:
          0.037142508 = score(doc=876,freq=2.0), product of:
            0.13406353 = queryWeight, product of:
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.037424255 = queryNorm
            0.27705154 = fieldWeight in 876, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.0546875 = fieldNorm(doc=876)
      0.5 = coord(1/2)
    0.01193849 = product of:
      0.03581547 = sum of:
        0.03581547 = weight(_text_:29 in 876) [ClassicSimilarity], result of:
          0.03581547 = score(doc=876,freq=2.0), product of:
            0.13164683 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.037424255 = queryNorm
            0.27205724 = fieldWeight in 876, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=876)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Date: 28. 1.2022 11:05:29

Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.01

0.013582623 = product of:
  0.05433049 = sum of:
    0.05433049 = product of:
      0.08149573 = sum of:
        0.04093197 = weight(_text_:29 in 1270) [ClassicSimilarity], result of:
          0.04093197 = score(doc=1270,freq=2.0), product of:
            0.13164683 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.037424255 = queryNorm
            0.31092256 = fieldWeight in 1270, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=1270)
        0.04056376 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
          0.04056376 = score(doc=1270,freq=2.0), product of:
            0.13105336 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037424255 = queryNorm
            0.30952093 = fieldWeight in 1270, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1270)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Date: 5. 4.1996 15:29:15
Source: Information systems. 22(1997) nos.5/6, pp. 333-347

Lischka, K.: Spurensuche im Datenwust : Data-Mining-Software fahndet nach kriminellen Mitarbeitern, guten Kunden - und bald vielleicht auch nach Terroristen (2002) 0.01
```
0.013115015 = product of:
  0.05246006 = sum of:
    0.05246006 = product of:
      0.07869009 = sum of:
        0.06347868 = weight(_text_:ermittelt in 1178) [ClassicSimilarity], result of:
          0.06347868 = score(doc=1178,freq=2.0), product of:
            0.26771787 = queryWeight, product of:
              7.1535926 = idf(docFreq=93, maxDocs=44218)
              0.037424255 = queryNorm
            0.23711035 = fieldWeight in 1178, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.1535926 = idf(docFreq=93, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1178)
        0.015211409 = weight(_text_:22 in 1178) [ClassicSimilarity], result of:
          0.015211409 = score(doc=1178,freq=2.0), product of:
            0.13105336 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037424255 = queryNorm
            0.116070345 = fieldWeight in 1178, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1178)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)
```
Content

"Whether you are a terrorist planning an attack on the United States, a cashier embezzling bills from the till, or someone who particularly likes spending money on certain products: data mining software makes no distinction. Such programs analyze huge volumes of data and reach statistical verdicts. Using these methods, researchers at the "Information Awaren[...]" in the United States now want to find traces of terrorists in the databases of government agencies and private companies such as credit card firms. The annual budget for the various research projects amounts to 200 million dollars. That such software works in practice is shown by the rising revenues of vendors of so-called customer relationship management software. According to the market research institute IDC, the potential for analytical CRM applications grew by 22 percent worldwide last year, and in Germany it is expected to keep growing by 14.1 percent annually until 2006. And that despite the weak economy, or perhaps because of it. For just as data mining is meant to help the US government find terrorists, CRM programs today decide which customers are profitable for a company. And which ones will be in the future, as Manuela Schnaubelt, spokeswoman for the CRM vendor SAP, describes: "Customer valuation is a central component of analytical CRM. It enables companies to focus on the customers that are important and right for them. In addition, companies can use special scoring procedures to determine which customers will contribute to the company's success in the long term, and to what extent." The consequences of these ratings are not always positive for those affected: attractive customers benefit from individual special offers and special attention. Others may be left waiting on hold with the telephone service until the more profitable customers have been dealt with.
This is what a practical implementation of what SAP spokeswoman Schnaubelt describes in the abstract might look like: "In many companies, customer valuation is carried out using classic ABC analysis, in which customers are categorized on the basis of data such as revenue. A customers, as particularly important customers, are looked after differently from C customers." Even closer to the planned use of data mining for hunting terrorists is an application that many companies already use successfully today: tracking down employees who commit fraud. Werner Sülzer of the major CRM vendor NCR Teradata describes the possibilities as follows: "Today practically every offender, whether employee, customer, or supplier, leaves data traces in the course of white-collar crime. The priority must be to condense individual traces into patterns of behavior and offender profiles. This is achieved by means of central data warehouses and sophisticated search and analysis tools." Companies are reluctant to talk about concrete successes, that is, dismissals of criminal employees, following the use of such programs. Matthias Wilke of the "Beratungsstelle für Technologiefolgen und Qualifizierung" (BTQ) of the trade union Verdi knows of a case from Switzerland. There the retail chain "Pick Pay" uses the program "Lord Lose Prevention". Two months after its introduction, embezzlement amounting to about 200,000 francs was reportedly uncovered. That cost more than 50 suspected cashiers their jobs.

Medien-Informationsmanagement : Archivarische, dokumentarische, betriebswirtschaftliche, rechtliche und Berufsbild-Aspekte ; [Frühjahrstagung der Fachgruppe 7 im Jahr 2000 in Weimar und Folgetagung 2001 in Köln] (2003) 0.01

0.01215677 = product of:
  0.02431354 = sum of:
    0.019243069 = product of:
      0.038486138 = sum of:
        0.038486138 = weight(_text_:media in 1833) [ClassicSimilarity], result of:
          0.038486138 = score(doc=1833,freq=4.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.21955647 = fieldWeight in 1833, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1833)
      0.5 = coord(1/2)
    0.00507047 = product of:
      0.015211409 = sum of:
        0.015211409 = weight(_text_:22 in 1833) [ClassicSimilarity], result of:
          0.015211409 = score(doc=1833,freq=2.0), product of:
            0.13105336 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037424255 = queryNorm
            0.116070345 = fieldWeight in 1833, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1833)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Date: 11. 5.2008 19:49:22
LCSH: Mass media / Archival resources / Congresses
Subject: Mass media / Archival resources / Congresses

Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.01

0.011884794 = product of:
  0.047539175 = sum of:
    0.047539175 = product of:
      0.07130876 = sum of:
        0.03581547 = weight(_text_:29 in 2908) [ClassicSimilarity], result of:
          0.03581547 = score(doc=2908,freq=2.0), product of:
            0.13164683 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.037424255 = queryNorm
            0.27205724 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2908)
        0.03549329 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
          0.03549329 = score(doc=2908,freq=2.0), product of:
            0.13105336 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037424255 = queryNorm
            0.2708308 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2908)
      0.6666667 = coord(2/3)
  0.25 = coord(1/4)

Date: 5. 4.1996 15:29:15
Source: Information systems. 22(1997) nos.5/6, pp. 349-385

Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.01
```
0.0096215345 = product of:
  0.038486138 = sum of:
    0.038486138 = product of:
      0.076972276 = sum of:
        0.076972276 = weight(_text_:media in 4286) [ClassicSimilarity], result of:
          0.076972276 = score(doc=4286,freq=4.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.43911293 = fieldWeight in 4286, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.046875 = fieldNorm(doc=4286)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Geolocated social media data provide a powerful source of information about places and regional human behavior. Because only a small amount of social media data has been geolocation-annotated, inference techniques play a substantial role in increasing the volume of annotated data. Conventional research in this area has been based on the text content of posts from a given user or the social network of the user, with some recent crossovers between the text- and network-based approaches. This paper proposes a novel approach that categorizes highly mentioned users (celebrities) into Local and Global types and then uses Local celebrities as location indicators. A label propagation algorithm is then applied over the refined social network for geolocation inference. Finally, we propose a hybrid approach that merges a text-based method into our network-based approach as a back-off strategy. Empirical experiments over three standard Twitter benchmark data sets demonstrate that our approach outperforms state-of-the-art user geolocation methods.

Knowledge discovery and data mining (1998) 0.01

0.007959109 = product of:
  0.031836435 = sum of:
    0.031836435 = product of:
      0.06367287 = sum of:
        0.06367287 = weight(_text_:28 in 2898) [ClassicSimilarity], result of:
          0.06367287 = score(doc=2898,freq=2.0), product of:
            0.13406353 = queryWeight, product of:
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.037424255 = queryNorm
            0.4749455 = fieldWeight in 2898, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5822632 = idf(docFreq=3342, maxDocs=44218)
              0.09375 = fieldNorm(doc=2898)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 7. 2.1999 11:18:28

Liu, W.; Weichselbraun, A.; Scharl, A.; Chang, E.: Semi-automatic ontology extension using spreading activation (2005) 0.01
```
0.0079373615 = product of:
  0.031749446 = sum of:
    0.031749446 = product of:
      0.06349889 = sum of:
        0.06349889 = weight(_text_:media in 3028) [ClassicSimilarity], result of:
          0.06349889 = score(doc=3028,freq=2.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.3622497 = fieldWeight in 3028, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3028)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

This paper describes a system to semi-automatically extend and refine ontologies by mining textual data from the Web sites of international online media. Expanding a seed ontology creates a semantic network through co-occurrence analysis, trigger phrase analysis, and disambiguation based on the WordNet lexical dictionary. Spreading activation then processes this semantic network to find the most probable candidates for inclusion in an extended ontology. Approaches to identifying hierarchical relationships such as subsumption, head noun analysis and WordNet consultation are used to confirm and classify the found relationships. Using a seed ontology on "climate change" as an example, this paper demonstrates how spreading activation improves the result by naturally integrating the mentioned methods.

Klein, H.: Web Content Mining (2004) 0.01
```
0.007053186 = product of:
  0.028212745 = sum of:
    0.028212745 = product of:
      0.08463823 = sum of:
        0.08463823 = weight(_text_:ermittelt in 3154) [ClassicSimilarity], result of:
          0.08463823 = score(doc=3154,freq=2.0), product of:
            0.26771787 = queryWeight, product of:
              7.1535926 = idf(docFreq=93, maxDocs=44218)
              0.037424255 = queryNorm
            0.31614712 = fieldWeight in 3154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.1535926 = idf(docFreq=93, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

Web mining is a buzzword that is read and heard more and more often as the Internet spreads. Current research, however, deals mostly with the usage behavior of Internet users, and a look at the programs of relevant conferences (e.g. GOR, German Online Research) shows that the analysis of content is hardly a topic. At GOR 1999 two talks were given on this subject; at the follow-up conference in 2001, not a single one. Web mining is the umbrella term for two types of mining: web usage mining and web content mining. Web usage mining is the analysis of data that arise when the WWW is used and that are logged by the servers. One can determine which pages were accessed how often, how long visitors stayed on the pages, and much more. Web content mining examines the content of web pages, which can include not only text but also images, video, and audio. Software for analyzing web pages exists in its basic form, but most web pages first have to be prepared for the analysis software in question. First, the relevant websites containing the content sought have to be identified. This is usually done with search engines, of which there are now hundreds. However, one cannot assume that the search engines cover all existing web pages. That is impossible, because with the rapid growth of the Internet thousands of web pages are added every day, and existing ones change or are deleted. Often it is also not known how the search engines work, since that is among the operators' trade secrets. One must therefore assume that search engines do not (and cannot) find all relevant websites. The next step is downloading the websites; software for this can be found under the names offline reader or web spider.
The aim of these programs is to download the website in a form that allows it to be viewed offline. The structure of the website is usually retained. Anyone who wants to analyze the content of a website must therefore be able to process all of its files with their analysis software. Content analysis software assumes that only textual information in a single file is processed. QDA software (qualitative data analysis), by contrast, also processes audio and video content as well as Internet-specific communication such as chats.

Heyer, G.; Läuter, M.; Quasthoff, U.; Wolff, C.: Texttechnologische Anwendungen am Beispiel Text Mining (2000) 0.01

0.0068034525 = product of:
  0.02721381 = sum of:
    0.02721381 = product of:
      0.05442762 = sum of:
        0.05442762 = weight(_text_:media in 5565) [ClassicSimilarity], result of:
          0.05442762 = score(doc=5565,freq=2.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.31049973 = fieldWeight in 5565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.046875 = fieldNorm(doc=5565)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Sprachtechnologie für eine dynamische Wirtschaft im Medienzeitalter - Language technologies for dynamic business in the age of the media - L'ingénierie linguistique au service de la dynamisation économique à l'ère du multimédia: Tagungsakten der XXVI. Jahrestagung der Internationalen Vereinigung Sprache und Wirtschaft e.V., 23.-25.11.2000, Fachhochschule Köln. Ed.: K.-D. Schmitz

Ohly, H.P.: Bibliometric mining : added value from document analysis and retrieval (2008) 0.01

0.0068034525 = product of:
  0.02721381 = sum of:
    0.02721381 = product of:
      0.05442762 = sum of:
        0.05442762 = weight(_text_:media in 2386) [ClassicSimilarity], result of:
          0.05442762 = score(doc=2386,freq=2.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.31049973 = fieldWeight in 2386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.046875 = fieldNorm(doc=2386)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Source: Kompatibilität, Medien und Ethik in der Wissensorganisation - Compatibility, Media and Ethics in Knowledge Organization: Proceedings der 10. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation Wien, 3.-5. Juli 2006 - Proceedings of the 10th Conference of the German Section of the International Society of Knowledge Organization Vienna, 3-5 July 2006. Ed.: H.P. Ohly, S. Netscher and K. Mitgutsch

Bella, A. La; Fronzetti Colladon, A.; Battistoni, E.; Castellan, S.; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining (2018) 0.01
```
0.0068034525 = product of:
  0.02721381 = sum of:
    0.02721381 = product of:
      0.05442762 = sum of:
        0.05442762 = weight(_text_:media in 2400) [ClassicSimilarity], result of:
          0.05442762 = score(doc=2400,freq=2.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.31049973 = fieldWeight in 2400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.046875 = fieldNorm(doc=2400)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as they appear to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000, out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholders' reactions.

Wongthontham, P.; Abu-Salih, B.: Ontology-based approach for semantic data extraction from social big data : state-of-the-art and research directions (2018) 0.01
```
0.0068034525 = product of:
  0.02721381 = sum of:
    0.02721381 = product of:
      0.05442762 = sum of:
        0.05442762 = weight(_text_:media in 4097) [ClassicSimilarity], result of:
          0.05442762 = score(doc=4097,freq=2.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.31049973 = fieldWeight in 4097, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.046875 = fieldNorm(doc=4097)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on the semantic analysis of textual data. We propose an ontology-based approach to extract semantics of textual data and define the domain of data. In other words, we semantically analyse the social data at two levels, i.e., the entity level and the domain level. We have chosen Twitter as the social channel for a proof of concept. Domain knowledge is captured in ontologies, which are then used to enrich the semantics of tweets with a specific semantic conceptual representation of the entities that appear in the tweets. Case studies are used to demonstrate this approach. We experiment and evaluate our proposed approach with a public dataset collected from Twitter and from the politics domain. The ontology-based approach leverages entity extraction and concept mappings in terms of quantity and accuracy of concept identification.

Budzik, J.; Hammond, K.J.; Birnbaum, L.: Information access in context (2001) 0.01

```
0.005969245 = product of:
  0.02387698 = sum of:
    0.02387698 = product of:
      0.07163094 = sum of:
        0.07163094 = weight(_text_:29 in 3835) [ClassicSimilarity], result of:
          0.07163094 = score(doc=3835,freq=2.0), product of:
            0.13164683 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.037424255 = queryNorm
            0.5441145 = fieldWeight in 3835, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.109375 = fieldNorm(doc=3835)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Date: 29. 3.2002 17:31:17
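
The score trees on this page all follow the same arithmetic, so a short worked example may help in reading them. The sketch below recomputes the tree for doc 3835 under the usual ClassicSimilarity definitions (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`). The function names `classic_idf` and `term_score` are hypothetical helpers introduced here for illustration, not part of any library. `queryNorm` and `fieldNorm` are copied from the tree rather than derived, and small differences from the printed values are expected because the engine works in single precision.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as shown in the idf(...) lines of the tree."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # 1.4142135 for freq=2.0
    query_weight = idf * query_norm        # "queryWeight" line
    field_weight = tf * idf * field_norm   # "fieldWeight" line
    return query_weight * field_weight


# Values copied from the tree for term "29" in doc 3835.
raw = term_score(freq=2.0, doc_freq=3565, max_docs=44218,
                 query_norm=0.037424255, field_norm=0.109375)

# The two coord factors from the tree: coord(1/3), then coord(1/4).
final = raw * (1 / 3) * (1 / 4)

print(f"term score:  {raw:.8f}")
print(f"final score: {final:.9f}")
```

Substituting the `freq`, `docFreq`, `fieldNorm` and `coord` values of any other entry on the page should reproduce its top-line score in the same way.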

Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.01

```
0.005915548 = product of:
  0.023662193 = sum of:
    0.023662193 = product of:
      0.07098658 = sum of:
        0.07098658 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
          0.07098658 = score(doc=4577,freq=2.0), product of:
            0.13105336 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037424255 = queryNorm
            0.5416616 = fieldWeight in 4577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=4577)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```

Date: 2. 4.2000 18:01:22

Survey of text mining : clustering, classification, and retrieval (2004) 0.01
```
0.005669544 = product of:
  0.022678176 = sum of:
    0.022678176 = product of:
      0.045356352 = sum of:
        0.045356352 = weight(_text_:media in 804) [ClassicSimilarity], result of:
          0.045356352 = score(doc=804,freq=2.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.25874978 = fieldWeight in 804, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.0390625 = fieldNorm(doc=804)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory. As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies (software) to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments. This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.
Tonkin, E.L.; Tourte, G.J.L.: Working with text. tools, techniques and approaches for text mining (2016) 0.01
```
0.005669544 = product of:
  0.022678176 = sum of:
    0.022678176 = product of:
      0.045356352 = sum of:
        0.045356352 = weight(_text_:media in 4019) [ClassicSimilarity], result of:
          0.045356352 = score(doc=4019,freq=2.0), product of:
            0.17529039 = queryWeight, product of:
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.037424255 = queryNorm
            0.25874978 = fieldWeight in 4019, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6838713 = idf(docFreq=1110, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4019)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
