Search (77 results, page 1 of 4)

Klein, H.: Web Content Mining (2004) 0.03
```
0.032007314 = product of:
  0.120027415 = sum of:
    0.006439812 = product of:
      0.012879624 = sum of:
        0.012879624 = weight(_text_:online in 3154) [ClassicSimilarity], result of:
          0.012879624 = score(doc=3154,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.13412495 = fieldWeight in 3154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
      0.5 = coord(1/2)
    0.0440151 = weight(_text_:software in 3154) [ClassicSimilarity], result of:
      0.0440151 = score(doc=3154,freq=8.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.35064998 = fieldWeight in 3154, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03125 = fieldNorm(doc=3154)
    0.042123944 = weight(_text_:web in 3154) [ClassicSimilarity], result of:
      0.042123944 = score(doc=3154,freq=16.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.4079388 = fieldWeight in 3154, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=3154)
    0.027448557 = product of:
      0.054897115 = sum of:
        0.054897115 = weight(_text_:analyse in 3154) [ClassicSimilarity], result of:
          0.054897115 = score(doc=3154,freq=4.0), product of:
            0.16670908 = queryWeight, product of:
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.031640913 = queryNorm
            0.32929888 = fieldWeight in 3154, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
      0.5 = coord(1/2)
  0.26666668 = coord(4/15)
```
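The breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`. The function names are illustrative, and `queryNorm` and `fieldNorm` are copied from the output rather than recomputed.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as assumed for ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency factor: square root of the raw count.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

# Term "online" in doc 3154: freq=2, docFreq=5778, maxDocs=44218.
online = term_score(2.0, 5778, 44218, 0.031640913, 0.03125)
print(round(online, 6))  # close to the 0.012879624 reported above
```

Small differences in the last digits are expected, since the reported values appear to be computed in single precision.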
Abstract

Web mining is a buzzword that is read and heard ever more often as the Internet spreads. Current research, however, deals mostly with the usage behavior of Internet users, and a look at the programs of relevant conferences (e.g. GOR - German Online Research) shows that the analysis of content is hardly a topic. At GOR, two talks on this subject were given in 1999, and none at all at the follow-up conference in 2001. Web mining is the umbrella term for two types of Web mining: Web usage mining and Web content mining. Web usage mining means analyzing the data that arise when the WWW is used and that are logged by the servers. One can determine which pages were requested how often, how long visitors stayed on the pages, and much more. Web content mining examines the content of Web pages, which can include not only text but also images, video and audio. The software for analyzing Web pages exists in its basic form, but most Web pages first have to be prepared for the analysis software in question. First, the relevant Web sites containing the content sought must be identified. This is usually done with search engines, of which there are now hundreds. However, one cannot assume that the search engines cover all existing Web pages. That is impossible, because the rapid growth of the Internet adds thousands of Web pages every day, and existing ones change or are deleted. Often one does not know how the search engines work either, because that is among the operators' trade secrets. One must therefore assume that the search engines do not (and cannot) find all relevant Web sites. The next step is downloading the Web sites; software for this can be found under the names offline reader or Web spider.
The aim of these programs is to download the Web site in a form that allows it to be viewed offline. The structure of the Web site is usually preserved. Anyone who wants to analyze the content of a Web site must therefore be able to process all of its files with their analysis software. Content analysis software assumes that only text information in a single file is processed. QDA software (qualitative data analysis), by contrast, also processes audio and video content as well as Internet-specific communication such as chats.
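The Web usage mining step mentioned in this abstract, determining which pages were requested how often from server logs, can be sketched as follows. This is only an illustration: it assumes logs in the Common Log Format, and the function name `page_hits` and the sample lines are hypothetical.

```python
from collections import Counter

def page_hits(log_lines):
    """Count successful (status 200) requests per path.

    Assumes Common Log Format, where the request line is the
    first double-quoted field and the status code follows it.
    """
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # skip malformed lines
        request = parts[1].split()   # e.g. ['GET', '/index.html', 'HTTP/1.0']
        status = parts[2].split()    # e.g. ['200', '2326']
        if len(request) >= 2 and status and status[0] == "200":
            hits[request[1]] += 1
    return hits

# Hypothetical sample log entries.
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 1200',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.3 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 512',
]
print(page_hits(sample).most_common())
```

Dwell time per page would additionally require grouping requests by visitor and comparing timestamps, which this sketch leaves out.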

Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.02

0.022503708 = product of:
  0.1687778 = sum of:
    0.059105016 = weight(_text_:web in 602) [ClassicSimilarity], result of:
      0.059105016 = score(doc=602,freq=14.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.57238775 = fieldWeight in 602, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=602)
    0.10967278 = weight(_text_:site in 602) [ClassicSimilarity], result of:
      0.10967278 = score(doc=602,freq=6.0), product of:
        0.1738463 = queryWeight, product of:
          5.494352 = idf(docFreq=493, maxDocs=44218)
          0.031640913 = queryNorm
        0.63086057 = fieldWeight in 602, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          5.494352 = idf(docFreq=493, maxDocs=44218)
          0.046875 = fieldNorm(doc=602)
  0.13333334 = coord(2/15)
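The total for this record is the sum of the per-term scores multiplied by the coordination factor `coord(2/15)`. The sketch below recomputes it under the same assumed ClassicSimilarity formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`). The helper names are illustrative, and the nested `coord(1/2)` factors seen in some other records are not modeled.

```python
import math

MAX_DOCS = 44218          # maxDocs from the explain output
QUERY_NORM = 0.031640913  # queryNorm from the explain output

def term_score(freq, doc_freq, field_norm):
    # queryWeight * fieldWeight for a single matching term.
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

def document_score(matches, query_clauses, field_norm):
    # matches: list of (freq, docFreq) pairs for the terms that hit.
    total = sum(term_score(f, df, field_norm) for f, df in matches)
    coord = len(matches) / query_clauses  # fraction of query clauses matched
    return coord * total

# Doc 602: "web" (freq=14, docFreq=4597) and "site" (freq=6, docFreq=493).
score_602 = document_score([(14.0, 4597), (6.0, 493)],
                           query_clauses=15, field_norm=0.046875)
print(round(score_602, 6))  # close to the 0.022503708 reported above
```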

Abstract: We present an approach to enhancing information access through Web structure mining in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site helps characterize (flexible and expressive) interaction paradigms supported by a site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites and present several interface designs that can exploit such FDs to provide compelling user experiences.
Footnote: Contribution to a special topic section on "Mining Web resources for enhancing information retrieval"
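The definition of a Web FD given in the abstract (x -> y holds when every path containing a hyperlink labeled x also contains one labeled y) can be checked directly on a small example. The following is a brute-force sketch, not the authors' mining algorithm. The site paths and the function names `holds` and `mine_fds` are hypothetical.

```python
from itertools import permutations

def holds(paths, x, y):
    """True if every path that contains label x also contains label y."""
    return all(y in path for path in paths if x in path)

def mine_fds(paths):
    """Enumerate all single-label FDs x -> y by exhaustive checking."""
    labels = set().union(*paths)
    return sorted((x, y) for x, y in permutations(labels, 2)
                  if holds(paths, x, y))

# Hypothetical root-to-leaf paths of a small hierarchical site,
# each given as the set of hyperlink labels along the path.
paths = [
    {"cars", "toyota", "camry"},
    {"cars", "honda", "civic"},
    {"trucks", "toyota", "tundra"},
]

print(holds(paths, "camry", "toyota"))  # every 'camry' path includes 'toyota'
print(holds(paths, "toyota", "cars"))   # fails: 'toyota' also appears under 'trucks'
```

Exhaustive checking is quadratic in the number of labels and only practical for small sites.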

Hölzig, C.: Google spürt Grippewellen auf : Die neue Anwendung ist bisher auf die USA beschränkt (2008) 0.02
```
0.015647462 = product of:
  0.0782373 = sum of:
    0.063223675 = weight(_text_:suchmaschine in 2403) [ClassicSimilarity], result of:
      0.063223675 = score(doc=2403,freq=4.0), product of:
        0.17890577 = queryWeight, product of:
          5.6542544 = idf(docFreq=420, maxDocs=44218)
          0.031640913 = queryNorm
        0.3533909 = fieldWeight in 2403, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.6542544 = idf(docFreq=420, maxDocs=44218)
          0.03125 = fieldNorm(doc=2403)
    0.006439812 = product of:
      0.012879624 = sum of:
        0.012879624 = weight(_text_:online in 2403) [ClassicSimilarity], result of:
          0.012879624 = score(doc=2403,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.13412495 = fieldWeight in 2403, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.03125 = fieldNorm(doc=2403)
      0.5 = coord(1/2)
    0.008573813 = product of:
      0.017147627 = sum of:
        0.017147627 = weight(_text_:22 in 2403) [ClassicSimilarity], result of:
          0.017147627 = score(doc=2403,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.15476047 = fieldWeight in 2403, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2403)
      0.5 = coord(1/2)
  0.2 = coord(3/15)
```
Content

"There is no escaping Google. The largest Internet search engine is now setting out to predict dangerous flu waves in the USA as well, and faster than the US health authority. In regions where influenza is rampant, experience shows that online queries on this particular topic also pile up. "We have found a close relationship between people who search for topic-related information and people who actually have the flu," writes Google. A Web tool called "Google Flu Trends" calculates the spread of flu viruses from the queries. Even if not every user is ill, the number of queries reflects the development of a flu wave fairly accurately, Google says. This is shown by a comparison with the data of the US disease control agency CDC, which are nearly identical in most cases. Unlike the health authority, the Internet search engine can draw on current data every day. As a result, Google is able to predict the flu season one to two weeks earlier. And time means lives, as Lyn Finelli, head of the influenza division of the US disease control agency, says: "The earlier we are warned, the earlier we can act. This can considerably reduce the number of people who fall ill." "Google Flu Trends" is the first project to use a search engine's databases to locate an emerging flu virus; for now only in the USA, but worldwide forecasts would be a logical next step. Philip M. Polgreen of the University of Iowa expects much more: "In theory, we can use this flood of information to better study the course of other diseases as well." To build the flu-spread model, Google analyzed several hundred billion search queries from past years. Privacy advocates have already classified the Internet giant as "hostile to data protection" several times.
Users know neither what happens to the collected data nor how long stored information remains available, they say. Google, however, gives assurances that "Flu Trends" protects privacy: "The tool can never be used to identify individual users, because we use only anonymous data when compiling the statistics. The patterns we analyze in the data only make sense in a larger context." True viral influenza, not to be confused with a cold, affects several million people worldwide, and more than 500,000 die of it."

Date

3. 5.1997 8:44:22
Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.01
```
0.014914072 = product of:
  0.11185554 = sum of:
    0.03723266 = weight(_text_:web in 606) [ClassicSimilarity], result of:
      0.03723266 = score(doc=606,freq=8.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.36057037 = fieldWeight in 606, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=606)
    0.07462288 = weight(_text_:site in 606) [ClassicSimilarity], result of:
      0.07462288 = score(doc=606,freq=4.0), product of:
        0.1738463 = queryWeight, product of:
          5.494352 = idf(docFreq=493, maxDocs=44218)
          0.031640913 = queryNorm
        0.42924625 = fieldWeight in 606, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.494352 = idf(docFreq=493, maxDocs=44218)
          0.0390625 = fieldNorm(doc=606)
  0.13333334 = coord(2/15)
```
Abstract

With more and more information available on the Internet, the task of making personalized recommendations to assist the user's navigation has become increasingly important. Considering there might be millions of users with different backgrounds accessing a Web site every day, it is infeasible to build a separate recommendation system for each user. To address this problem, clustering techniques can first be employed to discover user groups. Then, user navigation patterns for each group can be discovered, to allow the adaptation of a Web site to the interest of each individual group. In this paper, we propose to model user access sequences as stochastic processes, and an approach based on a mixture of Markov models is taken to cluster users and to capture the sequential relationships inherent in user access histories. Several important issues that arise in constructing the Markov models are also addressed. The first issue lies in the complexity of the mixture of Markov models. To improve the efficiency of building/maintaining the mixture of Markov models, we develop a lightweight adaptive algorithm to update the model parameters without recomputing model parameters from scratch. The second issue concerns the proper selection of training data for building the mixture of Markov models. We investigate two different training data selection strategies and perform extensive experiments to compare their effectiveness on a real dataset that is generated by a Web-based knowledge management system, Livelink.
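To illustrate the basic building block behind this abstract, the sketch below fits a single first-order Markov chain to page-access sequences and recommends the most likely next page. It is a deliberate simplification: the paper describes a mixture of such models with clustering and an adaptive update, none of which is reproduced here. The session data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Estimate first-order transition probabilities from sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive (current page, next page) pair.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for page, c in counts.items()
    }

def recommend(transitions, page):
    """Return the most probable next page, or None if the page is unseen."""
    nxt = transitions.get(page)
    return max(nxt, key=nxt.get) if nxt else None

# Hypothetical access histories of four users.
sessions = [
    ["home", "search", "doc"],
    ["home", "search", "help"],
    ["home", "search", "doc"],
    ["home", "news"],
]
model = fit_transitions(sessions)
print(model["home"])               # transition probabilities out of 'home'
print(recommend(model, "search"))  # most likely page after 'search'
```

In a mixture setting, one such transition table would be kept per user group, with each user softly assigned to groups.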

Footnote

Contribution to a special topic section on "Mining Web resources for enhancing information retrieval"
Loonus, Y.: Einsatzbereiche der KI und ihre Relevanz für Information Professionals (2017) 0.01
```
0.0148409 = product of:
  0.11130674 = sum of:
    0.07829542 = weight(_text_:soziale in 5668) [ClassicSimilarity], result of:
      0.07829542 = score(doc=5668,freq=2.0), product of:
        0.19331455 = queryWeight, product of:
          6.1096387 = idf(docFreq=266, maxDocs=44218)
          0.031640913 = queryNorm
        0.40501565 = fieldWeight in 5668, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.1096387 = idf(docFreq=266, maxDocs=44218)
          0.046875 = fieldNorm(doc=5668)
    0.033011325 = weight(_text_:software in 5668) [ClassicSimilarity], result of:
      0.033011325 = score(doc=5668,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.2629875 = fieldWeight in 5668, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=5668)
  0.13333334 = coord(2/15)
```
Abstract

It is human nature to want to share experiences and ideas with others in speech and writing. We thus produce gigantic volumes of text every day that are shared and stored in digital form. The Radicati Group estimates that 269 billion e-mails are sent and received daily in 2017. Added to this are largely unstructured data such as social media, press, Web sites and in-house systems, for example in the form of CRM software or PDF documents. The worldwide stock of unstructured data is growing so rapidly that it is hardly possible to quantify its extent. Any attempt to research a reliable figure inevitably leads to various articles that put the share of unstructured text in the total data stock at 80%. Even though it can no longer be fully traced where this figure comes from, critical reflection on our daily routine leaves little doubt that these data are of great economic relevance.

Principles of data mining and knowledge discovery (1998) 0.01

0.012429901 = product of:
  0.09322426 = sum of:
    0.0440151 = weight(_text_:software in 3822) [ClassicSimilarity], result of:
      0.0440151 = score(doc=3822,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.35064998 = fieldWeight in 3822, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0625 = fieldNorm(doc=3822)
    0.049209163 = weight(_text_:evaluation in 3822) [ClassicSimilarity], result of:
      0.049209163 = score(doc=3822,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.37076265 = fieldWeight in 3822, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0625 = fieldNorm(doc=3822)
  0.13333334 = coord(2/15)

Abstract: The volume presents 26 revised papers corresponding to the oral presentations given at the conference; also included are refereed papers corresponding to the 30 poster presentations. These papers were selected from a total of 73 full draft submissions. The papers are organized in topical sections on rule evaluation, visualization, association rules and text mining, KDD process and software, tree construction, sequential and spatial data mining, and attribute selection.

Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.01

0.0119626755 = product of:
  0.059813377 = sum of:
    0.012879624 = product of:
      0.025759248 = sum of:
        0.025759248 = weight(_text_:online in 1737) [ClassicSimilarity], result of:
          0.025759248 = score(doc=1737,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.2682499 = fieldWeight in 1737, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0625 = fieldNorm(doc=1737)
      0.5 = coord(1/2)
    0.029786127 = weight(_text_:web in 1737) [ClassicSimilarity], result of:
      0.029786127 = score(doc=1737,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.2884563 = fieldWeight in 1737, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=1737)
    0.017147627 = product of:
      0.034295253 = sum of:
        0.034295253 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
          0.034295253 = score(doc=1737,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.30952093 = fieldWeight in 1737, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1737)
      0.5 = coord(1/2)
  0.2 = coord(3/15)

Abstract: Defines digital libraries and discusses the effects of new technology on librarians. Examines the different viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web-enabled search technology to find information, data visualization and data mining to visualize it, and use of SGML as an information standard to store it.
Date: 22.11.1998 18:57:22
Source: Online. 21(1997) no.6, pp.87-92

Wei, C.-P.; Lee, Y.-H.; Chiang, Y.-S.; Chen, C.-T.; Yang, C.C.C.: Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora (2014) 0.01

0.011484365 = product of:
  0.057421822 = sum of:
    0.008049765 = product of:
      0.01609953 = sum of:
        0.01609953 = weight(_text_:online in 1225) [ClassicSimilarity], result of:
          0.01609953 = score(doc=1225,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.16765618 = fieldWeight in 1225, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1225)
      0.5 = coord(1/2)
    0.030755727 = weight(_text_:evaluation in 1225) [ClassicSimilarity], result of:
      0.030755727 = score(doc=1225,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.23172665 = fieldWeight in 1225, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1225)
    0.01861633 = weight(_text_:web in 1225) [ClassicSimilarity], result of:
      0.01861633 = score(doc=1225,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.18028519 = fieldWeight in 1225, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1225)
  0.2 = coord(3/15)

Abstract: An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document FrequencyTempo (TF×IDFTempo) and TF×Enhanced-IDFTempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDFTempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDFTempo.

Lischka, K.: Spurensuche im Datenwust : Data-Mining-Software fahndet nach kriminellen Mitarbeitern, guten Kunden - und bald vielleicht auch nach Terroristen (2002) 0.01
```
0.0114192255 = product of:
  0.085644186 = sum of:
    0.04366988 = weight(_text_:software in 1178) [ClassicSimilarity], result of:
      0.04366988 = score(doc=1178,freq=14.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.34789976 = fieldWeight in 1178, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1178)
    0.04197431 = sum of:
      0.029113589 = weight(_text_:analyse in 1178) [ClassicSimilarity], result of:
        0.029113589 = score(doc=1178,freq=2.0), product of:
          0.16670908 = queryWeight, product of:
            5.268782 = idf(docFreq=618, maxDocs=44218)
            0.031640913 = queryNorm
          0.1746371 = fieldWeight in 1178, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.268782 = idf(docFreq=618, maxDocs=44218)
            0.0234375 = fieldNorm(doc=1178)
      0.01286072 = weight(_text_:22 in 1178) [ClassicSimilarity], result of:
        0.01286072 = score(doc=1178,freq=2.0), product of:
          0.110801086 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.031640913 = queryNorm
          0.116070345 = fieldWeight in 1178, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0234375 = fieldNorm(doc=1178)
  0.13333334 = coord(2/15)
```
Abstract

US authorities want to use special software to find the data trails of terrorists. How that might work is shown by programs that already analyze customers and employees for companies today.

Content

"Whether you are a terrorist planning an attack on the United States, a cashier embezzling notes from the till, or someone who particularly likes spending money on certain products: data mining software makes no distinction. Such programs analyze huge volumes of data and pass statistical judgments. With these methods, the researchers of the "Information Awaren[...]" in the United States now want to find traces of terrorists in the databases of government agencies and private companies such as credit card firms. The annual budget for the various research projects is 200 million dollars. That such software works in practice is shown by the rising sales of vendors of so-called customer relationship management software. Last year, according to the market research institute IDC, the potential for analytical CRM applications grew by 22 percent worldwide; in Germany it is expected to continue growing by 14.1 percent a year until 2006. And that despite a weak economy, or precisely because of it. For just as data mining is meant to help the US government find terrorists, CRM programs today decide which customers are profitable for a company. And which ones will be in the future, as Manuela Schnaubelt, spokeswoman for the CRM vendor SAP, describes: "Customer valuation is a central component of analytical CRM. It enables companies to focus on the customers that are important and right for them. In addition, companies can use special scoring methods to determine which customers will contribute to the company's success in the long term, and to what extent." The consequences of these valuations are not always positive for those affected: attractive customers benefit from individual special offers and special attention. Others may wait on hold with the telephone service until the more profitable customers have been dealt with.
This is what a practical implementation of what SAP spokeswoman Schnaubelt describes in abstract terms might look like: "In many companies, customer valuation is carried out with classic ABC analysis, in which customers are categorized on the basis of data such as revenue. A customers, as particularly important customers, are looked after differently from C customers." Even closer to the planned use of data mining for hunting terrorists is an application that many companies already use successfully today: they track down fraudulent employees. Werner Sülzer of the large CRM vendor NCR Teradata describes the possibilities as follows: "Today practically every offender, whether employee, customer or supplier, leaves data trails in the course of white-collar crime. The priority must be to condense individual traces into patterns of action and offender profiles. This is achieved by means of central data warehouses and highly developed search and analysis tools." Companies do not like to talk about concrete successes, that is, dismissals of criminal employees after such programs have been deployed. Matthias Wilke of the "Beratungsstelle für Technologiefolgen und Qualifizierung" (BTQ) of the trade union Verdi knows of a case from Switzerland. There the retail chain "Pick Pay" uses the program "Lord Lose Prevention". Two months after its introduction, embezzlement amounting to about 200,000 francs had reportedly been uncovered. That cost more than 50 suspected cashiers their jobs.
Every till sends the data on voids, returns, corrections and the like to a central database. From this information the program calculates cashier profiles. Anyone whose work deviates strongly from the average becomes a suspect. The detailed criteria are set by the audit departments, but in general: "In the case of anomalies such as an above-average number of voids, opening of the cash drawer without a sale after a void, or goods returned without a receipt, the transactions can subsequently be attributed to individual persons," says Rene Schiller, head of marketing at Lord's manufacturer Logware. Such a collection of data is not grounds for dismissal in court. But on that basis companies can deploy detectives in a targeted way. Or they confront the employees with the material, whereupon the guilty usually confess. Wilke takes a critical view of programs such as Lord: "Anyone who stands out in the screening can be a potential fraudster or thief and deserves special observation." Yet one could deviate from the norm simply because one has not slept enough and is therefore unfocused. Here Wilke sees the danger of mechanized performance monitoring. "It is not difficult, after all, to use the programs to calculate how long, for example, ringing up a Saturday shopping trip takes on average." The works councils, whose consent is required when technical monitoring equipment is used, condemn the evaluating software less clearly. On the contrary: at Kaufhof and Edeka they have approved its use. Because: "They do not want entire departments to come under general suspicion because of inventory losses or the like," explains trade unionist Wilke. Given what commercial data mining programs can do, it is astonishing that the "Information Awareness Office" in the United States still budgets three years for research and testing of its own programs. In 2005, early prototypes for the terrorist search are to be deployed.
But protest is already stirring. Privacy advocates such as Marc Botenberg of the information center for data protection speak of the "most ambitious public surveillance system ever proposed". They warn in particular against evaluating data from Internet use and private e-mails. The Department of Defense is backpedaling. It has no intention of becoming active domestically through the software, it says. "That will be done by the intelligence services, counter-espionage and law enforcement," says Under Secretary Edward Aldridge. During development and testing, work will be done with constructed data and some real information that privacy advocates consider unobjectionable. What gives pause, however, is Aldridge's answer to the question of why so much money is earmarked for developing translation software: so that databases in other languages can be used, provided one gets lawful access to them."
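The cashier screening described in the article, where profiles are built from till data and anyone deviating strongly from the average is flagged, can be illustrated with a simple z-score test. This is only a sketch of the general idea: the article does not say how the "Lord" program computes its profiles, and the void rates, the threshold of 2.0 and the function name `flag_outliers` are all assumptions.

```python
from statistics import mean, pstdev

def flag_outliers(void_rates, threshold=2.0):
    """Flag cashiers whose void rate lies far above the group average.

    void_rates maps a cashier ID to the share of voided transactions.
    A cashier is flagged when the z-score exceeds the threshold.
    """
    rates = list(void_rates.values())
    mu = mean(rates)
    sigma = pstdev(rates)
    if sigma == 0:
        return []  # everyone behaves identically, nothing stands out
    return sorted(cashier for cashier, rate in void_rates.items()
                  if (rate - mu) / sigma > threshold)

# Hypothetical void rates for seven cashiers.
void_rates = {
    "A": 0.010, "B": 0.012, "C": 0.011, "D": 0.009,
    "E": 0.013, "F": 0.010, "G": 0.060,
}
print(flag_outliers(void_rates))
```

As the article itself notes, a statistical deviation of this kind is a reason for closer inspection, not proof of misconduct.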
Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.01
```
0.009651082 = product of:
  0.072383106 = sum of:
    0.030755727 = weight(_text_:evaluation in 605) [ClassicSimilarity], result of:
      0.030755727 = score(doc=605,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.23172665 = fieldWeight in 605, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0390625 = fieldNorm(doc=605)
    0.041627377 = weight(_text_:web in 605) [ClassicSimilarity], result of:
      0.041627377 = score(doc=605,freq=10.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.40312994 = fieldWeight in 605, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=605)
  0.13333334 = coord(2/15)
```
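The explain tree above follows the TF-IDF arithmetic of Lucene's ClassicSimilarity, and a short sketch can make the steps easier to trace. The function names below are illustrative only and are not part of Lucene; the inputs (freq, docFreq, maxDocs, queryNorm, fieldNorm and the coord fraction) are read directly off the tree for doc 605.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # tf is the square root of the raw term frequency
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm        # "queryWeight" line in the tree
    field_weight = tf * term_idf * field_norm   # "fieldWeight" line in the tree
    return query_weight * field_weight


# Values copied from the explain output for doc 605
MAX_DOCS = 44218
QUERY_NORM = 0.031640913
FIELD_NORM = 0.0390625

evaluation = term_score(2.0, 1811, MAX_DOCS, QUERY_NORM, FIELD_NORM)
web = term_score(10.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/15): the share of query clauses that matched this document
coord = 2 / 15
total = (evaluation + web) * coord

print(f"evaluation={evaluation:.6f} web={web:.6f} total={total:.6f}")
```

The three printed values should agree with the corresponding lines of the tree up to rounding, since Lucene computes in single precision and this sketch uses double precision. Where a tree contains a nested `product of:` with its own `coord(1/2)`, the same function applies to the inner term and the result is multiplied by 0.5 before it enters the outer sum.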
Abstract

Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to purchase their objects. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions from spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged in different granularities from words, sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve f-measure 62.16% at the sentence level and 74.37% at the document level.

Footnote

Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.01
```
0.008018335 = product of:
  0.060137514 = sum of:
    0.006439812 = product of:
      0.012879624 = sum of:
        0.012879624 = weight(_text_:online in 354) [ClassicSimilarity], result of:
          0.012879624 = score(doc=354,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.13412495 = fieldWeight in 354, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.03125 = fieldNorm(doc=354)
      0.5 = coord(1/2)
    0.0536977 = weight(_text_:web in 354) [ClassicSimilarity], result of:
      0.0536977 = score(doc=354,freq=26.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.520022 = fieldWeight in 354, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=354)
  0.13333334 = coord(2/15)
```
Abstract

Web mining aims to discover useful information and knowledge from the Web hyperlink structure, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semistructured and unstructured nature of the Web data and its heterogeneity. It has also developed many of its own algorithms and techniques. Liu has written a comprehensive text on Web data mining. Key topics of structure mining, content mining, and usage mining are covered both in breadth and in depth. His book brings together all the essential concepts and algorithms from related areas such as data mining, machine learning, and text processing to form an authoritative and coherent text. The book offers a rich blend of theory and practice, addressing seminal research ideas, as well as examining the technology from a practical point of view. It is suitable for students, researchers and practitioners interested in Web mining both as a learning text and a reference book. Lecturers can readily use it for classes on data mining, Web mining, and Web search. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.

Content

Contents: 1. Introduction 2. Association Rules and Sequential Patterns 3. Supervised Learning 4. Unsupervised Learning 5. Partially Supervised Learning 6. Information Retrieval and Web Search 7. Social Network Analysis 8. Web Crawling 9. Structured Data Extraction: Wrapper Generation 10. Information Integration

RSWK

World Wide Web / Data Mining

Subject

World Wide Web / Data Mining

Schwartz, D.: Graphische Datenanalyse für digitale Bibliotheken : Leistungs- und Funktionsumfang moderner Analyse- und Visualisierungsinstrumente (2006) 0.01

```
0.008003829 = product of:
  0.060028717 = sum of:
    0.026062861 = weight(_text_:web in 30) [ClassicSimilarity], result of:
      0.026062861 = score(doc=30,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.25239927 = fieldWeight in 30, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=30)
    0.033965856 = product of:
      0.06793171 = sum of:
        0.06793171 = weight(_text_:analyse in 30) [ClassicSimilarity], result of:
          0.06793171 = score(doc=30,freq=2.0), product of:
            0.16670908 = queryWeight, product of:
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.031640913 = queryNorm
            0.40748656 = fieldWeight in 30, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.268782 = idf(docFreq=618, maxDocs=44218)
              0.0546875 = fieldNorm(doc=30)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)
```

Abstract: The World Wide Web makes vast amounts of data available. It is becoming increasingly difficult for users to sift through and assess these data and to filter out the relevant items. One approach to this problem is offered by visualization tools, with which search results are no longer presented exclusively as text-based document lists but by means of symbols, icons or graphical elements. Suitable visualization techniques can reveal information structures in large data sets. Information visualization is thus an instrument for structuring search results in a digital library and making relevant data easier for the user to find.

Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.01
```
0.007899529 = product of:
  0.05924647 = sum of:
    0.036906876 = weight(_text_:evaluation in 3464) [ClassicSimilarity], result of:
      0.036906876 = score(doc=3464,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.278072 = fieldWeight in 3464, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.046875 = fieldNorm(doc=3464)
    0.022339594 = weight(_text_:web in 3464) [ClassicSimilarity], result of:
      0.022339594 = score(doc=3464,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.21634221 = fieldWeight in 3464, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3464)
  0.13333334 = coord(2/15)
```
Abstract

We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.

Kraker, P.; Kittel, C.; Enkhbayar, A.: Open Knowledge Maps : creating a visual interface to the world's scientific knowledge based on natural language processing (2016) 0.01

```
0.007380123 = product of:
  0.05535092 = sum of:
    0.033011325 = weight(_text_:software in 3205) [ClassicSimilarity], result of:
      0.033011325 = score(doc=3205,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.2629875 = fieldWeight in 3205, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=3205)
    0.022339594 = weight(_text_:web in 3205) [ClassicSimilarity], result of:
      0.022339594 = score(doc=3205,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.21634221 = fieldWeight in 3205, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3205)
  0.13333334 = coord(2/15)
```

Abstract: The goal of Open Knowledge Maps is to create a visual interface to the world's scientific knowledge. The base for this visual interface consists of so-called knowledge maps, which enable the exploration of existing knowledge and the discovery of new knowledge. Our open source knowledge mapping software applies a mixture of summarization techniques and similarity measures on article metadata, which are iteratively chained together. After processing, the representation is saved in a database for use in a web visualization. In the future, we want to create a space for collective knowledge mapping that brings together individuals and communities involved in exploration and discovery. We want to enable people to guide each other in their discovery by collaboratively annotating and modifying the automatically created maps.

Medien-Informationsmanagement : Archivarische, dokumentarische, betriebswirtschaftliche, rechtliche und Berufsbild-Aspekte ; [Frühjahrstagung der Fachgruppe 7 im Jahr 2000 in Weimar und Folgetagung 2001 in Köln] (2003) 0.01
```
0.006821164 = product of:
  0.03410582 = sum of:
    0.016505662 = weight(_text_:software in 1833) [ClassicSimilarity], result of:
      0.016505662 = score(doc=1833,freq=2.0), product of:
        0.12552431 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031640913 = queryNorm
        0.13149375 = fieldWeight in 1833, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1833)
    0.011169797 = weight(_text_:web in 1833) [ClassicSimilarity], result of:
      0.011169797 = score(doc=1833,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.108171105 = fieldWeight in 1833, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1833)
    0.00643036 = product of:
      0.01286072 = sum of:
        0.01286072 = weight(_text_:22 in 1833) [ClassicSimilarity], result of:
          0.01286072 = score(doc=1833,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.116070345 = fieldWeight in 1833, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1833)
      0.5 = coord(1/2)
  0.2 = coord(3/15)
```
Abstract

When, in the 1970s, the title "information manager" was promoted ever more frequently for people who until then had been known as documentalists, this was occasionally smiled at in the established circles of archivists and librarians and taken as a sign of an identity crisis, or at any rate of uncertainty about the professional profile so labeled. For the profession of media archivists/media documentalists, who have been organized since 1960 in Fachgruppe 7 of the Verein, later Verband deutscher Archivare (VdA), this positioning in the face of new challenges of content (information overload) and technology (electronic data processing) was, however, part of everyday professional life from early on. "Halt, ohne uns geht es nicht!" ("Stop, it won't work without us!") was the headline of an article in the association's journal "Info 7" that dealt with the construction of ever more powerful networks and ever faster data highways. Information, information society: at that time these terms were understood almost exclusively in a technical sense. The informatized, not the informed, society stood in the foreground, which in turn brought critics onto the scene, from Joseph Weizenbaum in the USA to the information ecologists in Bremen. In the national, sometimes only regional, projects and pilot schemes with data highways, including the early Btx, it had never become quite clear what content in what form was supposed to be sent racing through these networks and roads, and who was actually supposed to select, portion, position, in short: manage this content. With the World Wide Web at the latest, these projects became obsolete, at least as far as hardware and software were concerned. What has remained is the topic of content. And, ever more pressing and understood in more than a technical sense, the topic of information management.
MedienInformationsManagement was the title of the spring conference of Fachgruppe 7 held in Weimar in 2000, and the follow-up conference in Cologne in 2001, which set documentary pragmatism against multimedia production, also dealt with content as a business area and with content management systems. The papers and discussion contributions from these two conferences, collected in this sixth volume of the series Beiträge zur Mediendokumentation, shed light on the title topic from a wide variety of perspectives: archival, documentary, commercial, professional and legal. It becomes clear that the job title media archivist/media documentalist stands fairly precisely for everything that happens today with so-called old and new media in the organizational sense, that is, in the sense of ordering and mediating. This applies in particular to the Internet and the intranets born of it. Both need the ordering hand that was trained on the old media (book, newspaper, sound recording, film and so on) just as much, because they live off these media to a large extent. That the Internet is nevertheless a medium sui generis and confronts the old information professions with entirely new challenges is another thread running through the contributions from Weimar and Cologne.

Date

11. 5.2008 19:49:22
O'Brien, H.L.; Lebow, M.: Mixed-methods approach to measuring user experience in online news interactions (2013) 0.01
```
0.0062473677 = product of:
  0.046855256 = sum of:
    0.01609953 = product of:
      0.03219906 = sum of:
        0.03219906 = weight(_text_:online in 1001) [ClassicSimilarity], result of:
          0.03219906 = score(doc=1001,freq=8.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.33531237 = fieldWeight in 1001, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1001)
      0.5 = coord(1/2)
    0.030755727 = weight(_text_:evaluation in 1001) [ClassicSimilarity], result of:
      0.030755727 = score(doc=1001,freq=2.0), product of:
        0.13272417 = queryWeight, product of:
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.031640913 = queryNorm
        0.23172665 = fieldWeight in 1001, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1947007 = idf(docFreq=1811, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1001)
  0.13333334 = coord(2/15)
```
Abstract

When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (User Engagement, Cognitive Absorption, System Usability Scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electromyogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings and revealed that increased time on tasks, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences.
Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.01
```
0.005728226 = product of:
  0.042961694 = sum of:
    0.03224443 = weight(_text_:web in 1605) [ClassicSimilarity], result of:
      0.03224443 = score(doc=1605,freq=6.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.3122631 = fieldWeight in 1605, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1605)
    0.010717267 = product of:
      0.021434534 = sum of:
        0.021434534 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
          0.021434534 = score(doc=1605,freq=2.0), product of:
            0.110801086 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031640913 = queryNorm
            0.19345059 = fieldWeight in 1605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)
```
Abstract

Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.

Source

Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.13-22

Lackes, R.; Tillmanns, C.: Data Mining für die Unternehmenspraxis : Entscheidungshilfen und Fallstudien mit führenden Softwarelösungen (2006) 0.01

```
0.005596575 = product of:
  0.08394862 = sum of:
    0.08394862 = sum of:
      0.058227178 = weight(_text_:analyse in 1383) [ClassicSimilarity], result of:
        0.058227178 = score(doc=1383,freq=2.0), product of:
          0.16670908 = queryWeight, product of:
            5.268782 = idf(docFreq=618, maxDocs=44218)
            0.031640913 = queryNorm
          0.3492742 = fieldWeight in 1383, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.268782 = idf(docFreq=618, maxDocs=44218)
            0.046875 = fieldNorm(doc=1383)
      0.02572144 = weight(_text_:22 in 1383) [ClassicSimilarity], result of:
        0.02572144 = score(doc=1383,freq=2.0), product of:
          0.110801086 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.031640913 = queryNorm
          0.23214069 = fieldWeight in 1383, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=1383)
  0.06666667 = coord(1/15)
```

Abstract: The book is aimed at practitioners in companies who deal with the analysis of large data sets. After a short theory section, four case studies from the customer relationship management of a mail-order company are worked through. Eight leading software solutions were used: the Intelligent Miner from IBM, the Enterprise Miner from SAS, Clementine from SPSS, Knowledge Studio from Angoss, the Delta Miner from Bissantz, the Business Miner from Business Object and the Data Engine from MIT. The case studies bring out the strengths and weaknesses of the individual solutions and demonstrate the methodologically correct procedure for data mining. Both provide valuable decision support for selecting standard data mining software and for practical data analysis.
Date: 22. 3.2008 14:46:06

Liu, W.; Weichselbraun, A.; Scharl, A.; Chang, E.: Semi-automatic ontology extension using spreading activation (2005) 0.00

```
0.004977671 = product of:
  0.03733253 = sum of:
    0.011269671 = product of:
      0.022539342 = sum of:
        0.022539342 = weight(_text_:online in 3028) [ClassicSimilarity], result of:
          0.022539342 = score(doc=3028,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.23471867 = fieldWeight in 3028, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3028)
      0.5 = coord(1/2)
    0.026062861 = weight(_text_:web in 3028) [ClassicSimilarity], result of:
      0.026062861 = score(doc=3028,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.25239927 = fieldWeight in 3028, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3028)
  0.13333334 = coord(2/15)
```

Abstract: This paper describes a system to semi-automatically extend and refine ontologies by mining textual data from the Web sites of international online media. Expanding a seed ontology creates a semantic network through co-occurrence analysis, trigger phrase analysis, and disambiguation based on the WordNet lexical dictionary. Spreading activation then processes this semantic network to find the most probable candidates for inclusion in an extended ontology. Approaches to identifying hierarchical relationships such as subsumption, head noun analysis and WordNet consultation are used to confirm and classify the found relationships. Using a seed ontology on "climate change" as an example, this paper demonstrates how spreading activation improves the result by naturally integrating the mentioned methods.

Miao, Q.; Li, Q.; Zeng, D.: Fine-grained opinion mining by integrating multiple review sources (2010) 0.00

```
0.004977671 = product of:
  0.03733253 = sum of:
    0.011269671 = product of:
      0.022539342 = sum of:
        0.022539342 = weight(_text_:online in 4104) [ClassicSimilarity], result of:
          0.022539342 = score(doc=4104,freq=2.0), product of:
            0.096027054 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031640913 = queryNorm
            0.23471867 = fieldWeight in 4104, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4104)
      0.5 = coord(1/2)
    0.026062861 = weight(_text_:web in 4104) [ClassicSimilarity], result of:
      0.026062861 = score(doc=4104,freq=2.0), product of:
        0.10326045 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031640913 = queryNorm
        0.25239927 = fieldWeight in 4104, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4104)
  0.13333334 = coord(2/15)
```

Abstract: With the rapid development of Web 2.0, online reviews have become extremely valuable sources for mining customers' opinions. Fine-grained opinion mining has attracted more and more attention of both applied and theoretical research. In this article, the authors study how to automatically mine product features and opinions from multiple review sources. Specifically, they propose an integration strategy to solve the issue. Within the integration strategy, the authors mine domain knowledge from semistructured reviews and then exploit the domain knowledge to assist product feature extraction and sentiment orientation identification from unstructured reviews. Finally, feature-opinion tuples are generated. Experimental results on real-world datasets show that the proposed approach is effective.
