Search (57 results, page 1 of 3)

  • theme_ss:"Data Mining"
  1. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.03
    0.03495895 = product of:
      0.104876846 = sum of:
        0.09442965 = weight(_text_:propose in 3464) [ClassicSimilarity], result of:
          0.09442965 = score(doc=3464,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.48135406 = fieldWeight in 3464, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
        0.0104471985 = product of:
          0.031341594 = sum of:
            0.031341594 = weight(_text_:29 in 3464) [ClassicSimilarity], result of:
              0.031341594 = score(doc=3464,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.23319192 = fieldWeight in 3464, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Date
    1. 6.2010 9:29:57
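    The score breakdown above follows Lucene's ClassicSimilarity (TF-IDF) arithmetic. As a minimal sketch, the snippet below recomputes the top-level score of result 1 from the numbers shown in its explain tree. The function and variable names are illustrative assumptions, not part of the search system. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are those of ClassicSimilarity, and coord(m/n) is taken as the fraction of query clauses that matched.

```python
import math

# Hypothetical helper names; the formulas are Lucene ClassicSimilarity's.
def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # tf(freq) = sqrt(termFreq)
    query_weight = idf * query_norm       # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output of result 1 (doc=3464).
MAX_DOCS = 44218
QUERY_NORM = 0.038207654
FIELD_NORM = 0.046875

propose = classic_term_score(4.0, 707, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "29" clause sits in a nested query where 1 of 3 clauses matched.
term_29 = classic_term_score(2.0, 3565, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 3)
# At the top level, 2 of 6 clauses matched.
score_1 = (propose + term_29) * (2 / 6)

# Result 6 (doc=2400) has the same "propose" weight but only 1 of 6 clauses matched.
score_6 = classic_term_score(4.0, 707, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 6)

print(f"result 1: {score_1:.8f}")
print(f"result 6: {score_6:.8f}")
```

    Under these assumptions the recomputed values should agree with the listed 0.03495895 and 0.015738275 up to single-precision rounding. The comparison also suggests why result 6 ranks lower despite an identical "propose" weight: its coord factor is 1/6 rather than 2/6.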
  2. Srinivasan, P.: Text mining in biomedicine : challenges and opportunities (2006) 0.03
    0.025739681 = product of:
      0.07721904 = sum of:
        0.06677184 = weight(_text_:propose in 1497) [ClassicSimilarity], result of:
          0.06677184 = score(doc=1497,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 1497, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=1497)
        0.0104471985 = product of:
          0.031341594 = sum of:
            0.031341594 = weight(_text_:29 in 1497) [ClassicSimilarity], result of:
              0.031341594 = score(doc=1497,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.23319192 = fieldWeight in 1497, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1497)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Text mining is about making serendipity more likely. Serendipity, the chance discovery of interesting ideas, has been responsible for many discoveries in science. Text mining systems strive to explore large text collections and to separate the potentially meaningful connections from a vast and mostly noisy background of random associations. In this paper we provide a summary of our text mining approach and also illustrate briefly some of the experiments we have conducted with this approach. In particular we use a profile-based text mining method. We have used these profiles to explore the global distribution of disease research, replicate discoveries made by others and propose new hypotheses. Text mining holds much potential that has yet to be tapped.
    Date
    29. 2.2008 17:14:09
  3. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.02
    0.021449735 = product of:
      0.064349204 = sum of:
        0.055643205 = weight(_text_:propose in 967) [ClassicSimilarity], result of:
          0.055643205 = score(doc=967,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
        0.008706 = product of:
          0.026117997 = sum of:
            0.026117997 = weight(_text_:29 in 967) [ClassicSimilarity], result of:
              0.026117997 = score(doc=967,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19432661 = fieldWeight in 967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=967)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
    Date
    25. 6.2013 19:05:29
  4. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.02
    0.021449735 = product of:
      0.064349204 = sum of:
        0.055643205 = weight(_text_:propose in 3682) [ClassicSimilarity], result of:
          0.055643205 = score(doc=3682,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 3682, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
        0.008706 = product of:
          0.026117997 = sum of:
            0.026117997 = weight(_text_:29 in 3682) [ClassicSimilarity], result of:
              0.026117997 = score(doc=3682,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19432661 = fieldWeight in 3682, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3682)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Date
    16.11.2017 14:00:29
  5. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.02
    0.021423629 = product of:
      0.064270884 = sum of:
        0.055643205 = weight(_text_:propose in 5011) [ClassicSimilarity], result of:
          0.055643205 = score(doc=5011,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 5011, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
        0.008627683 = product of:
          0.025883049 = sum of:
            0.025883049 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
              0.025883049 = score(doc=5011,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19345059 = fieldWeight in 5011, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5011)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    The present challenge faced by scientists working with Big Data comes in the overwhelming volume and level of detail provided by current data sets. Exceeding traditional empirical approaches, Big Data opens a new perspective on scientific work in which data comes to play a role in the development of the scientific problematic to be developed. Addressing this reconfiguration of our relationship with data through readings of Wittgenstein, Macherey, and Popper, we propose a picture of science that encourages scientists to engage with the data in a direct way, using the data itself as an instrument for scientific investigation. Using GIS as a theme, we develop the concept of cyber-human systems of thought and understanding to bridge the divide between representative (theoretical) thinking and (non-theoretical) data-driven science. At the foundation of these systems, we invoke the concept of the "semantic pixel" to establish a logical and virtual space linking data and the work of scientists. It is with this discussion of the relationship between analysts in their pursuit of knowledge and the rise of Big Data that this present discussion of the philosophical foundations of Big Data addresses the central questions raised by social informatics research.
    Date
    7. 3.2019 16:32:22
  6. Bella, A. La; Fronzetti Colladon, A.; Battistoni, E.; Castellan, S.; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining (2018) 0.02
    0.015738275 = product of:
      0.09442965 = sum of:
        0.09442965 = weight(_text_:propose in 2400) [ClassicSimilarity], result of:
          0.09442965 = score(doc=2400,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.48135406 = fieldWeight in 2400, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=2400)
      0.16666667 = coord(1/6)
    
    Abstract
    We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as appearing to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000, out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholders' reactions.
  7. Data mining : Theoretische Aspekte und Anwendungen (1998) 0.01
    0.013321342 = product of:
      0.07992805 = sum of:
        0.07992805 = weight(_text_:forschung in 966) [ClassicSimilarity], result of:
          0.07992805 = score(doc=966,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.43000343 = fieldWeight in 966, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0625 = fieldNorm(doc=966)
      0.16666667 = coord(1/6)
    
    Abstract
    Topics covered include: goals and methods of data mining, the knowledge discovery process, the state of the art in data mining research (Forschung) and application, important data mining tools, the role of information processing in the KDD process, data warehousing, OLAP, approaches to user support for the data mining process, and model selection and evaluation methods for data mining algorithms.
  8. Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Huang, J.X.; Jemaa, M.B.: Mining correlations between medically dependent features and image retrieval models for query classification (2017) 0.01
    0.01311523 = product of:
      0.07869138 = sum of:
        0.07869138 = weight(_text_:propose in 3607) [ClassicSimilarity], result of:
          0.07869138 = score(doc=3607,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.40112838 = fieldWeight in 3607, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3607)
      0.16666667 = coord(1/6)
    
    Abstract
    The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State-of-the-art image retrieval models are classified into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models use both textual and visual features to answer queries. Nevertheless, most previous work in this field has used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the best suitable retrieval model given a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on the classification performance. The results show that combining our proposed specific and generic query features is effective in query classification.
  9. Zhou, L.; Chaovalit, P.: Ontology-supported polarity mining (2008) 0.01
    0.012983414 = product of:
      0.077900484 = sum of:
        0.077900484 = weight(_text_:propose in 1343) [ClassicSimilarity], result of:
          0.077900484 = score(doc=1343,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3970968 = fieldWeight in 1343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1343)
      0.16666667 = coord(1/6)
    
    Abstract
    Polarity mining provides an in-depth analysis of semantic orientations of text information. Motivated by its success in the area of topic mining, we propose an ontology-supported polarity mining (OSPM) approach. The approach aims to enhance polarity mining with ontology by providing detailed topic-specific information. OSPM was evaluated in the movie review domain using both supervised and unsupervised techniques. Results revealed that OSPM outperformed the baseline method without ontology support. The findings of this study not only advance the state of polarity mining research but also shed light on future research directions.
  10. Miao, Q.; Li, Q.; Zeng, D.: Fine-grained opinion mining by integrating multiple review sources (2010) 0.01
    0.012983414 = product of:
      0.077900484 = sum of:
        0.077900484 = weight(_text_:propose in 4104) [ClassicSimilarity], result of:
          0.077900484 = score(doc=4104,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3970968 = fieldWeight in 4104, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4104)
      0.16666667 = coord(1/6)
    
    Abstract
    With the rapid development of Web 2.0, online reviews have become extremely valuable sources for mining customers' opinions. Fine-grained opinion mining has attracted more and more attention of both applied and theoretical research. In this article, the authors study how to automatically mine product features and opinions from multiple review sources. Specifically, they propose an integration strategy to solve the issue. Within the integration strategy, the authors mine domain knowledge from semistructured reviews and then exploit the domain knowledge to assist product feature extraction and sentiment orientation identification from unstructured reviews. Finally, feature-opinion tuples are generated. Experimental results on real-world datasets show that the proposed approach is effective.
  11. Lischka, K.: Spurensuche im Datenwust : Data-Mining-Software fahndet nach kriminellen Mitarbeitern, guten Kunden - und bald vielleicht auch nach Terroristen (2002) 0.01
    0.011716543 = product of:
      0.035149626 = sum of:
        0.029973019 = weight(_text_:forschung in 1178) [ClassicSimilarity], result of:
          0.029973019 = score(doc=1178,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.16125129 = fieldWeight in 1178, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1178)
        0.0051766094 = product of:
          0.015529828 = sum of:
            0.015529828 = weight(_text_:22 in 1178) [ClassicSimilarity], result of:
              0.015529828 = score(doc=1178,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.116070345 = fieldWeight in 1178, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=1178)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Content
    "Whether someone is a terrorist planning an attack against the United States, a cashier embezzling bills from the till, or a shopper who especially likes spending money on certain products: data-mining software makes no distinction. Such programs analyze huge volumes of data and pass statistical judgments. With these methods, the researchers of the "Information Awareness Office" in the United States now want to find traces of terrorists in the databases of government agencies and private companies such as credit card firms. The annual budget for the various research projects amounts to 200 million dollars. That such software works in practice is shown by the rising sales of vendors of so-called customer relationship management software. Last year, according to the market research institute IDC, the potential for analytical CRM applications grew by 22 percent worldwide; in Germany it is expected to keep growing by 14.1 percent annually until 2006. And that despite a weak economy, or precisely because of it. For just as data mining is meant to help the US government find terrorists, CRM programs today decide which customers are profitable for a company. And which will be in the future, as Manuela Schnaubelt, spokeswoman for the CRM vendor SAP, describes: "Customer valuation is a central component of analytical CRM. It enables companies to focus on the customers who are important and right for them. In addition, companies can use special scoring procedures to determine which customers will contribute to the company's success in the long term, and to what extent." The consequences of these valuations are not always positive for those affected: attractive customers benefit from individual special offers and special attention. Others may hang in the telephone service's queue until the more profitable customers have been dealt with.
    This is what a practical implementation might look like of what SAP spokeswoman Schnaubelt describes in the abstract: "In many companies, customer valuation is carried out with the classic ABC analysis, in which customers are categorized on the basis of data such as revenue. A customers, as particularly important customers, are served differently from C customers." Even closer to the planned use of data mining for hunting terrorists is an application that many companies already use successfully today: they track down employees who commit fraud. Werner Sülzer of the major CRM vendor NCR Teradata describes the possibilities as follows: "Today practically every offender, whether employee, customer, or supplier, leaves data traces in the course of white-collar crimes. The priority must be to condense individual traces into patterns of action and offender profiles. This is achieved by means of central data warehouses and highly developed search and analysis tools." Companies do not like to talk about concrete successes, that is, dismissals of criminal employees, after deploying such programs. Matthias Wilke of the "Beratungsstelle für Technologiefolgen und Qualifizierung" (BTQ), an advisory office of the Verdi trade union, knows of a case from Switzerland. There the retail chain "Pick Pay" uses the program "Lord Lose Prevention". Two months after its introduction, embezzlement worth about 200,000 francs had been uncovered. That cost more than 50 suspected cashiers their jobs.
    Every till sends its data on voids, returns, corrections, and the like to a central database. From this information the program computes cashier profiles. Anyone whose work deviates strongly from the average becomes a suspect. The criteria are set in detail by the audit departments, but in general: "In the case of anomalies such as an above-average number of voids, opening the cash drawer without a sale after a void, or merchandise returns without a receipt, the transactions can subsequently be attributed to individual persons," says Rene Schiller, head of marketing at Lord's maker Logware. Such a data collection is not grounds for dismissal in court. But on that basis companies can deploy detectives in a targeted way. Or they confront employees with the material, whereupon the guilty usually confess. Wilke views programs such as Lord critically: "Anyone who stands out in the screening can be a potential fraudster or thief and deserves special observation." Yet one can deviate from the norm simply because one is short on sleep and therefore unfocused. Here Wilke sees the danger of technologized performance monitoring. "It is not difficult, after all, to use the programs to calculate how long, for example, ringing up a Saturday shopping trip takes on average." The works councils, whose consent is required for the use of technical monitoring equipment, condemn the evaluative software less unequivocally. On the contrary: at Kaufhof and Edeka they have agreed to its use. Because: "They don't want entire departments to fall under blanket suspicion because of inventory losses or the like," explains trade unionist Wilke. Given the capabilities of commercial data-mining programs, it is astonishing that in the United States the "Information Awareness Office" still budgets three years for research and testing of its own programs. In 2005, early prototypes for the terrorist search are to be put into use.
    But protest is already stirring. Privacy advocates such as Marc Botenberg of the "Informationszentrum für Datenschutz" (an information center for data protection) speak of the "most ambitious public surveillance system ever proposed". They warn in particular against evaluating data from Internet use and private e-mails. The Department of Defense is backpedaling. It says it has no intention of using the software to become active domestically. "That will be done by the intelligence services, counterintelligence, and law enforcement," says Under Secretary Edward Aldridge. During development and testing, the work will rely on constructed information and some real information that is unobjectionable from the privacy advocates' point of view. What gives pause, however, is Aldridge's answer to the question of why so much money is earmarked for the development of translation software: so that databases in other languages can be used, provided lawful access to them is obtained."
  12. Kipcic, O.; Cramer, C.: Wie Zeitungsinhalte Forschung und Entwicklung befördern (2017) 0.01
    0.011656174 = product of:
      0.06993704 = sum of:
        0.06993704 = weight(_text_:forschung in 3885) [ClassicSimilarity], result of:
          0.06993704 = score(doc=3885,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.376253 = fieldWeight in 3885, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3885)
      0.16666667 = coord(1/6)
    
  13. Fenstermacher, K.D.; Ginsburg, M.: Client-side monitoring for Web mining (2003) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 1611) [ClassicSimilarity], result of:
          0.06677184 = score(doc=1611,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 1611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=1611)
      0.16666667 = coord(1/6)
    
    Abstract
    "Garbage in, garbage out" is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user's actions ever reaches the Web server, analysts must rely on incomplete data. In this paper, we propose a client-side monitoring system that is unobtrusive and supports flexible data collection. Moreover, the proposed framework encompasses client-side applications beyond the Web browser. Expanding monitoring beyond the browser to incorporate standard office productivity tools enables analysts to derive a much richer and more accurate picture of user behavior on the Web.
  14. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 2563) [ClassicSimilarity], result of:
          0.06677184 = score(doc=2563,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 2563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=2563)
      0.16666667 = coord(1/6)
    
    Abstract
    Topic discovery is an important means for marketing, e-Business and social science studies. As well, it can be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without using manual examination. Given a query, ATD returns with topics associated with the query and top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
  15. Li, J.; Zhang, P.; Cao, J.: External concept support for group support systems through Web mining (2009) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 2806) [ClassicSimilarity], result of:
          0.06677184 = score(doc=2806,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 2806, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=2806)
      0.16666667 = coord(1/6)
    
    Abstract
    External information plays an important role in group decision-making processes, yet research about external information support for Group Support Systems (GSS) has been lacking. In this study, we propose an approach to build a concept space to provide external concept support for GSS users. Built on a Web mining algorithm, the approach can mine a concept space from the Web and retrieve related concepts from the concept space based on users' comments in a real-time manner. We conduct two experiments to evaluate the quality of the proposed approach and the effectiveness of the external concept support provided by this approach. The experiment results indicate that the concept space mined from the Web contained qualified concepts to stimulate divergent thinking. The results also demonstrate that external concept support in GSS greatly enhanced group productivity for idea generation tasks.
  16. Sun, X.; Lin, H.: Topical community detection from mining user tagging behavior and interest (2013) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 605) [ClassicSimilarity], result of:
          0.06677184 = score(doc=605,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=605)
      0.16666667 = coord(1/6)
    
    Abstract
    With the development of Web2.0, social tagging systems in which users can freely choose tags to annotate resources according to their interests have attracted much attention. In particular, literature on the emergence of collective intelligence in social tagging systems has increased. In this article, we propose a probabilistic generative model to detect latent topical communities among users. Social tags and resource contents are leveraged to model user interest in two similar and correlated ways. Our primary goal is to capture user tagging behavior and interest and discover the emergent topical community structure. The communities should be groups of users with frequent social interactions as well as similar topical interests, which would have important research implications for personalized information services. Experimental results on two real social tagging data sets with different genres have shown that the proposed generative model more accurately models user interest and detects high-quality and meaningful topical communities.
  17. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 1326) [ClassicSimilarity], result of:
          0.06677184 = score(doc=1326,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 1326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
      0.16666667 = coord(1/6)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
  18. Song, J.; Huang, Y.; Qi, X.; Li, Y.; Li, F.; Fu, K.; Huang, T.: Discovering hierarchical topic evolution in time-stamped documents (2016) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 2853) [ClassicSimilarity], result of:
          0.06677184 = score(doc=2853,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 2853, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=2853)
      0.16666667 = coord(1/6)
    
    Abstract
    The objective of this paper is to propose a hierarchical topic evolution model (HTEM) that can organize time-varying topics in a hierarchy and discover their evolutions with multiple timescales. In the proposed HTEM, topics near the root of the hierarchy are more abstract and also evolve in the longer timescales than those near the leaves. To achieve this goal, the distance-dependent Chinese restaurant process (ddCRP) is extended to a new nested process that is able to simultaneously model the dependencies among data and the relationship between clusters. The HTEM is proposed based on the new process for time-stamped documents, in which the timestamp is utilized to measure the dependencies among documents. Moreover, an efficient Gibbs sampler is developed for the proposed HTEM. Our experimental results on two popular real-world data sets verify that the proposed HTEM can capture coherent topics and discover their hierarchical evolutions. It also outperforms the baseline model in terms of likelihood on held-out data.
  19. Wongthongtham, P.; Abu-Salih, B.: Ontology-based approach for semantic data extraction from social big data : state-of-the-art and research directions (2018) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 4097) [ClassicSimilarity], result of:
          0.06677184 = score(doc=4097,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 4097, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=4097)
      0.16666667 = coord(1/6)
    
    Abstract
    The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on semantic analysis of textual data. We propose an ontology-based approach to extract semantics of textual data and define the domain of data. In other words, we semantically analyse the social data at two levels, i.e., the entity level and the domain level. We have chosen Twitter as the social channel for a proof of concept. Domain knowledge is captured in ontologies which are then used to enrich the semantics of tweets provided with specific semantic conceptual representation of entities that appear in the tweets. Case studies are used to demonstrate this approach. We experiment and evaluate our proposed approach with a public dataset collected from Twitter and from the politics domain. The ontology-based approach leverages entity extraction and concept mappings in terms of quantity and accuracy of concept identification.
  20. Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 4286) [ClassicSimilarity], result of:
          0.06677184 = score(doc=4286,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 4286, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=4286)
      0.16666667 = coord(1/6)
    
    Abstract
    Geolocated social media data provide a powerful source of information about places and regional human behavior. Because only a small amount of social media data have been geolocation-annotated, inference techniques play a substantial role to increase the volume of annotated data. Conventional research in this area has been based on the text content of posts from a given user or the social network of the user, with some recent crossovers between the text- and network-based approaches. This paper proposes a novel approach to categorize highly-mentioned users (celebrities) into Local and Global types, and consequently use Local celebrities as location indicators. A label propagation algorithm is then used over the refined social network for geolocation inference. Finally, we propose a hybrid approach by merging a text-based method as a back-off strategy into our network-based approach. Empirical experiments over three standard Twitter benchmark data sets demonstrate that our approach outperforms state-of-the-art user geolocation methods.
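
The score breakdowns shown for results 13 to 20 all follow the same arithmetic, so a small worked example may help in reading them. The sketch below assumes the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and plugs in the values printed in the listing for the term "propose". The function names are hypothetical and chosen only for illustration; Lucene itself computes in single precision, so the last digits can differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm        # 0.19617504 in the listing
    field_weight = tf * idf * field_norm   # 0.3403687 in the listing
    return query_weight * field_weight


def apply_coord(term_scores: list[float], matched: int, total: int) -> float:
    """Sum the matching term scores and scale by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# Values copied from the explain output for "propose" (freq=2.0).
term = classic_term_score(freq=2.0, doc_freq=707, max_docs=44218,
                          query_norm=0.038207654, field_norm=0.046875)
score = apply_coord([term], matched=1, total=6)
print(f"term weight: {term:.8f}")
print(f"final score: {score:.9f}")
```

Under these assumptions the term weight corresponds to the 0.06677184 line and the final score to the 0.011128641 line of each entry. The coord(1/6) factor reflects that only one of six query clauses matched the document.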

Languages

  • e 39
  • d 18

Types