Search (7 results, page 1 of 1)

  • × theme_ss:"Data Mining"
  • × theme_ss:"Internet"
  • × type_ss:"a"
  1. Klein, H.: Web Content Mining (2004) 0.01
    0.012391705 = product of:
      0.037175115 = sum of:
        0.008743925 = weight(_text_:in in 3154) [ClassicSimilarity], result of:
          0.008743925 = score(doc=3154,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.14725187 = fieldWeight in 3154, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
        0.02843119 = weight(_text_:und in 3154) [ClassicSimilarity], result of:
          0.02843119 = score(doc=3154,freq=18.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.29385152 = fieldWeight in 3154, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.03125 = fieldNorm(doc=3154)
      0.33333334 = coord(2/6)
    
    Abstract
    Web Mining - ein Schlagwort, das mit der Verbreitung des Internets immer öfter zu lesen und zu hören ist. Die gegenwärtige Forschung beschäftigt sich aber eher mit dem Nutzungsverhalten der Internetnutzer, und ein Blick in Tagungsprogramme einschlägiger Konferenzen (z.B. GOR - German Online Research) zeigt, dass die Analyse der Inhalte kaum Thema ist. Auf der GOR wurden 1999 zwei Vorträge zu diesem Thema gehalten, auf der Folgekonferenz 2001 kein einziger. Web Mining ist der Oberbegriff für zwei Typen von Web Mining: Web Usage Mining und Web Content Mining. Unter Web Usage Mining versteht man das Analysieren von Daten, wie sie bei der Nutzung des WWW anfallen und von den Servern protokolliert wenden. Man kann ermitteln, welche Seiten wie oft aufgerufen wurden, wie lange auf den Seiten verweilt wurde und vieles andere mehr. Beim Web Content Mining wird der Inhalt der Webseiten untersucht, der nicht nur Text, sondern auf Bilder, Video- und Audioinhalte enthalten kann. Die Software für die Analyse von Webseiten ist in den Grundzügen vorhanden, doch müssen die meisten Webseiten für die entsprechende Analysesoftware erst aufbereitet werden. Zuerst müssen die relevanten Websites ermittelt werden, die die gesuchten Inhalte enthalten. Das geschieht meist mit Suchmaschinen, von denen es mittlerweile Hunderte gibt. Allerdings kann man nicht davon ausgehen, dass die Suchmaschinen alle existierende Webseiten erfassen. Das ist unmöglich, denn durch das schnelle Wachstum des Internets kommen täglich Tausende von Webseiten hinzu, und bereits bestehende ändern sich der werden gelöscht. Oft weiß man auch nicht, wie die Suchmaschinen arbeiten, denn das gehört zu den Geschäftsgeheimnissen der Betreiber. Man muss also davon ausgehen, dass die Suchmaschinen nicht alle relevanten Websites finden (können). Der nächste Schritt ist das Herunterladen der Websites, dafür gibt es Software, die unter den Bezeichnungen OfflineReader oder Webspider zu finden ist. Das Ziel dieser Programme ist, die Website in einer Form herunterzuladen, die es erlaubt, die Website offline zu betrachten. Die Struktur der Website wird in der Regel beibehalten. Wer die Inhalte einer Website analysieren will, muss also alle Dateien mit seiner Analysesoftware verarbeiten können. Software für Inhaltsanalyse geht davon aus, dass nur Textinformationen in einer einzigen Datei verarbeitet werden. QDA Software (qualitative data analysis) verarbeitet dagegen auch Audiound Videoinhalte sowie internetspezifische Kommunikation wie z.B. Chats.
    Series
    Fortschritte in der Wissensorganisation; Bd.7
    Source
    Wissensorganisation und Edutainment: Wissen im Spannungsfeld von Gesellschaft, Gestaltung und Industrie. Proceedings der 7. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation, Berlin, 21.-23.3.2001. Hrsg.: C. Lehner, H.P. Ohly u. G. Rahmstorf
  2. Huvila, I.: Mining qualitative data on human information behaviour from the Web (2010) 0.01
    0.009900499 = product of:
      0.029701497 = sum of:
        0.006246961 = weight(_text_:in in 4676) [ClassicSimilarity], result of:
          0.006246961 = score(doc=4676,freq=2.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.10520181 = fieldWeight in 4676, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4676)
        0.023454536 = weight(_text_:und in 4676) [ClassicSimilarity], result of:
          0.023454536 = score(doc=4676,freq=4.0), product of:
            0.09675359 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.043654136 = queryNorm
            0.24241515 = fieldWeight in 4676, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4676)
      0.33333334 = coord(2/6)
    
    Abstract
    This paper discusses an approach of collecting qualitative data on human information behaviour that is based on mining web data using search engines. The approach is technically the same that has been used for some time in webometric research to make statistical inferences on web data, but the present paper shows how the same tools and data collecting methods can be used to gather data for qualitative data analysis on human information behaviour.
    Source
    Information und Wissen: global, sozial und frei? Proceedings des 12. Internationalen Symposiums für Informationswissenschaft (ISI 2011) ; Hildesheim, 9. - 11. März 2011. Hrsg.: J. Griesbaum, T. Mandl u. C. Womser-Hacker
  3. Raan, A.F.J. van; Noyons, E.C.M.: Discovery of patterns of scientific and technological development and knowledge transfer (2002) 0.00
    0.001821651 = product of:
      0.010929906 = sum of:
        0.010929906 = weight(_text_:in in 3603) [ClassicSimilarity], result of:
          0.010929906 = score(doc=3603,freq=12.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.18406484 = fieldWeight in 3603, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3603)
      0.16666667 = coord(1/6)
    
    Abstract
    This paper addresses a bibliometric methodology to discover the structure of the scientific 'landscape' in order to gain detailed insight into the development of MD fields, their interaction, and the transfer of knowledge between them. This methodology is appropriate to visualize the position of MD activities in relation to interdisciplinary MD developments, and particularly in relation to socio-economic problems. Furthermore, it allows the identification of the major actors. It even provides the possibility of foresight. We describe a first approach to apply bibliometric mapping as an instrument to investigate characteristics of knowledge transfer. In this paper we discuss the creation of 'maps of science' with help of advanced bibliometric methods. This 'bibliometric cartography' can be seen as a specific type of data-mining, applied to large amounts of scientific publications. As an example we describe the mapping of the field neuroscience, one of the largest and fast growing fields in the life sciences. The number of publications covered by this database is about 80,000 per year, the period covered is 1995-1998. Current research is going an to update the mapping for the years 1999-2002. This paper addresses the main lines of the methodology and its application in the study of knowledge transfer.
  4. Derek Doran, D.; Gokhale, S.S.: ¬A classification framework for web robots (2012) 0.00
    0.001682769 = product of:
      0.010096614 = sum of:
        0.010096614 = weight(_text_:in in 505) [ClassicSimilarity], result of:
          0.010096614 = score(doc=505,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.17003182 = fieldWeight in 505, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=505)
      0.16666667 = coord(1/6)
    
    Abstract
    The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
  5. Fenstermacher, K.D.; Ginsburg, M.: Client-side monitoring for Web mining (2003) 0.00
    0.0015457221 = product of:
      0.009274333 = sum of:
        0.009274333 = weight(_text_:in in 1611) [ClassicSimilarity], result of:
          0.009274333 = score(doc=1611,freq=6.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.1561842 = fieldWeight in 1611, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1611)
      0.16666667 = coord(1/6)
    
    Abstract
    "Garbage in, garbage out" is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user's actions ever reaches the Web server, analysts must rely an incomplete data. In this paper, we propose a client-side monitoring system that is unobtrusive and supports flexible data collection. Moreover, the proposed framework encompasses client-side applications beyond the Web browser. Expanding monitoring beyond the browser to incorporate standard office productivity tools enables analysts to derive a much richer and more accurate picture of user behavior an the Web.
  6. Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.00
    0.0012620769 = product of:
      0.0075724614 = sum of:
        0.0075724614 = weight(_text_:in in 4242) [ClassicSimilarity], result of:
          0.0075724614 = score(doc=4242,freq=4.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.12752387 = fieldWeight in 4242, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=4242)
      0.16666667 = coord(1/6)
    
    Abstract
    With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information an the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, Web server logs' information about user access patterns can be used for information personalization or improving Web page design.
  7. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.00
    0.0010411602 = product of:
      0.006246961 = sum of:
        0.006246961 = weight(_text_:in in 2338) [ClassicSimilarity], result of:
          0.006246961 = score(doc=2338,freq=2.0), product of:
            0.059380736 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.043654136 = queryNorm
            0.10520181 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.16666667 = coord(1/6)
    
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.