Search (6 results, page 1 of 1)

  • × theme_ss:"Internet"
  • × theme_ss:"Data Mining"
  1. Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.03
    0.029079849 = product of:
      0.058159698 = sum of:
        0.058159698 = product of:
          0.116319396 = sum of:
            0.116319396 = weight(_text_:web in 4242) [ClassicSimilarity], result of:
              0.116319396 = score(doc=4242,freq=20.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.6841342 = fieldWeight in 4242, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4242)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information an the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, Web server logs' information about user access patterns can be used for information personalization or improving Web page design.
  2. Derek Doran, D.; Gokhale, S.S.: ¬A classification framework for web robots (2012) 0.02
    0.024522282 = product of:
      0.049044564 = sum of:
        0.049044564 = product of:
          0.09808913 = sum of:
            0.09808913 = weight(_text_:web in 505) [ClassicSimilarity], result of:
              0.09808913 = score(doc=505,freq=8.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.5769126 = fieldWeight in 505, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0625 = fieldNorm(doc=505)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
  3. Fenstermacher, K.D.; Ginsburg, M.: Client-side monitoring for Web mining (2003) 0.02
    0.02432995 = product of:
      0.0486599 = sum of:
        0.0486599 = product of:
          0.0973198 = sum of:
            0.0973198 = weight(_text_:web in 1611) [ClassicSimilarity], result of:
              0.0973198 = score(doc=1611,freq=14.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.57238775 = fieldWeight in 1611, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1611)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    "Garbage in, garbage out" is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user's actions ever reaches the Web server, analysts must rely an incomplete data. In this paper, we propose a client-side monitoring system that is unobtrusive and supports flexible data collection. Moreover, the proposed framework encompasses client-side applications beyond the Web browser. Expanding monitoring beyond the browser to incorporate standard office productivity tools enables analysts to derive a much richer and more accurate picture of user behavior an the Web.
    Footnote
    Teil eines Themenheftes: "Web retrieval and mining: A machine learning perspective"
  4. Chakrabarti, S.: Mining the Web : discovering knowledge from hypertext data (2003) 0.02
    0.020332802 = product of:
      0.040665604 = sum of:
        0.040665604 = product of:
          0.08133121 = sum of:
            0.08133121 = weight(_text_:web in 2222) [ClassicSimilarity], result of:
              0.08133121 = score(doc=2222,freq=22.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.47835067 = fieldWeight in 2222, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2222)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Footnote
    Rez. in: JASIST 55(2004) no.3, S.275-276 (C. Chen): "This is a book about finding significant statistical patterns on the Web - in particular, patterns that are associated with hypertext documents, topics, hyperlinks, and queries. The term pattern in this book refers to dependencies among such items. On the one hand, the Web contains useful information an just about every topic under the sun. On the other hand, just like searching for a needle in a haystack, one would need powerful tools to locate useful information an the vast land of the Web. Soumen Chakrabarti's book focuses an a wide range of techniques for machine learning and data mining an the Web. The goal of the book is to provide both the technical Background and tools and tricks of the trade of Web content mining. Much of the technical content reflects the state of the art between 1995 and 2002. The targeted audience is researchers and innovative developers in this area, as well as newcomers who intend to enter this area. The book begins with an introduction chapter. The introduction chapter explains fundamental concepts such as crawling and indexing as well as clustering and classification. The remaining eight chapters are organized into three parts: i) infrastructure, ii) learning and iii) applications.
    Part I, Infrastructure, has two chapters: Chapter 2 on crawling the Web and Chapter 3 an Web search and information retrieval. The second part of the book, containing chapters 4, 5, and 6, is the centerpiece. This part specifically focuses an machine learning in the context of hypertext. Part III is a collection of applications that utilize the techniques described in earlier chapters. Chapter 7 is an social network analysis. Chapter 8 is an resource discovery. Chapter 9 is an the future of Web mining. Overall, this is a valuable reference book for researchers and developers in the field of Web mining. It should be particularly useful for those who would like to design and probably code their own Computer programs out of the equations and pseudocodes an most of the pages. For a student, the most valuable feature of the book is perhaps the formal and consistent treatments of concepts across the board. For what is behind and beyond the technical details, one has to either dig deeper into the bibliographic notes at the end of each chapter, or resort to more in-depth analysis of relevant subjects in the literature. lf you are looking for successful stories about Web mining or hard-way-learned lessons of failures, this is not the book."
  5. Huvila, I.: Mining qualitative data on human information behaviour from the Web (2010) 0.02
    0.018582305 = product of:
      0.03716461 = sum of:
        0.03716461 = product of:
          0.07432922 = sum of:
            0.07432922 = weight(_text_:web in 4676) [ClassicSimilarity], result of:
              0.07432922 = score(doc=4676,freq=6.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.43716836 = fieldWeight in 4676, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4676)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper discusses an approach of collecting qualitative data on human information behaviour that is based on mining web data using search engines. The approach is technically the same that has been used for some time in webometric research to make statistical inferences on web data, but the present paper shows how the same tools and data collecting methods can be used to gather data for qualitative data analysis on human information behaviour.
  6. Klein, H.: Web Content Mining (2004) 0.02
    0.017339872 = product of:
      0.034679744 = sum of:
        0.034679744 = product of:
          0.06935949 = sum of:
            0.06935949 = weight(_text_:web in 3154) [ClassicSimilarity], result of:
              0.06935949 = score(doc=3154,freq=16.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.4079388 = fieldWeight in 3154, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3154)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Web Mining - ein Schlagwort, das mit der Verbreitung des Internets immer öfter zu lesen und zu hören ist. Die gegenwärtige Forschung beschäftigt sich aber eher mit dem Nutzungsverhalten der Internetnutzer, und ein Blick in Tagungsprogramme einschlägiger Konferenzen (z.B. GOR - German Online Research) zeigt, dass die Analyse der Inhalte kaum Thema ist. Auf der GOR wurden 1999 zwei Vorträge zu diesem Thema gehalten, auf der Folgekonferenz 2001 kein einziger. Web Mining ist der Oberbegriff für zwei Typen von Web Mining: Web Usage Mining und Web Content Mining. Unter Web Usage Mining versteht man das Analysieren von Daten, wie sie bei der Nutzung des WWW anfallen und von den Servern protokolliert wenden. Man kann ermitteln, welche Seiten wie oft aufgerufen wurden, wie lange auf den Seiten verweilt wurde und vieles andere mehr. Beim Web Content Mining wird der Inhalt der Webseiten untersucht, der nicht nur Text, sondern auf Bilder, Video- und Audioinhalte enthalten kann. Die Software für die Analyse von Webseiten ist in den Grundzügen vorhanden, doch müssen die meisten Webseiten für die entsprechende Analysesoftware erst aufbereitet werden. Zuerst müssen die relevanten Websites ermittelt werden, die die gesuchten Inhalte enthalten. Das geschieht meist mit Suchmaschinen, von denen es mittlerweile Hunderte gibt. Allerdings kann man nicht davon ausgehen, dass die Suchmaschinen alle existierende Webseiten erfassen. Das ist unmöglich, denn durch das schnelle Wachstum des Internets kommen täglich Tausende von Webseiten hinzu, und bereits bestehende ändern sich der werden gelöscht. Oft weiß man auch nicht, wie die Suchmaschinen arbeiten, denn das gehört zu den Geschäftsgeheimnissen der Betreiber. Man muss also davon ausgehen, dass die Suchmaschinen nicht alle relevanten Websites finden (können). Der nächste Schritt ist das Herunterladen der Websites, dafür gibt es Software, die unter den Bezeichnungen OfflineReader oder Webspider zu finden ist. Das Ziel dieser Programme ist, die Website in einer Form herunterzuladen, die es erlaubt, die Website offline zu betrachten. Die Struktur der Website wird in der Regel beibehalten. Wer die Inhalte einer Website analysieren will, muss also alle Dateien mit seiner Analysesoftware verarbeiten können. Software für Inhaltsanalyse geht davon aus, dass nur Textinformationen in einer einzigen Datei verarbeitet werden. QDA Software (qualitative data analysis) verarbeitet dagegen auch Audiound Videoinhalte sowie internetspezifische Kommunikation wie z.B. Chats.