Search (18 results, page 1 of 1)

Doran, D.; Gokhale, S.S.: A classification framework for web robots (2012) 0.07

0.06588599 = product of:
  0.26354396 = sum of:
    0.067092 = weight(_text_:web in 505) [ClassicSimilarity], result of:
      0.067092 = score(doc=505,freq=8.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.5769126 = fieldWeight in 505, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=505)
    0.12935995 = weight(_text_:log in 505) [ClassicSimilarity], result of:
      0.12935995 = score(doc=505,freq=2.0), product of:
        0.22837062 = queryWeight, product of:
          6.4086204 = idf(docFreq=197, maxDocs=44218)
          0.035634913 = queryNorm
        0.5664474 = fieldWeight in 505, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.4086204 = idf(docFreq=197, maxDocs=44218)
          0.0625 = fieldNorm(doc=505)
    0.067092 = weight(_text_:web in 505) [ClassicSimilarity], result of:
      0.067092 = score(doc=505,freq=8.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.5769126 = fieldWeight in 505, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=505)
  0.25 = coord(3/12)
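
The explain tree above can be recomputed by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and not part of the catalogue output. The numbers are copied from the tree for doc 505.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    # ClassicSimilarity term frequency: square root of the raw count.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the "product of:" lines.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.035634913  # queryNorm reported in the tree
MAX_DOCS = 44218          # maxDocs reported in the tree

# Values for doc 505: "web" occurs 8 times, "log" twice, fieldNorm is 0.0625.
web = term_score(8.0, 4597, MAX_DOCS, QUERY_NORM, 0.0625)
log = term_score(2.0, 197, MAX_DOCS, QUERY_NORM, 0.0625)

# The "web" clause appears twice in the sum; coord(3/12) scales the result.
total = (web + log + web) * (3 / 12)
print(f"web={web:.6f} log={log:.6f} total={total:.6f}")
```

Lucene computes these values in single precision, so the recomputed figures should agree with the tree only up to small rounding differences.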

Abstract: The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.

Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.07

0.06585966 = product of:
  0.19757898 = sum of:
    0.060475912 = weight(_text_:web in 354) [ClassicSimilarity], result of:
      0.060475912 = score(doc=354,freq=26.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.520022 = fieldWeight in 354, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=354)
    0.032903954 = weight(_text_:world in 354) [ClassicSimilarity], result of:
      0.032903954 = score(doc=354,freq=4.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.24022943 = fieldWeight in 354, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.03125 = fieldNorm(doc=354)
    0.043723192 = weight(_text_:wide in 354) [ClassicSimilarity], result of:
      0.043723192 = score(doc=354,freq=4.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.2769224 = fieldWeight in 354, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=354)
    0.060475912 = weight(_text_:web in 354) [ClassicSimilarity], result of:
      0.060475912 = score(doc=354,freq=26.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.520022 = fieldWeight in 354, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=354)
  0.33333334 = coord(4/12)

Abstract: Web mining aims to discover useful information and knowledge from the Web hyperlink structure, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semistructured and unstructured nature of the Web data and its heterogeneity. It has also developed many of its own algorithms and techniques. Liu has written a comprehensive text on Web data mining. Key topics of structure mining, content mining, and usage mining are covered both in breadth and in depth. His book brings together all the essential concepts and algorithms from related areas such as data mining, machine learning, and text processing to form an authoritative and coherent text. The book offers a rich blend of theory and practice, addressing seminal research ideas, as well as examining the technology from a practical point of view. It is suitable for students, researchers and practitioners interested in Web mining both as a learning text and a reference book. Lecturers can readily use it for classes on data mining, Web mining, and Web search. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
Content: 1. Introduction 2. Association Rules and Sequential Patterns 3. Supervised Learning 4. Unsupervised Learning 5. Partially Supervised Learning 6. Information Retrieval and Web Search 7. Social Network Analysis 8. Web Crawling 9. Structured Data Extraction: Wrapper Generation 10. Information Integration
RSWK: World Wide Web / Data Mining
Subject: World Wide Web / Data Mining

Miao, Q.; Li, Q.; Zeng, D.: Fine-grained opinion mining by integrating multiple review sources (2010) 0.05

0.048591226 = product of:
  0.14577368 = sum of:
    0.02935275 = weight(_text_:web in 4104) [ClassicSimilarity], result of:
      0.02935275 = score(doc=4104,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 4104, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4104)
    0.040716566 = weight(_text_:world in 4104) [ClassicSimilarity], result of:
      0.040716566 = score(doc=4104,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.29726875 = fieldWeight in 4104, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4104)
    0.02935275 = weight(_text_:web in 4104) [ClassicSimilarity], result of:
      0.02935275 = score(doc=4104,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.25239927 = fieldWeight in 4104, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4104)
    0.046351604 = product of:
      0.09270321 = sum of:
        0.09270321 = weight(_text_:2.0 in 4104) [ClassicSimilarity], result of:
          0.09270321 = score(doc=4104,freq=2.0), product of:
            0.20667298 = queryWeight, product of:
              5.799733 = idf(docFreq=363, maxDocs=44218)
              0.035634913 = queryNorm
            0.4485502 = fieldWeight in 4104, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.799733 = idf(docFreq=363, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4104)
      0.5 = coord(1/2)
  0.33333334 = coord(4/12)
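
This tree contains a nested clause: the term "2.0" sits inside its own sub-query, whose sum is scaled by coord(1/2) before it joins the outer sum, which is then scaled by coord(4/12). A minimal sketch of that two-level computation follows, assuming the standard ClassicSimilarity formulas; the helper names are hypothetical, and the inputs are copied from the tree for doc 4104.

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight * fieldWeight with tf = sqrt(freq), idf = 1 + ln(N / (df + 1)).
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (math.sqrt(freq) * idf * field_norm)

def boolean_score(clause_scores, matched, total_clauses):
    # Sum of matching clauses, scaled by the coordination factor.
    return sum(clause_scores) * (matched / total_clauses)

QUERY_NORM = 0.035634913
MAX_DOCS = 44218
FIELD_NORM = 0.0546875  # fieldNorm(doc=4104)

web = classic_term_score(2.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
world = classic_term_score(2.0, 2573, MAX_DOCS, QUERY_NORM, FIELD_NORM)
two_zero = classic_term_score(2.0, 363, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Inner sub-query: one of two clauses matched, hence coord(1/2).
nested = boolean_score([two_zero], matched=1, total_clauses=2)

# Outer query: four of twelve clauses matched, hence coord(4/12).
total = boolean_score([web, world, web, nested], matched=4, total_clauses=12)
print(f"nested={nested:.6f} total={total:.6f}")
```

The inner coord factor halves the contribution of the rare term "2.0", which would otherwise dominate the outer sum.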

Abstract: With the rapid development of Web 2.0, online reviews have become extremely valuable sources for mining customers' opinions. Fine-grained opinion mining has attracted more and more attention of both applied and theoretical research. In this article, the authors study how to automatically mine product features and opinions from multiple review sources. Specifically, they propose an integration strategy to solve the issue. Within the integration strategy, the authors mine domain knowledge from semistructured reviews and then exploit the domain knowledge to assist product feature extraction and sentiment orientation identification from unstructured reviews. Finally, feature-opinion tuples are generated. Experimental results on real-world datasets show that the proposed approach is effective.

Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.04

0.037927527 = product of:
  0.11378258 = sum of:
    0.03631461 = weight(_text_:web in 1605) [ClassicSimilarity], result of:
      0.03631461 = score(doc=1605,freq=6.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.3122631 = fieldWeight in 1605, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1605)
    0.029083263 = weight(_text_:world in 1605) [ClassicSimilarity], result of:
      0.029083263 = score(doc=1605,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.21233483 = fieldWeight in 1605, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1605)
    0.03631461 = weight(_text_:web in 1605) [ClassicSimilarity], result of:
      0.03631461 = score(doc=1605,freq=6.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.3122631 = fieldWeight in 1605, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1605)
    0.012070097 = product of:
      0.024140194 = sum of:
        0.024140194 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
          0.024140194 = score(doc=1605,freq=2.0), product of:
            0.12478739 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.035634913 = queryNorm
            0.19345059 = fieldWeight in 1605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
      0.5 = coord(1/2)
  0.33333334 = coord(4/12)

Abstract: Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
Source: Journal of the Association for Information Science and Technology. 66(2015) no.1, pp. 13-22

Wei, C.-P.; Lee, Y.-H.; Chiang, Y.-S.; Chen, C.-T.; Yang, C.C.C.: Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora (2014) 0.04

0.03655399 = product of:
  0.10966197 = sum of:
    0.02096625 = weight(_text_:web in 1225) [ClassicSimilarity], result of:
      0.02096625 = score(doc=1225,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 1225, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1225)
    0.029083263 = weight(_text_:world in 1225) [ClassicSimilarity], result of:
      0.029083263 = score(doc=1225,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.21233483 = fieldWeight in 1225, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1225)
    0.038646206 = weight(_text_:wide in 1225) [ClassicSimilarity], result of:
      0.038646206 = score(doc=1225,freq=2.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.24476713 = fieldWeight in 1225, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1225)
    0.02096625 = weight(_text_:web in 1225) [ClassicSimilarity], result of:
      0.02096625 = score(doc=1225,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 1225, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1225)
  0.33333334 = coord(4/12)

Abstract: An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document Frequency_Tempo (TF×IDF_Tempo) and TF×Enhanced-IDF_Tempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDF_Tempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDF_Tempo.

Mining text data (2012) 0.03

0.030316532 = product of:
  0.090949595 = sum of:
    0.016773 = weight(_text_:web in 362) [ClassicSimilarity], result of:
      0.016773 = score(doc=362,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.14422815 = fieldWeight in 362, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=362)
    0.030916965 = weight(_text_:wide in 362) [ClassicSimilarity], result of:
      0.030916965 = score(doc=362,freq=2.0), product of:
        0.1578897 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.035634913 = queryNorm
        0.1958137 = fieldWeight in 362, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=362)
    0.016773 = weight(_text_:web in 362) [ClassicSimilarity], result of:
      0.016773 = score(doc=362,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.14422815 = fieldWeight in 362, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=362)
    0.02648663 = product of:
      0.05297326 = sum of:
        0.05297326 = weight(_text_:2.0 in 362) [ClassicSimilarity], result of:
          0.05297326 = score(doc=362,freq=2.0), product of:
            0.20667298 = queryWeight, product of:
              5.799733 = idf(docFreq=363, maxDocs=44218)
              0.035634913 = queryNorm
            0.2563144 = fieldWeight in 362, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.799733 = idf(docFreq=363, maxDocs=44218)
              0.03125 = fieldNorm(doc=362)
      0.5 = coord(1/2)
  0.33333334 = coord(4/12)

Abstract: Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.

Zhang, Z.; Li, Q.; Zeng, D.; Ga, H.: Extracting evolutionary communities in community question answering (2014) 0.02

0.018760197 = product of:
  0.07504079 = sum of:
    0.02096625 = weight(_text_:web in 1286) [ClassicSimilarity], result of:
      0.02096625 = score(doc=1286,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 1286, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1286)
    0.02096625 = weight(_text_:web in 1286) [ClassicSimilarity], result of:
      0.02096625 = score(doc=1286,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.18028519 = fieldWeight in 1286, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1286)
    0.033108287 = product of:
      0.06621657 = sum of:
        0.06621657 = weight(_text_:2.0 in 1286) [ClassicSimilarity], result of:
          0.06621657 = score(doc=1286,freq=2.0), product of:
            0.20667298 = queryWeight, product of:
              5.799733 = idf(docFreq=363, maxDocs=44218)
              0.035634913 = queryNorm
            0.320393 = fieldWeight in 1286, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.799733 = idf(docFreq=363, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1286)
      0.5 = coord(1/2)
  0.25 = coord(3/12)

Abstract: With the rapid growth of Web 2.0, community question answering (CQA) has become a prevalent information seeking channel, in which users form interactive communities by posting questions and providing answers. Communities may evolve over time, because of changes in users' interests, activities, and new users joining the network. To better understand user interactions in CQA communities, it is necessary to analyze the community structures and track community evolution over time. Existing work in CQA focuses on question searching or content quality detection, and the important problems of community extraction and evolutionary pattern detection have not been studied. In this article, we propose a probabilistic community model (PCM) to extract overlapping community structures and capture their evolution patterns in CQA. The empirical results show that our algorithm appears to improve the community extraction quality. We show empirically, using the iPhone data set, that interesting community evolution patterns can be discovered, with each evolution pattern reflecting the variation of users' interests over time. Our analysis suggests that individual users could benefit to gain comprehensive information from tracking the transition of products. We also show that the communities provide a decision-making basis for business.

Huvila, I.: Mining qualitative data on human information behaviour from the Web (2010) 0.02

0.016946819 = product of:
  0.10168091 = sum of:
    0.050840456 = weight(_text_:web in 4676) [ClassicSimilarity], result of:
      0.050840456 = score(doc=4676,freq=6.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.43716836 = fieldWeight in 4676, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4676)
    0.050840456 = weight(_text_:web in 4676) [ClassicSimilarity], result of:
      0.050840456 = score(doc=4676,freq=6.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.43716836 = fieldWeight in 4676, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4676)
  0.16666667 = coord(2/12)
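
The `idf` lines in these explain trees can be checked mechanically. The sketch below is an illustration only: it assumes the ClassicSimilarity formula idf = 1 + ln(maxDocs / (docFreq + 1)), and the regular expression and function name are hypothetical helpers, not part of the search system. The two sample lines are copied from trees on this page.

```python
import math
import re

# Matches lines such as "3.2635105 = idf(docFreq=4597, maxDocs=44218)".
IDF_LINE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?) = idf\(docFreq=(?P<doc_freq>\d+), "
    r"maxDocs=(?P<max_docs>\d+)\)"
)

def check_idf_lines(explain_text, tolerance=1e-4):
    """Return (docFreq, reported idf, matches formula?) for each idf line."""
    results = []
    for match in IDF_LINE.finditer(explain_text):
        reported = float(match["value"])
        doc_freq = int(match["doc_freq"])
        max_docs = int(match["max_docs"])
        expected = 1.0 + math.log(max_docs / (doc_freq + 1))
        results.append((doc_freq, reported, abs(reported - expected) <= tolerance))
    return results

sample = """
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          5.9038734 = idf(docFreq=327, maxDocs=44218)
"""

results = check_idf_lines(sample)
for doc_freq, reported, ok in results:
    print(doc_freq, reported, "ok" if ok else "mismatch")
```

The tolerance allows for the single-precision rounding in the reported values.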

Abstract: This paper discusses an approach of collecting qualitative data on human information behaviour that is based on mining web data using search engines. The approach is technically the same that has been used for some time in webometric research to make statistical inferences on web data, but the present paper shows how the same tools and data collecting methods can be used to gather data for qualitative data analysis on human information behaviour.

Sun, X.; Lin, H.: Topical community detection from mining user tagging behavior and interest (2013) 0.02
```
0.015342983 = product of:
  0.1841158 = sum of:
    0.1841158 = weight(_text_:tagging in 605) [ClassicSimilarity], result of:
      0.1841158 = score(doc=605,freq=10.0), product of:
        0.21038401 = queryWeight, product of:
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.035634913 = queryNorm
        0.8751416 = fieldWeight in 605, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.046875 = fieldNorm(doc=605)
  0.083333336 = coord(1/12)
```
Abstract: With the development of Web 2.0, social tagging systems in which users can freely choose tags to annotate resources according to their interests have attracted much attention. In particular, literature on the emergence of collective intelligence in social tagging systems has increased. In this article, we propose a probabilistic generative model to detect latent topical communities among users. Social tags and resource contents are leveraged to model user interest in two similar and correlated ways. Our primary goal is to capture user tagging behavior and interest and discover the emergent topical community structure. The communities should be groups of users with frequent social interactions as well as similar topical interests, which would have important research implications for personalized information services. Experimental results on two real social tagging data sets with different genres have shown that the proposed generative model more accurately models user interest and detects high-quality and meaningful topical communities.

Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.01
```
0.0083865 = product of:
  0.050318997 = sum of:
    0.025159499 = weight(_text_:web in 3464) [ClassicSimilarity], result of:
      0.025159499 = score(doc=3464,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.21634221 = fieldWeight in 3464, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3464)
    0.025159499 = weight(_text_:web in 3464) [ClassicSimilarity], result of:
      0.025159499 = score(doc=3464,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.21634221 = fieldWeight in 3464, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3464)
  0.16666667 = coord(2/12)
```
Abstract: We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.

Kraker, P.; Kittel, C.; Enkhbayar, A.: Open Knowledge Maps : creating a visual interface to the world's scientific knowledge based on natural language processing (2016) 0.01

0.0083865 = product of:
  0.050318997 = sum of:
    0.025159499 = weight(_text_:web in 3205) [ClassicSimilarity], result of:
      0.025159499 = score(doc=3205,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.21634221 = fieldWeight in 3205, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3205)
    0.025159499 = weight(_text_:web in 3205) [ClassicSimilarity], result of:
      0.025159499 = score(doc=3205,freq=2.0), product of:
        0.11629491 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.035634913 = queryNorm
        0.21634221 = fieldWeight in 3205, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3205)
  0.16666667 = coord(2/12)

Abstract: The goal of Open Knowledge Maps is to create a visual interface to the world's scientific knowledge. The base for this visual interface consists of so-called knowledge maps, which enable the exploration of existing knowledge and the discovery of new knowledge. Our open source knowledge mapping software applies a mixture of summarization techniques and similarity measures on article metadata, which are iteratively chained together. After processing, the representation is saved in a database for use in a web visualization. In the future, we want to create a space for collective knowledge mapping that brings together individuals and communities involved in exploration and discovery. We want to enable people to guide each other in their discovery by collaboratively annotating and modifying the automatically created maps.

Tonkin, E.L.; Tourte, G.J.L.: Working with text : tools, techniques and approaches for text mining (2016) 0.01
```
0.0057179923 = product of:
  0.068615906 = sum of:
    0.068615906 = weight(_text_:tagging in 4019) [ClassicSimilarity], result of:
      0.068615906 = score(doc=4019,freq=2.0), product of:
        0.21038401 = queryWeight, product of:
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.035634913 = queryNorm
        0.326146 = fieldWeight in 4019, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9038734 = idf(docFreq=327, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4019)
  0.083333336 = coord(1/12)
```
Abstract: What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.

Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.00
```
0.0034274952 = product of:
  0.041129943 = sum of:
    0.041129943 = weight(_text_:world in 3682) [ClassicSimilarity], result of:
      0.041129943 = score(doc=3682,freq=4.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.30028677 = fieldWeight in 3682, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3682)
  0.083333336 = coord(1/12)
```
Abstract: Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.

Mohr, J.W.; Bogdanov, P.: Topic models : what they are and why they matter (2013) 0.00
```
0.0029083265 = product of:
  0.034899916 = sum of:
    0.034899916 = weight(_text_:world in 1142) [ClassicSimilarity], result of:
      0.034899916 = score(doc=1142,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.25480178 = fieldWeight in 1142, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=1142)
  0.083333336 = coord(1/12)
```
Abstract: We provide a brief, non-technical introduction to the text mining methodology known as "topic modeling." We summarize the theory and background of the method and discuss what kinds of things are found by topic models. Using a text corpus comprised of the eight articles from the special issue of Poetics on the subject of topic models, we run a topic model on these articles, both as a way to introduce the methodology and also to help summarize some of the ways in which social and cultural scientists are using topic models. We review some of the critiques and debates over the use of the method and finally, we link these developments back to some of the original innovations in the field of content analysis that were pioneered by Harold D. Lasswell and colleagues during and just after World War II.

Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.00
```
0.0029083265 = product of:
  0.034899916 = sum of:
    0.034899916 = weight(_text_:world in 1326) [ClassicSimilarity], result of:
      0.034899916 = score(doc=1326,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.25480178 = fieldWeight in 1326, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=1326)
  0.083333336 = coord(1/12)
```
Abstract: The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.

Song, J.; Huang, Y.; Qi, X.; Li, Y.; Li, F.; Fu, K.; Huang, T.: Discovering hierarchical topic evolution in time-stamped documents (2016) 0.00
```
0.0029083265 = product of:
  0.034899916 = sum of:
    0.034899916 = weight(_text_:world in 2853) [ClassicSimilarity], result of:
      0.034899916 = score(doc=2853,freq=2.0), product of:
        0.13696888 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.035634913 = queryNorm
        0.25480178 = fieldWeight in 2853, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=2853)
  0.083333336 = coord(1/12)
```
Abstract: The objective of this paper is to propose a hierarchical topic evolution model (HTEM) that can organize time-varying topics in a hierarchy and discover their evolutions with multiple timescales. In the proposed HTEM, topics near the root of the hierarchy are more abstract and also evolve in the longer timescales than those near the leaves. To achieve this goal, the distance-dependent Chinese restaurant process (ddCRP) is extended to a new nested process that is able to simultaneously model the dependencies among data and the relationship between clusters. The HTEM is proposed based on the new process for time-stamped documents, in which the timestamp is utilized to measure the dependencies among documents. Moreover, an efficient Gibbs sampler is developed for the proposed HTEM. Our experimental results on two popular real-world data sets verify that the proposed HTEM can capture coherent topics and discover their hierarchical evolutions. It also outperforms the baseline model in terms of likelihood on held-out data.
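
Note on the score trees: the arithmetic in the explanation for the entry above (term "world", doc 2853) can be reproduced by hand. The sketch below is not the catalog's own code. It assumes the idf form 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which are the forms that reproduce the numbers shown. The function names are illustrative, and queryNorm is copied from the tree because it depends on the whole query, which is not shown here.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # idf form assumed here because it matches the listed values:
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                   # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm        # "queryWeight" line in the tree
    field_weight = tf * idf * field_norm   # "fieldWeight" line in the tree
    return query_weight * field_weight     # "score(doc=..., freq=...)" line

# Inputs copied from the tree for doc 2853; query_norm is taken as given.
term = classic_term_score(freq=2.0, doc_freq=2573, max_docs=44218,
                          query_norm=0.035634913, field_norm=0.046875)
total = term * (1 / 12)   # coord(1/12): 1 of 12 query clauses matched

print(f"idf   = {classic_idf(2573, 44218):.7f}")
print(f"term  = {term:.9f}")
print(f"total = {total:.10f}")
```

The results agree with the tree (3.8436708, 0.034899916, 0.0029083265) up to single-precision rounding, since the listed values appear to be 32-bit floats.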

Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.00

0.0010058414 = product of:
  0.012070097 = sum of:
    0.012070097 = product of:
      0.024140194 = sum of:
        0.024140194 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
          0.024140194 = score(doc=668,freq=2.0), product of:
            0.12478739 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.035634913 = queryNorm
            0.19345059 = fieldWeight in 668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=668)
      0.5 = coord(1/2)
  0.083333336 = coord(1/12)

Date: 22. 3.2013 19:43:01
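
Note on nested coord factors: in the tree above, the leaf score is scaled twice, first by coord(1/2) and then by coord(1/12). A minimal sketch of that chain follows. The helper name `apply_coords` is hypothetical, and the leaf score is copied from the tree, not recomputed. The last line assumes that the figure printed after the year in each heading is the total rounded to two decimals, which is consistent with the entries in this listing.

```python
def apply_coords(leaf_score: float, coords: list[tuple[int, int]]) -> float:
    # Multiply the leaf score by each coord(matched/total) factor in turn,
    # innermost first.
    score = leaf_score
    for matched, total in coords:
        score *= matched / total
    return score

# Leaf value copied from "weight(_text_:22 in 668)" in the tree above.
leaf = 0.024140194
total = apply_coords(leaf, [(1, 2), (1, 12)])

print(f"{total:.10f}")   # compare with 0.0010058414 at the root of the tree
print(f"{total:.2f}")    # two-decimal form, as shown in the result heading
```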

Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.00

0.0010058414 = product of:
  0.012070097 = sum of:
    0.012070097 = product of:
      0.024140194 = sum of:
        0.024140194 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
          0.024140194 = score(doc=5011,freq=2.0), product of:
            0.12478739 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.035634913 = queryNorm
            0.19345059 = fieldWeight in 5011, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
      0.5 = coord(1/2)
  0.083333336 = coord(1/12)

Date: 7. 3.2019 16:32:22
