Search (66 results, page 1 of 4)

  • theme_ss:"Data Mining"
  1. Wei, C.-P.; Lee, Y.-H.; Chiang, Y.-S.; Chen, C.-T.; Yang, C.C.C.: Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora (2014) 0.12
    Abstract
    An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document Frequency-Tempo (TF×IDF-Tempo) and TF×Enhanced-IDF-Tempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDF-Tempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDF-Tempo.
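    For orientation, a minimal sketch of the standard TF×IDF weighting that the paper uses as its benchmark; the temporal variants (TF×IDF-Tempo, TF×Enhanced-IDF-Tempo) are the authors' own metrics and are not reproduced here. The toy corpus and tokenization are illustrative only.

      import math
      from collections import Counter

      docs = [
          "earthquake hits city center",
          "rescue teams reach earthquake city",
          "stock market rallies after report",
      ]
      tokenized = [d.split() for d in docs]
      n_docs = len(tokenized)

      # Document frequency: number of documents containing each term.
      df = Counter(t for doc in tokenized for t in set(doc))

      def tfidf(doc):
          """Return a term -> TF×IDF weight map for one tokenized document."""
          tf = Counter(doc)
          return {t: (1 + math.log(c)) * math.log(n_docs / df[t])
                  for t, c in tf.items()}

      for doc in tokenized:
          print(tfidf(doc))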
  2. Tu, Y.-N.; Hsu, S.-L.: Constructing conceptual trajectory maps to trace the development of research fields (2016) 0.05
    Abstract
    This study proposes a new method to construct and trace the trajectory of conceptual development of a research field by combining main path analysis, citation analysis, and text-mining techniques. Main path analysis, a method commonly used to trace the most critical path in a citation network, helps describe the developmental trajectory of a research field. This study extends main path analysis with text-mining techniques so that the new method reflects the trajectory of conceptual development in an academic research field more accurately than citation frequency alone, which represents only the articles examined. Articles can be merged based on similarity of concepts, and by merging concepts the history of a research field can be described more precisely. The new method was applied to the "h-index" and "text mining" fields. The precision, recall, and F-measures for the h-index field were 0.738, 0.652, and 0.658, and those for the text-mining field were 0.501, 0.653, and 0.551, respectively. Finally, this study not only establishes the conceptual trajectory map of a research field but also recommends keywords that are more precise than those currently used by researchers. These precise keywords could enable researchers to gather related works more quickly than before.
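    A minimal sketch of the citation-network half of such a method: search path count (SPC) weights, commonly used in main path analysis, computed over a toy citation network, followed by a greedy walk along the heaviest edges. It assumes the networkx library; the paper's text-mining and concept-merging extension is not reproduced.

      import networkx as nx

      # Toy citation DAG: an edge u -> v means v builds on (cites) u.
      G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
      order = list(nx.topological_sort(G))

      # n_minus[v]: number of source-to-v paths; n_plus[v]: v-to-sink paths.
      n_minus = {v: 1 if G.in_degree(v) == 0 else 0 for v in G}
      for v in order:
          for u in G.predecessors(v):
              n_minus[v] += n_minus[u]
      n_plus = {v: 1 if G.out_degree(v) == 0 else 0 for v in G}
      for v in reversed(order):
          for w in G.successors(v):
              n_plus[v] += n_plus[w]

      # SPC weight of an edge = source-to-sink paths passing through it.
      spc = {(u, v): n_minus[u] * n_plus[v] for u, v in G.edges}

      # Greedy main path: start at the heaviest edge, then extend forward
      # along the heaviest outgoing edge at each step.
      u, v = max(spc, key=spc.get)
      path = [u, v]
      while G.out_degree(v) > 0:
          v = max(G.successors(v), key=lambda w: spc[(path[-1], w)])
          path.append(v)
      print(spc, path)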
  3. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.04
    Abstract
    In this paper, we present a framework for clustering Web search engine queries that aims to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate the effectiveness of our approach with experiments.
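    A rough sketch of the underlying idea, clustering queries by the text of their clicked documents; the paper's term vector model also weights terms by user selections, which this simplification omits. It assumes scikit-learn; the queries and documents are invented.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.cluster import KMeans

      # Pooled text of the documents clicked for each query (invented data).
      clicked_docs = {
          "cheap flights": "airline tickets low fares booking",
          "airfare deals": "discount airline tickets fares",
          "python tutorial": "learn python programming course",
          "learn programming": "python java programming lessons",
      }
      queries = list(clicked_docs)
      X = TfidfVectorizer().fit_transform([clicked_docs[q] for q in queries])

      # Two clusters: travel queries vs. programming queries.
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
      for query, label in zip(queries, labels):
          print(label, query)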
  4. Srinivasan, P.: Text mining : generating hypotheses from MEDLINE (2004) 0.04
    Abstract
    Hypothesis generation, a crucial initial step for making scientific discoveries, relies on prior knowledge, experience, and intuition. Chance connections made between seemingly distinct subareas sometimes turn out to be fruitful. The goal in text mining is to assist in this process by automatically discovering a small set of interesting hypotheses from a suitable text collection. In this report, we present open and closed text mining algorithms that are built within the discovery framework established by Swanson and Smalheiser. Our algorithms represent topics using metadata profiles. When applied to MEDLINE, these are MeSH-based profiles. We present experiments that demonstrate the effectiveness of our algorithms. Specifically, our algorithms successfully generate ranked term lists where the key terms representing novel relationships between topics are ranked high.
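    A toy illustration of the co-occurrence skeleton behind Swanson-style open discovery: from a start topic A, collect intermediate terms B, then candidate terms C that co-occur with some B but never directly with A. The paper's MeSH-profile ranking is not reproduced; the document sets below are invented, loosely echoing Swanson's Raynaud/fish-oil case.

      # Each document is reduced to its set of topic terms (invented).
      docs = [
          {"raynaud", "blood-viscosity"},
          {"blood-viscosity", "fish-oil"},
          {"raynaud", "vasoconstriction"},
          {"vasoconstriction", "dietary-magnesium"},
      ]

      def cooccurring(term):
          """Terms appearing in some document together with `term`."""
          return {t for d in docs if term in d for t in d} - {term}

      a = "raynaud"
      hypotheses = set()
      for b in cooccurring(a):
          for c in cooccurring(b):
              if c != a and a not in cooccurring(c):
                  # c is linked to a only indirectly via b: candidate hypothesis.
                  hypotheses.add((a, b, c))
      print(hypotheses)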
  5. Bella, A. La; Fronzetti Colladon, A.; Battistoni, E.; Castellan, S.; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining (2018) 0.04
    Abstract
    We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles as they appear to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 Forbes Global 2000 ranking, out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholders' reactions.
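    A minimal sketch of a support-vector-machine text classifier of the kind the tool builds, assuming scikit-learn. The tweets and style labels are invented, and the paper's 10-factor leadership model is not reproduced.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # Invented training tweets with invented leadership-style labels.
      tweets = ["great vision for the future",
                "thanks team for the hard work",
                "we must cut costs now",
                "proud of our employees today"]
      styles = ["visionary", "supportive", "directive", "supportive"]

      clf = make_pipeline(TfidfVectorizer(), LinearSVC())
      clf.fit(tweets, styles)
      print(clf.predict(["thank you all for your dedication"]))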
  6. Goldberg, D.M.; Zaman, N.; Brahma, A.; Aloiso, M.: Are mortgage loan closing delay risks predictable? : A predictive analysis using text mining on discussion threads (2022) 0.03
    Abstract
    Loan processors and underwriters at mortgage firms seek to gather substantial supporting documentation to properly understand and model loan risks. In doing so, loan originations become prone to closing delays, risking client dissatisfaction and consequent revenue losses. We collaborate with a large national mortgage firm to examine the extent to which these delays are predictable, using internal discussion threads to prioritize interventions for loans most at risk. Substantial work experience is required to predict delays, and we find that even highly trained employees have difficulty predicting delays by reviewing discussion threads. We develop an array of methods to predict loan delays. We apply four modern out-of-the-box sentiment analysis techniques, two dictionary-based and two rule-based, to predict delays. We contrast these approaches with domain-specific approaches, including firm-provided keyword searches and "smoke terms" derived using machine learning. Performance varies widely across sentiment approaches; while some sentiment approaches prioritize the top-ranking records well, performance quickly declines thereafter. The firm-provided keyword searches perform at the rate of random chance. We observe that the domain-specific smoke term approaches consistently outperform other approaches and offer better prediction than loan and borrower characteristics. We conclude that text mining solutions would greatly assist mortgage firms in delay prevention.
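    As a rough illustration of the dictionary-style scoring the paper contrasts with machine-learned approaches, a sketch that ranks threads by occurrences of risk-signal "smoke terms". The term list and threads are invented; the paper derives its smoke terms with machine learning.

      # Hard-coded risk-signal terms; the paper learns these from data.
      smoke_terms = {"missing", "escalate", "delay", "resubmit", "expired"}

      threads = {
          101: "appraisal report expired, need to resubmit docs",
          102: "all conditions cleared, scheduling closing",
          103: "income docs missing, will escalate with processor",
      }

      # Score each thread by its count of smoke-term occurrences.
      scores = {tid: sum(w.strip(",.") in smoke_terms for w in text.split())
                for tid, text in threads.items()}
      for tid in sorted(scores, key=scores.get, reverse=True):
          print(tid, scores[tid], threads[tid])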
  7. Saz, J.T.: Perspectivas en recuperacion y explotacion de informacion electronica : el 'data mining' (1997) 0.03
    Abstract
    Presents the concept and the techniques identified by the term 'data mining'. Explains the principles and phases of developing a data mining process, and the main types of data mining tools.
  8. KDD : techniques and applications (1998) 0.03
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
    Source
    Knowledge-based systems. 10(1998) no.7, S.401-470
  9. Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.02
    Abstract
    We present an approach to enhancing information access through Web structure mining, in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site helps characterize the (flexible and expressive) interaction paradigms supported by the site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs, report results from mining several hierarchical Web sites, and present several interface designs that can exploit such FDs to provide compelling user experiences.
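    A small sketch, over invented site data, of checking a Web FD x -> y exactly as defined above: the dependency holds if every path containing a link labeled x also contains a link labeled y. The paper's mining algorithms are not reproduced here.

      # Hyperlink-label paths through an invented site.
      paths = [
          ["home", "products", "laptops", "checkout"],
          ["home", "products", "phones", "checkout"],
          ["home", "support", "contact"],
      ]

      def fd_holds(x, y):
          """FD x -> y: every path containing x also contains y."""
          return all(y in p for p in paths if x in p)

      print(fd_holds("products", "checkout"))  # True
      print(fd_holds("home", "checkout"))      # False: the support path has no checkout link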
  10. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.02
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
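    A minimal keyword-in-context (KWIC) sketch of the kind of empirical entry point the workflow describes: each hit of a keyword is shown with a window of surrounding words, together with its overall frequency. The text is illustrative.

      text = ("the impact of research on policy is hard to measure and the "
              "impact case studies describe impact in narrative form")
      words = text.split()
      keyword, window = "impact", 3

      hits = [i for i, w in enumerate(words) if w == keyword]
      print(f"frequency of '{keyword}':", len(hits))
      # Print each occurrence with `window` words of context on either side.
      for i in hits:
          left = " ".join(words[max(0, i - window):i])
          right = " ".join(words[i + 1:i + 1 + window])
          print(f"{left:>30} [{keyword}] {right}")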
  11. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.02
    Abstract
    Defines digital libraries and discusses the effects of new technology on librarians. Examines the different viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web-enabled search technology to find information, data visualization and data mining to visualize it, and SGML as an information standard to store it.
    Date
    22.11.1998 18:57:22
  12. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.02
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
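    A sketch of the kind of automatically generated starting point described above: an agglomerative clustering of retrieved documents that users could then refine. It assumes scikit-learn and SciPy; the documents are illustrative, and the paper's annotation and interactive modification steps are not reproduced.

      from scipy.cluster.hierarchy import fcluster, linkage
      from sklearn.feature_extraction.text import TfidfVectorizer

      docs = ["citation network analysis", "citation graph mining",
              "user interface design", "interactive interface evaluation"]
      X = TfidfVectorizer().fit_transform(docs).toarray()

      # Average-linkage agglomerative clustering on cosine distances,
      # cut into two clusters as an initial solution a user could edit.
      Z = linkage(X, method="average", metric="cosine")
      print(fcluster(Z, t=2, criterion="maxclust"))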
  13. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.01
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
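    A toy sketch of the kind of comparison reported in the entry above: correlating query-volume series for one term from two sources. The numbers are invented, and statistics.correlation requires Python 3.10+.

      from statistics import correlation  # available since Python 3.10

      # Invented weekly query volumes for one term from two services.
      google_trends = [55, 60, 58, 70, 66, 72]
      baidu_index = [50, 57, 55, 68, 63, 70]
      print(correlation(google_trends, baidu_index))  # Pearson's r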
  14. Frické, M.: Big data and its epistemology (2015) 0.01
    Abstract
    The article considers whether Big Data, in the form of data-driven science, will enable the discovery, or appraisal, of universal scientific theories, instrumentalist tools, or inductive inferences. It points out, initially, that such aspirations are similar to the now-discredited inductivist approach to science. On the positive side, Big Data may permit larger sample sizes, cheaper and more extensive testing of theories, and the continuous assessment of theories. On the negative side, data-driven science encourages passive data collection, as opposed to experimentation and testing, and hornswoggling ("unsound statistical fiddling"). The roles of theory and data in inductive algorithms, statistical modeling, and scientific discoveries are analyzed, and it is argued that theory is needed at every turn. Data-driven science is a chimera.
  15. Chakrabarti, S.: Mining the Web : discovering knowledge from hypertext data (2003) 0.01
    Footnote
    Review in: JASIST 55(2004) no.3, S.275-276 (C. Chen): "This is a book about finding significant statistical patterns on the Web - in particular, patterns that are associated with hypertext documents, topics, hyperlinks, and queries. The term pattern in this book refers to dependencies among such items. On the one hand, the Web contains useful information on just about every topic under the sun. On the other hand, just like searching for a needle in a haystack, one would need powerful tools to locate useful information in the vast land of the Web. Soumen Chakrabarti's book focuses on a wide range of techniques for machine learning and data mining on the Web. The goal of the book is to provide both the technical background and the tools and tricks of the trade of Web content mining. Much of the technical content reflects the state of the art between 1995 and 2002. The targeted audience is researchers and innovative developers in this area, as well as newcomers who intend to enter this area. The book begins with an introduction chapter, which explains fundamental concepts such as crawling and indexing as well as clustering and classification. The remaining eight chapters are organized into three parts: (i) infrastructure, (ii) learning, and (iii) applications."
  16. Kantardzic, M.: Data mining : concepts, models, methods, and algorithms (2003) 0.01
    Abstract
    This book offers a comprehensive introduction to the exploding field of data mining. We are surrounded by data, numerical and otherwise, which must be analyzed and processed to convert it into information that informs, instructs, answers, or otherwise aids understanding and decision-making. Due to the ever-increasing complexity and size of today's data sets, a new term, data mining, was created to describe the indirect, automatic data analysis techniques that utilize more complex and sophisticated tools than those which analysts used in the past for mere data analysis. "Data Mining: Concepts, Models, Methods, and Algorithms" discusses data mining principles and then describes representative state-of-the-art methods and algorithms originating from different disciplines such as statistics, machine learning, neural networks, fuzzy logic, and evolutionary computation. Detailed algorithms are provided with necessary explanations and illustrative examples. This text offers guidance on how and when to use a particular software tool (with its companion data sets) from among the hundreds on offer when faced with a data set to mine. This allows analysts to create and perform their own data mining experiments using their knowledge of the methodologies and techniques provided. The book emphasizes the selection of appropriate methodologies and data analysis software, as well as parameter tuning. These critically important, qualitative decisions can only be made with the deeper understanding of parameter meaning and its role in the technique that is offered here. Data mining is an exploding field, and this book offers much-needed guidance on selecting among the numerous analysis programs that are available.
  17. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.01
    Date
    2. 4.2000 18:01:22
  18. Lusti, M.: Data Warehousing and Data Mining : Eine Einführung in entscheidungsunterstützende Systeme (1999) 0.01
    Date
    17. 7.2002 19:22:06
  19. Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.01
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
  20. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.01
    Source
    Information systems. 22(1997) nos.5/6, S.349-385

Languages

  • e 58
  • d 7
  • sp 1