Search (21 results, page 1 of 2)

  • language_ss:"e"
  • theme_ss:"Data Mining"
  • type_ss:"a"
  1. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.01
    0.014134051 = product of:
      0.084804304 = sum of:
        0.084804304 = weight(_text_:ranking in 601) [ClassicSimilarity], result of:
          0.084804304 = score(doc=601,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.4183332 = fieldWeight in 601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0546875 = fieldNorm(doc=601)
      0.16666667 = coord(1/6)
    
    Abstract
    In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate the effectiveness of our approach experimentally.
  2. Bella, A. La; Fronzetti Colladon, A.; Battistoni, E.; Castellan, S.; Francucci, M.: Assessing perceived organizational leadership styles through twitter text mining (2018) 0.01
    0.0121149 = product of:
      0.0726894 = sum of:
        0.0726894 = weight(_text_:ranking in 2400) [ClassicSimilarity], result of:
          0.0726894 = score(doc=2400,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.35857132 = fieldWeight in 2400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.046875 = fieldNorm(doc=2400)
      0.16666667 = coord(1/6)
    
    Abstract
    We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as perceived by Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000, out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholders' reactions.
  3. Lihui, C.; Lian, C.W.: Using Web structure and summarisation techniques for Web content mining (2005) 0.01
    0.010095751 = product of:
      0.0605745 = sum of:
        0.0605745 = weight(_text_:ranking in 1046) [ClassicSimilarity], result of:
          0.0605745 = score(doc=1046,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.29880944 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1046)
      0.16666667 = coord(1/6)
    
    Abstract
    The dynamic nature and size of the Internet can make it difficult to find relevant information. Most users typically express their information need via short queries to search engines and they often have to physically sift through the search results based on relevance ranking set by the search engines, making the process of relevance judgement time-consuming. In this paper, we describe a novel representation technique which makes use of the Web structure together with summarisation techniques to better represent knowledge in actual Web Documents. We name the proposed technique Semantic Virtual Document (SVD). We will discuss how the proposed SVD can be used together with a suitable clustering algorithm to achieve an automatic content-based categorization of similar Web Documents. The auto-categorization facility as well as a "Tree-like" Graphical User Interface (GUI) for post-retrieval document browsing enhances the relevance judgement process for Internet users. Furthermore, we will introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of short queries typically given by users. We will outline our experimental design to evaluate the effectiveness of the proposed SVD for representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.
  4. Goldberg, D.M.; Zaman, N.; Brahma, A.; Aloiso, M.: Are mortgage loan closing delay risks predictable? : A predictive analysis using text mining on discussion threads (2022) 0.01
    0.010095751 = product of:
      0.0605745 = sum of:
        0.0605745 = weight(_text_:ranking in 501) [ClassicSimilarity], result of:
          0.0605745 = score(doc=501,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.29880944 = fieldWeight in 501, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0390625 = fieldNorm(doc=501)
      0.16666667 = coord(1/6)
    
    Abstract
    Loan processors and underwriters at mortgage firms seek to gather substantial supporting documentation to properly understand and model loan risks. In doing so, loan originations become prone to closing delays, risking client dissatisfaction and consequent revenue losses. We collaborate with a large national mortgage firm to examine the extent to which these delays are predictable, using internal discussion threads to prioritize interventions for loans most at risk. Substantial work experience is required to predict delays, and we find that even highly trained employees have difficulty predicting delays by reviewing discussion threads. We develop an array of methods to predict loan delays. We apply four modern out-of-the-box sentiment analysis techniques, two dictionary-based and two rule-based, to predict delays. We contrast these approaches with domain-specific approaches, including firm-provided keyword searches and "smoke terms" derived using machine learning. Performance varies widely across sentiment approaches; while some sentiment approaches prioritize the top-ranking records well, performance quickly declines thereafter. The firm-provided keyword searches perform at the rate of random chance. We observe that the domain-specific smoke term approaches consistently outperform other approaches and offer better prediction than loan and borrower characteristics. We conclude that text mining solutions would greatly assist mortgage firms in delay prevention.
  5. Amir, A.; Feldman, R.; Kashi, R.: A new and versatile method for association generation (1997) 0.01
    0.009068082 = product of:
      0.05440849 = sum of:
        0.05440849 = product of:
          0.081612736 = sum of:
            0.04099074 = weight(_text_:29 in 1270) [ClassicSimilarity], result of:
              0.04099074 = score(doc=1270,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.31092256 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
            0.040622 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
              0.040622 = score(doc=1270,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.30952093 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
          0.6666667 = coord(2/3)
      0.16666667 = coord(1/6)
    
    Date
    5. 4.1996 15:29:15
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
  6. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.01
    0.007934572 = product of:
      0.047607433 = sum of:
        0.047607433 = product of:
          0.07141115 = sum of:
            0.035866898 = weight(_text_:29 in 2908) [ClassicSimilarity], result of:
              0.035866898 = score(doc=2908,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.27205724 = fieldWeight in 2908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2908)
            0.03554425 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
              0.03554425 = score(doc=2908,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.2708308 = fieldWeight in 2908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2908)
          0.6666667 = coord(2/3)
      0.16666667 = coord(1/6)
    
    Date
    5. 4.1996 15:29:15
    Source
    Information systems. 22(1997) nos.5/6, S.349-385
  7. Budzik, J.; Hammond, K.J.; Birnbaum, L.: Information access in context (2001) 0.00
    0.0039852113 = product of:
      0.023911266 = sum of:
        0.023911266 = product of:
          0.071733795 = sum of:
            0.071733795 = weight(_text_:29 in 3835) [ClassicSimilarity], result of:
              0.071733795 = score(doc=3835,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.5441145 = fieldWeight in 3835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3835)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    29. 3.2002 17:31:17
  8. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.00
    0.0039493614 = product of:
      0.023696167 = sum of:
        0.023696167 = product of:
          0.0710885 = sum of:
            0.0710885 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.0710885 = score(doc=4577,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    2. 4.2000 18:01:22
  9. Cardie, C.: Empirical methods in information extraction (1997) 0.00
    0.0022772634 = product of:
      0.013663581 = sum of:
        0.013663581 = product of:
          0.04099074 = sum of:
            0.04099074 = weight(_text_:29 in 3246) [ClassicSimilarity], result of:
              0.04099074 = score(doc=3246,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.31092256 = fieldWeight in 3246, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3246)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    6. 3.1999 13:50:29
  10. Bath, P.A.: Data mining in health and medical information (2003) 0.00
    0.0022772634 = product of:
      0.013663581 = sum of:
        0.013663581 = product of:
          0.04099074 = sum of:
            0.04099074 = weight(_text_:29 in 4263) [ClassicSimilarity], result of:
              0.04099074 = score(doc=4263,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.31092256 = fieldWeight in 4263, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4263)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    23.10.2005 18:29:03
  11. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.00
    0.0022567778 = product of:
      0.013540667 = sum of:
        0.013540667 = product of:
          0.040622 = sum of:
            0.040622 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.040622 = score(doc=1737,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    22.11.1998 18:57:22
  12. Srinivasan, P.: Text mining in biomedicine : challenges and opportunities (2006) 0.00
    0.0017079476 = product of:
      0.010247685 = sum of:
        0.010247685 = product of:
          0.030743055 = sum of:
            0.030743055 = weight(_text_:29 in 1497) [ClassicSimilarity], result of:
              0.030743055 = score(doc=1497,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.23319192 = fieldWeight in 1497, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1497)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    29. 2.2008 17:14:09
  13. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00
    0.0017079476 = product of:
      0.010247685 = sum of:
        0.010247685 = product of:
          0.030743055 = sum of:
            0.030743055 = weight(_text_:29 in 3464) [ClassicSimilarity], result of:
              0.030743055 = score(doc=3464,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.23319192 = fieldWeight in 3464, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    1. 6.2010 9:29:57
  14. Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.00
    0.0017079476 = product of:
      0.010247685 = sum of:
        0.010247685 = product of:
          0.030743055 = sum of:
            0.030743055 = weight(_text_:29 in 1205) [ClassicSimilarity], result of:
              0.030743055 = score(doc=1205,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.23319192 = fieldWeight in 1205, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1205)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    29. 1.2014 16:46:40
  15. Raan, A.F.J. van; Noyons, E.C.M.: Discovery of patterns of scientific and technological development and knowledge transfer (2002) 0.00
    0.0014232898 = product of:
      0.008539738 = sum of:
        0.008539738 = product of:
          0.025619213 = sum of:
            0.025619213 = weight(_text_:29 in 3603) [ClassicSimilarity], result of:
              0.025619213 = score(doc=3603,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19432661 = fieldWeight in 3603, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3603)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Source
    Gaining insight from research information (CRIS2002): Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002. Eds: W. Adamczak u. A. Nase
  16. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
    0.0014232898 = product of:
      0.008539738 = sum of:
        0.008539738 = product of:
          0.025619213 = sum of:
            0.025619213 = weight(_text_:29 in 967) [ClassicSimilarity], result of:
              0.025619213 = score(doc=967,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19432661 = fieldWeight in 967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=967)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    25. 6.2013 19:05:29
  17. Tu, Y.-N.; Hsu, S.-L.: Constructing conceptual trajectory maps to trace the development of research fields (2016) 0.00
    0.0014232898 = product of:
      0.008539738 = sum of:
        0.008539738 = product of:
          0.025619213 = sum of:
            0.025619213 = weight(_text_:29 in 3059) [ClassicSimilarity], result of:
              0.025619213 = score(doc=3059,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19432661 = fieldWeight in 3059, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3059)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    21. 7.2016 19:29:19
  18. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.00
    0.0014232898 = product of:
      0.008539738 = sum of:
        0.008539738 = product of:
          0.025619213 = sum of:
            0.025619213 = weight(_text_:29 in 3682) [ClassicSimilarity], result of:
              0.025619213 = score(doc=3682,freq=2.0), product of:
                0.13183585 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19432661 = fieldWeight in 3682, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3682)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    16.11.2017 14:00:29
  19. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.00
    0.0014104862 = product of:
      0.008462917 = sum of:
        0.008462917 = product of:
          0.025388751 = sum of:
            0.025388751 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
              0.025388751 = score(doc=668,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19345059 = fieldWeight in 668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=668)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    22. 3.2013 19:43:01
  20. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.00
    0.0014104862 = product of:
      0.008462917 = sum of:
        0.008462917 = product of:
          0.025388751 = sum of:
            0.025388751 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.025388751 = score(doc=1605,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
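
Note on the score breakdowns: the explanation trees in this listing appear to follow the classic Lucene TF-IDF scheme (ClassicSimilarity), in which tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The following is a minimal Python sketch that recomputes the scores of results 1 and 5 from the values printed above, under that assumption. The function and variable names (idf, term_score, MAX_DOCS, QUERY_NORM) are illustrative and not part of any library. The results should agree with the listed scores up to floating-point rounding, since Lucene computes in 32-bit floats.

```python
import math

# Constants copied from the explanation trees in the listing.
MAX_DOCS = 44218
QUERY_NORM = 0.03747799


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as printed in the listing (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Result 1 (doc 601): one matching term, "ranking", then coord(1/6).
result_1 = term_score(2.0, 537, 0.0546875) * (1 / 6)

# Result 5 (doc 1270): terms "29" and "22" are summed, scaled by the
# inner coord(2/3), then by the outer coord(1/6).
inner_sum = term_score(2.0, 3565, 0.0625) + term_score(2.0, 3622, 0.0625)
result_5 = inner_sum * (2 / 3) * (1 / 6)

print(f"result 1: {result_1:.6f}")
print(f"result 5: {result_5:.6f}")
```

The coord factors reward documents that match more of the query clauses, which is why results matching only a date fragment such as "29" or "22" rank below those matching "ranking".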