Search (8 results, page 1 of 1)

  • year_i:[2020 TO 2030}
  • theme_ss:"Data Mining"
  1. Mandl, T.: Text Mining und Data Mining (2023) 0.08
    0.076390296 = product of:
      0.15278059 = sum of:
        0.15278059 = product of:
          0.30556118 = sum of:
            0.30556118 = weight(_text_:mining in 774) [ClassicSimilarity], result of:
              0.30556118 = score(doc=774,freq=12.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                1.0689225 = fieldWeight in 774, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=774)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Text and data mining are a bundle of technologies closely linked to the fields of statistics, machine learning, and pattern recognition. The usual definitions cover a wide variety of methods without drawing an exact boundary. Data mining refers to the search for patterns, regularities, or anomalies in highly structured and, above all, numerical data. "Any algorithm that enumerates patterns from, or fits models to, data is a data mining algorithm." Numerical data and database contents are referred to as structured data. Text documents in natural language, by contrast, are regarded as unstructured data.
    Theme
    Data Mining
  2. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.06
    0.059772413 = product of:
      0.11954483 = sum of:
        0.11954483 = product of:
          0.23908965 = sum of:
            0.23908965 = weight(_text_:mining in 720) [ClassicSimilarity], result of:
              0.23908965 = score(doc=720,freq=10.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                0.83639 = fieldWeight in 720, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.046875 = fieldNorm(doc=720)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This project brought together undergraduate students in Computer Science with librarians to mine abstracts of articles from the Texas A&M University Libraries' institutional repository, OAKTrust, in order to probe the creation of new metadata to improve discovery and use. The mining operation task consisted simply of classifying the articles into two categories of research type: basic research ("for understanding," "curiosity-based," or "knowledge-based") and applied research ("use-based"). These categories are fundamental especially for funders but are also important to researchers. The mining-to-classification steps took several iterations, but ultimately, we achieved good results with the toolkit BERT (Bidirectional Encoder Representations from Transformers). The project and its workflows represent a preview of what may lie ahead in the future of crafting metadata using text mining techniques to enhance discoverability.
    Theme
    Data Mining
  3. Wiegmann, S.: Hättest du die Titanic überlebt? : Eine kurze Einführung in das Data Mining mit freier Software (2023) 0.04
    0.044103958 = product of:
      0.088207915 = sum of:
        0.088207915 = product of:
          0.17641583 = sum of:
            0.17641583 = weight(_text_:mining in 876) [ClassicSimilarity], result of:
              0.17641583 = score(doc=876,freq=4.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                0.61714274 = fieldWeight in 876, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=876)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Theme
    Data Mining
  4. Jones, K.M.L.; Rubel, A.; LeClere, E.: A matter of trust : higher education institutions as information fiduciaries in an age of educational data mining and learning analytics (2020) 0.04
    0.038582932 = product of:
      0.077165864 = sum of:
        0.077165864 = product of:
          0.15433173 = sum of:
            0.15433173 = weight(_text_:mining in 5968) [ClassicSimilarity], result of:
              0.15433173 = score(doc=5968,freq=6.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                0.5398875 = fieldWeight in 5968, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5968)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Higher education institutions are mining and analyzing student data to effect educational, political, and managerial outcomes. Done under the banner of "learning analytics," this work can, and often does, surface sensitive data and information about, inter alia, a student's demographics, academic performance, offline and online movements, physical fitness, mental wellbeing, and social network. With these data, institutions and third parties are able to describe student life, predict future behaviors, and intervene to address academic or other barriers to student success (however defined). Learning analytics, consequently, raise serious issues concerning student privacy, autonomy, and the appropriate flow of student data. We argue that issues around privacy lead to valid questions about the degree to which students should trust their institution to use learning analytics data and other artifacts (algorithms, predictive scores) with their interests in mind. We argue that higher education institutions are paradigms of information fiduciaries. As such, colleges and universities have a special responsibility to their students. In this article, we use the information fiduciary concept to analyze cases when learning analytics violate an institution's responsibility to its students.
    Theme
    Data Mining
  5. Goldberg, D.M.; Zaman, N.; Brahma, A.; Aloiso, M.: Are mortgage loan closing delay risks predictable? : A predictive analysis using text mining on discussion threads (2022) 0.04
    0.038582932 = product of:
      0.077165864 = sum of:
        0.077165864 = product of:
          0.15433173 = sum of:
            0.15433173 = weight(_text_:mining in 501) [ClassicSimilarity], result of:
              0.15433173 = score(doc=501,freq=6.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                0.5398875 = fieldWeight in 501, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=501)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Loan processors and underwriters at mortgage firms seek to gather substantial supporting documentation to properly understand and model loan risks. In doing so, loan originations become prone to closing delays, risking client dissatisfaction and consequent revenue losses. We collaborate with a large national mortgage firm to examine the extent to which these delays are predictable, using internal discussion threads to prioritize interventions for loans most at risk. Substantial work experience is required to predict delays, and we find that even highly trained employees have difficulty predicting delays by reviewing discussion threads. We develop an array of methods to predict loan delays. We apply four modern out-of-the-box sentiment analysis techniques, two dictionary-based and two rule-based, to predict delays. We contrast these approaches with domain-specific approaches, including firm-provided keyword searches and "smoke terms" derived using machine learning. Performance varies widely across sentiment approaches; while some sentiment approaches prioritize the top-ranking records well, performance quickly declines thereafter. The firm-provided keyword searches perform at the rate of random chance. We observe that the domain-specific smoke term approaches consistently outperform other approaches and offer better prediction than loan and borrower characteristics. We conclude that text mining solutions would greatly assist mortgage firms in delay prevention.
    Theme
    Data Mining
  6. Datentracking in der Wissenschaft : Aggregation und Verwendung bzw. Verkauf von Nutzungsdaten durch Wissenschaftsverlage. Ein Informationspapier des Ausschusses für Wissenschaftliche Bibliotheken und Informationssysteme der Deutschen Forschungsgemeinschaft (2021) 0.03
    0.026731037 = product of:
      0.053462073 = sum of:
        0.053462073 = product of:
          0.10692415 = sum of:
            0.10692415 = weight(_text_:mining in 248) [ClassicSimilarity], result of:
              0.10692415 = score(doc=248,freq=2.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                0.37404498 = fieldWeight in 248, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.046875 = fieldNorm(doc=248)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Theme
    Data Mining
  7. Organisciak, P.; Schmidt, B.M.; Downie, J.S.: Giving shape to large digital libraries through exploratory data analysis (2022) 0.03
    0.026731037 = product of:
      0.053462073 = sum of:
        0.053462073 = product of:
          0.10692415 = sum of:
            0.10692415 = weight(_text_:mining in 473) [ClassicSimilarity], result of:
              0.10692415 = score(doc=473,freq=2.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                0.37404498 = fieldWeight in 473, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.046875 = fieldNorm(doc=473)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Theme
    Data Mining
  8. Borgman, C.L.; Wofford, M.F.; Golshan, M.S.; Darch, P.T.: Collaborative qualitative research at scale : reflections on 20 years of acquiring global data and making data global (2021) 0.02
    0.022275863 = product of:
      0.044551726 = sum of:
        0.044551726 = product of:
          0.08910345 = sum of:
            0.08910345 = weight(_text_:mining in 239) [ClassicSimilarity], result of:
              0.08910345 = score(doc=239,freq=2.0), product of:
                0.28585905 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.05066224 = queryNorm
                0.31170416 = fieldWeight in 239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=239)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Theme
    Data Mining
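
Note on the score breakdowns: each explain tree above multiplies the same few factors, so the listed scores can be recomputed by hand. The Python sketch below is an illustration only, not the search engine's own code. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values printed in the trees. The function name classic_score and its parameter names are hypothetical; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def classic_score(freq, doc_freq, max_docs, field_norm, query_norm,
                  coords=(0.5, 0.5)):
    """Recompute one explain tree from the listing (illustrative sketch).

    Assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)).
    query_norm and field_norm are taken from the listing as given.
    """
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm        # "queryWeight" line in the tree
    field_weight = tf * idf * field_norm   # "fieldWeight" line in the tree
    score = query_weight * field_weight    # "weight(_text_:mining ...)" line
    for coord in coords:                   # the two coord(1/2) factors
        score *= coord
    return score


# Values copied from result 1 (doc=774) and result 8 (doc=239).
QUERY_NORM = 0.05066224
score_774 = classic_score(12.0, 425, 44218, 0.0546875, QUERY_NORM)
score_239 = classic_score(2.0, 425, 44218, 0.0390625, QUERY_NORM)
print(f"doc 774: {score_774:.6f}")
print(f"doc 239: {score_239:.6f}")
```

Because idf and queryNorm are identical for all eight records, the ranking here depends only on the term frequency of "mining" and on fieldNorm, which is why results 4 and 5 (same freq, same fieldNorm) tie. Small differences from the listed figures in the last digits are expected, since the listing appears to use single-precision arithmetic.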