Search (6 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Automatisches Klassifizieren" (automatic classification)
  • theme_ss:"Data Mining"
  1. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01
    0.013120042 = product of:
      0.032800104 = sum of:
        0.009077741 = weight(_text_:e in 3015) [ClassicSimilarity], result of:
          0.009077741 = score(doc=3015,freq=6.0), product of:
            0.055003747 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.03826694 = queryNorm
            0.16503859 = fieldWeight in 3015, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
        0.023722364 = product of:
          0.07116709 = sum of:
            0.07116709 = weight(_text_:evolution in 3015) [ClassicSimilarity], result of:
              0.07116709 = score(doc=3015,freq=2.0), product of:
                0.2026858 = queryWeight, product of:
                  5.29663 = idf(docFreq=601, maxDocs=44218)
                  0.03826694 = queryNorm
                0.35112026 = fieldWeight in 3015, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.29663 = idf(docFreq=601, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (SciTex), which includes texts from the 1970s/1980s and the early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based feature extraction (various aggregated part-of-speech-based features, n-grams, and lexico-grammatical patterns) with automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
    Language
    e
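    The score breakdown for result 1 can be reproduced by hand. The following is a minimal sketch, assuming the explain tree follows Lucene's ClassicSimilarity formulas: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and coord = matched clauses / total clauses. The queryNorm and fieldNorm values are copied from the explain output rather than recomputed, and the helper names (term_score, coord, and so on) are hypothetical, not Lucene's API.

    ```python
    import math

    def tf(freq: float) -> float:
        # ClassicSimilarity term frequency: square root of the raw count
        return math.sqrt(freq)

    def idf(doc_freq: int, max_docs: int) -> float:
        # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
        return 1.0 + math.log(max_docs / (doc_freq + 1))

    def term_score(freq: float, doc_freq: int, max_docs: int,
                   query_norm: float, field_norm: float) -> float:
        # score = queryWeight * fieldWeight, as in the explain tree
        i = idf(doc_freq, max_docs)
        query_weight = i * query_norm
        field_weight = tf(freq) * i * field_norm
        return query_weight * field_weight

    def coord(overlap: int, max_overlap: int) -> float:
        # Fraction of query clauses that matched
        return overlap / max_overlap

    # Values taken from the explain output for doc 3015
    MAX_DOCS = 44218
    QUERY_NORM = 0.03826694
    FIELD_NORM = 0.046875

    w_e = term_score(6.0, 28552, MAX_DOCS, QUERY_NORM, FIELD_NORM)
    # "evolution" matched 1 of 3 clauses in its sub-query, hence coord(1/3)
    w_evolution = term_score(2.0, 601, MAX_DOCS, QUERY_NORM, FIELD_NORM) * coord(1, 3)
    # 2 of the 5 top-level clauses matched, hence coord(2/5)
    total = (w_e + w_evolution) * coord(2, 5)

    print(f"{w_e:.6f} {w_evolution:.6f} {total:.6f}")  # prints 0.009078 0.023722 0.013120
    ```

    Because Lucene computes these values in single precision, a double-precision recomputation like this one agrees with the explain output only to about six significant digits.
    
    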
  2. Fong, A.C.M.: Mining a Web citation database for document clustering (2002) 0.00
    0.0024458165 = product of:
      0.012229082 = sum of:
        0.012229082 = weight(_text_:e in 3940) [ClassicSimilarity], result of:
          0.012229082 = score(doc=3940,freq=2.0), product of:
            0.055003747 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.03826694 = queryNorm
            0.2223318 = fieldWeight in 3940, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.109375 = fieldNorm(doc=3940)
      0.2 = coord(1/5)
    
    Language
    e
  3. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.00
    0.0014823888 = product of:
      0.0074119437 = sum of:
        0.0074119437 = weight(_text_:e in 2563) [ClassicSimilarity], result of:
          0.0074119437 = score(doc=2563,freq=4.0), product of:
            0.055003747 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.03826694 = queryNorm
            0.13475344 = fieldWeight in 2563, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=2563)
      0.2 = coord(1/5)
    
    Abstract
    Topic discovery is an important means for marketing, e-business, and social science studies. It can also be applied to various purposes, such as identifying a group with certain properties and observing the emergence and decline of a particular cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of the usefulness of its outcomes and is thus unable to keep up with the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a base-set construction method, a clustering algorithm, and an iterative principal-eigenvector computation to discover the topics relevant to a given query without manual examination. Given a query, ATD returns the topics associated with it and the top representative pages for each topic. Our experiments show that ATD outperforms the traditional eigenvector method in both computation time and topic discovery quality.
    Language
    e
  4. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B. de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00
    0.0010482072 = product of:
      0.0052410355 = sum of:
        0.0052410355 = weight(_text_:e in 3464) [ClassicSimilarity], result of:
          0.0052410355 = score(doc=3464,freq=2.0), product of:
            0.055003747 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.03826694 = queryNorm
            0.09528506 = fieldWeight in 3464, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.2 = coord(1/5)
    
    Language
    e
  5. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.00
    8.73506E-4 = product of:
      0.00436753 = sum of:
        0.00436753 = weight(_text_:e in 5997) [ClassicSimilarity], result of:
          0.00436753 = score(doc=5997,freq=2.0), product of:
            0.055003747 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.03826694 = queryNorm
            0.07940422 = fieldWeight in 5997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
      0.2 = coord(1/5)
    
    Language
    e
  6. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
    8.73506E-4 = product of:
      0.00436753 = sum of:
        0.00436753 = weight(_text_:e in 967) [ClassicSimilarity], result of:
          0.00436753 = score(doc=967,freq=2.0), product of:
            0.055003747 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.03826694 = queryNorm
            0.07940422 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.2 = coord(1/5)
    
    Language
    e