Search (4 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × theme_ss:"Informetrie"
  1. Ahonen, H.: Knowledge discovery in documents by extracting frequent word sequences (1999) 0.01
    0.007195845 = product of:
      0.01439169 = sum of:
        0.01439169 = product of:
          0.043175068 = sum of:
            0.043175068 = weight(_text_:h in 6088) [ClassicSimilarity], result of:
              0.043175068 = score(doc=6088,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.38429362 = fieldWeight in 6088, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6088)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  2. Moohebat, M.; Raj, R.G.; Kareem, S.B.A.; Thorleuchter, D.: Identifying ISI-indexed articles by their lexical usage : a text analysis approach (2015) 0.01
    0.006366888 = product of:
      0.012733776 = sum of:
        0.012733776 = product of:
          0.03820133 = sum of:
            0.03820133 = weight(_text_:k in 1664) [ClassicSimilarity], result of:
              0.03820133 = score(doc=1664,freq=2.0), product of:
                0.16142878 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.045220956 = queryNorm
                0.23664509 = fieldWeight in 1664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1664)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    This research creates an architecture for investigating the existence of probable lexical divergences between articles, categorized as Institute for Scientific Information (ISI) and non-ISI, and consequently, if such a difference is discovered, to propose the best available classification method. Based on a collection of ISI- and non-ISI-indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis is applied to demonstrate the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non-ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI-indexed articles in both disciplines with higher precision than do the Naïve Bayesian and K-Nearest Neighbors techniques.
  3. Radev, D.R.; Joseph, M.T.; Gibson, B.; Muthukrishnan, P.: ¬A bibliometric and network analysis of the field of computational linguistics (2016) 0.00
    0.0035979224 = product of:
      0.007195845 = sum of:
        0.007195845 = product of:
          0.021587534 = sum of:
            0.021587534 = weight(_text_:h in 2764) [ClassicSimilarity], result of:
              0.021587534 = score(doc=2764,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.19214681 = fieldWeight in 2764, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2764)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    The ACL Anthology is a large collection of research papers in computational linguistics. Citation data were obtained using text extraction from a collection of PDF files with significant manual postprocessing performed to clean up the results. Manual annotation of the references was then performed to complete the citation network. We analyzed the networks of paper citations, author citations, and author collaborations in an attempt to identify the most central papers and authors. The analysis includes general network statistics, PageRank, metrics across publication years and venues, the impact factor and h-index, as well as other measures.
  4. Chen, L.; Fang, H.: ¬An automatic method for extracting innovative ideas based on the Scopus® database (2019) 0.00
    0.0025699446 = product of:
      0.005139889 = sum of:
        0.005139889 = product of:
          0.015419668 = sum of:
            0.015419668 = weight(_text_:h in 5310) [ClassicSimilarity], result of:
              0.015419668 = score(doc=5310,freq=2.0), product of:
                0.11234917 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045220956 = queryNorm
                0.13724773 = fieldWeight in 5310, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5310)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
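
The score trees above all follow the same arithmetic, so a short sketch may help when checking them by hand. It assumes the classic TF-IDF formulas that appear to match the numbers shown: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names `classic_idf` and `classic_score` are hypothetical helpers for illustration, not part of any search library. The engine works in 32-bit floats, so the last digits may differ slightly from the listing.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces 2.4844491 for docFreq=10020, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    # tf is the square root of the raw term frequency (1.4142135 for freq=2.0).
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm        # "queryWeight" line in the tree
    field_weight = tf * idf * field_norm   # "fieldWeight" line in the tree
    score = query_weight * field_weight    # "weight(_text_:...)" line
    # Each coord(matched/total) factor scales the score by matched/total.
    for matched, total in coords:
        score *= matched / total
    return score


# Values copied from result 1 (doc=6088) and result 2 (doc=1664) above.
QUERY_NORM = 0.045220956
MAX_DOCS = 44218
COORDS = [(1, 3), (1, 2)]

ahonen = classic_score(2.0, 10020, MAX_DOCS, QUERY_NORM, 0.109375, COORDS)
moohebat = classic_score(2.0, 3384, MAX_DOCS, QUERY_NORM, 0.046875, COORDS)

print(f"result 1: {ahonen:.9f}")
print(f"result 2: {moohebat:.9f}")
```

In results 1, 3 and 4 the matched term, its frequency and its idf are identical, so the ranking among them is decided by fieldNorm alone (0.109375, 0.0546875, 0.0390625). A larger fieldNorm indicates a shorter field.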