Search (4 results, page 1 of 1)

Wolfram, D.: The symbiotic relationship between information retrieval and informetrics (2015) 0.01

```
0.00885179 = product of:
  0.061962526 = sum of:
    0.061962526 = weight(_text_:retrieval in 1689) [ClassicSimilarity], result of:
      0.061962526 = score(doc=1689,freq=4.0), product of:
        0.109248295 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.036116153 = queryNorm
        0.5671716 = fieldWeight in 1689, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.09375 = fieldNorm(doc=1689)
  0.14285715 = coord(1/7)
```

Footnote: Contribution to a special issue "Combining bibliometrics and information retrieval"
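The score breakdown for this record (doc 1689) can be checked by hand. The sketch below assumes the classic Lucene TF-IDF formulas that the `[ClassicSimilarity]` label points to: `tf = sqrt(termFreq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`. The function name `classic_similarity_score` is hypothetical, and `queryNorm`, `fieldNorm`, and `coord` are copied from the listing rather than recomputed, since they depend on the full query and on index-time field lengths that the page does not show.

```python
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute one term clause of a ClassicSimilarity explain tree (illustrative helper)."""
    tf = math.sqrt(freq)                             # tf(freq): square root of the term frequency
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq, maxDocs), assumed classic formula
    query_weight = idf * query_norm                  # "queryWeight" line
    field_weight = tf * idf * field_norm             # "fieldWeight" line
    return query_weight * field_weight * coord       # scaled by the coord(1/7) factor

# Values copied from the breakdown for doc 1689.
score = classic_similarity_score(
    freq=4.0, doc_freq=5836, max_docs=44218,
    query_norm=0.036116153, field_norm=0.09375, coord=1 / 7,
)
print(f"{score:.8f}")
```

The result should agree with the listed 0.00885179 up to floating-point rounding; Lucene works in 32-bit floats, so the last digits may differ slightly.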

Lu, K.; Cai, X.; Ajiferuke, I.; Wolfram, D.: Vocabulary size and its effect on topic representation (2017) 0.00
```
0.0031295803 = product of:
  0.021907061 = sum of:
    0.021907061 = weight(_text_:retrieval in 3414) [ClassicSimilarity], result of:
      0.021907061 = score(doc=3414,freq=2.0), product of:
        0.109248295 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.036116153 = queryNorm
        0.20052543 = fieldWeight in 3414, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=3414)
  0.14285715 = coord(1/7)
```
Abstract

This study investigates how computational overhead for topic model training may be reduced by selectively removing terms from the vocabulary of text corpora being modeled. We compare the impact of removing singly occurring terms, the top 0.5%, 1% and 5% most frequently occurring terms and both top 0.5% most frequent and singly occurring terms, along with changes in the number of topics modeled (10, 20, 30, 40, 50, 100) using three datasets. Four outcome measures are compared. The removal of singly occurring terms has little impact on outcomes for all of the measures tested. Document discriminative capacity, as measured by the document space density, is reduced by the removal of frequently occurring terms, but increases with higher numbers of topics. Vocabulary size does not greatly influence entropy, but entropy is affected by the number of topics. Finally, topic similarity, as measured by pairwise topic similarity and Jensen-Shannon divergence, decreases with the removal of frequent terms. The findings have implications for information science research in information retrieval and informetrics that makes use of topic modeling.

Ajiferuke, I.; Lu, K.; Wolfram, D.: A comparison of citer and citation-based measure outcomes for multiple disciplines (2010) 0.00

```
0.0013980685 = product of:
  0.009786479 = sum of:
    0.009786479 = product of:
      0.029359438 = sum of:
        0.029359438 = weight(_text_:22 in 4000) [ClassicSimilarity], result of:
          0.029359438 = score(doc=4000,freq=2.0), product of:
            0.1264726 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.036116153 = queryNorm
            0.23214069 = fieldWeight in 4000, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4000)
      0.33333334 = coord(1/3)
  0.14285715 = coord(1/7)
```

Date: 28. 9.2010 12:54:22
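This breakdown (doc 4000) has two coordination factors instead of one. The sketch below reads them as nested clauses, with one of three inner clauses and one of seven outer clauses matching. That reading is an assumption drawn from the tree shape, not something the page states. `term_weight` is a hypothetical helper using the classic formulas `tf = sqrt(termFreq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`; `queryNorm` and `fieldNorm` are copied from the listing.

```python
import math

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Leaf weight of a single term (illustrative helper, classic TF-IDF assumed)."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (tf * idf * field_norm)  # queryWeight * fieldWeight

# Values copied from the breakdown for doc 4000, term "22".
leaf = term_weight(freq=2.0, doc_freq=3622, max_docs=44218,
                   query_norm=0.036116153, field_norm=0.046875)
inner = leaf * (1 / 3)   # inner coord(1/3)
outer = inner * (1 / 7)  # outer coord(1/7), the final score

print(f"leaf={leaf:.8f} inner={inner:.8f} outer={outer:.8f}")
```

The three printed values correspond to the 0.029359438, 0.009786479, and 0.0013980685 lines of the tree, again up to 32-bit float rounding.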

Castanha, R.C.G.; Wolfram, D.: The domain of knowledge organization: a bibliometric analysis of prolific authors and their intellectual space (2018) 0.00

```
0.0011650572 = product of:
  0.0081554 = sum of:
    0.0081554 = product of:
      0.0244662 = sum of:
        0.0244662 = weight(_text_:22 in 4150) [ClassicSimilarity], result of:
          0.0244662 = score(doc=4150,freq=2.0), product of:
            0.1264726 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.036116153 = queryNorm
            0.19345059 = fieldWeight in 4150, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4150)
      0.33333334 = coord(1/3)
  0.14285715 = coord(1/7)
```

Source: Knowledge organization. 45(2018) no.1, pp. 13-22

