Search (1 result, page 1 of 1)

Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 0.01
```
0.009798723 = product of:
  0.039194893 = sum of:
    0.039194893 = product of:
      0.078389786 = sum of:
        0.078389786 = weight(_text_:software in 1058) [ClassicSimilarity], result of:
          0.078389786 = score(doc=1058,freq=6.0), product of:
            0.17209321 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.043379538 = queryNorm
            0.4555077 = fieldWeight in 1058, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.046875 = fieldNorm(doc=1058)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
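The score breakdown above can be re-derived by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function name `classic_term_score` is hypothetical, and `queryNorm`, `fieldNorm` and the two `coord` factors are copied from the output rather than recomputed, since they depend on the full query and on index-time field length encoding.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Recompute one term's weight as shown in the explain output.

    Hypothetical helper for illustration; not part of any Lucene API.
    """
    tf = math.sqrt(freq)                               # 2.4494898 for freq=6.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))    # 3.9671519 in the output
    query_weight = idf * query_norm                    # 0.17209321 in the output
    field_weight = tf * idf * field_norm               # 0.4555077 in the output
    return query_weight * field_weight


# Values taken verbatim from the explain output for doc 1058, term "software".
term_score = classic_term_score(
    freq=6.0,
    doc_freq=2274,
    max_docs=44218,
    query_norm=0.043379538,   # copied from the output (assumption: not recomputed)
    field_norm=0.046875,      # copied from the output (assumption: not recomputed)
)

# The two coord factors scale the score by the fraction of query clauses matched.
final_score = term_score * 0.5 * 0.25

print(f"term score:  {term_score:.9f}")
print(f"final score: {final_score:.9f}")
```

Up to floating-point rounding, the two printed values should agree with the 0.078389786 and 0.009798723 lines of the breakdown.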
Abstract

Large corpora are ubiquitous in today's world, and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). In this paper, we identify a gap in existing implementations of many of the popular algorithms, namely their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory-independent fashion. Within this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library, DML-CZ.
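The document-streaming idea mentioned in the abstract can be illustrated with a plain Python iterator. This is only a sketch under stated assumptions, not the framework's actual API: the class `StreamedCorpus`, the function `document_frequencies`, and the one-document-per-line file layout are all hypothetical and chosen for illustration.

```python
import os
import tempfile
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus wrapper: yields one bag-of-words per line of a text file.

    Only one document is held in memory at a time, so memory use does not
    grow with the number of documents in the file.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Naive whitespace tokenisation, for illustration only.
                yield Counter(line.lower().split())


def document_frequencies(corpus):
    """Single pass over the stream: count in how many documents each term occurs."""
    doc_freq = Counter()
    n_docs = 0
    for bag_of_words in corpus:
        doc_freq.update(bag_of_words.keys())
        n_docs += 1
    return doc_freq, n_docs


# Small demonstration on a temporary three-document corpus.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("software framework for topic modelling\n")
    tmp.write("topic modelling with large corpora\n")
    tmp.write("digital library software\n")
    corpus_path = tmp.name

doc_freq, n_docs = document_frequencies(StreamedCorpus(corpus_path))
os.remove(corpus_path)

print(n_docs, doc_freq["software"], doc_freq["topic"])
```

Because the corpus object can be iterated repeatedly, an algorithm that needs several passes over the data can simply loop over it again without loading the whole collection.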

Content

For the software, see: http://radimrehurek.com/gensim/index.html. For a demo, see: http://dml.cz/handle/10338.dmlcz/100785/SimilarArticles.