Search (2 results, page 1 of 1)


Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 0.04
```
0.037334766 = product of:
  0.07466953 = sum of:
    0.051698197 = weight(_text_:digital in 1058) [ClassicSimilarity], result of:
      0.051698197 = score(doc=1058,freq=2.0), product of:
        0.19770671 = queryWeight, product of:
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.050121464 = queryNorm
        0.26148933 = fieldWeight in 1058, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.944552 = idf(docFreq=2326, maxDocs=44218)
          0.046875 = fieldNorm(doc=1058)
    0.022971334 = weight(_text_:library in 1058) [ClassicSimilarity], result of:
      0.022971334 = score(doc=1058,freq=2.0), product of:
        0.1317883 = queryWeight, product of:
          2.6293786 = idf(docFreq=8668, maxDocs=44218)
          0.050121464 = queryNorm
        0.17430481 = fieldWeight in 1058, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.6293786 = idf(docFreq=8668, maxDocs=44218)
          0.046875 = fieldNorm(doc=1058)
  0.5 = coord(2/4)
```
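The listing above can be re-derived by hand. The following is a minimal sketch, assuming the tf and idf formulas of Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative, and `queryNorm` is copied from the listing rather than recomputed, since it depends on the whole query.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as shown in the idf(...) lines."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                   # tf(freq=...)
    idf = classic_idf(doc_freq, max_docs)  # idf(docFreq=..., maxDocs=...)
    query_weight = idf * query_norm        # queryWeight
    field_weight = tf * idf * field_norm   # fieldWeight
    return query_weight * field_weight


# Values copied from the explain listing for doc 1058.
QUERY_NORM = 0.050121464
MAX_DOCS = 44218
FIELD_NORM = 0.046875

digital = classic_term_score(2.0, 2326, MAX_DOCS, QUERY_NORM, FIELD_NORM)
library = classic_term_score(2.0, 8668, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/4): two of the four query clauses matched this document.
total = (digital + library) * (2 / 4)
print(total)
```

The result should agree with the top-level value in the listing up to floating-point rounding, since Lucene computes in single precision while this sketch uses double precision.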
Abstract

Large corpora are ubiquitous in today's world, and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). In this paper, we identify a gap in existing implementations of many of the popular algorithms, namely their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e., processing corpora document after document, in a memory-independent fashion. Within this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library, DML-CZ.
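The document streaming idea in the abstract can be illustrated with a short sketch. This is not the framework's actual API: the class `StreamedCorpus`, the function `document_frequencies`, and the whitespace tokenization are all hypothetical and chosen only for illustration. The point is that each document is produced and consumed one at a time, so memory use grows with the vocabulary rather than with the corpus size.

```python
from collections import Counter
from typing import Iterable, Iterator, List


class StreamedCorpus:
    """Hypothetical corpus wrapper: yields one tokenized document at a time.

    `lines` can be any iterable with one document per item, for example
    an open text file, so the full corpus never has to sit in memory.
    """

    def __init__(self, lines: Iterable[str]) -> None:
        self.lines = lines

    def __iter__(self) -> Iterator[List[str]]:
        for line in self.lines:
            # Naive whitespace tokenization (an assumption for this sketch).
            yield line.lower().split()


def document_frequencies(corpus: Iterable[List[str]]) -> Counter:
    """Count in how many documents each term occurs, in a single pass."""
    df: Counter = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # each term counts once per document
    return df


docs = ["digital library", "digital math library", "math retrieval"]
df = document_frequencies(StreamedCorpus(docs))
```

Passing an open file handle instead of the in-memory list `docs` would stream the documents from disk in the same way.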

Sojka, P.; Liska, M.: The art of mathematics retrieval (2011) 0.01

```
0.008403145 = product of:
  0.03361258 = sum of:
    0.03361258 = product of:
      0.06722516 = sum of:
        0.06722516 = weight(_text_:22 in 3450) [ClassicSimilarity], result of:
          0.06722516 = score(doc=3450,freq=4.0), product of:
            0.17551683 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050121464 = queryNorm
            0.38301262 = fieldWeight in 3450, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3450)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Content: Cf.: DocEng2011, September 19-22, 2011, Mountain View, California, USA. Copyright 2011 ACM 978-1-4503-0863-2/11/09
Date: 22.2.2017 13:00:42
