Search (3 results, page 1 of 1)

  • author_ss:"Sojka, P."
  • type_ss:"el"
  1. Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 0.01
    0.010281074 = product of:
      0.041124295 = sum of:
        0.041124295 = product of:
          0.08224859 = sum of:
            0.08224859 = weight(_text_:software in 1058) [ClassicSimilarity], result of:
              0.08224859 = score(doc=1058,freq=6.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.4555077 = fieldWeight in 1058, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1058)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Large corpora are ubiquitous in today's world and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). In this paper, we identify a gap in existing implementations of many of the popular algorithms, which is their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory independent fashion. Within this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library DML-CZ.
    Content
    For the software, see: http://radimrehurek.com/gensim/index.html. For a demo, see: http://dml.cz/handle/10338.dmlcz/100785/SimilarArticles.
  2. Líska, M.; Sojka, P.: MIaS 1.5 (2014) 0.01
    0.008394462 = product of:
      0.03357785 = sum of:
        0.03357785 = product of:
          0.0671557 = sum of:
            0.0671557 = weight(_text_:software in 1652) [ClassicSimilarity], result of:
              0.0671557 = score(doc=1652,freq=4.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.3719205 = fieldWeight in 1652, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1652)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    A math-aware search engine, based on full-text indexing, that enables users to search for mathematical formulae inside documents. The search engine is unique because it can index and search structural information such as the representation of mathematical formulae. There is no other software or IR system that is able to store three billion formulae in its index and search it with a response time below a second. MIaS processes documents containing mathematical notation in MathML format. The system is built as an extension to any full-text indexing engine and has been verified on the state-of-the-art Lucene core. It is scalable - it was verified to index almost the whole of arxiv.org (440,000 papers), containing more than 160,000,000 formulae. The software is being used in EuDML (eudml.org) and other digital libraries. For more details, see the papers in peer-reviewed conferences: [1] Sojka, Petr; Líska, Martin. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA : ACM, 2011. pp.57--60. [2] Sojka, Petr; Líska, Martin. In J.H.Davenport, W.M. Farmer, J.Urban, F. Rabe. Intelligent Computer Mathematics LNCS 6824. Springer, 2011, pp.228--243.
  3. Sojka, P.; Liska, M.: The art of mathematics retrieval (2011) 0.01
    0.0076308344 = product of:
      0.030523337 = sum of:
        0.030523337 = product of:
          0.061046675 = sum of:
            0.061046675 = weight(_text_:22 in 3450) [ClassicSimilarity], result of:
              0.061046675 = score(doc=3450,freq=4.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.38301262 = fieldWeight in 3450, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3450)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Content
    See: DocEng2011, September 19-22, 2011, Mountain View, California, USA. Copyright 2011 ACM 978-1-4503-0863-2/11/09
    Date
    22. 2.2017 13:00:42
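
The score breakdowns listed under each result can be checked by hand. The following is a minimal sketch, not the search engine's own code. It assumes that tf is the square root of the term frequency and that idf is 1 + ln(maxDocs / (docFreq + 1)), which are the forms that reproduce the listed tf and idf values. The function names `classic_idf` and `classic_score` are illustrative. All inputs (freq, docFreq, maxDocs, queryNorm, fieldNorm, coord) are copied from the listings.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces the listed 3.9671519 and 3.5018296.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    tf = math.sqrt(freq)                    # e.g. sqrt(6.0) = 2.4494898
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm         # "queryWeight" line
    field_weight = tf * idf * field_norm    # "fieldWeight" line
    score = query_weight * field_weight     # "weight(...)" line
    for coord in coords:                    # outer coord(1/2) and coord(1/4)
        score *= coord
    return score

QUERY_NORM = 0.045514934
MAX_DOCS = 44218
COORDS = (0.5, 0.25)

# One entry per result, keyed by the document number in the listing.
scores = {
    1058: classic_score(6.0, 2274, MAX_DOCS, QUERY_NORM, 0.046875, COORDS),
    1652: classic_score(4.0, 2274, MAX_DOCS, QUERY_NORM, 0.046875, COORDS),
    3450: classic_score(4.0, 3622, MAX_DOCS, QUERY_NORM, 0.0546875, COORDS),
}

for doc_id, score in scores.items():
    print(doc_id, round(score, 6))
```

The computed values agree with the listed top-level scores only up to small rounding differences, because the listing shows single-precision numbers while this sketch uses Python floats.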