Search (4 results)

  • author_ss:"Sojka, P."
  • type_ss:"el"
  1. Sojka, P.; Liska, M.: The art of mathematics retrieval (2011) 0.04
    0.036181405 = product of:
      0.07236281 = sum of:
        0.07236281 = sum of:
          0.010589487 = weight(_text_:a in 3450) [ClassicSimilarity], result of:
            0.010589487 = score(doc=3450,freq=10.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.19940455 = fieldWeight in 3450, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3450)
          0.061773323 = weight(_text_:22 in 3450) [ClassicSimilarity], result of:
            0.061773323 = score(doc=3450,freq=4.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.38301262 = fieldWeight in 3450, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3450)
      0.5 = coord(1/2)
    
    Abstract
    The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval, are presented, and design decisions are discussed. We argue for an approach based on Presentation MathML using the similarity of math subformulae. The system was implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene. Scalability was checked against more than 400,000 arXiv documents containing 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene.
    Content
    Cf.: DocEng2011, September 19-22, 2011, Mountain View, California, USA. Copyright 2011 ACM 978-1-4503-0863-2/11/09
    Date
    22.2.2017 13:00:42
    Type
    a
  2. Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 0.00
    0.0024857575 = product of:
      0.004971515 = sum of:
        0.004971515 = product of:
          0.00994303 = sum of:
            0.00994303 = weight(_text_:a in 1058) [ClassicSimilarity], result of:
              0.00994303 = score(doc=1058,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.18723148 = fieldWeight in 1058, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1058)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Large corpora are ubiquitous in today's world, and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). In this paper, we identify a gap in existing implementations of many of the popular algorithms: their scalability and ease of use. We describe a Natural Language Processing software framework based on the idea of document streaming, i.e. processing corpora one document after another in a memory-independent fashion. Within this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on a straightforward and intuitive framework design, so that modifying and extending the methods, and applying them, is effortless for interested practitioners. We demonstrate the usefulness of our approach in a real-world scenario: computing document similarities within the existing digital library DML-CZ.
    Type
    a
  3. Líska, M.; Sojka, P.: MIaS 1.5 (2014) 0.00
    0.0014351527 = product of:
      0.0028703054 = sum of:
        0.0028703054 = product of:
          0.005740611 = sum of:
            0.005740611 = weight(_text_:a in 1652) [ClassicSimilarity], result of:
              0.005740611 = score(doc=1652,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.10809815 = fieldWeight in 1652, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1652)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A math-aware search engine, based on full-text indexing, that enables users to search for mathematical formulae inside documents. The search engine is unique because it is able to index and search structural information such as the representation of mathematical formulae. There is no other software or IR system that is able to store three billion formulae in its index and search them with a response time below one second. MIaS processes documents containing mathematical notation in MathML format. The system is built as an extension to any full-text indexing engine and has been verified on the state-of-the-art Lucene core. It is scalable: it was verified to index almost the whole of arxiv.org (440,000 papers) containing more than 160,000,000 formulae. The software is used in EuDML (eudml.org) and other digital libraries. For more details, see the papers in peer-reviewed conferences:
    [1] Sojka, Petr; Líska, Martin. In: Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011, pp. 57-60.
    [2] Sojka, Petr; Líska, Martin. In: J. H. Davenport, W. M. Farmer, J. Urban, F. Rabe. Intelligent Computer Mathematics, LNCS 6824. Springer, 2011, pp. 228-243.
  4. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.00
    0.0010148063 = product of:
      0.0020296127 = sum of:
        0.0020296127 = product of:
          0.0040592253 = sum of:
            0.0040592253 = weight(_text_:a in 1057) [ClassicSimilarity], result of:
              0.0040592253 = score(doc=1057,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.07643694 = fieldWeight in 1057, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1057)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this document we describe the final release of the toolset for entity and semantic associations, which integrates two versions (language-dependent and language-independent) of Unsupervised Document Similarity, implemented by MU using the gensim tool, and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of the tools and the rationale behind the decisions made, and provide an elementary evaluation. The tools are integrated into the main project result, the EuDML website, and deliver the functionality needed for exploratory searching and browsing of the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state-of-the-art machine learning and matching methods.
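The relevance scores in the explain trees of the results listed here follow Lucene's ClassicSimilarity (TF-IDF) scheme. The Python below is a minimal sketch that recomputes the scores of results 1 and 2 from their leaf values. It assumes the idf form 1 + ln(maxDocs / (docFreq + 1)), which reproduces the printed idf values, and it takes queryNorm and fieldNorm as given constants because their inputs are not shown. It illustrates the arithmetic only and is not Lucene's own code; the function names (`classic_idf`, `classic_tf`, `term_score`) are hypothetical.

```python
import math

# Constants copied from the explain output; queryNorm and fieldNorm are taken
# as given (assumption: their underlying inputs are not visible in the listing).
MAX_DOCS = 44218
QUERY_NORM = 0.046056706


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)); this form matches the idf values printed."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq), e.g. tf(freq=4.0) = 2.0."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, field_norm: float,
               max_docs: int = MAX_DOCS, query_norm: float = QUERY_NORM) -> float:
    """weight(term in doc) = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Result 1 (doc 3450): weights for "a" and "22" are summed, then multiplied by coord(1/2).
w_a = term_score(10.0, 37942, 0.0546875)
w_22 = term_score(4.0, 3622, 0.0546875)
score_3450 = (w_a + w_22) * 0.5

# Result 2 (doc 1058): only "a" contributes, and two nested coord(1/2) factors apply.
score_1058 = term_score(12.0, 37942, 0.046875) * 0.5 * 0.5

print(f"doc 3450: {score_3450:.9f}")
print(f"doc 1058: {score_1058:.10f}")
```

The same `term_score` call with the leaf values of results 3 and 4 (doc 1652 with freq=4.0, doc 1057 with freq=2.0, both with fieldNorm 0.046875 and two coord(1/2) factors) should reproduce their scores within float32 rounding.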