Search (2 results, page 1 of 1)
-
Sojka, P.; Liska, M.: The art of mathematics retrieval (2011)
0.04
0.036181405 = product of:
  0.07236281 = sum of:
    0.07236281 = sum of:
      0.010589487 = weight(_text_:a in 3450) [ClassicSimilarity], result of:
        0.010589487 = score(doc=3450,freq=10.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.19940455 = fieldWeight in 3450, product of:
            3.1622777 = tf(freq=10.0), with freq of:
              10.0 = termFreq=10.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3450)
      0.061773323 = weight(_text_:22 in 3450) [ClassicSimilarity], result of:
        0.061773323 = score(doc=3450,freq=4.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.38301262 = fieldWeight in 3450, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3450)
  0.5 = coord(1/2)
- Abstract
- The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval, is presented, and design decisions are discussed. We argue for an approach based on Presentation MathML using a similarity of math subformulae. The system was implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene. Scalability issues were checked against more than 400,000 arXiv documents with 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene.
- Content
- Cf.: DocEng2011, September 19-22, 2011, Mountain View, California, USA. Copyright 2011 ACM 978-1-4503-0863-2/11/09
- Date
- 22. 2.2017 13:00:42
- Type
- a
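The relevance score listed for this record can be recomputed from the figures in its ClassicSimilarity explanation. The following is a minimal sketch, not the search engine's own code: the function names are hypothetical, and the formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))) are assumptions inferred from the numbers shown, which they reproduce.

```python
import math


def tf(freq):
    # Term-frequency factor: square root of the raw term count (assumed form).
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Inverse document frequency (assumed form, matches the listed idf values).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the listed breakdown.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Constants copied from the explanation for doc 3450.
MAX_DOCS = 44218
QUERY_NORM = 0.046056706
FIELD_NORM = 0.0546875
COORD = 0.5  # coord(1/2)

score_a = term_score(10.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = term_score(4.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)
total = COORD * (score_a + score_22)

print(f"weight(_text_:a)  = {score_a:.9f}")
print(f"weight(_text_:22) = {score_22:.9f}")
print(f"total score       = {total:.9f}")
```

The result agrees with the listed 0.036181405 only up to rounding, since this sketch uses double precision while the listed values appear to be single-precision floats.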
-
Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010)
0.00
0.0024857575 = product of:
  0.004971515 = sum of:
    0.004971515 = product of:
      0.00994303 = sum of:
        0.00994303 = weight(_text_:a in 1058) [ClassicSimilarity], result of:
          0.00994303 = score(doc=1058,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18723148 = fieldWeight in 1058, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1058)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Abstract
- Large corpora are ubiquitous in today's world and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). In this paper, we identify a gap in existing implementations of many of the popular algorithms, which is their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory-independent fashion. Within this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library DML-CZ.
- Type
- a
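The document-streaming idea mentioned in the abstract above can be illustrated with a generator that yields one document at a time. This is a minimal sketch under stated assumptions and not the framework's actual API: the names `stream_documents` and `document_frequencies` are hypothetical, and the corpus is assumed to hold one document per line.

```python
import io
from collections import Counter


def stream_documents(handle):
    # Yield one tokenized document per line, so only the current
    # document is held in memory rather than the whole corpus.
    for line in handle:
        tokens = line.lower().split()
        if tokens:
            yield tokens


def document_frequencies(documents):
    # Single pass over the stream: count in how many documents each term occurs.
    counts = Counter()
    for tokens in documents:
        counts.update(set(tokens))
    return counts


# A tiny in-memory stand-in for a large corpus file (hypothetical data).
corpus = io.StringIO("math retrieval system\ntopic modelling system\n")
df = document_frequencies(stream_documents(corpus))
print(df)
```

Because the consumer sees documents one by one, the corpus can be as large as the disk allows. Memory use then depends on the vocabulary size, not on the number of documents.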