Search (468 results, page 1 of 24)

  • Filter: type_ss:"el"
  1. CBT-Multimedialexikon : WINDOWS-Hypertextlexikon Multimedia (199?) 0.12
    0.11946598 = product of:
      0.17919897 = sum of:
        0.081401736 = weight(_text_:based in 5991) [ClassicSimilarity], result of:
          0.081401736 = score(doc=5991,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.5326271 = fieldWeight in 5991, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.125 = fieldNorm(doc=5991)
        0.09779723 = product of:
          0.19559446 = sum of:
            0.19559446 = weight(_text_:training in 5991) [ClassicSimilarity], result of:
              0.19559446 = score(doc=5991,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.8256285 = fieldWeight in 5991, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.125 = fieldNorm(doc=5991)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
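
The indented tree under each hit is Lucene's "explain" output for its classic TF-IDF similarity. As a sanity check, the sketch below recomputes the score of result 1 from the quantities shown there. The formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, plus the coord factors) are the standard ClassicSimilarity definitions; the constants are copied from the explain tree, and nothing here is specific to this installation.

```python
import math

def classic_term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Per-term weight as shown in Lucene ClassicSimilarity explain output."""
    tf = math.sqrt(freq)                             # tf(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                  # queryWeight
    field_weight = tf * idf * field_norm             # fieldWeight
    return query_weight * field_weight               # weight(_text_:term)

QUERY_NORM = 0.050723847  # taken from the explain tree above

# Result 1 ("CBT-Multimedialexikon"), fieldNorm = 0.125:
based    = classic_term_weight(2.0, 5906, 44218, QUERY_NORM, 0.125)
training = classic_term_weight(2.0, 1125, 44218, QUERY_NORM, 0.125)

# "training" sits one level deeper in the tree and is scaled by coord(1/2);
# the sum of both contributions is then scaled by coord(2/3).
score = (based + 0.5 * training) * (2.0 / 3.0)
print(based, training, score)
# ~0.08140174, ~0.19559446, ~0.11946598 -- matching the values shown above
```
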
  2. Weber, S.: ¬Der Angriff der Digitalgeräte auf die übrigen Lernmedien (2015) 0.10
    0.10453274 = product of:
      0.15679911 = sum of:
        0.07122652 = weight(_text_:based in 2505) [ClassicSimilarity], result of:
          0.07122652 = score(doc=2505,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.46604872 = fieldWeight in 2505, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.109375 = fieldNorm(doc=2505)
        0.08557258 = product of:
          0.17114516 = sum of:
            0.17114516 = weight(_text_:training in 2505) [ClassicSimilarity], result of:
              0.17114516 = score(doc=2505,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.722425 = fieldWeight in 2505, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2505)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
  3. Mortimer, M.; Lockhead, K.; Hyland, M.: CatSkill : a multimedia course on AACR2 and MARC (1994) 0.09
    0.08959949 = product of:
      0.13439924 = sum of:
        0.0610513 = weight(_text_:based in 7865) [ClassicSimilarity], result of:
          0.0610513 = score(doc=7865,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.39947033 = fieldWeight in 7865, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.09375 = fieldNorm(doc=7865)
        0.073347926 = product of:
          0.14669585 = sum of:
            0.14669585 = weight(_text_:training in 7865) [ClassicSimilarity], result of:
              0.14669585 = score(doc=7865,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.6192214 = fieldWeight in 7865, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.09375 = fieldNorm(doc=7865)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
  4. Mai, F.; Galke, L.; Scherp, A.: Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text (2018) 0.08
    0.079280265 = product of:
      0.11892039 = sum of:
        0.044059984 = weight(_text_:based in 4093) [ClassicSimilarity], result of:
          0.044059984 = score(doc=4093,freq=6.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.28829288 = fieldWeight in 4093, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4093)
        0.07486041 = product of:
          0.14972082 = sum of:
            0.14972082 = weight(_text_:training in 4093) [ClassicSimilarity], result of:
              0.14972082 = score(doc=4093,freq=12.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.6319902 = fieldWeight in 4093, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4093)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that already operate well on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question of how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.9%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.
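
The classifiers in the paper are deep neural networks trained on millions of titles. Purely to illustrate what title-based subject classification looks like in code, here is a small TF-IDF baseline sketch; the four titles, the two labels, and the single-label setup are invented simplifications, not the authors' models, data, or label space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for (title, subject) pairs; PubMed and EconBiz carry millions
# of titles with controlled, multi-label subject annotations.
titles = [
    "Deep learning for semantic subject indexing of scientific titles",
    "Monetary policy and inflation targeting in emerging markets",
    "Neural networks for medical image segmentation",
    "Labour market effects of minimum wage legislation",
]
labels = ["computer science", "economics", "computer science", "economics"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(titles, labels)
print(clf.predict(["Convolutional networks for text classification"]))
```
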
  5. Zeilmann, K.; Beer, K.; dpa: Tablet statt Lehrbuch : wie die Digitalisierung die Unis verändert (2016) 0.07
    0.07466623 = product of:
      0.11199935 = sum of:
        0.050876085 = weight(_text_:based in 2699) [ClassicSimilarity], result of:
          0.050876085 = score(doc=2699,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.33289194 = fieldWeight in 2699, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.078125 = fieldNorm(doc=2699)
        0.061123267 = product of:
          0.12224653 = sum of:
            0.12224653 = weight(_text_:training in 2699) [ClassicSimilarity], result of:
              0.12224653 = score(doc=2699,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.5160178 = fieldWeight in 2699, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2699)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
  6. Lackes, R.; Mack, D.: Computer Based Training on neural nets : Basics, development, and practice (1998) 0.07
    0.07391581 = product of:
      0.11087371 = sum of:
        0.050364755 = weight(_text_:based in 964) [ClassicSimilarity], result of:
          0.050364755 = score(doc=964,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.3295462 = fieldWeight in 964, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=964)
        0.06050895 = product of:
          0.1210179 = sum of:
            0.1210179 = weight(_text_:training in 964) [ClassicSimilarity], result of:
              0.1210179 = score(doc=964,freq=4.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.5108316 = fieldWeight in 964, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=964)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
  7. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I.: Attention Is all you need (2017) 0.07
    0.0711273 = product of:
      0.10669095 = sum of:
        0.04316979 = weight(_text_:based in 970) [ClassicSimilarity], result of:
          0.04316979 = score(doc=970,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.28246817 = fieldWeight in 970, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=970)
        0.06352116 = product of:
          0.12704232 = sum of:
            0.12704232 = weight(_text_:training in 970) [ClassicSimilarity], result of:
              0.12704232 = score(doc=970,freq=6.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.53626144 = fieldWeight in 970, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.046875 = fieldNorm(doc=970)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
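
The building block the abstract refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal numpy sketch of that formula (single head, random toy inputs, no masking or learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (len_q, d_v)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 64)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 64)
```
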
  8. Engelbrecht, T.; Lankau, R.: Technologie in unseren Schulen schadet mehr, als sie nützt. (2017) 0.06
    0.05973299 = product of:
      0.08959948 = sum of:
        0.040700868 = weight(_text_:based in 3722) [ClassicSimilarity], result of:
          0.040700868 = score(doc=3722,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.26631355 = fieldWeight in 3722, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=3722)
        0.048898615 = product of:
          0.09779723 = sum of:
            0.09779723 = weight(_text_:training in 3722) [ClassicSimilarity], result of:
              0.09779723 = score(doc=3722,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.41281426 = fieldWeight in 3722, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3722)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
  9. Sojka, P.; Liska, M.: ¬The art of mathematics retrieval (2011) 0.06
    0.056254204 = product of:
      0.084381305 = sum of:
        0.050364755 = weight(_text_:based in 3450) [ClassicSimilarity], result of:
          0.050364755 = score(doc=3450,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.3295462 = fieldWeight in 3450, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3450)
        0.03401655 = product of:
          0.0680331 = sum of:
            0.0680331 = weight(_text_:22 in 3450) [ClassicSimilarity], result of:
              0.0680331 = score(doc=3450,freq=4.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.38301262 = fieldWeight in 3450, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3450)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval, is presented, and design decisions are discussed. We argue for an approach based on Presentation MathML using a similarity of math subformulae. The system was implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene. Scalability issues were checked against more than 400,000 arXiv documents with 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene.
    Content
    Cf.: DocEng2011, September 19-22, 2011, Mountain View, California, USA. Copyright 2011 ACM 978-1-4503-0863-2/11/09
    Date
    22. 2.2017 13:00:42
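
MIaS indexes mathematical subformulae as ordinary Lucene terms. The sketch below only illustrates the general idea of enumerating the subtrees of a Presentation MathML expression as index strings; the sample expression is made up, and the canonicalization, unification and weighting steps the real system applies are not reproduced here.

```python
import xml.etree.ElementTree as ET

MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mi>y</mi></mrow>
</math>"""

def subformulae(element):
    """Yield every subtree of the expression as a serialized index term."""
    yield ET.tostring(element, encoding="unicode")
    for child in element:
        yield from subformulae(child)

root = ET.fromstring(MATHML)
for term in subformulae(root):
    print(" ".join(term.split()))   # collapse whitespace for readability
```
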
  10. Palm, F.: QVIZ : Query and context based visualization of time-spatial cultural dynamics (2007) 0.05
    0.048992746 = product of:
      0.073489115 = sum of:
        0.052871976 = weight(_text_:based in 1289) [ClassicSimilarity], result of:
          0.052871976 = score(doc=1289,freq=6.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.34595144 = fieldWeight in 1289, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=1289)
        0.020617142 = product of:
          0.041234285 = sum of:
            0.041234285 = weight(_text_:22 in 1289) [ClassicSimilarity], result of:
              0.041234285 = score(doc=1289,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.23214069 = fieldWeight in 1289, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1289)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    QVIZ will research and create a framework for visualizing and querying archival resources by a time-space interface based on maps and emergent knowledge structures. The framework will also integrate social software, such as wikis, in order to utilize knowledge in existing and new communities of practice. QVIZ will lead to improved information sharing and knowledge creation, easier access to information in a user-adapted context and innovative ways of exploring and visualizing materials over time, between countries and other administrative units. The common European framework for sharing and accessing archival information provided by the QVIZ project will open a considerably larger commercial market based on archival materials as well as a richer understanding of European history.
    Content
    Talk given at the workshop "Extending the multilingual capacity of The European Library in the EDL project", Stockholm, Swedish National Library, 22-23 November 2007.
  11. Karpathy, A.: ¬The unreasonable effectiveness of recurrent neural networks (2015) 0.05
    0.045772478 = product of:
      0.06865872 = sum of:
        0.025438042 = weight(_text_:based in 1865) [ClassicSimilarity], result of:
          0.025438042 = score(doc=1865,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.16644597 = fieldWeight in 1865, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1865)
        0.043220676 = product of:
          0.08644135 = sum of:
            0.08644135 = weight(_text_:training in 1865) [ClassicSimilarity], result of:
              0.08644135 = score(doc=1865,freq=4.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.3648797 = fieldWeight in 1865, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1865)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    There's something magical about Recurrent Neural Networks (RNNs). I still remember when I trained my first recurrent network for Image Captioning. Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times. What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I've in fact reached the opposite conclusion). Fast forward about a year: I'm training RNNs all the time and I've witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. This post is about sharing some of that magic with you. By the way, together with this post I am also releasing code on Github (https://github.com/karpathy/char-rnn) that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it one character at a time. You can also use it to reproduce my experiments below. But we're getting ahead of ourselves; What are RNNs anyway?
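
char-rnn itself is Lua/Torch code. Purely as an illustration of character-level language modelling (predicting the next character from the previous ones), a compact PyTorch sketch on a toy string could look like the following; the corpus, hyperparameters and single-layer setup are arbitrary choices, and the sampling loop that would actually generate text is omitted.

```python
import torch
import torch.nn as nn

text = "hello world, hello recurrent networks. " * 50   # toy corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h)            # logits over the next character

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()
seq_len = 64

for step in range(200):
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)           # input characters
    y = data[i + 1:i + seq_len + 1].unsqueeze(0)   # targets, shifted by one
    loss = loss_fn(model(x).reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))   # should fall well below the initial ~ln(vocab size)
```
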
  12. Bensman, S.J.: Eugene Garfield, Francis Narin, and PageRank : the theoretical bases of the Google search engine (2013) 0.05
    0.045460265 = product of:
      0.068190396 = sum of:
        0.040700868 = weight(_text_:based in 1149) [ClassicSimilarity], result of:
          0.040700868 = score(doc=1149,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.26631355 = fieldWeight in 1149, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=1149)
        0.027489524 = product of:
          0.05497905 = sum of:
            0.05497905 = weight(_text_:22 in 1149) [ClassicSimilarity], result of:
              0.05497905 = score(doc=1149,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.30952093 = fieldWeight in 1149, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1149)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper presents a test of the validity of using Google Scholar to evaluate the publications of researchers by comparing the premises on which its search engine, PageRank, is based, to those of Garfield's theory of citation indexing. It finds that the premises are identical and that PageRank and Garfield's theory of citation indexing validate each other.
    Date
    17.12.2013 11:02:22
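
For readers who want the algorithm rather than the history: PageRank reduces to a damped power iteration over the link matrix. The sketch below uses a made-up four-paper citation graph and the conventional damping factor of 0.85; it is the textbook procedure, not anything taken from the paper.

```python
import numpy as np

# Toy citation graph: papers 0..3, edge (i, j) means "paper i cites paper j".
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
n, d = 4, 0.85

# Column-stochastic link matrix: column i spreads paper i's vote over its references.
out_deg = np.bincount([i for i, _ in edges], minlength=n)
M = np.zeros((n, n))
for i, j in edges:
    M[j, i] = 1.0 / out_deg[i]

r = np.full(n, 1.0 / n)
for _ in range(100):
    r = (1 - d) / n + d * M @ r      # damped power iteration
print(r / r.sum())                   # most weight on the heavily cited paper 2
```
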
  13. Dunning, T.: Statistical identification of language (1994) 0.04
    0.044799745 = product of:
      0.06719962 = sum of:
        0.03052565 = weight(_text_:based in 3627) [ClassicSimilarity], result of:
          0.03052565 = score(doc=3627,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.19973516 = fieldWeight in 3627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=3627)
        0.036673963 = product of:
          0.073347926 = sum of:
            0.073347926 = weight(_text_:training in 3627) [ClassicSimilarity], result of:
              0.073347926 = score(doc=3627,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.3096107 = fieldWeight in 3627, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3627)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    A statistically based program has been written which learns to distinguish between languages. The amount of training text that such a program needs is surprisingly small, and the amount of text needed to make an identification is also quite small. The program incorporates no linguistic presuppositions other than the assumption that text can be encoded as a string of bytes. Such a program can be used to determine which language small bits of text are in. It also shows a potential for what might be called 'statistical philology' in that it may be applied directly to phonetic transcriptions to help elucidate family trees among language dialects. A variant of this program has been shown to be useful as a quality control in biochemistry. In this application, genetic sequences are assumed to be expressions in a language peculiar to the organism from which the sequence is taken. Thus language identification becomes species identification.
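
A much simplified sketch of the idea: collect character n-gram statistics per language and pick the language whose model gives the test string the highest smoothed log-likelihood. Dunning's paper works on byte sequences with a more careful likelihood formulation; the two tiny training samples below are assumptions for illustration only.

```python
import math
from collections import Counter

def trigrams(text):
    text = f"  {text.lower()}  "
    return [text[i:i + 3] for i in range(len(text) - 2)]

def train(samples):
    counts = Counter(t for s in samples for t in trigrams(s))
    return counts, sum(counts.values())

MODELS = {
    "en": train(["the quick brown fox jumps over the lazy dog",
                 "language identification from very small samples"]),
    "de": train(["der schnelle braune fuchs springt über den faulen hund",
                 "spracherkennung aus sehr kleinen textproben"]),
}

def identify(text):
    def log_likelihood(counts, total):
        # add-one smoothing so unseen trigrams do not zero out the score
        return sum(math.log((counts[t] + 1) / (total + len(counts)))
                   for t in trigrams(text))
    return max(MODELS, key=lambda lang: log_likelihood(*MODELS[lang]))

print(identify("ein kleiner text"))   # expected: "de"
```
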
  14. Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 0.04
    0.044799745 = product of:
      0.06719962 = sum of:
        0.03052565 = weight(_text_:based in 1058) [ClassicSimilarity], result of:
          0.03052565 = score(doc=1058,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.19973516 = fieldWeight in 1058, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=1058)
        0.036673963 = product of:
          0.073347926 = sum of:
            0.073347926 = weight(_text_:training in 1058) [ClassicSimilarity], result of:
              0.073347926 = score(doc=1058,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.3096107 = fieldWeight in 1058, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1058)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Large corpora are ubiquitous in today's world and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). In this paper, we identify a gap in existing implementations of many of the popular algorithms, which is their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory independent fashion. Within this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library DML-CZ.
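
The framework described here is gensim. Its streaming style, in which each document is converted to bag-of-words one at a time so the corpus never has to fit in memory, looks roughly like the sketch below; the four toy documents stand in for a corpus that would normally be read from disk, and the topic count is arbitrary.

```python
from gensim import corpora, models

DOCS = [
    "human machine interface for lab computer applications",
    "a survey of user opinion of computer system response time",
    "relation of user perceived response time to error measurement",
    "graph minors a survey",
]

def stream_tokens():
    for doc in DOCS:                 # in practice: read one document at a time from disk
        yield doc.lower().split()

dictionary = corpora.Dictionary(stream_tokens())

class BowStream:
    """Iterable corpus: yields one bag-of-words vector at a time."""
    def __iter__(self):
        for tokens in stream_tokens():
            yield dictionary.doc2bow(tokens)

tfidf = models.TfidfModel(BowStream())            # single pass over the stream
lsi = models.LsiModel(tfidf[BowStream()], id2word=dictionary, num_topics=2)
print(lsi.print_topics(2))
```
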
  15. Kleineberg, M.: Context analysis and context indexing : formal pragmatics in knowledge organization (2014) 0.04
    0.04475718 = product of:
      0.13427153 = sum of:
        0.13427153 = product of:
          0.4028146 = sum of:
            0.4028146 = weight(_text_:3a in 1826) [ClassicSimilarity], result of:
              0.4028146 = score(doc=1826,freq=2.0), product of:
                0.43003735 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.050723847 = queryNorm
                0.93669677 = fieldWeight in 1826, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1826)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/3131107
  16. Stapleton, M.; Adams, M.: Faceted categorisation for the corporate desktop : visualisation and interaction using metadata to enhance user experience (2007) 0.04
    0.04252462 = product of:
      0.06378693 = sum of:
        0.04316979 = weight(_text_:based in 718) [ClassicSimilarity], result of:
          0.04316979 = score(doc=718,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.28246817 = fieldWeight in 718, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=718)
        0.020617142 = product of:
          0.041234285 = sum of:
            0.041234285 = weight(_text_:22 in 718) [ClassicSimilarity], result of:
              0.041234285 = score(doc=718,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.23214069 = fieldWeight in 718, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=718)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Mark Stapleton and Matt Adamson began their presentation by describing how Dow Jones' Factiva range of information services processed an average of 170,000 documents every day, drawn from over 10,000 sources in 22 languages. These documents are categorized within five facets: Company, Subject, Industry, Region and Language. The digital feeds received from information providers undergo a series of processing stages, initially to prepare them for automatic categorization and then to format them ready for distribution. The categorization stage is able to handle 98% of documents automatically, the remaining 2% requiring some form of human intervention. Depending on the source, categorization can involve any combination of 'Autocoding', 'Dictionary-based Categorizing', 'Rules-based Coding' or 'Manual Coding'
  17. Zanibbi, R.; Yuan, B.: Keyword and image-based retrieval for mathematical expressions (2011) 0.04
    0.04252462 = product of:
      0.06378693 = sum of:
        0.04316979 = weight(_text_:based in 3449) [ClassicSimilarity], result of:
          0.04316979 = score(doc=3449,freq=4.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.28246817 = fieldWeight in 3449, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=3449)
        0.020617142 = product of:
          0.041234285 = sum of:
            0.041234285 = weight(_text_:22 in 3449) [ClassicSimilarity], result of:
              0.041234285 = score(doc=3449,freq=2.0), product of:
                0.17762627 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050723847 = queryNorm
                0.23214069 = fieldWeight in 3449, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3449)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Two new methods for retrieving mathematical expressions using conventional keyword search and expression images are presented. An expression-level TF-IDF (term frequency-inverse document frequency) approach is used for keyword search, where queries and indexed expressions are represented by keywords taken from LaTeX strings. TF-IDF is computed at the level of individual expressions rather than documents to increase the precision of matching. The second retrieval technique is a form of Content-Based Image Retrieval (CBIR). Expressions are segmented into connected components, and then components in the query expression and each expression in the collection are matched using contour and density features, aspect ratios, and relative positions. In an experiment using ten randomly sampled queries from a corpus of over 22,000 expressions, precision-at-k (k = 20) for the keyword-based approach was higher (keyword: µ = 84.0, s = 19.0; image-based: µ = 32.0, s = 30.7), but for a few of the queries better results were obtained using a combination of the two techniques.
    Date
    22. 2.2017 12:53:49
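
The keyword side of the paper amounts to ordinary TF-IDF in which each indexed unit is a single expression rather than a whole document. A rough sketch with a deliberately naive LaTeX tokenizer (neither the paper's tokenization nor its image-based CBIR component):

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def latex_tokens(expr):
    r"""Naive tokenizer: control sequences (\frac, \sqrt, ...) plus single symbols."""
    return re.findall(r"\\[A-Za-z]+|[A-Za-z0-9]|\S", expr)

expressions = [r"\frac{a}{b} + c", r"\sqrt{x^2 + y^2}",
               r"\sum_{i=1}^{n} x_i", r"a^2 + b^2 = c^2"]

vec = TfidfVectorizer(analyzer=latex_tokens)      # one "document" per expression
index = vec.fit_transform(expressions)

query = vec.transform([r"x^2 + y^2"])
scores = cosine_similarity(query, index).ravel()
for expr, s in sorted(zip(expressions, scores), key=lambda p: -p[1]):
    print(f"{s:.3f}  {expr}")
```
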
  18. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.04
    0.037638705 = product of:
      0.056458056 = sum of:
        0.03052565 = weight(_text_:based in 1253) [ClassicSimilarity], result of:
          0.03052565 = score(doc=1253,freq=8.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.19973516 = fieldWeight in 1253, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
        0.025932407 = product of:
          0.051864814 = sum of:
            0.051864814 = weight(_text_:training in 1253) [ClassicSimilarity], result of:
              0.051864814 = score(doc=1253,freq=4.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.21892783 = fieldWeight in 1253, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=1253)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources and within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to automatically extract the requisite collection metadata that must be distributed.
    We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a 'training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image features. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First, supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
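
The prototype's core step, relating query terms to classification categories via a catalog-record training set and LSI, can be caricatured with off-the-shelf tools. The sketch below uses truncated SVD over TF-IDF as the LSI stage; the four "catalog records" and their category labels are invented stand-ins (only the example queries echo the ones suggested above), and none of this is the Pharos code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

# Invented catalog records with rough LCC-style labels (illustration only).
records = [
    ("RC254 Neoplasms. Tumors. Oncology", "prostate cancer diagnosis treatment oncology"),
    ("HG4501 Investment, capital formation", "investment banking securities capital markets"),
    ("G70.4 Remote sensing", "remote sensing satellite imagery earth observation"),
    ("ML410 Composers", "gershwin rhapsody composer american music"),
]
labels, texts = zip(*records)

lsi = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=3))
category_vecs = lsi.fit_transform(texts)

def suggest_categories(query, k=2):
    sims = cosine_similarity(lsi.transform([query]), category_vecs).ravel()
    return [labels[i] for i in sims.argsort()[::-1][:k]]

print(suggest_categories("prostate cancer"))
```
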
  19. Mair, D.: E-Learning: das Drehbuch : Handbuch für Medienautoren und Projektleiter (2005) 0.04
    0.037333116 = product of:
      0.055999674 = sum of:
        0.025438042 = weight(_text_:based in 1429) [ClassicSimilarity], result of:
          0.025438042 = score(doc=1429,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.16644597 = fieldWeight in 1429, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1429)
        0.030561633 = product of:
          0.061123267 = sum of:
            0.061123267 = weight(_text_:training in 1429) [ClassicSimilarity], result of:
              0.061123267 = score(doc=1429,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.2580089 = fieldWeight in 1429, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1429)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Computer Based Training
  20. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.04
    0.037333116 = product of:
      0.055999674 = sum of:
        0.025438042 = weight(_text_:based in 472) [ClassicSimilarity], result of:
          0.025438042 = score(doc=472,freq=2.0), product of:
            0.15283063 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.050723847 = queryNorm
            0.16644597 = fieldWeight in 472, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=472)
        0.030561633 = product of:
          0.061123267 = sum of:
            0.061123267 = weight(_text_:training in 472) [ClassicSimilarity], result of:
              0.061123267 = score(doc=472,freq=2.0), product of:
                0.23690371 = queryWeight, product of:
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.050723847 = queryNorm
                0.2580089 = fieldWeight in 472, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.67046 = idf(docFreq=1125, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=472)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The following paper deals with an automatic text classification method which does not require training documents. For this method, the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently, the SWD was enriched with notations of the Dewey Decimal Classification (DDC). As a consequence, it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with a broad coverage for classification of German scientific articles.
    Content
    This work is partially based on the Bachelor's thesis of Maike Sommer. See also: http://sda2012.dke-research.de.
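
Stripped to its core, the method derives a document's classification from the DDC notations attached to its words via the subject heading file. Below is a minimal voting sketch with a hypothetical in-memory term-to-DDC lexicon in place of the actual SWD linked data; the published method is more elaborate than raw hit counting.

```python
from collections import Counter

# Hypothetical fragment of a subject-headings lexicon: term -> DDC notations.
SWD_DDC = {
    "bibliothek": ["020"],
    "klassifikation": ["025.4"],
    "katalogisierung": ["025.3"],
    "wirtschaft": ["330"],
    "inflation": ["332.4"],
}

def classify(text, k=2):
    """Vote for the DDC notations of every recognized term in the text."""
    votes = Counter()
    for word in text.lower().split():
        for notation in SWD_DDC.get(word, []):
            votes[notation] += 1
    return votes.most_common(k)

print(classify("Klassifikation und Katalogisierung in der Bibliothek"))
```
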

Languages

  • e 360
  • d 94
  • a 3
  • el 2
  • i 2
  • f 1
  • nl 1

Types

  • a 233
  • s 16
  • r 11
  • i 10
  • x 8
  • m 7
  • p 7
  • n 3
  • b 2