Search (44 results, page 1 of 3)

  • theme_ss:"Automatisches Indexieren"
  1. Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.03
    0.029296497 = product of:
      0.058592994 = sum of:
        0.058592994 = product of:
          0.11718599 = sum of:
            0.11718599 = weight(_text_:network in 1557) [ClassicSimilarity], result of:
              0.11718599 = score(doc=1557,freq=6.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.51133573 = fieldWeight in 1557, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1557)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
  2. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.03
    0.028190564 = product of:
      0.05638113 = sum of:
        0.05638113 = product of:
          0.11276226 = sum of:
            0.11276226 = weight(_text_:network in 3232) [ClassicSimilarity], result of:
              0.11276226 = score(doc=3232,freq=8.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.492033 = fieldWeight in 3232, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3232)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords to the authors of new papers. We create a data set of research papers, and their citation network, keywords, and other metadata, containing over 470K papers with more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions.
  3. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    0.027889157 = product of:
      0.055778313 = sum of:
        0.055778313 = product of:
          0.11155663 = sum of:
            0.11155663 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.11155663 = score(doc=402,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  4. Fuhr, N.; Niewelt, B.: Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02
    0.024403011 = product of:
      0.048806023 = sum of:
        0.048806023 = product of:
          0.097612046 = sum of:
            0.097612046 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
              0.097612046 = score(doc=262,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.5416616 = fieldWeight in 262, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=262)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20.10.2000 12:22:23
  5. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02
    0.024403011 = product of:
      0.048806023 = sum of:
        0.048806023 = product of:
          0.097612046 = sum of:
            0.097612046 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.097612046 = score(doc=6265,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  6. Alexander, M.: Retrieving digital data with fuzzy matching (1997) 0.02
    0.022552451 = product of:
      0.045104902 = sum of:
        0.045104902 = product of:
          0.090209804 = sum of:
            0.090209804 = weight(_text_:network in 151) [ClassicSimilarity], result of:
              0.090209804 = score(doc=151,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.3936264 = fieldWeight in 151, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0625 = fieldNorm(doc=151)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In 1993 the British Library established a programme of activities entitled Initiatives for Access (IFA) to identify and develop computer applications based on the new technologies emerging in the areas of digital and network service. Discusses the problem of the effective retrieval of digital data after its capture, focusing on the product Excalibur EFS, which looks at the way information is sorted at its fundamental level and identifies patterns in numbers. Looks at the benefits of Excalibur and outlines other experiments in progress as part of the IFA programme.
  7. Hirawa, M.: Role of keywords in the network searching era (1998) 0.02
    0.022552451 = product of:
      0.045104902 = sum of:
        0.045104902 = product of:
          0.090209804 = sum of:
            0.090209804 = weight(_text_:network in 3446) [ClassicSimilarity], result of:
              0.090209804 = score(doc=3446,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.3936264 = fieldWeight in 3446, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3446)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  8. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.083667465 = score(doc=58,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:44
  9. Hauer, M.: Automatische Indexierung (2000) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
              0.083667465 = score(doc=5887,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 5887, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5887)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt
  10. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.083667465 = score(doc=2051,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:56
  11. Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
              0.083667465 = score(doc=5629,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 5629, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5629)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    B.I.T.online. 22(2019) H.2, S.163-166
  12. Kumpe, D.: Methoden zur automatischen Indexierung von Dokumenten (2006) 0.02
    0.019733394 = product of:
      0.039466787 = sum of:
        0.039466787 = product of:
          0.078933574 = sum of:
            0.078933574 = weight(_text_:network in 782) [ClassicSimilarity], result of:
              0.078933574 = score(doc=782,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.3444231 = fieldWeight in 782, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=782)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This diploma thesis deals with the indexing of unstructured natural-language documents. The growing flood of information and the number of published scientific reports and books make machine-based subject indexing necessary. To better understand the requirements for this, problems of written natural-language communication are examined. The manual indexing techniques and the documentation languages are presented. Indexing is placed thematically within the fields of subject indexing and information retrieval. Furthermore, the advantages and disadvantages of selected algorithms are examined, and software products in the field of information retrieval are evaluated with regard to how they work. The results of individual methods are presented using sample documents. Using the European Migration Network project, problems and basic requirements for carrying out subject indexing are identified and possible solutions are proposed.
  13. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
              0.06972289 = score(doc=1952,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 1952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1952)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    16. 8.1998 12:51:22
  14. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.06972289 = score(doc=4157,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill
  15. Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 374) [ClassicSimilarity], result of:
              0.06972289 = score(doc=374,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 374, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=374)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 4.2002 10:22:41
  16. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.06972289 = score(doc=2759,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  17. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.02
    0.016914338 = product of:
      0.033828676 = sum of:
        0.033828676 = product of:
          0.06765735 = sum of:
            0.06765735 = weight(_text_:network in 5480) [ClassicSimilarity], result of:
              0.06765735 = score(doc=5480,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.29521978 = fieldWeight in 5480, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5480)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods
  18. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.02
    0.016914338 = product of:
      0.033828676 = sum of:
        0.033828676 = product of:
          0.06765735 = sum of:
            0.06765735 = weight(_text_:network in 1868) [ClassicSimilarity], result of:
              0.06765735 = score(doc=1868,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.29521978 = fieldWeight in 1868, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1868)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.
  19. Kiros, R.; Salakhutdinov, R.; Zemel, R.S.: Unifying visual-semantic embeddings with multimodal neural language models (2014) 0.02
    0.016914338 = product of:
      0.033828676 = sum of:
        0.033828676 = product of:
          0.06765735 = sum of:
            0.06765735 = weight(_text_:network in 1871) [ClassicSimilarity], result of:
              0.06765735 = score(doc=1871,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.29521978 = fieldWeight in 1871, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1871)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence to its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences while the decoder can generate novel descriptions from scratch. Using LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic e.g. *image of a blue car* - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.
  20. Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description (2014) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 1873) [ClassicSimilarity], result of:
              0.05638113 = score(doc=1873,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 1873, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1873)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
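
The score breakdowns listed with each result can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the TF-IDF formulas that the printed numbers are consistent with, namely tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes queryNorm, fieldNorm and the two coord factors directly from the listing. The function name `classic_score` and its parameter names are hypothetical.

```python
import math


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=(0.5, 0.5)):
    """Recompute one explain tree from the listing (illustrative helper, assumed formulas)."""
    tf = math.sqrt(freq)                              # e.g. 2.4494898 for freq=6.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # e.g. 4.4533744 for docFreq=1398
    query_weight = idf * query_norm                   # "queryWeight" line
    field_weight = tf * idf * field_norm              # "fieldWeight" line
    score = query_weight * field_weight               # "weight(_text_:...)" line
    for c in coords:                                  # the two coord(1/2) factors
        score *= c
    return score


# Result 1 (doc 1557, term "network") and result 3 (doc 402, term "22"),
# using only values printed in the listing.
first = classic_score(6.0, 1398, 44218, 0.05146125, 0.046875)
third = classic_score(2.0, 3622, 44218, 0.05146125, 0.125)
print(first, third)
```

Up to floating-point rounding, the two values should agree with the top-level scores shown for results 1 and 3 (0.029296497 and 0.027889157). The same call with the other entries' freq and fieldNorm values reproduces the rest of the ranking, which makes clear that documents matching the same term differ only in term frequency and field length normalization.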

Languages

  • e 26
  • d 16
  • ja 1
  • ru 1

Types

  • a 38
  • el 7
  • x 3
  • m 1