Search (1 result, page 1 of 1)

  • author_ss:"Wang, J."
  • theme_ss:"Automatisches Indexieren"
  • type_ss:"el"
  1. Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.00
    
    Abstract
    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
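    The abstract describes the m-RNN architecture concretely enough for a small sketch: a word-embedding plus recurrent sub-network for sentences, a CNN-derived image feature, and a multimodal layer that fuses them to model the probability of the next word given the previous words and the image. The following is a minimal, hypothetical PyTorch sketch of that structure; the class name MultimodalRNN, the layer sizes, the additive fusion, and the assumption of precomputed 4096-dimensional CNN features are illustrative choices, not the authors' published implementation.

    import torch
    import torch.nn as nn

    class MultimodalRNN(nn.Module):
        """Minimal m-RNN-style caption model (illustrative sizes, not the paper's exact setup)."""
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=256,
                     image_feat_dim=4096, multimodal_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)             # word-embedding sub-network
            self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)   # recurrent sub-network for sentences
            # image features are assumed to come from a pretrained CNN (e.g. 4096-d fc-layer activations)
            self.img_proj = nn.Linear(image_feat_dim, multimodal_dim)
            self.word_proj = nn.Linear(embed_dim, multimodal_dim)
            self.rnn_proj = nn.Linear(hidden_dim, multimodal_dim)
            self.out = nn.Linear(multimodal_dim, vocab_size)             # logits over the next word

        def forward(self, word_ids, image_feats):
            # word_ids: (batch, seq_len) previous words; image_feats: (batch, image_feat_dim)
            w = self.embed(word_ids)                                     # (batch, seq_len, embed_dim)
            r, _ = self.rnn(w)                                           # (batch, seq_len, hidden_dim)
            m = (self.word_proj(w) + self.rnn_proj(r)
                 + self.img_proj(image_feats).unsqueeze(1))              # multimodal layer fuses all three
            return self.out(torch.tanh(m))                               # P(next word | previous words, image) as logits

    # Usage sketch: logits over a 10,000-word vocabulary for a batch of 2 captions of length 5.
    model = MultimodalRNN(vocab_size=10000)
    logits = model(torch.randint(0, 10000, (2, 5)), torch.randn(2, 4096))  # shape (2, 5, 10000)

    Sentences would then be generated by sampling from the softmax over these logits word by word, feeding each sampled word back in, which mirrors the sampling procedure the abstract mentions.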