Search (1 result, page 1 of 1)

  • × author_ss:"Fan, W."
  • × author_ss:"Li, W."
  • × year_i:[2020 TO 2030}
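These three facet filters are plain Solr filter queries; the year clause is a half-open range that includes 2020 and excludes 2030. As a minimal sketch, the same search could be reproduced against the underlying Solr core with Python's requests library (the endpoint URL and core name here are hypothetical, since the page does not reveal them):

    import requests

    # Hypothetical endpoint; the real host and core name are not shown on this page.
    SOLR_SELECT = "http://localhost:8983/solr/literature/select"

    params = {
        "q": "*:*",
        # Each active facet filter becomes one fq (filter query) parameter.
        "fq": [
            'author_ss:"Fan, W."',
            'author_ss:"Li, W."',
            "year_i:[2020 TO 2030}",  # [ = inclusive bound, } = exclusive bound
        ],
        "wt": "json",
        "debugQuery": "true",  # asks Solr for the scoring explanation shown below
    }

    response = requests.get(SOLR_SELECT, params=params)
    print(response.json()["response"]["numFound"])  # 1 for this search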
  1. Li, W.; Zheng, Y.; Zhan, Y.; Feng, R.; Zhang, T.; Fan, W.: Cross-modal retrieval with dual multi-angle self-attention (2021) 0.03
    0.025393788 = product of:
      0.06771677 = sum of:
        0.04338471 = weight(_text_:retrieval in 67) [ClassicSimilarity], result of:
          0.04338471 = score(doc=67,freq=6.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.34732026 = fieldWeight in 67, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=67)
        0.014968331 = weight(_text_:of in 67) [ClassicSimilarity], result of:
          0.014968331 = score(doc=67,freq=10.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.23179851 = fieldWeight in 67, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=67)
        0.009363732 = product of:
          0.018727465 = sum of:
            0.018727465 = weight(_text_:on in 67) [ClassicSimilarity], result of:
              0.018727465 = score(doc=67,freq=4.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.20619515 = fieldWeight in 67, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=67)
          0.5 = coord(1/2)
      0.375 = coord(3/8)
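    The tree above is Lucene's ClassicSimilarity (tf-idf) explanation: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = sqrt(tf) × idf × fieldNorm, and the sum over terms is scaled by the coordination factor (here 3 of the 8 query terms match document 67, and the "on" clause carries an inner coord of 1/2). A minimal sketch that reproduces the numbers:

      from math import sqrt

      def term_score(tf, idf, query_norm, field_norm):
          # ClassicSimilarity per-term score: queryWeight * fieldWeight
          query_weight = idf * query_norm
          field_weight = sqrt(tf) * idf * field_norm
          return query_weight * field_weight

      QUERY_NORM = 0.041294612
      FIELD_NORM = 0.046875

      retrieval = term_score(6, 3.024915, QUERY_NORM, FIELD_NORM)        # 0.04338471
      of_term   = term_score(10, 1.5637573, QUERY_NORM, FIELD_NORM)      # 0.014968331
      on_term   = term_score(4, 2.199415, QUERY_NORM, FIELD_NORM) * 0.5  # coord(1/2)

      total = (retrieval + of_term + on_term) * 3 / 8                    # coord(3/8)
      print(total)  # ~0.0253938, the 0.025393788 reported above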
    
    Abstract
    In recent years, cross-modal retrieval has been a popular research topic in both computer vision and natural language processing. A large semantic gap exists between modalities because of their heterogeneous properties, so establishing correlations among data from different modalities poses enormous challenges. In this work, we propose a novel end-to-end framework named Dual Multi-Angle Self-Attention (DMASA) for cross-modal retrieval. Multiple self-attention mechanisms are applied to extract fine-grained features for both images and texts from different angles. We then integrate coarse-grained and fine-grained features into a multimodal embedding space, in which the similarity between images and texts can be compared directly. Moreover, we propose a special multistage training strategy in which each stage provides a good initialization for the next, so the framework trains better. Experiments on three benchmark datasets, Flickr8k, Flickr30k, and MSCOCO, yield very promising results compared with state-of-the-art methods.
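    The paper's DMASA architecture is not reproduced here, but the general recipe the abstract describes (per-modality self-attention over local features, fusion of coarse- and fine-grained representations, and direct similarity comparison in a shared embedding space) can be illustrated with a rough PyTorch sketch; all module names, dimensions, and layer choices below are hypothetical, not the authors':

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class JointEmbedder(nn.Module):
          """Toy cross-modal encoder: self-attention over image regions /
          text tokens, then projection into a shared embedding space."""
          def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512, heads=8):
              super().__init__()
              self.img_proj = nn.Linear(img_dim, embed_dim)
              self.txt_proj = nn.Linear(txt_dim, embed_dim)
              # DMASA applies several self-attention mechanisms "from different
              # angles"; a single block per modality stands in for them here.
              self.img_attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
              self.txt_attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)

          def encode(self, feats, proj, attn):
              x = proj(feats)                       # (batch, seq, embed_dim)
              fine, _ = attn(x, x, x)               # fine-grained, attention-refined
              coarse = x.mean(dim=1, keepdim=True)  # coarse-grained global feature
              # Integrate coarse- and fine-grained features, pool, L2-normalize.
              return F.normalize((fine + coarse).mean(dim=1), dim=-1)

          def forward(self, img_regions, txt_tokens):
              img = self.encode(img_regions, self.img_proj, self.img_attn)
              txt = self.encode(txt_tokens, self.txt_proj, self.txt_attn)
              return img @ txt.t()  # cosine-similarity matrix (images x texts)

      model = JointEmbedder()
      sims = model(torch.randn(4, 36, 2048), torch.randn(4, 20, 300))
      print(sims.shape)  # torch.Size([4, 4]); diagonal = matched image-text pairs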
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.1, pp.46-65