Li, W.; Zheng, Y.; Zhan, Y.; Feng, R.; Zhang, T.; Fan, W.: Cross-modal retrieval with dual multi-angle self-attention (2021)
- Abstract
- In recent years, cross-modal retrieval has been a popular research topic in both computer vision and natural language processing. Owing to their heterogeneous properties, there is a large semantic gap between different modalities, and establishing correlations among data from different modalities poses substantial challenges. In this work, we propose a novel end-to-end framework named Dual Multi-Angle Self-Attention (DMASA) for cross-modal retrieval. Multiple self-attention mechanisms are applied to extract fine-grained features for both images and texts from different angles. We then integrate coarse-grained and fine-grained features into a multimodal embedding space, in which the similarities between images and texts can be compared directly. Moreover, we propose a special multistage training strategy, in which each preceding stage provides a good initialization for the succeeding stage and thereby helps the framework perform better. Experiments on three benchmark datasets, Flickr8k, Flickr30k, and MSCOCO, show very promising results compared with state-of-the-art methods.
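- The abstract describes a dual-encoder design: self-attention refines local image and text features, coarse and fine features are fused into a shared embedding space, and retrieval reduces to comparing similarities there. The following is a minimal illustrative sketch of that general pattern, not the authors' DMASA implementation; the module names, feature dimensions, and fusion-by-summation choice are assumptions made for the example.

```python
# Sketch (assumed, not the authors' code): a dual-encoder joint embedding where
# self-attention refines per-region image features and per-token text features,
# coarse and fine features are fused, and pooled embeddings are compared by
# cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveEncoder(nn.Module):
    """Refines a sequence of local features with self-attention, then pools."""
    def __init__(self, in_dim, embed_dim=512, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)          # coarse projection
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):                                  # x: (B, N, in_dim)
        h = self.proj(x)                                    # coarse-grained features
        fine, _ = self.attn(h, h, h)                        # fine-grained, attended features
        pooled = (h + fine).mean(dim=1)                     # fuse coarse + fine, then pool
        return F.normalize(pooled, dim=-1)                  # unit-norm joint embedding

# Hypothetical inputs: 2048-d image region features, 300-d word embeddings.
image_enc, text_enc = AttentiveEncoder(2048), AttentiveEncoder(300)
img = image_enc(torch.randn(4, 36, 2048))                   # 4 images, 36 regions each
txt = text_enc(torch.randn(4, 20, 300))                     # 4 captions, 20 tokens each
similarity = img @ txt.t()                                  # (4, 4) image-text cosine similarities
```

  In such a setup the similarity matrix is typically trained with a ranking or contrastive loss so that matching image-text pairs score higher than mismatched ones; the paper's specific multi-angle attention and multistage training are not reproduced here.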