Conversation

Fahim Farook

"BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning. (arXiv:2206.08657v3 [cs.CV] UPDATED)" — A proposal for multiple bridge layers that build a connection between the top layers of uni-modal encoders and each layer of the cross-modal encoder.

Paper: http://arxiv.org/abs/2206.08657
Code: https://github.com/microsoft/BridgeTower

#AI #CV #NewPaper #DeepLearning #MachineLearning

<<Find this useful? Please boost so that others can benefit too 🙂>>
(a) – (d) are four categories o…
0
1
0