Conversation

Fahim Farook

"Grounding Language Models to Images for Multimodal Generation. (arXiv:2301.13823v1 [cs.CL])" — An efficient method to ground pretrained text-only language models to the visual domain, enabling them to process and generate arbitrarily interleaved image-and-text data. This approach apparently works with any off-the-shelf language model.

Paper: http://arxiv.org/abs/2301.13823

#AI #CV #NewPaper #DeepLearning #MachineLearning

<<Find this useful? Please boost so that others can benefit too 🙂>>
Our method grounds a language m…
1
1
0