Conversation

Fahim Farook

"Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis. (arXiv:2302.08706v1 [cs.CV])" — Another text-to-image generation approach where, instead of generating the final image from a noisy image, you generate an initial low-resolution image based on the input text and then use a GAN (Generative Adversarial Network) during the second stage to generate the final output.

Paper: http://arxiv.org/abs/2302.08706
Code: https://github.com/haoranhfut/FF-GAN

#AI #CV #NewPaper #DeepLearning #MachineLearning

<<Find this useful? Please boost so that others can benefit too 🙂>>
Samples generated by Attn-GAN, …
0
1
0