"Multimodal Event Transformer for Image-guided Story Ending Generation. (arXiv:2301.11357v1 [cs.CV])" — Proposes a multimodal event transformer, an event-based reasoning framework for image-guided story ending generation. It constructs visual and semantic event graphs from the story plots and the ending image, and uses event-based reasoning to mine implicit information within a single modality.