"Learning Sequential Latent Variable Models from Multimodal Time Series Data. (arXiv:2204.10419v2 [cs.LG] UPDATED)" — A self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics to improve prediction and representation quality.