"Continual Transformers: Redundancy-Free Attention for Online Inference. (arXiv:2201.06268v3 [cs.AI] UPDATED)" — A novel formulations of the Scaled Dot-Product Attention, which enable Transformers to perform efficient online token-by-token inference on a continual input stream.