"Efficient Attention via Control Variates" (arXiv:2302.04542v1 [cs.LG]): the paper uses control variates to show that Random-Feature-based Attention (RFA) can be decomposed into a sum of control variate estimators, one for each element in the sequence.
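The per-element structure can be seen in a plain RFA estimator. The sketch below uses positive random features, with phi(q) . phi(k) an unbiased estimate of exp(q . k) when omega ~ N(0, I). It then writes the attention output as a sum of one term per sequence element, where each term approximates softmax(q, k_n) * v_n. The feature map choice and every function name (`softmax_attention`, `positive_features`, `rfa_attention`) are assumptions made for illustration. This is not the paper's implementation, and it does not show the control variate coefficients the paper derives.

```python
import math
import random


def softmax_attention(q, keys, values):
    """Exact softmax attention output for a single query q."""
    scores = [math.exp(sum(qi * ki for qi, ki in zip(q, k))) for k in keys]
    total = sum(scores)
    dim_v = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) / total for j in range(dim_v)]


def positive_features(x, omegas):
    """Positive random features (assumed map): E[phi(q) . phi(k)] = exp(q . k) for omega ~ N(0, I)."""
    sq_norm = sum(xi * xi for xi in x)
    scale = 1.0 / math.sqrt(len(omegas))
    return [scale * math.exp(sum(w * xi for w, xi in zip(omega, x)) - sq_norm / 2)
            for omega in omegas]


def rfa_attention(q, keys, values, omegas):
    """RFA-style estimate, written explicitly as a sum of one term per sequence element n."""
    phi_q = positive_features(q, omegas)
    # Estimated unnormalized weight for each key: phi(q) . phi(k_n) approximates exp(q . k_n).
    weights = [sum(a * b for a, b in zip(phi_q, positive_features(k, omegas))) for k in keys]
    total = sum(weights)
    dim_v = len(values[0])
    # Per-element terms: term_n approximates softmax(q, k_n) * v_n.
    terms = [[w / total * v[j] for j in range(dim_v)] for w, v in zip(weights, values)]
    # The RFA output is the sum of the per-element terms.
    return [sum(t[j] for t in terms) for j in range(dim_v)]


# Usage: compare the random-feature estimate against exact softmax attention.
random.seed(0)
dim, seq_len, dim_v, num_features = 4, 6, 2, 4096
q = [random.gauss(0, 0.3) for _ in range(dim)]
keys = [[random.gauss(0, 0.3) for _ in range(dim)] for _ in range(seq_len)]
values = [[random.gauss(0, 1) for _ in range(dim_v)] for _ in range(seq_len)]
omegas = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_features)]

exact = softmax_attention(q, keys, values)
approx = rfa_attention(q, keys, values, omegas)
```

With enough random features, `approx` should land close to `exact`. Because the features are positive, each estimated weight is positive, so the RFA output stays a convex combination of the value vectors.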