The paper is:
- Dinis Junior, Guilherme, Sindri Magnússon, and Jaakko Hollmén. “Policy Evaluation with Delayed, Aggregated Anonymous Feedback.” In Discovery Science, edited by Pascal Poncelet and Dino Ienco, 114–23. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland, 2022. https://doi.org/10.1007/978-3-031-18840-4_9.
Code snapshot: https://github.com/guidj/rl-daaf/tree/71d971147ae063c56980227ed0ee7a0b3687e257