This package contains the official PyTorch implementation of our inverse- and square-root-free Shampoo optimizer from our ICML paper 'Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective' (the 'IF-Shampoo' optimizer in Fig. 3).
Some highlights of the optimizer:
- Numerically stable, even in `bfloat16`, due to a fully matrix-multiplication-based update (no matrix decompositions); see the sketch below
- Compatible with any architecture, as the pre-conditioner only uses mini-batch gradients
- Kronecker factors can be given structures that reduce memory and computation, thanks to our previous SINGD work
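To give an idea of how the optimizer is used, here is a minimal training sketch in `bfloat16`. It assumes the package exposes a `SIRFShampoo` class that is constructed from the model itself (so it can set up per-layer Kronecker-factored pre-conditioners), follows the usual `torch.optim` step/zero_grad interface, and uses default hyper-parameters; consult the package documentation and examples for the exact interface. Installation instructions follow below.

```python
import torch
from torch import nn

# Assumption: `SIRFShampoo` is the package's top-level optimizer class and is
# constructed from the model (not from `model.parameters()`), since it needs
# the layer structure to set up Kronecker-factored pre-conditioners.
from sirfshampoo import SIRFShampoo

# Toy classification task in bfloat16, to illustrate the matrix-multiplication-
# only update (no matrix decompositions).
dtype = torch.bfloat16
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3)).to(dtype)
X = torch.rand(64, 10, dtype=dtype)
y = torch.randint(0, 3, (64,))

optimizer = SIRFShampoo(model)  # assumed constructor; default hyper-parameters
loss_func = nn.CrossEntropyLoss()  # 'mean' reduction: an average over examples

for step in range(5):
    optimizer.zero_grad()
    loss = loss_func(model(X), y)
    loss.backward()
    optimizer.step()
```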
Installation:

- Stable (recommended): `pip install sirfshampoo`
- Latest version from GitHub `main` branch: `pip install git+https://github.com/f-dangel/sirfshampoo.git@main`
Notes:

- `SIRFShampoo` assumes that the objective is an average over per-example losses (see the snippet below).
- The code has stabilized only recently. Expect things to break, and help us improve by filing issues.
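To illustrate the first note, the training loss should average (not sum) the per-example losses over the mini-batch. This is PyTorch's default `reduction="mean"` for the built-in loss functions, as the short snippet below shows.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 3)            # mini-batch of 8 examples, 3 classes
targets = torch.randint(0, 3, (8,))

# OK: averages the per-example losses over the mini-batch (PyTorch's default).
loss = F.cross_entropy(logits, targets, reduction="mean")

# Violates SIRFShampoo's assumption: sums instead of averaging over the batch.
loss_sum = F.cross_entropy(logits, targets, reduction="sum")
```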
If you find this code useful for your research, consider citing the paper:
@inproceedings{lin2024can,
  title     = {Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective},
  author    = {Wu Lin and Felix Dangel and Runa Eschenhagen and Juhan Bae and Richard E. Turner and Alireza Makhzani},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = 2024,
}