From 0083d9d11d53e20f877ea9680ebf277b66a1647c Mon Sep 17 00:00:00 2001
From: Gilad Turok <36947659+gil2rok@users.noreply.github.com>
Date: Tue, 30 Jul 2024 03:19:24 -0400
Subject: [PATCH] Docs: fix GaLore optimizer code example (#32249)

Docs: fix GaLore optimizer example

Fix incorrect usage of the GaLore optimizer in the Transformers Trainer code examples.

The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by its authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago, GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588.

The documentation of the Trainer module includes a few code examples showing how to use GaLore. However, the `optim_target_modules` argument passed to `TrainingArguments` is incorrect, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes this issue.
---
 docs/source/en/trainer.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/en/trainer.md b/docs/source/en/trainer.md
index b71f42aa147b..916ae6428e87 100644
--- a/docs/source/en/trainer.md
+++ b/docs/source/en/trainer.md
@@ -278,7 +278,7 @@ args = TrainingArguments(
     max_steps=100,
     per_device_train_batch_size=2,
     optim="galore_adamw",
-    optim_target_modules=["attn", "mlp"]
+    optim_target_modules=[r".*.attn.*", r".*.mlp.*"]
 )
 
 model_id = "google/gemma-2b"
@@ -315,7 +315,7 @@ args = TrainingArguments(
     max_steps=100,
     per_device_train_batch_size=2,
     optim="galore_adamw",
-    optim_target_modules=["attn", "mlp"],
+    optim_target_modules=[r".*.attn.*", r".*.mlp.*"],
     optim_args="rank=64, update_proj_gap=100, scale=0.10",
 )
 
@@ -359,7 +359,7 @@ args = TrainingArguments(
     max_steps=100,
     per_device_train_batch_size=2,
     optim="galore_adamw_layerwise",
-    optim_target_modules=["attn", "mlp"]
+    optim_target_modules=[r".*.attn.*", r".*.mlp.*"]
 )
 
 model_id = "google/gemma-2b"
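
For reference, a minimal sketch of how the corrected `optim_target_modules` patterns fit into an end-to-end `Trainer` run is shown below. This is not part of the patch: it assumes `transformers`, `datasets`, and `galore_torch` are installed and that the `google/gemma-2b` checkpoint is accessible, and the output directory, dataset, and tokenization choices are illustrative assumptions rather than the exact example in `docs/source/en/trainer.md`.

```python
# Minimal sketch of the corrected GaLore setup (not part of the patch).
# Assumes `transformers`, `datasets`, and `galore_torch` are installed and that
# the google/gemma-2b checkpoint is accessible; the output directory, dataset,
# and tokenization choices below are illustrative only.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

args = TrainingArguments(
    output_dir="./galore-test",  # hypothetical output directory
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",
    # Regex patterns, as in the patch, so that the attention and MLP
    # projection layers receive the low-rank GaLore updates.
    optim_target_modules=[r".*.attn.*", r".*.mlp.*"],
)

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Small illustrative dataset; any tokenized causal-LM dataset works here.
raw = load_dataset("imdb", split="train[:100]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```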