
Remove deprecated mixed-precision flags #1471

Merged 1 commit on Nov 12, 2024
4 changes: 2 additions & 2 deletions docs/source/package_reference/gaudi_config.mdx
@@ -20,8 +20,8 @@ Here is a description of each configuration parameter:
- `use_fused_adam` specifies whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
- `use_fused_clip_norm` specifies whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
- `use_torch_autocast` enables PyTorch autocast; it is used to define good pre-defined configurations, but users should favor the `--bf16` training argument
-- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using environment flag LOWER_LIST is a preffered way for operator autocast list override
-- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using environment flag FP32_LIST is a preffered way for operator autocast list override
+- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using the environment flag PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST is the preferred way to override the operator autocast list
+- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using the environment flag PT_HPU_AUTOCAST_FP32_OPS_LIST is the preferred way to override the operator autocast list


You can find examples of Gaudi configurations in the [Habana model repository on the Hugging Face Hub](https://huggingface.co/habana). For instance, [for BERT Large we have](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json):
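For context on the parameters this file documents, here is a minimal sketch of loading the BERT Large Gaudi configuration cited above and reading its mixed-precision fields. `GaudiConfig.from_pretrained` follows the usual `from_pretrained` pattern; the exact values depend on the hosted `gaudi_config.json`:

```python
# Minimal sketch, assuming optimum-habana is installed and the Hub repo is
# reachable: load the Gaudi configuration mentioned in the docs above and
# inspect the fields this PR's documentation describes.
from optimum.habana import GaudiConfig

gaudi_config = GaudiConfig.from_pretrained("Habana/bert-large-uncased-whole-word-masking")
print(gaudi_config.use_fused_adam)       # fused ADAM optimizer on/off
print(gaudi_config.use_fused_clip_norm)  # fused gradient norm clipping on/off
print(gaudi_config.use_torch_autocast)   # prefer the --bf16 training argument instead
```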
10 changes: 5 additions & 5 deletions examples/language-modeling/README.md
@@ -404,7 +404,7 @@ python3 run_lora_clm.py \
```
- Single-card finetuning of Falcon-40B:
```bash
-LOWER_LIST=ops_bf16.txt python3 run_lora_clm.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 run_lora_clm.py \
--model_name_or_path tiiuae/falcon-40b \
--dataset_name timdettmers/openassistant-guanaco \
--bf16 True \
@@ -474,7 +474,7 @@ python ../gaudi_spawn.py \

- Multi-card finetuning of Llama2-7B with FP8:
```bash
-LOWER_LIST=ops_bf16.txt python ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_lora_clm.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--dataset_name tatsu-lab/alpaca \
@@ -569,7 +569,7 @@ python ../gaudi_spawn.py \

- Multi-card finetuning of Falcon-40B:
```bash
-LOWER_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
--world_size 8 --use_mpi run_lora_clm.py \
--model_name_or_path tiiuae/falcon-40b \
--dataset_name timdettmers/openassistant-guanaco \
@@ -647,7 +647,7 @@ python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py \
- Multi-card finetuning of Llama2-70B with FSDP and LoRA:

```bash
-LOWER_LIST=ops_bf16.txt PT_HPU_LAZY_MODE=0 \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt PT_HPU_LAZY_MODE=0 \
python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_lora_clm.py \
--model_name_or_path meta-llama/Llama-2-70b-hf \
--dataset_name tatsu-lab/alpaca \
@@ -690,7 +690,7 @@ python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_lora_clm.py \
- The Falcon-180B example command saves only the LoRA parameters at the end of training
- For inference, the pretrained model and the LoRA weights need to be merged
```bash
-DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 LOWER_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_lora_clm.py \
--model_name_or_path tiiuae/falcon-180B \
--dataset_name timdettmers/openassistant-guanaco \
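Every updated command in this README relies on the same mechanism: `PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST` points at a text file of op names before the training script starts. A hedged Python equivalent of the inline-prefix form used above (the argument list is abbreviated, not a complete command):

```python
# Sketch: set the renamed autocast override and launch an example, mirroring
# `PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 run_lora_clm.py ...`.
# The flags below are truncated for illustration; see the full commands above.
import os
import subprocess

env = os.environ.copy()
env["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = "ops_bf16.txt"  # one op name per line

subprocess.run(
    [
        "python3", "run_lora_clm.py",
        "--model_name_or_path", "tiiuae/falcon-40b",
        "--dataset_name", "timdettmers/openassistant-guanaco",
        "--bf16", "True",
    ],
    env=env,
    check=True,
)
```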
4 changes: 2 additions & 2 deletions optimum/habana/transformers/gaudi_configuration.py
@@ -93,5 +93,5 @@ def declare_autocast_bf16_fp32_ops(self):
autocast_bf16_filename,
autocast_fp32_filename,
)
os.environ["LOWER_LIST"] = autocast_bf16_filename
os.environ["FP32_LIST"] = autocast_fp32_filename
os.environ["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = autocast_bf16_filename
os.environ["PT_HPU_AUTOCAST_FP32_OPS_LIST"] = autocast_fp32_filename
2 changes: 1 addition & 1 deletion optimum/habana/transformers/trainer.py
@@ -252,7 +252,7 @@ def __init__(
"The argument `--bf16` was not given but `use_torch_autocast` is True in the Gaudi configuration so mixed-precision training with Torch Autocast is enabled."
)

-if self.use_hpu_amp and "LOWER_LIST" not in os.environ:
+if self.use_hpu_amp and "PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST" not in os.environ:
self.gaudi_config.declare_autocast_bf16_fp32_ops()

if self.args.use_lazy_mode:
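The guard shown above keeps a user-supplied override authoritative: when the new variable is already set, the trainer skips declaring the Gaudi config's op lists. A hedged illustration (the path is hypothetical):

```python
import os

# Hypothetical user-curated op list. Because the variable is set before the
# trainer initializes, the `not in os.environ` guard above skips
# gaudi_config.declare_autocast_bf16_fp32_ops() and this file takes precedence.
os.environ["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = "/path/to/my_bf16_ops.txt"
```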
8 changes: 5 additions & 3 deletions tests/test_examples.py
@@ -439,7 +439,7 @@ def test(self):

env_variables = os.environ.copy()
if "falcon" in model_name:
env_variables["LOWER_LIST"] = str(example_script.parent / "ops_bf16.txt")
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(example_script.parent / "ops_bf16.txt")
elif "flan" in model_name:
env_variables["PT_HPU_MAX_COMPOUND_OP_SIZE"] = "512"
elif "bloom" in model_name:
@@ -450,13 +450,15 @@ def test(self):
env_variables["DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED"] = "1"
elif fsdp:
if "llama" in model_name:
env_variables["LOWER_LIST"] = str(example_script.parent / "ops_bf16.txt")
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(
example_script.parent / "ops_bf16.txt"
)
env_variables["PT_HPU_LAZY_MODE"] = "0"
elif deepspeed and "gpt-neox-20b" in model_name:
env_variables["LD_PRELOAD"] = ""

if fp8 and "llama" in model_name:
env_variables["LOWER_LIST"] = str(example_script.parent / "ops_bf16.txt")
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(example_script.parent / "ops_bf16.txt")

extra_command_line_arguments = baseline.get("distribution").get(distribution).get("extra_arguments", [])

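Since the point of the PR is to drop the deprecated names entirely, here is a simple sanity check one could add after the environment setup above (the assertions are illustrative, not part of the test suite shown):

```python
import os

# Illustrative check: the per-test environment should only carry the renamed
# flag; the deprecated LOWER_LIST / FP32_LIST names should no longer appear.
env_variables = os.environ.copy()
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = "ops_bf16.txt"

assert "LOWER_LIST" not in env_variables
assert "FP32_LIST" not in env_variables
```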