Fix Sentence Transformer STS restart issue #1814

Merged 2 commits on Mar 5, 2025
8 changes: 4 additions & 4 deletions examples/sentence-transformers-training/nli/README.md
@@ -67,16 +67,16 @@ Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 1
python training_nli.py intfloat/e5-mistral-7b-instruct --peft --lora_target_module "q_proj" "k_proj" "v_proj" --learning_rate 1e-5
```

-## Multi-card Training with Deepspeed Zero2/3
+## Multi-card Training with Deepspeed Zero3

-Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we can use the Zero2/Zero3 stages of DeepSpeed (model parallelism) to reduce the memory requirements.
+Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we will use the Zero3 stage of DeepSpeed (model parallelism) to reduce the memory requirements.

-Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero2.
+Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero3.

```bash
python ../../gaudi_spawn.py --world_size 4 --use_deepspeed training_nli.py intfloat/e5-mistral-7b-instruct --deepspeed ds_config.json --bf16 --no-use_hpu_graphs_for_training --learning_rate 1e-7
```
-In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file. To further reduce memory usage, change the stage to 3 (DeepSpeed Zero3) in the `ds_config.json` file.
+In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file.

# Dataset

2 changes: 1 addition & 1 deletion examples/sentence-transformers-training/nli/ds_config.json
@@ -8,7 +8,7 @@
     },
     "gradient_clipping": 1.0,
     "zero_optimization": {
-        "stage": 2,
+        "stage": 3,
         "overlap_comm": false,
         "reduce_scatter": false,
         "contiguous_gradients": false
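The hunk above changes only the ZeRO stage. For readers who want to see the change in context, here is a minimal sketch of what a complete stage-3 `ds_config.json` could look like, written as a heredoc so the assumptions can be flagged; every field outside the lines shown in the diff (`steps_per_print`, the `"auto"` batch-size entries, and the `bf16` block) is an assumed-typical layout, not content taken from this PR.

```bash
# A sketch only: the "gradient_clipping"/"zero_optimization" lines match the diff above;
# everything else (steps_per_print, the "auto" batch-size fields, the bf16 block) is an
# assumed-typical layout for a Gaudi DeepSpeed config, not the file's actual contents.
cat > ds_config.json <<'EOF'
{
    "steps_per_print": 64,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false
    }
}
EOF
```

The practical effect of moving from stage 2 to stage 3 is that model parameters are partitioned across the four HPUs in addition to gradients and optimizer states, which is what lets the roughly 130GB requirement quoted in the READMEs fit on devices with 98GB each.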
8 changes: 4 additions & 4 deletions examples/sentence-transformers-training/sts/README.md
@@ -54,17 +54,17 @@ Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 1
python training_stsbenchmark.py intfloat/e5-mistral-7b-instruct --peft --lora_target_modules "q_proj" "k_proj" "v_proj"
```

-## Multi-card Training with Deepspeed Zero2/3
+## Multi-card Training with Deepspeed Zero3

-Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we can use the Zero2/Zero3 stages of DeepSpeed (model parallelism) to reduce the memory requirements.
+Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we will use the Zero3 stage of DeepSpeed (model parallelism) to reduce the memory requirements.

-Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero2.
+Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero3.

```bash
python ../../gaudi_spawn.py --world_size 4 --use_deepspeed training_stsbenchmark.py intfloat/e5-mistral-7b-instruct --deepspeed ds_config.json --bf16 --no-use_hpu_graphs_for_training --learning_rate 1e-7
```

-In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file. To further reduce memory usage, change the stage to 3 (DeepSpeed Zero3) in the `ds_config.json` file.
+In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file.

# Training data

2 changes: 1 addition & 1 deletion examples/sentence-transformers-training/sts/ds_config.json
@@ -8,7 +8,7 @@
     },
     "gradient_clipping": 1.0,
     "zero_optimization": {
-        "stage": 2,
+        "stage": 3,
         "overlap_comm": false,
         "reduce_scatter": false,
         "contiguous_gradients": false
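With the stage-3 config in place, the STS example is launched exactly as in the README diff above. Below is a sketch with lazy mode made explicit; the `PT_HPU_LAZY_MODE` environment variable (1 = lazy, 0 = eager) is an assumption about how the Gaudi runtime selects its execution mode, while the script name, flags, and `1e-7` learning rate are copied from the diff.

```bash
# Assumption: PT_HPU_LAZY_MODE selects the execution mode on Gaudi (1 = lazy, 0 = eager);
# the command itself is the one from the sts README diff above.
PT_HPU_LAZY_MODE=1 python ../../gaudi_spawn.py --world_size 4 --use_deepspeed \
    training_stsbenchmark.py intfloat/e5-mistral-7b-instruct \
    --deepspeed ds_config.json --bf16 --no-use_hpu_graphs_for_training \
    --learning_rate 1e-7
```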