Fix Sentence Transformer STS restart issue (#1814)
ZhengHongming888 authored Mar 5, 2025
1 parent 2691f25 commit 6d052ff
Showing 4 changed files with 10 additions and 10 deletions.
8 changes: 4 additions & 4 deletions examples/sentence-transformers-training/nli/README.md
@@ -67,16 +67,16 @@ Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 1
python training_nli.py intfloat/e5-mistral-7b-instruct --peft --lora_target_module "q_proj" "k_proj" "v_proj" --learning_rate 1e-5
```

-## Multi-card Training with Deepspeed Zero2/3
+## Multi-card Training with Deepspeed Zero3

-Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we can use the Zero2/Zero3 stages of DeepSpeed (model parallelism) to reduce the memory requirements.
+Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we will use the Zero3 stage of DeepSpeed (model parallelism) to reduce the memory requirements.

-Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero2.
+Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero3.

```bash
python ../../gaudi_spawn.py --world_size 4 --use_deepspeed training_nli.py intfloat/e5-mistral-7b-instruct --deepspeed ds_config.json --bf16 --no-use_hpu_graphs_for_training --learning_rate 1e-7
```
-In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file. To further reduce memory usage, change the stage to 3 (DeepSpeed Zero3) in the `ds_config.json` file.
+In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file.

# Dataset

2 changes: 1 addition & 1 deletion examples/sentence-transformers-training/nli/ds_config.json
@@ -8,7 +8,7 @@
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
-        "stage": 2,
+        "stage": 3,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false
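The hunk above shows only part of `ds_config.json`. For orientation, a complete file consistent with that fragment might look like the sketch below; everything above `"gradient_clipping"` is an assumption based on typical DeepSpeed configurations used with these Gaudi examples, not content confirmed by this commit. Since JSON does not allow comments, the annotations stay in the prose: `"auto"` lets the Hugging Face Trainer integration fill in the batch-size values at launch, and `"stage": 3` is the change applied here.

```json
{
    "steps_per_print": 64,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false
    }
}
```

With stage 3, DeepSpeed shards optimizer states, gradients, and the model parameters themselves across the four HPUs, which is what brings the per-device footprint of the roughly 130GB model below the 98GB available on a single Gaudi 2 card. The `sts/ds_config.json` changed further down receives the identical edit.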
8 changes: 4 additions & 4 deletions examples/sentence-transformers-training/sts/README.md
@@ -54,17 +54,17 @@ Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 1
python training_stsbenchmark.py intfloat/e5-mistral-7b-instruct --peft --lora_target_modules "q_proj" "k_proj" "v_proj"
```

-## Multi-card Training with Deepspeed Zero2/3
+## Multi-card Training with Deepspeed Zero3

-Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we can use the Zero2/Zero3 stages of DeepSpeed (model parallelism) to reduce the memory requirements.
+Pretraining the `intfloat/e5-mistral-7b-instruct` model requires approximately 130GB of memory, which exceeds the capacity of a single HPU (Gaudi 2 with 98GB memory). To address this, we will use the Zero3 stage of DeepSpeed (model parallelism) to reduce the memory requirements.

-Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero2.
+Our tests have shown that training this model requires at least four HPUs when using DeepSpeed Zero3.

```bash
python ../../gaudi_spawn.py --world_size 4 --use_deepspeed training_stsbenchmark.py intfloat/e5-mistral-7b-instruct --deepspeed ds_config.json --bf16 --no-use_hpu_graphs_for_training --learning_rate 1e-7
```

-In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file. To further reduce memory usage, change the stage to 3 (DeepSpeed Zero3) in the `ds_config.json` file.
+In the above command, we need to enable lazy mode with a learning rate of `1e-7` and configure DeepSpeed using the `ds_config.json` file.

# Training data

2 changes: 1 addition & 1 deletion examples/sentence-transformers-training/sts/ds_config.json
@@ -8,7 +8,7 @@
    },
    "gradient_clipping": 1.0,
    "zero_optimization": {
-        "stage": 2,
+        "stage": 3,
        "overlap_comm": false,
        "reduce_scatter": false,
        "contiguous_gradients": false