Commit 23875b1

Update docs for baichuan2 training (huggingface#1586)
1 parent 26ac244 commit 23875b1

3 files changed: +29 -2 lines


README.md

+1 -1
@@ -239,7 +239,7 @@ The following model architectures, tasks and device distributions have been vali
 | DETR | | <div style="text-align:left"><li>Single card</li></div> | <li>[object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection)</li> |
 | Mllama | <div style="text-align:left"><li>LoRA</li></div> | :heavy_check_mark: | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
 | MiniCPM3 | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
-| Baichuan2 | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| Baichuan2 | <div style="text-align:left"><li>DeepSpeed</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | DeepSeek-V2 | | :heavy_check_mark: | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | ChatGLM | <div style="text-align:left"><li>DeepSpeed</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 </div>

docs/source/index.mdx

+1 -1
@@ -106,7 +106,7 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
 | DETR | | <div style="text-align:left"><li>Single card</li></div> | <li>[object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection)</li> |
 | Mllama | <div style="text-align:left"><li>LoRA</li></div> | ✅ | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
 | MiniCPM3 | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
-| Baichuan2 | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| Baichuan2 | <div style="text-align:left"><li>DeepSpeed</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | DeepSeek-V2 | | ✅ | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 | ChatGLM | <div style="text-align:left"><li>DeepSpeed</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
 
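Both documentation tables receive the same Baichuan2 row, which now lists DeepSpeed for training and adds the language-modeling example to the tasks. As a quick sanity check that the new row keeps the same four-cell layout as its neighbors, the row can be split on `|` with `awk`. This is a sketch only: the column order Architecture | Training | Inference | Tasks is assumed from the surrounding tables rather than shown in this diff, and the variable names `ROW`, `CELLS` and `TRAINING` are illustrative.

```shell
#!/usr/bin/env bash
# The Baichuan2 row as added to README.md and docs/source/index.mdx by this commit.
ROW='| Baichuan2 | <div style="text-align:left"><li>DeepSpeed</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |'

# Splitting on "|" yields an empty field before the first pipe and after the last one,
# so the number of table cells is NF - 2.
CELLS=$(printf '%s\n' "$ROW" | awk -F'|' '{ print NF - 2 }')

# Field 3 is the second cell (assumed to be the Training column); strip the HTML tags
# and surrounding spaces to get its plain-text value.
TRAINING=$(printf '%s\n' "$ROW" | awk -F'|' '{ print $3 }' | sed -e 's/<[^>]*>//g' -e 's/^ *//' -e 's/ *$//')

echo "cells: ${CELLS}"        # expected: 4
echo "training: ${TRAINING}"  # expected: DeepSpeed
```

The same check applied to the removed row would report an empty Training cell, which is the gap this commit fills.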
examples/language-modeling/README.md

+27
@@ -157,6 +157,33 @@ python ../gaudi_spawn.py \
 --logging_steps 20
 ```
 
+### Multi-card Training with Deepspeed (Baichuan2-13B-Chat)
+```bash
+python ../gaudi_spawn.py \
+--world_size 8 --use_deepspeed run_clm.py \
+--config_name baichuan-inc/Baichuan2-13B-Chat \
+--tokenizer_name baichuan-inc/Baichuan2-13B-Chat \
+--dataset_name wikitext \
+--num_train_epochs 30 \
+--dataset_config_name wikitext-2-raw-v1 \
+--per_device_train_batch_size 2 \
+--per_device_eval_batch_size 2 \
+--do_train \
+--do_eval \
+--deepspeed llama2_ds_zero3_config.json \
+--output_dir /tmp/test-clm \
+--gaudi_config_name Habana/gpt2 \
+--use_habana \
+--use_lazy_mode \
+--throughput_warmup_steps 3 \
+--bf16 \
+--block_size 1024 \
+--use_cache False \
+--overwrite_output_dir \
+--logging_first_step True \
+--logging_steps 20
+```
+
 
 ## Multi-Node Training with Deepspeed (GPT-NeoX)
 
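For sizing the run, the flags in the new Baichuan2-13B-Chat command fix how many sequences and tokens each optimizer step consumes. The arithmetic below is a sketch that assumes `--gradient_accumulation_steps` stays at the Trainer default of 1 (the command does not set it) and that `run_clm.py` packs the tokenized wikitext corpus into chunks of exactly `--block_size` tokens; the shell variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Values copied from the Baichuan2-13B-Chat command above.
WORLD_SIZE=8                    # --world_size 8 (one training process per card)
PER_DEVICE_TRAIN_BATCH_SIZE=2   # --per_device_train_batch_size 2
BLOCK_SIZE=1024                 # --block_size 1024 (tokens per training sequence)
# Assumption: --gradient_accumulation_steps is not passed, so the default of 1 applies.
GRADIENT_ACCUMULATION_STEPS=1

# ZeRO-3 is still data parallel: every process sees its own micro-batch.
GLOBAL_BATCH=$(( WORLD_SIZE * PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS ))
TOKENS_PER_STEP=$(( GLOBAL_BATCH * BLOCK_SIZE ))

echo "sequences per optimizer step: ${GLOBAL_BATCH}"     # expected: 16
echo "tokens per optimizer step: ${TOKENS_PER_STEP}"     # expected: 16384
```

Raising `--per_device_train_batch_size` or `--world_size` scales both numbers linearly, which is worth keeping in mind when adapting the command to fewer cards.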