
Merge branch 'main' into ds/torch-compile-model-setup
dsmertin authored Nov 22, 2024
2 parents aa268c0 + 3cdb0eb commit c94329e
Showing 55 changed files with 5,552 additions and 251 deletions.
6 changes: 2 additions & 4 deletions .github/workflows/fast_tests.yml
@@ -15,8 +15,7 @@ concurrency:
jobs:
transformers:
name: Run tests for optimum.habana.transformers
-runs-on:
-group: aws-dl1-24xlarge
+runs-on: [self-hosted, linux, x64, gaudi2]
steps:
- name: Checkout
uses: actions/checkout@v2
@@ -38,8 +37,7 @@ jobs:
/bin/bash tests/ci/fast_tests.sh
diffusers:
name: Run tests for optimum.habana.diffusers
-runs-on:
-group: aws-dl1-24xlarge
+runs-on: [self-hosted, linux, x64, gaudi2]
steps:
- name: Checkout
uses: actions/checkout@v2
2 changes: 2 additions & 0 deletions Makefile
@@ -35,6 +35,8 @@ style: clean
fast_tests:
python -m pip install .[tests]
python -m pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py tests/test_trainer_seq2seq.py
+# TODO enable when CI has more servers
+# python -m pytest test_functional_text_generation_example.py

# Run unit and integration tests related to Diffusers
fast_tests_diffusers:
4 changes: 4 additions & 0 deletions README.md
@@ -214,6 +214,8 @@ The following model architectures, tasks and device distributions have been vali
| Qwen2 | <div style="text-align:left"><li>Single card</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Qwen2-MoE | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Gemma | :heavy_check_mark: | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| XGLM | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| Cohere | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| T5 / Flan T5 | :heavy_check_mark: | :heavy_check_mark: | <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| BART | | <div style="text-align:left"><li>Single card</li></div> | <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| ViT | :heavy_check_mark: | :heavy_check_mark: | <li>[image classification](https://github.com/huggingface/optimum-habana/tree/main/examples/image-classification)</li> |
@@ -228,10 +230,12 @@ The following model architectures, tasks and device distributions have been vali
| OWLViT | | <div style="text-align:left"><li>Single card</li></div> | <li>[zero shot object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/zero-shot-object-detection)</li> |
| ClipSeg | | <div style="text-align:left"><li>Single card</li></div> | <li>[object segmentation](https://github.com/huggingface/optimum-habana/tree/main/examples/object-segementation)</li> |
| Llava / Llava-next | | <div style="text-align:left"><li>Single card</li></div> | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
+| idefics2 | <div style="text-align:left"><li>LoRA</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
| Segment Anything Model | | <div style="text-align:left"><li>Single card</li></div> | <li>[object segmentation](https://github.com/huggingface/optimum-habana/tree/main/examples/object-segementation)</li> |
| VideoMAE | | <div style="text-align:left"><li>Single card</li></div> | <li>[Video classification](https://github.com/huggingface/optimum-habana/tree/main/examples/video-classification)</li> |
| TableTransformer | | <div style="text-align:left"><li>Single card</li></div> | <li>[table object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/table-detection) </li> |
| DETR | | <div style="text-align:left"><li>Single card</li></div> | <li>[object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection)</li> |
+| Mllama | <div style="text-align:left"><li>LoRA</li></div> | :heavy_check_mark: | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
</div>
4 changes: 4 additions & 0 deletions docs/source/index.mdx
@@ -60,6 +60,8 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
| Qwen2 | <div style="text-align:left"><li>Single card</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Qwen2-MoE | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Persimmon | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| XGLM | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
+| Cohere | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| T5 / Flan T5 | ✅ | ✅ | <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| BART | | <div style="text-align:left"><li>Single card</li></div> | <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| ViT | ✅ | ✅ | <li>[image classification](https://github.com/huggingface/optimum-habana/tree/main/examples/image-classification)</li> |
@@ -74,10 +76,12 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all be
| OWLViT | | <div style="text-align:left"><li>Single card</li></div> | <li>[zero shot object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/zero-shot-object-detection)</li> |
| ClipSeg | | <div style="text-align:left"><li>Single card</li></div> | <li>[object segmentation](https://github.com/huggingface/optimum-habana/tree/main/examples/object-segementation)</li> |
| Llava / Llava-next | | <div style="text-align:left"><li>Single card</li></div> | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
+| idefics2 | <div style="text-align:left"><li>LoRA</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |
| SAM | | <div style="text-align:left"><li>Single card</li></div> | <li>[object segmentation](https://github.com/huggingface/optimum-habana/tree/main/examples/object-segementation)</li> |
| VideoMAE | | <div style="text-align:left"><li>Single card</li></div> | <li>[Video classification](https://github.com/huggingface/optimum-habana/tree/main/examples/video-classification)</li> |
| TableTransformer | | <div style="text-align:left"><li>Single card</li></div> | <li>[table object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/table-detection)</li> |
| DETR | | <div style="text-align:left"><li>Single card</li></div> | <li>[object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection)</li> |
+| Mllama | <div style="text-align:left"><li>LoRA</li></div> | ✅ | <li>[image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)</li> |

- Diffusers

4 changes: 2 additions & 2 deletions docs/source/package_reference/gaudi_config.mdx
@@ -20,8 +20,8 @@ Here is a description of each configuration parameter:
- `use_fused_adam` enables to decide whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
- `use_fused_clip_norm` enables to decide whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
- `use_torch_autocast` enables PyTorch autocast; used to define good pre-defined config; users should favor `--bf16` training argument
-- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using environment flag LOWER_LIST is a preffered way for operator autocast list override
-- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using environment flag FP32_LIST is a preffered way for operator autocast list override
+- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using environment flag PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST is a preffered way for operator autocast list override
+- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using environment flag PT_HPU_AUTOCAST_FP32_OPS_LIST is a preffered way for operator autocast list override


You can find examples of Gaudi configurations in the [Habana model repository on the Hugging Face Hub](https://huggingface.co/habana). For instance, [for BERT Large we have](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json):
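The two updated bullets direct users to environment variables rather than to the `autocast_bf16_ops` / `autocast_fp32_ops` config keys. The sketch below prepares those variables for a training subprocess. It assumes, based on Intel Gaudi's autocast documentation rather than on this diff, that each variable holds the path to a text file listing one op name per line. The helper `autocast_override_env` and the op names are illustrative only.

```python
import os
import tempfile
from pathlib import Path


def autocast_override_env(bf16_ops, fp32_ops, workdir):
    """Write op-list files and return an environment dict that points Gaudi autocast at them.

    Assumption (not confirmed by this diff): each variable takes a path to a text file
    containing one op name per line.
    """
    workdir = Path(workdir)
    bf16_file = workdir / "bf16_ops.txt"
    fp32_file = workdir / "fp32_ops.txt"
    # One op name per line, each followed by a newline.
    bf16_file.write_text("".join(f"{op}\n" for op in bf16_ops))
    fp32_file.write_text("".join(f"{op}\n" for op in fp32_ops))

    env = dict(os.environ)  # copy, so the current process environment is not modified
    env["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(bf16_file)
    env["PT_HPU_AUTOCAST_FP32_OPS_LIST"] = str(fp32_file)
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Illustrative op names only; check the Gaudi docs for the ops your model needs.
        env = autocast_override_env(["addmm", "bmm", "linear"], ["embedding", "nll_loss", "log_softmax"], tmp)
        print(env["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"])
        print(Path(env["PT_HPU_AUTOCAST_FP32_OPS_LIST"]).read_text(), end="")
        # Pass `env` to subprocess.run([...training command...], env=env) inside this
        # `with` block so the files still exist while the training process runs.
```

Building the environment for a child process, instead of mutating `os.environ` partway through a run, ensures the variables are already set when the training process starts.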
30 changes: 17 additions & 13 deletions examples/image-classification/README.md
@@ -33,7 +33,7 @@ pip install -r requirements.txt
Here we show how to fine-tune a Vision Transformer (`ViT`) on Cifar10:

```bash
-python run_image_classification.py \
+PT_HPU_LAZY_MODE=0 python run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
--output_dir /tmp/outputs/ \
@@ -51,10 +51,11 @@ python run_image_classification.py \
--save_total_limit 3 \
--seed 1337 \
--use_habana \
-    --use_lazy_mode \
-    --use_hpu_graphs_for_inference \
+    --use_lazy_mode False \
+    --torch_compile_backend hpu_backend \
+    --torch_compile \
--gaudi_config_name Habana/vit \
-    --throughput_warmup_steps 3 \
+    --throughput_warmup_steps 6 \
--dataloader_num_workers 1 \
--bf16
```
@@ -92,16 +93,17 @@ root/cat/[...]/asd932_.png
In other words, you need to organize your images in subfolders, based on their class. You can then run the script like this:

```bash
-python run_image_classification.py \
+PT_HPU_LAZY_MODE=0 python run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--train_dir <path-to-train-root> \
--output_dir /tmp/outputs/ \
--remove_unused_columns False \
--do_train \
--do_eval \
--use_habana \
-    --use_lazy_mode \
-    --use_hpu_graphs_for_inference \
+    --use_lazy_mode False \
+    --torch_compile_backend hpu_backend \
+    --torch_compile \
--gaudi_config_name Habana/vit \
--throughput_warmup_steps 3 \
--dataloader_num_workers 1 \
@@ -184,7 +186,7 @@ python run_image_classification.py \
Here is how you would fine-tune ViT on Cifar10 using 8 HPUs:

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
@@ -203,8 +205,9 @@ python ../gaudi_spawn.py \
--save_total_limit 3 \
--seed 1337 \
--use_habana \
-    --use_lazy_mode \
-    --use_hpu_graphs_for_inference \
+    --use_lazy_mode False \
+    --torch_compile_backend hpu_backend \
+    --torch_compile \
--gaudi_config_name Habana/vit \
--throughput_warmup_steps 8 \
--dataloader_num_workers 1 \
@@ -224,7 +227,7 @@ For Swin, you need to change/add the following arguments:
Similarly to multi-HPU training, here is how you would fine-tune ViT on Cifar10 using 8 HPUs with DeepSpeed:

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
@@ -243,8 +246,9 @@ python ../gaudi_spawn.py \
--save_total_limit 3 \
--seed 1337 \
--use_habana \
-    --use_lazy_mode \
-    --use_hpu_graphs_for_inference \
+    --use_lazy_mode False \
+    --torch_compile_backend hpu_backend \
+    --torch_compile \
--gaudi_config_name Habana/vit \
--throughput_warmup_steps 3 \
--dataloader_num_workers 1 \
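Every hunk in this README makes the same substitution: lazy mode with HPU graphs (`--use_lazy_mode`, `--use_hpu_graphs_for_inference`) is replaced by eager mode plus `torch.compile` (`PT_HPU_LAZY_MODE=0`, `--use_lazy_mode False`, `--torch_compile_backend hpu_backend`, `--torch_compile`). Below is a minimal Python sketch of that switch. It assumes the command is launched from `examples/image-classification` with optimum-habana installed; `build_command` is a hypothetical helper, and it uses only flags that appear in the diff.

```python
import os
import shlex


def build_command(use_torch_compile, extra_args):
    """Return (env, argv) for run_image_classification.py in the chosen execution mode.

    Hypothetical helper: the flags mirror the old and new README commands in this commit.
    """
    env = dict(os.environ)  # copy, so the caller's environment is left untouched
    argv = ["python", "run_image_classification.py", *extra_args, "--use_habana"]
    if use_torch_compile:
        # New README: eager mode (lazy mode disabled) plus torch.compile with the HPU backend.
        env["PT_HPU_LAZY_MODE"] = "0"
        argv += ["--use_lazy_mode", "False", "--torch_compile_backend", "hpu_backend", "--torch_compile"]
    else:
        # Old README: lazy mode, with HPU graphs used for inference.
        argv += ["--use_lazy_mode", "--use_hpu_graphs_for_inference"]
    return env, argv


if __name__ == "__main__":
    env, argv = build_command(
        True,
        [
            "--model_name_or_path", "google/vit-base-patch16-224-in21k",
            "--dataset_name", "cifar10",
            "--output_dir", "/tmp/outputs/",
        ],
    )
    print("PT_HPU_LAZY_MODE=" + env["PT_HPU_LAZY_MODE"], shlex.join(argv))
    # On a Gaudi machine, launch with: subprocess.run(argv, env=env, check=True)
```

Keeping the environment variable and the CLI flags in one function means they cannot drift apart, since the new commands rely on both being set together.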