Revert "Update lm_eval version (#1473)" except README.md #1581

Closed
wants to merge 40 commits into from
Changes from 39 commits

Commits
- 8695024 Fixes in unify_measurements (#1496) (HolyFalafel, Dec 3, 2024)
- 946a3e4 [wav2vec2] Remove tensor.item and dynamic slicing operations in the l… (chaojun-zhang, Dec 3, 2024)
- 267ace3 Fix lm_eval script for starcoder and gemma (#1463) (skavulya, Dec 3, 2024)
- 7d4385b Add option to use bf16 in PT sdp (#5) (#1514) (astachowiczhabana, Dec 3, 2024)
- 6d39b3d Fix tests.test_peft_inference failure (#1543) (sywangyi, Dec 3, 2024)
- 50eb073 Update lm_eval version (#1473) (alexey-belyakov, Dec 3, 2024)
- f0452dc Fix bad import in Baichuan code (#1547) (regisss, Dec 3, 2024)
- 553719a Restore performance in generate (#1546) (ugolowic, Dec 3, 2024)
- 7ea6a54 Enable pyTorch-IMage-Models (TIMM) with HPUs (#1459) (ZhengHongming888, Dec 3, 2024)
- ed15601 Add HF login for 8x Gaudi2 CI (regisss, Dec 3, 2024)
- 7fa9d4e Adding support for Context Parallelism using Deepseed's DistributedAt… (bhargaveede, Dec 3, 2024)
- 5d05aef Fix Llama CI (regisss, Dec 3, 2024)
- eaeda1e Add DynamicMoE support for Mixtral (#1511) (kwisniewski98, Dec 3, 2024)
- c1bb5a5 Fix for llava models not generating text with test failures in 1.19 (… (tthakkal, Dec 3, 2024)
- d49ca3b Refactor KV cache, Rope, reduce common code (#1148) (abhilash1910, Dec 3, 2024)
- d653394 Adjust Qwen2-7B test case (#1551) (Wei-Lin-Intel, Dec 4, 2024)
- 8a9708a [run_lm_eval.py] Fixed too many print dump json info (#1553) (FocusLuo, Dec 4, 2024)
- c4738b8 Fix for single_card llama7b and falcon40b CI errors (#1549) (MohitIntel, Dec 4, 2024)
- 297b605 Implemented fusedSDPA for stable diffusion (#36) (#1545) (astachowiczhabana, Dec 4, 2024)
- c0446ec Apply --sdp_on_bf16 to image-to-text examples (#1557) (schoi-habana, Dec 5, 2024)
- 9e312ff Fix accuracy regression in Gemma (#1556) (skavulya, Dec 5, 2024)
- 0710aa1 Fix FusedSDPA wrapper from TransformerEngine (#1562) (pbielak, Dec 5, 2024)
- e0cbfe3 Run albert-xxlarge-v1 CI as torch.compile mode (#1563) (yeonsily, Dec 6, 2024)
- 5ed877e Update README commands for the models to use --sdp_on_bf16 (#1566) (yeonsily, Dec 6, 2024)
- 4a586a5 Minicpm patch (#1567) (pi314ever, Dec 6, 2024)
- c5a8d42 Updated gemma_2b_it CI (#1561) (Luca-Calabria, Dec 6, 2024)
- 9555191 Fixed Adalora Test for OH 1.15 (#1564) (npiroozan, Dec 6, 2024)
- 1814a06 Fixed LORACP Test for OH 1.15 (#1568) (npiroozan, Dec 6, 2024)
- 010fc96 Add requirements.txt (regisss, Dec 6, 2024)
- ced1c8a Update the baseline for 1.18 to reflect performance in 1.19 (#1571) (emascarenhas, Dec 6, 2024)
- e222934 fusedsdpa for stable diffusion xl (#1565) (skaulintel, Dec 6, 2024)
- cf7d24a Fix prefix llama ci failure (#1570) (sywangyi, Dec 6, 2024)
- d632cc9 Add sdp_on_bf16 to tests,text-gen (#1559) (hsubramony, Dec 6, 2024)
- 5d70886 Fix mllama test (#1569) (sywangyi, Dec 6, 2024)
- fdc79d4 Fix lazy_mode assignment (#1558) (vidyasiv, Dec 6, 2024)
- abb4fca Fix diffusers import (#1574) (skaulintel, Dec 8, 2024)
- 9498c7c Update README commands for more models to use --sdp_on_bf16 (#1575) (yeonsily, Dec 8, 2024)
- a1b41b6 Revert "Update lm_eval version (#1473)" except README.md (shepark, Dec 9, 2024)
- 2306054 Remove trailing space (regisss, Dec 9, 2024)
- e972fa5 Merge remote-tracking branch 'optimum-habana/main' into revert_lmeval… (regisss, Dec 9, 2024)
2 changes: 1 addition & 1 deletion .github/workflows/slow_tests_gaudi2.yml
@@ -147,7 +147,7 @@ jobs:
--net=host \
--ipc=host \
vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest \
- /bin/bash tests/ci/slow_tests_8x.sh
+ /bin/bash tests/ci/slow_tests_8x.sh ${{ secrets.TEXT_GENERATION_CI_HUB_TOKEN }}
single-card:
name: Test single-card models
if: ${{ !cancelled() && (success() || failure()) }}
3 changes: 2 additions & 1 deletion examples/contrastive-image-text/README.md
@@ -235,7 +235,8 @@ python ../gaudi_spawn.py --use_mpi --world_size 8 run_bridgetower.py \
--dataloader_num_workers 1 \
--mediapipe_dataloader \
--distribution_strategy fast_ddp \
- --trust_remote_code
+ --trust_remote_code \
+ --sdp_on_bf16
```

> `--mediapipe_dataloader` only works on Gaudi2.
52 changes: 34 additions & 18 deletions examples/image-to-text/README.md
@@ -44,63 +44,71 @@ python3 run_pipeline.py \
--model_name_or_path Salesforce/blip-image-captioning-large \
--image_path "https://ankur3107.github.io/assets/images/image-captioning-example.png" \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run Llava-1.5-7b inference, use the following command:
```bash
python3 run_pipeline.py \
--model_name_or_path llava-hf/llava-1.5-7b-hf \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run Llava-1.5-13b inference, use the following command:
```bash
python3 run_pipeline.py \
--model_name_or_path llava-hf/llava-1.5-13b-hf \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run Llava-v1.6-mistral-7b inference, use the following command:
```bash
python3 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run Llava-v1.6-vicuna-13b inference, use the following command:
```bash
python3 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run Llava-hf/llava-v1.6-34b-hf inference, use the following command:
```bash
python3 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-34b-hf \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run google/paligemma-3b-mix-224 inference, use the following command:
```bash
python3 run_pipeline.py \
--model_name_or_path google/paligemma-3b-mix-224 \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run Llava-hf/llama3-llava-next-8b-hf inference, use the following command:
```bash
python3 run_pipeline.py \
--model_name_or_path llava-hf/llama3-llava-next-8b-hf \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

To run idefics2 inference, use the following command:
@@ -109,16 +117,18 @@ To run idefics2 inference, use the following command:
python3 run_pipeline.py \
--model_name_or_path HuggingFaceM4/idefics2-8b \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

- To run mllama inference, use the following command:
+ To run mllama inference using reduced precision in the SDPA, use the following command:

```bash
python3 run_pipeline.py \
--model_name_or_path meta-llama/Llama-3.2-11B-Vision-Instruct \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

### Inference with FP8
@@ -133,16 +143,18 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-1.5-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

Here is an example to quantize the model based on previous measurements for Llava-1.5-7b:
```bash
- QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_pipeline.py \
+ QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-1.5-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```


@@ -152,7 +164,8 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

Here is an example to quantize the model based on previous measurements for Llava-v1.6-mistral-7b:
@@ -161,7 +174,8 @@ QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

Here is an example to measure the tensor quantization statistics on Llava-v1.6-vicuna-13b:
@@ -170,7 +184,8 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

Here is an example to quantize the model based on previous measurements for Llava-v1.6-vicuna-13b:
@@ -179,7 +194,8 @@ QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
- --bf16
+ --bf16 \
+ --sdp_on_bf16
```

### Inference with FusedSDPA
9 changes: 9 additions & 0 deletions examples/image-to-text/run_pipeline.py
@@ -174,6 +174,11 @@ def main():
action="store_true",
help="Whether to use the key/value cache for decoding. It should speed up generation.",
)
+ parser.add_argument(
+     "--sdp_on_bf16",
+     action="store_true",
+     help="Allow PyTorch to use reduced precision in the SDPA math backend",
+ )

args = parser.parse_args()

@@ -304,6 +309,10 @@ def main():
"flash_attention_recompute": args.flash_attention_recompute,
"limit_hpu_graphs": args.limit_hpu_graphs,
}

+ if args.sdp_on_bf16:
+     torch._C._set_math_sdp_allow_fp16_bf16_reduction(True)

if args.use_kv_cache:
generate_kwargs["use_cache"] = args.use_kv_cache

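Taken together, the two hunks above add a `--sdp_on_bf16` CLI flag and turn on reduced-precision reductions in PyTorch's math SDPA backend. A minimal sketch of that wiring, assembled from the diff (the explanatory comment is our reading of the private hook, not text from the PR):

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument(
    "--sdp_on_bf16",
    action="store_true",
    help="Allow PyTorch to use reduced precision in the SDPA math backend",
)
args = parser.parse_args()

if args.sdp_on_bf16:
    # Private PyTorch hook used by this PR: lets the math (non-fused) SDPA
    # backend keep softmax/matmul reductions in bf16/fp16 instead of fp32,
    # trading a little accuracy for speed.
    torch._C._set_math_sdp_allow_fp16_bf16_reduction(True)
```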
163 changes: 163 additions & 0 deletions examples/pytorch-image-models/README.md
@@ -0,0 +1,163 @@
<!---
Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# PyTorch Image Models (TIMM) Examples with HPUs

This directory contains scripts that showcase how to run inference and fine-tuning for TIMM models on Intel HPUs, in both lazy and graph modes. Training is supported on single and multiple HPU cards. The following models, among the most downloaded from the Hugging Face Hub, are currently supported:

- [timm/resnet50.a1_in1k](https://huggingface.co/timm/resnet50.a1_in1k)
- [timm/resnet18.a1_in1k](https://huggingface.co/timm/resnet18.a1_in1k)
- [timm/resnet18.fb_swsl_ig1b_ft_in1k](https://huggingface.co/timm/resnet18.fb_swsl_ig1b_ft_in1k)
- [timm/wide_resnet50_2.racm_in1k](https://huggingface.co/timm/wide_resnet50_2.racm_in1k)
- [timm/efficientnet_b3.ra2_in1k](https://huggingface.co/timm/efficientnet_b3.ra2_in1k)
- [timm/efficientnet_lite0.ra_in1k](https://huggingface.co/timm/efficientnet_lite0.ra_in1k)
- [timm/efficientnet_b0.ra_in1k](https://huggingface.co/timm/efficientnet_b0.ra_in1k)
- [timm/nf_regnet_b1.ra2_in1k](https://huggingface.co/timm/nf_regnet_b1.ra2_in1k)
- [timm/mobilenetv3_large_100.ra_in1k](https://huggingface.co/timm/mobilenetv3_large_100.ra_in1k)
- [timm/tf_mobilenetv3_large_minimal_100.in1k](https://huggingface.co/timm/tf_mobilenetv3_large_minimal_100.in1k)
- [timm/vit_base_patch16_224.augreg2_in21k_ft_in1k](https://huggingface.co/timm/vit_base_patch16_224.augreg2_in21k_ft_in1k)
- [timm/vgg19.tv_in1k](https://huggingface.co/timm/vgg19.tv_in1k)

## Requirements

First, install pytorch-image-models (TIMM):
```bash
git clone https://github.com/huggingface/pytorch-image-models.git
cd pytorch-image-models
pip install .
```
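Once installed, a quick sanity check that a TIMM model runs on an HPU (a minimal sketch; it assumes the `habana_frameworks` package from the SynapseAI stack is present, which registers the `hpu` device used by the scripts below):

```python
import timm
import torch
import habana_frameworks.torch.core  # noqa: F401 -- registers the "hpu" device

model = timm.create_model("resnet50.a1_in1k", pretrained=True).eval().to("hpu")
x = torch.randn(1, 3, 224, 224).to("hpu")  # dummy ImageNet-sized input
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # expected: torch.Size([1, 1000])
```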

## Single-HPU training

### Using datasets from Hub

Here we show how to fine-tune [timm/resnet50.a1_in1k](https://huggingface.co/timm/resnet50.a1_in1k) on the [imagenette2-320 dataset](https://huggingface.co/datasets/johnowhitaker/imagenette2-320) from Hugging Face.

### Training with HPU lazy mode

```bash
python train_hpu_lazy.py \
--data-dir ./ \
--dataset hfds/johnowhitaker/imagenette2-320 \
--device 'hpu' \
--model resnet50.a1_in1k \
--train-split train \
--val-split train \
--dataset-download
```

### Training with HPU graph mode

```bash
python train_hpu_graph.py \
--data-dir ./ \
--dataset hfds/johnowhitaker/imagenette2-320 \
--device 'hpu' \
--model resnet50.a1_in1k \
--train-split train \
--val-split train \
--dataset-download
```

Example results for lazy mode are shown below:

```bash
Train: 0 [ 0/73 ( 1%)] Loss: 6.86 (6.86) Time: 9.575s, 13.37/s (9.575s, 13.37/s) LR: 1.000e-05 Data: 0.844 (0.844)
Train: 0 [ 50/73 ( 70%)] Loss: 6.77 (6.83) Time: 0.320s, 400.32/s (0.470s, 272.39/s) LR: 1.000e-05 Data: 0.217 (0.047)
Test: [ 0/30] Time: 6.593 (6.593) Loss: 6.723 ( 6.723) Acc@1: 0.000 ( 0.000) Acc@5: 0.000 ( 0.000)
Test: [ 30/30] Time: 3.856 (0.732) Loss: 6.615 ( 6.691) Acc@1: 0.000 ( 0.076) Acc@5: 1.176 ( 3.287)

Train: 1 [ 0/73 ( 1%)] Loss: 6.69 (6.69) Time: 0.796s, 160.74/s (0.796s, 160.74/s) LR: 1.001e-02 Data: 0.685 (0.685)
Train: 1 [ 50/73 ( 70%)] Loss: 3.23 (3.76) Time: 0.160s, 798.85/s (0.148s, 863.22/s) LR: 1.001e-02 Data: 0.053 (0.051)
Test: [ 0/30] Time: 0.663 (0.663) Loss: 1.926 ( 1.926) Acc@1: 46.094 ( 46.094) Acc@5: 85.938 ( 85.938)
Test: [ 30/30] Time: 0.022 (0.126) Loss: 1.462 ( 1.867) Acc@1: 63.529 ( 39.261) Acc@5: 83.529 ( 85.096)

```
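For context, the two scripts differ in how work is dispatched to the HPU. The sketch below illustrates our understanding using the Habana PyTorch bridge APIs; it is not code taken from `train_hpu_lazy.py` or `train_hpu_graph.py`:

```python
import habana_frameworks.torch.core as htcore
import habana_frameworks.torch.hpu as hthpu

def lazy_train_step(model, batch, target, criterion, optimizer):
    # Lazy mode: ops are recorded and only executed when mark_step() flushes
    # the accumulated graph to the device.
    optimizer.zero_grad()
    loss = criterion(model(batch), target)
    loss.backward()
    htcore.mark_step()  # flush the backward graph
    optimizer.step()
    htcore.mark_step()  # flush the optimizer graph
    return loss

# Graph mode instead captures the module once and replays the cached graph on
# later calls, e.g. (inference form):
# model = hthpu.wrap_in_hpu_graph(model)
```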


## Multi-HPU training

Here we show how to fine-tune [timm/resnet50.a1_in1k](https://huggingface.co/timm/resnet50.a1_in1k) on the [imagenette2-320 dataset](https://huggingface.co/datasets/johnowhitaker/imagenette2-320) from Hugging Face.

### Training with HPU lazy mode
```bash
torchrun --nnodes 1 --nproc_per_node 2 \
train_hpu_lazy.py \
--data-dir ./ \
--dataset hfds/johnowhitaker/imagenette2-320 \
--device 'hpu' \
--model resnet50.a1_in1k \
--train-split train \
--val-split train \
--dataset-download
```
### Training with HPU graph mode

```bash
torchrun --nnodes 1 --nproc_per_node 2 \
train_hpu_graph.py \
--data-dir ./ \
--dataset hfds/johnowhitaker/imagenette2-320 \
--device 'hpu' \
--model resnet50.a1_in1k \
--train-split train \
--val-split train \
--dataset-download
```

Example results for lazy mode are shown below:

```bash
Train: 0 [ 0/36 ( 3%)] Loss: 6.88 (6.88) Time: 10.036s, 25.51/s (10.036s, 25.51/s) LR: 1.000e-05 Data: 0.762 (0.762)
Test: [ 0/15] Time: 7.796 (7.796) Loss: 6.915 ( 6.915) Acc@1: 0.000 ( 0.000) Acc@5: 0.000 ( 0.000)
Test: [ 15/15] Time: 1.915 (1.263) Loss: 6.847 ( 6.818) Acc@1: 0.000 ( 0.000) Acc@5: 0.000 ( 0.688)

Train: 1 [ 0/36 ( 3%)] Loss: 6.84 (6.84) Time: 6.687s, 38.28/s (6.687s, 38.28/s) LR: 2.001e-02 Data: 0.701 (0.701)
Test: [ 0/15] Time: 1.315 (1.315) Loss: 2.463 ( 2.463) Acc@1: 14.062 ( 14.062) Acc@5: 48.828 ( 48.828)
Test: [ 15/15] Time: 0.020 (0.180) Loss: 1.812 ( 1.982) Acc@1: 52.326 ( 32.934) Acc@5: 66.279 ( 75.064)

```



## Single-HPU inference

Here we show how to run inference with [timm/resnet50.a1_in1k](https://huggingface.co/timm/resnet50.a1_in1k) on the [imagenette2-320 dataset](https://huggingface.co/datasets/johnowhitaker/imagenette2-320) from Hugging Face.

### HPU with graph mode
```bash
python inference.py \
--data-dir='./' \
--dataset hfds/johnowhitaker/imagenette2-320 \
--device='hpu' \
--model resnet50.a1_in1k \
--split train \
--graph_mode
```

### HPU with lazy mode
```bash
python inference.py \
--data-dir='./' \
--dataset hfds/johnowhitaker/imagenette2-320 \
--device='hpu' \
--model resnet50.a1_in1k \
--split train
```


