limitations under the License.
-->

# Image to Text Examples

This directory contains a script that showcases how to perform image-to-text generation on Intel® Gaudi® AI Accelerators.

## Single-HPU inference

Models that have been validated:
- [nlpconnect/vit-gpt2-image-captioning](https://huggingface.co/nlpconnect/vit-gpt2-image-captioning)
- [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)
- [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)
- [llava-hf/llava-1.5-13b-hf](https://huggingface.co/llava-hf/llava-1.5-13b-hf)
- [llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)
- [llava-hf/llava-v1.6-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf)
- [llava-hf/llava-v1.6-vicuna-13b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf)
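
For a sense of what `run_pipeline.py` does internally, here is a minimal sketch of running an image-to-text pipeline on HPU (illustrative only; it assumes `optimum-habana` is installed, reuses the model and image URL from the example below, and omits the HPU-graph wrapping, warm-up, and benchmarking the actual script performs):

```python
import torch
from transformers import pipeline

# Patch Transformers classes with their Gaudi-optimized counterparts.
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()

# Build an image-to-text pipeline on the HPU device in BF16.
generator = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-large",
    torch_dtype=torch.bfloat16,
    device="hpu",
)

# Caption a sample image; a local path or a URL both work here.
print(generator("https://ankur3107.github.io/assets/images/image-captioning-example.png"))
```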

### Inference with BF16

To run Salesforce/blip-image-captioning-large inference, use the following command:
```bash
python3 run_pipeline.py \
    --model_name_or_path Salesforce/blip-image-captioning-large \
    --image_path "https://ankur3107.github.io/assets/images/image-captioning-example.png" \
    --use_hpu_graphs \
    --bf16
```

To run Llava-1.5-7b inference, use the following command:
```bash
python3 run_pipeline.py \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --use_hpu_graphs \
    --bf16
```

To run Llava-1.5-13b inference, use the following command:
```bash
python3 run_pipeline.py \
    --model_name_or_path llava-hf/llava-1.5-13b-hf \
    --use_hpu_graphs \
    --bf16
```

To run Llava-v1.6-mistral-7b inference, use the following command:
```bash
python3 run_pipeline.py \
    --model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
    --use_hpu_graphs \
    --bf16
```

To run Llava-v1.6-vicuna-13b inference, use the following command:
```bash
python3 run_pipeline.py \
    --model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
    --use_hpu_graphs \
    --bf16
```
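
All of the commands above pass `--use_hpu_graphs`, which captures the model's forward pass as an HPU graph so repeated calls skip most host-side launch overhead. Roughly, the script relies on an API like the following (a sketch; the import path follows Habana's HPU Graphs documentation and may differ across SynapseAI releases):

```python
import torch
from transformers import AutoModelForVision2Seq
from habana_frameworks.torch.hpu import wrap_in_hpu_graph

# Load the model in BF16 and move it to the HPU.
model = AutoModelForVision2Seq.from_pretrained(
    "Salesforce/blip-image-captioning-large", torch_dtype=torch.bfloat16
).to("hpu")

# Record the forward pass once, then replay the cached HPU graph on
# later calls with identical input shapes.
model = wrap_in_hpu_graph(model)
```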

### Inference with FP8

Inference for Llava-1.5-7b, Llava-1.5-13b, Llava-v1.6-mistral-7b and Llava-v1.6-vicuna-13b in FP8 precision is enabled using the Habana Quantization Toolkit (HQT), which provides model measurement and quantization capabilities in PyTorch.

More information on enabling FP8 in SynapseAI is available here:
https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_FP8.html

Here is an example to measure the tensor quantization statistics on Llava-1.5-7b:
```bash
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16
```

Here is an example to quantize the model based on previous measurements for Llava-1.5-7b:
```bash
QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_pipeline.py \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16
```

Here is an example to measure the tensor quantization statistics on Llava-v1.6-mistral-7b:
```bash
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
    --model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16
```

Here is an example to quantize the model based on previous measurements for Llava-v1.6-mistral-7b:
```bash
QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_pipeline.py \
    --model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16
```

Here is an example to measure the tensor quantization statistics on Llava-v1.6-vicuna-13b:
```bash
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
    --model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16
```

Here is an example to quantize the model based on previous measurements for Llava-v1.6-vicuna-13b:
```bash
QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_pipeline.py \
    --model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16
```
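
For context on what `QUANT_CONFIG` does, here is a rough, self-contained sketch of the HQT flow the script follows (an assumption based on the HQT-era optimum-habana examples; `habana_quantization_toolkit` ships with SynapseAI and its API may change between releases):

```python
import os

import torch
from transformers import pipeline

from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# QUANT_CONFIG points at a measurement or quantization JSON, e.g.
# ./quantization_config/maxabs_measure.json or maxabs_quant.json.
assert os.environ.get("QUANT_CONFIG"), "set QUANT_CONFIG before running"

adapt_transformers_to_gaudi()
generator = pipeline(
    "image-to-text",
    model="llava-hf/llava-1.5-7b-hf",
    torch_dtype=torch.bfloat16,
    device="hpu",
)

import habana_quantization_toolkit  # assumed HQT entry point

# Instrument the model: measurement hooks or FP8 ops, per QUANT_CONFIG.
habana_quantization_toolkit.prep_model(generator.model)

print(generator("https://llava-vl.github.io/static/images/view.jpg"))

# In measurement mode, write the collected statistics to disk.
habana_quantization_toolkit.finish_measurements(generator.model)
```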

### Inference with FusedSDPA

Habana FusedSDPA is a fused and optimized implementation of `torch.nn.functional.scaled_dot_product_attention()` for Gaudi. For more details, refer to the [Gaudi online documentation](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Optimization_in_PyTorch_Models.html?highlight=fusedsdpa#using-fused-scaled-dot-product-attention-fusedsdpa). Currently, FusedSDPA works with BF16 precision for Llava models.
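
As a rough illustration, here is a standalone sketch comparing stock PyTorch SDPA with the Gaudi fused kernel (the `FusedSDPA` import path and positional arguments follow the Gaudi documentation linked above; treat them as assumptions and check the docs for your SynapseAI version):

```python
import torch
import torch.nn.functional as F
from habana_frameworks.torch.hpex.kernels import FusedSDPA

# Toy attention inputs: (batch, heads, seq_len, head_dim) in BF16 on HPU.
q = torch.randn(1, 8, 128, 64, dtype=torch.bfloat16, device="hpu")
k = torch.randn(1, 8, 128, 64, dtype=torch.bfloat16, device="hpu")
v = torch.randn(1, 8, 128, 64, dtype=torch.bfloat16, device="hpu")

# Stock PyTorch implementation.
out_ref = F.scaled_dot_product_attention(q, k, v)

# Gaudi fused kernel, a drop-in replacement for the call above;
# the extra positional args are (attn_mask, dropout_p, is_causal).
out_fused = FusedSDPA.apply(q, k, v, None, 0.0, False)
```

In `run_pipeline.py`, the swap is enabled with the `--use_flash_attention` flag shown in the commands below rather than by calling the kernel directly.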

Use the following command to run Llava-1.5-7b inference with FusedSDPA:
```bash
python3 run_pipeline.py \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16 \
    --use_flash_attention
```

Use the following command to run Llava-v1.6-mistral-7b inference with FusedSDPA:
```bash
python3 run_pipeline.py \
    --model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
    --image_path "https://llava-vl.github.io/static/images/view.jpg" \
    --use_hpu_graphs \
    --bf16 \
    --use_flash_attention
```