Update supported dtypes for fp8 (#1573)
jainapurva authored Jan 17, 2025
1 parent eea4d25 commit f520c91
6 changes: 3 additions & 3 deletions torchao/quantization/README.md
@@ -156,7 +156,7 @@ from torchao.quantization import quantize_, float8_weight_only
 quantize_(model, float8_weight_only())
 ```

-This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
+Supports all dtypes for original weight and activation. This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
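For intuition, the weight-only float8 path above boils down to scaling each weight tensor into the fp8 e4m3 range and scaling back up at matmul time. Here is a minimal pure-Python sketch of that scaling arithmetic; the function names are illustrative, not torchao APIs, and real fp8 would additionally round each scaled value to the nearest representable e4m3 number:

```python
# Sketch of fp8 (e4m3) weight-only scaling -- illustrative, not the torchao implementation.
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def fp8_scale(weights):
    """Per-tensor scale that maps the largest |w| onto the fp8 e4m3 range."""
    amax = max(abs(w) for w in weights)
    return amax / E4M3_MAX if amax > 0 else 1.0

def quant_dequant(weights):
    """Round-trip: scale down into fp8 range, then scale back up.

    Real fp8 would also round w / s to the nearest e4m3 value; this sketch
    models only the scaling, so the round-trip here is lossless.
    """
    s = fp8_scale(weights)
    return [(w / s) * s for w in weights]

w = [0.5, -2.0, 896.0]
print(fp8_scale(w))      # 896.0 / 448.0 = 2.0
print(quant_dequant(w))  # [0.5, -2.0, 896.0]
```

Because the scale is chosen from the tensor's absolute maximum, any dtype of original weight (fp32, fp16, bf16) can be mapped into the fp8 range, which is what the "supports all dtypes" note refers to.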

#### A8W8 Float8 Dynamic Quantization with Tensorwise Scaling

@@ -166,7 +166,7 @@ from torchao.quantization import quantize_, float8_dynamic_activation_float8_wei
 quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerTensor()))
 ```

-This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
+Supports all dtypes for original weight and activation. This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.

#### A8W8 Float8 Dynamic Quantization with Rowwise Scaling

@@ -176,7 +176,7 @@ from torchao.quantization import quantize_, PerRow, float8_dynamic_activation_fl
 quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerRow()))
 ```

-This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
+Per-row scaling is only supported for bfloat16 weight and activation. This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
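The tensorwise vs rowwise distinction above comes down to how many scales are kept per weight matrix. The following pure-Python sketch contrasts the two granularities; it is illustrative only (torchao's `PerTensor`/`PerRow` handle this internally on GPU tensors):

```python
# Illustrative contrast of per-tensor vs per-row fp8 scaling granularity.
E4M3_MAX = 448.0  # largest finite float8 e4m3 value

def per_tensor_scale(matrix):
    """One scale for the whole matrix: cheapest, but a single
    large-magnitude row squeezes every other row into few fp8 levels."""
    amax = max(abs(x) for row in matrix for x in row)
    return amax / E4M3_MAX

def per_row_scales(matrix):
    """One scale per row: each row uses the full fp8 range independently,
    at the cost of storing and applying a vector of scales."""
    return [max(abs(x) for x in row) / E4M3_MAX for row in matrix]

m = [[0.5, -1.0],       # small-magnitude row
     [448.0, -896.0]]   # large-magnitude row
print(per_tensor_scale(m))  # 896.0 / 448.0 = 2.0 (dominated by the large row)
print(per_row_scales(m))    # [1.0 / 448.0, 2.0]
```

With a single tensorwise scale, the small-magnitude row is divided by the outlier-driven scale and loses precision; per-row scales avoid that, which is why rowwise scaling can be more accurate despite its dtype restriction.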

#### A16W6 Floating Point WeightOnly Quantization

