Update supported dtypes for fp8 (#1573)
jainapurva authored Jan 17, 2025
1 parent eea4d25 commit f520c91
6 changes: 3 additions & 3 deletions torchao/quantization/README.md
@@ -156,7 +156,7 @@ from torchao.quantization import quantize_, float8_weight_only
 quantize_(model, float8_weight_only())
 ```

-This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
+Supports all dtypes for original weight and activation. This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
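For intuition, the weight-only float8 path above boils down to scaling each weight tensor into the fp8 e4m3 range and scaling back up at matmul time. Here is a minimal pure-Python sketch of that scaling arithmetic; the function names are illustrative, not torchao APIs, and real fp8 would additionally round each scaled value to the nearest representable e4m3 number:

```python
# Sketch of fp8 (e4m3) weight-only scaling -- illustrative, not the torchao implementation.
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def fp8_scale(weights):
    """Per-tensor scale that maps the largest |w| onto the fp8 e4m3 range."""
    amax = max(abs(w) for w in weights)
    return amax / E4M3_MAX if amax > 0 else 1.0

def quant_dequant(weights):
    """Round-trip: scale down into fp8 range, then scale back up.

    Real fp8 would also round w / s to the nearest e4m3 value; this sketch
    models only the scaling, so the round-trip here is lossless.
    """
    s = fp8_scale(weights)
    return [(w / s) * s for w in weights]

w = [0.5, -2.0, 896.0]
print(fp8_scale(w))      # 896.0 / 448.0 = 2.0
print(quant_dequant(w))  # [0.5, -2.0, 896.0]
```

Because the scale is chosen from the tensor's absolute maximum, any dtype of original weight (fp32, fp16, bf16) can be mapped into the fp8 range, which is what the "supports all dtypes" note refers to.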

#### A8W8 Float8 Dynamic Quantization with Tensorwise Scaling

@@ -166,7 +166,7 @@ from torchao.quantization import quantize_, float8_dynamic_activation_float8_wei
 quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerTensor()))
 ```

-This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
+Supports all dtypes for original weight and activation. This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.

#### A8W8 Float8 Dynamic Quantization with Rowwise Scaling

@@ -176,7 +176,7 @@ from torchao.quantization import quantize_, PerRow, float8_dynamic_activation_fl
 quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerRow()))
 ```

-This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
+Per-row scaling is only supported for bfloat16 weight and activation. This API is only tested on H100. Hardware with CUDA compute capability 8.9 or greater is required.
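The tensorwise vs rowwise distinction above comes down to how many scales are kept per weight matrix. The following pure-Python sketch contrasts the two granularities; it is illustrative only (torchao's `PerTensor`/`PerRow` handle this internally on GPU tensors):

```python
# Illustrative contrast of per-tensor vs per-row fp8 scaling granularity.
E4M3_MAX = 448.0  # largest finite float8 e4m3 value

def per_tensor_scale(matrix):
    """One scale for the whole matrix: cheapest, but a single
    large-magnitude row squeezes every other row into few fp8 levels."""
    amax = max(abs(x) for row in matrix for x in row)
    return amax / E4M3_MAX

def per_row_scales(matrix):
    """One scale per row: each row uses the full fp8 range independently,
    at the cost of storing and applying a vector of scales."""
    return [max(abs(x) for x in row) / E4M3_MAX for row in matrix]

m = [[0.5, -1.0],       # small-magnitude row
     [448.0, -896.0]]   # large-magnitude row
print(per_tensor_scale(m))  # 896.0 / 448.0 = 2.0 (dominated by the large row)
print(per_row_scales(m))    # [1.0 / 448.0, 2.0]
```

With a single tensorwise scale, the small-magnitude row is divided by the outlier-driven scale and loses precision; per-row scales avoid that, which is why rowwise scaling can be more accurate despite its dtype restriction.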

#### A16W6 Floating Point WeightOnly Quantization

