int8 dynamic prefill weight only decode #1436

jcaip · 2024-12-18T19:02:59Z

This PR adds a weight_only_decode option to int8_dynamic_activation_int8_weight. When set, matmuls of shape (> 1, x) * (x, n) use dynamic quantization (the prefill case), and the batch_size=1 case (decode) uses weight-only quantization.
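To make the shape-based dispatch concrete, here is a minimal, framework-free sketch. It is not the torchao implementation: the helper names (quantize_rows_int8, int8_linear), the symmetric per-row scaling, and the (n, x) weight layout are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows_int8(m: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row int8 quantization (hypothetical helper).

    Returns the integer rows and one float scale per row.
    """
    q_rows, scales = [], []
    for row in m:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(
    activations: Matrix,
    w_q: List[List[int]],
    w_scales: List[float],
    weight_only_decode: bool = True,
) -> Tuple[Matrix, str]:
    """Compute activations @ W^T with int8 weights stored as (n, x).

    Assumed dispatch rule, mirroring the PR description:
      - one activation row (batch_size=1, decode): weight-only path,
        activations stay in floating point;
      - more than one row (prefill): dynamic path, activations are
        quantized per row at call time and multiplied as integers.
    Returns the output and the name of the path taken.
    """
    if weight_only_decode and len(activations) == 1:
        path = "weight_only"
        out = [
            [sum(a * q for a, q in zip(act_row, w_row)) * w_s
             for w_row, w_s in zip(w_q, w_scales)]
            for act_row in activations
        ]
    else:
        path = "dynamic"
        a_q, a_scales = quantize_rows_int8(activations)
        out = [
            [sum(x * y for x, y in zip(a_row, w_row)) * a_s * w_s
             for w_row, w_s in zip(w_q, w_scales)]
            for a_row, a_s in zip(a_q, a_scales)
        ]
    return out, path


# Weights are quantized once, ahead of time; each row is one output channel.
weight = [[1.0, -2.0, 0.5], [0.25, 0.5, -1.0]]
w_q, w_scales = quantize_rows_int8(weight)

prefill_out, prefill_path = int8_linear([[1.0, 2.0, 3.0], [-0.5, 0.25, 1.0]], w_q, w_scales)
decode_out, decode_path = int8_linear([[1.0, 2.0, 3.0]], w_q, w_scales)
print(prefill_path, decode_path)
```

In this sketch the flag only changes which path a single-row input takes; with weight_only_decode=False every call goes through the dynamic path.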

It also updates generate.py to take in a text file for the prompt. We use this to demonstrate the prefill speedups with sh demo_summarize.sh.
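One way a script could accept either a literal prompt or a path to a text file is sketched below. The helper name load_prompt and its behavior are assumptions for illustration; the actual argument handling in generate.py may differ.

```python
import os


def load_prompt(prompt: str) -> str:
    """Return the file's contents if `prompt` names an existing file,
    otherwise return `prompt` unchanged (hypothetical helper)."""
    if os.path.isfile(prompt):
        with open(prompt, "r", encoding="utf-8") as f:
            return f.read().strip()
    return prompt
```

A long document passed this way gives a long prefill, which is the case where the dynamic-quantization path is exercised.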

Summary: This PR adds a sparsity option to the LLaMa benchmarks.

pytorch-bot · 2024-12-18T19:03:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1436

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b144a53 with merge base 567cb46:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jcaip added 30 commits October 18, 2024 11:05

Add sparsity flag to benchmark

f390fd9

Summary: This PR adds in a sparsity option to the LLaMa benchmarks. Test Plan: Reviewers: Subscribers: Tasks: Tags:

update

67937a9

update

6b62266

fp8 testing

aa4c9df

fp8 testing

6b1ede1

wip

3c07c40

update benchmark script

a6c7de9

update

3660766

wip

ddf2e10

udpate

ad4d3b0

update

653587e

wip

c757357

wip

f1b0841

test

afeaff5

wip

c294765

update

803e9b3

fix

eb18850

wip

2642212

move out of aqt

4eccdb9

wip

13e6fd6

moved float8+24 to it's own file

608d70c

Merge branch 'main' into jcaip/sparse-benchmarking-updates

b1f1796

update

30a4fac

wip

6091592

remove float8 for now

17f9121

wip

75d0a0b

fix

b2fba99

fix

ba5665d

time prefill by default

4fdfa7b

update

111babc

facebook-github-bot added the CLA Signed label Dec 18, 2024

jcaip added 19 commits December 25, 2024 09:21

update

38d60c7

update

97cca7a

merge main

525053b

fix merge confligt

4da1b31

demo

2517406

update

5b8a28c

update generate

e25b30c

moved summarization to standalone script

a58e0fd

update

ea5cb0c

update weight only decode flag

17a191a

remove prompt.txt

8899435

cleanup

a3056ff

remove moby.txt

67a1a35

update

1554a8c

update

5161364

update

562191f

update benchmars

bf18806

rename arg

89f03d8

update demo script

ce58e1e

jcaip changed the title ~~Jcaip/prefill 24 sparse benchmarking~~ int8 dynamic prefill weight only decode Dec 30, 2024

formatting

b144a53

jcaip added the topic: improvement and topic: performance labels Dec 30, 2024

jcaip requested review from jerryzh168 and HDCharles December 30, 2024 18:36

drisspg approved these changes Dec 30, 2024

jcaip merged commit 52b6f4d into main Dec 30, 2024
19 of 20 checks passed
