
Neo Quantization Fixes #2724

Merged · 2 commits · Feb 6, 2025
Conversation

a-ys (Contributor) commented on Feb 5, 2025

Description

Various fixes for neo fp8 CI:

  • Update llm/prepare.py to correctly pass the option for FP8 quantization.
  • Perform tokenization for FP8 on our side, as we previously did with AutoFP8. Delegating tokenization to llmcompressor causes redundant tokenization, which leads to timeouts in CI.
  • Use lmi-dist-venv to run AWQ quantization, due to an incompatible HF Transformers version.
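The pre-tokenization change can be sketched roughly as follows. This is a minimal illustration of the pattern (tokenize the calibration set once, up front, and hand ready-made input ids to the quantizer), with a toy whitespace tokenizer standing in for the real HF tokenizer; the function names and dataset shape are illustrative, not the actual llm/prepare.py code:

```python
def toy_tokenize(text, vocab):
    # Stand-in for a real HF tokenizer: map whitespace-separated tokens
    # to integer ids, assigning a fresh id to each unseen token.
    ids = []
    for tok in text.split():
        if tok not in vocab:
            vocab[tok] = len(vocab)
        ids.append(vocab[tok])
    return ids

def prepare_calibration(samples, vocab, max_len=8):
    # Tokenize and truncate each calibration sample exactly once, so the
    # downstream quantization step receives pre-built input ids and does
    # no tokenization of its own (the redundant pass that caused the
    # CI timeout when it was delegated to llmcompressor).
    return [{"input_ids": toy_tokenize(s, vocab)[:max_len]} for s in samples]

vocab = {}
calib = prepare_calibration(["hello world", "hello quantization"], vocab)
# calib is now a list of dicts with an "input_ids" field, the shape a
# calibration-driven quantizer typically consumes.
```

With a real tokenizer, `toy_tokenize` would be replaced by a `transformers.AutoTokenizer` call with truncation enabled; the point is only that the expensive tokenization happens once, on our side.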

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Feature/Issue validation/testing

a-ys added 2 commits February 5, 2025 19:23
- for fp8, do dataset prep and tokenization on our side rather than
  through llmcompressor. prevents timeout.
- use lmi-dist venv for awq, due to incompatible hf transformers version
@a-ys a-ys requested review from zachgk and a team as code owners February 5, 2025 23:31
@siddvenk siddvenk merged commit 772f17f into deepjavalibrary:master Feb 6, 2025
9 checks passed