
Update Benchmarks and Documentation for GraniteCausalLM #86

Merged
8 commits merged on Oct 1, 2024

Conversation

fabianlim (Contributor) commented Sep 26, 2024

In this PR we update the benchmarks for GraniteCausalLM.

  • In addition, the README.md is updated to describe how a new model can be added in the future.
  • NOTE: we did not update the GPTQ results in the benchmarks; this will possibly be done at a later time.

Note that this PR requires the following dependency updates (a version-check sketch follows the list):

  • transformers>=4.45: required for GraniteCausalLM.
  • accelerate>=0.34.1: required by transformers>=4.45 if GraniteCausalLM is needed.
  • trl>0.11.1: when using baseline BNB, this pulls in the fix for a bug introduced in transformers==4.45 (Fix Inconsistency with IsShardedQLoRA Setting, huggingface/trl#2089).
  • bitsandbytes==0.43.3: it seems that later versions give segmentation-fault errors.
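A minimal sketch of how these pins could be sanity-checked at runtime; the bounds come from the list above (not from any setup.py), and `packaging` is an assumed extra dependency:

```python
# Hypothetical runtime check for the dependency pins listed above.
# The version bounds are taken from this PR description.
from importlib.metadata import version
from packaging.version import Version

assert Version(version("transformers")) >= Version("4.45"), "needed for GraniteCausalLM"
assert Version(version("accelerate")) >= Version("0.34.1"), "required by transformers>=4.45"
assert Version(version("trl")) > Version("0.11.1"), "pulls in the IsShardedQLoRA fix (trl#2089)"
assert Version(version("bitsandbytes")) == Version("0.43.3"), "later versions segfault"
```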

Known issues with quantized PEFT:

  • single GPU w/o FOAK
  • single GPU w/ FOAK -> fused LoRA dequant problem (an issue with the compiled binaries in bitsandbytes 0.43.3, which may be incompatible with the CUDA toolkit or torch version)
  • multi GPU w/o FOAK -> rank 1 stuck at prepare_model (resolved by disabling low_cpu_mem_mode; see the sketch after this list)
  • multi GPU w/ FOAK -> meta device problem (see item 2 in Distributed Training Problems for QLoRA models with Transformers pre-release 4.45 #83; also resolved by disabling low_cpu_mem_mode)
  • bad loss with BNB+FOAK -> resolved by updating the LoRA fused ops to support bias
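The low_cpu_mem_mode workaround mentioned above, as a minimal sketch. This assumes it corresponds to the `low_cpu_mem_usage` flag of `from_pretrained` in transformers; the checkpoint name is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ibm/granite-model",  # hypothetical checkpoint name
    torch_dtype=torch.bfloat16,
    # Assumption: disabling "low_cpu_mem_mode" maps to turning off
    # low_cpu_mem_usage, avoiding the rank-1 hang at prepare_model
    # and the meta-device problem described above.
    low_cpu_mem_usage=False,
)
```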

Performance

Overall, we see impressive improvements with the kernels.

FULL FT
[benchmark results image]

PEFT
[benchmark results image]

Quantized PEFT (BNB)
[benchmark results image]

wynterl (Contributor) commented Sep 27, 2024

awesome, great results @fabianlim

raghukiran1224 commented

Indeed, awesome results @fabianlim!

fabianlim (Contributor, Author) commented Sep 27, 2024

@wynterl @raghukiran1224 the loss for BNB + fused ops looks problematic and needs more debugging. Update: I found that it is because Granite has a bias in its Linear layers, but the FOAK kernels do not support bias. This just requires some minor (but tedious) modifications (see the sketch below).
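A plain-PyTorch sketch of the fix described above (not the actual FOAK Triton kernel): the fused LoRA forward must carry the base layer's bias through, otherwise models like Granite, whose Linear layers have a bias, train with a wrong loss.

```python
import torch

def lora_forward_with_bias(x, W, bias, A, B, scaling):
    # Base projection: x @ W^T (+ bias). Omitting the bias here is exactly
    # what produced the bad loss with BNB + FOAK on Granite.
    base = x @ W.t()
    if bias is not None:
        base = base + bias
    # LoRA update: scaling * (x @ A^T) @ B^T, with A: (r, in), B: (out, r).
    return base + (x @ A.t()) @ B.t() * scaling
```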
