
FSDP Compatible, Sequential SparseGPT #1947

Merged · 3 commits into sgpt_fsdp from sgpt_sequential · Jan 10, 2024

Conversation

@Satrat commented Jan 10, 2024

Adding back the sequential_update flag to SparseGPTModifier and WandaModifier. Previously, this flag controlled whether modules within a transformer block were calibrated sequentially; regardless of the flag, OBCQ was always performed sequentially across the different transformer blocks.

In the updated FSDP-compatible implementation, modules within a transformer block are always calibrated in parallel, with no option for sequential module calibration. The sequential_update flag now controls whether the transformer blocks themselves are processed sequentially. Note that running sequential updates requires significantly more computation: the full model forward pass is run num_calibration_samples times for each transformer block.
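
For reference, a minimal recipe sketch of where this flag is set; the stage name and numeric values below are illustrative placeholders, not taken from this PR:

```yaml
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5            # illustrative target sparsity
      block_size: 128          # illustrative OBCQ block size
      sequential_update: true  # process transformer blocks one at a time
```

With sequential_update: true, each transformer block is calibrated on activations produced by the blocks already compressed before it, which is what drives the extra forward passes noted above.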

@Satrat marked this pull request as ready for review January 10, 2024 19:35
@Satrat requested review from rahul-tuli and bfineran January 10, 2024 19:41
@Satrat merged commit 80983e5 into sgpt_fsdp Jan 10, 2024
@Satrat deleted the sgpt_sequential branch January 10, 2024 20:32
bfineran pushed a commit that referenced this pull request Jan 11, 2024
* initial recipe re-loading

* loading for input recipe

* persist structure across recipe loads

* clean up fn names

* clean up duplicated code

* delete extra file

* unit tests

* fix failing test

* quantization edge cases

* quant tests

* fixes for stage name clashes

* clean up documentation

* setup StageRunner class

* running one_shot from text_gen script

* cleanup helper fns

* precision support

* formatting

* WIP for alternating

* fixing device issue

* MVP for alternating flows

* add apply flag during finalization as well

* clarity comments

* clean up docstrings

* fix unit test

* WIP FSDP support

* fix for unwrapping

* WIP for state reloading between stages

* example fsdp config updates

* add finetuning README

* fix for 2nd oneshot stage

* cleaning up stage logic

* mvp for single GPU fsdp

* WIP obcq FSDP

* quality

* sgpt wrapper

* clean up on finalize, improve logging

* cleaning up device args

* merge alternating

* WIP alternating

* fsdp compatible, training loss issue

* fix for loss bug

* fixing checkpoint issue

* fix for quantization saving

* cleanup after merge

* unit test fix

* clean up logging

* move FSDP to helper files

* update docstrings, clean up

* fix circular import

* unmodify example

* fix typo!

* setup FSDP for when starting from oneshot

* update setup and readme

* fix CLI issue, update README

* POC for sequential FSDP OBCQ (#1947)

* fix GHA line lost in merge

* fix calib loading

* fix dependencies

* reverting OBCQ merged changes for now

* restore SparseCausalModel for now

* add progress bar for calibration forward pass (#1950)
