Add ExpertParallel Mixture-of-Experts Plugin (#99)
* initial commit
* include prepare_scattermoe
* fixes and add scenarios-moe. Allow gradient_accum=null mode
* missed out on CONTENTS.yaml
* update readme, code cleanup, add comments and initial bench
* more cleanup and update pf bench
* add more comments and minor refactoring
* finish up comments
* add padding free to granite moe
* fmt and lint.
* install workflow + more fmt + fix test
* go back to dtensors for sharded checkpoints
* add scattermoe checkpoint restorer utility
* fmt + lint
* more cleanup
* improved documentation on state dict inference
* add more tests on inferring checkpoint metadata
* update configs for mixtral
* update granite configs
* fix readme and update GraniteMoE to FOAK
* commit benches

---------

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>