Commit

update readme

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
fabianlim committed Aug 23, 2024
1 parent 41cc851 commit 38811c3
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion plugins/accelerated-moe/README.md
@@ -3,6 +3,19 @@
This library contains plugins to accelerate finetuning with the following optimizations:
1. Expert-Parallel MoE with Megablocks

## Plugins

Plugin | Description | Depends | Loading | Augmentation | Callbacks
--|--|--|--|--|--
[megablocks](./src/fms_acceleration_moe/framework_plugin_megablocks.py) | MoE Expert Parallel with megablocks | megablocks | ✅ | | ✅


## Running Benchmarks

```
tox -e run-benches -- 8 8 scenarios.yaml accelerated-moe-megablocks
```

## Expert-Parallel MoE with Megablocks

Not all features of `megablocks` are incorporated; some restrictions of the current integration are listed below:
@@ -12,7 +25,6 @@
- `shard_moe` may not scale well to larger models, because the current implementation `torch.concat`s all the expert weights together before passing them to `torch.distributed` to be sharded. This is done redundantly on every device, so it is inefficient; a minimal sketch of this pattern is shown after this list.
- currently only `StateDictType.SHARDED_STATE_DICT` is supported, because the implementation uses `DTensors`, which have limited support for full state dicts. Sharded state dicts are, however, also the most efficient option.
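
To illustrate the scaling concern above, here is a minimal sketch of the concat-then-shard pattern using `DTensor` from `torch.distributed`. It is not the plugin's actual `shard_moe` implementation; the function and variable names are illustrative only.

```
import torch
from torch.distributed._tensor import Shard, distribute_tensor


def shard_experts_naive(expert_weights, mesh):
    # expert_weights: a list of per-expert weight tensors of identical shape
    # (hypothetical input; the real plugin extracts these from the model).
    # Step 1: every rank redundantly materializes the full
    # [num_experts, ...] tensor in memory.
    full = torch.cat([w.unsqueeze(0) for w in expert_weights], dim=0)
    # Step 2: hand the full tensor to DTensor, sharding dim 0 (the expert
    # dimension) across the device mesh.
    return distribute_tensor(full, mesh, placements=[Shard(0)])


# Example usage (inside an initialized process group), e.g. with recent PyTorch:
#   mesh = torch.distributed.device_mesh.init_device_mesh("cuda", (world_size,))
#   sharded = shard_experts_naive(expert_weights, mesh)
```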


### Megablocks Dependencies

Currently, databricks `megablocks` has no PyPI package or proper release, so it must be installed directly from GitHub; refer to the instructions below.
