Commit

update readme

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
fabianlim committed Aug 23, 2024
1 parent 41cc851 commit 38811c3
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion plugins/accelerated-moe/README.md
@@ -3,6 +3,19 @@
This library contains plugins to accelerate finetuning with the following optimizations:
1. Expert-Parallel MoE with Megablocks

## Plugins

Plugin | Description | Depends | Loading | Augmentation | Callbacks
--|--|--|--|--|--
[megablocks](./src/fms_acceleration_moe/framework_plugin_megablocks.py) | MoE Expert Parallel with megablocks | megablocks | ✅ | | ✅


## Running Benchmarks

```
tox -e run-benches -- 8 8 scenarios.yaml accelerated-moe-megablocks
```

## Expert-Parallel MoE with Megablocks

Not all features of `megablocks` are incorporated; some restrictions of the current integration are listed below:
@@ -12,7 +25,6 @@
- `shard_moe` may not scale well to larger models, because the current implementation `torch.concat`s all the expert weights together before passing them to `torch.distributed` to be sharded. This is done redundantly on every device, so it is inefficient; a minimal sketch of this pattern is shown after this list.
- currently only `StateDictType.SHARDED_STATE_DICT` is supported, because the implementation uses `DTensors`, which have limited support for full state dicts. Sharded state dicts are, however, also the most efficient option.
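
To illustrate the scaling concern above, here is a minimal sketch of the concat-then-shard pattern using `DTensor` from `torch.distributed`. It is not the plugin's actual `shard_moe` implementation; the function and variable names are illustrative only.

```
import torch
from torch.distributed._tensor import Shard, distribute_tensor


def shard_experts_naive(expert_weights, mesh):
    # expert_weights: a list of per-expert weight tensors of identical shape
    # (hypothetical input; the real plugin extracts these from the model).
    # Step 1: every rank redundantly materializes the full
    # [num_experts, ...] tensor in memory.
    full = torch.cat([w.unsqueeze(0) for w in expert_weights], dim=0)
    # Step 2: hand the full tensor to DTensor, sharding dim 0 (the expert
    # dimension) across the device mesh.
    return distribute_tensor(full, mesh, placements=[Shard(0)])


# Example usage (inside an initialized process group), e.g. with recent PyTorch:
#   mesh = torch.distributed.device_mesh.init_device_mesh("cuda", (world_size,))
#   sharded = shard_experts_naive(expert_weights, mesh)
```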


### Megablocks Dependencies

Currently, databricks `megablocks` has no PyPI package or proper release, so it must be installed directly from GitHub; refer to the instructions below.
