
fixes to readme and tox
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
fabianlim committed Aug 23, 2024
1 parent e00fcd0 commit cd9db22
Showing 4 changed files with 29 additions and 8 deletions.
29 changes: 26 additions & 3 deletions plugins/accelerated-moe/README.md
@@ -12,14 +12,37 @@ Plugin | Description | Depends | Loading | Augmentation | Callbacks

## Running Benchmarks

Run the command below in the top-level directory of this repo:
- the `megablocks` dependency is not included by default, so the `-x` switch installs it.

```
tox -e run-benches \
-x testenv:run-benches.deps+="-r plugins/accelerated-moe/requirements-mb.txt" \
-- \
8 8 benchmark_outputs scenarios.yaml accelerated-moe-megablocks
```

NOTE: if a `FileNotFoundError` on the *triton cache* is observed, similar to issues like these:
- https://github.com/triton-lang/triton/issues/2688

then `tox` is somehow causing problems with triton and multiprocessing (there appears to be a race condition).
The workaround is to first *activate the tox env* and then run the script manually in `bash`:
```
tox -e run-benches -- 8 8 benchmark_outputs scenarios.yaml accelerated-moe-megablocks
# if FileNotFoundError in the triton cache is observed
# - then activate the env and run the script manually
source .tox/run-benches/bin/activate
bash scripts/run_benchmarks.sh \
8 8 benchmark_outputs scenarios.yaml accelerated-moe-megablocks
```


## Expert-Parallel MoE with Megablocks

Not all of the features of `megablocks` are incorporated; the current integration has the following restrictions:
- the data parallel `dp_mesh` is currently not passed to the `FSDP` constructor, so `FSDP` will always shard over the default process group (i.e., over the full world size).
- currently only *sharded* `safetensor` non-GGUF MoE checkpoints are supported for loading. This is a reasonable assumption since MoE checkpoints are typically above the size limit that prevents them from being saved into a single checkpoint file.
- only supports the *dropless sparse* MLPs in the megablocks package; the other variations like non-dropless and grouped computes are not currently integrated.
- `shard_moe` may not scale well to larger models, as the current implementation uses `torch.concat` to combine all the expert weights before passing them to `torch.distributed` to be sharded. This is done redundantly on all devices, so it is inefficient (see the sketch after this list).
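
To make the last point concrete, below is a minimal, hypothetical sketch of the concat-then-shard pattern described above. This is not the actual `shard_moe` implementation; the function name, shapes, and usage are made up purely for illustration.

```
# Hypothetical sketch of the concat-then-shard pattern described above.
# NOT the actual `shard_moe` code; names and shapes are illustrative only.
import torch

def naive_shard_experts(expert_weights, rank, world_size):
    # every device first materializes the full concatenation of ALL expert
    # weights, even though it will only keep 1/world_size of it ...
    full = torch.cat([w.flatten() for w in expert_weights])
    # ... and then slices out its own shard, discarding the rest
    shards = torch.chunk(full, world_size)
    return shards[rank].clone()

# toy usage: 8 "experts" of 1024x1024 weights, sharded over 4 ranks
experts = [torch.randn(1024, 1024) for _ in range(8)]
local_shard = naive_shard_experts(experts, rank=0, world_size=4)
```

A more scalable approach would have each device load or receive only its own slice (e.g. via sharded checkpoint loading or a `torch.distributed` scatter), avoiding the redundant full concatenation on every rank.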
@@ -34,5 +57,5 @@ Currently databricks megablocks does not have a PyPi repository and no proper re
```
# this will install the megablocks from Github
# megablocks requires CUDA Toolkit to build.
pip install -r requirements-mb.txt
```
3 changes: 3 additions & 0 deletions plugins/accelerated-moe/requirements-mb.txt
@@ -0,0 +1,3 @@
megablocks @ git+https://github.com/databricks/megablocks.git@bce5d7b2aaf5038bc93b36f76c2baf51c2939bd2

# auto_gptq @ git+https://github.com/AutoGPTQ/AutoGPTQ.git@ea829c7bbe83561c2b1de26795b6592992373ef7

This file was deleted.

4 changes: 0 additions & 4 deletions tox.ini
@@ -41,10 +41,6 @@ commands =
python -m fms_acceleration.cli install -e {toxinidir}/plugins/attention_and_distributed_packing
python -m fms_acceleration.cli install -e {toxinidir}/plugins/accelerated-moe

# need to install some optional dependencies
# - the megablocks dependency
pip install -r {toxinidir}/plugins/accelerated-moe/requirements-mb.txt

# run the benchmark script
bash scripts/run_benchmarks.sh {posargs:"1 2" "4 8" benchmark_outputs}

