[BUG] Distributed inference OOMs on machines with different RAM sizes #1804
Comments
So the problem here is that the pipeline parallel is pretty dumb and assumes each machine has an equal amount of RAM. It divides the model evenly into three sections, and the third section is way too big for your 64GB M2 Ultra. We could do something a bit more dynamic based on the machine size to support heterogeneous machines.
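A sketch of what a RAM-proportional split could look like, assuming the per-host RAM sizes are known up front. The helper below and the example host sizes are hypothetical illustrations, not current MLX behavior:

```python
# Hypothetical sketch: split a model's layers across hosts in proportion
# to each host's RAM, instead of an even N-way split.
def partition_layers(num_layers: int, ram_gb: list[int]) -> list[int]:
    total = sum(ram_gb)
    shares = [num_layers * r / total for r in ram_gb]  # ideal fractional split
    counts = [int(s) for s in shares]
    # Hand out the layers lost to rounding, largest remainder first.
    by_remainder = sorted(range(len(ram_gb)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[: num_layers - sum(counts)]:
        counts[i] += 1
    return counts

# Assumed example: two 192GB hosts plus the 64GB M2 Ultra, 61-layer model.
print(partition_layers(61, [192, 192, 64]))  # -> [26, 26, 9] rather than ~[21, 20, 20]
```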
Is MLX doing its own sharding? I thought you needed Exo for that.
Yes, MLX can do distributed inference directly using mx.distributed. Right now, it's a lower-level API than what you can do with Exo. So it depends on what you want to do.
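As a rough illustration of that lower-level API, a pipeline hand-off between ranks might look like the sketch below. It assumes mx.distributed.send and mx.distributed.recv_like, which are present in recent MLX releases; the actual model-slice step is elided:

```python
import mlx.core as mx

group = mx.distributed.init(backend="mpi")
rank, size = group.rank(), group.size()

x = mx.zeros((1, 16))  # placeholder activations
if rank > 0:
    # Receive activations from the previous pipeline stage.
    x = mx.distributed.recv_like(x, rank - 1, group=group)
# ... run this rank's slice of the model on x here ...
if rank < size - 1:
    # Forward the result to the next stage.
    x = mx.distributed.send(x, rank + 1, group=group)
mx.eval(x)
```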
"mpirun -np 3 --hostfile hosts.txt /opt/homebrew/anaconda3/bin/python3 /Users/user/deepseek/pipeline_generate.py --prompt "Hello world"" |
Check out the getting started guide for mx.distributed. Make sure you can run that simple example. If it doesn't work, there are some tips on setting up MPI there that can help. If that works, then the above should also work; if it doesn't, let us know.
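That getting-started check is roughly the following (the file name is illustrative); every rank should print the process count:

```python
# test_mpi.py — launch with: mpirun -np 2 --hostfile hosts.txt python3 test_mpi.py
import mlx.core as mx

group = mx.distributed.init(backend="mpi")
x = mx.distributed.all_sum(mx.ones(1), group=group)
print(f"rank {group.rank()} of {group.size()}: sum = {x.item()}")
# With -np 2, both ranks should print "sum = 2.0".
```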
I have worked through the getting started guide for mx.distributed. When running `mpirun -np 2 --host host1,host2 python3 pipeline_generate.py`, there is an issue with the line `group = mx.distributed.init(backend="mpi")` in pipeline_generate.py. Has this script been updated? Can you guide me on the next steps? Thank you.
Could you share more details about the issue you are seeing?
```
mpirun -np 2 python3 pipeline_generate.py
Traceback (most recent call last):
```
Maybe try updating MLX? That line should work in a recent version.
I am using mlx 0.23.1. May I ask what the reason is? Running pipeline_generate.py always gives an error. Is the problem with the file?
Describe the bug
Running distributed inference of DeepSeek-R1-3bit on three M2 Ultra machines with different RAM sizes fails with an out-of-memory error.
Desktop:
To Reproduce
hosts.txt:
Actual behavior
The 64GB M2 Ultra runs out of memory while loading its third of the weights.
Expected behavior
Each machine loads up about 80-90% of its memory with weights and does not OOM. Inference eventually runs and produces tokens.
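A sketch of how a script could aim for that 80-90% budget at runtime, assuming mx.metal.device_info() (which reports memory_size in bytes on Apple silicon in recent MLX versions) and the 85% target below as an arbitrary choice:

```python
import mlx.core as mx

group = mx.distributed.init(backend="mpi")

# Each rank reports its own unified-memory size in GB...
local_gb = mx.array([mx.metal.device_info()["memory_size"] / 2**30])
# ...and all_gather shares it, so every rank knows the cluster layout
# and can take a weight shard sized to ~85% of its own RAM.
all_gb = mx.distributed.all_gather(local_gb, group=group)
budget_gb = 0.85 * local_gb.item()
print(f"rank {group.rank()}: {local_gb.item():.0f}GB RAM, "
      f"target shard <= {budget_gb:.0f}GB of {all_gb.sum().item():.0f}GB total")
```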
Additional context
MPI log: