collective_matmul

This unit test composes two back-to-back GEMM layers (FC1 and FC2 of an LLM MLP block). FC1 performs an all-gather followed by a GEMM (AG+GEMM), and FC2 performs a GEMM followed by a reduce-scatter (GEMM+RS).
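The data flow of the two layers can be illustrated with a minimal single-process sketch. It is not the code in collective_matmul.py: it assumes a common tensor-parallel layout (input sharded by rows, FC1 weight sharded by columns, FC2 weight sharded by rows), omits the activation function, and simulates the collectives with plain Python lists. All function names here are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_rows(m, parts):
    """Split a matrix into equal row blocks, one per simulated rank."""
    step = len(m) // parts
    return [m[i * step:(i + 1) * step] for i in range(parts)]

def split_cols(m, parts):
    """Split a matrix into equal column blocks, one per simulated rank."""
    step = len(m[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(parts)]

def all_gather(shards):
    """Simulated AG: every rank receives the row-wise concatenation of all shards."""
    full = [row for shard in shards for row in shard]
    return [full for _ in shards]

def reduce_scatter(partials):
    """Simulated RS: sum the per-rank partial results, then hand each rank one row block."""
    rows, cols = len(partials[0]), len(partials[0][0])
    total = [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(rows)]
    return split_rows(total, len(partials))

def mlp_tensor_parallel(x, w1, w2, tp):
    """FC1 as AG+GEMM, FC2 as GEMM+RS, under the assumed sharding layout."""
    x_shards = split_rows(x, tp)      # input sharded by rows (assumption)
    w1_shards = split_cols(w1, tp)    # FC1 weight sharded by columns (assumption)
    w2_shards = split_rows(w2, tp)    # FC2 weight sharded by rows (assumption)
    gathered = all_gather(x_shards)                                    # AG
    hidden = [matmul(gathered[r], w1_shards[r]) for r in range(tp)]    # FC1 GEMM
    partials = [matmul(hidden[r], w2_shards[r]) for r in range(tp)]    # FC2 GEMM
    return reduce_scatter(partials)                                    # RS

tp = 2
x = [[1, 2], [3, 4], [5, 6], [7, 8]]   # [rows=4, hidden=2]
w1 = [[1, 0, 2, 1], [0, 1, 1, 2]]      # [hidden=2, ffn=4]
w2 = [[1, 0], [0, 1], [1, 1], [2, 0]]  # [ffn=4, hidden=2]

reference = matmul(matmul(x, w1), w2)  # unsharded result
out_shards = mlp_tensor_parallel(x, w1, w2, tp)
recombined = [row for shard in out_shards for row in shard]
assert recombined == reference
```

The sketch runs the collective and the GEMM strictly one after the other, which corresponds to the non-overlapped case; overlapping would interleave communication of one chunk with computation on another.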

Running examples

175B config

python collective_matmul.py --dp 2 --tp 4

You can change dp (Data Parallel) and tp (Tensor Model Parallel) by passing different numbers in the command line above.
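As a rough illustration of how the two values combine, the sketch below maps flat device indices onto a dp × tp grid. It assumes the run uses dp × tp devices in total and that tp is the fastest-varying dimension; neither is stated in this README, and `device_grid` is a hypothetical helper, not part of collective_matmul.py.

```python
def device_grid(dp, tp):
    """Map each flat device index to hypothetical (dp_rank, tp_rank) coordinates."""
    return {rank: divmod(rank, tp) for rank in range(dp * tp)}

grid = device_grid(dp=2, tp=4)

# Group devices that would share one tensor-parallel group (same dp_rank).
tp_groups = {}
for rank, (dp_rank, tp_rank) in grid.items():
    tp_groups.setdefault(dp_rank, []).append(rank)

print(len(grid))   # 8
print(tp_groups)   # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

Under these assumptions, `--dp 2 --tp 4` would form two tensor-parallel groups of four devices each.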

To run the baseline (i.e., without overlapping), add --no_tp_overlap to the command line.

5B config

python collective_matmul.py --batch_size 4 --hidden_size 4096

The DP, TP, and overlapping arguments are configured in the same way as for the 175B config.