# collective_matmul

This unit test composes two back-to-back GEMM layers (FC1 and FC2 of LLM MLP). FC1 does AG+GEMM, and FC2 does GEMM+RS.

## Running examples

### 175B config

`python collective_matmul.py --dp 2 --tp 4`

You can change dp (Data Parallel) and tp (Tensor Model Parallel) by simply giving differen numbre to above commandline. 

To run baseline (i.e., no overlapping), add `--no_tp_overlap` in the commandline.

### 5B config

`python collective_matmul.py --batch_size 4 --hidden_size 4096`

DP, TP, and overlapping arguments are configured in the same way as 175B.