Refactor of models and trainers with base class for common methods #306

Open · wants to merge 42 commits into base: main

Changes from 26 commits

Commits (42)
2b8e301
Refactor models and trainers with base_class for common methods
PierpaoloSorbellini Mar 27, 2023
5e0ded8
Revert "Release ChatLLaMA 0.0.4"
PierpaoloSorbellini Mar 27, 2023
3fa5c53
Merge branch 'main' of https://github.com/nebuly-ai/nebullvm into main
PierpaoloSorbellini Mar 27, 2023
ab1f09e
Refactor of models and trainers with base class for common methods
PierpaoloSorbellini Mar 27, 2023
3d54d50
Fix comments and values in the config.yaml
PierpaoloSorbellini Mar 27, 2023
9f5eab4
Add load 8 bit from HF
PierpaoloSorbellini Mar 27, 2023
dc46ee4
Add check on load int 8
PierpaoloSorbellini Mar 27, 2023
c1d03d3
Add Reward and Critic support for LoRA PEFT
PierpaoloSorbellini Mar 28, 2023
36c350d
Add SelfInstruct Dataset from HF
PierpaoloSorbellini Mar 28, 2023
bb92ee7
Fix imports
Mar 28, 2023
6fc94d3
Add logging with proper class
Mar 29, 2023
dc2489f
Fix logs for deepspeed
Mar 30, 2023
0b0795d
Fix early logs with multi-GPUs
Mar 30, 2023
01be6dc
Fix MultiGPU for accelerate
Mar 30, 2023
13b1abd
Fix batch-size for accelerate
Mar 30, 2023
db8b3c2
Add multi gpu training to readme.md
Mar 30, 2023
d771fb2
Fix fp16 training
Mar 31, 2023
e5f959c
Merge branch 'main' into refactor
PierpaoloSorbellini Mar 31, 2023
d5084e5
Fix Distributed training for RLHF
PierpaoloSorbellini Apr 3, 2023
2ec5eaa
Add new models
PierpaoloSorbellini Apr 3, 2023
33e97e2
Add decapoda models
PierpaoloSorbellini Apr 3, 2023
8332a26
Add unsupported model message
PierpaoloSorbellini Apr 3, 2023
32ddfa2
Change sign of KL div according to issue #298
PierpaoloSorbellini Apr 3, 2023
aa9881c
Fix imports order
PierpaoloSorbellini Apr 3, 2023
b10f1dc
Add cases for lora-peft model loading
PierpaoloSorbellini Apr 4, 2023
86a699b
Merge branch 'refactor' of https://github.com/nebuly-ai/nebullvm into…
PierpaoloSorbellini Apr 4, 2023
1f29ba4
Fix Actor 8bit training
PierpaoloSorbellini Apr 4, 2023
1836788
Adjust code comments to match new adjustments
PierpaoloSorbellini Apr 4, 2023
966a19d
Fix device error when using vanilla pytorch training
PierpaoloSorbellini Apr 4, 2023
feacb88
Fix RLHF with fp16
PierpaoloSorbellini Apr 5, 2023
f894494
Move grad scaler into base class
PierpaoloSorbellini Apr 5, 2023
b56185f
Add check on 8bit load and distributed training
PierpaoloSorbellini Apr 5, 2023
5699aaa
Add template to self-instruct dataset
PierpaoloSorbellini Apr 12, 2023
5c83927
Fix checkpoints name in actor training
PierpaoloSorbellini Apr 12, 2023
a205ee6
Fix slow loss computation
PierpaoloSorbellini Apr 12, 2023
bb386c4
Fix checkpoints also in reward models
PierpaoloSorbellini Apr 12, 2023
22a64af
Fix checkpoint for rl
PierpaoloSorbellini Apr 12, 2023
10211c6
Add n_checkpoints for all the training with old checkpoints removal
PierpaoloSorbellini Apr 12, 2023
442b396
Improve datasets quality with reward model negative examples
PierpaoloSorbellini Apr 13, 2023
71a6c02
Merge branch 'main' of https://github.com/nebuly-ai/nebullvm into main
PierpaoloSorbellini Apr 14, 2023
1189787
Merge branch 'main' into refactor
PierpaoloSorbellini Apr 14, 2023
98b96c2
Fix merge issues
PierpaoloSorbellini Apr 14, 2023
Files changed
18 changes: 18 additions & 0 deletions apps/accelerate/chatllama/README.md
@@ -408,6 +408,24 @@ We support 3 different options to prepare the `reward_training_data`:
- **(⚠️WIP)** Few examples provided by the user and dataset synthetically expanded using LLM
</details>

## Single-Node Multi-GPU Training
Currently ChatLLaMA supports [Accelerate](https://github.com/huggingface/accelerate) and [DeepSpeed](https://github.com/microsoft/DeepSpeed) for multi-GPU training.
To run distributed training, enable one of the two backends in your `/artifacts/config/config.yaml` file by setting either
`deepspeed_enable` or `accelerate_enable` to `True`. <br />
Each type of training (i.e. reward model training, actor supervised fine-tuning, RLHF) has its own set of flags that can be tweaked.
DeepSpeed settings can be customised using the `/artifacts/config/ds_config.json` file, while Accelerate can be configured by running
```bash
accelerate config
```
from the command line.
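For example, when training the reward model with DeepSpeed, the relevant part of `config.yaml` looks roughly like the sketch below (the keys are the ones introduced in this PR; the values are only illustrative):
```yaml
reward_config:
  # enable exactly one distributed backend at a time
  deepspeed_enable: True
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  accelerate_enable: False
```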
Once the project is configured, the training can be started with:
```bash
deepspeed artifacts/main.py artifacts/config/config.yaml --type <type_of_training>
```
or
```bash
accelerate launch artifacts/main.py artifacts/config/config.yaml --type <type_of_training>
```
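depending on which of the two backends was enabled in `config.yaml`.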
# License

See the [LICENSE](https://github.com/nebuly-ai/nebullvm/blob/main/apps/accelerate/chatllama/LICENSE) file.
22 changes: 16 additions & 6 deletions apps/accelerate/chatllama/artifacts/config/config.yaml
@@ -28,14 +28,18 @@ trainer_config:
# here specify the name of the actor_rl checkpoint from which resume
# during actor RL training. If null load the last one.
checkpoint_name: null
deepspeed_enable: False
deepspeed_config_path: "artifacts/config/ds_config.json"
accelerate_enable: False

actor_config:
model: "facebook/opt-1.3b"
model: "facebook/opt-125m"
load_8bit: False
model_folder: "./models"
tokenizer_path: "path-to-tokenizer"
train_dataset_path: "./datasets/actor_training_data.json"
validation_dataset_path: null
# froze model embedding during training
# froze model embedding during training (only for llama)
froze_embeddings: True
# use fairscale layers to build the model instead of vanilla pytorch
# only for llama
@@ -51,7 +55,7 @@ actor_config:
additonal_prompt_tokens: 20
# temperature for the actor
temperature: 0.1
batch_size: 2
batch_size: 1
# number iteration after print
iteration_per_print: 1
lr: 0.000009
@@ -78,34 +82,40 @@ reward_config:
# more can be simply added in the reward.py __init__()
model: "facebook/opt-125m"
model_folder: "./models"
load_8bit: False
# hidden size of the additional ffw head to produce the scores
model_head_hidden_size: 2048
max_sequence_length: 2048
train_dataset_path: "./datasets/reward_training_data.json"
validation_dataset_path: null
batch_size: 8
batch_size: 1
epochs: 1
iteration_per_print: 1
# steps after which the checkpoint are saved
checkpoint_steps: 10000
checkpoint_steps: 200
# here specify the name of the reward checkpoint from which resume
# during reward training. If null load the last one.
checkpoint_name: null
lr: 0.000009
# deepspeed settings
deepspeed_enable: False
deepspeed_enable: True
deepspeed_config_path: "./artifacts/config/ds_config.json"
# accelerate settings
accelerate_enable: False
peft_enable: False
peft_config_path: "./artifacts/config/peft_config.yaml"

critic_config:
# model to be chosen are gp2-large, bart-base, longformer-base-4096
# more can be simply added in the reward.py __init__()
model: "facebook/opt-125m"
load_8bit: False
# hidden size of the additional ffw head to produce the scores
model_head_hidden_size: 2048
max_sequence_length: 2048
model_folder: "./models"
# here specify the name of the critic checkpoint from which resume
# during critic training. If null load the last one.
checkpoint_name: null
peft_enable: True
peft_config_path: "./artifacts/config/peft_config.yaml"
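The `peft_config_path` entries above point to `./artifacts/config/peft_config.yaml`, whose contents are not part of this diff. As a purely hypothetical sketch, a LoRA/PEFT configuration of this kind usually maps onto Hugging Face `peft.LoraConfig` hyperparameters, e.g.:
```yaml
# Hypothetical peft_config.yaml sketch; field names follow Hugging Face's
# peft.LoraConfig, and the exact schema expected by chatllama may differ.
r: 8                        # LoRA rank of the update matrices
lora_alpha: 32              # LoRA scaling factor
lora_dropout: 0.05
target_modules: ["q_proj", "v_proj"]
task_type: "CAUSAL_LM"
```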
6 changes: 6 additions & 0 deletions apps/accelerate/chatllama/artifacts/config/ds_config.json
@@ -48,5 +48,11 @@
"stage3_gather_16bit_weights_on_model_save": true,
"ignore_unused_parameters": true,
"round_robin_gradients": true
},
"comms_logger": {
"enabled": false,
"verbose": false,
"prof_all": false,
"debug": false
}
}
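The new `comms_logger` block corresponds to DeepSpeed's built-in communication logger; all of its options are added in the disabled state, so communication profiling can be turned on later without restructuring the file.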
17 changes: 14 additions & 3 deletions apps/accelerate/chatllama/artifacts/download_dataset.py
@@ -1,7 +1,11 @@
import argparse
import os

from chatllama.rlhf.dataset import AnthropicRLHF, StanfordNLPSHPDataset
from chatllama.rlhf.dataset import (
AnthropicRLHF,
SelfInstruct,
StanfordNLPSHP,
)


if __name__ == "__main__":
@@ -15,7 +19,7 @@
parser.add_argument(
"dataset_name",
help="dataset name it can be. SSHP: stanfordnlp/SHP or ",
choices=["SHP", "ARLHF"],
choices=["SHP", "ARLHF", "SI"],
)
parser.add_argument(
"-p",
@@ -40,7 +44,7 @@
raise ValueError("Number of samples should be an integer")

if args.dataset_name == "SHP":
dataset = StanfordNLPSHPDataset()
dataset = StanfordNLPSHP()
dataset.save_dataset(args.path, n_samples)

elif args.dataset_name == "ARLHF":
@@ -49,3 +53,10 @@
args.path,
n_samples,
)
elif args.dataset_name == "SI":
dataset = SelfInstruct()
dataset.save_dataset(
args.path,
n_samples,
)
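With the new `SI` choice in place, the Self-Instruct data can be downloaded with something like `python artifacts/download_dataset.py SI -p ./datasets`; the positional dataset name and the `-p` path option are the ones defined by the parser above, while the number-of-samples argument is not visible in this excerpt.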

15 changes: 10 additions & 5 deletions apps/accelerate/chatllama/artifacts/main.py
@@ -2,7 +2,6 @@

from chatllama.rlhf.actor import ActorTrainer
from chatllama.rlhf.config import Config
from chatllama.rlhf.dataset import BaseDataset
from chatllama.rlhf.reward import RewardTrainer
from chatllama.rlhf.trainer import RLTrainer

@@ -31,7 +30,16 @@
parser.add_argument(
"-r", "--reward", help="Specify reward model by name", default=None
)
parser.add_argument("--local_rank", help="Local rank parameter for deepspeed", default=None)

parser.add_argument(
"--local_rank",
type=int,
default=-1,
help="local rank passed from distributed launcher",
)

# Include DeepSpeed configuration arguments
# parser = deepspeed.add_config_arguments(parser)

# parse arguments
args = parser.parse_args()
@@ -53,15 +61,12 @@
config.critic.max_sequence_length,
)
config.actor.max_sequence_length = max_seq
BaseDataset.clean_dataset(config)
rlhf_trainer = RLTrainer(config)
rlhf_trainer.train()
elif args.type == "ACTOR":
BaseDataset.clean_dataset(config.actor)
actor_trainer = ActorTrainer(config.actor)
actor_trainer.train()
elif args.type == "REWARD":
BaseDataset.clean_dataset(config.reward)
reward_trainer = RewardTrainer(config.reward)
reward_trainer.train()
elif args.type == "ALL":