[Chatllama] support for accelerate from hf #233

PierpaoloSorbellini · 2023-03-09T09:31:38Z

No description provided.

…core

… class with comments, fix tokenizer truncate and add training scheduler

…e checkpoints of actor and reward model

…et and update model loader with a method to get the path for training statistics

…d reward models training

…ape[1] in PPO loss

apps/accelerate/chatllama/artifacts/config/config.yaml

diegofiori · 2023-03-21T13:54:29Z

apps/accelerate/chatllama/artifacts/config/ds_config.json

+      "min_loss_scale": 1
+  },
+  "zero_optimization": {
+    "stage": 2,


we should try to use stage 3 instead

I would like to, but with stage=3 is not working.

apps/accelerate/chatllama/chatllama/llama_model.py

apps/accelerate/chatllama/artifacts/main.py

apps/accelerate/chatllama/chatllama/rlhf/actor.py

apps/accelerate/chatllama/chatllama/rlhf/trainer.py

[Chatllama] fix typo of discounted_rewards in PPO loss

… into rlhf_accelerate

diegofiori

LGTM!

PierpaoloSorbellini added 7 commits March 8, 2023 14:53

Add accelerate to the trainings

d3a2b67

Fix issue with reward model init

96dbb9e

Add accelerate_enable flag to config.py and yaml

3cbff04

Add consistency check between accelerate and deepspeed

ed7aaed

Fix setup of accelerate when training

fa4c22f

Merge branch 'main' into rlhf_accelerate

f094d8f

Discard the elements from the reward dataset that are impossible to s…

9a929f0

…core

diegofiori mentioned this pull request Mar 12, 2023

[Chatllama]: MultiGPU support for training #254

Open

PierpaoloSorbellini added 3 commits March 13, 2023 10:51

Fix tokenizer ambiguity in actorcritic forward and generate

0fd741e

Add staticmethod for init of tokenizer

00c1242

Merge with main

2884650

PierpaoloSorbellini changed the title ~~support for accelerate from hf~~ [chatllama] support for accelerate from hf Mar 14, 2023

PierpaoloSorbellini changed the title ~~[chatllama] support for accelerate from hf~~ [Chatllama] support for accelerate from hf Mar 14, 2023

PierpaoloSorbellini added 10 commits March 14, 2023 11:38

Add order sorting in dataset generation, refactor of reward and actor…

b48a388

… class with comments, fix tokenizer truncate and add training scheduler

Add cleaning of the dataset for sequence length

4887040

Fix RLHF compatibility with new tokenizeradjustments

35f7aa8

Fix issue with tokenizers pad and eos token in RL

31a274a

Add scheduler to RLHF

c4ae355

QuickFix for sequence mismatch in actor and critic, Fix a bug with th…

882e74f

…e checkpoints of actor and reward model

Add discounted rewards in RLHF

4f0fdc7

Fix actor_rl save path

e44034b

Add indentation in json files

06cb02d

Merge branch 'main' into rlhf_accelerate

960f26d

This was referenced Mar 17, 2023

[Chatllama] RLHF Training - RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method #262

Closed

[Chatllama] Generate Data Error on tutorial #265

Closed

PierpaoloSorbellini and others added 5 commits March 17, 2023 17:09

Adjust for Anthropic dataset training, added training stats logging

48284ce

Add training stats logging

7c2cd43

Remove last n checkpoint

10ca238

Fix a bug in generate() of chatllama, add template to Standford datas…

d3a09d4

…et and update model loader with a method to get the path for training statistics

Fix start_step value after resuming in between two epochs in actor an…

47c9176

…d reward models training

PierpaoloSorbellini and others added 2 commits March 20, 2023 14:04

Add minor fix to deepspeed on reward model

f0f12b8

Fix Deepspeed initialization and checkpoints single GPU

5843195

PierpaoloSorbellini mentioned this pull request Mar 20, 2023

[Chatllama] train chatllama REWARD model using deepspeed ,got:RuntimeError: Found dtype Float but expected Half #275

Open

Fix LLaMA with deepspeed single GPU

8ccfadb

This was referenced Mar 22, 2023

[Chatllama] looks like REWARD MODEL donesn't support OPT #281

Closed

[ChatLlama] Error in the start of OPT1.3B actor pre-training #284

Closed

fix: Fix typo of discounted_rewards.shape[0] to discounted_rewards.sh…

d083816

…ape[1] in PPO loss

diegofiori reviewed Mar 23, 2023

View reviewed changes

diegofiori and others added 6 commits March 23, 2023 14:24

Merge pull request #291 from mountinyy/fix/ppo_typo

7bd3f94

[Chatllama] fix typo of discounted_rewards in PPO loss

Fix comments pre-release

dedf8cf

Merge branch 'main' into rlhf_accelerate

e066feb

Merge branch 'rlhf_accelerate' of https://github.com/nebuly-ai/nebullvm…

f8fcb57

… into rlhf_accelerate

Add fixes from the comunity that have not produced a PR

5ff1c67

Add discounted rewards coefficient to config

7a5c54f

diegofiori approved these changes Mar 23, 2023

View reviewed changes

diegofiori added 2 commits March 23, 2023 16:31

Set padding side to left

14382c3

Set lr value for actor

5854da8

diegofiori merged commit 520c31e into main Mar 23, 2023

diegofiori deleted the rlhf_accelerate branch March 23, 2023 15:35

diegofiori mentioned this pull request Mar 31, 2023

[Chatllama] Actor for llama #236

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Chatllama] support for accelerate from hf #233

[Chatllama] support for accelerate from hf #233

PierpaoloSorbellini commented Mar 9, 2023

diegofiori Mar 21, 2023

PierpaoloSorbellini Mar 23, 2023

diegofiori left a comment

[Chatllama] support for accelerate from hf #233

[Chatllama] support for accelerate from hf #233

Conversation

PierpaoloSorbellini commented Mar 9, 2023

diegofiori Mar 21, 2023

Choose a reason for hiding this comment

PierpaoloSorbellini Mar 23, 2023

Choose a reason for hiding this comment

diegofiori left a comment

Choose a reason for hiding this comment