improve wandb log in
xingyaoww committed Jan 23, 2025
1 parent 54621a0 commit 32b9cd7
Showing 2 changed files with 103 additions and 5 deletions.
104 changes: 99 additions & 5 deletions README.md
```
pip3 install flash-attn --no-build-isolation
pip install wandb IPython matplotlib
```

## The Task

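The data script below (`examples/data_preprocess/countdown.py`) targets the Countdown arithmetic game: given a list of numbers and a target, the model must combine each number exactly once with `+`, `-`, `*`, `/` to reach the target. A minimal answer checker, sketched here as a hypothetical helper (the repo's actual reward function may differ):

```python
import ast
import math

def check_countdown(expr: str, numbers: list[int], target: int) -> bool:
    """True iff `expr` uses exactly the given numbers (each once) with
    +, -, *, / (and unary minus) and evaluates to `target`."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    # Only plain arithmetic expressions are allowed.
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub)
    if not all(isinstance(node, allowed) for node in ast.walk(tree)):
        return False
    # Each provided number must appear exactly once.
    used = sorted(node.value for node in ast.walk(tree)
                  if isinstance(node, ast.Constant))
    if used != sorted(numbers):
        return False
    try:
        value = eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})
    except ZeroDivisionError:
        return False
    return math.isclose(value, target)
```

For example, with numbers `[1, 2, 3, 4]` and target `24`, the expression `1 * 2 * 3 * 4` passes while `1 + 2 + 3 + 4` does not.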
## Generate Data
```
conda activate zero
python examples/data_preprocess/countdown.py
```

## Run Training
```
conda activate zero
```
**Single GPU dry run**
Works for models <= 1.5B.

Note that the Qwen2.5-0.5B base model is known to fail to learn reasoning.

```
export CUDA_VISIBLE_DEVICES=7
export N_GPUS=1
export BASE_MODEL=Qwen/Qwen2.5-1.5B
export DATA_DIR=$HOME/data/countdown
export WANDB_API_KEY=0929e692448f1bc929d71d7e3ece80073c3041e6
export EXPERIMENT_NAME=countdown-qwen2.5-1.5b
PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
data.train_files=$DATA_DIR/train.parquet \
data.val_files=$DATA_DIR/test.parquet \
...
trainer.default_hdfs_dir=null \
trainer.n_gpus_per_node=$N_GPUS \
trainer.nnodes=1 \
trainer.save_freq=30 \
trainer.test_freq=10 \
trainer.project_name=TinyZero \
trainer.experiment_name=$EXPERIMENT_NAME \
trainer.total_epochs=15 2>&1 | tee verl_demo.log
```

**3B model dry run**
At this scale, the base model is able to develop sophisticated reasoning skills.
```
export CUDA_VISIBLE_DEVICES=4,5
export N_GPUS=2
export BASE_MODEL=Qwen/Qwen2.5-3B
export DATA_DIR=$HOME/data/countdown
export ROLLOUT_TP_SIZE=2
export WANDB_API_KEY=0929e692448f1bc929d71d7e3ece80073c3041e6
export EXPERIMENT_NAME=countdown-qwen2.5-3b
python3 -m verl.trainer.main_ppo \
data.train_files=$DATA_DIR/train.parquet \
data.val_files=$DATA_DIR/test.parquet \
data.train_batch_size=256 \
data.val_batch_size=1312 \
data.max_prompt_length=256 \
data.max_response_length=1024 \
actor_rollout_ref.model.path=$BASE_MODEL \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.actor.ppo_mini_batch_size=128 \
actor_rollout_ref.actor.ppo_micro_batch_size=8 \
actor_rollout_ref.rollout.log_prob_micro_batch_size=8 \
actor_rollout_ref.rollout.tensor_model_parallel_size=$ROLLOUT_TP_SIZE \
actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
actor_rollout_ref.ref.log_prob_micro_batch_size=4 \
critic.optim.lr=1e-5 \
critic.model.path=$BASE_MODEL \
critic.ppo_micro_batch_size=8 \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.logger=['wandb'] \
+trainer.val_before_train=False \
trainer.default_hdfs_dir=null \
trainer.n_gpus_per_node=$N_GPUS \
trainer.nnodes=1 \
trainer.save_freq=30 \
trainer.test_freq=10 \
trainer.project_name=TinyZero \
trainer.experiment_name=$EXPERIMENT_NAME \
trainer.total_epochs=15 2>&1 | tee verl_demo.log
```
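A rough reading of the batch-size knobs above, assuming the usual semantics where each optimizer mini-batch is processed as gradient-accumulated micro-batches (verl's exact internals may differ):

```python
train_batch_size = 256      # prompts collected per PPO iteration
ppo_mini_batch_size = 128   # prompts per optimizer update
ppo_micro_batch_size = 8    # prompts per forward/backward pass (memory-bound)

mini_batches_per_iteration = train_batch_size // ppo_mini_batch_size
grad_accumulation_steps = ppo_mini_batch_size // ppo_micro_batch_size
```

Under this reading, each iteration performs 2 optimizer updates, each accumulated over 16 micro-batch passes; only `ppo_micro_batch_size` usually needs shrinking to fit GPU memory.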

**OpenLlama 7B model dry run**
This base model, too, is able to develop sophisticated reasoning skills.
```
export CUDA_VISIBLE_DEVICES=4,5,6,7
export N_GPUS=4
export EXPERIMENT_NAME=countdown-open_llama_7b
export BASE_MODEL=openlm-research/open_llama_7b_v2
export DATA_DIR=$HOME/data/countdown
export ROLLOUT_TP_SIZE=4
export WANDB_API_KEY=0929e692448f1bc929d71d7e3ece80073c3041e6
python3 -m verl.trainer.main_ppo \
data.train_files=$DATA_DIR/train.parquet \
data.val_files=$DATA_DIR/test.parquet \
data.train_batch_size=256 \
data.val_batch_size=1312 \
data.max_prompt_length=256 \
data.max_response_length=1024 \
actor_rollout_ref.model.path=$BASE_MODEL \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.actor.ppo_mini_batch_size=128 \
actor_rollout_ref.actor.ppo_micro_batch_size=8 \
actor_rollout_ref.rollout.log_prob_micro_batch_size=8 \
actor_rollout_ref.rollout.tensor_model_parallel_size=$ROLLOUT_TP_SIZE \
actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
actor_rollout_ref.ref.log_prob_micro_batch_size=4 \
critic.optim.lr=1e-5 \
critic.model.path=$BASE_MODEL \
critic.ppo_micro_batch_size=8 \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.logger=['wandb'] \
+trainer.val_before_train=False \
trainer.default_hdfs_dir=null \
trainer.n_gpus_per_node=$N_GPUS \
trainer.nnodes=1 \
trainer.save_freq=30 \
trainer.test_freq=10 \
trainer.project_name=TinyZero \
trainer.experiment_name=$EXPERIMENT_NAME \
trainer.total_epochs=15 2>&1 | tee verl_demo.log
```
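`algorithm.kl_ctrl.kl_coef=0.001` applies a fixed-coefficient KL penalty against the frozen reference policy. One common way such a penalty enters the per-token reward is sketched below (an illustration; verl's exact KL estimator and reward shaping may differ):

```python
def kl_penalized_reward(task_reward: float,
                        logprob_actor: float,
                        logprob_ref: float,
                        kl_coef: float = 0.001) -> float:
    # Simple per-token KL estimate: log p_actor(t) - log p_ref(t).
    kl = logprob_actor - logprob_ref
    # Subtracting the scaled KL discourages drifting far from the reference.
    return task_reward - kl_coef * kl
```

With a coefficient this small, the penalty only gently anchors the policy to the base model while the task reward dominates.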
4 changes: 4 additions & 0 deletions verl/utils/tracking.py
```
def __init__(self, project_name, experiment_name, default_backend: Union[str, Li

    if 'tracking' in default_backend or 'wandb' in default_backend:
        import wandb
        import os
        WANDB_API_KEY = os.environ.get("WANDB_API_KEY", None)
        if WANDB_API_KEY:
            wandb.login(key=WANDB_API_KEY)
        wandb.init(project=project_name, name=experiment_name, config=config)
        self.logger['wandb'] = wandb
```
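The change makes W&B authentication configurable via the environment: an explicit key wins, otherwise wandb falls back to its cached credentials. The decision logic, isolated as a hypothetical helper (not part of the repo):

```python
def resolve_wandb_auth(env: dict) -> str:
    """Mirror the tracking code: log in with an explicit key only when
    WANDB_API_KEY is set; otherwise rely on wandb's cached credentials."""
    key = env.get("WANDB_API_KEY")
    return "explicit-key" if key else "cached-credentials"
```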

