Commit

adds no-grad context for reference model to DPO (#473)
rasbt authored Jan 8, 2025
1 parent 2d7ca7e commit a48f9c7
Showing 1 changed file with 13 additions and 11 deletions.
24 changes: 13 additions & 11 deletions ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
@@ -2149,16 +2149,18 @@
     "        labels=batch[\"rejected\"],\n",
     "        selection_mask=batch[\"rejected_mask\"]\n",
     "    )\n",
-    "    ref_chosen_log_probas = compute_logprobs(\n",
-    "        logits=reference_model(batch[\"chosen\"]),\n",
-    "        labels=batch[\"chosen\"],\n",
-    "        selection_mask=batch[\"chosen_mask\"]\n",
-    "    )\n",
-    "    ref_rejected_log_probas = compute_logprobs(\n",
-    "        logits=reference_model(batch[\"rejected\"]),\n",
-    "        labels=batch[\"rejected\"],\n",
-    "        selection_mask=batch[\"rejected_mask\"]\n",
-    "    )\n",
+    " \n",
+    "    with torch.no_grad():\n",
+    "        ref_chosen_log_probas = compute_logprobs(\n",
+    "            logits=reference_model(batch[\"chosen\"]),\n",
+    "            labels=batch[\"chosen\"],\n",
+    "            selection_mask=batch[\"chosen_mask\"]\n",
+    "        )\n",
+    "        ref_rejected_log_probas = compute_logprobs(\n",
+    "            logits=reference_model(batch[\"rejected\"]),\n",
+    "            labels=batch[\"rejected\"],\n",
+    "            selection_mask=batch[\"rejected_mask\"]\n",
+    "        )\n",
     "    loss, chosen_rewards, rejected_rewards = compute_dpo_loss(\n",
     "        model_chosen_logprobs=policy_chosen_log_probas,\n",
     "        model_rejected_logprobs=policy_rejected_log_probas,\n",
@@ -3090,7 +3092,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.10.6"
   }
  },
  "nbformat": 4,
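
The effect of the first hunk can be checked in isolation. The sketch below is not taken from the notebook: `policy_model` and `reference_model` are hypothetical `nn.Linear` stand-ins for the two language models, and the squared-difference loss is a toy placeholder for `compute_dpo_loss`. It only illustrates that outputs computed under `torch.no_grad()` carry no autograd graph, so the backward pass reaches the policy parameters alone and no activations are kept for the reference forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the notebook's policy and reference models
policy_model = nn.Linear(4, 3)
reference_model = nn.Linear(4, 3)
reference_model.eval()

batch = torch.randn(2, 4)

# Policy forward pass: recorded by autograd so it can be trained
policy_out = policy_model(batch)

# Reference forward pass: nothing is recorded inside the context
with torch.no_grad():
    ref_out = reference_model(batch)

print(policy_out.requires_grad)  # True
print(ref_out.requires_grad)     # False

# Toy loss (an assumption for illustration, not the DPO loss)
loss = (policy_out - ref_out).pow(2).mean()
loss.backward()

# Gradients exist only for the policy parameters
policy_has_grads = all(p.grad is not None for p in policy_model.parameters())
reference_has_grads = any(p.grad is not None for p in reference_model.parameters())
print(policy_has_grads, reference_has_grads)  # True False
```

Because the reference model only supplies fixed log-probabilities to compare against, excluding it from the graph should leave the loss value unchanged; the difference is in what autograd has to store.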