Fix typos in docs for Multi Adapter RL (MARL). (#1312)
* Fix more typos

* Fix typos in docs.
elhusseiniali authored Feb 2, 2024
1 parent 3f7cee7 commit ae87b3a
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions docs/source/multi_adapter_rl.mdx
@@ -1,6 +1,6 @@
# Multi Adapter RL (MARL) - a single base model for everything

-Here we present an approach that uses a single base model for the entire PPO algorithm - which includes retrieving the reference logits, computing the active logits and the rewards. This feature is experimental as we did not tested the convergence of the approach. We encourage the community to let us know if they potentially face into any issue.
+Here we present an approach that uses a single base model for the entire PPO algorithm - which includes retrieving the reference logits, computing the active logits and the rewards. This feature is experimental as we did not test the convergence of the approach. We encourage the community to let us know if they potentially face issues.

## Requirements

@@ -48,7 +48,7 @@ trainer = PPOTrainer(

...
```
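For context (not part of the diff): the collapsed lines above set up the model that `PPOTrainer` receives. Below is a minimal sketch of that kind of setup, assuming the `trl` and `peft` APIs from around this release; the base model name and reward-adapter id are placeholders.

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# Placeholder identifiers for the base model and the trained reward adapter.
model_name = "huggyllama/llama-7b"
rm_adapter_id = "trl-lib/llama-7b-hh-rm-adapter"

# LoRA configuration for the policy adapter that PPO will train.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# A single base model carries the policy adapter, the value head and the
# reward adapter queried by `compute_reward_score`.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
    reward_adapter=rm_adapter_id,
)

# `model` is then passed to `PPOTrainer` as in the snippet above.
```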
-Then inside your PPO training loop, call the `compute_reward_score` method by accessing to the `model` attribute from `PPOTrainer`.
+Then inside your PPO training loop, call the `compute_reward_score` method by accessing the `model` attribute from `PPOTrainer`.

```python
rewards = trainer.model.compute_reward_score(**inputs)
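# --- Illustrative continuation, not part of this diff ---
# A hedged sketch of how this call might sit inside one PPO step. `tokenizer`,
# `batch`, `query_tensors` and `response_tensors` are assumed to come from the
# usual PPO loop and are not defined in this excerpt; the shape of the returned
# scores depends on the reward adapter, here we assume one score per token.
texts = [q + r for q, r in zip(batch["query"], batch["response"])]
inputs = tokenizer(texts, padding=True, return_tensors="pt").to(trainer.accelerator.device)
raw_rewards = trainer.model.compute_reward_score(**inputs)
rewards = [score[-1] for score in raw_rewards]  # keep one scalar per sample (last token)
stats = trainer.step(query_tensors, response_tensors, rewards)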
@@ -58,8 +58,8 @@

### Control on the adapter name

-If you are familiar with the `peft` library, you know that you can use multiple adapters inside the same model. What you can do is to train multiple adapters on the same base model to fine-tune on different policies.
-In this case, you want to have a control on the adapter name you want to activate back, after retrieving the reward. For that, simply pass the appropriate `adapter_name` to `ppo_adapter_name` argument when calling `compute_reward_score`.
+If you are familiar with the `peft` library, you know that you can use multiple adapters inside the same model. What you can do is train multiple adapters on the same base model to fine-tune on different policies.
+In this case, you want to be able to control the adapter name you want to activate back, after retrieving the reward. For that, simply pass the appropriate `adapter_name` to `ppo_adapter_name` argument when calling `compute_reward_score`.

```python
adapter_name_policy_1 = "policy_1"
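# --- Illustrative continuation, not part of this diff ---
# A hedged sketch (assumption) of passing `ppo_adapter_name` so that the chosen
# policy adapter is re-activated after the reward has been computed.
rewards = trainer.model.compute_reward_score(**inputs, ppo_adapter_name=adapter_name_policy_1)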
@@ -97,4 +97,4 @@ trainer = PPOTrainer(
...
)
...
-```
+```
