Parameter usage
NeoRL uses the OpenAI Gym API, allowing users to create an env via `neorl.make()`.

The parameters of `neorl.make()` are shown below:
param | type | description |
---|---|---|
`task` | str | The task name you want to create. A full list of tasks is available here. |
`reward_func` | func | A customized reward function, which should be provided if you want to calculate the reward yourself instead of using the built-in reward of the dataset. |
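For the simplest case, a minimal sketch using only the `task` parameter and the dataset's built-in reward:

```python
import neorl

# Create an env that uses the built-in reward of the dataset (no reward_func given).
env = neorl.make("citylearn")
```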
The following code segment shows the usage of `neorl` with a customized reward function.
```python
import neorl

def customized_reward_func(data):
    obs = data["obs"]
    action = data["action"]
    obs_next = data["next_obs"]

    # Accept both a single transition (1-D arrays) and a batch (2-D arrays).
    single_reward = False
    if len(obs.shape) == 1:
        single_reward = True
        obs = obs.reshape(1, -1)
    if len(action.shape) == 1:
        action = action.reshape(1, -1)
    if len(obs_next.shape) == 1:
        obs_next = obs_next.reshape(1, -1)

    # Cost is a weighted sum of fatigue and consumption (the last two observation dims).
    CRF = 3.0
    CRC = 1.0
    fatigue = obs_next[:, -2]
    consumption = obs_next[:, -1]
    cost = CRF * fatigue + CRC * consumption
    reward = -cost

    if single_reward:
        reward = reward[0].item()
    else:
        reward = reward.reshape(-1, 1)
    return reward

env = neorl.make("ib", reward_func=customized_reward_func)  # create the industrial benchmark env
```
The parameters of `get_dataset()` are shown below:
param | type | description |
---|---|---|
`task_name_version` | str | The name and version (if applicable) of the task; defaults to the same as `task` used when making the env. |
`data_type` | str | Which type of policy is used to collect data. It should be one of ["high", "medium", "low"]; defaults to "high". |
`train_num` | int | The number of trajectories of training data. Note that the number should be less than 10,000; 100 by default. |
`need_val` | bool | Whether to download validation data; defaults to True. |
`val_ratio` | float | The ratio of validation data to training data; defaults to 0.1. |
`path` | str | The directory to load data from or download to; defaults to ./data/. |
`use_data_reward` | bool | Whether to use the default data reward. If False, a customized reward function should be provided when making the env. |
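For example (a sketch that reuses the `customized_reward_func` defined above), a custom reward replaces the data reward by providing it at make time and setting `use_data_reward=False` when loading data:

```python
import neorl

# Supply the custom reward when making the env ...
env = neorl.make("ib", reward_func=customized_reward_func)
# ... and disable the built-in data reward so rewards come from the custom function.
train_data, val_data = env.get_dataset(data_type="high", train_num=100, use_data_reward=False)
```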
Note that `task_name_version` is the same as `task` used when making the env by default. For instance, `env = neorl.make("citylearn")` will bind citylearn to both `env` and its dataset, which means `env.get_dataset()` will obtain citylearn data by default. For flexibility, `task_name_version` can name another task, since some people only intend to obtain data through an existing env instead of creating a new one.
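A minimal illustration of this default binding (all other parameters are left at the documented defaults):

```python
import neorl

env = neorl.make("citylearn")
# No task_name_version given, so this fetches citylearn data with the defaults
# (data_type="high", train_num=100, need_val=True, val_ratio=0.1).
train_data, val_data = env.get_dataset()
```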
When calling `get_dataset()`, it first looks in the local `path` for an appropriate dataset ("appropriate" means the data type matches the target data and the number of trajectories is not less than the target's). MD5 checksums are used to ensure the dataset is complete and correct. If no applicable local dataset is found, the smallest appropriate dataset is downloaded from the remote server to `path`, according to the local data_map.json.
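As a sketch of how `path` interacts with this lookup (the directory name below is illustrative):

```python
import neorl

env = neorl.make("citylearn")
# The first call looks in ./my_neorl_data for a matching dataset (verified via MD5)
# and downloads from the remote server if nothing suitable is found.
train_data, val_data = env.get_dataset(train_num=100, path="./my_neorl_data")
# Later calls with the same or smaller requirements can reuse the local files.
```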
The following call loads medium-quality finance data:

```python
import neorl

env = neorl.make("finance")
train_data, val_data = env.get_dataset(data_type="medium", train_num=100, need_val=True, val_ratio=0.2, use_data_reward=True)
```
It will load 100 trajectories for `train_data` and 20 trajectories for `val_data` (val_ratio=0.2), both collected by the "medium" policy and using the built-in data reward.
The next example reuses an existing env to fetch data for a different task:

```python
import neorl

env = neorl.make("citylearn")
train_data, _ = env.get_dataset(task_name_version="HalfCheetah-v3", data_type="low", train_num=50, need_val=False, use_data_reward=True)
```
It will load 50 trajectories of HalfCheetah-v3 data for `train_data` without `val_data`, using the "low" policy and the built-in data reward.