First, you need to install JDK 1.8. On Debian-based systems, you can run the following:
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
Note
If you work on another OS, you can follow the instructions here to install JDK 1.8.
Now, you can install the MineRL environment:
pip install -e .[minerl]
Warning
If you run into any problems during the installation due to some missing files that are not downloaded, please have a look at this issue.
We have modified the MineRL environments to have a custom action and observation space. We provide three different tasks:
- Navigate: you need to set the `env.id` argument to `custom_navigate`.
- Obtain Iron Pickaxe: you need to set the `env.id` argument to `custom_obtain_iron_pickaxe`.
- Obtain Diamond: you need to set the `env.id` argument to `custom_obtain_diamond`.
Note
If you want to use a MineRL environment, you must specify it explicitly, for example by setting `env=minerl`
in the CLI arguments or by creating your custom config file.
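For instance, to train DreamerV3 on the Navigate task, you can run:

```bash
python sheeprl.py exp=dreamer_v3 env=minerl env.id=custom_navigate algo.cnn_keys.encoder=[rgb]
```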
In all these environments, you can choose between a dense and a sparse reward: set the `env.wrapper.dense` argument to `True` if you want a dense reward, or to `False` otherwise.
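For example, to enable the dense reward on the Navigate task, you can add the corresponding argument to the training command:

```bash
python sheeprl.py exp=dreamer_v3 env=minerl env.id=custom_navigate env.wrapper.dense=True
```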
In the Navigate task, you can also choose whether or not to train the agent in an extreme environment (for more info, check here) by setting the `env.wrapper.extreme` argument to `True` or `False`.
In addition, in all the environments, it is possible to set the break speed multiplier through the `env.break_speed_multiplier` argument.
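Putting these options together, a purely illustrative Navigate run in the extreme setting with a dense reward and an arbitrary break speed multiplier could look like:

```bash
python sheeprl.py exp=dreamer_v3 env=minerl env.id=custom_navigate env.wrapper.dense=True env.wrapper.extreme=True env.break_speed_multiplier=100
```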
We have slightly modified the observation space by adding the life stats (life, food, and oxygen) and reshaping the observations already present (based on the idea proposed by Hafner in DreamerV3):
- The inventory, represented as a vector with one entry for each item of the game, giving the quantity of that item currently in the inventory.
- A max-inventory vector with one entry for each item, containing the maximum quantity of that item obtained by the agent so far in the episode.
- The RGB first-person camera image.
- A vector of three elements representing the life, the food, and the oxygen levels of the agent.
- A one-hot vector indicating the equipped item, only for the obtain tasks.
- A scalar indicating the compass angle to the goal location, only for the navigate tasks.
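The exact observation key names depend on the wrapper and config; assuming, purely for illustration, keys such as `inventory` and `life_stats`, you could select which observations feed the CNN and MLP encoders with something like:

```bash
python sheeprl.py exp=dreamer_v3 env=minerl env.id=custom_navigate algo.cnn_keys.encoder=[rgb] algo.mlp_keys.encoder=[inventory,life_stats]
```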
We decided to convert the multi-discrete action space into a discrete action space. Moreover, we restrict the look-up/down actions between `min_pitch` and `max_pitch` degrees.
In addition, we added the forward action when the agent selects one of the following actions: `jump`, `sprint`, and `sneak`.
Finally, we added sticky actions for the `jump` and `attack` actions. You can set the values of the `sticky_jump` and `sticky_attack` parameters through the `env.sticky_jump` and `env.sticky_attack` arguments, respectively. The sticky actions, if set, force the agent to repeat the selected actions for a certain number of steps.
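For example (the values below are purely illustrative), to make the agent repeat the jump action for 10 steps and the attack action for 30 steps:

```bash
python sheeprl.py exp=dreamer_v3 env=minerl env.id=custom_obtain_diamond env.sticky_jump=10 env.sticky_attack=30
```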
Note
Since the MineRL environments have a multi-discrete action space, the sticky actions can be easily implemented. The agent will perform the selected action and the sticky actions simultaneously.
The action repeat in the Minecraft environments is set to 1; indeed, it makes no sense to force the agent to repeat an action such as crafting (it may not have enough material for the second action).
Note
The `env.sticky_attack` parameter is set to `0` if `env.break_speed_multiplier > 1`.
If you work on a headless machine, you need a software renderer. We recommend adopting one of the following solutions:
- Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the training command with `xvfb-run`. For instance, to train DreamerV3 on the Navigate task on a headless machine, you need to run the following command: `xvfb-run python sheeprl.py exp=dreamer_v3 fabric.devices=1 env=minerl env.id=custom_navigate algo.cnn_keys.encoder=[rgb]`.
- Exploit the PyVirtualDisplay package.