Solve the problem of dose determination with reinforcement learning.
The action is an ndarray with shape (4,). It looks like: {x, y, z, d}.
Num | Action | Type | Min | Max |
---|---|---|---|---|
X | Step along the X-axis from the current state | Continuous | -Inf | Inf |
Y | Step along the Y-axis from the current state | Continuous | -Inf | Inf |
Z | Step along the Z-axis from the current state | Continuous | -Inf | Inf |
D | Dose applied to the current point | Discrete | 0 | 1 |
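
To make the layout in the table above concrete, here is a minimal sketch that checks whether a candidate action has four components and a binary dose. The helper name `validate_action` is hypothetical and not part of the environment. It assumes the discrete dose component is exactly 0 or 1, and it accepts any sequence of four numbers, including an ndarray.

```python
import math


def validate_action(action):
    """Return (dx, dy, dz, dose) if `action` matches the {x, y, z, d} layout.

    Hypothetical helper for illustration, not part of the environment's API.
    """
    if len(action) != 4:
        raise ValueError("action must have exactly 4 components: x, y, z, d")
    dx, dy, dz, dose = (float(v) for v in action)
    # The steps are unbounded per the table, but NaN is never a usable step.
    if any(math.isnan(v) for v in (dx, dy, dz)):
        raise ValueError("x, y, z steps must not be NaN")
    # Assumption: the discrete dose component is exactly 0 or 1.
    if dose not in (0.0, 1.0):
        raise ValueError("d must be 0 or 1")
    return dx, dy, dz, int(dose)


print(validate_action([1.0, -2.5, 0.0, 1]))
```

Validating on the caller's side like this is only one option; the environment may perform its own checks.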
The observation is an array with shape (X, Y, Z, K) whose values correspond to the following:
Num | Observation | Min | Max |
---|---|---|---|
X | Position on the X-axis | 0 | Inf |
Y | Position on the Y-axis | 0 | Inf |
Z | Position on the Z-axis | 0 | Inf |
K | Is tumor | 0 | 1 |
- X0, Y0, Z0 - Coordinates of the manipulator
- Angle
- Distance
- Radius
- Time
Since the goal is to keep the pole upright for as long as possible, a reward of +1 for every step taken, including the termination step, is allotted. The threshold for rewards is 475 for v1.
All observations are assigned a uniformly random value in (-0.05, 0.05).
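
The initialization described above can be sketched with the standard library alone. The function name `sample_initial_values` and its parameters are hypothetical; the environment's own reset logic may differ in how it seeds and shapes the values.

```python
import random


def sample_initial_values(count, seed=None, low=-0.05, high=0.05):
    """Draw `count` values uniformly at random from the stated range.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Note: random.uniform can return an endpoint because of floating-point
    # rounding, so the interval is only approximately open.
    return [rng.uniform(low, high) for _ in range(count)]


values = sample_initial_values(4, seed=0)
print(values)
```

Passing a fixed `seed` makes the draw reproducible, which is convenient when comparing training runs.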
The episode ends if any one of the following occurs:
- Termination: Pole Angle is greater than ±12°
- Termination: Cart Position is greater than ±2.4 (center of the cart reaches the edge of the display)
- Truncation: Episode length is greater than 500 (200 for v0)
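
The three conditions listed above can be expressed as a small check that separates termination from truncation. This is a sketch under the thresholds stated in the list; the function name `episode_status` and its parameter names are hypothetical, and the angle is assumed to be given in degrees.

```python
def episode_status(pole_angle_deg, cart_position, steps_taken, max_steps=500):
    """Return (terminated, truncated) for the conditions listed above.

    Hypothetical helper; thresholds are taken from the list (12 degrees, 2.4,
    and 500 steps, or 200 for v0 via `max_steps`).
    """
    # Termination: the angle or the position leaves its allowed range.
    terminated = abs(pole_angle_deg) > 12.0 or abs(cart_position) > 2.4
    # Truncation: the episode has run longer than the step limit.
    truncated = steps_taken > max_steps
    return terminated, truncated


print(episode_status(5.0, 0.3, 120))
print(episode_status(13.0, 0.3, 120))
print(episode_status(5.0, 0.3, 501))
```

Keeping the two flags separate mirrors the Gymnasium convention, where `terminated` and `truncated` are reported independently by `step`.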
gymnasium.make('gym_onkorobot/OnkoRobot-v0')
No additional arguments are currently supported.
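
Once created, the environment is driven through the standard Gymnasium loop of `reset` and `step`. The sketch below shows that loop against a hypothetical `StubEnv` so that it runs without the package installed; `run_episode`, `StubEnv`, and `zero_step_with_dose` are illustrative names, and the stub's behavior says nothing about the real environment's dynamics or rewards.

```python
def run_episode(env, policy, max_steps=500):
    """Run one episode and return the total reward.

    `env` is any object following the Gymnasium API:
    reset() -> (observation, info) and
    step(action) -> (observation, reward, terminated, truncated, info).
    """
    observation, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


class StubEnv:
    """Hypothetical stand-in so the sketch runs without the real package."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0, 0.0, 0.0, 0.0], {}

    def step(self, action):
        self.steps += 1
        terminated = self.steps >= self.episode_length
        return [0.0, 0.0, 0.0, 0.0], 1.0, terminated, False, {}


def zero_step_with_dose(observation):
    # Placeholder policy: no movement, dose = 1, in the {x, y, z, d} layout.
    return [0.0, 0.0, 0.0, 1]


print(run_episode(StubEnv(), zero_step_with_dose))
```

With the package installed, the object returned by `gymnasium.make('gym_onkorobot/OnkoRobot-v0')` could be passed in place of `StubEnv()`. Whether the package has to be imported first to register the environment, and whether the policy must return an ndarray rather than a list, are assumptions to verify against the package itself.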