In the Pick-up and Drop-off (PD) World, the goal is to find a route for the agent that delivers all the blocks to the drop-off cells in the fewest steps. To solve this reinforcement learning problem, we use a statistical approach and dynamic programming, in particular Q-learning, to estimate the utility of taking actions in the states of the …
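As a rough illustration of what "estimating the utility of taking actions" means in tabular Q-learning, here is a minimal C++ sketch of the standard update rule. It is not taken from the repository: the table sizes `kNumStates` and `kNumActions`, the helper names `maxQ` and `qUpdate`, and the flat integer state index are all assumptions made for the example, since the description does not say how PD World states are encoded.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sizes: the real PD World state/action encoding is not given.
constexpr std::size_t kNumStates = 4;
constexpr std::size_t kNumActions = 2;

// Q-table: one estimated utility per (state, action) pair.
using QTable = std::array<std::array<double, kNumActions>, kNumStates>;

// Largest Q-value over all actions in state s.
double maxQ(const QTable& q, std::size_t s) {
    assert(s < kNumStates);
    return *std::max_element(q[s].begin(), q[s].end());
}

// One Q-learning step:
// Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
void qUpdate(QTable& q, std::size_t s, std::size_t a, double reward,
             std::size_t sNext, double alpha, double gamma) {
    assert(s < kNumStates && a < kNumActions);
    const double target = reward + gamma * maxQ(q, sNext);
    q[s][a] += alpha * (target - q[s][a]);
}
```

A caller would start from a zero-initialized table (`QTable q{};`) and call `qUpdate` once per observed transition. Here `alpha` is the learning rate and `gamma` the discount factor, and the values to use for them are a tuning choice that the description does not specify.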
Updated Jul 24, 2020 - C++