In this paper, we compare model-based dynamic programming with model-free reinforcement learning. We present the theory and fundamentals needed to work on control problems for finite discrete-time dynamic systems, and show how to obtain an optimal policy with respect to a given objective function. Stochastic dynamic programming and Q-learning are then applied to a practical example problem that showcases the two approaches and their respective results. Our experiments show that both methods are valid approaches to solving the example.
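
To make the comparison concrete, here is a minimal Python sketch (not the thesis code) that runs value iteration, a standard stochastic dynamic programming method, and tabular Q-learning side by side. The three-state MDP, its transition probabilities, rewards, and all hyperparameters are invented for illustration only.

```python
import numpy as np

# Hypothetical three-state, two-action MDP; all numbers below are
# illustrative assumptions, not the thesis's example problem.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9

# P[s, a, s'] = probability of moving to s' when taking action a in state s.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3]],
    [[0.0, 0.7, 0.3], [0.3, 0.3, 0.4]],
    [[0.5, 0.0, 0.5], [0.2, 0.2, 0.6]],
])
# R[s, a] = immediate reward for taking action a in state s.
R = np.array([
    [0.0, 1.0],
    [0.5, 0.0],
    [1.0, 2.0],
])

# Model-based: value iteration, which uses the known model (P, R).
V = np.zeros(n_states)
for _ in range(500):
    Q_dp = R + gamma * (P @ V)  # Q[s, a] = R[s, a] + gamma * E[V(s')]
    V = Q_dp.max(axis=1)
pi_dp = Q_dp.argmax(axis=1)

# Model-free: tabular Q-learning, which only samples transitions
# and never inspects P directly.
Q_ql = np.zeros((n_states, n_actions))
alpha, eps, s = 0.1, 0.1, 0
for _ in range(50_000):
    # epsilon-greedy exploration
    a = rng.integers(n_actions) if rng.random() < eps else int(Q_ql[s].argmax())
    s_next = rng.choice(n_states, p=P[s, a])
    # update toward the sampled Bellman target
    Q_ql[s, a] += alpha * (R[s, a] + gamma * Q_ql[s_next].max() - Q_ql[s, a])
    s = s_next
pi_ql = Q_ql.argmax(axis=1)

# With enough samples, the two greedy policies usually coincide.
print("DP policy:", pi_dp, "QL policy:", pi_ql)
```
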
- Code: Python code for dynamic programming (DP), Q-learning (QL), and statistics
- Thesis
- Presentation Slides