avrumnoor/pole-balancer

Pole Balancer

About Pole Balancer

Pole being balanced

Pole Balancer is a Python program that uses reinforcement learning (RL) to automatically design a policy for the classic control problem of a cart balancing a pole. By framing the task as a Markov decision process (MDP), it can perform reinforcement learning without any explicit knowledge of the physics of the underlying system, in this case the pole on the cart.

Requirements

  • Ubuntu 18.04+, macOS 10.15+, or Windows 10+ (64-bit)
  • At least 5GB of memory
  • Anaconda/Miniconda
  • Python 3.6 or above
  • A Python IDE (Jupyter/PyCharm)

Getting Started

Install the following Python packages:

  • matplotlib
  • numpy
  • scipy
  • pillow

Clone

git clone https://github.com/avrumnoor/PoleBalancer.git

Run

python polebalancer.py

Model

A thin pole is hinged to a cart. The cart moves laterally on a smooth table surface. The program fails if either the angle of the pole deviates by more than a particular amount from the vertical position (i.e., if the pole falls over), or if the cart’s position goes out of bounds (i.e., if it falls off the end of the table).
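The failure condition described above can be sketched as a simple check on the cart position and pole angle. This is a minimal illustration, not the project's code: the function name `has_failed` and both threshold values are assumptions, since the README does not state the actual limits.

```python
import math

# Hypothetical limits; the README does not state the actual thresholds.
MAX_ANGLE_RAD = math.radians(12.0)  # assumed maximum pole tilt from vertical
MAX_POSITION_M = 2.4                # assumed distance from table centre to its edge


def has_failed(cart_position, pole_angle):
    """Return True if the pole has fallen over or the cart has left the table."""
    pole_fell = abs(pole_angle) > MAX_ANGLE_RAD
    cart_off_table = abs(cart_position) > MAX_POSITION_M
    return pole_fell or cart_off_table
```

An attempt would end as soon as `has_failed` returns True for the current state.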

Program Objective

Keep the pole balanced within these constraints by accelerating the cart left or right as appropriate.
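With only two actions available (accelerate left or accelerate right), choosing an action reduces to comparing two expected values. The sketch below assumes a discretized state space, a transition array of shape `(num_states, num_actions, num_states)`, and random tie-breaking; the name `choose_action` and the array layout are illustrative assumptions rather than the project's actual interface.

```python
import numpy as np


def choose_action(state, transition_probs, value, rng):
    """Pick the action with the highest expected next-state value.

    transition_probs: array of shape (num_states, num_actions, num_states)
    value:            array of shape (num_states,)
    Ties are broken at random (an assumption, useful while the model is uniform).
    """
    expected = transition_probs[state] @ value  # one entry per action
    best = np.flatnonzero(expected == expected.max())
    return int(rng.choice(best))
```

Here action 0 could stand for "accelerate left" and action 1 for "accelerate right"; that numbering is also an assumption.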

Algorithm

  • Estimate a model (i.e., transition probabilities and rewards) for the underlying MDP.
  • Solve Bellman’s equations for this estimated model to obtain a value function.
  • Act greedily with respect to this value function.
  • Initially, each state has estimated reward zero, and the estimated transition probabilities are uniform.
  • As the program goes along taking actions, it will gather observations on transitions and rewards, which it can use to get a better estimate of the MDP model.
  • Store the state transitions and reward observations each time, and update the model and value function/policy only periodically.
  • Each time a failure occurs, re-estimate the transition probabilities and rewards as the average of the observed values (if any).
  • Repeat the previous steps until convergence. The program is considered converged once several consecutive attempts to solve Bellman’s equations (the number is set by the parameter NO_LEARNING_THRESHOLD) each converge in the first iteration, since this implies that the estimated model has stopped changing significantly.
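The model re-estimation and Bellman steps above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the repository's implementation: the state space is assumed to be discretized, Bellman's equations are solved by value iteration, and the function names, array layouts, discount factor, tolerance, and threshold value are all hypothetical.

```python
import numpy as np

GAMMA = 0.995               # assumed discount factor
TOLERANCE = 0.01            # assumed convergence tolerance for value iteration
NO_LEARNING_THRESHOLD = 20  # assumed value; the README does not state it


def update_model(trans_counts, reward_sums, reward_counts, trans_probs, rewards):
    """Re-estimate the model in place as averages of the observed values.

    State-action pairs (and states) with no observations keep their
    previous estimates, e.g. the initial uniform probabilities and zero reward.
    """
    totals = trans_counts.sum(axis=2)  # visits per (state, action)
    seen = totals > 0
    trans_probs[seen] = trans_counts[seen] / totals[seen][:, None]
    visited = reward_counts > 0
    rewards[visited] = reward_sums[visited] / reward_counts[visited]


def value_iteration(trans_probs, rewards, value):
    """Solve Bellman's equations iteratively; return (value, iterations used)."""
    iterations = 0
    while True:
        iterations += 1
        # V(s) = R(s) + gamma * max_a sum_s' P(s' | s, a) V(s')
        new_value = rewards + GAMMA * (trans_probs @ value).max(axis=1)
        delta = np.abs(new_value - value).max()
        value = new_value
        if delta < TOLERANCE:
            return value, iterations


def update_no_learning_count(count, iterations):
    """Count consecutive solves that converged in their first iteration."""
    return count + 1 if iterations == 1 else 0
```

After each failure, a training loop would call `update_model`, then `value_iteration` starting from the previous value function, and stop once the count returned by `update_no_learning_count` exceeds `NO_LEARNING_THRESHOLD`.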

Results

Graph of the results

Author

Avrum Noor

Acknowledgements

Anand Avati

Stanford Machine Learning Coursework
