rl-study

Homework 1

Implementing a simple Gaussian log-likelihood
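As a rough starting point, the log-likelihood of i.i.d. samples under a normal distribution could be sketched as below. The function names `gaussian_log_likelihood` and `gaussian_mle` are illustrative assumptions and are not taken from `1.gaussian_loglikelihood.py`; only the standard library is used.

```python
import math


def gaussian_log_likelihood(data, mu, sigma):
    """Sum of log N(x | mu, sigma^2) over all samples in `data`."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    # log L = -n/2 * log(2*pi*sigma^2) - sum((x - mu)^2) / (2*sigma^2)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - squared_error / (2 * sigma ** 2))


def gaussian_mle(data):
    """Closed-form maximum-likelihood estimates (mu, sigma) for `data`."""
    n = len(data)
    mu = sum(data) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma
```

Evaluating `gaussian_log_likelihood(data, *gaussian_mle(data))` should give a value at least as large as for any other `(mu, sigma)` pair on the same data, which is a convenient sanity check.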

Create two arms (A, B) whose outcomes are 0 or 1, each with an adjustable win probability E(P(X)). For example, an arm set to 10% should produce a 1 10% of the time.
Implement greedy, ε-greedy, UCB, and Thompson sampling. Set the hyperparameters appropriately and keep them fixed for every experiment.
Find the average multiplier for each algorithm by running it 5 times on every pair of arms, where A has a win rate of 10%, 20%, ..., 90% and B has a win rate of 10%, 20%, ..., 90%. In total you should run 9 × 9 × 5 × 4 tests, and the final result should contain 9 × 9 × 4 values.
Show the results in a scatter plot. The x-axis is E(P(A)) at 10%, 20%, ..., 90% and the y-axis is E(P(B)) at 10%, 20%, ..., 90%. Color encodes the algorithm and marker size encodes that algorithm's average multiplier.
Push your code to your own GitHub repository and post one final image to this channel.
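The steps above could be sketched roughly as follows. This is not the repository's solution: the helper names (`pull`, `run_bandit`, `grid_experiment`), the hyperparameters (ε = 0.1, the UCB1 exploration bonus, a uniform Beta(1, 1) prior for Thompson sampling), and the 1000 pulls per run are all assumptions. The text does not define "average multiplier", so the sketch reports the mean reward per pull as a placeholder metric.

```python
import math
import random


def pull(p, rng):
    """Bernoulli arm: return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def _means(counts, wins):
    # Empirical win rate per arm; arms never pulled count as 0.0.
    return [w / c if c else 0.0 for c, w in zip(counts, wins)]


def _argmax(values, rng):
    # Index of the largest value, breaking ties at random.
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def greedy(counts, wins, t, rng):
    return _argmax(_means(counts, wins), rng)


def make_epsilon_greedy(epsilon=0.1):
    def choose(counts, wins, t, rng):
        if rng.random() < epsilon:          # explore
            return rng.randrange(len(counts))
        return _argmax(_means(counts, wins), rng)  # exploit
    return choose


def ucb1(counts, wins, t, rng):
    for i, c in enumerate(counts):
        if c == 0:
            return i                        # pull every arm once first
    scores = [w / c + math.sqrt(2 * math.log(t) / c)
              for c, w in zip(counts, wins)]
    return _argmax(scores, rng)


def thompson(counts, wins, t, rng):
    # Sample each arm's win rate from its Beta(1 + wins, 1 + losses) posterior.
    samples = [rng.betavariate(1 + w, 1 + c - w)
               for c, w in zip(counts, wins)]
    return _argmax(samples, rng)


ALGORITHMS = {
    "greedy": greedy,
    "e-greedy": make_epsilon_greedy(0.1),
    "ucb": ucb1,
    "thompson": thompson,
}


def run_bandit(probs, choose, steps=1000, seed=0):
    """Play `steps` rounds and return the mean reward per pull."""
    rng = random.Random(seed)
    counts = [0] * len(probs)               # pulls per arm
    wins = [0] * len(probs)                 # rewards of 1 per arm
    for t in range(1, steps + 1):
        arm = choose(counts, wins, t, rng)
        counts[arm] += 1
        wins[arm] += pull(probs[arm], rng)
    return sum(wins) / steps


def grid_experiment(repeats=5, steps=1000):
    """Average score for every (algorithm, P(A), P(B)) combination."""
    rates = [i / 10 for i in range(1, 10)]  # 10%, 20%, ..., 90%
    results = {}
    for name, choose in ALGORITHMS.items():
        for pa in rates:
            for pb in rates:
                scores = [run_bandit([pa, pb], choose, steps, seed=r)
                          for r in range(repeats)]
                results[(name, pa, pb)] = sum(scores) / repeats
    return results
```

`grid_experiment()` returns a dictionary with 9 × 9 × 4 entries keyed by `(algorithm, P(A), P(B))`, which can then be passed to a plotting library for the scatter plot, with the key supplying x, y, and color and the value supplying the marker size.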

Write a function that returns the converged value map, using policy evaluation and value iteration, for the environment shown in the photo.
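The photo of the environment is not reproduced here, so the sketch below assumes a stand-in: a 4 × 4 gridworld with terminal states in the top-left and bottom-right corners, a reward of -1 per step, deterministic moves that leave the agent in place when it would walk off the grid, and no discounting. All names (`step`, `solve`, `policy_evaluation`, `value_iteration`) are hypothetical, and policy evaluation is shown for a uniform random policy only. The real environment would need its own transition and reward definitions.

```python
N = 4                                        # assumed grid size
TERMINALS = {(0, 0), (N - 1, N - 1)}         # assumed terminal cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
REWARD = -1.0                                # assumed reward per step


def step(state, action):
    """Deterministic move; stepping off the grid leaves the state unchanged."""
    row, col = state[0] + action[0], state[1] + action[1]
    if 0 <= row < N and 0 <= col < N:
        return (row, col)
    return state


def sweep(values, gamma, backup):
    """One synchronous sweep; `backup` reduces the per-action returns."""
    new_values = [[0.0] * N for _ in range(N)]
    delta = 0.0
    for r in range(N):
        for c in range(N):
            if (r, c) in TERMINALS:
                continue                     # terminal values stay at 0
            returns = []
            for action in ACTIONS:
                nr, nc = step((r, c), action)
                returns.append(REWARD + gamma * values[nr][nc])
            new_values[r][c] = backup(returns)
            delta = max(delta, abs(new_values[r][c] - values[r][c]))
    return new_values, delta


def solve(backup, gamma=1.0, tol=1e-6, max_sweeps=10000):
    """Repeat sweeps until the largest change drops below `tol`."""
    values = [[0.0] * N for _ in range(N)]
    for _ in range(max_sweeps):
        values, delta = sweep(values, gamma, backup)
        if delta < tol:
            break
    return values


def policy_evaluation(gamma=1.0, tol=1e-6):
    """Value map of the uniform random policy (mean over actions)."""
    return solve(lambda returns: sum(returns) / len(returns), gamma, tol)


def value_iteration(gamma=1.0, tol=1e-6):
    """Optimal value map (max over actions)."""
    return solve(max, gamma, tol)
```

The two functions differ only in the backup: policy evaluation averages the action returns under the fixed policy, while value iteration takes their maximum. In this assumed gridworld, value iteration yields the negated number of steps to the nearest terminal corner.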

1.gaussian_loglikelihood.py
2.py
4.py
README.md