1. Make the expert hierarchical: Create 2 Q-learning modules. Activate one when the agent doesn't have the flag and the other when the agent has the flag.
2. Make the expert HRL: Make the expert options-like, with a higher-level policy that does the switching (rather than us switching manually). May require looking at some reference options implementations.
3. Implement MaxEnt IRL:
3.1 Store expert trajectories tau={s1,a1,...,sT,aT}
3.2 Create a new class "inverse_agent" that can see only tau.
3.3 Implement the MaxEnt model (reference implementation: http://178.79.149.207/posts/maxent.html)
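
A minimal sketch of items 1 and 3.1, to make the plan concrete. Everything here is an assumption for illustration rather than the project's actual code: the names `QModule`, `HierarchicalExpert`, `step` and `run_episode` are hypothetical, and the environment is a toy 5-cell corridor with the flag at the right end and the base at cell 0. The flag status selects which of the two Q-learning modules acts and learns, and each episode is returned as tau, a list of (state, action) pairs.

```python
import random
from collections import defaultdict


class QModule:
    """Tabular Q-learning for a single sub-task (hypothetical helper)."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, state, greedy=False):
        if not greedy and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        best = max(values)
        # Break ties at random so an all-zero table still explores.
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


class HierarchicalExpert:
    """Two Q-learning modules; the flag status picks the active one."""

    def __init__(self, n_actions):
        self.modules = {False: QModule(n_actions, seed=0),
                        True: QModule(n_actions, seed=1)}

    def act(self, pos, has_flag, greedy=False):
        return self.modules[has_flag].act(pos, greedy)

    def update(self, pos, has_flag, a, r, next_pos, subtask_done):
        self.modules[has_flag].update(pos, a, r, next_pos, subtask_done)


def step(pos, has_flag, action, size=5):
    """Toy corridor (assumption): action 1 moves right, anything else left."""
    pos = max(0, min(size - 1, pos + (1 if action == 1 else -1)))
    if not has_flag and pos == size - 1:
        return pos, True, 1.0, False   # picked up the flag
    if has_flag and pos == 0:
        return pos, True, 1.0, True    # returned the flag, episode ends
    return pos, has_flag, 0.0, False


def run_episode(expert, learn=True, max_steps=200):
    """Run one episode and return tau as a list of (state, action) pairs."""
    pos, has_flag, tau = 0, False, []
    for _ in range(max_steps):
        a = expert.act(pos, has_flag, greedy=not learn)
        next_pos, next_flag, r, done = step(pos, has_flag, a)
        if learn:
            # A flag pickup ends the first module's sub-task, so it does not
            # bootstrap from values that belong to the other module.
            subtask_done = done or (next_flag != has_flag)
            expert.update(pos, has_flag, a, r, next_pos, subtask_done)
        tau.append(((pos, has_flag), a))
        pos, has_flag = next_pos, next_flag
        if done:
            break
    return tau


expert = HierarchicalExpert(n_actions=2)
for _ in range(500):
    run_episode(expert, learn=True)

# Item 3.1: greedy rollouts of the trained expert, stored as trajectories.
expert_trajectories = [run_episode(expert, learn=False) for _ in range(5)]
```

Here the switch between modules is hard-coded on `has_flag`, which is the manual switching that item 2 would replace with a learned higher-level policy. The stored `expert_trajectories` list is the only thing an "inverse_agent" (item 3.2) would need to receive.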