MCTS default policy
Peter Shih edited this page Aug 6, 2017
- Survey
- Recent Advances in General Game Playing
- Generalized Monte-Carlo Tree Search Extensions for General Game Playing
- Methods
- Predicate-Average Sampling Technique
- Features-to-Action Sampling Technique
- Idea
- A global table to store Q(a)
- a is one of the actions
- Q(a) is the number of times the action a is chosen
- Example: in Othello, a good action a is to place a disc in a corner
- The global table can be updated in back-propagation step
- Use the table to bias the MCTS default policy
- Choose actions using Boltzmann distribution over Q(a)
- Can use predicates to constrain the table
- Example: chess piece is at a certain location
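The idea above can be sketched as follows: a global table mapping each action to Q(a), bumped during back-propagation, and sampled by the default policy with a Boltzmann (softmax) distribution. This is a minimal sketch; the names, data structure, and temperature parameter are illustrative, not the project's actual API.

```python
import math
import random

# Global table: action -> Q(a), where Q(a) counts how often the action
# was chosen in previous simulations (illustrative sketch only).
q_table = {}

def update_q_table(chosen_actions):
    """Back-propagation step: bump Q(a) for every action on the path."""
    for a in chosen_actions:
        q_table[a] = q_table.get(a, 0) + 1

def boltzmann_sample(legal_actions, temperature=1.0):
    """Default policy: sample an action with probability
    proportional to exp(Q(a) / temperature)."""
    weights = [math.exp(q_table.get(a, 0) / temperature) for a in legal_actions]
    return random.choices(legal_actions, weights=weights, k=1)[0]
```

A lower temperature concentrates the simulation on frequently chosen actions; a higher one keeps the playouts closer to uniform random.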
- Hand-crafted predicates
- Predicate: In main-action:
- Actions: play-card, attack, hero-power, end-turn
- Rationale: match the heuristic
- End-turn should be chosen less likely than play-card, attack, and hero-power
- Hero-power should be chosen less likely than play-card and attack
- Predicate: choose attacker
- Actions: choose the index of the character to attack with
- Note: this type of bias may be unnecessary; every character that can attack should attack.
- Predicate: choose defender
- Action: given the attacker, choose the index of the defending character
- Rationale:
- Prefer to attack the hero if the attack kills him/her.
- Prefer a target such that the attacker survives and the defender dies.
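The choose-defender rationale can be sketched as a scoring function over candidate targets. This is an illustrative sketch, not the project's implementation: it assumes simplified combat where both characters deal their attack value to each other, and the field names and score values are made up for the example.

```python
def score_defender(attacker, defender):
    """Heuristic score for a candidate defender, following the rationale
    above. Assumes simplified combat: attacker and defender each deal
    their attack value to the other (an illustrative assumption)."""
    lethal = attacker["atk"] >= defender["hp"]
    if defender["is_hero"] and lethal:
        return 100  # killing the hero wins the game outright
    attacker_survives = defender["atk"] < attacker["hp"]
    if lethal and attacker_survives:
        return 10   # favorable trade: defender dies, attacker survives
    if lethal:
        return 5    # even trade: both die
    return 1        # no kill

def choose_defender(attacker, defenders):
    """Pick the candidate defender with the highest heuristic score."""
    return max(defenders, key=lambda d: score_defender(attacker, d))
```

In a default policy these scores would bias the sampling rather than pick the maximum deterministically, so that playouts still explore the other targets.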
- Backup
- Self-Character 1 can attack Opponent-Character 1, and so on for Self-Characters 1~8 and Opponent-Characters 1~8 (8*8 = 64 features in total)
- Self-Character 1 can attack Opponent-Character 1, attacker alive, defender dies, and so on (another 64 features)
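The two 8*8 feature families above can be indexed like this (a sketch; n = 8 characters per side comes from the note, the function name is illustrative):

```python
N = 8  # characters per side, per the 8*8 = 64 feature note above

def feature_index(self_idx, opp_idx, lethal_trade=False, n=N):
    """Map an (attacker, defender) pair to a feature slot.

    Slots 0..63 encode "Self-Character i can attack Opponent-Character j";
    slots 64..127 encode the same pair with "attacker alive, defender dies"
    (the second feature family above).
    """
    assert 0 <= self_idx < n and 0 <= opp_idx < n
    base = n * n if lethal_trade else 0
    return base + self_idx * n + opp_idx
```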