MCTS default policy

Peter Shih edited this page Aug 6, 2017 · 11 revisions
  • Survey
  • Methods
    • Predicate-Average Sampling Technique
    • Features-to-Action Sampling Technique
  • Idea
    • A global table to store Q(a)
      • a is one of the actions
      • Q(a) is the number of times the action a is chosen
      • Example: in Othello, a good action a is to place a disc in a corner
    • The global table can be updated in back-propagation step
    • Use the table to bias the MCTS default policy
      • Choose actions using Boltzmann distribution over Q(a)
    • Predicates can be used to condition the table on the game state
      • Example: a piece is at a certain location
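The idea above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the project's actual code: the class name `DefaultPolicyTable`, the `temperature` parameter, and the count-based Q(a) are all assumptions made for the sketch.

```python
import math
import random


class DefaultPolicyTable:
    """Global table of action statistics used to bias the MCTS default policy.

    q[a] counts how many times action a was chosen in past simulations;
    it is updated in the back-propagation step, and rollout actions are
    drawn from a Boltzmann (softmax) distribution over these counts.
    """

    def __init__(self, temperature=1.0):
        self.q = {}  # action -> times chosen
        self.temperature = temperature

    def update(self, action):
        # back-propagation step: record that this action was chosen
        self.q[action] = self.q.get(action, 0) + 1

    def sample(self, legal_actions, rng=random):
        # probability of choosing a is proportional to exp(Q(a) / T)
        weights = [math.exp(self.q.get(a, 0) / self.temperature)
                   for a in legal_actions]
        r = rng.random() * sum(weights)
        for action, w in zip(legal_actions, weights):
            r -= w
            if r <= 0:
                return action
        return legal_actions[-1]  # guard against floating-point round-off
```

A lower temperature makes the rollout policy greedier toward frequently chosen actions; a higher temperature keeps it closer to uniform random.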
  • Hand-crafted predicates
    • Predicate: In main-action:
      • Actions: play-card, attack, hero-power, end-turn
      • Rationale: match the heuristic
        • End-turn should be chosen less often than play-card, attack, and hero-power
        • Hero-power should be chosen less often than play-card and attack
    • Predicate: choose attacker
      • Actions: choose the index of the character to attack with
      • Note: this bias may be unnecessary, since every character that can attack should attack
    • Predicate: choose defender
      • Action: given the attacker, choose the index of the character to defend
      • Rationale:
        • Prefer to attack the hero if the attack kills him/her
        • Prefer a defender such that the attacker survives and the defender dies
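The predicate-conditioned table described above can be sketched by keying the counts on (predicate, action) pairs, so that, e.g., the "main-action" statistics are kept separate from the "choose attacker" statistics. Again a hypothetical sketch: `PredicateQTable` and the predicate strings are invented names, not the project's API.

```python
import math
import random
from collections import defaultdict


class PredicateQTable:
    """One Q(a) table per game-state predicate.

    Counts are keyed by (predicate, action); sampling under a predicate
    uses a Boltzmann distribution over that predicate's counts only.
    """

    def __init__(self, temperature=1.0):
        self.q = defaultdict(int)  # (predicate, action) -> times chosen
        self.temperature = temperature

    def update(self, predicate, action):
        # back-propagation: the simulation chose `action` under `predicate`
        self.q[(predicate, action)] += 1

    def sample(self, predicate, legal_actions, rng=random):
        weights = [math.exp(self.q[(predicate, a)] / self.temperature)
                   for a in legal_actions]
        r = rng.random() * sum(weights)
        for action, w in zip(legal_actions, weights):
            r -= w
            if r <= 0:
                return action
        return legal_actions[-1]
```

Under the "main-action" predicate, for example, play-card and attack would accumulate far more counts than end-turn, so rollouts would end the turn less often, matching the heuristic above.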
  • Backup
    • Self-Character 1 can attack Opponent-Character 1
      • and so on, up to Self-Character 8 attacking Opponent-Character 8
      • 8 × 8 = 64 features in total
    • Self-Character 1 can attack Opponent-Character 1, the attacker survives, and the defender dies
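The feature layout above might be flattened as follows. This is a sketch under assumptions: the function name and the matrix inputs are hypothetical, and the trade outcome is taken as given rather than computed from card stats.

```python
def attack_features(can_attack, attacker_survives, defender_dies):
    """Flatten the 8x8 pairwise attack relations into binary features.

    can_attack[i][j]        -- self-character i can attack opponent-character j
    attacker_survives[i][j] -- in that trade, the attacker stays alive
    defender_dies[i][j]     -- in that trade, the defender dies

    Returns 64 "can attack" features, followed by 64 "can attack and the
    attacker survives while the defender dies" features (128 in total).
    """
    feats = [1 if can_attack[i][j] else 0
             for i in range(8) for j in range(8)]
    feats += [1 if (can_attack[i][j] and attacker_survives[i][j]
                    and defender_dies[i][j]) else 0
              for i in range(8) for j in range(8)]
    return feats
```

Feature (i, j) lives at index i * 8 + j in the first block and 64 + i * 8 + j in the second, so the "favorable trade" block directly mirrors the "can attack" block.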