MCTS default policy

Peter Shih edited this page Aug 6, 2017 · 11 revisions
  • Survey
  • Methods
    • Predicate-Average Sampling Technique
    • Features-to-Action Sampling Technique
  • Idea
    • A global table to store Q(a)
      • a is one of the actions
      • Q(a) is the number of times the action a is chosen
      • Example: in Othello, a good action a is to place a disc in a corner
    • The global table can be updated in back-propagation step
    • Use the table to bias the MCTS default policy
      • Choose actions using Boltzmann distribution over Q(a)
    • Predicates can be used to condition the table on the game state
      • Example: a piece is at a certain location
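The idea above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the project's actual code: the class name `DefaultPolicyTable`, the `temperature` parameter, and the count-based Q(a) are all assumptions made for the sketch.

```python
import math
import random


class DefaultPolicyTable:
    """Global table of action statistics used to bias the MCTS default policy.

    q[a] counts how many times action a was chosen in past simulations;
    it is updated in the back-propagation step, and rollout actions are
    drawn from a Boltzmann (softmax) distribution over these counts.
    """

    def __init__(self, temperature=1.0):
        self.q = {}  # action -> times chosen
        self.temperature = temperature

    def update(self, action):
        # back-propagation step: record that this action was chosen
        self.q[action] = self.q.get(action, 0) + 1

    def sample(self, legal_actions, rng=random):
        # probability of choosing a is proportional to exp(Q(a) / T)
        weights = [math.exp(self.q.get(a, 0) / self.temperature)
                   for a in legal_actions]
        r = rng.random() * sum(weights)
        for action, w in zip(legal_actions, weights):
            r -= w
            if r <= 0:
                return action
        return legal_actions[-1]  # guard against floating-point round-off
```

A lower temperature makes the rollout policy greedier toward frequently chosen actions; a higher temperature keeps it closer to uniform random.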
  • Hand-crafted predicates
    • Predicate: In main-action:
      • Actions: play-card, attack, hero-power, end-turn
      • Rationale: match the heuristic
        • End-turn should be chosen less often than play-card, attack, and hero-power
        • Hero-power should be chosen less often than play-card and attack
    • Predicate: choose attacker
      • Actions: choose the index of the character to attack with
      • Note: this bias may be unnecessary, since every character that can attack should attack
    • Predicate: choose defender
      • Action: given the attacker, choose the index of the character to defend
      • Rationale:
        • Prefer to attack the hero if the attack kills him/her
        • Prefer a defender such that the attacker survives and the defender dies
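The predicate-conditioned table described above can be sketched by keying the counts on (predicate, action) pairs, so that, e.g., the "main-action" statistics are kept separate from the "choose attacker" statistics. Again a hypothetical sketch: `PredicateQTable` and the predicate strings are invented names, not the project's API.

```python
import math
import random
from collections import defaultdict


class PredicateQTable:
    """One Q(a) table per game-state predicate.

    Counts are keyed by (predicate, action); sampling under a predicate
    uses a Boltzmann distribution over that predicate's counts only.
    """

    def __init__(self, temperature=1.0):
        self.q = defaultdict(int)  # (predicate, action) -> times chosen
        self.temperature = temperature

    def update(self, predicate, action):
        # back-propagation: the simulation chose `action` under `predicate`
        self.q[(predicate, action)] += 1

    def sample(self, predicate, legal_actions, rng=random):
        weights = [math.exp(self.q[(predicate, a)] / self.temperature)
                   for a in legal_actions]
        r = rng.random() * sum(weights)
        for action, w in zip(legal_actions, weights):
            r -= w
            if r <= 0:
                return action
        return legal_actions[-1]
```

Under the "main-action" predicate, for example, play-card and attack would accumulate far more counts than end-turn, so rollouts would end the turn less often, matching the heuristic above.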
  • Backup
    • Self-Character 1 can attack Opponent-Character 1
      • and so on, up to Self-Character 8 attacking Opponent-Character 8
      • 8 × 8 = 64 features in total
    • Self-Character 1 can attack Opponent-Character 1, the attacker survives, and the defender dies
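The feature layout above might be flattened as follows. This is a sketch under assumptions: the function name and the matrix inputs are hypothetical, and the trade outcome is taken as given rather than computed from card stats.

```python
def attack_features(can_attack, attacker_survives, defender_dies):
    """Flatten the 8x8 pairwise attack relations into binary features.

    can_attack[i][j]        -- self-character i can attack opponent-character j
    attacker_survives[i][j] -- in that trade, the attacker stays alive
    defender_dies[i][j]     -- in that trade, the defender dies

    Returns 64 "can attack" features, followed by 64 "can attack and the
    attacker survives while the defender dies" features (128 in total).
    """
    feats = [1 if can_attack[i][j] else 0
             for i in range(8) for j in range(8)]
    feats += [1 if (can_attack[i][j] and attacker_survives[i][j]
                    and defender_dies[i][j]) else 0
              for i in range(8) for j in range(8)]
    return feats
```

Feature (i, j) lives at index i * 8 + j in the first block and 64 + i * 8 + j in the second, so the "favorable trade" block directly mirrors the "can attack" block.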