-
Notifications
You must be signed in to change notification settings - Fork 49
MCTS default policy
Peter Shih edited this page Aug 7, 2017
·
11 revisions
- Survey
- Recent Advances in General Game Playing
- Generalized Monte-Carlo Tree Search Extensions for General Game Playing
- Methods
- Predicate-Average Sampling Technique
- Features-to-Action Sampling Technique
- Idea
- A global table to store Q(a)
- a is one of the actions
- Q(a) is the number of times the action a is chosen
- Example: in Othello, a good action a is to put a chess in the corner
- The global table can be updated in back-propagation step
- Use the table to bias the MCTS default policy
- Choose actions using Boltzmann distribution over Q(a)
- Can use predicates to constraint the table
- Example: chess piece is at a certain location
- A global table to store Q(a)
- Hand-crafted predicates
- Force-select actions:
- If a character is attackable, choose to attack
- Rather than play a card, etc.
- Choose the attacker with highest HP
- If attack hero will make him/her died, attack hero
- If not, attack minion
- Choose a weaker defender
- Defender dies, attacker survives
- Choose the defender with highest attack
- Choose a defender with same strength
- Both attacker and defender died
- Choose the defender with highest attack
- Choose a stronger defender
- Attacker dies, defender survives
- Choose a defender with highest attack
- If a weapon is equipped, do not play a weapon card
- If a character is attackable, choose to attack
- Predicate: In main-action:
- Actions:
- Play a minion when owning <= 3 minions
- Play a minion when owning >= 4 minions
- Attack minion
- Attack hero
- Hero power
- End turn
- Rationale: match the heuristic
- End-turn should be chosen less likely then play-card, attack, and hero-power
- Hero-power should be chosen less likely then play-card, and attack
- Prefer to play a minion when not much minions
- Actions:
- Predicate: choose target of a card with a particular card id
- Actions:
- Target an ally character
- Target an enemy character
- Actions:
- Force-select actions:
- TODO
- Refine definition of actions
- If we carefully define actions, we can use RAVE (rapid action value estimation) to enhance tree policy
- And use MAST (move-averaging sampling technique) to enhance default policy
- Use (Predicate-) Rapid Action Value Estimation
- Ideas to redefine actions:
- A minion is stronger/same/weaker than n minions
- Or, just sum up the 'HP' and 'attack' to gain a rough idea how strong it is
- This is a total ordering. A rank can be used
- Hero health + armor
- Note: a game state can be seen/valued by the predicates which holds in that state
- A minion is stronger/same/weaker than n minions
- Refine definition of actions
- Backup
- Self-Character 1 can attack Opponent-Character 1
- and so on for self-character 1-8 and opponent character 1-8
- total 8*8=64 features
- Self-Character 1 can attack Opponent-Character 1, attacker alive, defender die
- Self-Character 1 can attack Opponent-Character 1