MCTS default policy

Peter Shih edited this page Aug 7, 2017 · 11 revisions
  • Survey
  • Methods
    • Predicate-Average Sampling Technique
    • Features-to-Action Sampling Technique
  • Idea
    • A global table to store Q(a)
      • a is one of the actions
      • Q(a) is the number of times the action a is chosen
      • Example: in Othello, a good action a is to place a disc in a corner
    • The global table can be updated in back-propagation step
    • Use the table to bias the MCTS default policy
      • Choose actions using Boltzmann distribution over Q(a)
    • Can use predicates to constrain the table
      • Example: a chess piece is at a certain location
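The idea above can be sketched in a few lines. This is an illustrative Python sketch, not the project's code: `q_table` is the assumed global table mapping an action to the number of times it was chosen (updated during back-propagation), and actions are sampled with a Boltzmann (softmax) distribution over Q(a).

```python
import math
import random

def boltzmann_sample(q_table, legal_actions, temperature=1.0, rng=random):
    """Pick one of `legal_actions`, biased toward actions with high Q(a).

    `q_table` maps action -> count of times the action was chosen;
    actions never seen before default to 0.
    """
    qs = [q_table.get(a, 0) for a in legal_actions]
    # Subtract the max before exponentiating for numerical stability.
    m = max(qs)
    weights = [math.exp((q - m) / temperature) for q in qs]
    total = sum(weights)
    r = rng.random() * total
    for action, w in zip(legal_actions, weights):
        r -= w
        if r <= 0:
            return action
    return legal_actions[-1]
```

Constraining the table with predicates would simply mean keying `q_table` by a (predicate, action) pair instead of the action alone.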
  • Hand-crafted predicates
    • Force-select actions:
      • If a character is attackable, choose to attack
        • Rather than play a card, etc.
        • Choose the attacker with the highest HP
      • If attacking the hero would kill him/her, attack the hero
        • Otherwise, attack a minion
      • Choose a weaker defender
        • Defender dies, attacker survives
        • Choose the defender with the highest attack
      • Choose a defender of the same strength
        • Both attacker and defender die
        • Choose the defender with the highest attack
      • Choose a stronger defender
        • Attacker dies, defender survives
        • Choose the defender with the highest attack
      • If a weapon is equipped, do not play a weapon card
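The defender-selection rules above can be sketched as a single preference key. This is an illustrative Python sketch under simplifying assumptions: characters are plain `(hp, attack)` tuples, effects like divine shield are ignored, and the "neither dies" case is lumped into the worst tier.

```python
def choose_defender(attacker, defenders):
    """Pick a defender following the hand-crafted rules:
    (1) weaker defender (defender dies, attacker survives),
    (2) same strength (both die),
    (3) stronger defender (attacker dies).
    Ties within a tier are broken by the highest attack.
    """
    a_hp, a_atk = attacker

    def tier(defender):
        d_hp, d_atk = defender
        defender_dies = a_atk >= d_hp
        attacker_dies = d_atk >= a_hp
        if defender_dies and not attacker_dies:
            return 0  # weaker defender: best trade
        if defender_dies and attacker_dies:
            return 1  # same strength: both die
        return 2      # stronger defender (or neither dies)

    # Lowest tier first; within a tier, highest attack first.
    return min(defenders, key=lambda d: (tier(d), -d[1]))
```

A force-select wrapper would call this only when an attack is chosen at all, after the lethal-on-hero check above.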
    • Predicate: In main-action:
      • Actions:
        • Play a minion when owning <= 3 minions
        • Play a minion when owning >= 4 minions
        • Attack minion
        • Attack hero
        • Hero power
        • End turn
      • Rationale: match the heuristic
        • End-turn should be chosen less often than play-card, attack, and hero-power
        • Hero-power should be chosen less often than play-card and attack
        • Prefer to play a minion when owning few minions
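One way to encode the rationale above is a static weight table for the main-action predicate. The weight values below are assumptions chosen only to respect the stated ordering (end-turn least likely, hero-power less likely than play-card and attack); the action names are illustrative.

```python
import random

# Hypothetical per-action weights for the "main-action" predicate.
MAIN_ACTION_WEIGHTS = {
    "play_minion_few_on_board": 5.0,   # owning <= 3 minions: prefer developing
    "play_minion_many_on_board": 2.0,  # owning >= 4 minions
    "attack_minion": 4.0,
    "attack_hero": 4.0,
    "hero_power": 2.0,                 # less likely than play-card and attack
    "end_turn": 1.0,                   # least likely
}

def sample_main_action(legal_actions, weights=MAIN_ACTION_WEIGHTS, rng=random):
    """Weighted random choice among the legal main actions."""
    ws = [weights[a] for a in legal_actions]
    return rng.choices(legal_actions, weights=ws, k=1)[0]
```

In the global-table scheme these weights would instead be learned counts keyed by the predicate, but fixed weights give the default policy the same qualitative bias from the first playout.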
    • Predicate: choose the target of a card with a particular card ID
      • Actions:
        • Target an ally character
        • Target an enemy character
  • TODO
    • Refine definition of actions
      • If we carefully define actions, we can use RAVE (rapid action value estimation) to enhance tree policy
      • And use MAST (Move-Average Sampling Technique) to enhance default policy
    • Use (Predicate-) Rapid Action Value Estimation
    • Ideas to redefine actions:
      • A minion is stronger/same/weaker than n minions
        • Or, just sum up the 'HP' and 'attack' to get a rough idea of how strong it is
        • This is a total ordering. A rank can be used
      • Hero health + armor
      • Note: a game state can be seen/valued by the predicates which hold in that state
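The HP-plus-attack idea above can be made concrete. This is an illustrative Python sketch, assuming minions are `(hp, attack)` tuples: the summed score induces a total order, so each minion gets a well-defined rank on the board (0 = weakest, equal scores share a rank).

```python
def strength_score(minion):
    """Rough strength of a minion: HP + attack (minion = (hp, attack))."""
    return minion[0] + minion[1]

def strength_rank(minion, board):
    """Rank of `minion` among `board` by the rough strength score.

    The score gives a total order, so the rank is well defined;
    rank 0 means no minion on the board is strictly weaker.
    """
    s = strength_score(minion)
    return sum(1 for m in board if strength_score(m) < s)
```

Actions redefined over such ranks ("attack the k-th strongest minion") stay comparable across states, which is what RAVE and MAST need.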
  • Backup
    • Self-Character 1 can attack Opponent-Character 1
      • and so on for self-characters 1-8 and opponent characters 1-8
      • total 8*8 = 64 features
    • Self-Character 1 can attack Opponent-Character 1, attacker survives, defender dies
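The 8*8 = 64 binary attack features above map naturally onto a flat vector. This is an illustrative Python sketch (0-based indices here, while the page numbers characters from 1; `can_attack` is an assumed game-state predicate):

```python
def attack_feature_index(self_idx, opp_idx, n=8):
    """Flat index of the feature "self-character self_idx can attack
    opponent-character opp_idx" in the n*n feature vector."""
    return self_idx * n + opp_idx

def attack_features(can_attack, n=8):
    """Build the n*n binary feature vector from a predicate
    can_attack(self_idx, opp_idx) -> bool."""
    return [1 if can_attack(i, j) else 0
            for i in range(n) for j in range(n)]
```

The refined "attacker survives, defender dies" variant would be a second 64-entry block built the same way from a stricter predicate.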