~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251

lanctot · 2020-06-22T20:03:45Z

Hello everyone,

Starting today until July 3rd, there will be concentrated effort to add functionality to OpenSpiel. During this time, we will be updating github every day (possibly even many times per day).

We will be more available than usual to answer questions that you may have about OpenSpiel. Please use this thread.

Among the list of planned additions:

Public states sub-API (for implementing algorithms such as DeepStack)
Card games and RL (Hearts, Skat, Bridge, Tarok)
More games! Clobber, Emergent Communication, Emerald Mines / Boulderdash, Hyper-backgammon)
Add bidding to Skat
Get C++ AlphaZero working externally
AlphaZero in Hex; Hex & AlphaZero improvements
Initial investigation in convergence of CFR in n-player and general-sum games

We will also try to run some algorithms and might report a few results.

lanctot · 2020-06-22T20:32:18Z

@findmyway did you know about AlphaZero.jl project? I posted a suggestion to support OpenSpiel games: jonathan-laurent/AlphaZero.jl#15 . Would that be something you'd be interested in trying out or looking into, even if you don't have time between this specific period? Would be a neat collaboration between projects!

findmyway · 2020-06-23T01:39:43Z

Hi @lanctot ,
Yeah, in fact, I contacted @jonathan-laurent just after I decided to write this Julia wrapper. And we had some nice discussions several months ago jonathan-laurent/AlphaZero.jl#4 . I'll provide more info in the issue you created there. jonathan-laurent/AlphaZero.jl#15

By the way, I'm also working on porting some other algorithms (CFR related). Will keep you synced.

lanctot · 2020-06-30T15:35:19Z

Ok so we will use this thread as a way to describe our project in a bit more detail and show any relevant progress.

I will start. I implemented HyperBackgammon, which is a simple variant where each side has only 3 checkers. However, it is not the full game because we still don't have the doubling cube implemented (note to self: we should do that :-p)

Here's a screenshot of the initial state:

+------|------+
|......|...ooo|
|......|......|
|......|......|
|......|......|
|......|......|
|      |      |
|......|......|
|......|......|
|......|......|
|......|......|
|......|...xxx|
+------|------+
Turn: *
Dice: 
Bar:
Scores, X: 0, O: 0

It is implemented as a variant of Backgammon, you can load it using the game string backgammon(hyper_backgammon=true)

lanctot · 2020-06-30T15:41:22Z

With the help of several people @fzvinicius @finbarrtimbers @jblespiau , I did the final stages of getting OpenSpiel fully supported with TF 2.2 and Python 3.8. As a result, we now fully support Ubuntu 20.04. For details see #166 and #249.

inejc · 2020-07-02T09:17:55Z

I finalized de/serialization of CFR and MCCFR C++ solvers and exposed the functionality in Python. For details see #222.

Below is the output of the Python example where an MCCFR solver is pickled halfway through training and then the process is continued with the loaded solver:

$ python open_spiel/python/examples/mccfr_cpp_example.py --sampling=external --iterations=20
Iteration 0 exploitability: 0.583333
Iteration 1 exploitability: 0.604167
Iteration 2 exploitability: 0.611111
Iteration 3 exploitability: 0.562500
Iteration 4 exploitability: 0.529167
Iteration 5 exploitability: 0.465278
Iteration 6 exploitability: 0.443452
Iteration 7 exploitability: 0.432292
Iteration 8 exploitability: 0.425926
Iteration 9 exploitability: 0.420833
Persisting the model...
Loading the model...
Exploitability of the loaded model: 0.420833
Iteration 10 exploitability: 0.416667
Iteration 11 exploitability: 0.411458
Iteration 12 exploitability: 0.400819
Iteration 13 exploitability: 0.378803
Iteration 14 exploitability: 0.359722
Iteration 15 exploitability: 0.343027
Iteration 16 exploitability: 0.330226
Iteration 17 exploitability: 0.318848
Iteration 18 exploitability: 0.305478
Iteration 19 exploitability: 0.291289

inejc · 2020-07-02T09:31:26Z

@TimSmole and I implemented the Slovenian Tarok card game and started porting it to the OpenSpiel repository. For details see #274.

Below is part of the initial output of the Python script that enables one to play the full game:

$ python tarok/python/play_game.py                                          
Game phase: GamePhase.CARD_DEALING
Selected contract: Contract.NOT_SELECTED
Current player: -1

Player cards: []

Legal actions: [('Deal', 0)]

Enter action: 0
---------------------------------------------------------------------- 

Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 1

Player cards: [('V', 4), ('VI', 5), ('VII', 6), ('XII', 11), ('XIV', 13), ...]

Legal actions: [('Pass', 0), ('Two', 3), ('One', 4), ('Beggar', 8), ...]

Enter action: 0
---------------------------------------------------------------------- 

Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 2

Player cards: [('Pagat', 0), ('II', 1), ('IIII', 3), ('VIII', 7), ...]

Legal actions: [('Pass', 0), ('Two', 3), ('One', 4), ('Beggar', 8), ...]

Enter action: 0
---------------------------------------------------------------------- 

Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 0

Player cards: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]

Legal actions: [('Klop', 1), ('Three', 2), ('Two', 3), ('One', 4), ...]

Enter action: 1
---------------------------------------------------------------------- 

Game phase: GamePhase.TRICKS_PLAYING
Selected contract: Contract.KLOP
Current player: 0

Player cards: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]

Trick cards: []

Legal actions: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]

Enter action: 2
---------------------------------------------------------------------- 

Game phase: GamePhase.TRICKS_PLAYING
Selected contract: Contract.KLOP
Current player: 1

Player cards: [('IIII', 3), ('V', 4), ('IX', 8), ('XVII', 16), ...]

Trick cards: [('III', 2)]

Legal actions: [('IIII', 3), ('V', 4), ('IX', 8), ('XVII', 16), ...]

ryanbhayward · 2020-07-02T18:21:56Z

I've been following what Bedir and Mohammad-R have been doing. (It's not yet directly related to openspiel, but I've been writing some postscript so that I can generate board position pix for the game of Y, especially on the geodesic-dome-style boards introduced by Craige Schensted)

christianjans · 2020-07-02T21:14:12Z

I just have added the standard implementation of the game Clobber.

It supports up to a 99x26 board (99 rows, 26 columns), although boards for humans and computers are usually kept to 5x6, 8x8, and 10x10.

For this original version, beginning from a custom board is not supported in Python, and the option to invert the default checkerboard pattern of the pieces is also not implemented. However, these shouldn't be too difficult to get going and can be done soon.

Below is a full game of me (Player 1, 'x') losing to an MCTS player (Player 0, 'o') on a 4x4 board.

$ python3 open_spiel/python/examples/mcts.py --game=clobber\(rows=4,columns=4\) --player1=mcts --player2=human
Initial state:
4xoxo
3oxox
2xoxo
1oxox
 abcd

Player 0 sampled action: c1b1
Next state:
4xoxo
3oxox
2xoxo
1oo.x
 abcd

Choose an action (empty to print legal actions): a4a3
Player 1 sampled action: a4a3
Next state:
4.oxo
3xxox
2xoxo
1oo.x
 abcd

Player 0 sampled action: d2d3
Next state:
4.oxo
3xxoo
2xox.
1oo.x
 abcd

Choose an action (empty to print legal actions): c4b4
Player 1 sampled action: c4b4
Next state:
4.x.o
3xxoo
2xox.
1oo.x
 abcd

Player 0 sampled action: b2c2
Next state:
4.x.o
3xxoo
2x.o.
1oo.x
 abcd

Choose an action (empty to print legal actions): b3c3
Player 1 sampled action: b3c3
Next state:
4.x.o
3x.xo
2x.o.
1oo.x
 abcd

Player 0 sampled action: c2c3
Next state:
4.x.o
3x.oo
2x...
1oo.x
 abcd

Choose an action (empty to print legal actions): a2a1
Player 1 sampled action: a2a1
Next state:
4.x.o
3x.oo
2....
1xo.x
 abcd

Player 0 sampled action: b1a1
Next state:
4.x.o
3x.oo
2....
1o..x
 abcd

Returns: 1.0 -1.0 , Game actions: c1b1 a4a3 d2d3 c4b4 b2c2 b3c3 c2c3 a2a1 b1a1
Number of games played: 1
Number of distinct games played: 1
Players: mcts human
Overall wins [1, 0]
Overall returns [1.0, -1.0]

tuero · 2020-07-02T21:28:23Z

I implemented a bridged version of Boulder Dash/Emerald Mines #281 (see here for a description, and here for a playthrough). The mechanics of the game are simple: collect enough diamonds to open the exit, while avoiding enemies. This game offers many interesting challenges for AI

Long delays between rewards
Dodging enemies by exploiting their predictive movement
Dropping boulders onto enemies to reveal diamonds (understanding mechanics, consequences of actions, and timing)
While the agent moves through the level, dirt is removed which causes items to fall, which can result in deadlocks (blocking the exit or diamonds)
It is also quite fun to play :)

Many of the original elements are supported, and custom boards can be taken in as arguments. I have also converted many of the original Boulder Dash levels into the specified format, which you can find here (althrough these will be quite hard on their own for AI to tackle). I will also be adding simplified versions of the levels to that repo, which tackle different complexities the game offers while being easier to solve.

I have attached a gif below of the ASCII board.

vbhatt-cs · 2020-07-03T02:47:02Z

I implemented a modified version of the Lewis Signaling Game (https://en.wikipedia.org/wiki/Lewis_signaling_game) with an option for arbitrary payoffs. This game can be used as a starting point for multi-agent communication algorithms.

With certain payoff matrices, even a game with 3 states and 3 messages can be challenging for decentralized algorithms. Below are the results for centralized and decentralized tabular Q-Learning and decentralized DQN on identity payoff matrices. Since ε is decayed linearly from 1 to 0, the curve looks nearly linear in the centralized case.

The following figures show the final joint policy for each state. The counts for each (state, action) pair denotes the number of runs that had that particular joint policy.

With the following payoff matrix, the decentralized methods fail to reach the optimal policy.

jhtschultz · 2020-07-03T03:49:58Z

As part of the Hearts project, we (@nathansttt, @solinas, @jhtschultz) implemented the classic card game Hearts in OpenSpiel, open sourced the current state of the art Hearts program xinxin (https://github.com/nathansttt/hearts), and built an interface that allows xinxin to play as an OpenSpiel bot.

Example OpenSpiel Hearts state:

Xinxin screenshot:

We also added information state resampling to Hearts, which enables algorithms like IS-MCTS to run on the game. Along with xinxin, this provides another useful baseline to compare learning algorithms against. Beyond establishing the first open source benchmarks for Hearts, OpenSpiel-xinxin enables us to generate high-quality games which can be used to learn supervised policies. We wrote a python example of a script that does this. Going forward, we could use these policies to seed learning algorithms, or as a faster alternative to xinxin for move generation in MCTS rollouts.

Hearts is an important representative of the class of trick taking card games. Unlike Bridge, there are no teams in Hearts, making it a multiplayer general sum game. As such, it inherits many theoretical complexities and ambiguities. Although RL research on hearts dates back to at least 1997, it still remains a domain in which algorithms have yet to achieve superhuman performance. With this project we aim to preserve an important part of AI history, and use it to facilitate progress in the active research area of multi-agent RL.

ElnazDavoodi · 2020-07-03T05:07:52Z

I worked on Public states sub-API to add factored-observation support. In imperfect information games, each agent gets different private information, as well as some public information. The agent's observation is factored to "private observation" and "public observation". During an imperfect information game, each agent should have a non-empty "private observation" at some states (e.g. in Kuhn-poker, agents are dealt cards and the card of each agent is its private observation), and both agents have "public observation" in all states of the game (e.g. in Kuhn-poker, both agents can observe all agent's actions such as bet, call, etc.).
Also, a version of Kuhn-poker was added which supports factored-observations.

lanctot · 2020-07-03T05:59:37Z

Hi all,

So other than importing PRs and running stuff, I managed to get a small but fun thing done: progress on CFR's empirical convergence outside two-player zero-sum games.

I had previously implemented the EFCCEDist and EFCEDist functions in algorithms/corr_dist.* to be analogues of NashConv: distances to extensive-form coarse-correlated equilibria (EFCCE) and to extensive-form correlated equilibria (EFCE). Motivated by my enthusiasm after reading Celli et al. 2019 and Farina et al. '2019, I wanted to know what CFR's emprical joint policy converges to.

So, during the extravaganza, I added the following (to be submitted in an update in the coming weeks): the ability to extract CFR's current joint policy and corr_dev_builder, a helper class that builds up correlation devices (distributions over joint pure policies) incrementally over time, with the ability to sample pure joint policies from CFR's stochastic policy.

I ran CFR on the general-sum variant of 3-card Goofspiel described in the above papers, the specific game string being turn_based_simultaneous_game(game=goofspiel(num_cards=3,points_order=descending,returns_type=total_points)) . This graph shows the EFCCEDist and EFCEDist of CFR's joint average policy (to build this I sampled 100 joint policies per CFR's mixed strategy and weighed them all equally):

And this graph shows the expect payoffs to each player:

BedirT · 2020-07-03T06:25:39Z

Hi all,

I took the first steps on modified AlphaZero. I added the first version for alpha-beta search functionality instead of MCTS on AlphaZero. The main idea is to be able to compare the results and see the differences between the search methods. The policy is not used in the search yet (to be added), for now only values are used. It is significantly slower, and depth limited, therefore I couldn't get the results of the experiments yet, which I will be posting when I hopefully do.

michalsustr · 2020-07-03T07:26:07Z

I worked with Elnaz on adding support for factored observations, which are the basis of the Factored Observation Games (FOG) formalism. We implemented factored observations for Kuhn Poker with arbitrary number of players. We added a special PublicState interface and implemented it for Kuhn Poker as well. See Public State API for more details. This interface provides data structures and methods necessary for algorithms that solve large-scale games of imperfect information, like Heads-Up, No-Limit Texas Hold'em Poker (a game with ~10^160 states). We added python bindings that support efficient vectorized operations via numpy (resp. Eigen) arrays without copy overhead. This allows to create relatively efficient algorithms also in Python, expanding the accessibility of this research area.

We implemented a PublicState <-> State visualization that nicely shows the relationship between the game structures:

We have a work-in-progress implementation of the CFR algorithm that runs on the public tree. We expect to publish it within a few days, maybe even tomorrow. We also added a number of consistency tests between the PublicState and State API.

There remains much to do, but this was a nice first step. Hopefully it will be followed by contributions of many authors, and many scientific papers! :)

mrdaliri · 2020-07-03T15:46:26Z

Hi,
During Extravaganza, I worked on getting C++ AlphaZero to work externally. The current version of AlphaZero is implemented using Tensorflow, which is known to have a tricky C/C++ API.

After many trials and errors, and with the help of many great people, I was able to fix all compile and linking errors for all four TF/C++ targets of OpenSpiel (alpha_zero library, vpnet_test, alpha_zero_example and tf_trajectories_example) and build them using Tensorflow 2.2.0. Since TF doesn't have a native C++ API (it only has C API), we utilize tensorflow_cc project to build TF using CMake and Bazel from scratch.

Right now, we still have two runtime errors reported by TF checks. I guess that they are caused by some version mismatch between external libraries (TF/Eigen) and can be fixed by spending more time digging into TF/Eigen source code and release notes.

I'll continue to work on these issues, and I hope we can have a working C++ AlphaZero in a couple of weeks. You can follow updates on this project in issue #172.

flostim · 2020-07-03T16:07:30Z

Hi,

I spent most of the Extravaganza adding a GUI for Skat and Hearts to the soon to be released OpenSpiel GUI framework by @bart-devylder. They allow you to play the games against bots and see all the observations that a bot would see. Additionally they can be used to review games played by bots.

Skat GUI

Hearts GUI

Skat Bidding
I also started working on adding full bidding to Skat. This turned out to be a bit more complicated than I thought and I will make the game configurable to allow both simplified and full bidding.

lanctot changed the title ~~OpenSpiel Extravaganza Jun 22nd - Jul 3rd~~ ~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd Jun 22, 2020

lanctot pinned this issue Jun 22, 2020

lanctot mentioned this issue Jun 22, 2020

Support for OpenSpiel games? jonathan-laurent/AlphaZero.jl#15

Closed

lanctot unpinned this issue Jul 6, 2020

lanctot closed this as completed Jul 6, 2020

michalsustr mentioned this issue Jan 6, 2021

Use OpenSpiel in Unity3d #193

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251

~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251

lanctot commented Jun 22, 2020 •

edited

Loading

lanctot commented Jun 22, 2020

findmyway commented Jun 23, 2020 •

edited

Loading

lanctot commented Jun 30, 2020 •

edited

Loading

lanctot commented Jun 30, 2020 •

edited

Loading

inejc commented Jul 2, 2020

inejc commented Jul 2, 2020

ryanbhayward commented Jul 2, 2020

christianjans commented Jul 2, 2020

tuero commented Jul 2, 2020 •

edited

Loading

vbhatt-cs commented Jul 3, 2020

jhtschultz commented Jul 3, 2020 •

edited

Loading

ElnazDavoodi commented Jul 3, 2020 •

edited

Loading

lanctot commented Jul 3, 2020 •

edited

Loading

BedirT commented Jul 3, 2020

michalsustr commented Jul 3, 2020

mrdaliri commented Jul 3, 2020

flostim commented Jul 3, 2020 •

edited

Loading

~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251

~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251

Comments

lanctot commented Jun 22, 2020 • edited Loading

lanctot commented Jun 22, 2020

findmyway commented Jun 23, 2020 • edited Loading

lanctot commented Jun 30, 2020 • edited Loading

lanctot commented Jun 30, 2020 • edited Loading

inejc commented Jul 2, 2020

inejc commented Jul 2, 2020

ryanbhayward commented Jul 2, 2020

christianjans commented Jul 2, 2020

tuero commented Jul 2, 2020 • edited Loading

vbhatt-cs commented Jul 3, 2020

jhtschultz commented Jul 3, 2020 • edited Loading

ElnazDavoodi commented Jul 3, 2020 • edited Loading

lanctot commented Jul 3, 2020 • edited Loading

BedirT commented Jul 3, 2020

michalsustr commented Jul 3, 2020

mrdaliri commented Jul 3, 2020

flostim commented Jul 3, 2020 • edited Loading

lanctot commented Jun 22, 2020 •

edited

Loading

findmyway commented Jun 23, 2020 •

edited

Loading

lanctot commented Jun 30, 2020 •

edited

Loading

lanctot commented Jun 30, 2020 •

edited

Loading

tuero commented Jul 2, 2020 •

edited

Loading

jhtschultz commented Jul 3, 2020 •

edited

Loading

ElnazDavoodi commented Jul 3, 2020 •

edited

Loading

lanctot commented Jul 3, 2020 •

edited

Loading

flostim commented Jul 3, 2020 •

edited

Loading