~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251
Comments
@findmyway did you know about the AlphaZero.jl project? I posted a suggestion to support OpenSpiel games: jonathan-laurent/AlphaZero.jl#15. Would that be something you'd be interested in trying out or looking into, even if you don't have time during this specific period? It would be a neat collaboration between projects! |
Hi @lanctot , By the way, I'm also working on porting some other algorithms (CFR related). Will keep you synced. |
Ok, so we will use this thread to describe our projects in a bit more detail and show any relevant progress. I will start. I implemented HyperBackgammon, a simple variant where each side has only 3 checkers. However, it is not the full game because the doubling cube is not implemented yet (note to self: we should do that :-p). Here's a screenshot of the initial state:
It is implemented as a variant of Backgammon, you can load it using the game string |
With the help of several people @fzvinicius @finbarrtimbers @jblespiau , I did the final stages of getting OpenSpiel fully supported with TF 2.2 and Python 3.8. As a result, we now fully support Ubuntu 20.04. For details see #166 and #249. |
I finalized de/serialization of CFR and MCCFR C++ solvers and exposed the functionality in Python. For details see #222. Below is the output of the Python example where an MCCFR solver is pickled halfway through training and then the process is continued with the loaded solver:
$ python open_spiel/python/examples/mccfr_cpp_example.py --sampling=external --iterations=20
Iteration 0 exploitability: 0.583333
Iteration 1 exploitability: 0.604167
Iteration 2 exploitability: 0.611111
Iteration 3 exploitability: 0.562500
Iteration 4 exploitability: 0.529167
Iteration 5 exploitability: 0.465278
Iteration 6 exploitability: 0.443452
Iteration 7 exploitability: 0.432292
Iteration 8 exploitability: 0.425926
Iteration 9 exploitability: 0.420833
Persisting the model...
Loading the model...
Exploitability of the loaded model: 0.420833
Iteration 10 exploitability: 0.416667
Iteration 11 exploitability: 0.411458
Iteration 12 exploitability: 0.400819
Iteration 13 exploitability: 0.378803
Iteration 14 exploitability: 0.359722
Iteration 15 exploitability: 0.343027
Iteration 16 exploitability: 0.330226
Iteration 17 exploitability: 0.318848
Iteration 18 exploitability: 0.305478
Iteration 19 exploitability: 0.291289 |
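The persist-then-continue pattern in the log above can be illustrated without OpenSpiel. The following is a minimal sketch, assuming a toy regret-matching learner on rock-paper-scissors as a stand-in for the MCCFR solver; the names `new_solver`, `run_iteration`, `average_strategy`, and `train` are hypothetical and are not part of the OpenSpiel API. The point it shows is that a checkpoint must capture everything the solver needs, including the sampling state, for training to continue exactly where it left off.

```python
import pickle
import random

# PAYOFF[a][b]: payoff to a player choosing a against b (0=rock, 1=paper, 2=scissors).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def new_solver(seed=0):
    """Fresh solver state; a plain dict, so it pickles without custom hooks."""
    return {
        "regrets": [0.0, 0.0, 0.0],
        "strategy_sum": [0.0, 0.0, 0.0],
        "rng_state": random.Random(seed).getstate(),  # sampling state is saved too
        "iterations": 0,
    }


def run_iteration(solver):
    """One sampled self-play iteration of regret matching, updating `solver` in place."""
    rng = random.Random()
    rng.setstate(solver["rng_state"])
    positive = [max(r, 0.0) for r in solver["regrets"]]
    total = sum(positive)
    strategy = [p / total for p in positive] if total > 0 else [1 / 3] * 3
    mine, theirs = rng.choices(range(3), weights=strategy, k=2)
    for a in range(3):
        # Regret of not having played a instead of the sampled action.
        solver["regrets"][a] += PAYOFF[a][theirs] - PAYOFF[mine][theirs]
        solver["strategy_sum"][a] += strategy[a]
    solver["rng_state"] = rng.getstate()
    solver["iterations"] += 1


def average_strategy(solver):
    """Average of the strategies used so far (requires at least one iteration)."""
    total = sum(solver["strategy_sum"])
    return [s / total for s in solver["strategy_sum"]]


def train(iterations, seed=0, checkpoint=False):
    """Train; optionally pickle and reload the solver halfway through."""
    solver = new_solver(seed)
    for i in range(iterations):
        if checkpoint and i == iterations // 2:
            blob = pickle.dumps(solver, pickle.HIGHEST_PROTOCOL)  # persist
            solver = pickle.loads(blob)                           # load and continue
        run_iteration(solver)
    return solver
```

Because the random number generator state is stored with the regrets, `train(20, checkpoint=True)` and `train(20)` end in identical states, which mirrors the log line where the loaded model reports the same exploitability as before it was persisted.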
@TimSmole and I implemented the Slovenian Tarok card game and started porting it to the OpenSpiel repository. For details see #274. Below is part of the initial output of the Python script that enables one to play the full game:
$ python tarok/python/play_game.py
Game phase: GamePhase.CARD_DEALING
Selected contract: Contract.NOT_SELECTED
Current player: -1
Player cards: []
Legal actions: [('Deal', 0)]
Enter action: 0
----------------------------------------------------------------------
Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 1
Player cards: [('V', 4), ('VI', 5), ('VII', 6), ('XII', 11), ('XIV', 13), ...]
Legal actions: [('Pass', 0), ('Two', 3), ('One', 4), ('Beggar', 8), ...]
Enter action: 0
----------------------------------------------------------------------
Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 2
Player cards: [('Pagat', 0), ('II', 1), ('IIII', 3), ('VIII', 7), ...]
Legal actions: [('Pass', 0), ('Two', 3), ('One', 4), ('Beggar', 8), ...]
Enter action: 0
----------------------------------------------------------------------
Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 0
Player cards: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]
Legal actions: [('Klop', 1), ('Three', 2), ('Two', 3), ('One', 4), ...]
Enter action: 1
----------------------------------------------------------------------
Game phase: GamePhase.TRICKS_PLAYING
Selected contract: Contract.KLOP
Current player: 0
Player cards: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]
Trick cards: []
Legal actions: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]
Enter action: 2
----------------------------------------------------------------------
Game phase: GamePhase.TRICKS_PLAYING
Selected contract: Contract.KLOP
Current player: 1
Player cards: [('IIII', 3), ('V', 4), ('IX', 8), ('XVII', 16), ...]
Trick cards: [('III', 2)]
Legal actions: [('IIII', 3), ('V', 4), ('IX', 8), ('XVII', 16), ...] |
I have just added a standard implementation of the game Clobber. It supports boards up to 99x26 (99 rows, 26 columns), although boards for humans and computers are usually kept to 5x6, 8x8, or 10x10. In this first version, starting from a custom board is not supported in Python, and the option to invert the default checkerboard pattern of the pieces is also not implemented. However, these shouldn't be too difficult to add and can be done soon. Below is a full game of me (Player 1, 'x') losing to an MCTS player (Player 0, 'o') on a 4x4 board.
|
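For readers unfamiliar with Clobber, here is a minimal sketch of its move rule on the default checkerboard start: a move slides one of your pieces onto an orthogonally adjacent opponent piece, which is removed, and a player with no legal move loses. This is an independent illustration, not the OpenSpiel implementation; the function names and the choice of which colour occupies the top-left corner are assumptions.

```python
def initial_board(rows, cols):
    """Checkerboard start: every cell is occupied and colours alternate.

    Which colour sits in the top-left corner is an assumption of this sketch.
    """
    return [["o" if (r + c) % 2 == 0 else "x" for c in range(cols)]
            for r in range(rows)]


def legal_moves(board, player):
    """All moves for `player`: (from_cell, to_cell) pairs onto an adjacent opponent."""
    opponent = "x" if player == "o" else "o"
    rows, cols = len(board), len(board[0])
    moves = []
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != player:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] == opponent:
                    moves.append(((r, c), (nr, nc)))
    return moves


def apply_move(board, move):
    """Return a new board with the moving piece replacing the captured one."""
    (r, c), (nr, nc) = move
    new_board = [row[:] for row in board]
    new_board[nr][nc] = new_board[r][c]
    new_board[r][c] = "."  # the origin cell becomes empty
    return new_board
```

On a 4x4 board every one of the 24 orthogonal adjacencies joins an 'x' and an 'o', so each player starts with 24 legal moves, and every move removes exactly one piece from the board.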
I implemented a bridged version of Boulder Dash/Emerald Mines #281 (see here for a description, and here for a playthrough). The mechanics of the game are simple: collect enough diamonds to open the exit, while avoiding enemies. This game offers many interesting challenges for AI
Many of the original elements are supported, and custom boards can be passed in as arguments. I have also converted many of the original Boulder Dash levels into the specified format, which you can find here (although these will be quite hard for AI to tackle on their own). I will also be adding simplified versions of the levels to that repo, which isolate different complexities the game offers while being easier to solve. |
I implemented a modified version of the Lewis Signaling Game (https://en.wikipedia.org/wiki/Lewis_signaling_game) with an option for arbitrary payoffs. This game can be used as a starting point for multi-agent communication algorithms. With certain payoff matrices, even a game with 3 states and 3 messages can be challenging for decentralized algorithms. Below are the results for centralized and decentralized tabular Q-learning and decentralized DQN on identity payoff matrices. Since ε is decayed linearly from 1 to 0, the curve looks nearly linear in the centralized case. The following figures show the final joint policy for each state. The count for each (state, action) pair denotes the number of runs that ended with that particular joint policy. With the following payoff matrix, the decentralized methods fail to reach the optimal policy. |
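The decentralized tabular setup described above can be sketched in a few lines. This is a minimal illustration, not the code behind the reported results: it assumes uniformly sampled states, one-step episodes, a fixed learning rate, and two independent Q-tables (sender: state to message, receiver: message to action), and all function names and hyperparameters here are hypothetical. The exploration rate follows the linear decay from 1 to 0 mentioned in the text.

```python
import random


def linear_epsilon(episode, num_episodes):
    """Exploration rate decayed linearly from 1 (first episode) to 0 (last)."""
    if num_episodes <= 1:
        return 0.0
    return 1.0 - episode / (num_episodes - 1)


def epsilon_greedy(q_row, epsilon, rng):
    """Random choice with probability epsilon, else a greedy choice (ties at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(payoff, num_messages, num_episodes=2000, lr=0.1, seed=0):
    """Independent Q-learning for sender and receiver; payoff[state][action] is shared."""
    rng = random.Random(seed)
    num_states, num_actions = len(payoff), len(payoff[0])
    sender_q = [[0.0] * num_messages for _ in range(num_states)]
    receiver_q = [[0.0] * num_actions for _ in range(num_messages)]
    rewards = []
    for episode in range(num_episodes):
        eps = linear_epsilon(episode, num_episodes)
        state = rng.randrange(num_states)              # assumed uniform prior
        message = epsilon_greedy(sender_q[state], eps, rng)
        action = epsilon_greedy(receiver_q[message], eps, rng)
        reward = payoff[state][action]
        # One-step episodes, so the update target is just the reward.
        sender_q[state][message] += lr * (reward - sender_q[state][message])
        receiver_q[message][action] += lr * (reward - receiver_q[message][action])
        rewards.append(reward)
    return sender_q, receiver_q, rewards
```

Calling `train([[1, 0, 0], [0, 1, 0], [0, 0, 1]], num_messages=3)` runs the identity-payoff case; swapping in a different matrix is how one would probe the harder payoff structures the comment refers to.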
As part of the Hearts project, we (@nathansttt, @solinas, @jhtschultz) implemented the classic card game Hearts in OpenSpiel, open-sourced the current state-of-the-art Hearts program xinxin (https://github.com/nathansttt/hearts), and built an interface that allows xinxin to play as an OpenSpiel bot. Example OpenSpiel Hearts state:
We also added information state resampling to Hearts, which enables algorithms like IS-MCTS to run on the game. Along with xinxin, this provides another useful baseline to compare learning algorithms against.
Beyond establishing the first open-source benchmarks for Hearts, OpenSpiel-xinxin enables us to generate high-quality games that can be used to learn supervised policies. We wrote an example Python script that does this. Going forward, we could use these policies to seed learning algorithms, or as a faster alternative to xinxin for move generation in MCTS rollouts.
Hearts is an important representative of the class of trick-taking card games. Unlike Bridge, there are no teams in Hearts, making it a multiplayer general-sum game. As such, it inherits many theoretical complexities and ambiguities. Although RL research on Hearts dates back to at least 1997, it remains a domain in which algorithms have yet to achieve superhuman performance. With this project we aim to preserve an important part of AI history and to use it to facilitate progress in the active research area of multi-agent RL. |
I worked on the public-states sub-API to add factored-observation support. In imperfect-information games, each agent receives its own private information as well as some public information, so an agent's observation is factored into a "private observation" and a "public observation". During an imperfect-information game, each agent should have a non-empty private observation in at least some states (e.g. in Kuhn poker, each agent is dealt a card, and that card is its private observation), and all agents share the public observation in every state of the game (e.g. in Kuhn poker, every agent can observe all agents' actions, such as bet, call, etc.). |
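To make the private/public split concrete, here is a minimal sketch for a Kuhn-poker-like state. It is a simplification for illustration and not the OpenSpiel sub-API: the function names are hypothetical, and the public observation is taken to be the whole action sequence so far rather than a per-step observation.

```python
def factored_observation(cards, actions, player):
    """Split what `player` sees into a public part and a private part.

    cards:   tuple with one card per player, e.g. ("J", "K")
    actions: tuple of actions taken so far, e.g. ("pass", "bet")
    """
    return {
        "public": tuple(actions),  # identical for every player
        "private": cards[player],  # only this player's own card
    }


def information_state(cards, actions, player):
    """What the player knows: its private observation followed by the public one."""
    obs = factored_observation(cards, actions, player)
    return (obs["private"],) + obs["public"]
```

Two players in the same state get the same `"public"` entry and different `"private"` entries, which is exactly the property the factoring is meant to expose.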
Hi all, So other than importing PRs and running stuff, I managed to get a small but fun thing done: progress on CFR's empirical convergence outside two-player zero-sum games. I had previously implemented the So, during the extravaganza, I added the following (to be submitted in an update in the coming weeks): the ability to extract CFR's current joint policy and I ran CFR on the general-sum variant of 3-card Goofspiel described in the above papers, the specific game string being And this graph shows the expected payoffs to each player: |
Hi all, I took the first steps on a modified AlphaZero. I added a first version of alpha-beta search as an alternative to MCTS in AlphaZero. The main idea is to be able to compare the results and see the differences between the search methods. The policy is not used in the search yet (to be added); for now only the values are used. The search is significantly slower and depth-limited, so I don't have experimental results yet; I will post them once I do. |
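The idea of a depth-limited alpha-beta search that falls back on a value estimate at the depth limit can be sketched on a toy game. This is an illustration under stated assumptions, not the AlphaZero modification described above: the `SubtractionGame` class is hypothetical, and its `evaluate` method returns a constant where a learned value network would normally be queried.

```python
import math


class SubtractionGame:
    """Toy game: players alternately remove 1-3 stones; taking the last stone wins."""

    def legal_actions(self, stones):
        return [n for n in (1, 2, 3) if n <= stones]

    def apply(self, stones, action):
        return stones - action

    def terminal_value(self, stones):
        # No stones left: the player to move has already lost.
        return -1.0 if stones == 0 else None

    def evaluate(self, stones):
        # Placeholder for a learned value estimate; 0.0 means "unknown".
        return 0.0


def alpha_beta(game, state, depth, alpha=-math.inf, beta=math.inf):
    """Depth-limited negamax alpha-beta; the value is from the mover's perspective."""
    terminal = game.terminal_value(state)
    if terminal is not None:
        return terminal
    if depth == 0:
        return game.evaluate(state)  # depth limit reached: trust the value estimate
    best = -math.inf
    for action in game.legal_actions(state):
        child = game.apply(state, action)
        value = -alpha_beta(game, child, depth - 1, -beta, -alpha)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # the opponent will never allow this line
    return best
```

With enough depth the search solves the toy game exactly (4 stones is a loss for the player to move, 5 is a win), while at depth 1 from 4 stones it can only return the placeholder estimate, which shows how strongly the result depends on the value function once the depth limit cuts in.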
I worked with Elnaz on adding support for factored observations, which are the basis of the Factored Observation Games (FOG) formalism. We implemented factored observations for Kuhn Poker with an arbitrary number of players. We added a special We implemented a We have a work-in-progress implementation of the CFR algorithm that runs on the public tree. We expect to publish it within a few days, maybe even tomorrow. We also added a number of consistency tests between the There remains much to do, but this was a nice first step. Hopefully it will be followed by contributions from many authors, and many scientific papers! :) |
Hi, After many trials and errors, and with the help of many great people, I was able to fix all compile and linking errors for all four TF/C++ targets of OpenSpiel ( Right now, we still have two runtime errors reported by TF checks. I guess that they are caused by some version mismatch between external libraries (TF/Eigen) and can be fixed by spending more time digging into TF/Eigen source code and release notes. I'll continue to work on these issues, and I hope we can have a working C++ AlphaZero in a couple of weeks. You can follow updates on this project in issue #172. |
Hi, I spent most of the Extravaganza adding GUIs for Skat and Hearts to the soon-to-be-released OpenSpiel GUI framework by @bart-devylder. They allow you to play the games against bots and see all the observations that a bot would see. Additionally, they can be used to review games played by bots. Skat Bidding |
Hello everyone,
Starting today and running until July 3rd, there will be a concentrated effort to add functionality to OpenSpiel. During this time, we will be updating GitHub every day (possibly even several times per day).
We will be more available than usual to answer questions that you may have about OpenSpiel. Please use this thread.
Among the list of planned additions:
We will also try to run some algorithms and might report a few results.