Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scoring #292

Merged
merged 19 commits into from
Jun 2, 2016
Merged

Fix scoring #292

merged 19 commits into from
Jun 2, 2016

Conversation

ujh
Copy link
Owner

@ujh ujh commented May 13, 2016

Run the playouts for 10-30s for use in final_score and final_status_list. It's still WIP as some tests don't pass yet and I also don't quite know if this is better as before. I'll have to run the 13x13 benchmark on master and this branch to see if this branch is better at predicting the score (i.e. closer to GnuGo).

Fixes #280, #291

@ujh ujh added this to the 0.3.2 milestone May 13, 2016
@iopq
Copy link
Collaborator

iopq commented May 13, 2016

That's not how other engines do it. They let you have an analyze command like reg_genmove or donplayouts 1000 and THEN you score.

@ujh
Copy link
Owner Author

ujh commented May 13, 2016

I came up with that solution for the case where the bot is winning a game which means it's doing very few playouts for the last move. My assumption is that this is the reason for the bad scoring. But all this is only based on a feeling. I'm currently running 50 games on 13x13 for both the master branch and this one. Then we will know the following:

  1. In how many cases are GnuGo and Iomrascálaí disagreeing on master, i.e. is there even a big problem in the first place.
  2. Is this better than it was before.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.6%) to 91.702% when pulling 16e5479 on fix-scoring into fd63ad4 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 92.015% when pulling 0778a5b on fix-scoring into fd63ad4 on master.

@ujh
Copy link
Owner Author

ujh commented May 21, 2016

I ran a few games on 13x13 with both master and this branch. For master it looks like this:

14.4% same score as GnuGo (35 of 243, ± 4.39 at 95%, ± 4.48 at 99%)

For this branch it's quite a bit better:

66.34% same score as GnuGo (136 of 205, ± 6.44 at 95%, ± 6.57 at 99%)

It's of course not perfect but I think the improvement is big enough that it makes sense to merge this.

@iopq
Copy link
Collaborator

iopq commented May 21, 2016

But I like to set my own time limits for search, how do I get the previous functionality back?

Also, are the games where there are score disagreements played to the end, or resigned by Iomrascalai?

@ujh
Copy link
Owner Author

ujh commented May 22, 2016

I unfortunately can't check as I'm on holiday without a computer. But can't you also use search and pass in a custom stop closure?

@ujh
Copy link
Owner Author

ujh commented May 22, 2016

Oh, and these numbers are for games that weren't resigned.

@iopq
Copy link
Collaborator

iopq commented May 22, 2016

I'm using the executable, right now I just do a search for x seconds first (by changing the timer) and then ask for a score.

@ujh
Copy link
Owner Author

ujh commented May 22, 2016

I see. We could introduce a custom GTP command that allows you to set the number of seconds it should search for final_score.

@iopq
Copy link
Collaborator

iopq commented May 22, 2016

I've seen a command like donplayouts 10000 to do ten thousand playouts for this purpose. That way final_score just uses whatever the playouts are already done instead of doing additional playouts. Of course this also applies to a command like uct_gfx which shows the statistics for all the tree moves. I'm working on uct_gfx already and I don't feel like I should have to make special version of the command to populate the tree. Instead I could just do donplayouts 1000 to do uct_gfx OR final_score depending on my needs.

@ujh
Copy link
Owner Author

ujh commented May 22, 2016

The issue with that is that not enough playouts are done when the win rate goes above 80%. So we end up with almost no data at the end of the game. That's why I introduced this code to do some "extra" playouts to get a better prediction of which stoned are dead.

@iopq
Copy link
Collaborator

iopq commented May 22, 2016

So just run the donplayouts 10000 command first? Or, since we don't have one, time_settings 60 0 0 reg_genmove final_score should do the trick.

@ujh
Copy link
Owner Author

ujh commented May 23, 2016

Let's discuss this once I'm back from my holiday. I think we just have totally different use cases.

@ujh
Copy link
Owner Author

ujh commented May 31, 2016

Alright, so I'm back from holiday. My current use case is as follows: If Iomrascálaí is winning a game it stops the search early during the last moves so it doesn't run many playouts. If we then use these few playouts for scoring after the game is over we get the score wrong quite a lot as dead stones aren't marked as dead. That's why I do this somewhat hackish thing of removing the "end of game marker" and run the search for 10-30 seconds. Then final_score and final_status_list produce better results. It may even be useful to not include pass moves in this search.

What exactly is your use case? To me it seems to be similar to the imrscl-ownership command which depends on the currently existing search tree. Is this what you're doing?

@iopq
Copy link
Collaborator

iopq commented May 31, 2016

Yes, I do the exact same thing. My solution is to first run time_settings 0 0 10 and then genmove b (or reg_genmove b) to get the exact amount of search time I want. I don't see why search time has to be built-in to the command itself since you can control the time more finely. Then I do imsrcl-ownership and count the game and ALSO show the owned points on screen.

My ideal solution is to implement donplayouts 10000 and just do 10k playouts (or however many you need)

@ujh
Copy link
Owner Author

ujh commented May 31, 2016

OK, does imrscl-ownership trigger the search I've implemented for final_score? If so then that's of course a bug. Otherwise I don't quite see the connection between your command and my changes here as you don't seem to use final_score. If this of course does cause the search to kick in that would be bad.

BTW, I'm not against implementing donplayouts but I wonder what the connection to this PR here is.

@iopq
Copy link
Collaborator

iopq commented May 31, 2016

it doesn't, but if someone wants to use final_score they should be able to set their own number of playouts

it makes no sense to merge this and then undo it after donplayouts is implemented

@ujh
Copy link
Owner Author

ujh commented May 31, 2016

I see. For me the main reason of existence for final_score and final_status_list are for a smooth end game. That requires us (the developers) to tune the parameters to achieve the best results. Personally I'd rather implement another similar command (and donplayouts) that can be used as an analyze command. To me they have rather different uses. One it tuned to ideally produce a correct score after the game is over. And the other is used to give insights into the current state of the search tree.

@ujh
Copy link
Owner Author

ujh commented May 31, 2016

So would it be enough to separately implement donplayouts? We could of course also add a flag that circumvents the extra search if no play or genmove happened between donplayouts and final_score or final_status_list.

@iopq
Copy link
Collaborator

iopq commented May 31, 2016

If donplayouts is implemented, why bother with mixing search and counting? Seems like those are separate concerns. If you really want a separate command it could just call out to donplayouts and final_score functions instead of adding all of this extra code to do the same thing with separate boolean flags.

@ujh
Copy link
Owner Author

ujh commented May 31, 2016

Alright, I'm convinced. I've added the imrscl-donplayouts GTP command and I now run 10,000 playouts if necessary. I will have to check tomorrow if there are still bugs in there (esp. with when to automatically run the playouts and when not) but I thought that maybe you wanted to have a look already.

I'll have to see if 10,000 is enough or if I have to increase the number of course.

@ujh
Copy link
Owner Author

ujh commented Jun 1, 2016

So, it all seems to work as expected. If you run imrscl-donplayouts then final_score and final_status_list will return immediately. After a genmove or kgs-genmove_cleanup it will then reset and do 10,000 playouts. I will have to measure the improvement again and maybe change the value to something higher than 10,000 of course, but first I wanted to check if this is what you had in mind.

@iopq
Copy link
Collaborator

iopq commented Jun 1, 2016

Yeah, this is what I had in mind. We can tweak the implementations later if we need to adjust how it works.

@ujh
Copy link
Owner Author

ujh commented Jun 1, 2016

Cool. And thanks for the discussion! I think we ended up with a better solution. 👍

@ujh ujh removed the wip label Jun 1, 2016
@ujh ujh merged commit 1493764 into master Jun 2, 2016
@ujh ujh deleted the fix-scoring branch June 2, 2016 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Final score sometimes wrong
3 participants