Game of GO benchmark #1169

Closed
ViralBShah opened this issue Aug 17, 2012 · 4 comments
Labels
performance Must go faster

Comments

@ViralBShah
Member

Issue to track the Game of GO benchmark

https://groups.google.com/d/topic/julia-dev/8uIjpx-YTKw/discussion
Code from https://gist.github.com/3373404

@ViralBShah
Member Author

JeffBezanson added a commit that referenced this issue Nov 16, 2012
this is an optimization and also makes it easier to get callback pointers.
closes #938. sparse on Range 3x faster
helps #1211 (ziggurat), about 25% faster
helps #1169 (game of go), about 25% faster
helps #939 (sortperm), about 25% faster
helps #1163 (graph centrality) a bit, about 10% faster
@quinnj
Member

quinnj commented Jun 19, 2013

So I played around with this last night, incorporating the pending @inbounds macro and using the profiler. My modified code runs in about 9.2s, which is 1.7x the gcc -O0 time and 4.8x the gcc -O3 time. It's also more than 2x faster than the original Julia code.
One interesting note was the apparent slowness of mod, as shown in the profiling results below:

635 ...pbox/go.jl; ...dditional_liberty; line: 123
191 ...pbox/go.jl; ...dditional_liberty; line: 124

Where lines 123 and 124 correspond to:

ai = 1 + mod(pos - 1, board.size)
aj = 1 + fld(pos - 1, board.size)

I guess I wouldn't expect mod to be that much slower than fld (I ran the profiler a few times just to check it wasn't a sampling thing).
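
For reference, here is a minimal sketch of the index arithmetic on those two lines, pulled out into standalone functions. The function names and the parameter `n` (standing in for `board.size`) are hypothetical and not taken from the gist. It assumes `pos >= 1` and `n >= 1`, in which case `pos - 1` is non-negative and the truncating `rem`/`div` give the same answers as `mod`/`fld`. Whether that substitution would actually speed up the benchmark is not something measured here.

```julia
# Hypothetical helper names; `n` stands in for board.size.

# The form used in the thread: 1-based (ai, aj) from a 1-based linear position.
to_coords(pos::Int, n::Int) = (1 + mod(pos - 1, n), 1 + fld(pos - 1, n))

# The same result written with Julia's built-in 1-based helpers.
to_coords_1based(pos::Int, n::Int) = (mod1(pos, n), fld1(pos, n))

# Assumption: pos >= 1 and n >= 1, so pos - 1 >= 0 and the truncating
# rem/div agree with the flooring mod/fld.
to_coords_trunc(pos::Int, n::Int) = (1 + rem(pos - 1, n), 1 + div(pos - 1, n))
```

On a 9x9 board, `to_coords(10, 9)` gives `(1, 2)`, and the other two variants return the same tuple.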

Gist of my modified code: https://gist.github.com/karbarcca/5815251

@StefanKarpinski
Member

It's deeply unfortunate that processors implement rem in hardware but not mod since mod is generally the better choice. This forces mod to be implemented in terms of rem and it ends up being significantly slower. Given the absurd excess of transistors modern CPUs have and the crazy number of instructions that the x86_64 architecture has, you would think they could add a frigging mod instruction.
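
To illustrate the extra work involved, here is a rough sketch of how a flooring `mod` can be built on top of a truncating `rem` for machine integers. The name `mod_via_rem` is hypothetical, and this is not claimed to be how Julia's `mod` is actually implemented; it only shows the sign fix-up that a bare hardware remainder does not need.

```julia
# Hypothetical illustration, not Julia's actual implementation.
# rem(x, y) takes the sign of x; mod(x, y) takes the sign of y.
function mod_via_rem(x::Int, y::Int)
    r = rem(x, y)                       # truncated remainder
    # If the remainder is nonzero and its sign differs from the divisor's,
    # shift it by y so the result has the divisor's sign.
    if r != 0 && (r < 0) != (y < 0)
        r += y
    end
    return r
end
```

For example, `rem(-7, 3)` is `-1`, while `mod_via_rem(-7, 3)` is `2`, matching `mod(-7, 3)`. The comparison and conditional add come on top of the division itself.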

@JeffBezanson
Member

Much faster after #4042.
