Integrate gsl-shell/luajit into microbenchmarks #6430

Merged
merged 1 commit into from
Apr 6, 2014

jiahao
Member

@jiahao jiahao commented Apr 5, 2014

Integrate gsl-shell/luajit into microbenchmarks

  • Also update the command that gets the Julia version info.
  • Rename the label for the gsl-shell/lua tests from gsl_shell to lua.
  • Remove the non-working external BLAS usage warning.

Closes #1662.

@jiahao jiahao changed the title from "WIP: Integrate gsl-shell/luajit into microbenchmarks" to "Integrate gsl-shell/luajit into microbenchmarks" Apr 6, 2014
@jiahao
Member Author

jiahao commented Apr 6, 2014

Ready for review.

Provisional benchmark results:

benchmark      Fortran    Julia                  Python  R       Matlab   Octave   Mathematica  JavaScript    Go    Lua
version        GCC 4.8.1  0.3.0-prerelease+2497  2.7.3   3.0.2   R2012a   3.6.4    8.0          V8 3.7.12.22  go1   gsl-shell 2.3.1
fib            0.41       1.68                   65.40   880.28  3499.24  6985.53  140.02       3.78          1.90  1.93
parse_int      3.64       2.04                   14.34   54.30   1240.19  6974.79  29.62        2.44          3.95  5.15
quicksort      1.09       1.00                   31.41   522.61  86.21    1150.07  36.37        3.61          1.05  2.05
mandel         0.75       0.74                   14.58   107.15  56.67    335.94   6.22         3.31          2.02  0.74
pi_sum         0.80       0.83                   13.44   15.39   1.06     246.65   1.32         0.84          1.17  1.00
rand_mat_stat  0.80       1.98                   13.56   13.71   6.83     18.77    5.62         4.03          8.29  4.36
rand_mat_mul   4.39       0.88                   0.96    4.14    0.95     3.53     1.19         14.38         8.82  4.21

@jiahao
Member Author

jiahao commented Apr 6, 2014

I reran the tests with an upgraded version of R. The change in performance numbers is quite interesting:

benchmark      R 3.0.2  R 3.1.0
fib            880.28   552.33
parse_int      54.30    49.77
quicksort      522.61   282.31
mandel         107.15   66.33
pi_sum         15.39    15.03
rand_mat_stat  13.71    15.51
rand_mat_mul   4.14     4.12
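
To make the size of the change easier to read, here is a minimal Python sketch (not part of the PR) that divides the R 3.0.2 figure by the R 3.1.0 figure for each benchmark. The numbers are copied from the table above; the function and variable names are made up for illustration.

```python
# Figures copied from the R 3.0.2 and R 3.1.0 columns above (smaller is faster).
r_302 = {
    "fib": 880.28, "parse_int": 54.30, "quicksort": 522.61, "mandel": 107.15,
    "pi_sum": 15.39, "rand_mat_stat": 13.71, "rand_mat_mul": 4.14,
}
r_310 = {
    "fib": 552.33, "parse_int": 49.77, "quicksort": 282.31, "mandel": 66.33,
    "pi_sum": 15.03, "rand_mat_stat": 15.51, "rand_mat_mul": 4.12,
}


def speedups(old, new):
    """Return old/new per benchmark; a value above 1 means the new version is faster."""
    return {name: old[name] / new[name] for name in old}


for name, factor in speedups(r_302, r_310).items():
    print(f"{name:14s} {factor:5.2f}x")
```

A factor below 1 (as for rand_mat_stat) means the newer R was slower on that test in this particular run.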

@tkelman
Contributor

tkelman commented Apr 6, 2014

While you're at it, Octave also had a major recent release that could improve their numbers.

@johnmyleswhite
Member

Maybe they finally pulled in Radford Neal's patches?

@jiahao
Member Author

jiahao commented Apr 6, 2014

Unfortunately Octave 3.8 hasn't made it into any Ubuntu distribution stream that I'm aware of, and I'm too lazy to build from source. ;-)

@tkelman
Contributor

tkelman commented Apr 6, 2014

Fair enough, the idea that the benchmarks should be based on "standard default easy installation version" makes perfect sense.

@StefanKarpinski
Member

Those Lua numbers are definitely interesting, as are the improved R numbers.

@cbecker
Contributor

cbecker commented Apr 6, 2014

I get an error with the Fortran tests because of the static compilation flag, FFLAGS+= -static-libgfortran. Is that necessary?

@cbecker
Contributor

cbecker commented Apr 6, 2014

@jiahao what machine are you running the benchmarks on?
For pi_sum I get very good performance from Lua (as fast as C), but Julia is doing 30% worse (1.3). You can check the msec timings I get at #1662.
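
For readers who have not seen the test, here is a rough Python sketch of a pi_sum-style kernel with a wall-clock timing around it. It assumes the usual shape of this microbenchmark (a truncated sum of 1/k^2, repeated many times); the loop bounds 500 and 10000 and the timing harness are assumptions, not something stated in this thread.

```python
import time


def pisum(outer=500, inner=10000):
    """Sum 1/k^2 for k = 1..inner, repeated `outer` times to lengthen the run."""
    total = 0.0
    for _ in range(outer):
        total = 0.0  # restart the sum on every repetition
        for k in range(1, inner + 1):
            total += 1.0 / (k * k)
    return total


start = time.perf_counter()
result = pisum()
elapsed_ms = (time.perf_counter() - start) * 1000.0
print(f"pisum() = {result:.12f} in {elapsed_ms:.1f} ms")
```

The kernel is a tight floating-point loop with no allocation, which is why differences in generated machine code show up so directly in its timing.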

@jiahao
Member Author

jiahao commented Apr 6, 2014

@cbecker this is our main Julia test machine, whose specs are mentioned somewhere near the benchmarking data. It does have quite a lot of memory, which could be a factor.

@jiahao
Member Author

jiahao commented Apr 6, 2014

I think the static compilation flag was introduced before my time. It's probably unnecessary if you can get away without using it.

@cbecker
Contributor

cbecker commented Apr 6, 2014

@jiahao I don't think having huge amounts of memory would speed up pi_sum. Maybe it is related to AVX or some other instruction set that your machine and mine differ on. If you have the chance, please PM me the output of cat /proc/cpuinfo.

@jiahao
Member Author

jiahao commented Apr 6, 2014

Mm, the thought did occur to me.

Here's the tail end of cpuinfo:

processor   : 79
vendor_id   : GenuineIntel
cpu family  : 6
model       : 47
model name  : Intel(R) Xeon(R) CPU E7- 8850  @ 2.00GHz
stepping    : 2
microcode   : 0x36
cpu MHz     : 1996.000
cache size  : 24576 KB
physical id : 3
siblings    : 10
core id     : 25
cpu cores   : 10
apicid      : 242
initial apicid  : 242
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 4000.23
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

@cbecker
Contributor

cbecker commented Apr 6, 2014

The difference is that I have, on top of what you listed: eagerfpu tsc_deadline_timer xsave avx xsaveopt pln pts.
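
A comparison like this can be done mechanically. The following is a small Python sketch, assuming each machine's /proc/cpuinfo text is available as a string; the helper names and the shortened sample inputs are hypothetical and are not the real flag lists from this thread.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def extra_flags(mine, theirs):
    """Flags present on my machine but missing on the other one, sorted by name."""
    return sorted(cpu_flags(mine) - cpu_flags(theirs))


# Shortened, made-up inputs for illustration only.
theirs = "model : 47\nflags : fpu sse sse2 sse4_1 sse4_2 popcnt aes\n"
mine = "model : 58\nflags : fpu sse sse2 sse4_1 sse4_2 popcnt aes xsave avx\n"

print(extra_flags(mine, theirs))  # ['avx', 'xsave']
```

On a real Linux machine, one side of the comparison could come from reading /proc/cpuinfo directly and the other from the pasted text.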

I suppose the timing difference could be due to AVX. There are some LLVM bug reports about AVX performing worse than SSE: http://llvm.org/bugs/buglist.cgi?quicksearch=avx

@JeffBezanson is there an easy way to disable AVX within Julia? I could try that to confirm that this is the issue.

@ihnorton
Member

ihnorton commented Apr 6, 2014

@cbecker grep for avx2 in src/codegen.cpp. Using -avx at the same place should do what you want (then make clean && make).

@cbecker
Contributor

cbecker commented Apr 6, 2014

Thanks @ihnorton.

@jiahao, @StefanKarpinski Indeed, AVX is slowing it down. Timings for pi_sum:

without AVX:  39msec
with AVX:     48msec

I suppose there are cases where AVX indeed helps (EDIT: indeed, rand_mat_mul gets 2x improvement), but for now there may not be much we can do but wait for the LLVM bugs to be fixed.
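
For scale, the two pi_sum timings above can be turned into a relative slowdown. This is just arithmetic on the reported 39 msec and 48 msec, sketched in Python with a helper name chosen for illustration.

```python
def relative_slowdown(baseline_ms, other_ms):
    """Fraction by which `other_ms` exceeds `baseline_ms` (0.25 means 25% slower)."""
    return other_ms / baseline_ms - 1.0


without_avx_ms = 39.0  # reported timing with AVX disabled
with_avx_ms = 48.0     # reported timing with AVX enabled

slowdown = relative_slowdown(without_avx_ms, with_avx_ms)
print(f"pi_sum is {slowdown:.0%} slower with AVX enabled")  # 23%
```

That is a single pair of measurements on one machine, so it indicates the direction of the effect rather than a precise figure.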

jiahao added a commit that referenced this pull request Apr 6, 2014
Integrate gsl-shell/luajit into microbenchmarks
@jiahao jiahao merged commit 1620a69 into master Apr 6, 2014
@jiahao jiahao deleted the cjh/fix-1662 branch April 6, 2014 19:57
@cbecker cbecker mentioned this pull request May 23, 2014
Successfully merging this pull request may close these issues.

performance request... add luajit and/or gsl-shell to benchmark
7 participants