Integrate gsl-shell/luajit into microbenchmarks #6430

jiahao · 2014-04-05T16:19:32Z

Integrate gsl-shell/luajit into microbenchmarks

Also update command to get Julia version info.
Renames label for gsl-shell/lua tests from gsl_shell to lua.
Remove non-working external BLAS usage warning.

Closes #1662.

jiahao · 2014-04-06T05:31:47Z

Ready for review.

Provisional benchmark results:

| Benchmark | Fortran | Julia | Python | R | Matlab | Octave | Mathematica | JavaScript | Go | Lua |
|---|---|---|---|---|---|---|---|---|---|---|
| Version | GCC 4.8.1 | 0.3.0-prerelease+2497 | 2.7.3 | 3.0.2 | R2012a | 3.6.4 | 8.0 | V8 3.7.12.22 | go1 | gsl-shell 2.3.1 |
| fib | 0.41 | 1.68 | 65.40 | 880.28 | 3499.24 | 6985.53 | 140.02 | 3.78 | 1.90 | 1.93 |
| parse_int | 3.64 | 2.04 | 14.34 | 54.30 | 1240.19 | 6974.79 | 29.62 | 2.44 | 3.95 | 5.15 |
| quicksort | 1.09 | 1.00 | 31.41 | 522.61 | 86.21 | 1150.07 | 36.37 | 3.61 | 1.05 | 2.05 |
| mandel | 0.75 | 0.74 | 14.58 | 107.15 | 56.67 | 335.94 | 6.22 | 3.31 | 2.02 | 0.74 |
| pi_sum | 0.80 | 0.83 | 13.44 | 15.39 | 1.06 | 246.65 | 1.32 | 0.84 | 1.17 | 1.00 |
| rand_mat_stat | 0.80 | 1.98 | 13.56 | 13.71 | 6.83 | 18.77 | 5.62 | 4.03 | 8.29 | 4.36 |
| rand_mat_mul | 4.39 | 0.88 | 0.96 | 4.14 | 0.95 | 3.53 | 1.19 | 14.38 | 8.82 | 4.21 |

jiahao · 2014-04-06T05:45:49Z

I reran the tests with an upgraded version of R. The change in performance numbers is quite interesting:

| Benchmark | R 3.0.2 | R 3.1.0 |
|---|---|---|
| fib | 880.28 | 552.33 |
| parse_int | 54.30 | 49.77 |
| quicksort | 522.61 | 282.31 |
| mandel | 107.15 | 66.33 |
| pi_sum | 15.39 | 15.03 |
| rand_mat_stat | 13.71 | 15.51 |
| rand_mat_mul | 4.14 | 4.12 |
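
The size of the change is easier to judge as a ratio per benchmark. Below is a minimal Python sketch that uses only the R 3.0.2 and R 3.1.0 numbers quoted above. The names `R_RESULTS` and `improvement_ratios` are illustrative, and the sketch assumes that lower numbers are better, as the discussion of "improved R numbers" suggests.

```python
# Benchmark numbers copied from the comparison above: (R 3.0.2, R 3.1.0).
# Assumption: lower is better, so old / new > 1 means R 3.1.0 improved.
R_RESULTS = {
    "fib": (880.28, 552.33),
    "parse_int": (54.30, 49.77),
    "quicksort": (522.61, 282.31),
    "mandel": (107.15, 66.33),
    "pi_sum": (15.39, 15.03),
    "rand_mat_stat": (13.71, 15.51),
    "rand_mat_mul": (4.14, 4.12),
}


def improvement_ratios(results):
    """Return {benchmark: old / new} for each (old, new) pair."""
    return {name: old / new for name, (old, new) in results.items()}


ratios = improvement_ratios(R_RESULTS)

# Print the benchmarks from most improved to least improved.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:14s} {ratio:.2f}x")
```

On these numbers quicksort improves the most (roughly 1.85x), while rand_mat_stat is the only benchmark that gets slower.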

tkelman · 2014-04-06T06:02:57Z

While you're at it, Octave also had a major recent release that could improve their numbers.

johnmyleswhite · 2014-04-06T06:03:48Z

Maybe they finally pulled in Radford Neal's patches?

jiahao · 2014-04-06T06:07:46Z

Unfortunately Octave 3.8 hasn't made it into any Ubuntu distribution stream that I'm aware of, and I'm too lazy to build from source. ;-)

tkelman · 2014-04-06T06:10:01Z

Fair enough, the idea that the benchmarks should be based on "standard default easy installation version" makes perfect sense.

StefanKarpinski · 2014-04-06T06:58:26Z

Those Lua numbers are definitely interesting, as are the improved R numbers.

cbecker · 2014-04-06T09:00:18Z

I get an error with the Fortran tests because of the static compilation flag, FFLAGS+= -static-libgfortran. Is that flag necessary?

cbecker · 2014-04-06T09:18:16Z

@jiahao what machine are you running the benchmarks on?
For pi_sum I get very good performance from Lua (as fast as C), but Julia is doing 30% worse (1.3). You can check the msec timings I get at #1662.
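
For context, pi_sum is a tight floating-point loop with no memory traffic to speak of, which is why it reacts to code generation details rather than to machine size. Below is a minimal Python sketch of the kernel. The loop counts (500 repetitions of a 10000-term sum) are an assumption about the Julia microbenchmark suite and are not stated in this thread, so both are exposed as parameters.

```python
def pi_sum(reps=500, terms=10000):
    """Sum 1/k^2 for k = 1..terms, repeated reps times.

    Only the last repetition's sum is returned; the repetitions exist
    purely to make the run long enough to time.
    """
    total = 0.0
    for _ in range(reps):
        total = 0.0
        for k in range(1, terms + 1):
            total += 1.0 / (k * k)
    return total


if __name__ == "__main__":
    # The partial sum approaches pi^2 / 6 from below as terms grows.
    print(pi_sum())
```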

jiahao · 2014-04-06T14:09:48Z

@cbecker this is our main Julia test machine, whose specs are mentioned somewhere near the benchmarking data. It does have quite a lot of memory, which could be a factor.

jiahao · 2014-04-06T14:15:02Z

I think the static compilation flag was introduced before my time. It's probably unnecessary if you can get away without using it.

cbecker · 2014-04-06T14:18:21Z

@jiahao I don't think having huge amounts of memory would speed up pi_sum. Maybe it is related to AVX or some other instruction set that differs between your machine and mine. If you have the chance, please PM me the output of cat /proc/cpuinfo.

jiahao · 2014-04-06T14:25:43Z

Mm, the thought did occur to me.

Here's the tail end of cpuinfo:

processor   : 79
vendor_id   : GenuineIntel
cpu family  : 6
model       : 47
model name  : Intel(R) Xeon(R) CPU E7- 8850  @ 2.00GHz
stepping    : 2
microcode   : 0x36
cpu MHz     : 1996.000
cache size  : 24576 KB
physical id : 3
siblings    : 10
core id     : 25
cpu cores   : 10
apicid      : 242
initial apicid  : 242
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 4000.23
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

cbecker · 2014-04-06T15:22:08Z

The difference is that I have, on top of what you listed, eagerfpu tsc_deadline_timer xsave avx xsaveopt pln pts

I suppose the timing difference could be due to AVX. There are some LLVM bug reports about AVX performing worse than SSE: http://llvm.org/bugs/buglist.cgi?quicksearch=avx

@JeffBezanson is there an easy way to disable AVX within Julia? I could try that to confirm that this is the issue.
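
The flag comparison above can be done mechanically by diffing the flags lines of two cpuinfo dumps. Below is a minimal Python sketch. The helper names `parse_flags` and `extra_flags` are hypothetical, and the two sample strings are abbreviated stand-ins for real dumps, not the full flag lists from this thread.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def extra_flags(mine, theirs):
    """Return the sorted flags present in `mine` but absent from `theirs`."""
    return sorted(parse_flags(mine) - parse_flags(theirs))


# Abbreviated, illustrative inputs (assumption: real dumps have many more flags).
THEIRS = "flags       : fpu vme sse sse2 ssse3 sse4_1 sse4_2 popcnt aes\n"
MINE = "flags       : fpu vme sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave avx\n"

print(extra_flags(MINE, THEIRS))
```

On Linux the same functions could be fed the contents of /proc/cpuinfo from each machine.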

ihnorton · 2014-04-06T15:49:48Z

@cbecker grep for avx2 in src/codegen.cpp. Using -avx at the same place should do what you want (then run make clean && make).

cbecker · 2014-04-06T15:58:58Z

Thanks @ihnorton.

@jiahao, @StefanKarpinski Indeed, AVX is slowing it down. Timings for pi_sum:

without AVX:  39msec
with AVX:     48msec

I suppose there are cases where AVX helps (EDIT: indeed, rand_mat_mul gets a 2x improvement), but for now there may not be much we can do except wait for the LLVM bugs to be fixed.
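
As a quick check on the two pi_sum timings above, the relative slowdown can be computed as follows. This is a small Python sketch, and the function name `relative_slowdown` is illustrative.

```python
def relative_slowdown(baseline_ms, other_ms):
    """Return how much slower `other_ms` is than `baseline_ms`, as a fraction."""
    return (other_ms - baseline_ms) / baseline_ms


# pi_sum timings quoted above: 39 msec without AVX, 48 msec with AVX.
slowdown = relative_slowdown(39.0, 48.0)
print(f"AVX build is {slowdown:.0%} slower on pi_sum")
```

That works out to about 23%, in the same range as the roughly 30% gap reported for pi_sum earlier in the thread.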

Integrate gsl-shell/luajit into microbenchmarks

9a57b99

jiahao changed the title ~~WIP: Integrate gsl-shell/luajit into microbenchmarks~~ Integrate gsl-shell/luajit into microbenchmarks Apr 6, 2014

cbecker mentioned this pull request Apr 6, 2014

performance request... add luajit and/or gsl-shell to benchmark #1662

Closed

jiahao added a commit that referenced this pull request Apr 6, 2014

Merge pull request #6430 from JuliaLang/cjh/fix-1662

1620a69

Integrate gsl-shell/luajit into microbenchmarks

jiahao merged commit 1620a69 into master Apr 6, 2014

jiahao deleted the cjh/fix-1662 branch April 6, 2014 19:57

cbecker mentioned this pull request May 23, 2014

Use atsign-simd for sum #6928

Merged

stepelu mentioned this pull request Dec 1, 2015

LuaJIT / Julia unfair benchmarking #14222

Closed
