Optimized binaries for all new architectures? #26
Comments
Zen gained new binaries because it has separate FADD and FMA hardware that can be used simultaneously for greater throughput. This is not the case for the vast majority of Intel CPUs. With the exception of client Golden Cove, all of them can be fully utilized with FMA alone, so the existing binaries should already be optimal. For client Golden Cove (Alder/Raptor Lake), I think the situation is similar to Zen, where you need simultaneous FADD and FMA. So in theory it might get higher flops with the Zen1 binary than with the Haswell binary, but I don't have an Alder/Raptor Lake machine to test and confirm. I suppose the upcoming Arrow Lake will also fit squarely in this category. But again, I need hardware to test.
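To make the throughput argument concrete, here is a minimal back-of-the-envelope sketch. It is not code from this project. The function name `peak_flops_per_cycle` and the pipe counts are hypothetical illustration values, not measured or documented figures for any specific CPU. It only assumes that an FMA counts as 2 flops per lane and an FADD as 1.

```python
def peak_flops_per_cycle(lanes, fma_pipes, separate_fadd_pipes=0):
    """Upper bound on flops per cycle per core under a simple pipe model.

    lanes: elements per vector (e.g. 4 doubles in a 256-bit register).
    fma_pipes: FMA instructions issued per cycle (2 flops per lane each).
    separate_fadd_pipes: FADD instructions that can issue in the same cycle
        on hardware distinct from the FMA pipes (1 flop per lane each).
    """
    return lanes * (2 * fma_pipes + separate_fadd_pipes)


# Hypothetical core A: adds share the FMA pipes, so FMA alone saturates it.
fma_only = peak_flops_per_cycle(lanes=4, fma_pipes=2)

# Hypothetical core B: two extra FADD pipes can run alongside the FMA pipes.
fma_plus_fadd = peak_flops_per_cycle(lanes=4, fma_pipes=2, separate_fadd_pipes=2)

print(fma_only, fma_plus_fadd, fma_plus_fadd / fma_only)
```

Under this model, an FMA-only test leaves the extra FADD pipes of core B idle, which is why a mixed FADD+FMA test is needed to reach its peak, while core A gains nothing from the mixed test.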
So what really matters is not the kind of instructions, their number, or even the instruction chain, but the additional hardware blocks and the available ports, the latter probably a consequence of the extra hardware. Is the latest Zen executable then fully optimized for AVX512 on Zen 5, as it is on Zen 4?
Actually, now that I look at it, it looks like I did something different. I added the FADD+FMA test to all the binaries, and I didn't make any new binaries because they wouldn't be any different from the existing ones. Zen1-3 -> Use 2017-Zen
Yes, I can see the phrase "Add 512-bit FADD + FMA test for Zen4" next to all the binaries. Maybe you could add a folder named Version 4, Version 3.1, Version 3 (v2), or something similar to make this clearer. And probably update the ReadMe file too :)
Hello.
I see that for Intel AVX2 code you keep the same binary from the Haswell (2013) architecture, that Intel AVX512 likewise uses the SkylakePurley (2017) executable, and that AMD AVX2/AVX512 uses the same Zen (2017) executable in version 3v2 (AVX512 added for Zen 4).
I know that the AVX2/AVX512 instruction sets haven't changed, especially the instructions used by this benchmark app, but the implementation of AVX2/AVX512 has definitely changed, even within the same company (Intel or AMD).
I mean, Haswell (2013) executes AVX2 code differently from Raptor Lake Refresh (2023).
The same goes for SkylakePurley compared to RocketLake or SapphireRapids for AVX512, for Zen 2 vs Zen 3 for AVX2, and for Zen 4 vs Zen 5 for AVX512.
How come you keep the same executable for all these different architectures?
TIA