(Trivial) optimized dsdot implementation for HASWELL #1329

Merged
merged 4 commits into from
Oct 25, 2017

Conversation

martin-frbg
Collaborator

For #1326

@kohr-h

kohr-h commented Nov 13, 2017

Would the same change in the microkernel work for other architectures, too? They all look very similar, apart from the number of floats they process at a time.

@martin-frbg
Collaborator Author

I expect so, but did not want to mess with all of them at once.

@kohr-h

kohr-h commented Nov 13, 2017

Sure. I'm asking because in the overarching NumPy issue, people won't be happy with Haswell support only. And by "support" I mean "optimized microkernel".
Although it seems to me that the part in the microkernel that is removed for dsdot really doesn't make any difference in runtime. Does that agree with your experience?

@martin-frbg
Collaborator Author

The small optimization in the microkernel may well be immeasurable. The important part is the declaration in KERNEL.HASWELL that tells the build system to use the modified sdot.c file for dsdot as well, instead of falling back to the slow generic C. This should work equally well for all the other x86_64 CPUs that have their own sdot microkernels, and be only a little slower for those that fall back to the C implementation of sdot_kernel_16. It would probably also work for ARM, POWER, etc. once their local implementations of sdot.c are adapted.
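
To illustrate the structure being discussed, here is a minimal sketch, not the actual OpenBLAS source. It assumes the usual split between a block kernel that handles multiples of 16 elements and a scalar tail loop. The names `dot_block16_sketch` and `dsdot_sketch` are hypothetical, and the plain C block kernel stands in for what would be a hand-vectorized microkernel on Haswell. The point is only that dsdot takes single-precision inputs but accumulates and returns in double precision, so it can share the driver structure of sdot.

```c
#include <stddef.h>

/* Hypothetical block kernel (illustration only): n must be a multiple of 16.
 * Products of float inputs are accumulated in double precision using four
 * independent partial sums, similar in spirit to an unrolled microkernel. */
static void dot_block16_sketch(size_t n, const float *x, const float *y,
                               double *dot)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;

    for (size_t i = 0; i < n; i += 16) {
        for (size_t j = 0; j < 16; j += 4) {
            acc0 += (double)x[i + j]     * (double)y[i + j];
            acc1 += (double)x[i + j + 1] * (double)y[i + j + 1];
            acc2 += (double)x[i + j + 2] * (double)y[i + j + 2];
            acc3 += (double)x[i + j + 3] * (double)y[i + j + 3];
        }
    }
    *dot = acc0 + acc1 + acc2 + acc3;
}

/* Hypothetical driver (illustration only): unit-stride dsdot.
 * The bulk goes through the block kernel, the remainder through a
 * scalar tail loop; the result is returned as double. */
double dsdot_sketch(size_t n, const float *x, const float *y)
{
    size_t n16 = n & ~(size_t)15;   /* largest multiple of 16 not above n */
    double dot = 0.0;

    if (n16 > 0)
        dot_block16_sketch(n16, x, y, &dot);

    for (size_t i = n16; i < n; i++)
        dot += (double)x[i] * (double)y[i];

    return dot;
}
```

Under these assumptions, whether the block kernel is vectorized or plain C only changes how fast the bulk of the vector is processed. The driver around it stays the same, which matches the observation that architectures without their own microkernel should be only slightly slower.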

@kohr-h

kohr-h commented Nov 14, 2017

That's what I was guessing, too. Most of the gain for little work (at least for one platform :-)).


2 participants