[cli/trampolines] Clean up definitions, fix win32 exporting issue #39543
// aarch64 on mac requires some special assembler syntax for both calculating memory
// offsets and even just the assembler statement separator token
#if defined(__aarch64__)
why wouldn't this go in trampolines_aarch64.S, when it is only defined and used there?
because it's nice to have the .S files be as free of preprocessor stuff as possible; it makes it much easier to, with your eyeball, compare the trampolines across platforms.
Adding @vchuravy because I changed the ppc64le trampoline again; it clobbers one less register now and passes libblastrampoline tests on power now, so I figure we probably want that change here as well.
b8e70ec to 2b38b20
The PPC changes look right and necessary. r9 is volatile and used to pass arguments.
f35914c to c4def51
Our win32 trampolines weren't getting exported properly. Also take the opportunity to properly organize after debugging libblastrampoline and finding the current organizational strategy lacking. In the future, if we ever want to mangle the symbols from these trampolines, we now have a convenient `CNAME()` that is properly invoked from all trampolines and can consistently mangle names.
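To illustrate the idea of routing every symbol through one macro, here is a minimal C sketch. It is not the definition from this PR: the `DEMO_UNDERSCORE_PREFIX` switch, the `PASTE`/`STRINGIFY` helpers, and the `demo_*` functions are all hypothetical names chosen for the example, and a leading underscore is only one mangling scheme a platform might need.

```c
#include <string.h>

/* Two-level helpers so that macro arguments are expanded before they are
 * pasted or stringified. These helper names are hypothetical. */
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Hypothetical switch: when DEMO_UNDERSCORE_PREFIX is defined, every name
 * that goes through CNAME() gains a leading underscore. Otherwise CNAME()
 * is the identity. This is an assumed definition, not the one in the PR. */
#if defined(DEMO_UNDERSCORE_PREFIX)
#define CNAME(x) PASTE(_, x)
#else
#define CNAME(x) x
#endif

/* Stand-in for a trampoline target. It is defined through CNAME() so the
 * definition and every reference are mangled the same way. */
int CNAME(demo_target)(int x) { return x + 1; }

/* The symbol name after CNAME() has been applied, as a string. */
const char *demo_symbol_name(void) { return STRINGIFY(CNAME(demo_target)); }

/* Calls the target through CNAME(), the way a trampoline would refer to it. */
int call_demo_target(int x) { return CNAME(demo_target)(x); }

/* Returns 1 if CNAME() changed the name, 0 if it left it untouched. */
int demo_name_is_mangled(void) {
    return strcmp(demo_symbol_name(), "demo_target") != 0;
}
```

Because the definition and all references go through the same macro, changing the mangling scheme later means editing `CNAME()` in one place rather than touching each trampoline.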
According to the examples on pages 138-139 of the ELF v2 OpenPOWER ABI [0], we should avoid clobbering r9 and only touch r2 and r12. This fixed a test failure in libblastrampoline, so it's probably needed here as well. [0]: https://members.openpowerfoundation.org/document/dl/576
c4def51 to f0b0cd8