-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
smaller Task
object
#36802
smaller Task
object
#36802
Conversation
// does not need to be defined until the task runs | ||
int16_t tid; | ||
/* for the multiqueue */ | ||
int16_t prio; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the fields tid
and prio
will sadly break our C access to the Julia kernel in GAP.jl. I'll look into adding an API that will allow us to avoid direct access to jl_task_t
members, but in the meantime, could I request that these members are not moved around? This changed doesn't seem to be required by this PR (and is mentioned neither in the PR description nor the commit message).
Perhaps this reordering is meant to make access to tid
and prio
faster by moving them to the start of the struct, but it's not clear whether that's the case, and ideally it'd be benchmark separately anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It packs the struct by optimizing field alignment. Our general policy for internal changes is "caveat emptor" and that users who poke at internals need to use version checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, if that's the motivation for the change, shouldn't it still be explicitly mentioned in the commit message?
Secondly, of course I understand that this is the general rule, but given that we have no choice but to poke at this, and given that it's unclear to me how much that minor part of the commit contributes to the overall performance win, I think it's not unreasonable to at least ask whether this chance could be delayed for the time being? "Just a version check" is not enough to deal with it, one needs to compute the correct offsets for all target platforms, and all relevant Julia versions (here: < 1.6 and >= 1.6, so just two for now), then hardcode these in the C code accessing them. I can of course do that, but it's not quite that trivial as "use a version check" might make it sound like :-).
See also discussion in PR #36064
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, if that's the motivation for the change, shouldn't it still be explicitly mentioned in the commit message?
I mean, the commit message is "Smaller task objects" which is the motivation for this. Sure, it could perhaps be even more descriptive but it isn't like it is an unrelated change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one needs to compute the correct offsets for all target platforms, and all relevant Julia versions (here: < 1.6 and >= 1.6, so just two for now), then hardcode these in the C code accessing them
Why is that necessary? The natural way to solve this is to include the header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off: I only now fully realized what @vchuravy pointed out, namely this is about "packing" the struct jl_task_t
. As such, it may very well be that the offset of the copy_stack
member does not change, and hence my worries are for nothing, and this PR actually causes no problems for us. As soon as I am at a computer again, I'll verify this.
@JeffBezanson Of course we are using the header file. But that only shows us the layout of the struct in that specific version of Julia. If the layout of the struct changes between Julia versions in a way that affects the generated machine code (e.g. because the offset of a member we access changes), then that means we either needsto recompile for each (set of) Julia versions were the struct differs (i.e. have multiple versions of the binary), or else use tricks like hardcoding offsets (so effectively bypassing the header).
For details, see #36064 and specifically #36064 (comment) (that talks about ptls->safe_restore
, but the same applies to task->copy_stack
.
I've also started work on #36823 in an attempt to allow us to get rid of direct access to task->copy_stack
, but that's not yet ready (but I hope it'll be ready in time for 1.6, we'll see).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so I checked on various systems, and indeed on 64bit systems this does not change the offset of copy_stack
. Since we are only planning on support 64bit anyway, I withdraw my concerns, and apologize for the noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you lucked out this time, but these offsets will probably change in other PRs. I will try to keep the offset of copy_stack
if possible. But in general we do not guarantee ABI compatibility at this level between minor versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sure -- we initially had the plan of compiling the code once for each major Julia version for this reason. But with the advent of JLLs, now it is interesting to use those. One model there is to keep having a separate version of the JLL for each Julia version; but the logistics of that are complicated. So now we are trying to support all with a single compilation, and in the course of that, are trying to get rid of all access to members of internal structs. One step was taken with PR #36064, and the final bit (assuming we didn't miss something...) will come with PR #36823
526c885
to
a793b51
Compare
- move locks array to thread-local state - move tid,prio to occupy existing alignment padding
a793b51
to
fa8f2e2
Compare
int16_t tid; | ||
// multiqueue priority | ||
int16_t prio; | ||
|
||
jl_ucontext_t ctx; // saved thread state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially store this inside the stkbuf object too, if that's helpful for making it cheaper
With this
sizeof(jl_task_t)
goes from 616 bytes to416352. Who likes free performance?master,
pfib(31)
, 4 threads:This PR:
master, spawn/wait test from #36699, 4 threads:
this PR:
I would consider it unlikely/inadvisable for a task to hold 29 locks 😂
I'm planning to remove a couple more redundant fields from
Task
, but this should be a big obvious win for now.