
smaller Task object #36802

Merged
JeffBezanson merged 2 commits into master from jb/smallertask on Aug 5, 2020
Conversation

JeffBezanson (Member) commented Jul 25, 2020

With this, sizeof(jl_task_t) goes from 616 bytes to 352 bytes. Who likes free performance?

master, pfib(31), 4 threads:

  1.879936 seconds (13.24 M allocations: 1.902 GiB, 41.12% gc time)

This PR:

  1.617563 seconds (13.22 M allocations: 1.499 GiB, 36.26% gc time)

master, spawn/wait test from #36699, 4 threads:

  3.090069 seconds (21.87 M allocations: 2.872 GiB, 2.63% gc time)

This PR:

  2.852884 seconds (21.90 M allocations: 2.218 GiB, 3.54% gc time)

I would consider it unlikely/inadvisable for a task to hold 29 locks 😂
I'm planning to remove a couple more redundant fields from Task, but this should be a big obvious win for now.

JeffBezanson added the performance (Must go faster) and multithreading (Base.Threads and related functionality) labels on Jul 25, 2020
// does not need to be defined until the task runs
int16_t tid;
/* for the multiqueue */
int16_t prio;
Member

Moving the fields tid and prio will sadly break our C access to the Julia kernel in GAP.jl. I'll look into adding an API that will allow us to avoid direct access to jl_task_t members, but in the meantime, could I request that these members are not moved around? This change doesn't seem to be required by this PR (and is mentioned neither in the PR description nor the commit message).

Perhaps this reordering is meant to make access to tid and prio faster by moving them to the start of the struct, but it's not clear whether that's the case, and ideally it'd be benchmarked separately anyway?

Member

It packs the struct by optimizing field alignment. Our general policy for internal changes is "caveat emptor": users who poke at internals need to use version checks.
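
To make the mechanism concrete, here is a minimal, self-contained sketch of how moving two int16_t fields into existing alignment padding shrinks a struct. The field names are borrowed from the PR, but the structs are hypothetical, not the real jl_task_t layout:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: on a typical LP64 ABI each pointer must start on an
 * 8-byte boundary, so a lone int16_t between pointers drags 6 bytes of
 * padding along with it. */
struct task_before {
    void   *next;   /* offset 0  */
    int16_t tid;    /* offset 8, followed by 6 bytes of padding */
    void   *stkbuf; /* offset 16 */
    int16_t prio;   /* offset 24, followed by 6 bytes of tail padding */
};                  /* sizeof == 32 */

/* Same fields, reordered so the two 2-byte members share one 8-byte slot. */
struct task_after {
    void   *next;   /* offset 0  */
    void   *stkbuf; /* offset 8  */
    int16_t tid;    /* offset 16 */
    int16_t prio;   /* offset 18, then 4 bytes of tail padding */
};                  /* sizeof == 24 */

int main(void)
{
    printf("before: %zu bytes, after: %zu bytes\n",
           sizeof(struct task_before), sizeof(struct task_after));
    return 0;
}
```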

Member

First off, if that's the motivation for the change, shouldn't it still be explicitly mentioned in the commit message?

Secondly, of course I understand that this is the general rule, but given that we have no choice but to poke at this, and given that it's unclear to me how much that minor part of the commit contributes to the overall performance win, I think it's not unreasonable to at least ask whether this change could be delayed for the time being? "Just a version check" is not enough to deal with it: one needs to compute the correct offsets for all target platforms and all relevant Julia versions (here: < 1.6 and >= 1.6, so just two for now), then hardcode these in the C code accessing them. I can of course do that, but it's not quite as trivial as "use a version check" makes it sound :-).

See also discussion in PR #36064
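
For illustration, this is roughly what the "hard-coded offsets plus a version check" workaround described above ends up looking like. The offsets are invented placeholders and the helper name is made up; this is a sketch of the pattern, not code from GAP.jl:

```c
/* Sketch only: both offsets below are invented. A single compiled binary
 * that must serve several Julia versions has to select the field offset at
 * run time, because the header it was built against describes only one of
 * those layouts. */
#include <stdint.h>
#include <string.h>
#include <julia.h>

static int16_t task_tid_via_hardcoded_offsets(jl_task_t *t)
{
    size_t off = (jl_ver_major() == 1 && jl_ver_minor() < 6)
                     ? 0x120   /* hypothetical pre-1.6 offset of tid  */
                     : 0x008;  /* hypothetical post-reorder offset    */
    int16_t tid;
    memcpy(&tid, (char *)t + off, sizeof(tid));
    return tid;
}
```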

Member

> First off, if that's the motivation for the change, shouldn't it still be explicitly mentioned in the commit message?

I mean, the commit message is "Smaller task objects", which is the motivation for this. Sure, it could perhaps be even more descriptive, but it isn't like it is an unrelated change.

Member Author

> one needs to compute the correct offsets for all target platforms and all relevant Julia versions (here: < 1.6 and >= 1.6, so just two for now), then hardcode these in the C code accessing them

Why is that necessary? The natural way to solve this is to include the header file.
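
As a hedged sketch of what "include the header file" buys you: the compiler derives the offsets from julia.h, so nothing is hand-maintained, but the result is only valid for a Julia build whose struct layout matches that header (which is the point raised in the next comment). The function below is illustrative, not part of any existing API:

```c
#include <stddef.h>
#include <stdio.h>
#include <julia.h>

/* With the struct definition visible, member access and offsetof are
 * computed by the compiler from julia.h; nothing is hard-coded by hand,
 * but the values are baked in at compile time for that header's layout. */
void print_task_fields(jl_task_t *t)
{
    printf("tid offset = %zu, prio offset = %zu\n",
           offsetof(jl_task_t, tid), offsetof(jl_task_t, prio));
    printf("task tid = %d, prio = %d\n", (int)t->tid, (int)t->prio);
}
```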

Member

First off: I only now fully realized what @vchuravy pointed out, namely that this is about "packing" the struct jl_task_t. As such, it may very well be that the offset of the copy_stack member does not change, in which case my worries are for nothing and this PR actually causes no problems for us. As soon as I am at a computer again, I'll verify this.

@JeffBezanson Of course we are using the header file. But that only shows us the layout of the struct in that specific version of Julia. If the layout of the struct changes between Julia versions in a way that affects the generated machine code (e.g. because the offset of a member we access changes), then we either need to recompile for each (set of) Julia versions where the struct differs (i.e. have multiple versions of the binary), or else use tricks like hardcoding offsets (effectively bypassing the header).

For details, see #36064 and specifically #36064 (comment) (that talks about ptls->safe_restore, but the same applies to task->copy_stack).

I've also started work on #36823 in an attempt to allow us to get rid of direct access to task->copy_stack, but that's not yet ready (I hope it'll be ready in time for 1.6, we'll see).

Member

OK, so I checked on various systems, and indeed on 64-bit systems this does not change the offset of copy_stack. Since we are only planning on supporting 64-bit anyway, I withdraw my concerns and apologize for the noise.
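
A small probe along these lines, compiled against each Julia version's headers, is one way to automate such a check: diff the output across builds and the offsets either match or they don't. The member list is an assumption about the header (copy_stack is declared as a bit-field, so the surrounding plain members such as stkbuf stand in for it here):

```c
#include <stddef.h>
#include <stdio.h>
#include <julia.h>

/* Layout probe: build once per Julia version of interest and compare the
 * output. If the numbers match, a binary compiled against either header
 * reads these members at the same addresses. */
int main(void)
{
    printf("sizeof(jl_task_t)            = %zu\n", sizeof(jl_task_t));
    printf("offsetof(jl_task_t, tid)     = %zu\n", offsetof(jl_task_t, tid));
    printf("offsetof(jl_task_t, prio)    = %zu\n", offsetof(jl_task_t, prio));
    printf("offsetof(jl_task_t, stkbuf)  = %zu\n", offsetof(jl_task_t, stkbuf));
    return 0;
}
```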

Member Author

I'm glad you lucked out this time, but these offsets will probably change in other PRs. I will try to keep the offset of copy_stack if possible. But in general we do not guarantee ABI compatibility at this level between minor versions.

Member

Yeah, sure -- we initially had the plan of compiling the code once for each major Julia version for this reason. But with the advent of JLLs, it is now attractive to use those. One model there is to keep a separate version of the JLL for each Julia version, but the logistics of that are complicated. So now we are trying to support all Julia versions with a single compilation, and in the course of that, we are trying to get rid of all access to members of internal structs. One step was taken with PR #36064, and the final bit (assuming we didn't miss something...) will come with PR #36823.

- move locks array to thread-local state
- move tid,prio to occupy existing alignment padding
JeffBezanson merged commit 48602c5 into master on Aug 5, 2020
JeffBezanson deleted the jb/smallertask branch on August 5, 2020 at 20:55
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020
int16_t tid;
// multiqueue priority
int16_t prio;

jl_ucontext_t ctx; // saved thread state
Member

We could potentially store this inside the stkbuf object too, if that's helpful for making it cheaper
