smaller Task object #36802

Merged (2 commits) on Aug 5, 2020
12 changes: 5 additions & 7 deletions src/julia.h
@@ -1782,6 +1782,11 @@ typedef struct _jl_task_t {
uint8_t sticky; // record whether this Task can be migrated to a new thread

// hidden state:
// id of owning thread - does not need to be defined until the task runs
int16_t tid;
// multiqueue priority
int16_t prio;

jl_ucontext_t ctx; // saved thread state
Member:
We could potentially store this inside the stkbuf object too, if that's helpful for making it cheaper

void *stkbuf; // malloc'd memory (either copybuf or stack)
size_t bufsz; // actual sizeof stkbuf
@@ -1797,13 +1802,6 @@
// current world age
size_t world_age;

// id of owning thread
// does not need to be defined until the task runs
int16_t tid;
/* for the multiqueue */
int16_t prio;
Member:
Moving the fields tid and prio will sadly break our C access to the Julia kernel in GAP.jl. I'll look into adding an API that will allow us to avoid direct access to jl_task_t members, but in the meantime, could I request that these members not be moved around? This change doesn't seem to be required by this PR (and is mentioned neither in the PR description nor the commit message).

Perhaps this reordering is meant to make access to tid and prio faster by moving them to the start of the struct, but it's not clear whether that's the case, and ideally it would be benchmarked separately anyway?

Member:
It packs the struct by optimizing field alignment. Our general policy for internal changes is "caveat emptor": users who poke at internals need to use version checks.
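
For illustration, here is a minimal standalone C sketch of the alignment effect in question. The fields are made up (this is not the real jl_task_t layout); it only shows how grouping the two int16_t fields next to the uint8_t one removes padding on a typical 64-bit ABI.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical fields, chosen only to mirror the sizes of sticky/tid/prio/ctx. */
struct unpacked_example {
    uint8_t  sticky;   /* 7 bytes of padding follow so the pointer is 8-aligned */
    void    *ctx;
    int16_t  tid;
    int16_t  prio;     /* 4 bytes of tail padding keep the struct 8-aligned */
};

struct packed_example {
    uint8_t  sticky;   /* only 1 byte of padding before tid */
    int16_t  tid;
    int16_t  prio;     /* 2 bytes of padding before the pointer */
    void    *ctx;
};

int main(void)
{
    /* Typically prints 24 vs. 16 on a 64-bit ABI. */
    printf("unpacked: %zu bytes, packed: %zu bytes\n",
           sizeof(struct unpacked_example), sizeof(struct packed_example));
    return 0;
}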

Member:
First off, if that's the motivation for the change, shouldn't it still be explicitly mentioned in the commit message?

Secondly, of course I understand that this is the general rule, but given that we have no choice but to poke at this, and given that it's unclear to me how much this minor part of the commit contributes to the overall performance win, I think it's not unreasonable to at least ask whether this change could be delayed for the time being? "Just a version check" is not enough to deal with it: one needs to compute the correct offsets for all target platforms and all relevant Julia versions (here: < 1.6 and >= 1.6, so just two for now), then hardcode these in the C code accessing them. I can of course do that, but it's not quite as trivial as "use a version check" makes it sound :-).

See also discussion in PR #36064
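
For concreteness, a rough sketch of the hardcoded-offset workaround described above. Everything here is a placeholder: the member being read, the helper name, and the offset values are invented for illustration; only the JULIA_VERSION_MAJOR/JULIA_VERSION_MINOR macros are assumed to be provided by Julia's headers.

#include <stdint.h>
#include <string.h>
#include <julia_version.h>  /* assumed to provide JULIA_VERSION_MAJOR/MINOR */

/* Placeholder offsets of some hypothetical jl_task_t member, one per layout.
 * These numbers are invented for illustration and would have to be computed
 * per platform and per Julia version in real code. */
#if JULIA_VERSION_MAJOR == 1 && JULIA_VERSION_MINOR < 6
#  define TASK_FIELD_OFFSET 0x58
#else
#  define TASK_FIELD_OFFSET 0x48
#endif

/* Read a 16-bit field out of an opaque task pointer at the hardcoded offset. */
static inline int16_t read_task_field(const void *task)
{
    int16_t value;
    memcpy(&value, (const char *)task + TASK_FIELD_OFFSET, sizeof(value));
    return value;
}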

Member:
> First off, if that's the motivation for the change, shouldn't it still be explicitly mentioned in the commit message?

I mean, the commit message is "Smaller task objects", which is the motivation for this. Sure, it could perhaps be even more descriptive, but it isn't as if it's an unrelated change.

Member (author):
> one needs to compute the correct offsets for all target platforms, and all relevant Julia versions (here: < 1.6 and >= 1.6, so just two for now), then hardcode these in the C code accessing them

Why is that necessary? The natural way to solve this is to include the header file.

Member:
First off: I only now fully realized what @vchuravy pointed out, namely that this is about "packing" the struct jl_task_t. As such, it may very well be that the offset of the copy_stack member does not change, in which case my worries are for nothing and this PR actually causes no problems for us. As soon as I am at a computer again, I'll verify this.

@JeffBezanson Of course we are using the header file. But that only shows us the layout of the struct in that specific version of Julia. If the layout of the struct changes between Julia versions in a way that affects the generated machine code (e.g. because the offset of a member we access changes), then we either need to recompile for each (set of) Julia versions where the struct differs (i.e. have multiple versions of the binary), or else use tricks like hardcoding offsets (effectively bypassing the header).

For details, see #36064 and specifically #36064 (comment) (that talks about ptls->safe_restore, but the same applies to task->copy_stack).

I've also started work on #36823 in an attempt to allow us to get rid of direct access to task->copy_stack, but that's not ready yet (I hope it'll be ready in time for 1.6; we'll see).

Member:
OK, so I checked on various systems, and indeed on 64-bit systems this does not change the offset of copy_stack. Since we are only planning on supporting 64-bit anyway, I withdraw my concerns and apologize for the noise.

Member (author):
I'm glad you lucked out this time, but these offsets will probably change in other PRs. I will try to keep the offset of copy_stack if possible. But in general we do not guarantee ABI compatibility at this level between minor versions.

Member:
Yeah, sure -- we initially planned to compile the code once for each major Julia version for this reason. But with the advent of JLLs, it is now attractive to use those. One model is to keep a separate version of the JLL for each Julia version, but the logistics of that are complicated. So now we are trying to support everything with a single compilation, and in the course of that we are trying to get rid of all access to members of internal structs. One step was taken with PR #36064, and the final bit (assuming we didn't miss something...) will come with PR #36823.

// This is statically initialized when the task is not holding any locks
arraylist_t locks;
jl_timing_block_t *timing_stack;
} jl_task_t;

3 changes: 3 additions & 0 deletions src/julia_threads.h
@@ -226,6 +226,9 @@ struct _jl_tls_states_t {
// Access via jl_exception_occurred().
struct _jl_value_t *previous_exception;

// currently-held locks, to be released when an exception is thrown
small_arraylist_t locks;

JULIA_DEBUG_SLEEPWAKE(
uint64_t uv_run_enter;
uint64_t uv_run_leave;
13 changes: 4 additions & 9 deletions src/locks.h
@@ -54,13 +54,10 @@ static inline void jl_mutex_lock_nogc(jl_mutex_t *lock) JL_NOTSAFEPOINT
static inline void jl_lock_frame_push(jl_mutex_t *lock)
{
jl_ptls_t ptls = jl_get_ptls_states();
// For early bootstrap
if (__unlikely(!ptls->current_task))
return;
arraylist_t *locks = &ptls->current_task->locks;
size_t len = locks->len;
small_arraylist_t *locks = &ptls->locks;
uint32_t len = locks->len;
if (__unlikely(len >= locks->max)) {
arraylist_grow(locks, 1);
small_arraylist_grow(locks, 1);
}
else {
locks->len = len + 1;
@@ -70,9 +67,7 @@ static inline void jl_lock_frame_push(jl_mutex_t *lock)
static inline void jl_lock_frame_pop(void)
{
jl_ptls_t ptls = jl_get_ptls_states();
if (__likely(ptls->current_task)) {
ptls->current_task->locks.len--;
}
ptls->locks.len--;
}

#define JL_SIGATOMIC_BEGIN() do { \
4 changes: 2 additions & 2 deletions src/rtutils.c
@@ -213,7 +213,7 @@ JL_DLLEXPORT void jl_enter_handler(jl_handler_t *eh)
eh->prev = current_task->eh;
eh->gcstack = ptls->pgcstack;
eh->gc_state = ptls->gc_state;
eh->locks_len = current_task->locks.len;
eh->locks_len = ptls->locks.len;
eh->defer_signal = ptls->defer_signal;
eh->finalizers_inhibited = ptls->finalizers_inhibited;
eh->world_age = ptls->world_age;
@@ -245,7 +245,7 @@ JL_DLLEXPORT void jl_eh_restore_state(jl_handler_t *eh)
int8_t old_gc_state = ptls->gc_state;
current_task->eh = eh->prev;
ptls->pgcstack = eh->gcstack;
arraylist_t *locks = &current_task->locks;
small_arraylist_t *locks = &ptls->locks;
if (locks->len > eh->locks_len) {
for (size_t i = locks->len;i > eh->locks_len;i--)
jl_mutex_unlock_nogc((jl_mutex_t*)locks->items[i - 1]);
67 changes: 67 additions & 0 deletions src/support/arraylist.c
@@ -78,6 +78,73 @@ void *arraylist_pop(arraylist_t *a)
return p;
}

// small arraylist

small_arraylist_t *small_arraylist_new(small_arraylist_t *a, uint32_t size)
{
a->len = 0;
if (size <= SMALL_AL_N_INLINE) {
a->items = &a->_space[0];
a->max = SMALL_AL_N_INLINE;
}
else {
a->items = (void**)LLT_ALLOC(size*sizeof(void*));
if (a->items == NULL) return NULL;
a->max = size;
}
return a;
}

void small_arraylist_free(small_arraylist_t *a)
{
if (a->items != &a->_space[0])
LLT_FREE(a->items);
a->len = 0;
a->max = SMALL_AL_N_INLINE;
a->items = &a->_space[0];
}

void small_arraylist_grow(small_arraylist_t *a, uint32_t n)
{
size_t len = a->len;
size_t newlen = len + n;
if (newlen > a->max) {
if (a->items == &a->_space[0]) {
void **p = (void**)LLT_ALLOC((a->len+n)*sizeof(void*));
if (p == NULL) return;
memcpy(p, a->items, len * sizeof(void*));
a->items = p;
a->max = newlen;
}
else {
size_t nm = a->max * 2;
if (nm == 0)
nm = 1;
while (newlen > nm)
nm *= 2;
void **p = (void**)LLT_REALLOC(a->items, nm * sizeof(void*));
if (p == NULL) return;
a->items = p;
a->max = nm;
}
}
a->len = newlen;
}

void small_arraylist_push(small_arraylist_t *a, void *elt)
{
small_arraylist_grow(a, 1);
a->items[a->len - 1] = elt;
}

void *small_arraylist_pop(small_arraylist_t *a)
{
if (a->len == 0) return NULL;
void *p = a->items[--a->len];
a->items[a->len] = NULL;
return p;
}

#ifdef __cplusplus
}
#endif
16 changes: 16 additions & 0 deletions src/support/arraylist.h
@@ -5,6 +5,8 @@

#define AL_N_INLINE 29

#define SMALL_AL_N_INLINE 6

#ifdef __cplusplus
extern "C" {
#endif
@@ -25,6 +27,20 @@ void arraylist_push(arraylist_t *a, void *elt) JL_NOTSAFEPOINT;
void *arraylist_pop(arraylist_t *a) JL_NOTSAFEPOINT;
void arraylist_grow(arraylist_t *a, size_t n) JL_NOTSAFEPOINT;

typedef struct {
uint32_t len;
uint32_t max;
void **items;
void *_space[SMALL_AL_N_INLINE];
} small_arraylist_t;

small_arraylist_t *small_arraylist_new(small_arraylist_t *a, uint32_t size) JL_NOTSAFEPOINT;
void small_arraylist_free(small_arraylist_t *a) JL_NOTSAFEPOINT;

void small_arraylist_push(small_arraylist_t *a, void *elt) JL_NOTSAFEPOINT;
void *small_arraylist_pop(small_arraylist_t *a) JL_NOTSAFEPOINT;
void small_arraylist_grow(small_arraylist_t *a, uint32_t n) JL_NOTSAFEPOINT;

#ifdef __cplusplus
}
#endif
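
For reference, a short standalone usage sketch of the small_arraylist_t API added above (the include path is an assumption; this is not code from the PR):

#include <stdint.h>
#include "support/arraylist.h"  /* assumed include path */

void small_arraylist_example(void)
{
    small_arraylist_t list;
    small_arraylist_new(&list, 0);        /* starts on the 6-slot inline _space buffer */
    for (uintptr_t i = 0; i < 10; i++)
        small_arraylist_push(&list, (void*)i);  /* spills to a heap buffer past 6 items */
    while (list.len > 0)
        small_arraylist_pop(&list);
    small_arraylist_free(&list);          /* frees any heap buffer and resets to _space */
}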
11 changes: 2 additions & 9 deletions src/task.c
@@ -269,13 +269,8 @@ static void ctx_switch(jl_ptls_t ptls)
jl_task_t *t = *pt;
assert(t != ptls->current_task);
jl_task_t *lastt = ptls->current_task;
// If the current task is not holding any locks, free the locks list
// so that it can be GC'd without leaking memory
arraylist_t *locks = &lastt->locks;
if (locks->len == 0 && locks->items != locks->_space) {
arraylist_free(locks);
arraylist_new(locks, 0);
}
// none of these locks should be held across a task switch
assert(ptls->locks.len == 0);

int killed = (lastt->state == done_sym || lastt->state == failed_sym);
if (!t->started && !t->copy_stack) {
@@ -593,7 +588,6 @@ JL_DLLEXPORT jl_task_t *jl_new_task(jl_function_t *start, jl_value_t *completion
#ifdef ENABLE_TIMINGS
t->timing_stack = jl_root_timing;
#endif
arraylist_new(&t->locks, 0);

#if defined(JL_DEBUG_BUILD)
if (!t->copy_stack)
@@ -1108,7 +1102,6 @@ void jl_init_root_task(void *stack_lo, void *stack_hi)
ptls->current_task->excstack = NULL;
ptls->current_task->tid = ptls->tid;
ptls->current_task->sticky = 1;
arraylist_new(&ptls->current_task->locks, 0);

#ifdef COPY_STACKS
if (always_copy_stacks) {
1 change: 1 addition & 0 deletions src/threading.c
@@ -289,6 +289,7 @@ void jl_init_threadtls(int16_t tid)
#ifdef _OS_WINDOWS_
ptls->needs_resetstkoflw = 0;
#endif
small_arraylist_new(&ptls->locks, 0);
jl_init_thread_heap(ptls);
jl_install_thread_signal_handler(ptls);

2 changes: 2 additions & 0 deletions test/threads.jl
@@ -95,9 +95,11 @@ function Base.wait(idle::UvTestIdle)
Base.lock(idle.cond)
try
idle.active = true
Base.iolock_end()
wait(idle.cond)
finally
Base.unlock(idle.cond)
Base.iolock_begin()
Base.unpreserve_handle(idle)
Base.iolock_end()
end