Accessors for current_task, safe_restore in TLS #36064

Merged
merged 1 commit into from
Jul 27, 2020


fingolfin
Member

In code which may be compiled against one Julia version but then gets loaded
in another (e.g. due to an update), it is problematic to directly access
members of jl_ptls_t, as this structure frequently changes between Julia
versions.

The accessor functions jl_get_current_task and jl_get_root_task avoid this.

Note that macros jl_current_task and jl_root_task exist, but since those
are compiled into the code which includes julia.h, they do not deal with
the situation described above.
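To make the macro-versus-accessor point concrete, here is a minimal self-contained sketch. The two struct layouts and all `tls_*`, `client_*` and `library_*` names are hypothetical and are not Julia's real `jl_ptls_t`; they only mimic a field moving between versions. A macro compiled into the client amounts to a fixed byte offset, while an accessor lives in the library and therefore follows the layout that is actually loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical TLS layout in "version 1" of a library. */
typedef struct {
    void *current_task;
    void *root_task;
} tls_v1_t;

/* Hypothetical layout in "version 2": a new field was inserted in front. */
typedef struct {
    void *new_field;
    void *current_task;
    void *root_task;
} tls_v2_t;

/* What a header macro compiled against v1 effectively bakes into the
 * client: a fixed byte offset into the TLS block. */
#define CLIENT_BAKED_OFFSET offsetof(tls_v1_t, current_task)

/* Client-side read using the offset it was compiled with. */
static void *client_read_with_baked_offset(const void *tls)
{
    void *p;
    memcpy(&p, (const char *)tls + CLIENT_BAKED_OFFSET, sizeof p);
    return p;
}

/* Accessor compiled into the library itself, so it always matches the
 * layout of the library version that is loaded (v2 in this sketch). */
static void *library_get_current_task(const tls_v2_t *tls)
{
    assert(tls != NULL);
    return tls->current_task;
}
```

With a v2 block, the accessor returns the task pointer, whereas the baked-in offset silently reads `new_field` instead.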

No alternatives seem to exist for jl_get_safe_restore and jl_set_safe_restore.

Motivation: the integration of GAP with Julia by changing GAP to use the Julia GC so far works quite well (see also GAP.jl) but unfortunately GAP has to be recompiled when the Julia version changes. This is because GAP kernel code expects the GC to perform stack scanning; and for that it needs to know whether the current task is the root task, and also to catch hard errors (segfaults) triggered by reading beyond the bounds of a stack into a guard page. (Note that we try very hard to not run into this, but for the main thread, this apparently cannot be avoided 100%). To cope with this, we need to recompile GAP against every new Julia version; unfortunately an upgrade to Julia does not necessarily trigger updates to installed packages, so this is tricky to get right. See also JuliaPackaging/BinaryBuilder.jl#511

A lot of this could be sidestepped if we were able to make a GAP_jll. Alas, there are several stumbling blocks... This PR tries to remove several of them. We need ...

  • safe_restore to catch exceptions caused by access to guard pages;
  • the root_task to check whether the current task is the root task (so an API to test whether a given task is the root task, a la jl_is_roottask(task *t) or even jl_is_currenttask_root or so, would be sufficient for our needs);
  • the current_task, so that we can e.g. call jl_task_stack_buffer on it.

I am completely open to changes here; e.g. better names, better places where to put the new code, etc.; perhaps you also have alternate ideas how to structure this.

If this ever gets merged (after revisions etc.), can I request that this be backported to 1.5? Then we'd only have to make two GAP_jll: one for Julia 1.3 and 1.4 (no TLS layout change between them), and one for 1.5+. (That said, we'll survive if it doesn't get into 1.5 anymore, of course.)

If there are any questions, @rbehrends and I will try our best to address them.

@fingolfin fingolfin force-pushed the mh/ptls-accessors branch from df6394f to 493949d Compare May 29, 2020 22:28
@fingolfin
Member Author

BTW I realize that exposing safe_restore might be something you are not fond of. We actually were hoping to avoid this by using JL_TRY; looking at the code, I thought that this should catch accesses to the stack guard pages, but in practice, this doesn't seem to be the case on macOS; and on Linux, it works a few times but eventually we end up in a crash anyway.

We are open to explore other alternatives, but before we do that, we need some feedback to know what is and isn't acceptable for you.

@mkitti
Contributor

mkitti commented Jun 7, 2020

Cross referencing this with #35726 and paging @c42f for comment.

@c42f
Member

c42f commented Jun 9, 2020

Personally I think these APIs could be OK if they allow you to construct a more stable workaround for whatever is causing the underlying problems. But clearly there's very close coupling between GAP and julia implementation details here, so any compatibility with future Julia versions would be best-effort rather than a guarantee.

Ideally we'd address the underlying problems instead but I understand they could be pretty difficult. Perhaps you could describe and link to the parts of GAP which are interacting with the Julia GC in a problematic way? (Was this discussed on a previous occasion?)

@fingolfin
Member Author

Yes the link is low-level and I don't expect anything beyond a best-effort, considering that we are deeply integrating with the GC and need to deal with low-level implementation details of tasks and task stacks. That said, considering the development so far, I am not expecting serious issues anytime soon.

Relevant prior PRs are #28368 and #32088 by @rbehrends

@c42f
Member

c42f commented Jun 9, 2020

Yes I saw https://github.com/JuliaLang/Juleps/blob/master/GcExtensions.md in overview a while back; very impressive work.

Does gcext.c reproduce any of the problematic cases which this change helps with?

@fingolfin
Member Author

@c42f Unfortunately gcext.c does not currently replicate the issue. Obviously it cannot replicate the bit where different Julia versions change the memory layout of the task struct (just to avoid misunderstandings: I am not complaining that this happens, it's to be expected).

But perhaps we could at least replicate the code in there for which we currently need new APIs. @rbehrends what do you think?

There are two relevant functions in the GAP code here:

Use in root scanner

In GapRootScanner() we currently require jl_get_current_task and jl_get_root_task (resp. jl_is_root_task or something like that). The counterpart in gcext.c is root_scanner().

One "clean" way to resolve our problem would be to extend the rootscanner callback with two additional arguments: the current task pointer, and a boolean is_root_task. The only issue I have with this is that it'd break the API (and hence ABI). OTOH we are already dealing with ABI issues here, so this might be acceptable: if there was also something like a #define JULIA_GCEXT_API_VERSION 2 or so then we could conditionally compile our code; we need to deal with an API/ABI break anyway (I mean, to use those new APIs in this PR).
However, such a change would have to be coordinated with any other uses of jl_gc_set_cb_root_scanner (such as CxxWrap.jl, ping @barche).

A safer approach would be to add a NEW separate hook type, say root_scanner_v2, and a corresponding jl_gc_set_cb_root_scanner_v2 etc., and offer both types of root scanner hooks in parallel. This adds a bit of complexity and a one-time overhead per GC start, but that ought to be negligible (famous last words).
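As a rough illustration of the "two hook types in parallel" idea, the following sketch keeps an old-style and an extended callback side by side. Everything here is hypothetical: `task_t`, `set_root_scanner`, `set_root_scanner_v2` and `run_root_scanners` are stand-ins written for this example and are not part of Julia's GC extension API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not Julia's jl_task_t. */
typedef struct task { int id; } task_t;

/* Existing-style hook: only learns whether this is a full collection. */
typedef void (*root_scanner_cb)(int full);
/* Hypothetical extended hook: also receives the task and a root-task flag. */
typedef void (*root_scanner_v2_cb)(int full, task_t *current, int is_root_task);

static root_scanner_cb registered_v1 = NULL;
static root_scanner_v2_cb registered_v2 = NULL;

void set_root_scanner(root_scanner_cb cb) { registered_v1 = cb; }
void set_root_scanner_v2(root_scanner_v2_cb cb) { registered_v2 = cb; }

/* What a collector could do once per collection: call whichever hooks
 * are registered, so old and new clients keep working side by side. */
void run_root_scanners(int full, task_t *current, task_t *root)
{
    assert(current != NULL && root != NULL);
    if (registered_v1 != NULL)
        registered_v1(full);
    if (registered_v2 != NULL)
        registered_v2(full, current, current == root);
}

/* Example hooks that only record what they were told. */
static int v1_calls = 0, v2_calls = 0, v2_saw_root = 0;

static void old_style_scanner(int full)
{
    (void)full;
    v1_calls++;
}

static void new_style_scanner(int full, task_t *current, int is_root_task)
{
    (void)full;
    (void)current;
    v2_calls++;
    v2_saw_root = is_root_task;
}
```

The extra cost is one additional null check and indirect call per collection, which matches the "one-time overhead per GC start" estimate above.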

In retrospect, I really regret that we didn't add these two arguments to the root_scanner; but in our defense, some of the issues we dealt with in there only became apparent later, and we didn't realize our mistake on implicitly relying on the TLS/task memory layout (nor were we at that point even thinking about supporting multiple Julia 1.x versions with a single GAP binary...)

BTW, we did add a root_task argument for task_scanner() callbacks (which could have been named is_root_task, I guess), but that then isn't used by us... I think this may have been a leftover from an earlier version of the code (perhaps @rbehrends remembers).

Use in task scanner

Our task scanner callback GapTaskScanner invokes ScanTaskStack which in turn invokes
SafeScanTaskStack.

As the name suggests these try to scan the task stacks. Unfortunately, it seems the stack range returned by jl_task_stack_buffer is not always quite correct, and may extend beyond the start of the stack, into the guard pages Julia adds. As a result, the stack scanner can trigger an access to a protected page, which causes a segfault; we catch that using the safe_restore member of the Julia TLS -- using JL_TRY to do the same did not work in our experiments.

One way to trigger the segfaults is to do using GAP; schedule(@task GC.gc()) (which won't crash because we catch the segfaults, but will if you remove our code for doing so).

So why does this happen? I'll let @rbehrends try to explain that.

Taking a step back here, of course the ideal fix would be to ensure that jl_task_stack_buffer always returns precise info so that we never run into the issue.

@rbehrends
Contributor

There are basically two separate issues here. The more low-level one is that we need to intercept segmentation faults. Right now, the only portable and safe way to do that seems to be using safe_restore. For example, on macOS, segmentation faults are being handled using mach system calls, which makes overriding signal handling temporarily with sigaction() etc. not an option. JL_TRY and JL_CATCH would be cleaner, but don't seem to work for segmentation faults (I haven't dug deeply enough into the logic for that to understand why or if maybe I'm doing something wrong).

Why do we (possibly) get segmentation faults? Because stacks may end in guard pages (which are inside the stack area) and there is AFAIK no portable way to figure out where they are, at least not for OS-allocated stacks. So, we cannot guarantee that we will not hit them.

Second, figuring out the current task of a thread is something that we need for interacting with jl_task_stack_buffer() in order to scan GAP roots that are on current thread stacks. The root task is something that we only need for the special case of when GAP embeds Julia (rather than GAP working as a Julia module) and thus controls the stack of the main thread. (We may possibly be able to figure a way around that part).

@fingolfin
Member Author

Wondering if @Keno or @yuyichao have any thoughts on this?

@yuyichao
Contributor

The task accessors look fine, though I don't really see why the one for the thread the GC is running on should be anything special.

Exposing safe_restore is fine too as long as you only call known code under it. You should still have a way to figure out what the stack limit is by not simply running into it. We have jl_init_stack_limits and the result is stored in the task.

@vtjnash
Member

vtjnash commented Jun 11, 2020

AFAIK no portable way to figure out where they are

We could probably always store the lower bound in the relevant jl_task_t field when we switch away, as we'd do when using copy stacks (the bound is simply the current frame address). Our estimate of the upper bound is always going to be unreliable in the general case in the future (of using foreign threads or when loaded as a shared library).

@rbehrends
Contributor

Explicitly storing an approximation of the current low end would be the easiest (and also most efficient) solution. However, that would have to happen not just when switching away, but also during the stop the world phase of a GC, I believe.


Some additional notes:

We are already utilizing the result of jl_init_stack_limits(), but it doesn't help us, as these limits at least sometimes include the guard pages.

I also looked at the code of the function, and that raised a couple of questions.

One is that the high end is calculated taking the address of stacksize rather than calculating stackaddr + stacksize, which feels odd, but may be an intentional way of just grabbing an address on the stack.

Second, it uses pthread_get_stackaddr_np() on macOS for the low end, but the following quick test seems to indicate that the function actually returns the high end of the stack (on macOS 10.14.6):

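#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

char *frameaddr;

// Thread body: record the address of a local buffer, i.e. an address inside this thread's stack.
void *start(void *arg) {
  char buffer[16384];
  (void)arg;
  frameaddr = buffer;
  return NULL;
}

int main() {
  pthread_t thread;
  pthread_create(&thread, NULL, start, NULL);
  // macOS-specific, non-portable queries; declared explicitly here
  extern void *pthread_get_stackaddr_np(pthread_t thread);
  extern size_t pthread_get_stacksize_np(pthread_t thread);
  char *stack = pthread_get_stackaddr_np(thread);
  size_t size = pthread_get_stacksize_np(thread);
  pthread_join(thread, NULL);
  // A negative difference means the reported address lies above the frame, i.e. it is the high end.
  printf("%zu %p %p %td\n", size, (void *)stack, (void *)frameaddr, frameaddr - stack);
  return 0;
}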

(I'm wondering if I'm not doing something wrong, because I would expect quite a few things to break if the stack limits were actually flipped on macOS.)

In any event, this still leaves us without the guard size. On Linux and FreeBSD, we can use pthread_attr_getguardsize() for that, but there's no equivalent on macOS for an already running thread (though we could probably go with the default value).
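For the Linux side, a sketch of querying the calling thread's stack range and guard size could look like the following. It relies on the GNU extension pthread_getattr_np(), so it is Linux-specific and simply reports failure elsewhere; `stack_info_t` and `query_own_stack` are names invented for this example. Whether the guard region lies inside or below the reported range is platform-specific and would need to be checked separately.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#if defined(__linux__)
/* GNU extension; declared explicitly in case _GNU_SOURCE is not defined. */
extern int pthread_getattr_np(pthread_t thread, pthread_attr_t *attr);
#endif

typedef struct {
    char  *low;    /* lowest address of the reported stack range */
    size_t size;   /* size of the reported stack range in bytes */
    size_t guard;  /* guard size as reported by the thread attributes */
} stack_info_t;

/* Fills *out for the calling thread.
 * Returns 0 on success, -1 if unsupported or if a query fails. */
int query_own_stack(stack_info_t *out)
{
    assert(out != NULL);
#if defined(__linux__)
    pthread_attr_t attr;
    void *addr = NULL;
    int rc = -1;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return -1;
    if (pthread_attr_getstack(&attr, &addr, &out->size) == 0 &&
        pthread_attr_getguardsize(&attr, &out->guard) == 0) {
        out->low = (char *)addr;
        rc = 0;
    }
    pthread_attr_destroy(&attr);
    return rc;
#else
    return -1;
#endif
}
```

On success, the address of any local variable in the calling thread should fall between `low` and `low + size`, which is a cheap sanity check on the reported range.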

Also, I'm not sure how one would figure out the guard pages for the main thread.

@fingolfin
Member Author

@yuyichao @vtjnash so I am not clear how you'd prefer us to proceed? From what you write it sounds almost as if you'd be willing to merge this PR as-is? Perhaps I am misunderstanding, though -- I'd appreciate if you could make some recommendation as to what we ought to change to make this ready for merge?

@fingolfin
Member Author

I am also asking because we really were hoping to still get this into Julia 1.5 (given that this just adds a few light accessor functions, there should be no risk of a regression?), but I guess this is getting more unlikely with every further day that is passing. (So also ping to @StefanKarpinski and @JeffBezanson -- my apologies for bugging you, but it'd be good to know where we stand here...)

@yuyichao
Contributor

Merging/exposing these sounds fine to me. It's mostly about whether these are concepts that we think external C code could use, and I think that's fair. If it turns out that you need something else it can be added later. The code added here is a pretty thin shim anyway.

Whether you are using it correctly in your code is quite a different issue and doesn't really matter for this PR as far as I'm concerned... On that end my only doubt was

The task accessors look fine, though I don't really see why the one for the thread the GC is running on should be anything special.

to which I would like an answer out of curiosity, but that doesn't really block this PR.

@fingolfin
Member Author

@yuyichao thank you for the clarification and the comment. Yeah, whether we are doing everything right is indeed a separate question, and one we constantly wonder about as well ;-). In part because we have to guess about some aspects of the Julia kernel and GC and the invariants it assumes... I've already encouraged @rbehrends to discuss these with you as needed.

Regarding your doubt / question, here is my understanding (and as always @rbehrends will hopefully correct me if I am wrong): We only treat the stack of the root task on the main thread differently if Julia is embedded into GAP, i.e., GAP runs on the main thread and then inits libjulia from there. In that case, we have precise knowledge about where the relevant part of the stack starts (because we record it in GAP's main function), and we simply use that in the jl_gc_set_cb_root_scanner to override the value computed by jl_task_stack_buffer. This simply avoids issues with scanning too far (reaching into guard pages). I am not sure (as in: I don't remember, it's too long ago we wrote this code and we didn't document this) whether it also ensures that we scan enough: I.e. it might be that jl_task_stack_buffer returns a stack start that does not cover the parts of the stack "owned" by GAP (i.e. there is a range of the stack which no Julia code ever "sees" as it is called too deep in the GAP callstack). Again: I am not saying this happens, but in any case, with our patch I don't have to worry about it.

By the way: we recently discussed whether we could remove the jl_task_stack_buffer scan from our "root scanner callback", and only scan stacks inside the "task scanner callback". That would be nice, because it would

  1. avoid code duplication (all stack scanning done in a single place)
  2. it would avoid the need for jl_get_current_task as the task scanner callback receives the relevant task as an argument
  3. it would avoid the need for jl_get_root_task as the task scanner callback already receives a boolean indicating whether the task given to it is a root task.

So that would be super nice! However, sadly this does not quite work, at least when embedding Julia into GAP: the problem is that the root task might not be "dirty" for GC purposes, and thus Julia might not call any task scanner callbacks; but then we don't scan the root stack, which we almost need to scan (at least in this scenario). Perhaps (?!?) it would be possible to pull the above off in the "GAP embedded into Julia" mode, but actually those are currently identical code-wise, and even if not, we really need both modes at this time.

@yuyichao
Contributor

We only treat the stack of the root task on the main thread differently if Julia is embedded into GAP, i.e., GAP runs on the main thread and then inits libjulia from there.

This special treatment is fine. But my question is that this is not what jl_get_root_task gives you. The GC may run on any thread.

@fingolfin
Member Author

Sure! We only use it to check if the current task is the root task. If it is, we run our special code, otherwise not.

@yuyichao
Contributor

So your special code doesn't care which thread it is running on and doesn't care if the answer you got is about the thread/task you initialized your library on?

@fingolfin
Member Author

It checks whether the current task is the root task of the main thread. For that and only that we have special requirements and special information.

@yuyichao
Contributor

It checks whether the current task is the root task of the main thread.

Again, my point is that if you call these functions during GC this is not what you get.
You do not get an answer about whether the current task is the root task of the main thread.

@fingolfin
Member Author

So, are you saying jl_get_root_task doesn't return the root task of the active thread during GC?
Or that jl_get_current_task doesn't return the current task of the active thread during GC? Or what else? It'd be helpful if you could explain what these do instead of just telling they don't do what we think...

@yuyichao
Contributor

yuyichao commented Jun 17, 2020

No I'm saying that active thread is not the main thread, not the thread you initialize your library on.

Under the assumption that

Again, my point is that if you call these functions during GC this is not what you get.


It'd be helpful if you could explain what these do instead of just telling they don't do what we think...

And I asked about it in so many different ways already, since I'm not sure what you are missing or what I am missing.

for the thread the GC is running on should be anything special.
The GC may run on any thread.
if the answer you got is about the thread/task you initialized your library on?


Also note that I've never even asked anything about the task. I've only ever asked about the thread you run on vs the main thread. It'll be helpful if you can clarify what you mean by "main thread" since that doesn't seem to be well defined and seems to alternate in meaning between the thread the GC is running on and the thread the library was initialized on.

@rbehrends
Contributor

To clarify, this is about when we are not using GAP as a library. We understand (and have seen it happening) that a package can be loaded on any thread, any task and that the GC can be invoked on any thread, any task.

In our specific case, we're dealing with the situation when Julia is the library, not GAP, and is initialized from GAP with GAP running as the main application, which will always happen from GAP's main thread in that situation. The special case that we are considering is for when the GC happens to be invoked from there rather than from a separate thread/task.

@yuyichao
Contributor

The special case that we are considering is for when the GC happens to be invoked from there rather than from a separate thread/task.

What's "there"? Are you only starting one julia thread?

@rbehrends
Contributor

rbehrends commented Jun 18, 2020

What's "there"? Are you only starting one julia thread?

There = root task of the main thread. We don't make any assumptions about the number of Julia threads that are being used.

Edit: that said, I don't think we have ever used or needed more than one thread for that particular setup in practice, as most of our work is done via GAP.jl.

@yuyichao
Contributor

yuyichao commented Jun 18, 2020

Right so the question comes back to

the one for the thread the GC is running on should be anything special

From #36064 (comment), the special condition should be if the main thread is running root task IIUC. I don't see how the GC thread becomes special.

that said, I don't think we have ever used or needed more than one thread for that particular setup in practice, as most of our work is done via GAP.jl.

And I'm exactly wondering if that makes this appear to work correctly when it's not.

@rbehrends
Contributor

So, I looked at the code again (I think I wrote that last year), and the way it actually looks is that we probably would have to extend the check also for when we scan that task (root task of main thread) from another thread. But we're still going to need it.

That said, the only use case that we have right now for that setup (Julia as a library used from GAP) is to run regression tests. After building GAP.jl, for tests we also run the standard GAP test suite from the gap executable that was built as part of that and linked with -ljulia. So far, this has always been a de facto single-threaded exercise, so the issue has never cropped up.

@yuyichao
Contributor

have to extend the check also for when we scan that task (root task of main thread) from another thread. But we're still going to need it.

That is exactly what I thought. Assuming "it" doesn't mean "checking if the task on the thread the GC is running on is the root task for that thread". GC should never care about which thread it runs on.

@rbehrends
Contributor

That's essentially right. The issue is that this particular task needs some special treatment.

@yuyichao
Contributor

As I said, the functions here are still fine to export (especially current task since the corresponding concept exist and is accessible in julia and has to be stable).

You can decide what other API you want later. There's currently no way to do this by calling a function in the GC, but you could save the main thread TLS at init time and access its fields (directly or via an added accessor), i.e. functions that take jl_ptls_t ptls2 as input and return ptls2->root_task or ptls2->current_task as output should do what you need, using the ptls2 you saved at init time.

I don't think we want to export jl_all_tls_states. However, API returning the values (ptls, tasks etc) for the main thread or lookup based on tid (julia accessible property so has to be stable) could be fine (though, again, shouldn't be needed here).

@fingolfin
Member Author

Thanks @yuyichao. We discussed this, and for now, our plan is indeed to call jl_get_current_task once right after jl_init() (and only when Julia is embedded), so we hope that we can be sure (?) that at this point there is only one thread and one task, and that the "root task" object for the main task is never changed.

Then we wouldn't need jl_get_root_task anymore. We still need access to safe_restore, though. Assuming this works out, we could thus drop the jl_get_root_task related changes from this PR.
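The plan of recording the task once at init time can be sketched as follows. To keep the snippet self-contained, `task_t` and `get_current_task()` are mock stand-ins for jl_task_t and jl_get_current_task(), and `record_main_root_task` / `is_main_root_task` are hypothetical helper names. The point is that the later check compares against a saved pointer and therefore gives the same answer no matter which thread the GC runs on.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins; real code would use jl_task_t and jl_get_current_task(). */
typedef struct { int id; } task_t;
static task_t *mock_current_task = NULL;
static task_t *get_current_task(void) { return mock_current_task; }

/* Task that was current right after the runtime was initialized. */
static task_t *main_root_task = NULL;

/* Call exactly once, immediately after initializing the embedded runtime
 * and before any other task could have been started on this thread. */
void record_main_root_task(void)
{
    assert(main_root_task == NULL);
    main_root_task = get_current_task();
}

/* Thread-independent check: compares against the saved pointer instead of
 * asking for "the root task of whichever thread runs the GC". */
int is_main_root_task(const task_t *t)
{
    return t != NULL && t == main_root_task;
}
```

This assumes, as hoped above, that the task object recorded at init time stays the root task of the main thread for the lifetime of the process.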

As to jl_get_current_task, it is already exported by Julia now; it is just missing from the header files. As such we are already using it by adding our own prototype for it, but it'd be nice to have it in the headers.

That leaves safe_restore, which we currently use because the stack range returned by jl_task_stack_buffer has an accurate bottom (or at least good enough for us), but the top may be inaccurate and thus lead to us scanning a guard page. If Julia were to store the active SP value as tasks switch out, as @vtjnash suggested, that'd allow us to avoid using safe_restore (and in fact provide an optimization, as we'd not scan parts of the stack that are not active, which would also avoid false references to objects which are still in memory but outside the active part of the stack).

@yuyichao
Contributor

only one thread and one task

Not really, but you should be on the main one.

And API-wise all the ones here seem fair to me. If you want to trim some down since you don't need them that's fine by me too....

@mkitti
Contributor

mkitti commented Jun 18, 2020

We discussed this, and for now, our plan is indeed to call jl_get_current_task once right after jl_init()

This is essentially done for the constant Base.roottask within Julia.

https://github.com/JuliaLang/julia/blob/master/base/initdefs.jl#L31

@fingolfin
Member Author

fingolfin commented Jun 26, 2020

So, as it is, we don't need jl_root_task anymore, but still need jl_current_task (which exists, we just add our own header). That leaves safe_restore.

We could do away with that if for each task, we had a function like this (which we then could use instead of jl_task_stack_buffer, which could be marked as "deprecated" or so).

// Query the active stack range for task ta, and set *start and *end accordingly.
// That is, *start is set to the (last known) SP value (marking the top of the stack),
// and *end is set to the stack base (marking its bottom).
JL_DLLEXPORT int jl_active_task_stack(jl_task_t *ta, void **start, void **end);

Such a function would then not only prevent us from running into guard pages (thus alleviating the need for access to safe_restore), it would also result in a nice optimization (by helping us avoid scanning unused parts of the stack). It would also simplify some of our code and allow us to get rid of some assumptions about the Julia internals.

I guess that means whenever a task is switched out, its SP (stack pointer / frame pointer) must be recorded. I think this is what @vtjnash was alluding to in his earlier comment? For non-active tasks, I believe this pseudo-code patch would do it (of course active_stack_size would have to be added to jl_task_t):

diff --git a/src/task.c b/src/task.c
index 9d88306dd4..86f533ed20 100644
--- a/src/task.c
+++ b/src/task.c
@@ -316,6 +316,11 @@ static void ctx_switch(jl_ptls_t ptls)
         }
         else
 #endif
+        char *frame_addr = (char*)((uintptr_t)jl_get_frame_addr() & ~15);
+        char *stackbase = (char*)ptls->stackbase;
+        assert(stackbase > frame_addr);
+        lastt->active_stack_size = stackbase - frame_addr;
+
         *pt = NULL; // can't fail after here: clear the gc-root for the target task now
         lastt->gcstack = ptls->pgcstack;
     }

That said, this wouldn't help for active tasks. I guess for the current task on the GC thread we can observe the SP anyway, but what about the current tasks of other threads?

Anyway, assuming this can really be done w/o putting an unreasonable burden in complexity and performance, would such a thing (tracking the SP / "lower bound" of the stack) be something you think would have a chance of being merged? Then we would investigate this further.
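To illustrate what a client could do with such an active range, here is a small sketch. The `task_t` struct, `active_task_stack()` and `count_candidate_refs()` are hypothetical and only model the idea from the pseudo-code patch (stack base minus recorded active size); they are not Julia code. The scanner is a plain conservative word-by-word scan that counts values pointing into a given heap range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task record; only the two fields used below. */
typedef struct {
    char  *stackbase;          /* high end (bottom) of a downward-growing stack */
    size_t active_stack_size;  /* bytes in use when the task was switched out */
} task_t;

/* Hypothetical analogue of the proposed jl_active_task_stack(): sets
 * [*start, *end) to the active part of the stack. Returns 1 on success,
 * 0 if the task has no recorded stack. */
int active_task_stack(const task_t *ta, void **start, void **end)
{
    assert(ta != NULL && start != NULL && end != NULL);
    if (ta->stackbase == NULL)
        return 0;
    *start = ta->stackbase - ta->active_stack_size;
    *end = ta->stackbase;
    return 1;
}

/* Conservative scan: count the words in [start, end) whose value looks
 * like a pointer into [heap_lo, heap_hi). */
size_t count_candidate_refs(void *start, void *end,
                            uintptr_t heap_lo, uintptr_t heap_hi)
{
    size_t hits = 0;
    for (uintptr_t *p = (uintptr_t *)start; p + 1 <= (uintptr_t *)end; p++) {
        if (*p >= heap_lo && *p < heap_hi)
            hits++;
    }
    return hits;
}
```

Scanning only the active range skips stale words below the recorded SP, which is the "avoid false references" benefit mentioned earlier, and it never touches memory beyond what the task actually used.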

UPDATE: I accidentally wrote PC when I meant SP, sorry for the confusion, now corrected

@fingolfin
Member Author

While it's interesting to ponder alternatives, I'd appreciate if this PR (which was already approved) could just be merged, pretty please? Because then I am at least sure we'll have something to work with in 1.6. If we come up with a better solution in the meantime, this PR could still be reverted, after all.

@c42f
Member

c42f commented Jul 24, 2020

So, as it is, we don't need jl_root_task anymore

If this is still true, can we remove jl_root_task before merging, or do you still need it?

In code which may be compiled against one Julia version but then gets
loaded in another (e.g. due to an update), it is problematic to directly
access members of jl_ptls_t, as this structure frequently changes
between Julia versions.

The existing accessor function `jl_get_current_task` helps to avoid this,
so make it public.

Note that the public macro `jl_current_task` exists, but since macros are
compiled into the code which includes `julia.h`, they do not deal with the
situation described above.

No alternatives currently exist for `jl_get_safe_restore` and
`jl_set_safe_restore`.
@fingolfin fingolfin force-pushed the mh/ptls-accessors branch from e1056d2 to 8a86044 Compare July 24, 2020 12:52
@fingolfin fingolfin changed the title Accessors for root_task, current_task, safe_restore in TLS Accessors for current_task, safe_restore in TLS Jul 24, 2020
@fingolfin
Member Author

@c42f done, and rebased

@c42f c42f merged commit 7adb9ce into JuliaLang:master Jul 27, 2020
@c42f
Member

c42f commented Jul 27, 2020

Thanks @fingolfin this seems extremely reasonable; I merged it so it will be in 1.6.

For older versions I think you can maintain a list of byte offsets into the TLS to address the field of interest (depending on Julia version), detect the current version and do a load / store of the current task with a version-dependent offset? Essentially this emulates jl_get_current_task on your side by baking in knowledge of the Julia internals for various versions; it's a horrible workaround, but it should get you unstuck.

I feel like the suggestions further up about having jl_active_task_stack would be a very reasonable and neat way to address your use case.

@fingolfin fingolfin deleted the mh/ptls-accessors branch July 27, 2020 09:28
@fingolfin
Member Author

@c42f Yes I already was thinking about hardcoding specific offsets for various Julia versions. Luckily, 1.4 and 1.5 have the same layout here, though, so I could also get away with three binaries: one for 1.3; one for 1.4 & 1.5; and one using the new API introduced here for >= 1.6. I'll have to experiment a bit. For now I just want to get a JLL working with one of these versions; once that's there, adapting it to cover multiple Julia versions, one way or another, will be doable.

I definitely want to look into adding jl_active_task_stack or so, but I don't want to risk not having it ready in time for 1.6 (work on this necessarily is a side job for me right now; also, we are expecting a baby any day now... ;-).

In the meantime, I discovered that we also are accessing members of jl_task_t directly right now, and that might of course also change across Julia versions (e.g. PR #36802 does that! Though I hope @JeffBezanson might be willing to hold off on that for a bit).

Concretely, we access task->copy_stack; I think this is ultimately because the API jl_task_stack_buffer (which we added) is not designed quite right. I think this will be resolved by the introduction of a jl_active_task_stack function, too.

@c42f
Member

c42f commented Jul 27, 2020

get away with three binaries

Well you don't really need three binaries. You should be able to do something along the lines of

#include <stddef.h>
#include <julia.h>

// (size_t)-1 marks "offset not initialized yet"
static size_t tls_current_task_offset = (size_t)-1;

// Call sometime around initializing libjulia
void init_offsets(void)
{
    // Constants determined via offsetof(struct _jl_tls_states_t, current_task)
    // for various versions (and operating systems)
    if (jl_ver_minor() == 5) {
        tls_current_task_offset = 6608;
    }
    else if (jl_ver_minor() == 4) {
        // ...
    }
}

jl_task_t *my_get_current_task(void)
{
    jl_ptls_t ptls = jl_get_ptls_states();
    return *((jl_task_t**)((char*)(ptls) + tls_current_task_offset));
}

For sure, it's ugly and inconvenient :-/

@fingolfin
Member Author

We don't need my_get_current_task, as we can (and do!) use jl_get_current_task also in older Julia versions; we simply added this at the top of our C source file:

JL_DLLEXPORT jl_value_t *jl_get_current_task(void);

So the only thing where we need this is for accessing ptls->safe_restore. For that, I see the following options:

  1. We use hardcoded offsets for Julia 1.3 and 1.4+1.5 and 1.6 (so we don't use jl_get_safe_restore/jl_set_safe_restore for Julia >= 1.6), and hope that by the time 1.7 comes out, we can drop support for Julia < 1.6 and switch to jl_get_safe_restore/jl_set_safe_restore (e.g. if 1.6 were an LTS, that seems plausible)
  2. We use hardcoded offsets for Julia 1.3 and 1.4+1.5, and jl_get_safe_restore/jl_set_safe_restore for Julia >= 1.6; the latter then would have to be dlsym'ed, though; for that we'd have to carefully check what the overhead is for jumping to those dlsym'ed function pointers (my guess is it'll be irrelevant, but I'd rather be sure than relying on a guess ;-).
  3. We produce two binaries, one for Julia < 1.6 which has hardcoded offsets, and one for >= 1.6 which uses jl_get_safe_restore/jl_set_safe_restore normally (so no dlsym shenanigans)
  4. We produce three binaries, without any hardcoded offsets.
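Option 2 could be sketched roughly like this: look the accessor up by name at startup and fall back to another code path if the running library does not export it. The function-pointer type, `resolve_getter` and `fallback_get_restore` are hypothetical, and the signature shown is only assumed for illustration; in real use one would pass a name such as "jl_get_safe_restore" and a fallback that uses the hardcoded offsets.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Signature assumed for illustration: no arguments, returns a pointer. */
typedef void *(*get_restore_fn)(void);

/* Placeholder for the "older version" path (e.g. hardcoded offsets). */
static void *fallback_get_restore(void)
{
    return NULL;
}

/* Looks `name` up among the symbols already loaded into the process.
 * Returns the symbol if it exists, otherwise `fallback`. Resolve once at
 * startup and cache the result; afterwards each use is an ordinary
 * indirect call through the returned pointer. */
get_restore_fn resolve_getter(const char *name, get_restore_fn fallback)
{
    get_restore_fn fn;
    void *handle, *sym;

    assert(name != NULL && fallback != NULL);
    handle = dlopen(NULL, RTLD_LAZY);   /* handle for the running program */
    if (handle == NULL)
        return fallback;
    sym = dlsym(handle, name);
    dlclose(handle);
    if (sym == NULL)
        return fallback;
    *(void **)&fn = sym;                /* object-to-function pointer conversion */
    return fn;
}
```

On older glibc versions this needs to be linked with -ldl. The lookup cost is paid once; the remaining per-call overhead is that of an indirect call, which would still be worth measuring as noted above.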

If we produce two variants, then we might as well produce three, the extra setup work in build_tarballs.jl is minimal.... Well, except that there is no Julia_jll for Julia 1.3 right now, but I could probably add one with moderate effort (famous last words).

The main caveat with approaches 3 and 4 is that right now there is no way to adjust the compat requirement for Julia used in the JLL's generated Project.toml (see also JuliaPackaging/BinaryBuilder.jl#856).

So yeah, approaches 1 and 2 have certainly some appeal. But I'll figure it out once I have GAP_jll working with one Julia version ;-)

simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020