-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accessors for current_task, safe_restore in TLS #36064
Conversation
df6394f
to
493949d
Compare
BTW I realize that exposing We are open to explore other alternatives, but before we do that, we need some feedback to know what is and isn't acceptable for you. |
493949d
to
e1056d2
Compare
Personally I think these APIs could be OK if they allow you to construct a more stable workaround for whatever is causing the underlying problems. But clearly there's very close coupling between GAP and julia implementation details here, so any compatibility with future Julia versions would be best-effort rather than a guarantee. Ideally we'd address the underlying problems instead but I understand they could be pretty difficult. Perhaps you could describe and link to the parts of GAP which are interacting with the Julia GC in a problematic way? (Was this discussed on a previous occasion?) |
Yes the link is low-level and I don't expect anything beyond a best-effort, considering that we are deeply integrating with the GC and need to deal with low-level implementation details of tasks and task stacks. That said, considering the development so far, I am not expecting serious issues anytime soon. Relevant prior PRs are #28368 and #32088 by @rbehrends |
Yes I saw https://github.com/JuliaLang/Juleps/blob/master/GcExtensions.md in overview a while back; very impressive work. Does |
@c42f Unfortunately But perhaps we could at least replicate the code in there for which we currently need new APIs. @rbehrends what do you think? There are two relevant functions in the GAP code here: Use in root scannerIn One "clean" way to resolve our problem would be to extend the rootscanner callback with two additional arguments: the current task pointer, and a boolean A safer approach would be to add a NEW separate hook type, say In retrospect, I really regret that we didn't add these two arguments to the root_scanner; but in our defense, some of the issues we dealt with in there only became apparent later, and we didn't realize our mistake on implicitly relying on the TLS/task memory layout (nor were we at that point even thinking about supporting multiple Julia 1.x versions with a single GAP binary...) BTW, we did add a Use in task scannerOur task scanner callback As the name suggests these try to scan the task stacks. Unfortunately, it seems the stack range returned by One way to trigger the segfaults is to do So why does this happen? I'll let @rbehrends try to explain that. Taking a step back here, of course the ideal fix would be to ensure that |
There are basically two separate issues here. The more low-level one is that we need to intercept segmentation faults. Right now, the only portable and safe way to do that seems to be using Why do we (possibly) get segmentation faults? Because stacks may end in guard pages (which are inside the stack area) and there is AFAIK no portable way to figure out where they are, at least not for OS-allocated stacks. So, we cannot guarantee that we may not hit them. Second, figuring out the current task of a thread is something that we need for interacting with |
The tasks accessors looks fine, though I don't really see why the one for the thread the GC is running on should be anything special. Exposing safe_restore is fine too as long as you only call known code under it. You should still have a way to figure out what the stack limit is by not simply running into it. We have |
We could probably always store the lower bound in the relevant jl_task_t field when we switch away, as we'd do when using copy stacks (the bound is simply the current frame address). Our estimate of the upper bound is always going to be unreliable in the general case in the future (of using foreign threads or when loaded as a shared library). |
Explicitly storing an approximation of the current low end would be the easiest (and also most efficient) solution. However, that would have to happen not just when switching away, but also during the stop the world phase of a GC, I believe. Some additional notes: We are already utilizing the result of I also looked at the code of the function, and that raised a couple of questions. One is that the high end is calculated taking the address of Second, it uses
(I'm wondering if I'm not doing something wrong, because I would expect quite a few things to break if the stack limits were actually flipped on macOS.) In any event, this still leaves us without the guard size. On Linux and FreeBSD, we can use Also, I'm not sure how one would figure out the guard pages for the main thread. |
@yuyichao @vtjnash so I am not clear how you'd prefer us to proceed? From what you write it sounds almost as if you'd be willing to merge this PR as-is? Perhaps I am misunderstanding, though -- I'd appreciate if you could make some recommendation as to what we ought to change to make this ready for merge? |
I am also asking because we really were hoping to still get this into Julia 1.5 (given that this just adds a few light accessor function, there should be no risk of a regression?), but I guess this is getting more unlikely with every further day that is passing. (So also ping to @StefanKarpinski and @JeffBezanson -- my apologies for bugging you, but it'd be good to know where we stand here...) |
Merging/exposing these sounds fine to me. It's mostly about if these are concepts that we thing external C code could use and I think that's fair. If it turns out that you need something else it can be added later. The code added here are pretty thin shim anyway. Whether you are using it correctly in your code is quite a different issue and doesn't really matter for this PR as far as my concern... On that end my only doubt was
which I do like an answer out of curiosity but that doesn't really block this PR. |
@yuyichao thank you for the clarification and the comment. Yeah, whether we are doing everything right is indeed a separate question, and one we constantly wonder about as well ;-). In part because we have to guess about some aspects of the Julia kernel and GC and the invariants it assumes... I've already encourage @rbehrends to discuss these with you as needed. Regarding your doubt / question, here is my understanding (and as always @rbehrends will hopefully correct me if I am wrong): We only treat the stack of the root task on the main thread differently if Julia is embedded into GAP, i.e., GAP runs on the main thread and the inits libjulia from there. In that case, we have precise knowledge about where the relevant part of the stack starts (because we record it in GAP's main function), and we simply use that in the By the way: we recently discussed whether we could remove the
So that would be super nice! However, sadly this does not quite work, at least when embedding Julia into GAP: the problem is that the root task might not be "dirty" for GC purposes, and thus Julia might not call any task scanner callbacks; but then we don't scan the root stack, which we almost need to scan (at least in this scenario). Perhaps (?!?) it would be possible to pull the above of in the "GAP embedded into Julia" mode, but actually those are currently identical code wise, and even if not, we really need both modes at this time. |
This special treatment is fine. But my question is that this is not what |
Sure! We only use it to check if the current task is the root task. If it is, we run our special code, otherwise not. |
So your special code doesn't care which thread it is running on and doesn't care if the answer you got is about the thread/task you initialized your library on? |
It checks whether the current task is the root task of the main thread. For that and only that we have special requirements and special information. |
Again, my point is that if you call these functions during GC this is not what you get. |
So, are you saying jl_get_root_task doesn't return the root task of the active thread during GC? |
No I'm saying that active thread is not the main thread, not the thread you initialize your library on. Under the assumption that
And I asked about it in so many different ways already, since I'm not sure what you are missing or what I am missing.
Also note that I've never even asked anything about the task. I've only ever asked about the thread you run on vs the main thread. It'll be helpful if you can clarify what you mean by "main thread" since that doesn't seem to be well defined and seems to alter meaning between the thread the GC is running on or the thread the lilbrary was initialized on. |
To clarify, this is about when we are not using GAP as a library. We understand (and have seen it happening) that a package can be loaded on any thread, any task and that the GC can be invoked on any thread, any task. In our specific case, we're dealing with the situation when Julia is the library, not GAP, and is initialized from GAP with GAP running as the main application, which will always happen from GAP's main thread in that situation. The special case that we are considering is for when the GC happens to be invoked from there rather than from a separate thread/task. |
What's "there"? Are you only starting one julia thread? |
There = root task of the main thread. We don't make any assumptions about the number of Julia threads that are being used. Edit: that said, I don't think we have ever used or needed more than one thread for that particular setup in practice, as most of our work is done via |
Right so the question comes back to
From #36064 (comment), the special condition should be if the main thread is running root task IIUC. I don't see how the GC thread become special.
And I'm exactly wondering if that makes this appears to work correctly when it's not. |
So, I looked at the code again (I think I wrote that last year), and the way it actually looks is that we probably would have to extend the check also for when we scan that task (root task of main thread) from another thread. But we're still going to need it. That said, the only use case that we have right now for that setup (Julia as a library used from GAP) is to run regression tests. After building |
That is exactly what I thought. Assuming "it" doesn't mean "checking if the task on the thread the GC is running on is the root task for that thread". GC should never care about which thread it runs on. |
That's essentially right. The issue is that this particular task needs some special treatment. |
As I said, the functions here are still fine to export (especially current task since the corresponding concept exist and is accessible in julia and has to be stable). You can decide what other API you want later. There's currently no way to do this by calling a function in the GC but you could save the main thread TLS at init time and accessing field (directly or add accessor), i.e. functions that takes I don't think we want to export |
Thanks @yuyichao . We discussed this, and for now, our plan is indeed to call Then we wouldn't need As to `jl_get_current_task, it is already now exported by Julia, it is just missing from the header files; as such we are already using it now by adding our own prototype for it, but it'd be nice to have it in the headers. That leaves |
Not really, but you should be on the main one. And API wise all the ones here seems fair to me. If you want to trim some down since you don't need them that's fine by me too.... |
This is essentially done for the constant https://github.com/JuliaLang/julia/blob/master/base/initdefs.jl#L31 |
So, as it is, we don't need We could do away with that if for each task, we had a function like this (which we then could use instead of
Such a function would then not only prevent us from running into guard pages (such alleviating the need for access to I guess that means whenever a task is switched out, its SP (stack pointer / frame pointer) must be recorded. I think this is what @vtjnash was alluding to in his earlier comment? For non-active tasks, II believe this pseudo-code patch would do it (of course diff --git a/src/task.c b/src/task.c
index 9d88306dd4..86f533ed20 100644
--- a/src/task.c
+++ b/src/task.c
@@ -316,6 +316,11 @@ static void ctx_switch(jl_ptls_t ptls)
}
else
#endif
+ char *frame_addr = (char*)((uintptr_t)jl_get_frame_addr() & ~15);
+ char *stackbase = (char*)ptls->stackbase;
+ assert(stackbase > frame_addr);
+ lastt->active_stack_size = stackbase - frame_addr;
+
*pt = NULL; // can't fail after here: clear the gc-root for the target task now
lastt->gcstack = ptls->pgcstack;
} That said, this wouldn't help for active tasks. I guess for the current task on the GC thread we can observe the SP anyway, but what about the current tasks of other threads? Anyway, assuming this can really be done w/o putting an unreasonable burden in complexity and performance, would such a thing (tracking the SP / "lower bound" of the stack) be something you think would have a chance of being merged? Then we would investigate this further. UPDATE: I accidentally wrote PC when I meant SP, sorry for the confusion, now corrected |
While it's interesting to ponder alternatives, I'd appreciate if this PR (which was already approved) could just be merged, pretty please? Because then I am at least sure we'll have something to work with in 1.6. If we come up with a better solution in the meantime, this PR could still be reverted, after all. |
If this is still true, can we remove |
In code which may be compiled against one Julia version but then gets loaded in another (e.g. due to an update), it is problematic to directly access members of jl_ptls_t, as this structure frequently changes between Julia versions The existing accessor function `jl_get_current_task` helps to avoid this, so make it public. Note that the public macro `jl_current_task` exist, but since macros are compiled into the code which includes `julia.h`, they do not deal with the situation described above. No alternatives currently exist for `jl_get_safe_restore` and `jl_set_safe_restore`.
e1056d2
to
8a86044
Compare
@c42f done, and rebased |
Thanks @fingolfin this seems extremely reasonable; I merged it so it will be in 1.6. For older versions I think you can maintain a list of byte offsets into the TLS to address the field of interest (depending on Julia version), detect the current version and do a load / store of the current task with a version-dependent offset? Essentially this emulates I feel like the suggestions further up about having |
@c42f Yes I already was thinking about hardcoding specific offsets for various Julia versions. Luckily, 1.4 and 1.5 have the same here, though, so I could also get away with three binaries: one for 1.3; one for 1.4 & 1.5; and one using the new API introduced here for >= 1.6. I'll have to experiment a bit. For now I just want to get a JLL working with one of these versions; once that's there, adapting it to cover multiple Julia versions, one way or another, will be doable. I definitely want to look into adding In the meantime, I discovered that we also are accessing members of Concretely, we access |
Well you don't really need three binaries. You should be able to do something along the lines of static size_t tls_current_task_offset = -1;
// Call sometime around initializing libjulia
void init_offsets()
{
// Constants determined via offsetof(struct _jl_tls_states_t, current_task)
// for various versions (and operating systems)
if (jl_ver_minor() == 5) {
tls_current_task_offset = 6608;
else if (jl_ver_minor() == 4 {
// ...
}
}
jl_task_t* my_get_current_task()
{
jl_ptls_t ptls = jl_get_ptls_states();
return *((jl_task_t**)((char*)(ptls) + tls_current_task_offset));
} for sure, it's ugly and inconvenient :-/ |
We don't need JL_DLLEXPORT jl_value_t *jl_get_current_task(void); So the only thing were we need this is for accessing
If we produce two variants, then we might as well produce three, the extra setup work in The main caveat with approaches 3 and 4 is that right now there is no way to adjust the So yeah, approaches 1 and 2 have certainly some appeal. But I'll figure it out once I have GAP_jll working with one Julia version ;-) |
In code which may be compiled against one Julia version but then gets loaded in another (e.g. due to an update), it is problematic to directly access members of jl_ptls_t, as this structure frequently changes between Julia versions The existing accessor function `jl_get_current_task` helps to avoid this, so make it public. Note that the public macro `jl_current_task` exist, but since macros are compiled into the code which includes `julia.h`, they do not deal with the situation described above. No alternatives currently exist for `jl_get_safe_restore` and `jl_set_safe_restore`.
In code which may be compiled against one Julia version but then gets loaded
in another (e.g. due to an update), it is problematic to directly access
members of jl_ptls_t, as this structure frequently changes between Julia
versions
The accessor function
jl_get_current_task
andjl_get_root_task
avoid this.Note that macros
jl_current_task
andjl_root_task
exist, but since thoseare compiled into the code which includes
julia.h
, they do not deal withthe situation described above.
No alternatives seem to exist for
jl_get_safe_restore
andjl_set_safe_restore
.Motivation: the integration of GAP with Julia by changing GAP to use the Julia GC so far works quite well (see also GAP.jl) but unfortunately GAP has to be recompiled when the Julia version changes. This is because GAP kernel code expects the GC to perform stack scanning; and for that it needs to know whether the current task is the root task, and also to catch hard errors (segfaults) triggered by reading beyond the bounds of a stack into a guard page. (Note that we try very hard to not run into this, but for the main thread, this apparently cannot be avoided 100%). To cope with this, we need to recompile GAP against every new Julia version; unfortunately an upgrade to Julia does not necessarily trigger updates to installed packages, so this is trick to get right. See also JuliaPackaging/BinaryBuilder.jl#511
A lot of this could be sidestepped if were able to make a
GAP_jll
. Alas, there are several stumbling stones... This PR tries to remove several of them. We need ...safe_restore
to catch exceptions caused by access to guard pages;root_task
to check whether the current task is the root task (so an API to test whether a given task is the root task, a lajl_is_roottask(task *t)
or evenjl_is_currenttask_root
or so would be sufficient to our needs;current_task
so that we can e.g. calljl_task_stack_buffer
on it.I am completely open to changes here; e.g. better names, better places where to put the new code, etc.; perhaps you also have alternate ideas how to structure this.
If this ever gets merged (after revisions etc.), can I request that this be backported to 1.5? Then we'd only have to make two GAP_jll: one for Julia 1.3 and 1.4 (no TLS layout between them), and one for 1.5+. (that said, wI'll survive if it doesn't get into 1.5 anymore, of course)
If there are any questions, @rbehrends and me will try our best to address them.