-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of atomic load in toplevel code #47578
Conversation
It was somewhat intended. I wanted to move it before every CallInstr, instead of doing it after the calls or ccalls, so that it was synchronized with any other atomic barriers written by the user |
2277af7
to
fe9f220
Compare
I implemented that suggestion, @vtjnash. |
fe9f220
to
59dc30f
Compare
Is the linux32 GC corruption known? I don't see how it could be related... As this is a simple performance fix for an issue that could otherwise trip up people benchmarking code in the REPL (which seems like a common thing to do), I think it would be good to back-port this to 1.8 |
It might be an OOM error showing up as something else. Does retrying fix it? |
No, it seems to happen every time... |
59dc30f
to
edcc9d2
Compare
After the rebase, CI looks better. |
The 32bit windows failure is still there, as well as on other PRs rebased ontop of this (as an example https://github.com/JuliaLang/julia/commits/vc/vtune) |
As noted in #47561, the performance of toplevel code is pretty bad because of the atomic barrier (loading the global world age into the task-local one) that's emitted after every instruction now. @vtjnash noted that monotonic ordering is sufficient, which recovers some performance (1.5s -> 0.7s). But the biggest improvement comes from not emitting the atomic operation between every statement, but only when we perform a call.
Fixes #47561