-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A Fix for Aliasing Caused by Result Location Semantics #12251
Comments
Out of curiosity, how much does this overlap with Return Value Optimization under C-style semantics? Are there cases where this goes beyond what the optimizer can do on its own today (w/ a naive C-style lowering that introduces temporaries)? |
This is a very similar problem to C++'s RVO and NRVO semantics. The core problem underlying both is that the result location pointer is observable to the program, and can be stored into memory somewhere. In C++ the pointer is exposed via the
There are a few cases where it can bleed into semantics. In the case where the optimizer inlines a function, it will probably be able to remove the copy. But if the function is not inlined, it becomes very difficult. There might be a way to convey the set of operations that would expose the result location in a well defined way, and allow it to remove the copy if it knows a called function does not do those things, but I'm a bit skeptical that LLVM can do that out of the box. Here are some examples where the RLS semantics might pin down the optimizer: async fn foo() void {
const frame = @frame();
var ptr = &frame;
suspend {}
assert(ptr.* == @frame());
}
fn spawnAsync() @Frame(foo) {
return async foo();
}
test {
const frame = spawnAsync();
resume frame;
nosuspend await frame; // is this a lifetime bug or not?
} fn makeEmptyList() RingList {
var result: RingList = undefined;
result.next = &result; // by #2765 semantics this is the result location
return result;
}
test {
const list = makeEmptyList();
assert(list.next == &list); // is this true, false, or undefined?
} These examples require an answer to the result location question. The first is either a lifetime bug or not, but the second properly pins down the optimizer. If the assert definitely fails, then the optimizer may not remove the temporary. If the assert definitely succeeds, then the language must not use a temporary. If the comparison is |
Here's another case, without lifetime problems: fn checkResult(u32 const *ptr) u32 {
var result: u32 = 4;
assert(ptr == &result); // this should definitely be a well-defined true or false
return result;
}
test {
var r: u32 = 2;
r = checkResult(&r);
} |
Thanks for those helpful examples. I was having trouble grokking the impact of RLS without aliasing, but the value is clear now. Sounds like an equivalent way to describe it is "RLS, with temporaries" (to avoid aliasing), plus guaranteed temporary/copy elision when initializing variables. Very nice approach - I'm a fan of the proposal 👍 |
Very nice writeup. Does this and if yes, how would this, affect C interop or semantics of translated C code? Missing documentation issue in #2809. |
Thank you for the great write-up! re-stating the core issue:Current RLS introduce a language-level I think when the caller is required to guarantee limits to aliasing, having syntax to do/indicate this at the call site is the proper solution. Detailed thoughts on parameter passing (that ended up reading like a follow-up proposal):From my perspective, the value-to-const-reference conversion for parameter passing is structurally the same challenge. const T = struct{
x: u8,
};
var global: T = T{.x = 0};
fn g(val: T, pointer: *T) void {
pointer.* = T{.x = 2};
@import("std").log.info("{d}", .{val.x}); //for call B without temporary, val is no longer constant
}
test {
var local: T = T{.x = 4};
g(global, &local); // call A
g(global, &global); //call B - breaks the non-aliasing assumption
// alternative proposed below; call site creates copies unless 'noalias' syntax is used
g(noalias global, &local); //explicit call A
g(global, &global); //call B - default to safe copy of per-value argument
} I think when the caller is required to guarantee limits to aliasing, having syntax to do/indicate this at the call site is the proper solution. @matu3ba regarding C (/ C++), I don't believe translated code would be notably affected. Details:From what I could find, the C language has no explicit rules regarding any return value optimization; it's up to the compiler to reason that such a transformation is safe within the bounds of the language spec, but elision is never guaranteed. |
Result locations are for the zig internal calling conventions only, callconv(.C) functions have C semantics where the parameters are copies and the result is always stored in a temporary (subject to the as-if rule which allows for elision as an optimization in some cases).
I've thought about this and a lot of similar variants, and it might be worth doing but I'm not totally sure. One difficulty with any syntax on call sites is that it also has to work with |
I personally don't see the difficulty with extending |
Therefore, I think that annotating caller and callee for where stack copies are made on function call and elided are the sanest option. I suspect that with comptime information this also would leave the door open for external static analysis as borrow checking of very error prone parts and, except for arbitrary global pointers, proper ecapsulation via stack copy. Do you have other ideas? Or what do you think? @SpexGuy Please tell me, where and how my model/assumption is flawed. |
@matu3ba Small nitpick, we technically have |
Mhm. This could be eliminated, if the callee could force the caller to use 1 specific semantic for each argument (similar to what SpexGuy suggested. But this would prevent the caller to opt-in into more safety (at cost of lower performance). However, I can not exactly follow why it matters, how each argument is passed over as function argument. I might be too tired (~2h sleep), but as far as I understand, the parameter type is only relevant for the type matching at the usage side and the optimizer has freedom over how it lowers it (copy by value or reference). Do you have an (ideally) brief code example with desugaring like what SpexGuy is doing in the talk ("ATTACK of the KILLER FEATURES - Martin Wickham - Software You Can Love Vancouver 2023") that demonstrates per fn args annotation is necessary @rohlem ?
|
@matu3ba I'm not saying it's necessary to decide for each argument (and the result) individually.
Right, the main issue here is just that Zig currently tells the optimizer that none of the arguments and the result location alias each other, even if this is the case. Note that we already have (sort-of-proposal of caller-provided `noalias` guarantee, as in my old comment, again here)
const P = struct{x: u8, y: u8};
fn flip(in: P) P {
return .{.x = in.y, .y = in.x};
}
const a = P{.x = 3, .y = 4};
test {
var b = a;
// my proposed solution
// We don't tell the compiler that anything is `noalias`,
// so it should assume that everything may alias and make copies for us.
b = flip(b);
// Here we tell the compiler that `b` doesn't alias `b`
// => code will miscompile, as it does in status-quo;
// maybe not full illegal behavior though if we can limit the allowed effects?
noalias b = flip(noalias b);
// The result of an initialization is always `noalias` by default,
// because we cannot reference `c` on the right hand side.
// (Here `noalias` on the argument is not necessary, though it still documents
// that no global mutated by `flip` aliases `b`.)
var c = flip(noalias b);
// Tricky: Values that the compiler provides backing for in-place
// (literals, results of operators, functions) are trivially `noalias`,
// because the compiler can prove that they are.
// HOWEVER it then can't re-use that value as a result location.
var d = flip(.{ a.x, c.y});
// We could allow this next example for explicitly
// telling the compiler to re-use the temporary's location.
// (We can imagine different functions like "double" which are correct
// even without copies, that declare their argument `noalias` as well.)
var e = flip(noalias .{ b.x, d.y});
// For regularity, the compiler shouldn't be allowed to "be smart" and
// replace `.{d.x, d.y}` with value-equivalent `d` IMO: causes issues here.
noalias d = flip(.{d.x, d.y});
} Note that function bodies can always be generated to assume by-reference EDIT: Important detail I initially forgot (apparently it was late for me too): |
fn swizzle(v: *const [4]f32) [4]f32 {
return .{ v[1], v[2], v[3], v[0] };
}
test {
var arr = [5]f32{ 0, 1, 2, 3, 4 };
arr[0..4] = swizzle(&arr[1..5]);
} This function call is aliasing the parameter and the return value, and therefore should be Checked Illegal Behavior, IMO. Could we add runtime assertion for a parameter aliasing with the return value? Can the problem happen when using "single assignment"? |
Yes, but this isn't all cases, and may even have false positives. Here's another case where aliasing is hard to detect: const Inner = [2]Big;
const Outer = struct { v: *Inner };
fn foo(o: Outer) Inner {
return .{ o.v[1], o.v[0] };
}
test {
var o: Outer = ...;
o.v.* = foo(o);
} Here the parameter to foo is Additionally, Finally, depending on how exactly we define aliasing, there may be false positives. For example, imagine a function like this: fn last(s: []const T) T {
return s[s.len-1];
}
test {
var v: [4]T = ...;
v[0] = last(&v);
} Here the parameter and the result location overlap, but the function performs no reads that overlap the result location. So there is no aliasing in this example, despite the overlapping types.
To get a correct check without false positives, we would have to verify within the function that no individual reads or writes overlap the result location. This is possible to do, but cannot be turned on or off by the caller. So these checks would be present in all calls. |
Thanks for all the examples you're producing, it's very helpful to the discussion. I'm suggesting runtime checks (yet imperfect) because I don't really see another way forward for Zig. Disabling result pointer optimization like in C, means user will have to deal with it. Handling it correctly for the compiler seems to require Rust or Val-like restrictions to the language. What about setting the result pointer to undefined (0xaa) when entering the function in Debug mode? That way reads through any alias will be detected. This would also break your But |
Can someone summarize what the issues are with the original proposal? It seems like the way forward to me; even if it may produce suboptimal code in some cases, it appears to solve the problem with async. |
@SpexGuy I'm new to Zig and probably not qualified to be commenting, but here I go. I came to this issue via your talk: https://youtu.be/dEIsJPpCZYg?si=GCUFI46nHnjQ_SaH It was a bit off-putting to learn of issues with aliasing caused by hidden pointer magic. Thanks for working to make Zig better!
It's interesting that to avoid a copy requires introducing a tmp variable and a copy, but if the optimizer can often remove the later, then it makes sense in the end. For a language with "no hidden memory allocations" and "no hidden control flow", I didn't expect the hidden "value-to-const-reference transformation". But I'm not about to ask for "no hidden optimizations." 😂
Swift 5.9 recently added the concept of noncopyable structs. Would it be worth considering something like that? Async frames aren't the only thing that shouldn't be copied, so why not allow users to mark things as noncopyable too? 🤔 (Obviously recognizing that Swift is a very different language with built in reference counting a lot of copying) |
My grug brain thinks we should just define the semantics of // ORIGINAL
const V2 = struct { x: f32, y: f32 };
test {
var v = V2{ .x = 0, .y = 1 };
v = V2{ .x = v.y, .y = v.x };
}
// DE-SUGARED
const V2 = struct { x: f32, y: f32 };
test {
// BEGIN TX
var tmp: V2 = undefined;
tmp.x = 0;
tmp.y = 1;
// COMMIT
var v: V2 = undefined;
v.x = tmp.x;
v.y = tmp.y;
// END TX
// BEGIN TX
var tmp2: V2 = undefined;
tmp2.x = v.y;
tmp2.y = v.x;
// COMMIT
v = undefined;
v.x = tmp2.x;
v.y = tmp2.y;
// END TX
} So yes temporaries for all assignments. I feel that this is a large enough footgun that we should immediately fix this with temporaries and then debate other solutions after. This is a correctness problem and we shouldn't worry about performance. |
That makes sense, but then how would we plague the language with ill-conceived experimental semantics for many years? Especially when it boosts the performance in many cases while merely unleashing pure chaos in any code that is vaguely self-referential? Remember: Only code that works is benchmarked. Sarcasm aside, the best time to add Forwarding Assignment Syntax to the language (thus returning to sane defaults) was when RLS was first added. The second best time is now. |
Motivation
Result Location Semantics have three core goals in Zig
Unfortunately this feature also causes some unintended side effects, like #4021, #3234, and #12064.
The problem looks like this:
With the current semantics, the result location
&v
of the second assignment is forwarded to the struct initialization expression, causing it to be equivalent toThis means that
v.x
is overwritten in the first assignment, and then used in the second assignment.This is made worse by the transparent value-to-const-reference transformation which is performed on some struct parameters, which can cause aliasing to be easily observed, as in #3696 and #5973.
Here these two features conspire to preserve aliasing through the function call. The parameter is transformed into a const reference and the result location is forwarded to the function. Both of these are
&v
, so this has the same behavior as the first example.This behavior also has consequences for the optimizer. In order for an RLS transformation to be a strict win, the implicit return pointer needs to be marked
noalias
. However, this mark would turn the problems listed above into Unchecked Illegal Behavior, allowing the optimizer to mark them asunreachable
. But without thenoalias
mark, the optimizer may in some cases be constrained and unable to combine writes to the result location due to concerns about aliasing. Additionally, since RLS applies to all structs, it includes very small structs that could be more efficiently returned in registers.There are specific cases where we could detect this aliasing and make it Checked Illegal Behavior, but we would not be able to detect all cases. Consider as an example,
Aliasing occurs here, but the only way to find it is to keep track of every read and write, and perform a pairwise comparison on all of them. Once loops get involved, the memory requirement increases for every iteration of the loop, as does the work required for detection. So it's unreasonable to make this checked in the general case.
Since it has occurred naturally in nonobvious ways, it shouldn't be UIB. But as shown above, we can't really make it CIB either. Implementation defined behavior here would only create a needless and difficult to observe portability burden. The alternative is to make it well-defined, and allow result pointers to alias. But that defeats the first core goal of RLS, to be an optimization. It also causes unexpected behavior, as shown by the linked issues. So something needs to change.
The Fix
The obvious fix is to introduce temporaries on all assignments, but this would give up all of the three core goals. However, there are many cases where we can detect that a value must not alias, and safely preserve the current behavior. For example:
The only cases that need to change are of the form
For these expressions, the existing var may alias inside the expr. For the sake of avoiding accidental aliasing, we should change the interpretation of these statements to do the following:
&<existing_var_or_expr>
@TypeOf(<existing_var_or_expr>)
, or an anonymous temporary if it has an inferred type<expr>
using the temporary as the result locationThis change may have a performance cost in some cases, however it will also allow us to safely put the
noalias
attribute on result location pointers. Functions would be generated in the same way they are now, with an RLS pointer, but calls to those functions may need to introduce a temporary based on the form of the call.Forwarding Assignment Syntax
Sometimes you need to overwrite a large struct, and avoiding a copy is important for performance. After this change, you have two options:
I think this is probably good enough for most use cases, but if there are convincing arguments we could consider introducing a new syntax that avoids the temporary and asserts that no aliasing will occur. A "forwarding assignment" where the lhs is used as an input to the rhs. I'd like to avoid syntax bikeshedding in this issue, but here are some examples of how it could look:
Consequences for Async and Pinned
Aside from optimization, async and pinned types have correctness requirements based on avoiding copies. Since this change will introduce copies which could break existing async code, we should implement compile errors for copying async frames before or at the same time as making this change. Once that's in though, most of async will still work with these changes. For example,
But there are still places where you need to assign to existing memory. For example:
We have a sort of workaround for these cases, in the form of
@asyncCall
, but it's a pretty dramatic change in syntax for a small change in semantics. We don't have any parallel workaround for pinned structs. So for the sake of async and pinned, we may want to more strongly consider adding a Forwarding Assignment Syntax.Consequences for Parameter Passing
The value-to-const-reference conversion on parameters dovetails with the old RLS semantics to cause a lot of problems, as described at the beginning. However, with RLS "fixed" to avoid aliasing, I suspect that leaves it in a much better place. I think we should fix RLS first, and then see what problems still arise from parameter conversion and consider it separately if it is still causing problems.
Consequences for Result Location References
These new temporaries may be observable by code using #2765, if code is allowed to reference the result location directly. However #2765 is not implemented yet so it won't affect any existing code.
The text was updated successfully, but these errors were encountered: