-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match CPU/GPU logic for start tangents #705
Conversation
When encoding a start tangent for an end cap or closed segment, use logic designed to match the GPU computed tangent for the first segment. Note: this changes the GPU calculation to write out the lerp calculation explicitly rather than use the mix intrinsic, so we can rely on the rounding behavior. In the presence of fastmath, the rounding behavior is not guaranteed, but it is verified to work on M1. Fixes #704, unless we get bitten by fastmath.
The current state of this is "works for me," and I'm feeling a bit uncertain whether it's fully portable. If we do get hit by fastmath issues, which is entirely plausible given the weak guarantees of the language, then I'm inclined to take this in a somewhat different direction: revert the encoding changes so that the final segment writes the unmodified point values, but change the GPU-side tangent calculation so that it's based on the encoded values rather than the cubic points after degree raising. So I'm uploading this hoping to get more feedback. If it generally works, I think we can go forward. |
I'll dig into this. I do think fixing watertightness issues is in scope, but also would consider unblocking 0.3 if we have no known regressions. |
Use the explicit `fma()` intrinsic to express affine transformations. This reduces the chance that the compiler will compile different instances of the same source expression with different association/rounding behavior, and in particular reduces the incidence of watertightness failures with a rotation transform.
I've done a fair amount of investigation. I did implement the calculation of tangents before degree raising. It didn't make a significant difference, so I'm not sure it's worth preparing the patch. It is possible that on some GPU it would be an improvement. The watertightness issues get significantly worse when there is a rotation transform. I have a patch (which I will upload as a commit to this PR) that changes transform_apply (in flatten.wgsl) to use I'm doing some more exploration, but I'm inclined to declare victory soon, as we have a state that is not a regression and is in fact considerably better (with rotations) than any previous state. To solve this completely systematically may require a profound rethink - generate path points once then use indirection to use the same point as both the line start of one segment and line end of another, which would guarantee watertightness by construction. |
This adds a test. It's clipped to 250x250 otherwise it would hit the oversize file check, but I think it's good enough to catch serious regressions.
Also tweak threshold so it catches the failure.
As of the most recent commit, the new test is validated by failing on main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't meaningfully review the maths here, but I can confirm it does resolve the artifacts
Improve docstring Co-authored-by: Daniel McNab <36049421+DJMcNab@users.noreply.github.com>
Okay, so I'm a bit late to review this, but one nitpick: This PR says it matches CPU and GPU logic. But the main change in shader code is: // Degree-raise
if seg_type == PATH_TAG_LINETO {
p3 = p1;
p2 = p3 + (1.0 / 3.0) * (p0 - p3);
p1 = p0 + (1.0 / 3.0) * (p3 - p0);
} else if seg_type == PATH_TAG_QUADTO {
p3 = p2;
p2 = p1 + (1.0 / 3.0) * (p2 - p1);
p1 = p1 + (1.0 / 3.0) * (p0 - p1);
} Whereas the equivalent code in the CPU shader in // Degree-raise
if seg_type == PATH_TAG_LINETO {
p3 = p1;
p2 = p3.mix(p0, 1.0 / 3.0);
p1 = p0.mix(p3, 1.0 / 3.0);
} else if seg_type == PATH_TAG_QUADTO {
p3 = p2;
p2 = p1.mix(p2, 1.0 / 3.0);
p1 = p1.mix(p0, 1.0 / 3.0);
} That is, same as the old GPU code. Not sure what's going on here. Probably the CPU shader should be edited to match the change in the GPU shader? |
Those look the same to me? Rust doesn't have fast math. Unless you're claiming that |
Well the GPU side went from using |
From the pr description |
I should really read PR descriptions more often. I don't know when I got the habit of not doing that. |
When encoding a start tangent for an end cap or closed segment, use logic designed to match the GPU computed tangent for the first segment.
Note: this changes the GPU calculation to write out the lerp calculation explicitly rather than use the mix intrinsic, so we can rely on the rounding behavior. In the presence of fastmath, the rounding behavior is not guaranteed, but it is verified to work on M1.
Fixes #704, unless we get bitten by fastmath.