Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match CPU/GPU logic for start tangents #705

Merged
merged 6 commits into from
Oct 3, 2024
Merged

Match CPU/GPU logic for start tangents #705

merged 6 commits into from
Oct 3, 2024

Conversation

raphlinus
Copy link
Contributor

When encoding a start tangent for an end cap or closed segment, use logic designed to match the GPU computed tangent for the first segment.

Note: this changes the GPU calculation to write out the lerp calculation explicitly rather than use the mix intrinsic, so we can rely on the rounding behavior. In the presence of fastmath, the rounding behavior is not guaranteed, but it is verified to work on M1.

Fixes #704, unless we get bitten by fastmath.

When encoding a start tangent for an end cap or closed segment, use
logic designed to match the GPU computed tangent for the first segment.

Note: this changes the GPU calculation to write out the lerp calculation
explicitly rather than use the mix intrinsic, so we can rely on the
rounding behavior. In the presence of fastmath, the rounding behavior is
not guaranteed, but it is verified to work on M1.

Fixes #704, unless we get bitten by fastmath.
@raphlinus
Copy link
Contributor Author

The current state of this is "works for me," and I'm feeling a bit uncertain whether it's fully portable. If we do get hit by fastmath issues, which is entirely plausible given the weak guarantees of the language, then I'm inclined to take this in a somewhat different direction: revert the encoding changes so that the final segment writes the unmodified point values, but change the GPU-side tangent calculation so that it's based on the encoded values rather than the cubic points after degree raising.

So I'm uploading this hoping to get more feedback. If it generally works, I think we can go forward.

@DJMcNab DJMcNab added this to the Vello 0.3 release milestone Sep 27, 2024
@DJMcNab
Copy link
Member

DJMcNab commented Oct 1, 2024

As I mentioned on Zulip, I still see some watertightness failures detected by #416 with this PR. Are those out-of-scope for this change?

I would also like to see a snapshot test for the longpathdash test scene added (ideally validated as showing the failure)

@raphlinus
Copy link
Contributor Author

I'll dig into this. I do think fixing watertightness issues is in scope, but also would consider unblocking 0.3 if we have no known regressions.

Use the explicit `fma()` intrinsic to express affine transformations. This reduces the chance that the compiler will compile different instances of the same source expression with different association/rounding behavior, and in particular reduces the incidence of watertightness failures with a rotation transform.
@raphlinus
Copy link
Contributor Author

I've done a fair amount of investigation.

I did implement the calculation of tangents before degree raising. It didn't make a significant difference, so I'm not sure it's worth preparing the patch. It is possible that on some GPU it would be an improvement.

The watertightness issues get significantly worse when there is a rotation transform. I have a patch (which I will upload as a commit to this PR) that changes transform_apply (in flatten.wgsl) to use fma() instead of multiplication and addition, and that behavior goes away. That is extremely strong evidence that we're being hit by fastmath, and that in particular the same calculation in the source is being compiled with different rounding behavior in different callsites.

I'm doing some more exploration, but I'm inclined to declare victory soon, as we have a state that is not a regression and is in fact considerably better (with rotations) than any previous state. To solve this completely systematically may require a profound rethink - generate path points once then use indirection to use the same point as both the line start of one segment and line end of another, which would guarantee watertightness by construction.

This adds a test. It's clipped to 250x250 otherwise it would hit the oversize file check, but I think it's good enough to catch serious regressions.
Also tweak threshold so it catches the failure.
@raphlinus
Copy link
Contributor Author

As of the most recent commit, the new test is validated by failing on main.

Copy link
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't meaningfully review the maths here, but I can confirm it does resolve the artifacts

:shipit:

vello_encoding/src/path.rs Outdated Show resolved Hide resolved
Improve docstring

Co-authored-by: Daniel McNab <36049421+DJMcNab@users.noreply.github.com>
@raphlinus raphlinus added this pull request to the merge queue Oct 3, 2024
Merged via the queue into main with commit 30c5e8e Oct 3, 2024
17 checks passed
@raphlinus raphlinus deleted the more_watertight branch October 3, 2024 16:10
github-merge-queue bot pushed a commit that referenced this pull request Oct 4, 2024
Adds #706 and #711 to the changelog, as they are somewhat major changes.

I've not added #705, as it seemed to be a regression from 0.2.0 rather
than a relevant change
@PoignardAzur
Copy link
Collaborator

Okay, so I'm a bit late to review this, but one nitpick:

This PR says it matches CPU and GPU logic. But the main change in shader code is:

    // Degree-raise
    if seg_type == PATH_TAG_LINETO {
        p3 = p1;
        p2 = p3 + (1.0 / 3.0) * (p0 - p3);
        p1 = p0 + (1.0 / 3.0) * (p3 - p0);
    } else if seg_type == PATH_TAG_QUADTO {
        p3 = p2;
        p2 = p1 + (1.0 / 3.0) * (p2 - p1);
        p1 = p1 + (1.0 / 3.0) * (p0 - p1);
    }

Whereas the equivalent code in the CPU shader in vello_shaders/src/cpu/flatten.rs is still:

    // Degree-raise
    if seg_type == PATH_TAG_LINETO {
        p3 = p1;
        p2 = p3.mix(p0, 1.0 / 3.0);
        p1 = p0.mix(p3, 1.0 / 3.0);
    } else if seg_type == PATH_TAG_QUADTO {
        p3 = p2;
        p2 = p1.mix(p2, 1.0 / 3.0);
        p1 = p1.mix(p0, 1.0 / 3.0);
    }

That is, same as the old GPU code.

Not sure what's going on here. Probably the CPU shader should be edited to match the change in the GPU shader?

@DJMcNab
Copy link
Member

DJMcNab commented Oct 5, 2024

Those look the same to me? Rust doesn't have fast math.

Unless you're claiming that mix isn't the same operation as the CPU side implementation?

@PoignardAzur
Copy link
Collaborator

Well the GPU side went from using mix to explicitly using operators. I assumed there was a reason for that. Maybe I'm reading too much into it.

@DJMcNab
Copy link
Member

DJMcNab commented Oct 8, 2024

Note: this changes the GPU calculation to write out the lerp calculation explicitly rather than use the mix intrinsic, so we can rely on the rounding behavior. In the presence of fastmath, the rounding behavior is not guaranteed, but it is verified to work on M1.

From the pr description

@PoignardAzur
Copy link
Collaborator

I should really read PR descriptions more often. I don't know when I got the habit of not doing that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Watertightness regression in longpathdash
3 participants