Debug and Validation Layers #416
A nit, but could this and the `sin_cos` change be done separately from this?
Thanks for taking a look. I don't mind sending those as a separate PR.
I decided to move this out of Draft status to kick off the discussion. Some of the issues that I mentioned in the original comment above (under "Open Questions") are still open and I would love to hear people's thoughts on them.
To be clear, this would be visual validation for debugging, right? This doesn't include a way to say e.g. "assert that this list of commands doesn't trigger a validation error"?
Overall, having some more debug code is extremely useful, thanks.
I've not done a very thoughtful/thorough review yet, but think getting initial comments in is useful.
I was unable to see any watertightness failures in tiger as in your screenshot. Were those intentionally added for an example?
src/debug.rs (outdated)
    }
}

const SHADERS: &str = r#"
I think I'd prefer this to be its own file, primarily for syntax highlighting reasons
This PR is mostly about visual validation, yes. That said, the watertightness test could be run as part of a test runner that actually asserts or reports the failures in textual form (e.g. by incorporating it into what's proposed in #439).
Thanks, I'll address them as soon as I get time.
No, these are real errors that are present in the flattening logic (in both the CPU and GPU versions of I'm also intrigued by #439 and I'm wondering if the watertightness test can be run as part of that as well.
Hmm, I still don't see them. The layer appears to be enabled, as the screen goes dark and the frame time increases massively. I can't test the CPU version, as that instantly panics (as expected/as you documented) due to the download issue.
Could these errors be non-deterministic?
Ah, you're likely not seeing the errors because the GPU-side stroke expansion is disabled by default (look for
Force-pushed: 56033f1 to 03f896a
Since #456 we have standard copyright headers. Please update the two files to have consistent capitalization with the standard.
I'm sorry this hasn't been reviewed for so long; it keeps on being pushed down my todo list. In my mind, the major concern is that this doesn't compose with the CPU shaders. It feels like it should be trivial to do readback on a CPU buffer, given that we already have it in memory; is it only an API issue?
I'm slightly conflicted here; the actual debug layer code looks good and works fine, but it's very clear that the code here has outgrown our abstractions.
Given that:
- We recommend that users use the `render_to_surface` method (rather than `render_to_surface_async`)
- All the new complexity is inside `render_to_surface_async`

I think it's fine to ship this. It's no secret that this code is going to need a really careful rethink at some point, but that isn't the job of this PR, and doing that rethink will be easier if we have this use case in mind.
That's also why I don't mind the CPU shaders not being supported here; clearly it's pretty far from ideal, but it doesn't break anything too badly there.
Sorry again for not reviewing this earlier; my personal philosophy on code review has shifted in the intervening time to favour landing things and improving them in situ.
This PR is, however, definitely blocked on restoring the wgpu-profiler functionality, i.e. the `resolve_queries` calls.
@@ -360,8 +386,11 @@ impl WgpuEngine {
    transient_map
        .bufs
        .insert(buf_proxy.id, TransientBuf::Cpu(bytes));
    let usage =
        BufferUsages::COPY_SRC | BufferUsages::COPY_DST | BufferUsages::STORAGE;
    // TODO: restrict VERTEX usage to "debug_layers" feature?
Yeah, we have lots of issues with implicit usages; I think we might need to either make this explicit at the recording level, or else loop through the recording and determine what capabilities we need. I think I'd prefer to just make it explicit.
But this isn't a task for this PR.
    }
}

pub(crate) fn validate_line_soup(lines: &[LineSoup]) -> Vec<LineEndpoint> {
If we did want to do this on the GPU, I suppose we'd want to do a 2d sort, so that all points at the same position are next to each other, then check that the points all pair up (maybe in a cleverer way than a single-threaded scan)
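To make the sort-and-pair idea concrete, here is a minimal single-threaded sketch of it on the CPU. It is not the PR's `validate_line_soup`; the `Line` struct and its field names are hypothetical stand-ins for `LineSoup`, and it assumes a path is watertight when every endpoint position is the start of exactly as many segments as it is the end of.

```rust
/// Hypothetical stand-in for a `LineSoup` record; the field names are assumptions.
#[derive(Clone, Copy, Debug)]
pub struct Line {
    pub path_ix: u32,
    pub p0: [f32; 2],
    pub p1: [f32; 2],
}

/// Sort key: path index, then the raw bit patterns of x and y.
/// Bit patterns do not sort numerically, but equal positions still end up adjacent.
pub type Key = (u32, u32, u32);

fn key(path_ix: u32, p: [f32; 2]) -> Key {
    (path_ix, p[0].to_bits(), p[1].to_bits())
}

/// Returns the key of every endpoint whose outgoing and incoming segment counts differ.
pub fn unpaired_endpoints(lines: &[Line]) -> Vec<Key> {
    // Each segment contributes +1 at its start point and -1 at its end point.
    let mut events: Vec<(Key, i32)> = Vec::with_capacity(lines.len() * 2);
    for l in lines {
        events.push((key(l.path_ix, l.p0), 1));
        events.push((key(l.path_ix, l.p1), -1));
    }
    // After sorting, all events at the same position of the same path are contiguous.
    events.sort_unstable();

    let mut errors = Vec::new();
    let mut i = 0;
    while i < events.len() {
        let k = events[i].0;
        let mut balance = 0;
        while i < events.len() && events[i].0 == k {
            balance += events[i].1;
            i += 1;
        }
        // A non-zero balance means some segment starting or ending here has no partner.
        if balance != 0 {
            errors.push(k);
        }
    }
    errors
}
```

Note that comparing bit patterns treats `0.0` and `-0.0` as different positions, which may or may not be what a real test wants. A GPU version would presumably replace the sort and the scan with parallel equivalents.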
A first pass....
pub struct LineEndpoint {
    pub path_ix: u32,

    // Coordinates in IEEE-754 32-bit float representation
Why not just store them as `f32`? (This should at least be explained here if it stays this way.)
Added my best-guess explanation
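For readers wondering what storing the raw bits buys, one plausible reason (an assumption, not the explanation actually added in the PR) is that `f32` implements neither `Eq`, `Hash`, nor `Ord`, while a `u32` bit pattern implements all three. A small sketch with a hypothetical `Endpoint` type:

```rust
use std::collections::HashSet;

/// Hypothetical endpoint record; it mirrors the shape discussed above,
/// not the PR's exact `LineEndpoint` definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Endpoint {
    pub path_ix: u32,
    // Coordinates stored as IEEE-754 bit patterns so the derives above are possible.
    pub x: u32,
    pub y: u32,
}

impl Endpoint {
    pub fn new(path_ix: u32, x: f32, y: f32) -> Self {
        Self { path_ix, x: x.to_bits(), y: y.to_bits() }
    }

    /// Recovers the floating-point position, e.g. for drawing an error marker.
    pub fn position(&self) -> (f32, f32) {
        (f32::from_bits(self.x), f32::from_bits(self.y))
    }
}

/// Counts distinct endpoints; this would not compile with `f32` fields,
/// because `HashSet` requires `Eq + Hash`.
pub fn distinct_endpoints(points: &[Endpoint]) -> usize {
    points.iter().copied().collect::<HashSet<_>>().len()
}
```

The round trip through `to_bits` and `from_bits` is lossless, so nothing is lost for visualization.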
Recording and WgpuEngine can now record and execute draw commands with a render pipeline.
The blit pipeline now uses the render pass feature in Recording instead of making wgpu calls directly.
This introduces a new debug module and DebugLayers data structure that can render debug visualizations of GPU buffers that are internal to the vello pipeline. This currently supports a line-based visualization of the LineSoup and PathBbox buffers. The debug visualization depends on CPU read-back of the BumpAllocators buffer to issue a draw call. This could technically be avoided with an indirect draw but the visualizations will eventually include CPU-side processing and validation. The draws are recorded to the same Recording as the blit in `render_to_texture_async`. The buffers that are used by the draw commands are temporarily retained outside of the `Render` data structure. These buffers are currently released back to the engine explicitly and in various places in code since safe resource removal currently requires a Recording.
Added the `ResourceProxy::BufferRange` type, which represents a buffer binding with an offset.
Both blit and debug pipelines need to be recompiled when the engine gets recreated.
- Introduce the VALIDATION layer which runs a watertightness test on LineSoup buffer contents.
- Add debug visualization layers for LineSoup endpoints and validation test errors.
- Add DebugLayers options to toggle individual layers.
- Add with_winit key bindings to toggle individual layers.
Co-Authored-By: Bruce Mitchener <bruce.mitchener@gmail.com>
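
As an illustration of how per-layer toggling could work, here is a minimal bitset sketch. The `Layers` type is hypothetical and is not the PR's `DebugLayers`; the layer names follow the PR description, and the mapping of keys 1-4 to layers in listing order is an assumption.

```rust
/// Hypothetical bitset of debug layers; the actual `DebugLayers` type may differ.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Layers(u8);

impl Layers {
    pub const BOUNDING_BOXES: Layers = Layers(1 << 0);
    pub const LINESOUP_SEGMENTS: Layers = Layers(1 << 1);
    pub const LINESOUP_POINTS: Layers = Layers(1 << 2);
    pub const VALIDATION: Layers = Layers(1 << 3);

    /// No layers enabled.
    pub fn none() -> Self {
        Layers(0)
    }

    /// True if every bit of `other` is set in `self`.
    pub fn contains(self, other: Layers) -> bool {
        self.0 & other.0 == other.0
    }

    /// Flips the bits of `other`, enabling a disabled layer and vice versa.
    pub fn toggle(&mut self, other: Layers) {
        self.0 ^= other.0;
    }

    /// Maps a number key to a layer (assumed ordering: 1 = bounding boxes ... 4 = validation).
    pub fn from_digit_key(key: char) -> Option<Layers> {
        match key {
            '1' => Some(Self::BOUNDING_BOXES),
            '2' => Some(Self::LINESOUP_SEGMENTS),
            '3' => Some(Self::LINESOUP_POINTS),
            '4' => Some(Self::VALIDATION),
            _ => None,
        }
    }
}
```

A key handler would then call `from_digit_key` on the pressed character and, if it returns a layer, `toggle` it on the set that gets passed along with the render parameters.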
I don't know about the actual visualization parts, but I'm okay with the integration parts!
We've been talking about various ways to perform CPU-side validation/testing over the outputs of Vello pipeline stages. It's also generally useful to be able to visualize the contents of some of these intermediate data structures (such as bounding boxes, the line soup, etc) and to be able to visually interpret any errors that are surfaced from the validation tests.
I implemented the beginnings of this in a new `debug_layers` feature. I'm putting this up as a Draft PR as there are a few unresolved aspects that I'd like to get feedback on first (more on this below).

Running Instructions

To try this out, run the `with_winit` example with `--features debug_layers` and use the number keys (1-4) to toggle the individual layers.

Summary
This PR introduces the concept of "debug layers" that are rendered directly to the surface texture over the fine rasterizer output. There are currently 4 different layers:

- `BOUNDING_BOXES`: Visualizes the edges of path bounding boxes as green lines.
- `LINESOUP_SEGMENTS`: Visualizes LineSoup segments as orange lines.
- `LINESOUP_POINTS`: Visualizes the LineSoup endpoints as cyan circles.
- `VALIDATION`: Runs a collection of validation tests on intermediate Vello GPU data structures. This is currently limited to a watertightness test on line segments. Following the test run, the layer visualizes the positions of detected errors as red circles.

These layers can be individually toggled using a new `DebugLayers` field in `vello::RenderParams`. The following is an example output with all 4 layers enabled:

Each layer is implemented as an individual render pass. The first 3 layers are simple visualizations of intermediate GPU buffers. The `VALIDATION` layer is special since it runs CPU-side validation steps (currently a watertightness test over the LineSoup buffer), which requires read-back. The general idea is that `VALIDATION` can grow to encompass additional sanity checks.

Overview of Changes
- The `blit` pipeline can now be expressed as a `Recording`. The debug layer render passes get recorded to this `Recording`. All render passes share the same render encoder and target the same surface texture.
- […] `Recording` to allow them to be used in a subsequent render pass. Currently this separation into multiple recordings is necessary since the visualizations require GPU->CPU read-back. This is partially encapsulated by the new `CapturedBuffers` data structure.
- A new `debug` module and the `DebugLayers` and `DebugRenderer` data structures. `DebugRenderer` is an encapsulation of the various render pipelines used to visualize the layers that are requested via `DebugLayers`. `DebugRenderer` is currently also responsible for executing the validation tests when the `VALIDATION` layer is enabled.
- The `with_winit` example now has key bindings (the number keys 1-4) to toggle the individual layers.

Open Questions
It probably makes sense to have a better separation between running validation tests and visualizing their output. Currently both are performed by `DebugRenderer::render`.

`CapturedBuffers` doesn't handle buffer cleanup well. The current `engine` abstractions require that a buffer be returned to the underlying engine's pool via a `Recording` command. This creates and submits a command buffer simply to free a buffer, which is a bit too heavyweight. This whole mechanism could use some rethinking.

Currently, these buffers get conditionally freed in various places in the code and it would be nice to consolidate that logic.

The `VALIDATION` layer currently doesn't work with the `--use-cpu` flag since the buffer download command isn't supported for CPU-only buffers. Currently, it's the job of `src/render.rs` to know which buffers need to get downloaded for validation purposes. It currently simply records a download command. It would be nice for the engine to make the download command seamless across both CPU and GPU buffers rather than having the `src/render.rs` code do something different across the CPU vs GPU modalities.

Currently all layers require read-back. The debug layers (`BOUNDING_BOXES`, `LINESOUP_SEGMENTS`, `LINESOUP_POINTS`) read back the `BumpAllocators` buffers to obtain instance counts used in their draw commands. This read-back could be avoided by instead issuing indirect draws for the debug layers. I think this could be implemented with a relatively simple scheme: a new compute pipeline stage is introduced (gated by `#[cfg(feature = "debug_layers")]`), which can inspect any number of the vello intermediate buffers (such as the `bump` buffer) and populate an indirect draw buffer. The indirect draw buffer would be laid out with multiple `DrawIndirect` entries, each assigned to a different pre-defined instance type (the `DebugRenderer` only issues instanced draws). `DebugRenderer` would then issue an indirect draw with the appropriate indirect buffer offset for each render pipeline.

The read-back would still be necessary for CPU-side validation stages and their visualization can't really take advantage of the indirect draw. Then again, the exact ordering of the draw submission and the read-backs implemented in this PR is likely to change following the proposal in "Strategy for robust dynamic memory, readback, and async" (#366).
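
To sketch what the indirect draw buffer layout described above might look like, here is a CPU-side model. It is only an illustration: the `Slot` enum, `populate`, and the per-slot vertex counts are hypothetical, and in the proposed scheme the entries would be written by a compute stage rather than on the CPU. The four-`u32` argument layout is the standard one for non-indexed indirect draws.

```rust
/// Arguments of one non-indexed indirect draw: four consecutive `u32` values.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct DrawIndirect {
    pub vertex_count: u32,
    pub instance_count: u32,
    pub first_vertex: u32,
    pub first_instance: u32,
}

/// Hypothetical pre-defined instance types, one buffer entry per debug render pipeline.
#[derive(Clone, Copy, Debug)]
pub enum Slot {
    BoundingBoxes = 0,
    LineSegments = 1,
    LinePoints = 2,
}

pub const SLOT_COUNT: usize = 3;

/// Byte offset of a slot's entry inside the indirect buffer.
pub fn indirect_offset(slot: Slot) -> u64 {
    (slot as u64) * std::mem::size_of::<DrawIndirect>() as u64
}

/// CPU model of what the proposed compute stage would write: one entry per slot,
/// with the instance count taken from the corresponding intermediate-buffer count.
pub fn populate(
    instance_counts: [u32; SLOT_COUNT],
    vertices_per_instance: [u32; SLOT_COUNT],
) -> [DrawIndirect; SLOT_COUNT] {
    let mut entries = [DrawIndirect::default(); SLOT_COUNT];
    for i in 0..SLOT_COUNT {
        entries[i] = DrawIndirect {
            vertex_count: vertices_per_instance[i],
            instance_count: instance_counts[i],
            first_vertex: 0,
            first_instance: 0,
        };
    }
    entries
}
```

Each render pipeline would then issue its indirect draw at `indirect_offset(slot)`, so no instance count ever needs to be read back for the pure visualization layers.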