multi_draw_indirect_* on Metal #2148
Hm, so I tried just adding the feature flag to the accepted list of feature flags for Metal, and it seemed to just work (I'm running that on a MacBook Air 2020, Big Sur): FredrikNoren@7da68cb
This one does indeed work, but the count variants are still unimplemented stubs:

```rust
unsafe fn draw_indirect_count(
    &mut self,
    _buffer: &super::Buffer,
    _offset: wgt::BufferAddress,
    _count_buffer: &super::Buffer,
    _count_offset: wgt::BufferAddress,
    _max_count: u32,
) {
    //TODO
}

unsafe fn draw_indexed_indirect_count(
    &mut self,
    _buffer: &super::Buffer,
    _offset: wgt::BufferAddress,
    _count_buffer: &super::Buffer,
    _count_offset: wgt::BufferAddress,
    _max_count: u32,
) {
    //TODO
}
```

We could try to implement it. Or we could try to split the feature in two (count and non-count).
Ah, I didn't realize we emulated multi-draw-indirect on Metal with a for-loop of single indirect draws. Using this idea, the way we could implement MDIC without a CPU readback is by dispatching a compute shader which copies the draws below the count and zeroes out the ones above it. This is getting dangerously close to "emulation", and I would rather just split the feature. Edit: it's already split; there are two features, MDI and MDIC.
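For concreteness, here is a rough sketch of what such a clamp pass could look like. This is not wgpu's code: the WGSL struct just mirrors wgpu's `DrawIndexedIndirect` layout, and the bindings are made up for illustration. Each invocation copies its command through if it is below the GPU-written count and zeroes it otherwise, so the backend can then blindly issue `max_count` indirect draws:

```rust
// Hypothetical WGSL for the clamp pass described above (a sketch, not
// wgpu's implementation). `src` holds max_count commands, `count` is the
// GPU-written draw count, and `dst` is what the backend actually draws from.
const CLAMP_DRAWS_WGSL: &str = r#"
struct DrawIndexedIndirect {
    index_count: u32,
    instance_count: u32,
    first_index: u32,
    base_vertex: i32,
    first_instance: u32,
}

@group(0) @binding(0) var<storage, read> src: array<DrawIndexedIndirect>;
@group(0) @binding(1) var<storage, read> count: u32;
@group(0) @binding(2) var<storage, read_write> dst: array<DrawIndexedIndirect>;

@compute @workgroup_size(64)
fn clamp_draws(@builtin(global_invocation_id) id: vec3<u32>) {
    let i = id.x;
    if (i >= arrayLength(&dst)) {
        return;
    }
    if (i < count) {
        dst[i] = src[i];
    } else {
        // A zeroed command (index_count == 0) draws nothing.
        dst[i] = DrawIndexedIndirect(0u, 0u, 0u, 0, 0u);
    }
}
"#;
```

The cost is an extra dispatch plus a full copy of the command buffer per multi-draw, which is part of why splitting the feature looks attractive.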
To add some more to this issue: this is now my #1 CPU blocker on Metal. On Windows, the CPU is basically idle for the same scene (~15 ms; "drop render pass" doesn't even show up). This is my rendering code:

```rust
#[cfg(not(target_os = "macos"))]
{
    render_pass.multi_draw_indexed_indirect(
        &cull_state.commands.buffer(),
        offset * std::mem::size_of::<DrawIndexedIndirect>() as u64,
        mat.entities.len() as u32,
    );
}
#[cfg(target_os = "macos")]
{
    for i in 0..mat.entities.len() {
        render_pass.draw_indexed_indirect(
            &cull_state.commands.buffer(),
            (offset + i as u64) * std::mem::size_of::<DrawIndexedIndirect>() as u64,
        );
    }
}
```

I was looking into whether I could try to add Indirect Command Buffers to the Metal HAL myself, but I'm honestly quite lost. Any pointers or suggestions would be appreciated! Or if there's some other way to reduce the CPU overhead (I was looking at the `instance_count` field, but I'm really not sure how I could use that).
@FredrikNoren thank you for sharing! As for Indirect Command Buffers (ICBs): we wanted to use them as the implementation for GPU-side RenderBundles. So the road to this would be: first, adding a feature flag (used internally by wgpu-hal) for native render bundle support; then adding the relevant API in wgpu-hal; implementing it on Metal; and finally hooking the real WebGPU RenderBundle logic up to this wgpu-hal feature (where supported). It's a relatively big chunk of work. Perhaps you could run this workload through Metal's System Trace to see what the sampling profiler reports. Maybe indirect calls on Metal are just that slow? Or maybe we are doing something silly in wgpu-core.
@kvark That sounds great! I'd love to see whether switching to those RenderBundles would help in the future, then. Is there any timeline for when that might be available? I ran it through the Metal System Trace, but to be honest I'm not 100% sure how to read the results. Here's the trace: https://drive.google.com/file/d/1pqWsBappybinzbDF7PTPD51VddDi7tit/view?usp=sharing
So, a bit more context here: there are two different ways that we will interact with Metal indirect command buffers in wgpu.

The first is through render bundles. These allow the user to record a series of commands on the CPU, then replay them multiple times over the course of a program. This maps fairly directly to indirect command buffers and just needs the work done on it; the plan of attack has been set. If you don't need to change the volume of, or the arguments to, the calls, this can be a good solution.

The second is through multi-draw indirect. This allows a shader to write out a series of draw calls that will be executed as a single "blast" of draw calls. It is most useful for GPU culling and other places where the GPU wants to generate a possibly arbitrary number of draw calls. Right now, the design of wgpu's multi-draw indirect requires that a compute shader be added in the Metal backend which translates the Vulkan-style multi-draw-indirect buffer into the Metal indirect command buffer. This is far from the most efficient implementation, and I would love to see a world where we implement something that lets shaders write directly to ICBs.
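For reference, the record-once/replay-many shape already exists at the wgpu API level. Here's a minimal sketch, assuming `device`, `pipeline`, `vertex_buf`, and `vertex_count` already exist, and that the bundle's formats match the pass it's executed in (descriptor fields vary slightly across wgpu versions):

```rust
// Record a bundle once, outside the per-frame loop.
let mut bundle_encoder =
    device.create_render_bundle_encoder(&wgpu::RenderBundleEncoderDescriptor {
        label: Some("static geometry"),
        color_formats: &[Some(wgpu::TextureFormat::Bgra8UnormSrgb)],
        depth_stencil: None,
        sample_count: 1,
        multiview: None,
    });
bundle_encoder.set_pipeline(&pipeline);
bundle_encoder.set_vertex_buffer(0, vertex_buf.slice(..));
bundle_encoder.draw(0..vertex_count, 0..1);
let bundle = bundle_encoder.finish(&wgpu::RenderBundleDescriptor {
    label: Some("static geometry bundle"),
});

// Replay it every frame inside a render pass. Today this re-encodes the
// commands internally; the plan discussed above is to back it with a
// native Metal indirect command buffer instead.
render_pass.execute_bundles(std::iter::once(&bundle));
```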
@cwfitzgerald side note: so that's where your proposal comes in? With an indirect command encoder being an API primitive, we'll be able to implement it purely on the host side, without writing from compute shaders. That seems like a solid argument, although I'm not sure it's enough to warrant the API complication.
Writing to ICBs from shaders sounds great! In the meantime, is there any way to circumvent webgpu to get access to Metal directly, or would a fork be my best option?
@kvark Re: the many encoders; I think it's actually just two encoders (I just open two in the code), but what's showing is each render pass. Also not sure why symbols don't show up; it's built with
@kvark we already have ICBs on the host side: render bundles (we haven't implemented them yet, but we can). My proposal was to add a multi-draw-indirect-like ICB as a WGSL buffer type, so that a user's compute shader can freely write to it. This would let Vulkan, DX12, and Metal resolve to different ICB code (Vulkan: a "normal" MDI buffer; DX12: an MDI buffer with 3 push constants; Metal: an actual ICB). This also lets us do the bounds checking during the write, without needing a separate compute shader to validate.
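To make the "Vulkan-style MDI buffer" side of this concrete, here is an illustrative sketch of a GPU culling pass that writes such a buffer today, which is the kind of shader the proposal would let target an ICB directly. Every name here is hypothetical, the visibility check is a stand-in for a real frustum test, and `draw_count` is assumed to be zeroed before the dispatch:

```rust
// Illustrative WGSL (all names hypothetical): a culling pass appends plain
// DrawIndexedIndirect records into a storage buffer plus an atomic count.
const CULL_WGSL: &str = r#"
struct DrawIndexedIndirect {
    index_count: u32,
    instance_count: u32,
    first_index: u32,
    base_vertex: i32,
    first_instance: u32,
}

struct Instance {
    bounds: vec4<f32>, // bounding sphere: xyz = center, w = radius
    index_count: u32,
    first_index: u32,
}

@group(0) @binding(0) var<storage, read> instances: array<Instance>;
@group(0) @binding(1) var<storage, read_write> draws: array<DrawIndexedIndirect>;
@group(0) @binding(2) var<storage, read_write> draw_count: atomic<u32>;

@compute @workgroup_size(64)
fn cull(@builtin(global_invocation_id) id: vec3<u32>) {
    let i = id.x;
    if (i >= arrayLength(&instances)) {
        return;
    }
    // Stand-in visibility test; a real pass would test against frustum planes.
    if (instances[i].bounds.w > 0.0) {
        let slot = atomicAdd(&draw_count, 1u);
        // first_instance = i lets the vertex shader look up per-entity data
        // (in wgpu that needs the INDIRECT_FIRST_INSTANCE feature).
        draws[slot] = DrawIndexedIndirect(
            instances[i].index_count, 1u,
            instances[i].first_index, 0, i,
        );
    }
}
"#;
```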
Is your feature request related to a problem? Please describe.
I'd like to implement GPU-side culling, and from what I understand `multi_draw_indirect_*` is a key component of this. It would be amazing if this could be supported on OSX as well. (The `_count` versions would be even better, but as far as I understand they're not available in Metal? I could be wrong.)

Describe the solution you'd like
`multi_draw_indirect_*` available on OSX/Metal.

Describe alternatives you've considered
I guess for now I can do a manual for-loop and issue one `draw_*_indirect` call per command (which I would still build on the GPU, for the GPU-side culling) to simulate `multi_draw*`.

Additional context
@cwfitzgerald, @jasperrlz and I talked about this yesterday in the Matrix chat; figured I'd also post an issue to track this. Also, from the discussion in #742 it sounded like there could be a way to do it?