-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow large numbers of draw objects #526
Conversation
Previously there was a limit of workgroup size squared for the number of draw objects, which is 64k in practice. This PR makes each workgroup iterate multiple blocks if that limit is exceeded, borrowing a technique from FidelityFX sort. WIP, this causes hangs on mac. Uploading to test on other hardware. Also contains some changes for testing that may not want to be committed as is. Fixes #334
Add barrier for write-after-read hazard in coarse. The loop in question processes 64k draw objects at a time, so the barrier only gets invoked when that limit is exceeded. Also move new test scene so it isn't the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
const N_WIDE: usize = 300; | ||
const N_HIGH: usize = 300; | ||
const SCENE_WIDTH: f64 = 2000.0; | ||
const SCENE_HEIGHT: f64 = 1500.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider assigning the dimensions to params.resolution
(where params
refers to the second parameter to this function). That will adjust the scene transform so that the contents fit inside the window.
shader/draw_leaf.wgsl
Outdated
for (var i = 0u; i < firstTrailingBit(WG_SIZE); i += 1u) { | ||
var prefix = sh_scratch[0]; | ||
|
||
let num_blocks_total = (config.n_drawobj + (WG_SIZE - 1u)) / WG_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You could remove the parentheses around WG_SIZE - 1u
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the reason for these parentheses was that I didn't trust constant evaluation to understand the associativity of addition. But I think it's better to trust the compiler here, and have the cleanest expression of programmer intent.
let n_blocks_base = num_blocks_total / WG_SIZE; | ||
let remainder = num_blocks_total % WG_SIZE; | ||
let first_block = n_blocks_base * wg_id.x + min(wg_id.x, remainder); | ||
let n_blocks = n_blocks_base + u32(wg_id.x < remainder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand lines 79-81 correctly, they distribute the remainder across workgroups to load balance? I would add a comment here that clarifies the block/workgroup/thread assignment.
I would also consider moving these calculations to a helper somewhere in shared/ so that they can be reused in shader/draw_reduce.wgsl
(though I don't feel strongly about this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally feel that for 5 lines of logic it's clearer to have them in place than to have to go find them, but realize this is a style issue.
Regarding documentation: I'm seriously thinking we need a wiki entry on prefix sum. I'm inclined to spend a couple hours or so on it. There's just too much lore to jam into a comment. That said, I've written three lines.
let ix = global_id.x; | ||
let tag_word = read_draw_tag_from_scene(ix); | ||
var agg = map_draw_tag(tag_word); | ||
let num_blocks_total = (config.n_drawobj + (WG_SIZE - 1u)) / WG_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about explaining the block->thread assignment.
Set resolution in params for test scene. Add comments explaining division of work.
Previously there was a limit of workgroup size squared for the number of draw objects, which is 64k in practice. This PR makes each workgroup iterate multiple blocks if that limit is exceeded, borrowing a technique from FidelityFX sort.
WIP, this causes hangs on mac. Uploading to test on other hardware.
Also contains some changes for testing that may not want to be committed as is.
Fixes #334