Flicking problem #334
Comments
I found that the problem is related to config.n_drawobj in coarse.wgsl. When it exceeds 65535, could the data before 65535 conflict with the data after 65535 because of workgroup synchronization, causing the display problems? Is there a better solution to the 65535 draw object limit? I hope it can be resolved. |
Yes, a limit of 64k draw objects is a known problem, and has a straightforward solution. This issue can serve as the tracking bug for that. Thanks for the analysis! |
Can you tell us which bug or issue tracks this, so we can know when it's fixed or making progress? |
Has this bug been fixed? |
Not yet. The stroke rework is taking a lot longer than expected, though there is progress. This will be a high priority after that, and is also one of the items tracked in #302. |
Is this the same issue? warning: flashing images Screen.Recording.2023-12-23.at.17.22.19.mov |
No, that issue is caused by overflow of internal buffers (related to #366), which is in turn provoked by not culling lines and tiles that land outside the viewport. We do plan to work on all that. |
I plan on addressing the 64k draw object problem shortly. There are three approaches that can be taken.

One is to conditionally apply a 3-level dispatch when the (workgroup size)^2 limit is crossed. This is what's done with pathtags, and I find it ugly. Among other things, it requires more permutations of shaders to be compiled, and there's also some complex conditional logic for which shaders to dispatch. I do have a local patch which is almost done, so it is perhaps the path of least resistance.

The second approach is inspired by a technique I saw in FidelityFX sort, and is implemented in my recent sorting exploration. In that approach, each workgroup iterates over multiple blocks when the limit is exceeded.

A drawback to the latter approach is that it may limit the amount of addressable parallelism. Doing a quick calculation, for very large inputs it will dispatch 64k threads, regardless of the size of the input. That is more threads than directly supported by any existing hardware (RTX 4090 has 16k), though it may limit opportunities for latency hiding. An advantage to the latter approach is that it's two fewer dispatches.

As a future potential optimization, we may want to have more permutations (specialization by pipeline override) to (a) allow larger workgroups when the hardware supports it (the WebGPU spec only requires 256, which informs the choices we've made), and (b) support iteration over multiple elements per thread. The former is probably the best way to improve opportunities to exploit parallelism on powerful GPUs (1M threads should be plenty for at least a while) and has no real downside other than wiring up the plumbing. The latter is more of a tradeoff, as it improves bandwidth for large problems but limits parallelism for small ones. Switching between the two adaptively potentially requires compiling both variants (affecting cold-start time, including shader compilation), plus of course the complexity of the logic.

The third approach is to go back to single pass scan techniques, as was done in piet-gpu. We now know how to do this in WebGPU (see Zulip thread) but the performance implications are mixed; in particular it would be a performance regression on Apple Silicon. I'm most inclined to go with the second approach, as I think it's the best set of tradeoffs and admits additional optimization that would address the biggest shortcoming. I'll start on a PR, and if that goes well, probably apply the same technique to path tags. |
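The dispatch arithmetic behind the second approach can be sketched on the CPU. This is only an illustration under stated assumptions, not the code in the PR: `WG_SIZE`, `DispatchPlan`, and `plan_dispatch` are hypothetical names, and the workgroup size of 256 is taken from the WebGPU minimum mentioned above. The point it shows is that the number of workgroups is capped at the workgroup size, so the thread count tops out at 256 × 256 = 64k while the per-workgroup block count grows with the input.

```rust
/// Assumed workgroup size (the minimum the WebGPU spec guarantees).
const WG_SIZE: u32 = 256;

/// Hypothetical description of a dispatch: how many workgroups to launch
/// and how many WG_SIZE-element blocks each one loops over.
#[derive(Debug, PartialEq)]
struct DispatchPlan {
    workgroups: u32,
    blocks_per_workgroup: u32,
}

impl DispatchPlan {
    /// Total number of threads launched by this plan.
    fn threads(&self) -> u32 {
        self.workgroups * WG_SIZE
    }
}

/// Integer division rounding up, written to avoid overflow near u32::MAX.
fn div_round_up(a: u32, b: u32) -> u32 {
    a / b + u32::from(a % b != 0)
}

/// Hypothetical sizing rule: one block per workgroup until the
/// (workgroup size)^2 limit, then several blocks per workgroup.
fn plan_dispatch(n_drawobj: u32) -> DispatchPlan {
    let n_blocks = div_round_up(n_drawobj, WG_SIZE);
    // Cap at WG_SIZE workgroups so their partial results still fit in
    // a single workgroup at the second level of the scan.
    let workgroups = n_blocks.min(WG_SIZE).max(1);
    let blocks_per_workgroup = div_round_up(n_blocks, workgroups).max(1);
    DispatchPlan { workgroups, blocks_per_workgroup }
}
```

Under these assumptions, an input of 65,536 draw objects still needs only one block per workgroup, 65,537 needs two, and a million-object scene runs 16 blocks per workgroup on the same 64k threads.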
Previously there was a limit of workgroup size squared for the number of draw objects, which is 64k in practice. This PR makes each workgroup iterate multiple blocks if that limit is exceeded, borrowing a technique from FidelityFX sort. WIP, this causes hangs on mac. Uploading to test on other hardware. Also contains some changes for testing that may not want to be committed as is. Fixes #334
* Allow large numbers of draw objects
  Previously there was a limit of workgroup size squared for the number of draw objects, which is 64k in practice. This PR makes each workgroup iterate multiple blocks if that limit is exceeded, borrowing a technique from FidelityFX sort. WIP, this causes hangs on mac. Uploading to test on other hardware. Also contains some changes for testing that may not want to be committed as is. Fixes #334
* Add missing barrier
  Add barrier for write-after-read hazard in coarse. The loop in question processes 64k draw objects at a time, so the barrier only gets invoked when that limit is exceeded. Also move new test scene so it isn't the first.
* Address review comments
  Set resolution in params for test scene. Add comments explaining division of work.
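To make "each workgroup iterates multiple blocks" concrete, here is a minimal sequential simulation of a two-level exclusive prefix sum in which each simulated workgroup walks several blocks and carries a running prefix between them. It is a sketch, not the shader code from the PR: `blocked_exclusive_scan` is a hypothetical name, and `WG_SIZE` is set to 4 only so the limit of `WG_SIZE * WG_SIZE` elements is easy to exceed in a small example. Because the simulation runs sequentially it has no hazards; on the GPU, shared memory reused from one loop iteration to the next needs a barrier, which is the write-after-read case the "Add missing barrier" commit describes.

```rust
/// Tiny workgroup size for illustration (an assumption; the real shaders use a larger one).
const WG_SIZE: usize = 4;

/// Exclusive prefix sum, organized like a two-level GPU scan but run
/// sequentially on the CPU. Inputs larger than WG_SIZE * WG_SIZE are
/// handled by letting each simulated workgroup loop over several blocks.
fn blocked_exclusive_scan(input: &[u32]) -> Vec<u32> {
    let n_blocks = (input.len() + WG_SIZE - 1) / WG_SIZE;
    let n_workgroups = n_blocks.min(WG_SIZE).max(1);
    let blocks_per_wg = (n_blocks + n_workgroups - 1) / n_workgroups;

    // Level 1 (reduce): each workgroup sums every block assigned to it.
    let mut wg_totals = vec![0u32; n_workgroups];
    for wg in 0..n_workgroups {
        let start = (wg * blocks_per_wg * WG_SIZE).min(input.len());
        let end = ((wg + 1) * blocks_per_wg * WG_SIZE).min(input.len());
        wg_totals[wg] = input[start..end].iter().sum();
    }

    // Level 2: exclusive scan of the per-workgroup totals. There are at
    // most WG_SIZE of them, so one workgroup could do this on the GPU.
    let mut wg_prefix = vec![0u32; n_workgroups];
    for wg in 1..n_workgroups {
        wg_prefix[wg] = wg_prefix[wg - 1] + wg_totals[wg - 1];
    }

    // Leaf scan: each workgroup walks its blocks in order, carrying the
    // running prefix from one block into the next.
    let mut out = vec![0u32; input.len()];
    for wg in 0..n_workgroups {
        let mut carry = wg_prefix[wg];
        for b in 0..blocks_per_wg {
            let start = ((wg * blocks_per_wg + b) * WG_SIZE).min(input.len());
            let end = (start + WG_SIZE).min(input.len());
            for i in start..end {
                out[i] = carry;
                carry += input[i];
            }
        }
    }
    out
}
```

With `WG_SIZE = 4`, an input of 40 elements exceeds the 16-element single-block limit, so each of the 4 simulated workgroups covers up to 3 blocks.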
Should the README be changed now that this is closed? |
Thanks for the reminder! We intend to go through the list of issues in the README before publishing version 0.2.0, but a PR to remove the outdated items now would be welcome. |
See #543 |
I have tested an SVG file that is not very big, not as big as the CIA map case.
When I load the file and zoom in, the screen flickers and some small parts do not render correctly.
Is this because of a floating-point precision problem?
I'm quite sure there is no clipping in this file.