-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicate framebuffers that are rendered onto themselves #5150
Conversation
Looking forward to test it as it should help lots of games which render to self textures |
Oh, I must note this has some potential to impact performance, and apparently happens in many games (apparently innocently in some, not sure.) I'm worried that the reporting may have somehow been inaccurate. -[Unknown] |
Yeah, I'm also a bit worried about inaccurate reporting here but not sure how that could happen - we only check at draw time, right, so it can't really be that it's just reporting an intermediate state when the game is in the middle of switching both texture and render target. |
If performance will be impacted further I propose an extended period of testing is allowed for this commit before the merge. Looking back at PPSSPP's recent history so far, there have been 2 major traceable points where performance has noticeably dipped in a lot of games due to necessarily added functionality:- a. The Dual Source Alpha commit (v0.9.6-59-g214cc01 214cc01) - required for various games to display glow effects properly etc. The impact has been mostly mitigated by @hrydgard via selective application of the effect in games/scenarios where it is applicable instead of globally. Used to affect games like Dead or Alive: Paradise by up to 50% in speed reduction. b. The Blitting commit (v0.9.6-334-gce8f98e ce8f98e) - required for Jeanne D'arc. The impact is felt mostly in games like Gods Eater Burst and the God of War games (over 50% reduction in speed at higher resolutions as compared to the previous commit, v0.9.6-331-g4d477f0) during GPU limited scenarios. This could potentially be the next significant one. |
Well, even if it's somehow working, if it really is texturing from a buffer to itself, there are almost definitely issues on some video card. I'm testing some things now that might cause it to set texture incorrectly. If I'm right, fixing that could end up speeding up PPSSPP (just a tiny bit), actually. -[Unknown] |
@solarmystic if you're testing performance impact, you'll want to download this again and recompile, I just added a change to prevent some extra work that was never necessary. -[Unknown] |
#ifdef MAY_HAVE_GLES3 | ||
|
||
// TODO: Maybe merge with bvfbs_? Not sure if those could be packing, and they're created at a different size. | ||
if (!renderCopy_ || renderCopyWidth_ != framebuffer->renderWidth || renderCopyHeight_ != framebuffer->renderHeight) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this happens twice per frame with different sizes, this will lead to excessive fbo creation/destroying. Not sure if we should have a
map<pair<width, height>, fbo>
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about this. But I'm not sure about lifetime management. I kinda want to reuse the ones already created for "read framebuffers to memory", but not sure if that's safe and they have different size characteristics.
Also, btw, I tried creating without depth/stencil (passed false) and it simply failed...
-[Unknown]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's odd, should have worked to create a no-depth-buffer FBO. Might have a bug in fbo.cpp somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, it works now, I probably did something else wrong when I tried that...
-[Unknown]
Ridge Racer probably has some different issue then, unless it returns in the reports. |
@unknownbrackets , do you have the code which is for handling CLUT games ? (that fixes the DBZ Tag VS) |
This commit fixes #5088 |
Looks like it needs a rebase now, it collided with raven's cleanup. Anyway, I'm gonna merge this soon. We can optimize later but this shouldn't be a change that universally slows things down, only a few games (and it will fix problems in them), @solarmystic . |
This makes it possible to override it easier, rather than needing it in PSP RAM.
Needed to properly consider render to self (should rebind if the framebuffer is later used with a separate target.)
This fixes hrydgard#2917. I verified that, at least for my card, blitting is much faster than glReadPixels (by quite a margin.)
any screen for dbz vs tag fixed? thx |
This does not fix that game. It needs the clut translation. -[Unknown] |
a ok. thx |
After some preliminary testing, I am inclined to agree with you. This commit doesn't seem to have the wide ranging performance degradation effects seen in the previous ones. By all means, push it through. I'm still not done with every game in the list, but If any obvious regressions (read:- most likely when unthrottled) are spotted later on, I'll just report them, as usual. |
Duplicate framebuffers that are rendered onto themselves
Heh, I'd like to think most of my changes improve performance or keep it the same. That one change I knew would be which is why I excluded it for mobile, but I really wasn't sure how many games it could be affecting. -[Unknown] |
Yep, it was my test game for this problem. I think it was not even doing a clear (like 3rd Birthday was) so it was a lot worse. -[Unknown] |
@Bigpet to better explain the CLUT thing: Normally, games place textures in RAM or VRAM, outside of where they actually draw with the GE (you know, polygons and stuff.) This works great and is fully supported. They can use CLUTs with an indexed texture, which also works awesome. However, sometimes, games do something a bit more interesting. They render (using polygons) to an area of memory (a framebuffer in OpenGL), and then use THAT as a texture. For example, let's say that you're playing a sports game and have your name on your jersey. It might render the letters to a texture (curved using the GE or etc.), and then use that as a texture when drawing your character. Taking that a step further, some games draw a scene, and then draw that texture on top of itself. For example, render a the scene, and then draw it blended onto itself with lighting and alpha test to brighten some areas up. OpenGL doesn't actually allow you to use a framebuffer as both the source and target of a render - it gets all weird if you do that. That's what this pull request fixed. But - some games do something weird. Sometimes even while rendering a texture on top of itself (but not always, sometimes just using a separate target.) They use a CLUT. For example, perhaps to convert to grayscale, or who knows. Maybe just to isolate certain colors. But they take this texture and render it somewhere translating the color values using a CLUT. Because the OpenGL framebuffer isn't actually in memory, we can't just translate the colors the same way we do for regular, non rendered textures (which is the VAST majority of textures which use a CLUT.) The pixel data is "inaccessible" to us without downloading it. The options basically boil down to:
I expect the third option to be fast and safe, although I haven't actually checked yet if games are not using special CLUT masks and such that I think would eliminate it as an option, at least for mobile. -[Unknown] |
In option number three, if games are using clut masks and shift I would expect that they are using them to extract either the red, green or blue channel, in which case we can recognize and emulate it okay even on OpenGL ES without too much trouble. Of course, if games do weirder things we may be out of luck. |
It must be doing it then. How does it look with buffered rendering off? Seems like it's pretty slow anyway. Is it flushing a lot (debug statistics)? -[Unknown] |
You know: I had a thought. Maybe this is safe/fine if alpha blending is disabled? I've only seen it cause problems when blending was enabled, I think. Then again, it's probably always a bad idea to render to the texture you've bound... -[Unknown] |
How to make alpha blending off? |
@unknownbrackets , rendering to the same texture you are reading from is not defined to ever be completely safe in OpenGL. In practice it tends to be safe-ish on SOME gpus (nVidia and ATI, at least) in these cases:
Case 1) is likely to be safe on most GPUs for somewhat-obvious reasons but case 2) is dicey. Pretty sure it will not work on PowerVR due to the tile based rendering. I don't think alpha blending has much to do with it. |
Okay. Anyway, 3rd Birthday uses: Dragon Ball Z Tag Team uses: -[Unknown] |
Those even components with appropriate masks will be easy at least : just generate .r or .a in the shader etc. Anyway, even without integer support, masks like 0x01 or 0x03 etc can be approximated (with enough accuracy, I think) by multiplying, frac() and dividing back down. Likewise, masks like 0xF0 could be implemented by multiplying, subtract frac(value), and dividing down. Mask like 0x5 (0101) are a bit more tricky to do without proper bitmasking though. |
Yeah, I'm sure we can find a way to approximate most of it, but definitely seems like a pain... -[Unknown] |
Don't really see a way around it, so if it's pain it takes, well, that's what it'll take.... |
We're logging this for a bunch of games, I'm not really sure how it can be this many.
http://report.ppsspp.org/logs/kind/428
But either way, Brave Story and 3rd Birthday very clearly do this, and it causes weird behavior in OpenGL. This creates a separate FBO to use as a temporary. It might be better to create a cache of fbos but I wasn't sure about lifetime etc. This does at least help the existing issues.
However, some games (like even 3rd Birthday I think, and Dragon Ball Z Tag Team), even when doing this also render with a CLUT. I think this can use the same temporary FBO (I think it should always render 1:1 to a temporary FBO when using a CLUT) but instead render to it using a shader and a 1D attachment of the CLUT. Seems like a bit of a pain.
Anyway, this doesn't really help those CLUT games, but it may impact them by making their behavior more consistent (before it would sometimes have weird stuff in the source framebuffer.)
-[Unknown]