Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate framebuffers that are rendered onto themselves #5150

Merged
merged 5 commits into from
Jan 20, 2014

Conversation

unknownbrackets
Copy link
Collaborator

We're logging this for a bunch of games, I'm not really sure how it can be this many.
http://report.ppsspp.org/logs/kind/428

But either way, Brave Story and 3rd Birthday very clearly do this, and it causes weird behavior in OpenGL. This creates a separate FBO to use as a temporary. It might be better to create a cache of fbos but I wasn't sure about lifetime etc. This does at least help the existing issues.

However, some games (like even 3rd Birthday I think, and Dragon Ball Z Tag Team), even when doing this also render with a CLUT. I think this can use the same temporary FBO (I think it should always render 1:1 to a temporary FBO when using a CLUT) but instead render to it using a shader and a 1D attachment of the CLUT. Seems like a bit of a pain.

Anyway, this doesn't really help those CLUT games, but it may impact them by making their behavior more consistent (before it would sometimes have weird stuff in the source framebuffer.)

-[Unknown]

@dbz400
Copy link
Contributor

dbz400 commented Jan 20, 2014

Looking forward to test it as it should help lots of games which render to self textures

@unknownbrackets
Copy link
Collaborator Author

Oh, I must note this has some potential to impact performance, and apparently happens in many games (apparently innocently in some, not sure.) I'm worried that the reporting may have somehow been inaccurate.

-[Unknown]

@hrydgard
Copy link
Owner

Yeah, I'm also a bit worried about inaccurate reporting here but not sure how that could happen - we only check at draw time, right, so it can't really be that it's just reporting an intermediate state when the game is in the middle of switching both texture and render target.

@solarmystic
Copy link
Contributor

If performance will be impacted further I propose an extended period of testing is allowed for this commit before the merge.

Looking back at PPSSPP's recent history so far, there have been 2 major traceable points where performance has noticeably dipped in a lot of games due to necessarily added functionality:-

a. The Dual Source Alpha commit (v0.9.6-59-g214cc01 214cc01) - required for various games to display glow effects properly etc.

The impact has been mostly mitigated by @hrydgard via selective application of the effect in games/scenarios where it is applicable instead of globally. Used to affect games like Dead or Alive: Paradise by up to 50% in speed reduction.

b. The Blitting commit (v0.9.6-334-gce8f98e ce8f98e) - required for Jeanne D'arc.

The impact is felt mostly in games like Gods Eater Burst and the God of War games (over 50% reduction in speed at higher resolutions as compared to the previous commit, v0.9.6-331-g4d477f0) during GPU limited scenarios.

This could potentially be the next significant one.

@unknownbrackets
Copy link
Collaborator Author

Well, even if it's somehow working, if it really is texturing from a buffer to itself, there are almost definitely issues on some video card.

I'm testing some things now that might cause it to set texture incorrectly. If I'm right, fixing that could end up speeding up PPSSPP (just a tiny bit), actually.

-[Unknown]

@unknownbrackets
Copy link
Collaborator Author

@solarmystic if you're testing performance impact, you'll want to download this again and recompile, I just added a change to prevent some extra work that was never necessary.

-[Unknown]

#ifdef MAY_HAVE_GLES3

// TODO: Maybe merge with bvfbs_? Not sure if those could be packing, and they're created at a different size.
if (!renderCopy_ || renderCopyWidth_ != framebuffer->renderWidth || renderCopyHeight_ != framebuffer->renderHeight) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this happens twice per frame with different sizes, this will lead to excessive fbo creation/destroying. Not sure if we should have a

map<pair<width, height>, fbo> 

or something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about this. But I'm not sure about lifetime management. I kinda want to reuse the ones already created for "read framebuffers to memory", but not sure if that's safe and they have different size characteristics.

Also, btw, I tried creating without depth/stencil (passed false) and it simply failed...

-[Unknown]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's odd, should have worked to create a no-depth-buffer FBO. Might have a bug in fbo.cpp somewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, it works now, I probably did something else wrong when I tried that...

-[Unknown]

@dbz400
Copy link
Contributor

dbz400 commented Jan 20, 2014

Bravestory looks very well now .RR2 also look better though still show bit artifact.

screen00388

@hrydgard
Copy link
Owner

Ridge Racer probably has some different issue then, unless it returns in the reports.

@dbz400
Copy link
Contributor

dbz400 commented Jan 20, 2014

@unknownbrackets , do you have the code which is for handling CLUT games ? (that fixes the DBZ Tag VS)

@dbz400
Copy link
Contributor

dbz400 commented Jan 20, 2014

This commit fixes #5088

screen00389
screen00390

@hrydgard
Copy link
Owner

Looks like it needs a rebase now, it collided with raven's cleanup.

Anyway, I'm gonna merge this soon. We can optimize later but this shouldn't be a change that universally slows things down, only a few games (and it will fix problems in them), @solarmystic .

This makes it possible to override it easier, rather than needing it in
PSP RAM.
Needed to properly consider render to self (should rebind if the
framebuffer is later used with a separate target.)
This fixes hrydgard#2917.

I verified that, at least for my card, blitting is much faster than
glReadPixels (by quite a margin.)
@brujo5
Copy link

brujo5 commented Jan 20, 2014

any screen for dbz vs tag fixed?

thx

@unknownbrackets
Copy link
Collaborator Author

This does not fix that game. It needs the clut translation.

-[Unknown]

@brujo5
Copy link

brujo5 commented Jan 20, 2014

a ok.

thx

@solarmystic
Copy link
Contributor

@hrydgard

After some preliminary testing, I am inclined to agree with you. This commit doesn't seem to have the wide ranging performance degradation effects seen in the previous ones. By all means, push it through.

I'm still not done with every game in the list, but If any obvious regressions (read:- most likely when unthrottled) are spotted later on, I'll just report them, as usual.

hrydgard added a commit that referenced this pull request Jan 20, 2014
Duplicate framebuffers that are rendered onto themselves
@hrydgard hrydgard merged commit b15ead5 into hrydgard:master Jan 20, 2014
@unknownbrackets
Copy link
Collaborator Author

Heh, I'd like to think most of my changes improve performance or keep it the same. That one change I knew would be which is why I excluded it for mobile, but I really wasn't sure how many games it could be affecting.

-[Unknown]

@solarmystic
Copy link
Contributor

@unknownbrackets

Brave story is absolutely gorgeous now, that glitch triangle that used to appear ingame (not just in the opening cutscene) is gone, and the screen is very, very clean now, no matter the rendering resolution set. It was really visually marred before this.

Before (glitch triangle enclosed in the red square):-
screen00336

After:-
screen00337

I'd say this game really benefitted the most from this set of commits.

@unknownbrackets
Copy link
Collaborator Author

Yep, it was my test game for this problem. I think it was not even doing a clear (like 3rd Birthday was) so it was a lot worse.

-[Unknown]

@unknownbrackets unknownbrackets deleted the texcache branch January 21, 2014 05:53
@unknownbrackets
Copy link
Collaborator Author

@Bigpet to better explain the CLUT thing:

Normally, games place textures in RAM or VRAM, outside of where they actually draw with the GE (you know, polygons and stuff.) This works great and is fully supported. They can use CLUTs with an indexed texture, which also works awesome.

However, sometimes, games do something a bit more interesting. They render (using polygons) to an area of memory (a framebuffer in OpenGL), and then use THAT as a texture. For example, let's say that you're playing a sports game and have your name on your jersey. It might render the letters to a texture (curved using the GE or etc.), and then use that as a texture when drawing your character.

Taking that a step further, some games draw a scene, and then draw that texture on top of itself. For example, render a the scene, and then draw it blended onto itself with lighting and alpha test to brighten some areas up. OpenGL doesn't actually allow you to use a framebuffer as both the source and target of a render - it gets all weird if you do that. That's what this pull request fixed.

But - some games do something weird. Sometimes even while rendering a texture on top of itself (but not always, sometimes just using a separate target.) They use a CLUT. For example, perhaps to convert to grayscale, or who knows. Maybe just to isolate certain colors. But they take this texture and render it somewhere translating the color values using a CLUT.

Because the OpenGL framebuffer isn't actually in memory, we can't just translate the colors the same way we do for regular, non rendered textures (which is the VAST majority of textures which use a CLUT.) The pixel data is "inaccessible" to us without downloading it.

The options basically boil down to:

  • Download the framebuffer (glReadPixels, slow), rescale it to 1x resolution (slow), and set the flip flag (hopefully it doesn't ALSO need it swizzled or something crazy.) Then process as a regular texture.
  • Use a temp FBO to rescale down to 1x resolution, then download the framebuffer (glReadPixels, slow), and otherwise do as above. This is kinda what "Read framebuffers to memory" already does, but less destructive (and possibly slower, but at least faster than the above option.)
  • Use a temp FBO with the CLUT bound as a special texture, and a custom shader to do lookups in the texture and translate each pixel (this may only work if the game doesn't use CLUT masks and stuff, which it may for this particular use case...) This can retain the higher render resolution, although doesn't have to.
  • Alter our existing shader to do this translation (but then you run the risk of getting inaccurate lookups because of downsampling / bilinear filtering / etc.)

I expect the third option to be fast and safe, although I haven't actually checked yet if games are not using special CLUT masks and such that I think would eliminate it as an option, at least for mobile.

-[Unknown]

@hrydgard
Copy link
Owner

In option number three, if games are using clut masks and shift I would expect that they are using them to extract either the red, green or blue channel, in which case we can recognize and emulate it okay even on OpenGL ES without too much trouble. Of course, if games do weirder things we may be out of luck.

@daniel229
Copy link
Collaborator

slow down in killzone
01

02

@daniel229
Copy link
Collaborator

GPU hungry
0.96-531
50 FPS 93% GPU usage
03

0.96-525
92fps 33% GPU usage
04

@unknownbrackets
Copy link
Collaborator Author

It must be doing it then. How does it look with buffered rendering off?

Seems like it's pretty slow anyway. Is it flushing a lot (debug statistics)?

-[Unknown]

@unknownbrackets
Copy link
Collaborator Author

You know: I had a thought. Maybe this is safe/fine if alpha blending is disabled? I've only seen it cause problems when blending was enabled, I think. Then again, it's probably always a bad idea to render to the texture you've bound...

-[Unknown]

@daniel229
Copy link
Collaborator

With buffered rendering off, it does not display anything, if chang buffered rendering off in game it would crash.
05

debug statistics
06

@daniel229
Copy link
Collaborator

How to make alpha blending off?

@hrydgard
Copy link
Owner

@unknownbrackets , rendering to the same texture you are reading from is not defined to ever be completely safe in OpenGL. In practice it tends to be safe-ish on SOME gpus (nVidia and ATI, at least) in these cases:

  1. You are rendering to a subrectangle of the texture that is entirely disjoint from the region you are reading texels from.
  2. You are rendering with a 1:1 texel to pixel mapping, your shader only reads from the one texel you are writing to with nearest filtering, and your triangle strip or whatever is not self-intersecting (does not touch the same pixel twice)

Case 1) is likely to be safe on most GPUs for somewhat-obvious reasons but case 2) is dicey. Pretty sure it will not work on PowerVR due to the tile based rendering.

I don't think alpha blending has much to do with it.

@unknownbrackets
Copy link
Collaborator Author

Okay.

Anyway, 3rd Birthday uses:
CLUT16 / 4444 masked 0x0f (so even component)
CLUT32 / 4444 masked 0x0f with shift 12 (so even component)

Dragon Ball Z Tag Team uses:
CLUT32 / 5650 masked 0x7f with shift 24 (so even component)
CLUT16 / 5650 masked 0xff with shift 5 (probably even component)
CLUT16 / 8888 masked 0x01 (uh oh... this is what makes the screen dark...)
CLUT16 / 8888 masked 0x01 with shift 3 (uh oh? makes the other side dark...)

-[Unknown]

@hrydgard
Copy link
Owner

Those even components with appropriate masks will be easy at least : just generate .r or .a in the shader etc.

Anyway, even without integer support, masks like 0x01 or 0x03 etc can be approximated (with enough accuracy, I think) by multiplying, frac() and dividing back down. Likewise, masks like 0xF0 could be implemented by multiplying, subtract frac(value), and dividing down.

Mask like 0x5 (0101) are a bit more tricky to do without proper bitmasking though.

@unknownbrackets
Copy link
Collaborator Author

Yeah, I'm sure we can find a way to approximate most of it, but definitely seems like a pain...

-[Unknown]

@hrydgard
Copy link
Owner

Don't really see a way around it, so if it's pain it takes, well, that's what it'll take....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants