-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear external tileset skeletons from tile tree to save memory usage #1107
Conversation
…external-tilesets-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @azrogers, this is so good! I tried flying around with Google Photorealistic 3D Tiles in Cesium for Unreal, and memory usage stays extremely steady. Previously I believe it would have gone up quite quickly. I didn't see any crashes or other dodgy behavior either. This will be a major improvement for our users!
Mostly small comments here, but I did notice a couple of cases where I think there's potential for (rare) bad behavior.
I think it's also worth taking a bit of time to think through whether there could be any other gotchas like this, and generally do everything we can to thoroughly test everything. Perhaps bring back the soak test from #1415?
…external-tilesets-2
…um-native into unload-external-tilesets-2
@kring Looks like reversing the direction of iteration, as well as unloading empty tiles, worked great! I'll take a look at that soak test and think of some other ways we could verify correct behavior here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more small things here. Also please update CHANGES.md.
Bringing the soak test up-to-date (now in CesiumGS/cesium-unreal#1615 for reasons detailed in that PR) did identify some crashes, including issues with |
I realize now that |
It's now tracking most of the tile pointer usages in |
Ok, I believe I've finally solved it! Turns out, there's two reasons that we do in fact need to remove children from
I ran some quick comparisons with the tile loading soak test (CesiumGS/cesium-unreal#1615) to show that this does indeed clean up external tilesets: |
For my own reference, I created an inventory of how the two counts in a _doNotUnloadCount
Summary: _tilesStillNotUnloadedCount
Summary: |
Based on the above, I think it's possible to collapse this back down to a single count, Also, let's rename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good! A few suggestions that might help clean things up a bit.
Cesium3DTilesSelection/src/Tile.cpp
Outdated
@@ -122,9 +163,19 @@ void Tile::createChildTiles(std::vector<Tile>&& children) { | |||
throw std::runtime_error("Children already created."); | |||
} | |||
|
|||
int32_t prevLoadedContentsCount = this->_tilesStillNotUnloadedCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's possible for any of these children
to have loaded content at this point. I can't think of how they would, at least. Am I mistaken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out it is possible - in TilesetJsonLoader::createLoader
when we call createChildTiles
with the root tile from parseTilesetJson
. When parseTilesetJson
is called with an implicit tileset, that root tile is created with external content which gives it the ContentLoaded
state. Checking in createChildTiles
if any children have the ContentLoaded
state and ticking up the _doNotUnloadSubtreeCount
for each one solves the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting! Implicit tiling has been a bit of a blind spot for me in reviewing this so far. We should definitely make sure implicit tilesets are still working well in this PR, if you haven't already.
It doesn't need to be part of this PR, but we should eventually support unloading implicit tilesets, too. The rules there are a bit different, though. With an external tileset, once we're sure the entire thing is unused, we can unload all of it. It's all or nothing. However, with implicit tiling, we can recreate explicit tiles for any part of the implicit tree, so it's valid to unload any unused subtree.
Removing |
@kring Successfully got the fix for empty tile handling implemented. I believe that's everything! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking better and better, but I still have a few comments. Thanks for your patience and ongoing work on this @azrogers!
if (tile.getState() == TileLoadState::ContentLoaded) { | ||
++this->_doNotUnloadSubtreeCount; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still to handle the root tile of implicit tilesets, which are constructed with TileExternalContent
, right?
If so, I don't have high confidence it will work correctly in all cases. For one thing, an implicit root need not be at the root tile of the tileset.json (though it often is). Second, this code is only incrementing the count on this tile, not propagating it up the tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Messed around with the behavior here. We're now adding the _doNotUnloadSubtreeCount
of the children to the current tile's count. I believe this is OK as there shouldn't be any other pointers to the child tiles yet if they're being passed to this method, so the count should just represent whether any of the children are loaded. I've also changed it to propagate the counter up through the tree. So I believe the way it's written now, an implicit root buried in the tree of a tileset.json would load like:
parseTileJsonRecursively
returns the implicit root tile with the external content and theContentLoaded
state.- The parent tile in the tree passes that implicit root to
createChildTiles
, which updates the parent tile's_doNotUnloadSubtreeCount
as the implicit root tile isContentLoaded
. - The parent tile's parent tile passes the parent tile to
createChildTiles
, which adds the parent tile's_doNotUnloadSubtreeCount
to its own counter. - Repeat up the tree until we reach the root.
I think this covers all our bases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost! I tried out the S2 globe, and saw a crash (_doNotUnloadSubtreeCount
assertion failure) when entering Play in Editor mode and then exiting it. The problem was here:
return tile; |
On that line, thanks to your changes to createChildTiles
, the _doNotUnloadSubtreeCount
of tile
is 6, as it should be. But then it's returning that tile
instance from a function whose return type is std::optional<Tile>
. If it were just Tile
, we'd get named return value optimization, and there would be no issue. But since it's a std::optional
, the Tile
gets moved into the optional. And remember how, at my suggestion, the move constructor doesn't copy _doNotUnloadSubtreeCount
? So the function ends up returning a Tile with a _doNotUnloadSubtreeCount
of zero! 😱
My suggestion was reasonable (I think) when the _doNotUnloadSubtreeCount
was only used to track pointers to the Tile
instance. But now it has that dual purpose where it also tracks loaded content in this Tile and its subtree. That's what the 6
represents in this case, and it's essential that those counts move into the new Tile
, or else we'll have a bug.
With the count having a dual purpose, there's no way to tell during the move operation which counts should move and which shouldn't. So one solution is to go back to two different counts. But as a practical matter, we take great pains to avoid moving out of Tile instances with outstanding pointers to them, because this is the sort of thing that would quickly lead to bugs. So while it feels a little bit wrong, I think we can safely say that any Tile that is the source of a move only has counts as a result of content, not pointers. And thus we should move them all.
So I made that change, and it fixed the S2 bug.
I'm going to make sure CI is happy and do a bit more testing, and then merge this!
Found another bug. This one might be tricky to reproduce, but fortunately I was running in the debugger at the time and the cause is clear enough. It was triggered by a strange connection error (not sure of the cause) and generated this log before the crash:
The assertion is in I believe the problem is in This is not a new problem, but previously the tile would have just gotten stuck in that ContentLoading state with no other major ill effects. Now it causes an assertion failure. |
The most common reason this may happen is on a connection failure.
I was actually able to reproduce the assertion failure quite easily by disconnecting my network temporarily, and submitted a fix. |
Thanks for the major effort here, @azrogers! This is going to be a wonderful improvement for our users. Merging! |
Closes #739. Currently, though the content of tiles are unloaded when no longer used, the "skeletons" - the
Tile
objects - created by loading external tilesets are never unloaded. This can cause memory usage to steadily increase. This change implements a_doNotUnloadCount
number on each tile that tracks situations where the tile's pointers are still in use. When a tile is in a situation where its pointer is being used - the tile is being loaded, for example - it increments this count on the tile and each of its parent tiles, and when the pointer is no longer needed, this counter is decremented up the tree as well. This means we can clear the children of external tilesets when their_doNotUnloadCount
number is 0. This implementation also includes aTileDoNotUnloadCountTracker
class, enabled with theCESIUM_DEBUG_TILE_UNLOADING
switch, that tracks the source of every modification to a tile's_doNotUnloadCount
.