-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Ensure GeometryTile::pending state is only false in the last placement attempt #9842
Conversation
880072e
to
dd8c97a
Compare
I think there's a race condition here (this timeline is imagined, not taken from an actual trace):
|
I haven't found a way to reproduce the workflow mentioned by @ChrisLoer ( #9842 (comment) ) yet, but it looks possible. @kkaefer's suggestion of bookkeeping these via correlation ID sounds like a good solution - I'll add a patch for that. |
dd8c97a
to
b79022f
Compare
The correlation ID change looks good. Doesn't it also make c69fbee unnecessary? If that change is still necessary, we should update the comment in |
Not really - c69fbee prevents the first |
I didn't follow this -- how does coalescing affect the correlation IDs? Can you put together a main/worker timeline to show the condition this is trying to prevent? I would expect I just looked more closely at the |
Hmm, the worker requests for new images whenever it finishes (re)doing layout, which is only triggered by However, I've noticed we call for |
Sorry for the confusion - by coalescing I meant the two In
So even correlation for placement attempts (as I described in the previous comment) is not enough to prevent this (though it saves lots of redundant placement attempts according to my tests!). In this case, what we really need is to flag In this case, we could either enforce re-doing layout (costly), or simply do not clear the feature geometries (memory costly?). In this case, I prefer the latter. In sum: re-setting Do you agree with the proposed approach? |
bd2f954
to
b070593
Compare
Sorry, the correlation ID was meant to go to the main thread in the |
Sounds reasonable (makes By simply adding correlation from the worker thread into main when calling diff --git a/src/mbgl/tile/geometry_tile.cpp b/src/mbgl/tile/geometry_tile.cpp
index 5aa27fb..a377e95 100644
--- a/src/mbgl/tile/geometry_tile.cpp
+++ b/src/mbgl/tile/geometry_tile.cpp
@@ -175,8 +175,11 @@ void GeometryTile::onImagesAvailable(ImageMap images) {
worker.invoke(&GeometryTileWorker::onImagesAvailable, std::move(images), correlationID);
}
-void GeometryTile::getImages(ImageDependencies imageDependencies) {
- imageManager.getImages(*this, std::move(imageDependencies));
+void GeometryTile::getImages(ImageDependencies imageDependencies, uint64_t correlationID_) {
+ // Ignore `getImages` requests from previous requests.
+ if (correlationID_ == correlationID) {
+ imageManager.getImages(*this, std::move(imageDependencies));
+ }
}
void GeometryTile::upload(gl::Context& context) {
diff --git a/src/mbgl/tile/geometry_tile.hpp b/src/mbgl/tile/geometry_tile.hpp
index 2f4f68d..1da16f5 100644
--- a/src/mbgl/tile/geometry_tile.hpp
+++ b/src/mbgl/tile/geometry_tile.hpp
@@ -44,7 +44,7 @@ public:
void onImagesAvailable(ImageMap) override;
void getGlyphs(GlyphDependencies);
- void getImages(ImageDependencies);
+ void getImages(ImageDependencies, uint64_t correlationID);
void upload(gl::Context&) override;
Bucket* getBucket(const style::Layer::Impl&) const override;
diff --git a/src/mbgl/tile/geometry_tile_worker.cpp b/src/mbgl/tile/geometry_tile_worker.cpp
index 6b19920..3480c45 100644
--- a/src/mbgl/tile/geometry_tile_worker.cpp
+++ b/src/mbgl/tile/geometry_tile_worker.cpp
@@ -240,7 +240,7 @@ void GeometryTileWorker::requestNewGlyphs(const GlyphDependencies& glyphDependen
void GeometryTileWorker::requestNewImages(const ImageDependencies& imageDependencies) {
pendingImageDependencies = imageDependencies;
if (!pendingImageDependencies.empty()) {
- parent.invoke(&GeometryTile::getImages, pendingImageDependencies);
+ parent.invoke(&GeometryTile::getImages, pendingImageDependencies, correlationID);
}
} We get the following scenario when running
Because |
2d56721
to
b5a97c4
Compare
@ChrisLoer and I just had a talk about the current state of this PR, and we agreed on the following changes:
I'm going to test the proposed changes above and update this PR accordingly. |
There is one drawback for this approach, see example below:
The image correlation would need to propagate through this entire chain. This is fine, however |
b5a97c4
to
905e325
Compare
Given the issue with the special image correlation described above, I'll stick with splitting the correlation from layout and placement requests, as it solves the issues pointed out by the unit tests, doesn't impact in continuous mode operation and guarantees that |
GeometryTile::pending
is only set to false
in the last placement attempt
GeometryTile::pending
is only set to false
in the last placement attempt
More context on the correlation changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #9842 (comment):
The image correlation would need to propagate through this entire chain.
Yes, I think that's true. But it also doesn't seem to me that the layout/placement ID approach solves this problem (I know the tests pass, but I think that may just mean we should build some unit tests designed to go specifically at some of these edge cases).
This is fine, however ImageManager::notify is also fired when we set the ImageManager to be in loaded state. This doesn't happen inside the worker thread, so there is no easy way for the worker thread to keep track of image correlation requests in this case.
I don't understand why this is a problem -- in the setLoaded
function, we're still calling notify
for individual requestors, and each request would come along with a correlation ID that we could pass back to onImagesAvailable
. Each call to getImages
(with a unique ID) would still trigger exactly one call to onImagesAvailable
(with the same ID): either it would happen immediately, or it would happen later in setLoaded
Finally, we split layout and placement correlations so we can prevent cases where onImagesAvailable won't end up attempting placement (e.g. when it's called twice in API.RecycleMapUpdateImages), thus never updating the correlation check in GeometryTile::onPlacement that makes pending go to false.
This is the case I'm worried about. It seems like it's still possible with the current logic? (see my comments on GeometryTile::getImages
).
Man, this stuff is tricky to think about!
src/mbgl/tile/geometry_tile.cpp
Outdated
if (mode == MapMode::Continuous) { | ||
placementThrottler.invoke(); | ||
} else { | ||
worker.invoke(&GeometryTileWorker::setPlacementConfig, *requestedConfig, placementCorrelationID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing this with a call to invokePlacement()
here would have the same effect and make it a little more clear it was following the same path.
src/mbgl/tile/geometry_tile.cpp
Outdated
@@ -133,7 +141,9 @@ void GeometryTile::onLayout(LayoutResult result) { | |||
void GeometryTile::onPlacement(PlacementResult result) { | |||
loaded = true; | |||
renderable = true; | |||
if (result.correlationID == correlationID) { | |||
if (result.layoutCorrelationID == lastLayoutCorrelationID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's how I read this. If:
- This placement result is using the same layout as the last layout result
- The last layout result was for the most recent layout request
- This placement result is for the most recent placement request
Then mark this tile no longer pending. Is that right?
The logic before this change was, "if this placement is the first placement to happen in response to the last call to any of setData
/setLayers
/setPlacementConfig
, then mark this tile as no longer pending". In order for that logic to work, the critical assumption was that while the worker might receive onImagesAvailable
or onGlyphsAvailable
, it would only do placement if it had everything it needed to proceed to completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This placement result is using the same layout as the last layout result
The last layout result was for the most recent layout request
This placement result is for the most recent placement request
Then mark this tile no longer pending. Is that right?
💯
The logic before this change was, "if this placement is the first placement to happen in response to the last call to any of setData/setLayers/setPlacementConfig, then mark this tile as no longer pending". In order for that logic to work, the critical assumption was that while the worker might receive onImagesAvailable or onGlyphsAvailable, it would only do placement if it had everything it needed to proceed to completion.
Precisely - and as exemplified in API.RecycleMapUpdateImages
that is an assumption we cannot rely.
src/mbgl/tile/geometry_tile.cpp
Outdated
@@ -84,14 +84,18 @@ void GeometryTile::setPlacementConfig(const PlacementConfig& desiredConfig) { | |||
// state despite pending parse operations. | |||
pending = true; | |||
|
|||
++correlationID; | |||
++placementCorrelationID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand right here, incrementing the placementCorrelationID
on every call here is important even if we're going to throttle the placement request, so that the tile stays in the "pending" state until the final placement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that the tile stays in the "pending" state until the final placement?
👍
src/mbgl/tile/geometry_tile.cpp
Outdated
@@ -166,11 +176,14 @@ void GeometryTile::getGlyphs(GlyphDependencies glyphDependencies) { | |||
} | |||
|
|||
void GeometryTile::onImagesAvailable(ImageMap images) { | |||
worker.invoke(&GeometryTileWorker::onImagesAvailable, std::move(images)); | |||
worker.invoke(&GeometryTileWorker::onImagesAvailable, std::move(images), ++placementCorrelationID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the logic makes me nervous. If onImagesAvailable
doesn't cause a placement result to come back (i.e. for whatever reason symbolLayoutsNeedPreparation
is false), you'll end up with the tile stalled in the "pending" state (because symbolDependenciesChanged
won't trigger a placement).
For this to work, it has to be the case that symbolLayoutsNeedPreparation
will always be true whenever onImagesAvailable
gets called. To guarantee that, you'd need the cooperation of any code that might call onImagesAvailable
(so ImageManager
would have to be able to do something like cancel an in-progress request for a tile if the tile received a new setData
call).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right - I believe I had code on my prior attempts that would either mark symbolLayoutsNeedPreparation
to true, but now that we have a broader understanding of its implications I agree this is an error-prone approach.
src/mbgl/tile/geometry_tile.cpp
Outdated
imageManager.getImages(*this, std::move(imageDependencies)); | ||
void GeometryTile::getImages(ImageDependencies imageDependencies, uint64_t layoutCorrelationID_) { | ||
// Ignore image requests from previous layout requests. | ||
if (layoutCorrelationID == layoutCorrelationID_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like either (1) we don't need this check, or (2) this check isn't enough. The reason I say it isn't enough is that something like a setData
or setLayers
could be called after getImages
, but before the ImageManager
sent a result back via onImagesAvailable
.
On the other hand, if it's always safe to call onImagesAvailable
, we don't need the check at all (except maybe as what I think would be a tiny/edge-case performance optimization).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, if it's always safe to call onImagesAvailable, we don't need the check at all (except maybe as what I think would be a tiny/edge-case performance optimization).
Agreed - and with your insight in #9842 (review) I have to apologize as I didn't properly noticed that in the image-correlation-id-coming-from-worker-approach we would then have already populated ImageManager::requestors
with that image correlation ID info in getImages
.
…gesAvailable" This reverts commit cd8eb13.
905e325
to
d2ba845
Compare
Thank you so much for the thoughtful review @ChrisLoer ❤️ I am now confident that your image correlation ID approach is our best chance to tackle this problem without adding too much extra states. I just want to point out for the fact that with that approach, we'll have two correlation schemes - one controlled by the main thread for checking when to mark the tile as non-pending, and the other in the worker thread to check when to ignore or proceed with image request replies coming from the main thread. I'll add this information in a short comment in |
d2ba845
to
11463b7
Compare
Great! I don't see a problem with the current approach. We might be able to come up with a better name for what we're doing than I would still like to get @jfirebaugh 's eyes on the changes before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a solid approach to me. I haven't reasoned through it in detail, but I trust your joint analysis.
Is the glyph request process susceptible to a similar issue? Do we need a glyphCorrelationID
?
I think we're OK there because there aren't any runtime operations that could cause the result for a given fontstack/glyph ID to change once it was loaded the first time. |
Thank you @ChrisLoer and @jfirebaugh - and apologies for the amount of rework to find the best approach for this 🙇♂️ |
🎉 💥 |
This PR:
GeometryTilerWorker
change from [core] Ensure all image dependencies are met in GeometryTileWorker::onImagesAvailable #9739, and:ImageManager
to notify its requestors only when all image dependencies are met.This causes
API.RecycleMapUpdateImages
to pass without cd8eb13.