-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster Intersects queries #765
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In systemed#635, we introduced a way to avoid calling Boost's R-Tree for `Intersects` when we knew the underlying z14 tile had no shapes in it. This was a perf win (although not as big as was claimed in that PR -- there was a bug, fixed in systemed#641, which was the source of much of the speedup). This improves on that work for the case of large, irregularly shaped polygons. Currently, for each shape in the shapefile/GeoJSON file, we compute the set of z14 tiles in its bounding box, and set a bit in a bitset for each of those tiles. This is fast to compute, but lossy. For example, the bounding box of https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197 covers much of the city of Boulder, even though the park itself is entirely outside of Boulder. Instead, this PR now uses 2 bits per z14 tile. The loading of shapes is as before: the first bit is set to 1 if the z14 tile is contained in a bounding box of a shape. The second bit is initialized to 0. At `Intersects` time, we'll lazily refine the index, but only for those tiles that are actually being queried. We'll actually check if the z14 tile intersects any of the shapes, and then either set the second bit to 1, or clear the first bit. This doubles the memory requirement for indexing from 32MB to 64MB per layer. I've left some counters in - I'm going to fiddle with some other ideas to see if there are other speedups. I'll remove the counters before making a PR. Related: the code previously treated the lats as non-projected values. This was wrong, but didn't affect correctness, since both the indexing and querying code made the same error. This commit changes it to correctly use latp.
indexZoom gets clamped to z14 because we ultimately end up storing things relative to z6, and 14 - 6 = 8: going over that makes a bunch of data structures use 2x the memory, for only a marginal speed bump. Here, storing the bitset at z15 rather than z14 also takes 2x the memory, but it provides a 50% decrease in runtime in my use case. I'm going to see if I can claw back some memory by making the bitset sparse -- e.g. if people are loading political boundaries like cities, counties, countries, parks, there's no need for us to pay a memory cost for the ocean and Antarctica.
We previously had a dense bitset that contained all the z15 tiles. Now, we split the z15 tiles up, grouped by z6 tile. We only allocate a dense bitset for the z6 region if there is at least one z15 tile in the region. This is slightly slower at runtime due to the extra indirections, but the memory savings is likely worth it.
Very neat! Thank you. (I idly wonder if there's any mileage in using this approach elsewhere – the #323 (comment) scenario. But we might have squeezed all the gains out of that already.) |
True, I think this approach could have worked well in lieu of #607. I suspect we've got all the gains there - IIRC, it's no longer showing up in the CPU profiles, whereas before it was really obvious. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This improves on previous work to short-circuit
Intersects
queries in the negative case, so as to avoid the expensive call into Boost's R-Tree. (Introduced in wildly-fast-and-wildly-broken form in PR #635, then fixed in #641.)For one of my use cases, this decreases the build time from 7m47s to 1m30s, while also decreasing memory use.
There are 3 main changes:
The bounding box approach is a good approximation for many shapes but breaks down for large, irregular shapes that span many z14 tiles, like https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197
Instead, we now use 2 bits for each z14 tile. The first bit indicates whether the tile might contain an intersecting shape. The second bit indicates whether it definitely contains an intersecting shape. The second bit is lazily initialized with the more expensive intersects query only when that tile is queried.
This provides a big speedup, but at the cost of 2x the memory usage.
Previously, the old code indexed at
indexZoom
, which is clamped to 14. The code now usesspatialIndexZoom
, which is set to 15.indexZoom
controls other indexes as well, where the marginal decrease in runtime doesn't scale 1:1 with increased memory usage. For this index, though, it seems like the extra 2x memory cost is repaid with a 50% decrease in runtime, which seems worth it.To mitigate the 4x memory increase (from 32MB to 128MB per indexed layer), the index is now sparse. We previously tracked a bitset for the entire globe. Now, we track the index as a set of bitsets, one for each z6 tile. If no z15 tiles in a given z6 tile need to be indexed, we don't allocate a bitset. If you're doing a small geographical extract, you'll benefit from this. Even if you're doing the globe, you'll probably benefit, since your shapefiles probably don't cover the oceans and Antarctica.
This gives up a bit of the runtime benefit, but decreases the memory usage.
Some future improvements (not urgent, just capturing them before I forget):
Reduce memory: instead of allocating 2 bits per z15 tile, we could do 1 bit per z15 tile to track intersections, and, say, 1 bit per z13 tile to track whether we've done the lazy initialization. Then lazily init all the z15 tiles in the containing z13 tile on first access.
With some bookkeeping, I think we could speed up the special case where there's a positive match of exactly 1 shape, if that 1 shape spans the entire z15 tile. This definitely occurs often for my use case of national parks - probably it occurs for other use cases like political boundaries.
AreaIntersecting and CoveredBy could re-use these indexes to speed up their negative cases. And if (2) is done, they could re-use that work to speed up the positive case in the special case. I don't use these yet, but I might start soon.