
faster Intersects queries #765

Merged · 6 commits into systemed:master · Oct 3, 2024
Conversation

cldellow (Contributor) commented Oct 1, 2024

This improves on previous work to short-circuit Intersects queries in the negative case, so as to avoid the expensive call into Boost's R-Tree. (Introduced in wildly-fast-and-wildly-broken form in PR #635, then fixed in #641.)

For one of my use cases, this decreases the build time from 7m47s to 1m30s, while also decreasing memory use.

There are 3 main changes:

  1. The old code was very naive. When loading shapes from a shapefile or GeoJSON file, it used a bounding box on the shape, and marked every z14 tile contained therein as possibly containing an intersecting shape.

The bounding box approach is a good approximation for many shapes but breaks down for large, irregular shapes that span many z14 tiles, like https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197

Instead, we now use 2 bits for each z14 tile. The first bit indicates whether the tile might contain an intersecting shape. The second bit indicates whether it definitely contains an intersecting shape. The second bit is lazily initialized with the more expensive intersects query only when that tile is queried.

This provides a big speedup, but at the cost of 2x the memory usage.
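The two-bit scheme described above can be sketched as follows. This is a minimal illustration, not tilemaker's actual code: the class and method names are hypothetical, and `exactTest` stands in for the expensive R-Tree intersects query.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch of the 2-bit-per-tile index.
// Bit "maybe":    set at load time if the tile falls inside a shape's bounding box.
// Bit "definite": resolved lazily, on first query, with an exact intersects test.
class SpatialIndexSketch {
public:
    explicit SpatialIndexSketch(size_t tiles)
        : maybe(tiles, false), definite(tiles, false) {}

    // Called while loading shapes: cheap bounding-box approximation.
    void markBoundingBox(size_t tile) { maybe[tile] = true; }

    // Called at Intersects time. exactTest stands in for the R-Tree query.
    bool intersects(size_t tile, const std::function<bool(size_t)>& exactTest) {
        if (definite[tile]) return true;   // already confirmed
        if (!maybe[tile]) return false;    // fast negative path, no R-Tree call
        // Lazy refinement: run the expensive test once, then cache the answer
        // by either promoting to "definite" or clearing "maybe".
        bool hit = exactTest(tile);
        if (hit) definite[tile] = true;
        else     maybe[tile] = false;
        return hit;
    }

private:
    std::vector<bool> maybe, definite;
};
```

After the first query of a tile, every later query is answered from the two bits alone, so the exact test runs at most once per queried tile.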

  2. Previously, the old code indexed at indexZoom, which is clamped to z14. The code now uses spatialIndexZoom, which is set to 15. indexZoom also controls other indexes, where the marginal decrease in runtime doesn't scale 1:1 with the increased memory usage. For this index, though, the extra 2x memory cost is repaid with a 50% decrease in runtime, which seems worth it.

  3. To mitigate the 4x memory increase (from 32MB to 128MB per indexed layer), the index is now sparse. We previously tracked a bitset for the entire globe. Now, we track the index as a set of bitsets, one for each z6 tile. If no z15 tiles in a given z6 tile need to be indexed, we don't allocate a bitset. If you're doing a small geographical extract, you'll benefit from this. Even if you're doing the globe, you'll probably benefit, since your shapefiles probably don't cover the oceans and Antarctica.

This gives up a bit of the runtime benefit, but decreases the memory usage.

Some future improvements (not urgent, just capturing them before I forget):

  1. Reduce memory: instead of allocating 2 bits per z15 tile, we could do 1 bit per z15 tile to track intersections, and, say, 1 bit per z13 tile to track whether we've done the lazy initialization. Then lazily init all the z15 tiles in the containing z13 tile on first access.

  2. With some bookkeeping, I think we could speed up the special case where there's a positive match of exactly 1 shape, if that 1 shape spans the entire z15 tile. This definitely occurs often for my use case of national parks - probably it occurs for other use cases like political boundaries.

  3. AreaIntersecting and CoveredBy could re-use these indexes to speed up their negative cases. And if (2) is done, they could re-use that work to speed up the positive case in the special case. I don't use these yet, but I might start soon.

In systemed#635, we introduced a way
to avoid calling Boost's R-Tree for `Intersects` when we knew the
underlying z14 tile had no shapes in it. This was a perf win (although
not as big as was claimed in that PR -- there was a bug, fixed in
systemed#641, which was the source
of much of the speedup).

This improves on that work for the case of large, irregularly shaped
polygons.

Currently, for each shape in the shapefile/GeoJSON file, we compute
the set of z14 tiles in its bounding box, and set a bit in a bitset
for each of those tiles. This is fast to compute, but lossy.

For example, the bounding box of
https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197
covers much of the city of Boulder, even though the park itself is
entirely outside of Boulder.

Instead, this PR now uses 2 bits per z14 tile. The loading of shapes
is as before: the first bit is set to 1 if the z14 tile is contained in
a bounding box of a shape. The second bit is initialized to 0.

At `Intersects` time, we'll lazily refine the index, but only for those
tiles that are actually being queried. We'll actually check if the z14
tile intersects any of the shapes, and then either set the second bit to
1, or clear the first bit.

This doubles the memory requirement for indexing from 32MB to 64MB
per layer.

I've left some counters in - I'm going to fiddle with some other ideas
to see if there are other speedups. I'll remove the counters before
making a PR.

Related: the code previously treated the lats as non-projected values.
This was wrong, but didn't affect correctness, since both the indexing
and querying code made the same error. This commit changes it to
correctly use latp.
indexZoom gets clamped to z14 because we ultimately end up storing
things relative to z6, and 14 - 6 = 8: going over that makes a bunch
of data structures use 2x the memory, for only a marginal speed bump.

Here, storing the bitset at z15 rather than z14 also takes 2x the
memory, but it provides a 50% decrease in runtime in my use case.
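The zoom arithmetic can be made concrete (illustrative only): a tile coordinate at zoom z, taken relative to its containing z6 tile, needs z - 6 bits per axis, so z14 fits exactly in a byte while z15 needs 9 bits and doubles the per-axis tile count.

```cpp
#include <cassert>
#include <cstdint>

// Bits per axis needed to address a zoom-z tile relative to its z6 parent.
constexpr uint32_t relativeBits(uint32_t zoom) { return zoom - 6; }

// Number of zoom-z tiles per axis inside one z6 tile.
constexpr uint32_t tilesPerAxisWithinZ6(uint32_t zoom) {
    return 1u << relativeBits(zoom);
}
```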

I'm going to see if I can claw back some memory by making the bitset
sparse -- e.g. if people are loading political boundaries like cities,
counties, countries, parks, there's no need for us to pay a memory
cost for the ocean and Antarctica.
We previously had a dense bitset that contained all the z15 tiles.

Now, we split the z15 tiles up, grouped by z6 tile. We only allocate a
dense bitset for the z6 region if there is at least one z15 tile in the
region.

This is slightly slower at runtime due to the extra indirection, but
the memory saving is likely worth it.
systemed merged commit e42aaa7 into systemed:master on Oct 3, 2024 · 7 checks passed
systemed (Owner) commented Oct 3, 2024

Very neat! Thank you.

(I idly wonder if there's any mileage in using this approach elsewhere – the #323 (comment) scenario. But we might have squeezed all the gains out of that already.)

cldellow (Contributor, Author) commented Oct 3, 2024

True, I think this approach could have worked well in lieu of #607. I suspect we've got all the gains there - IIRC, it's no longer showing up in the CPU profiles, whereas before it was really obvious.
