
faster Intersects queries #765

Merged · 6 commits into systemed:master · Oct 3, 2024
Conversation

cldellow (Contributor) commented Oct 1, 2024

This improves on previous work to short-circuit Intersects queries in the negative case, so as to avoid the expensive call into Boost's R-Tree. (Introduced in wildly-fast-and-wildly-broken form in PR #635, then fixed in #641.)

For one of my use cases, this decreases the build time from 7m47s to 1m30s, while also decreasing memory use.

There are 3 main changes:

  1. The old code was very naive. When loading shapes from a shapefile or GeoJSON file, it used a bounding box on the shape, and marked every z14 tile contained therein as possibly containing an intersecting shape.

The bounding box approach is a good approximation for many shapes but breaks down for large, irregular shapes that span many z14 tiles, like https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197

Instead, we now use 2 bits for each z14 tile. The first bit indicates whether the tile might contain an intersecting shape. The second bit indicates whether it definitely contains an intersecting shape. The second bit is lazily initialized with the more expensive intersects query only when that tile is queried.

This provides a big speedup, but at the cost of 2x the memory usage.
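The two-bit scheme described above can be sketched as follows. This is a minimal illustration, not tilemaker's actual code: the class and method names are hypothetical, and `exactTest` stands in for the expensive R-Tree intersects query.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch of the 2-bit-per-tile index.
// Bit "maybe":    set at load time if the tile falls inside a shape's bounding box.
// Bit "definite": resolved lazily, on first query, with an exact intersects test.
class SpatialIndexSketch {
public:
    explicit SpatialIndexSketch(size_t tiles)
        : maybe(tiles, false), definite(tiles, false) {}

    // Called while loading shapes: cheap bounding-box approximation.
    void markBoundingBox(size_t tile) { maybe[tile] = true; }

    // Called at Intersects time. exactTest stands in for the R-Tree query.
    bool intersects(size_t tile, const std::function<bool(size_t)>& exactTest) {
        if (definite[tile]) return true;   // already confirmed
        if (!maybe[tile]) return false;    // fast negative path, no R-Tree call
        // Lazy refinement: run the expensive test once, then cache the answer
        // by either promoting to "definite" or clearing "maybe".
        bool hit = exactTest(tile);
        if (hit) definite[tile] = true;
        else     maybe[tile] = false;
        return hit;
    }

private:
    std::vector<bool> maybe, definite;
};
```

After the first query of a tile, every later query is answered from the two bits alone, so the exact test runs at most once per queried tile.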

  2. Previously, the old code indexed at indexZoom, which is clamped to z14. The code now uses spatialIndexZoom, which is set to 15. indexZoom also controls other indexes, where the marginal decrease in runtime doesn't scale 1:1 with the increased memory usage. For this index, though, the extra 2x memory cost is repaid with a 50% decrease in runtime, which seems worth it.

  3. To mitigate the 4x memory increase (from 32MB to 128MB per indexed layer), the index is now sparse. We previously tracked a bitset for the entire globe. Now, we track the index as a set of bitsets, one for each z6 tile. If no z15 tiles in a given z6 tile need to be indexed, we don't allocate a bitset. If you're doing a small geographical extract, you'll benefit from this. Even if you're doing the globe, you'll probably benefit, since your shapefiles probably don't cover the oceans and Antarctica.

This gives up a bit of the runtime benefit, but decreases the memory usage.

Some future improvements (not urgent, just capturing them before I forget):

  1. Reduce memory: instead of allocating 2 bits per z15 tile, we could do 1 bit per z15 tile to track intersections, and, say, 1 bit per z13 tile to track whether we've done the lazy initialization. Then lazily init all the z15 tiles in the containing z13 tile on first access.

  2. With some bookkeeping, I think we could speed up the special case where there's a positive match of exactly 1 shape, if that 1 shape spans the entire z15 tile. This definitely occurs often for my use case of national parks - probably it occurs for other use cases like political boundaries.

  3. AreaIntersecting and CoveredBy could re-use these indexes to speed up their negative cases. And if (2) is done, they could re-use that work to speed up the positive case in the special case. I don't use these yet, but I might start soon.

In systemed#635, we introduced a way
to avoid calling Boost's R-Tree for `Intersects` when we knew the
underlying z14 tile had no shapes in it. This was a perf win (although
not as big as was claimed in that PR -- there was a bug, fixed in
systemed#641, which was the source
of much of the speedup).

This improves on that work for the case of large, irregularly shaped
polygons.

Currently, for each shape in the shapefile/GeoJSON file, we compute
the set of z14 tiles in its bounding box, and set a bit in a bitset
for each of those tiles. This is fast to compute, but lossy.

For example, the bounding box of
https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197
covers much of the city of Boulder, even though the park itself is
entirely outside of Boulder.

Instead, this PR now uses 2 bits per z14 tile. The loading of shapes
is as before: the first bit is set to 1 if the z14 tile is contained in
a bounding box of a shape. The second bit is initialized to 0.

At `Intersects` time, we'll lazily refine the index, but only for those
tiles that are actually being queried. We'll actually check if the z14
tile intersects any of the shapes, and then either set the second bit to
1, or clear the first bit.

This doubles the memory requirement for indexing from 32MB to 64MB
per layer.

I've left some counters in - I'm going to fiddle with some other ideas
to see if there are other speedups. I'll remove the counters before
making a PR.

Related: the code previously treated the lats as non-projected values.
This was wrong, but didn't affect correctness, since both the indexing
and querying code made the same error. This commit changes it to
correctly use latp.
indexZoom gets clamped to z14 because we ultimately end up storing
things relative to z6, and 14 - 6 = 8: going over that makes a bunch
of data structures use 2x the memory, for only a marginal speed bump.

Here, storing the bitset at z15 rather than z14 also takes 2x the
memory, but it provides a 50% decrease in runtime in my use case.
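The zoom arithmetic can be made concrete (illustrative only): a tile coordinate at zoom z, taken relative to its containing z6 tile, needs z - 6 bits per axis, so z14 fits exactly in a byte while z15 needs 9 bits and doubles the per-axis tile count.

```cpp
#include <cassert>
#include <cstdint>

// Bits per axis needed to address a zoom-z tile relative to its z6 parent.
constexpr uint32_t relativeBits(uint32_t zoom) { return zoom - 6; }

// Number of zoom-z tiles per axis inside one z6 tile.
constexpr uint32_t tilesPerAxisWithinZ6(uint32_t zoom) {
    return 1u << relativeBits(zoom);
}
```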

I'm going to see if I can claw back some memory by making the bitset
sparse -- e.g. if people are loading political boundaries like cities,
counties, countries, parks, there's no need for us to pay a memory
cost for the ocean and Antarctica.
We previously had a dense bitset that contained all the z15 tiles.

Now, we split the z15 tiles up, grouped by z6 tile. We only allocate a
dense bitset for the z6 region if there is at least one z15 tile in the
region.

This is slightly slower at runtime due to the extra indirection, but
the memory saving is likely worth it.
systemed merged commit e42aaa7 into systemed:master on Oct 3, 2024 · 7 checks passed
systemed (Owner) commented Oct 3, 2024

Very neat! Thank you.

(I idly wonder if there's any mileage in using this approach elsewhere – the #323 (comment) scenario. But we might have squeezed all the gains out of that already.)

cldellow (Contributor, Author) commented Oct 3, 2024

True, I think this approach could have worked well in lieu of #607. I suspect we've got all the gains there - IIRC, it's no longer showing up in the CPU profiles, whereas before it was really obvious.
