Use rtree for large polygons #323

kleunen · 2021-09-19T17:27:51Z

Use rtree only for polygons that cover large areas.

I actually got Africa converted with this on my 8 GB machine, and it shows quite a significant saving in memory. A polygon is stored in the rtree if it covers more than 2 tiles at basezoom. The rtree can also store linestrings, but I saw that this does impact the performance of the conversion.
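As a rough illustration of the threshold described above, the sketch below counts how many basezoom tiles a polygon's bounding box touches and picks a store accordingly. The function names, the use of the bounding box as a stand-in for "covers", and the basezoom of 14 are assumptions for illustration, not the code in this PR.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Standard slippy-map tile coordinates (x, y) for a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the right/bottom edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_covered(bbox, zoom):
    """Number of tiles touched by bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x0, y1 = lonlat_to_tile(min_lon, min_lat, zoom)  # south-west corner
    x1, y0 = lonlat_to_tile(max_lon, max_lat, zoom)  # north-east corner
    return (x1 - x0 + 1) * (y1 - y0 + 1)

def choose_store(bbox, basezoom=14, max_tiles=2):
    """Hypothetical rule: large objects go to the rtree, small ones to the tile index."""
    return "rtree" if tiles_covered(bbox, basezoom) > max_tiles else "tile_index"

# A rough bounding box for Africa versus a single point.
print(choose_store((-20, -35, 52, 38)))
print(choose_store((5.0, 52.0, 5.0, 52.0)))
```

A bounding-box count overestimates coverage for irregular shapes, so a real threshold might need to be tuned per dataset.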

systemed · 2021-09-20T09:00:06Z

Timings for Great Britain:

User time (seconds): 12276.29
System time (seconds): 154.09
Elapsed (wall clock) time (h:mm:ss or m:ss): 22:21.18
Maximum resident set size (kbytes): 14389332
File system inputs: 4653680
File system outputs: 2188184

So compared to #314 (comment), slightly smaller memory usage than master (which is what you'd hope, given that the biggest differences will be with polar regions etc.), but surprisingly quite a lot faster!

kleunen · 2021-09-20T09:04:20Z

Hmm, only slightly smaller. It really depends on the area: on the Geofabrik Africa extract, I see about 3 to 4 GB lower memory consumption on my machine. So a huge difference.

I wonder if we can also optimize storing the linestring.

systemed · 2021-09-20T09:06:09Z

I wouldn't expect GB to be much smaller - there aren't many big polygons (apart from the coastline). Africa sounds like a big improvement.

One slightly odd rendering issue - this is a GB extract so I wouldn't necessarily expect Ireland to be rendered, but at some zoom levels, the tiles chosen for rendering are a bit weird.

kleunen · 2021-09-20T09:07:35Z

Yes, that is because the rtree polygons are no longer included when calculating which tiles to render.

Maybe the bounds of the rtree should be included.

kleunen · 2021-09-20T19:15:07Z

Have another try: the tiles are now calculated from the clipping box.
So a clipping box now always needs to be set when converting.

systemed · 2021-09-21T10:03:30Z

Clipping box looks good!

Similar memory usage to last time, time up slightly (but then it's generating more tiles so I'd expect that):

User time (seconds): 19092.60
System time (seconds): 161.80
Elapsed (wall clock) time (h:mm:ss or m:ss): 26:22.38
Maximum resident set size (kbytes): 14428496
File system inputs: 4653360
File system outputs: 2453648

kleunen · 2021-09-21T10:06:00Z

The memory usage should not go up: the additional tiles that are generated now are mostly empty.

Maybe the threshold for when to store an object in the rtree can be improved.

kleunen · 2021-09-21T19:12:31Z

Have another try after optimizing things a bit :)

systemed · 2021-09-21T20:21:41Z

Very similar! Happy to run it on a different extract if you think it might be interesting.

User time (seconds): 19011.64
System time (seconds): 209.51
Elapsed (wall clock) time (h:mm:ss or m:ss): 26:51.59
Maximum resident set size (kbytes): 14428084
File system inputs: 4653616
File system outputs: 2458384

kleunen · 2021-09-21T20:23:56Z

I guess it is difficult to compare now, because the number of tiles has increased and the rtree has its own effect on top of that.

kleunen · 2021-09-22T06:48:11Z

The "File system outputs" figure has increased significantly. What does this number mean?

systemed · 2021-09-22T09:18:25Z

It's apparently "the total number of bytes written / 512". I'd expect it to be a bit greater given that the bbox is larger. (Not sure why "file system inputs" was 0 for master in #314 (comment) - that seems wrong given that it has to read the shapefiles and the .pbf.)
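Taking that definition at face value (the 512-byte block size is the assumption here), the reported counts convert to bytes as in the sketch below. The helper name is made up for illustration.

```python
BLOCK_SIZE = 512  # bytes per block, per the definition quoted above (assumed)

def fs_outputs_to_bytes(blocks):
    """Convert a 'File system outputs' block count to bytes."""
    return blocks * BLOCK_SIZE

first_run = fs_outputs_to_bytes(2188184)   # run before the clipping-box change
latest_run = fs_outputs_to_bytes(2458384)  # most recent run in this thread
growth = latest_run / first_run - 1.0
print(f"{first_run / 1e9:.2f} GB -> {latest_run / 1e9:.2f} GB (+{growth:.1%})")
```

On that reading, the output grew by roughly 12% between the two runs.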

Currently running this branch on planet.osm.pbf - should have a result/timing some time tomorrow.

systemed · 2021-09-23T09:16:14Z

My attempt to run this on the planet was much slower than before (after 37 hours it was still on z11), but this might be because I was using the full extent of the planet, whereas previously I was using a bbox that excluded the polar regions. I'll try with some different areas and bboxes.

kleunen · 2021-09-23T14:20:27Z

Yes. Maybe focus on the lower-left quarter of the planet, i.e. the region that includes South America and the South Pole. On my machine this converts within a couple of hours. Then compare this PR with the current master.

kleunen · 2021-09-23T17:46:31Z

On my Netherlands extract, not much difference between the two approaches:
Master:
Stored 114325890 nodes, 230175 ways, 79706 relations
Shape points: 0, lines: 0, polygons: 147
Generated points: 9776589, lines: 4456623, polygons: 12683656
Zoom level 14, writing tile 62189 of 62189
Memory used: 10231576

Filled the tileset with good things at netherlands.mbtiles

real 5m34.782s
user 59m40.483s
sys 0m53.007s

RTREE:
Stored 114325890 nodes, 230176 ways, 79706 relations
Shape points: 0, lines: 0, polygons: 147
Generated points: 9776589, lines: 4456623, polygons: 12683657
Zoom level 14, writing tile 63130 of 63130
Memory used: 10236976

Filled the tileset with good things at netherlands.mbtiles

real 5m40.357s
user 60m14.247s
sys 0m56.997s

systemed · 2021-10-22T12:24:18Z

I'm currently trying this again on the full planet with --bbox -180,-60,180,75... it's working through z11 (slowly!) at the moment.

systemed · 2021-10-23T16:36:56Z

After 37 hours (the time it took to generate the planet in #315) it was still going very slowly through z12 so I killed it.

I'm now running it with the rtree used for shapefiles but not for OSM-derived objects, to see if this makes a difference. Peak memory consumption is 115GB which is a handy improvement over #315.

kleunen · 2021-10-23T17:19:27Z

Yes, I think we really need to find out which configuration reduces memory in all cases. I do not fully understand under what conditions using the rtree reduces the overall memory consumption. Maybe using the rtree only for the shapefiles is a better option.

It seems the existing approach with the tile_index is already quite an efficient approach. So it is difficult to improve on this.

systemed · 2021-10-24T19:31:25Z

I think the rule of thumb is that:

- small objects are most efficiently stored with the tile index (i.e. one OutputObjectRef in a single tile)
- large objects are most efficiently stored with the rtree (i.e. one OutputObjectRef+bbox which covers many tiles)

So the rtree makes particular sense for coastline polygons which are very big and can cover many tiles, particularly approaching the poles, where the spherical Mercator projection tends to stretch the y axis => more tiles. The coastline shapefile can potentially take up 26GB just in OutputObjectRefs, so this is a big deal.
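To put a rough number on the Mercator stretching mentioned above, the sketch below counts how many z14 tile rows a 1-degree band of latitude spans at a few sample latitudes, using the standard slippy-map formula. The function names and the sample latitudes are chosen for illustration only.

```python
import math

def mercator_row(lat, zoom):
    """Fractional tile row (y) for a latitude, standard slippy-map formula."""
    n = 2 ** zoom
    return (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n

def rows_per_degree(lat, zoom=14):
    """Tile rows spanned by the 1-degree band [lat, lat + 1]."""
    return mercator_row(lat, zoom) - mercator_row(lat + 1.0, zoom)

for lat in (0, 60, 75, 84):
    print(lat, round(rows_per_degree(lat), 1))
```

A polygon of the same height in degrees therefore touches several times more tile rows near the poles than at the equator, which is where per-tile references become expensive.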

There is a performance issue with the rtree approach. Even when restricting it to shapefiles only, certain areas are much slower to write than before, especially at z14, whereas others are still fast. I haven't figured out why yet. One possible reason:

- With the current (tile index) approach, we only include the polygon in those z14 tiles which intersect the actual polygon.
- With an rtree, we consider it in those z14 tiles which intersect its bounding box.

Consequently we're processing a big complex multipolygon for many tiles where we weren't doing so before.

If this is the case then there might be a few ways round it - e.g. storing a compressed list of affected tiles with each rtree entry (e.g. "row 0 has tiles in columns 163-189, row 1 has tiles in columns 164-188, row 2 has..." etc.), or subdividing the bboxes into multiple smaller bboxes each with their own rtree entry.
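The "compressed list of affected tiles" idea could look roughly like the sketch below: each row keeps only runs of consecutive columns, and a tile is tested against those runs instead of against the bounding box. The names `compress_rows` and `covers` and the toy tile set are hypothetical; this is not code from the PR.

```python
from collections import defaultdict

def compress_rows(tiles):
    """Turn (x, y) tiles into {row: [(first_col, last_col), ...]}."""
    cols_by_row = defaultdict(set)
    for x, y in tiles:
        cols_by_row[y].add(x)
    spans = {}
    for y, col_set in cols_by_row.items():
        cols = sorted(col_set)
        runs = []
        start = prev = cols[0]
        for x in cols[1:]:
            if x != prev + 1:  # gap: close the current run
                runs.append((start, prev))
                start = x
            prev = x
        runs.append((start, prev))
        spans[y] = runs
    return spans

def covers(spans, x, y):
    """True if tile (x, y) falls inside one of the stored runs."""
    return any(lo <= x <= hi for lo, hi in spans.get(y, ()))

# Toy diagonal shape: its bounding box is 7x3 = 21 tiles, but it touches only 9.
tiles = {(x, y) for y in range(3) for x in range(y * 2, y * 2 + 3)}
spans = compress_rows(tiles)
print(spans)
print(covers(spans, 6, 0))  # inside the bounding box, outside the shape
```

For shapes like this one, checking the runs skips the tiles that lie inside the bounding box but outside the polygon, at the cost of a few integers per row.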

kleunen · 2021-10-25T06:04:12Z

One idea might be to define the tile index at multiple zoom levels.
When a block of 4 tiles is covered at z14, it could be stored as a single entry at z13.
When a block of 4 tiles is covered at z13, it could be stored as a single entry at z12.
...

I think with only a limited number of zoom levels (3 or 4), you can get a reduction in the memory usage of the tileindex.
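A minimal sketch of that multi-zoom idea: whenever all four children of a parent tile are present, replace them with a single entry one zoom level up, and repeat for a limited number of levels. The function name and the `(zoom, x, y)` entry format are assumptions for illustration only.

```python
def merge_quads(tiles, zoom, levels=3):
    """Collapse complete 2x2 blocks of (x, y) tiles into parent entries.

    Returns a set of (zoom, x, y) entries covering the same area.
    """
    entries = set()
    current = set(tiles)
    for _ in range(levels):
        if zoom == 0:
            break
        parents = set()
        for x, y in current:
            px, py = x // 2, y // 2
            children = {(2 * px + dx, 2 * py + dy)
                        for dx in (0, 1) for dy in (0, 1)}
            if children <= current:  # all four siblings present
                parents.add((px, py))
        # Tiles whose parent is incomplete stay at this zoom level.
        entries |= {(zoom, x, y) for x, y in current
                    if (x // 2, y // 2) not in parents}
        current = parents
        zoom -= 1
    entries |= {(zoom, x, y) for x, y in current}
    return entries

# A 4x4 block of z14 tiles plus one stray tile: 17 references become 2.
block = {(x, y) for x in range(4) for y in range(4)} | {(4, 0)}
print(sorted(merge_quads(block, 14)))
```

The saving depends on how well large objects align with the tile quadtree, so the benefit for real coastline polygons would have to be measured.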

systemed · 2021-10-25T08:35:33Z

Yes, that could work!

I did also wonder whether adding an explicit ::intersects check (maybe only for shapefile/rtree-derived objects?) might help with the multipolygon issue above.

Results from --bbox -180,-60,180,75 with the rtree restricted to shapefile data only:

- peak RAM c. 115GB (vs 131GB previously)
- execution time 49h48m (vs c. 37 hours previously)
- planet.mbtiles size 78.7GB (vs 69GB previously) - don't know what's causing this
- Fast shutdown #320 worked :)

systemed · 2023-03-25T23:18:03Z

Reworked and merged as #479.

Use rtree for large polygons

04bd130

kleunen mentioned this pull request Sep 19, 2021

Use rtree instead of tileindex to store output objects #314

Closed

Optimized rtree

f6a202d

kleunen force-pushed the rtree_2 branch from d12499f to f6a202d Compare September 21, 2021 18:59

kleunen mentioned this pull request Oct 19, 2021

Add --clip-to option to constrain output to tile boundaries #346

Open

systemed mentioned this pull request Oct 25, 2021

Initialise OutputObjects with layer minzoom #351

Merged

systemed mentioned this pull request Mar 22, 2023

Use rtree for large polygons #479

Merged

systemed closed this Mar 25, 2023

systemed mentioned this pull request Dec 5, 2023

Clipping monster polygons #606

Closed

systemed mentioned this pull request Oct 3, 2024

faster Intersects queries #765

Merged
