Use rtree for large polygons #323
Timings for Great Britain:
So compared to #314 (comment), slightly smaller memory usage than master (which is what you'd hope, given that the biggest differences will be with polar regions etc.), but surprisingly quite a lot faster! |
Hmm, only slightly smaller. It really depends on the area: on the Geofabrik Africa extract, I see about 3G/4G lower memory consumption on my machine, so a huge difference. I wonder if we can also optimize how linestrings are stored. |
I wouldn't expect GB to be much smaller - there aren't many big polygons (apart from the coastline). Africa sounds like a big improvement. One slightly odd rendering issue - this is a GB extract so I wouldn't necessarily expect Ireland to be rendered, but at some zoom levels, the tiles chosen for rendering are a bit weird: |
Yes, that is because the rtree polygons are no longer included when calculating which tiles to render. Maybe the bounds of the rtree should be included. |
Have another try, the tiles are calculated from the clipping box now. |
Clipping box looks good! Similar memory usage to last time, time up slightly (but then it's generating more tiles so I'd expect that):
|
The memory usage should not go up; the additional tiles generated now are mostly empty. Maybe the threshold for deciding when to store an object in the rtree can be improved. |
Have another try after optimizing things a bit :) |
Very similar! Happy to run it on a different extract if you think it might be interesting.
|
I guess it is difficult to compare now, because both the number of tiles has increased and the rtree is having an effect. |
The "file system outputs" figure has increased significantly. What does this number mean? |
It's apparently "the total number of bytes written / 512". I'd expect it to be a bit greater given that the bbox is larger. (Not sure why "file system inputs" was 0 for Currently running this branch on planet.osm.pbf - should have a result/timing some time tomorrow. |
My attempt to run this on the planet was much slower than before (after 37 hours it was still on z11), but this might be because I was using the full extent of the planet, whereas previously I was using a bbox that excluded the polar regions. I'll try with some different areas and bboxes. |
On my Netherlands extract, not much difference between the two approaches:
Filled the tileset with good things at netherlands.mbtiles
real 5m34.782s
RTREE:
Filled the tileset with good things at netherlands.mbtiles
real 5m40.357s |
I'm currently trying this again on the full planet with |
After 37 hours (the time it took to generate the planet in #315) it was still going very slowly through z12 so I killed it. I'm now running it with the rtree used for shapefiles but not for OSM-derived objects, to see if this makes a difference. Peak memory consumption is 115GB which is a handy improvement over #315. |
Yes, I think we really need to test which configuration reduces memory in all cases. I do not fully understand under what conditions the rtree reduces overall memory consumption. Maybe using the rtree only for shapefiles is a better option. The existing tile_index approach already seems quite efficient, so it is difficult to improve on. |
I think the rule of thumb is that:
So the rtree makes particular sense for coastline polygons which are very big and can cover many tiles, particularly approaching the poles, where the spherical Mercator projection tends to stretch the y axis => more tiles. The coastline shapefile can potentially take up 26GB just in OutputObjectRefs, so this is a big deal. There is a performance issue with the rtree approach. Even when restricting it to shapefiles only, certain areas are much slower to write than before, especially at z14, whereas others are still fast. I haven't figured out why yet. One possible reason:
Consequently we're processing a big complex multipolygon for many tiles where we weren't doing so before. If this is the case then there might be a few ways round it - e.g. storing a compressed list of affected tiles with each rtree entry (e.g. "row 0 has tiles in columns 163-189, row 1 has tiles in columns 164-188, row 2 has..." etc.), or subdividing the bboxes into multiple smaller bboxes each with their own rtree entry. |
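To make the "compressed list of affected tiles" idea concrete, here is a minimal sketch of per-row column spans. It is only an illustration, not code from this branch: `compress_rows` and `covers` are hypothetical names, and the input is assumed to be a set of `(x, y)` tile coordinates at a single zoom level.

```python
from collections import defaultdict

def compress_rows(tiles):
    """Collapse (x, y) tiles into inclusive column spans per row.

    Returns {row: [(first_col, last_col), ...]}.
    Hypothetical helper, not part of tilemaker.
    """
    cols_by_row = defaultdict(list)
    for x, y in tiles:
        cols_by_row[y].append(x)

    spans = {}
    for y, cols in cols_by_row.items():
        cols.sort()
        row_spans = []
        start = prev = cols[0]
        for x in cols[1:]:
            if x == prev:          # ignore duplicate tiles
                continue
            if x != prev + 1:      # gap in the row: close the current span
                row_spans.append((start, prev))
                start = x
            prev = x
        row_spans.append((start, prev))
        spans[y] = row_spans
    return spans

def covers(spans, x, y):
    """True if tile (x, y) falls inside one of the stored spans."""
    return any(lo <= x <= hi for lo, hi in spans.get(y, ()))

# Mirrors the example in the comment above: row 0 has columns 163-189,
# row 1 has columns 164-188.
tiles = {(x, 0) for x in range(163, 190)} | {(x, 1) for x in range(164, 189)}
spans = compress_rows(tiles)
print(spans)
print(covers(spans, 163, 0), covers(spans, 163, 1))
```

Storing spans rather than individual tiles keeps the per-entry cost proportional to the number of rows (plus gaps) rather than to the number of tiles, which is where the saving for very large polygons would come from.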
Maybe it is an idea to define the tileindex at multiple zoom levels. I think with only a limited number of zoom levels (3 or 4), you can get a reduction in the memory usage of the tileindex. |
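One way to read the multi-zoom suggestion is sketched below. This is an assumption about how such an index could look, not a description of tilemaker's tile_index: `MultiZoomIndex`, the zoom levels `(6, 10, 14)` and the `max_entries` cut-off are all hypothetical. Each object is registered at the finest zoom where its tile range needs only a few entries, and a lookup for a base-zoom tile checks that tile's ancestor at every indexed zoom.

```python
class MultiZoomIndex:
    """Tile index kept at a few zoom levels (illustrative sketch only)."""

    def __init__(self, base_zoom=14, zooms=(6, 10, 14), max_entries=4):
        assert max(zooms) == base_zoom, "finest indexed zoom must be base_zoom"
        self.base_zoom = base_zoom
        self.zooms = sorted(zooms, reverse=True)   # finest first
        self.max_entries = max_entries
        self.index = {z: {} for z in self.zooms}

    def add(self, obj_id, x0, y0, x1, y1):
        """Register an object by its inclusive tile range at base zoom.

        Returns the zoom level the object was stored at.
        """
        for z in self.zooms:
            shift = self.base_zoom - z
            cx0, cy0 = x0 >> shift, y0 >> shift
            cx1, cy1 = x1 >> shift, y1 >> shift
            count = (cx1 - cx0 + 1) * (cy1 - cy0 + 1)
            # Use this zoom if it is cheap enough, or if it is the coarsest.
            if count <= self.max_entries or z == self.zooms[-1]:
                for cx in range(cx0, cx1 + 1):
                    for cy in range(cy0, cy1 + 1):
                        self.index[z].setdefault((cx, cy), []).append(obj_id)
                return z

    def query(self, x, y):
        """Candidate objects for base-zoom tile (x, y)."""
        found = []
        for z in self.zooms:
            shift = self.base_zoom - z
            found.extend(self.index[z].get((x >> shift, y >> shift), []))
        return found

idx = MultiZoomIndex()
idx.add("small", 100, 100, 101, 101)   # 4 tiles at z14, stored at z14
idx.add("big", 0, 0, 511, 255)         # 131072 tiles at z14, 2 entries at z6
print(idx.query(100, 100))
```

The trade-off is that entries stored at a coarse zoom are only candidates: a coarse tile can cover base-zoom tiles the object does not actually touch, so a finer check is still needed before rendering.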
Yes, that could work! I did also wonder whether adding an explicit ::intersects check (maybe only for shapefile/rtree-derived objects?) might help with the multipolygon issue above. Results from
|
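The explicit intersects check suggested above amounts to a two-stage filter: a cheap bounding-box test (what an rtree lookup gives you), then an exact geometry test before the expensive clipping work. The sketch below shows the idea in plain Python for a simple polygon without holes. All function names are hypothetical, and the exact test is a stand-in for whatever geometry library call the real code would use.

```python
def _orient(a, b, c):
    """Cross product sign: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _on_segment(a, b, c):
    """Assuming c is collinear with a-b, is it within the segment's extent?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_intersect(p1, p2, q1, q2):
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and _on_segment(q1, q2, p1))
            or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1))
            or (d4 == 0 and _on_segment(p1, p2, q2)))

def point_in_polygon(pt, ring):
    """Ray-casting test for a simple ring given as a list of (x, y)."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def bbox_overlaps(ring, box):
    """Stage 1: what a bounding-box index alone can tell us."""
    xs, ys = [p[0] for p in ring], [p[1] for p in ring]
    minx, miny, maxx, maxy = box
    return min(xs) <= maxx and max(xs) >= minx and min(ys) <= maxy and max(ys) >= miny

def polygon_intersects_box(ring, box):
    """Stage 2: exact test between a simple polygon and a tile box."""
    minx, miny, maxx, maxy = box
    corners = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy)]
    if any(minx <= x <= maxx and miny <= y <= maxy for x, y in ring):
        return True
    if any(point_in_polygon(c, ring) for c in corners):
        return True
    n = len(ring)
    return any(segments_intersect(ring[i], ring[(i + 1) % n],
                                  corners[j], corners[(j + 1) % 4])
               for i in range(n) for j in range(4))

# L-shaped polygon: its bounding box covers (5,5)-(8,8), the polygon does not.
l_shape = [(0, 0), (10, 0), (10, 2), (2, 2), (2, 10), (0, 10)]
tile = (5, 5, 8, 8)
print(bbox_overlaps(l_shape, tile), polygon_intersects_box(l_shape, tile))
```

For a coastline multipolygon whose bounding box spans many tiles, the second stage is what would let empty tiles be skipped early, at the cost of one extra geometry test per candidate.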
Reworked and merged as #479. |
Use rtree only for polygons that cover large areas.
I actually got Africa converted with this on my 8G machine; it shows quite a significant saving in memory. It stores a polygon in the rtree if the polygon covers more than 2 tiles at basezoom. It can also store linestrings, but I saw that this hurts the performance of the conversion.
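As a rough illustration of the "more than 2 tiles at basezoom" rule, the sketch below counts the tiles covered by a polygon's bounding box using the standard slippy-map tile formula. It rests on assumptions: the function names are hypothetical, a basezoom of 14 is used only as an example value, and coverage is measured from the bounding box, which may differ from how this PR measures it.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Standard spherical Mercator (slippy map) tile for a lon/lat in degrees."""
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or latitudes beyond the Mercator limit stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_covered(bbox, zoom):
    """Number of tiles touched by bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x0, y1 = lonlat_to_tile(min_lon, min_lat, zoom)   # south edge has the larger y
    x1, y0 = lonlat_to_tile(max_lon, max_lat, zoom)
    return (x1 - x0 + 1) * (y1 - y0 + 1)

def use_rtree(bbox, base_zoom=14, max_tiles=2):
    """Hypothetical threshold: large objects go to the rtree."""
    return tiles_covered(bbox, base_zoom) > max_tiles

print(use_rtree((0.001, 0.001, 0.002, 0.002)))   # building-sized bbox
print(use_rtree((0.0, 0.0, 1.0, 1.0)))           # one-degree square
# The same one-degree square covers more tiles at high latitude:
print(tiles_covered((0, 0, 1, 1), 14), tiles_covered((0, 80, 1, 81), 14))
```

The last line shows why polar regions benefit most: the Mercator projection stretches the y axis, so the same extent in degrees spans more tile rows near the poles.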