Performance on large files #61
Hi @Robinlovelace - thanks for the report, and the reprex. I have definitely run into issues with rmapshaper being unable to handle really large spatial objects, but in this case I had no trouble (see below). Can you output your
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_ua_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = "\\.shp$")
res = sf::st_read(f)
#> Reading layer `england_ua_caswa_2001_clipped' from data source `/private/var/folders/2w/x5wq73f93yzgm7hjr_b_54q00000gp/T/RtmpC0DMCy/england_ua_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 1061 features and 5 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: 243367 ymin: 50322 xmax: 595739.3 ymax: 537152
#> epsg (SRID): NA
#> proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#> user system elapsed
#> 12.505 1.409 14.181
object.size(res)
#> 27084984 bytes
object.size(cas2003_simple)
#> 2205128 bytes
plot(cas2003_simple[, "name"])
Also related to #59
Apologies, I provided the smaller of the 2 .shp files (not tested). Please try on this bigger one:
Ah, that is a different beast. It did complete for me, but it was definitely slow:
system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#> user system elapsed
#> 1740.608 64.060 1802.608
I'll see if I can find anything that can do better, but the bottleneck is converting to GeoJSON before sending it into the V8 context to be processed by the mapshaper JavaScript library.
Could there not be an option to do it via read/write and I recall doing a bodge involving that (I think you helped me with it!) and it was pretty fast.
There could for sure, I’ve definitely thought about it. Would require some thought about how to make sure the system
I'm working on optionally using the system
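For anyone wanting to try the read/write route by hand in the meantime, here is a minimal sketch. It assumes the mapshaper command-line tool is installed separately and is on the PATH; `simplify_via_system()` is a hypothetical helper name, not part of rmapshaper, and restoring the CRS afterwards is an assumption about what the round trip through GeoJSON may lose.

```r
library(sf)

# Hypothetical helper (not an rmapshaper function): write to disk, run the
# system mapshaper binary, and read the result back in.
simplify_via_system <- function(x, keep = 0.05) {
  stopifnot(inherits(x, "sf"), is.numeric(keep), keep > 0, keep <= 1)

  mapshaper <- Sys.which("mapshaper")
  if (!nzchar(mapshaper)) {
    stop("mapshaper was not found on the PATH")
  }

  in_file <- tempfile(fileext = ".geojson")
  out_file <- tempfile(fileext = ".geojson")
  on.exit(unlink(c(in_file, out_file)), add = TRUE)

  # Write the input so the command-line tool can read it.
  st_write(x, in_file, driver = "GeoJSON", quiet = TRUE)

  # Equivalent to: mapshaper in.geojson -simplify 5% -o out.geojson
  args <- c(shQuote(in_file), "-simplify", paste0(keep * 100, "%"),
            "-o", shQuote(out_file))
  status <- system2(mapshaper, args)
  if (status != 0) {
    stop("mapshaper exited with status ", status)
  }

  out <- st_read(out_file, quiet = TRUE)
  # Assumption: the CRS may not survive the round trip, so reset it to the
  # CRS of the input without reprojecting.
  st_set_crs(st_set_crs(out, NA), st_crs(x))
}

# Usage, only attempted when mapshaper is available:
if (nzchar(Sys.which("mapshaper"))) {
  nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
  nc_simple <- simplify_via_system(nc, keep = 0.05)
}
```

The point of the sketch is only that the geometry never passes through the V8 context; the file write and read are still part of the cost.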
Great news - let me know when it's ready to test and I can do some benchmarks. For 9 in 10 use cases it's unlikely to make any difference so 100% behind it being an option for people handling chunky files. Thanks loads!
Hi @Robinlovelace - I've finally got around to implementing this. If you are willing to give it a bit of a test drive, that would be great! You can
Fantastic work @ateucher, this is 4 times faster on the smaller of the examples - will make life easier for ppl wanting to simplify giant vector datasets. Can test on larger dataset at some point but from sketchy wifi connection on laptop, it exceeds expectations! Reprex showing the speed-up:
devtools::install_github("ateucher/rmapshaper", ref = "mapshaper_v0.4.x")
#> Skipping install of 'rmapshaper' from a github remote, the SHA1 (d197a3b7) has not changed since last install.
#> Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_ua_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = "\\.shp$")
res = sf::st_read(f)
#> Reading layer `england_ua_caswa_2001_clipped' from data source `/tmp/RtmpiO2NE3/england_ua_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 1061 features and 5 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: 243367 ymin: 50322 xmax: 595739.3 ymax: 537152
#> epsg (SRID): NA
#> proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#> user system elapsed
#> 34.572 2.196 36.980
system.time(cas2003_simple_sys <- rmapshaper::ms_simplify(input = res, keep = 0.05,
                                                          sys = TRUE))
#> user system elapsed
#> 8.996 0.208 9.154
object.size(res)
#> 27084984 bytes
object.size(cas2003_simple)
#> 2205128 bytes
identical(cas2003_simple, cas2003_simple_sys)
#> [1] TRUE
plot(cas2003_simple[, "name"])
Excellent! I ran it with the big version and this is what I get:
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = "\\.shp$")
res = sf::st_read(f)
#> Reading layer `england_caswa_2001_clipped' from data source `/private/var/folders/2w/x5wq73f93yzgm7hjr_b_54q00000gp/T/RtmpZLkA8z/england_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 6930 features and 5 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: 85665 ymin: 7054 xmax: 655604 ymax: 657534.1
#> epsg (SRID): NA
#> proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
system.time(cas2003_simple_sys <- ms_simplify(input = res, keep = 0.05, sys = TRUE))
#> user system elapsed
#> 57.382 2.670 59.937
system.time(cas2003_simple_internal <- ms_simplify(input = res, keep = 0.05, sys = FALSE))
#> user system elapsed
#> 1276.978 60.552 1514.478
object.size(cas2003_simple_sys)
#> 16266760 bytes
object.size(cas2003_simple_internal)
#> 16266760 bytes
Still not super fast but much better; the bottleneck is writing to geojson with
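To put the two sets of timings side by side, the elapsed times reported in the reprexes above can be turned into speed-up factors. This is plain arithmetic on the numbers already posted in this thread, from two different machines, so treat it as indicative only; the column names are just labels chosen here.

```r
# Elapsed times in seconds, copied from the reprex output in this thread.
timings <- data.frame(
  dataset  = c("England_ua_caswa_2001_clipped", "England_caswa_2001_clipped"),
  features = c(1061, 6930),
  internal = c(36.980, 1514.478),  # sys = FALSE (V8 context)
  sys      = c(9.154, 59.937)      # sys = TRUE (system mapshaper)
)

# Speed-up factor: how many times faster the system route was.
timings$speedup <- timings$internal / timings$sys
print(timings)
```

That works out to roughly 4 times faster on the smaller file and roughly 25 times faster on the larger one.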
Great - exceeds expectations! One idea you may have already tried is writing to different formats. I think this is a great solution in any case, many thanks for the fix.
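One rough way to check whether the output format matters is to time `sf::st_write()` with different drivers on the same object. The sketch below uses the small `nc` dataset shipped with sf purely as a stand-in (an assumption, it is far smaller than the files in this issue), and compares two formats that mapshaper can read; the driver list is illustrative, not what rmapshaper uses internally.

```r
library(sf)

# Stand-in data: the North Carolina shapefile bundled with sf.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Driver name -> file extension for the formats being compared.
formats <- c("GeoJSON" = ".geojson", "ESRI Shapefile" = ".shp")

out_dir <- tempfile("write_test_")
dir.create(out_dir)

# Time one write per driver and keep the elapsed seconds.
write_times <- vapply(names(formats), function(driver) {
  path <- file.path(out_dir, paste0("nc", formats[[driver]]))
  system.time(st_write(nc, path, driver = driver, quiet = TRUE))[["elapsed"]]
}, numeric(1))

print(write_times)

# Remove the temporary files, including shapefile sidecar files.
unlink(out_dir, recursive = TRUE)
```

On a dataset this small the difference will be noise; the comparison only becomes meaningful on objects the size of the ones discussed here.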
@Robinlovelace rmapshaper 0.4 is now on CRAN with the
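A minimal usage sketch of the `sys` argument shown in the reprexes above, falling back to the internal V8 route when no system mapshaper is found. The `Sys.which()` check and the bundled `nc` dataset are assumptions made for illustration; `keep_shapes = TRUE` is set here only so that no features are dropped.

```r
library(sf)
library(rmapshaper)

# Stand-in data: the North Carolina shapefile bundled with sf.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Use the system mapshaper only if a binary is on the PATH.
has_sys_mapshaper <- nzchar(Sys.which("mapshaper"))

nc_simple <- ms_simplify(nc, keep = 0.05, keep_shapes = TRUE,
                         sys = has_sys_mapshaper)

# Fraction of the original in-memory size that remains after simplifying.
size_ratio <- as.numeric(object.size(nc_simple)) / as.numeric(object.size(nc))
print(size_ratio)
```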
Fantastic, many thanks! Heads-up @Nowosad we should mention this in the simplification section of the book.
I've just been using this on some moderately large shapefiles.
Here's some reproducible code to demo the issue:
Context: I'm frustrated at the poor provision of UK administrative borders in suitable file formats or levels of simplification and have started a package to deal with it:
https://github.com/Robinlovelace/ukborders
Thanks loads for rmapshaper in any case!