Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance on large files #61

Closed
Robinlovelace opened this issue Oct 16, 2017 · 14 comments · Fixed by #73
Closed

Performance on large files #61

Robinlovelace opened this issue Oct 16, 2017 · 14 comments · Fixed by #73
Milestone

Comments

@Robinlovelace
Copy link
Contributor

Robinlovelace commented Oct 16, 2017

I've just been using this on some moderately large shapfiles.

Here's some reproducible code to demo the issue:

u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_ua_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = ".shp")
res = sf::st_read(f)
# cas2003_simple = rmapshaper::ms_simplify(input = res, keep = 0.05) # didn't finish...
st_write(res, "cas2003.shp")
ms_msg = "mapshaper cas2003.shp -simplify dp 5% -o format=geojson cas2003.json"
[simplify] Repaired 437 intersections; 12 intersections could not be repaired
[o] Wrote cas2003.json
   user  system elapsed 
  7.984   0.248   8.164 
cas2003_simple = st_read("cas2003.json")

Context: I'm frustrated at the poor provision of UK administrative borders in suitable file formats or levels of simplification and have started a package to deal with it:

https://github.com/Robinlovelace/ukborders

Thanks loads for rmapshaper in any case!

@ateucher
Copy link
Owner

Hi @Robinlovelace - thanks for the report, and the reprex. I have definitely run into issues with rmapshaper being unable to handle really large spatial objects, but in this case I had no trouble (See below). Can you output your sessionInfo()?

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_ua_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = ".shp")
res = sf::st_read(f)
#> Reading layer `england_ua_caswa_2001_clipped' from data source `/private/var/folders/2w/x5wq73f93yzgm7hjr_b_54q00000gp/T/RtmpC0DMCy/england_ua_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 1061 features and 5 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 243367 ymin: 50322 xmax: 595739.3 ymax: 537152
#> epsg (SRID):    NA
#> proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs

system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#>    user  system elapsed 
#>  12.505   1.409  14.181

object.size(res)
#> 27084984 bytes
object.size(cas2003_simple)
#> 2205128 bytes

plot(cas2003_simple[, "name"])

@ateucher
Copy link
Owner

Also related to #59

@Robinlovelace
Copy link
Contributor Author

Apologies, I provided the smaller of the 2 .shp files (not tested). Please try on this bigger one:

u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_caswa_2001_clipped.zip"

@ateucher
Copy link
Owner

Ah, that is a different beast. It did complete for me, but it was definitely slow:

system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#>    user   system  elapsed 
#> 1740.608   64.060 1802.608

I'll see if I can find anything that can do better, but the bottleneck is converting to geojson before sending it into the V8 context to be processed by the mapshaper javascript library.

@Robinlovelace
Copy link
Contributor Author

Could there not be an option to do it via read/write and system()?

I recall doing a bodge involving that (I think you helped me with it!) and it was pretty fast.

@ateucher
Copy link
Owner

There could for sure, I’ve definitely thought about it. Would require some thought about how to make sure the system mapshaper API is consistent with the commands generated in rmapshaper.

@ateucher
Copy link
Owner

ateucher commented Nov 8, 2017

I'm working on optionally using the system mapshaper in the mapshaper_v0.4.x branch, along with upgrading to the latest version of mapshaper.

@ateucher ateucher added this to the 0.4.0 milestone Nov 8, 2017
@Robinlovelace
Copy link
Contributor Author

Great news - let me know when it's ready to test and I can do some benchmarks. For 9 in 10 use cases it's unlikely to make any difference so 100% behind it being an option for people handling chunky files. Thanks loads!

@ateucher
Copy link
Owner

ateucher commented Feb 4, 2018

Hi @Robinlovelace - I've finally got around to implementing this. If you are willing to give it a bit of a test drive, that would be great! You can install_github("ateucher/rmapshaper", ref = "mapshaper_v0.4.x"). The argument is sys and is available in all functions. Default is FALSE, so if you set it to TRUE it should work!

@Robinlovelace
Copy link
Contributor Author

Fantastic work @ateucher this is 4 times faster on the smaller of the examples - will make life easier for ppl wanting to simplify giant vector datasets. Can test on larger dataset at some point but from sketchy wifi connection on laptop, it expectations! Reprex showing the speed-up:

devtools::install_github("ateucher/rmapshaper", ref = "mapshaper_v0.4.x")
#> Skipping install of 'rmapshaper' from a github remote, the SHA1 (d197a3b7) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_ua_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = ".shp")
res = sf::st_read(f)
#> Reading layer `england_ua_caswa_2001_clipped' from data source `/tmp/RtmpiO2NE3/england_ua_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 1061 features and 5 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 243367 ymin: 50322 xmax: 595739.3 ymax: 537152
#> epsg (SRID):    NA
#> proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#>    user  system elapsed 
#>  34.572   2.196  36.980
system.time(cas2003_simple_sys <- rmapshaper::ms_simplify(input = res, keep = 0.05, 
  sys = T))
#>    user  system elapsed 
#>   8.996   0.208   9.154
object.size(res)
#> 27084984 bytes
object.size(cas2003_simple)
#> 2205128 bytes
identical(cas2003_simple, cas2003_simple_sys)
#> [1] TRUE
plot(cas2003_simple[, "name"])

@ateucher
Copy link
Owner

ateucher commented Feb 4, 2018

Excellent! I ran it with the big version and this is what I get:

library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3
library(rmapshaper)

u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = ".shp")
res = sf::st_read(f)
#> Reading layer `england_caswa_2001_clipped' from data source `/private/var/folders/2w/x5wq73f93yzgm7hjr_b_54q00000gp/T/RtmpZLkA8z/england_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 6930 features and 5 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 85665 ymin: 7054 xmax: 655604 ymax: 657534.1
#> epsg (SRID):    NA
#> proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs

system.time(cas2003_simple_sys <- ms_simplify(input = res, keep = 0.05, sys = TRUE))
#>    user  system elapsed 
#>  57.382   2.670  59.937

system.time(cas2003_simple_internal <- ms_simplify(input = res, keep = 0.05, sys = FALSE))
#>     user   system  elapsed 
#> 1276.978   60.552 1514.478

object.size(cas2003_simple_sys)
#> 16266760 bytes
object.size(cas2003_simple_internal)
#> 16266760 bytes

Still not super fast but much better; the bottleneck is writing to geojson with sf::st_write...

@Robinlovelace
Copy link
Contributor Author

Great - exceeds expectations! One idea you may have already tried is writing to different formats. I think this is a great solution in any case, many thanks for the fix.

@ateucher
Copy link
Owner

ateucher commented Apr 4, 2018

@Robinlovelace rmapshaper 0.4 is now on CRAN with the sys argument available in all functions.

@Robinlovelace
Copy link
Contributor Author

Fantastic, many thanks! Heads-up @Nowosad we should mention this in the simplification section of the book.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants