Add information about multi-threaded compression with geotiff creation #968
Codecov Report
@@ Coverage Diff @@
## master #968 +/- ##
==========================================
+ Coverage 86.76% 86.77% +<.01%
==========================================
Files 179 179
Lines 27298 27298
==========================================
+ Hits 23686 23687 +1
+ Misses 3612 3611 -1
LGTM, thanks for adding this valuable info!
LGTM.
I'd prefer to keep the default DEFLATE compression. The imagery is saved only once, but in WMS usage, for example, it is read plenty of times, so fast reading is a must. I'd rather use DASK_NUM_WORKERS as the default for num_threads to speed things up in a normal setup.
I think we could use
There seems to be one gotcha in that. By default

    $ python -c 'import dask; print(dask.config.get("num_workers", "nope"))'
    nope

while

    $ DASK_NUM_WORKERS=4 python -c 'import dask; print(dask.config.get("num_workers", "nope"))'
    4

So by default the compression would use only one thread even if everything else used all available resources.
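The fallback discussed here could be sketched roughly as below. This is only an illustration: the helper name resolve_num_threads is hypothetical and is not part of satpy or trollimage. It assumes nothing more than a config object with a get(key, default) method, which both a plain dict and the dask.config module provide.

```python
def resolve_num_threads(explicit=None, config=None, default=1):
    """Pick a thread count for GeoTIFF compression (hypothetical helper).

    Order of precedence: an explicit value from the caller, then the
    "num_workers" entry of the config object, then the safe default of 1.
    """
    if explicit is not None:
        return int(explicit)
    if config is None:
        return default
    # Works with a dict or with the dask.config module, since both
    # expose get(key, default).
    return int(config.get("num_workers", default))


# Plain dicts stand in for dask's configuration here.
print(resolve_num_threads(config={}))                              # 1
print(resolve_num_threads(config={"num_workers": 4}))              # 4
print(resolve_num_threads(explicit=8, config={"num_workers": 4}))  # 8
```

With dask installed, passing config=dask.config would pick up DASK_NUM_WORKERS when it is set and fall back to a single thread otherwise, which matches the "1 is safer" default mentioned in this thread.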
Yeah, that was my intention. I was hoping to avoid accidentally killing the user's machine. I thought 1 was safer. We could also do |
Do we want to fix the number of threads in this PR, or make a separate one?
I think we should either change the default threads or the default compression (to no compression) and do it in trollimage. Doing either of those means this PR should be updated to reflect those changes. I'm a little worried/uncomfortable with how we are changing GDAL's defaults to do what we want and making those changes affect everyone. Granted, we are using dask, so changing the default number of threads kind of makes sense. Setting the default compression, though, was maybe not the "cleanest" solution.
If the user is asking for |
@pnuu True. To be clear, GDAL and dask are likely not using the same threads so I guess there could be some performance/kernel concerns. |
I'm merging this as I think this information is useful as it is. Please make another PR if you think this should be further modified. |
Adding the num_threads option to my save_datasets calls has cut my processing time substantially. For example, some processing that was taking 20m in a docker container on an older machine now takes 5m30s when specifying num_threads=8. This PR adds documentation to the FAQ page about using this option.
Other options
As mentioned on Slack by @pnuu, maybe we should consider setting this by default or changing the default compression algorithm in trollimage to be something other than DEFLATE. If this is done I would recommend no compression.
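For reference, the call pattern described in this PR might look like the sketch below. The helper geotiff_save_kwargs is hypothetical and exists only to keep the example runnable without satellite data. The num_threads keyword comes from this PR; passing compress the same way is an assumption about how the geotiff writer forwards GDAL creation options.

```python
def geotiff_save_kwargs(num_threads=8, compress="DEFLATE"):
    """Build keyword arguments for a save_datasets call (hypothetical helper)."""
    return {
        "writer": "geotiff",
        "num_threads": num_threads,  # GDAL compression threads
        "compress": compress,        # assumed to map to GDAL's COMPRESS option
    }


kwargs = geotiff_save_kwargs(num_threads=8)
print(kwargs)

# With a loaded satpy Scene named scn (not created here), the call would be:
# scn.save_datasets(**kwargs)
```

Switching to no compression, as suggested above, would then be a matter of passing compress="NONE" instead of relying on more threads.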