Loading of DNB data from VIIRS compact SDR is slow #940

Closed
pnuu opened this issue Oct 16, 2019 · 3 comments · Fixed by #941
Comments


pnuu commented Oct 16, 2019

Describe the bug
Loading DNB data with the viirs_compact reader is slow. Loading one granule (768 scanlines of 4064 measurements) takes a bit over 7 seconds. As the snippet below demonstrates, everything is done lazily and no dask compute() calls are made.

To Reproduce

import glob
import time
import dask

from satpy import Scene
from pyresample.test.utils import CustomScheduler

fnames = sorted(glob.glob('/home/lahtinep/data/satellite/new/SVDNBC*b41281*'))
glbl = Scene(reader='viirs_compact', filenames=fnames[0:1])
tic = time.time()
# This will raise an exception if any `compute()` calls are made
with dask.config.set(scheduler=CustomScheduler(max_computes=0)):
    glbl.load(['DNB'])
print(time.time() - tic)

Expected behavior
The .load() call should be almost instantaneous.

Actual results
The "lazy" .load() call takes 7 seconds per granule. For hncc_dnb composite this is even worse, 23 s per granule. I ran some profiling and found that satpy.readers.viirs_compact.expand_array() is the one taking the time. This function is called by both VIIRSCompactFileHandler.navigate() and, in addition twice VIIRSCompactFileHandler.angles() for hncc_dnb .

Profiling

Total time: 14.7971 s
File: /home/lahtinep/Software/pytroll/packages/satpy/satpy/readers/viirs_compact.py
Function: expand_array at line 395
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   395                                           @profile
   396                                           def expand_array(data,
   397                                                            scans,
   398                                                            c_align,
   399                                                            c_exp,
   400                                                            scan_size=16,
   401                                                            tpz_size=16,
   402                                                            nties=200,
   403                                                            track_offset=0.5,
   404                                                            scan_offset=0.5):
   405                                               """Expand *data* according to alignment and expansion."""
   406       192       6178.0     32.2      0.0      nties = np.asscalar(nties)
   407       192       2688.0     14.0      0.0      tpz_size = np.asscalar(tpz_size)
   408       192       2164.0     11.3      0.0      s_scan, s_track = da.meshgrid(np.arange(nties * tpz_size),
   409       192     651925.0   3395.4      4.4                                    np.arange(scans * scan_size))
   410       192     546368.0   2845.7      3.7      s_track = (s_track.reshape(scans, scan_size, nties, tpz_size) % scan_size
   411       192    1130793.0   5889.5      7.6                 + track_offset) / scan_size
   412       192     548327.0   2855.9      3.7      s_scan = (s_scan.reshape(scans, scan_size, nties, tpz_size) % tpz_size
   413       192    1081750.0   5634.1      7.3                + scan_offset) / tpz_size
   414                                           
   415       192    2057581.0  10716.6     13.9      a_scan = s_scan + s_scan * (1 - s_scan) * c_exp + s_track * (
   416       192    1958127.0  10198.6     13.2          1 - s_track) * c_align
   417       192        566.0      2.9      0.0      a_track = s_track
   418                                           
   419       192     195067.0   1016.0      1.3      data_a = data[:scans * 2:2, np.newaxis, :-1, np.newaxis]
   420       192     189617.0    987.6      1.3      data_b = data[:scans * 2:2, np.newaxis, 1:, np.newaxis]
   421       192     192634.0   1003.3      1.3      data_c = data[1:scans * 2:2, np.newaxis, 1:, np.newaxis]
   422       192     186189.0    969.7      1.3      data_d = data[1:scans * 2:2, np.newaxis, :-1, np.newaxis]
   423                                           
   424       192     596579.0   3107.2      4.0      fdata = ((1 - a_track)
   425       192    2447401.0  12746.9     16.5               * ((1 - a_scan) * data_a + a_scan * data_b)
   426       192        567.0      3.0      0.0               + a_track
   427       192    2915920.0  15187.1     19.7               * ((1 - a_scan) * data_d + a_scan * data_c))
   428       192      86701.0    451.6      0.6      return fdata.reshape(scans * scan_size, nties * tpz_size)
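
Since no compute() is triggered (max_computes=0 above), all of this time is graph-construction overhead: each weight and bilinear-combination line builds another layer of small dask operations. For illustration only, and not necessarily what the linked PR #941 does, one way to cut that overhead is to run the same arithmetic in plain NumPy and defer it as a single dask task; expand_array_numpy below just restates the profiled math on NumPy arrays, and expand_array_lazy is a hypothetical wrapper (the dtype is an assumption of the sketch):

import dask
import dask.array as da
import numpy as np


def expand_array_numpy(data, scans, c_align, c_exp, scan_size=16, tpz_size=16,
                       nties=200, track_offset=0.5, scan_offset=0.5):
    """Same tie-point expansion as the profiled code, on plain NumPy arrays."""
    s_scan, s_track = np.meshgrid(np.arange(nties * tpz_size),
                                  np.arange(scans * scan_size))
    s_track = (s_track.reshape(scans, scan_size, nties, tpz_size) % scan_size
               + track_offset) / scan_size
    s_scan = (s_scan.reshape(scans, scan_size, nties, tpz_size) % tpz_size
              + scan_offset) / tpz_size
    a_scan = (s_scan + s_scan * (1 - s_scan) * c_exp
              + s_track * (1 - s_track) * c_align)
    a_track = s_track
    data_a = data[:scans * 2:2, np.newaxis, :-1, np.newaxis]
    data_b = data[:scans * 2:2, np.newaxis, 1:, np.newaxis]
    data_c = data[1:scans * 2:2, np.newaxis, 1:, np.newaxis]
    data_d = data[1:scans * 2:2, np.newaxis, :-1, np.newaxis]
    fdata = ((1 - a_track) * ((1 - a_scan) * data_a + a_scan * data_b)
             + a_track * ((1 - a_scan) * data_d + a_scan * data_c))
    return fdata.reshape(scans * scan_size, nties * tpz_size)


def expand_array_lazy(data, scans, c_align, c_exp, scan_size=16, tpz_size=16,
                      nties=200, track_offset=0.5, scan_offset=0.5):
    """Hypothetical wrapper: defer the whole expansion as one dask task."""
    delayed = dask.delayed(expand_array_numpy)(
        data, scans, c_align, c_exp, scan_size, tpz_size, nties,
        track_offset, scan_offset)
    # dtype assumed float64 for this sketch; the real data may differ
    return da.from_delayed(delayed,
                           shape=(scans * scan_size, nties * tpz_size),
                           dtype=np.float64)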

Environment Info:

  • OS: Ubuntu 18.04 Linux
  • Satpy Version: 0.16.2.dev149+g4313bb4f (current master branch)
  • Dask version: 2.5.2
  • Xarray version: 0.14.0

Additional context
I still have old versions of satpy/dask/xarray in operations, and they seem to do this fast (less than a second for 10 DNB granules). The versions there are:

  • Satpy version: 0.11.2+11.g7047b8d
  • Dask version: 1.1.0
  • Xarray version: 0.11.3

pnuu commented Oct 16, 2019

I'll try downgrading Dask and Xarray a few steps to see if that helps.

Ping @djhoese @mraspaud


pnuu commented Oct 16, 2019

On my new server I have satpy 0.17.2 from conda-forge, dask 2.5.2, and xarray 0.13.0, and there the loading is practically instantaneous. Now I have the same versions locally, but it's still slow. Weird.


pnuu commented Oct 16, 2019

I read the logs too hastily. The loading of DNB granules in this format has always been rather slow. The logs just weren't clear on when the calls are actually made and when cached values are used. But anyway, @mraspaud sped things up quite nicely in the linked PR, so this wasn't all for nothing 😂
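
For illustration, the caching referred to above amounts to computing the expensive expansion once per file handler and reusing the result on later calls; a minimal sketch, where NavigationCacheSketch, _cached_lonlats and _compute_lonlats are hypothetical names rather than the reader's actual attributes:

class NavigationCacheSketch:
    """Hypothetical illustration of caching navigation data per file handler."""

    def __init__(self):
        self._cached_lonlats = None

    def navigate(self):
        # Run the expensive expand_array()-based expansion only once
        if self._cached_lonlats is None:
            self._cached_lonlats = self._compute_lonlats()
        return self._cached_lonlats

    def _compute_lonlats(self):
        # Stand-in for the real lon/lat expansion code
        raise NotImplementedError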
