-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for fsspec files to seviri_l1b_nc reader #1547
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks for implementing the FSFile support in the nc seviri reader! I only have a couple of suggestions inline.
satpy/readers/seviri_l1b_nc.py
Outdated
f_obj = open_file_or_filename(self.filename) | ||
|
||
self.nc = xr.open_dataset(self.filename, | ||
decode_cf=True, | ||
mask_and_scale=False, | ||
chunks=CHUNK_SIZE) | ||
dataset = xr.open_dataset(f_obj, | ||
decode_cf=True, | ||
mask_and_scale=False, | ||
chunks=CHUNK_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is being used in multiple places now, and it might need to evolve in the future if fsspec starts return PathLib compliant files or if xarray starts supporting s3buckets for example. I think it would be beneficial if we could create a reader agnostic function that would do this, so in the future if the APIs of dependencies change, it would make it easier to change. So how about a function along these lines:
def open_dataset(filename, *args, **kwargs):
f_obj = open_file_or_filename(filename)
return xr.open_dataset(f_obj, *args, **kwargs)
This goes a bit against my conviction that we should avoid *args and **kwargs though... maybe a decorator would be an alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that you want to move the logic into a function completely outside of the reader so it can be used in other readers and the nc
method just calls open_dataset
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. So where would you think that function should go (maybe in file_handlers.py)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds about right!
satpy/readers/seviri_l1b_nc.py
Outdated
equatorial_radius = equatorial_radius[0] | ||
polar_radius = polar_radius[0] | ||
ssp_lon = ssp_lon[0] | ||
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'][0] | ||
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think item
would suit better here. Also maybe this can be delegated to a small method (including the if
statement)
equatorial_radius = equatorial_radius[0] | |
polar_radius = polar_radius[0] | |
ssp_lon = ssp_lon[0] | |
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'][0] | |
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'][0] | |
equatorial_radius = equatorial_radius[0] | |
polar_radius = polar_radius.item() | |
ssp_lon = ssp_lon.item() | |
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'].item() | |
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'].item() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will refactor to item. I was thinking about a method too but I am not sure if it is worth the overhead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't bother to put item
in a special method.
@@ -32,7 +32,7 @@ | |||
from satpy.tests.utils import make_dataid | |||
|
|||
|
|||
def new_read_file(instance): | |||
def new_read_file(h5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h5
isn't really an explicit name, could you rename this parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
Indeed more documentation on FSFile would be nice. I think it deserves at least a paragraph in the main documentation, with an example or two. As for saying which readers support this, I would say we could probably use the main reader table for that (together with 'status' maybe?) |
Yes I think that is a good idea to include it in the main reader table and additionally add some examples. |
DeepCode failed to analyze this pull requestSomething went wrong despite trying multiple times, sorry about that. |
30f5cd0
to
ed77e35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thanks for all the new documentation! Some comment inline.
for supported readers to read from remote filesystems has been added. | ||
|
||
To add this feature to a reader the call to :func:`xarray.open_dataset` | ||
hast to be replaced by the function :func:`~satpy.readers.file_handlers.open_dataset` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hast to be replaced by the function :func:`~satpy.readers.file_handlers.open_dataset` | |
has to be replaced by the function :func:`~satpy.readers.file_handlers.open_dataset` |
# if FSFile is used h5netcdf engine is used which outputs arrays instead of floats for attributes | ||
if isinstance(equatorial_radius, np.ndarray): | ||
equatorial_radius = equatorial_radius.item() | ||
polar_radius = polar_radius.item() | ||
ssp_lon = ssp_lon.item() | ||
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'].item() | ||
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'].item() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this a separate method, to shorten get_metadata
?
def open_dataset(filename, *args, **kwargs): | ||
"""Open a file with xarray. | ||
|
||
Args: | ||
filename (Union[str, FSFile]): | ||
The path to the file to open. Can be a `string` or | ||
:class:`~satpy.readers.FSFile` object which allows using | ||
`fsspec` or `s3fs` like files. | ||
|
||
Returns: | ||
xarray.Dataset: | ||
|
||
Notes: | ||
This can be used to enable readers to open remote files. | ||
""" | ||
f_obj = open_file_or_filename(filename) | ||
return xr.open_dataset(f_obj, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a small test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this refactor probably is the reason why some tests in the seviri nc reader are failing now too. I will have a look at it and will try to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BENR0 did you have some time to look at this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't have time to look at it yet due to a project deadline. Maybe I have a chance later today otherwise I will look at it during the weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem, take it easy :)
Codecov Report
@@ Coverage Diff @@
## main #1547 +/- ##
=======================================
Coverage 93.87% 93.88%
=======================================
Files 283 283
Lines 43046 43073 +27
=======================================
+ Hits 40409 40437 +28
+ Misses 2637 2636 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
satpy/tests/test_file_handlers.py
Outdated
|
||
def test_open_dataset(): | ||
"""Test xr.open_dataset wrapper.""" | ||
fn = "path/to/file.nc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a MagicMock? that would allow to check if open
was called on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this mocked at first but then I thought that this probably was covered by the test of the open_file_or_filename
function. I admit I didn't check but I have seen now that the function is not covered by a test. Should I mock it and test it here or should that be a separate test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, what we want test here is that the dataset is opened whether it's a filename or a fsfile object, right? so I think both scenarios should be tested :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the dataset should be opened regardless. But what I meant was; the logic for this is covered in open_file_or_filename
and not here. That is why I assumed it should not necessarily be tested here. Anyway probably that is nitpicking as long as it is covered. I will add it here then.
195f3d1
to
a09a4be
Compare
Ok somehow there went something wrong during the rebase I think. Need to check that out. But in general the changes in this PR should be ok to be merged. |
Does this help? https://stackoverflow.com/questions/60003798/why-does-a-pull-request-show-extra-commits-after-a-rebase |
7ce15bd
to
dbe2173
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Nice work, thanks! |
The seviri netcdf reader now supports opening files from
fsspec.open_file
objects together withFSFile
.FSFile
is still pretty new but I think this is a pretty interesting thing for users especially if this eventually will be implemented by more and more readers. So I am wondering if this should be documented more visibly than in the docstrings ofFSFile
, possibly with a list of readers supporting this?