Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for fsspec files to seviri_l1b_nc reader #1547

Merged
merged 5 commits into from
May 24, 2022

Conversation

BENR0
Copy link
Collaborator

@BENR0 BENR0 commented Feb 17, 2021

The seviri netcdf reader now supports opening files from fsspec.open_file objects together with FSFile.

FSFile is still pretty new but I think this is a pretty interesting thing for users especially if this eventually will be implemented by more and more readers. So I am wondering if this should be documented more visibly than in the docstrings of FSFile, possibly with a list of readers supporting this?

  • Tests added

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks for implementing the FSFile support in the nc seviri reader! I only have a couple of suggestions inline.

Comment on lines 67 to 72
f_obj = open_file_or_filename(self.filename)

self.nc = xr.open_dataset(self.filename,
decode_cf=True,
mask_and_scale=False,
chunks=CHUNK_SIZE)
dataset = xr.open_dataset(f_obj,
decode_cf=True,
mask_and_scale=False,
chunks=CHUNK_SIZE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is being used in multiple places now, and it might need to evolve in the future if fsspec starts return PathLib compliant files or if xarray starts supporting s3buckets for example. I think it would be beneficial if we could create a reader agnostic function that would do this, so in the future if the APIs of dependencies change, it would make it easier to change. So how about a function along these lines:

def open_dataset(filename, *args,  **kwargs):
    f_obj = open_file_or_filename(filename)
    return xr.open_dataset(f_obj, *args, **kwargs)

This goes a bit against my conviction that we should avoid *args and **kwargs though... maybe a decorator would be an alternative?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that you want to move the logic into a function completely outside of the reader so it can be used in other readers and the nc method just calls open_dataset?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So where would you think that function should go (maybe in file_handlers.py)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds about right!

Comment on lines 87 to 91
equatorial_radius = equatorial_radius[0]
polar_radius = polar_radius[0]
ssp_lon = ssp_lon[0]
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'][0]
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'][0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think item would suit better here. Also maybe this can be delegated to a small method (including the if statement)

Suggested change
equatorial_radius = equatorial_radius[0]
polar_radius = polar_radius[0]
ssp_lon = ssp_lon[0]
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'][0]
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'][0]
equatorial_radius = equatorial_radius[0]
polar_radius = polar_radius.item()
ssp_lon = ssp_lon.item()
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'].item()
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'].item()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will refactor to item. I was thinking about a method too but I am not sure if it is worth the overhead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't bother to put item in a special method.

@@ -32,7 +32,7 @@
from satpy.tests.utils import make_dataid


def new_read_file(instance):
def new_read_file(h5):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h5 isn't really an explicit name, could you rename this parameter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@mraspaud
Copy link
Member

Indeed more documentation on FSFile would be nice. I think it deserves at least a paragraph in the main documentation, with an example or two.

As for saying which readers support this, I would say we could probably use the main reader table for that (together with 'status' maybe?)

@BENR0
Copy link
Collaborator Author

BENR0 commented Feb 17, 2021

Yes I think that is a good idea to include it in the main reader table and additionally add some examples.

@ghost
Copy link

ghost commented Feb 18, 2021

DeepCode failed to analyze this pull request

Something went wrong despite trying multiple times, sorry about that.
Please comment this pull request with "Retry DeepCode" to manually retry, or contact us so that a human can look into the issue.

@BENR0 BENR0 force-pushed the feat_fsfile_support_seviri_nc branch from 30f5cd0 to ed77e35 Compare February 25, 2021 07:38
@mraspaud mraspaud added component:readers enhancement code enhancements, features, improvements labels Feb 25, 2021
Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for all the new documentation! Some comment inline.

for supported readers to read from remote filesystems has been added.

To add this feature to a reader the call to :func:`xarray.open_dataset`
hast to be replaced by the function :func:`~satpy.readers.file_handlers.open_dataset`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hast to be replaced by the function :func:`~satpy.readers.file_handlers.open_dataset`
has to be replaced by the function :func:`~satpy.readers.file_handlers.open_dataset`

Comment on lines +78 to +100
# if FSFile is used h5netcdf engine is used which outputs arrays instead of floats for attributes
if isinstance(equatorial_radius, np.ndarray):
equatorial_radius = equatorial_radius.item()
polar_radius = polar_radius.item()
ssp_lon = ssp_lon.item()
self.mda['vis_ir_column_dir_grid_step'] = self.mda['vis_ir_column_dir_grid_step'].item()
self.mda['vis_ir_line_dir_grid_step'] = self.mda['vis_ir_line_dir_grid_step'].item()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this a separate method, to shorten get_metadata?

Comment on lines +30 to +46
def open_dataset(filename, *args, **kwargs):
"""Open a file with xarray.

Args:
filename (Union[str, FSFile]):
The path to the file to open. Can be a `string` or
:class:`~satpy.readers.FSFile` object which allows using
`fsspec` or `s3fs` like files.

Returns:
xarray.Dataset:

Notes:
This can be used to enable readers to open remote files.
"""
f_obj = open_file_or_filename(filename)
return xr.open_dataset(f_obj, *args, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a small test for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this refactor probably is the reason why some tests in the seviri nc reader are failing now too. I will have a look at it and will try to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BENR0 did you have some time to look at this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't have time to look at it yet due to a project deadline. Maybe I have a chance later today otherwise I will look at it during the weekend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem, take it easy :)

@mraspaud mraspaud added this to the v0.26.0 milestone Feb 25, 2021
@BENR0 BENR0 requested a review from adybbroe as a code owner March 9, 2021 13:07
@codecov
Copy link

codecov bot commented Mar 9, 2021

Codecov Report

Merging #1547 (dbe2173) into main (c181288) will increase coverage by 0.00%.
The diff coverage is 89.09%.

@@           Coverage Diff           @@
##             main    #1547   +/-   ##
=======================================
  Coverage   93.87%   93.88%           
=======================================
  Files         283      283           
  Lines       43046    43073   +27     
=======================================
+ Hits        40409    40437   +28     
+ Misses       2637     2636    -1     
Flag Coverage Δ
behaviourtests 4.78% <5.45%> (+<0.01%) ⬆️
unittests 94.53% <89.09%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/readers/seviri_l1b_nc.py 68.51% <77.77%> (+1.22%) ⬆️
satpy/readers/file_handlers.py 97.80% <100.00%> (+0.12%) ⬆️
satpy/tests/reader_tests/test_seviri_l1b_nc.py 100.00% <100.00%> (ø)
satpy/tests/test_file_handlers.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c181288...dbe2173. Read the comment docs.


def test_open_dataset():
"""Test xr.open_dataset wrapper."""
fn = "path/to/file.nc"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a MagicMock? that would allow to check if open was called on it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this mocked at first but then I thought that this probably was covered by the test of the open_file_or_filename function. I admit I didn't check but I have seen now that the function is not covered by a test. Should I mock it and test it here or should that be a separate test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, what we want test here is that the dataset is opened whether it's a filename or a fsfile object, right? so I think both scenarios should be tested :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the dataset should be opened regardless. But what I meant was; the logic for this is covered in open_file_or_filename and not here. That is why I assumed it should not necessarily be tested here. Anyway probably that is nitpicking as long as it is covered. I will add it here then.

@mraspaud mraspaud removed this from the v0.26.0 milestone Mar 15, 2021
@BENR0 BENR0 force-pushed the feat_fsfile_support_seviri_nc branch from 195f3d1 to a09a4be Compare December 7, 2021 14:10
@coveralls
Copy link

coveralls commented Dec 7, 2021

Coverage Status

Coverage increased (+0.006%) to 94.475% when pulling dbe2173 on BENR0:feat_fsfile_support_seviri_nc into c181288 on pytroll:main.

@BENR0
Copy link
Collaborator Author

BENR0 commented May 11, 2022

Ok somehow there went something wrong during the rebase I think. Need to check that out. But in general the changes in this PR should be ok to be merged.

@gerritholl
Copy link
Member

Ok somehow there went something wrong during the rebase I think. Need to check that out. But in general the changes in this PR should be ok to be merged.

Does this help? https://stackoverflow.com/questions/60003798/why-does-a-pull-request-show-extra-commits-after-a-rebase

@BENR0 BENR0 force-pushed the feat_fsfile_support_seviri_nc branch from 7ce15bd to dbe2173 Compare May 13, 2022 10:20
@BENR0
Copy link
Collaborator Author

BENR0 commented May 24, 2022

@djhoese @mraspaud @sfinkens: from my side this is also ready to be merged. Anything from your point of view that still needs to be done?

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mraspaud mraspaud merged commit cd057ca into pytroll:main May 24, 2022
@BENR0 BENR0 deleted the feat_fsfile_support_seviri_nc branch May 25, 2022 06:12
@sfinkens
Copy link
Member

Nice work, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:readers enhancement code enhancements, features, improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants