Refactor lazy_regrid.py for wflow diagnostics #2025

Closed
SarahAlidoost opened this issue Feb 12, 2021 · 7 comments · Fixed by #2024

@SarahAlidoost (Member) commented Feb 12, 2021

The recipe wflow.yml returns a memory error if it uses, e.g., 10 years of data. The diagnostic wflow.py uses the regrid function with the area_weighted scheme. The memory error is explained in the open issue SciTools/iris#3808.
With the new version of iris, the lazy_regrid script would benefit from a refactoring. Note that the regrid preprocessor cannot be moved to the recipe, because the wflow target grid file is in .map format.
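
For illustration only, a minimal sketch of how the diagnostic could perform the area-weighted regridding onto the wflow target grid with iris (lazy in recent iris versions); `load_target_grid` and the file names are hypothetical and not part of wflow.py, iris, or ESMValCore:

```python
# Sketch, not the actual wflow.py implementation: regrid a dataset cube onto
# the wflow target grid inside the diagnostic, using iris' area-weighted
# scheme. `load_target_grid` is a hypothetical helper that reads the .map DEM
# into an iris cube.
import iris
from iris.analysis import AreaWeighted


def regrid_to_target(cube, target_cube):
    """Return `cube` regridded onto the horizontal grid of `target_cube`."""
    # The area-weighted scheme requires bounds on the horizontal coordinates.
    for source in (cube, target_cube):
        for coord_name in ("latitude", "longitude"):
            coord = source.coord(coord_name)
            if not coord.has_bounds():
                coord.guess_bounds()
    return cube.regrid(target_cube, AreaWeighted())


# Example usage (paths and helper are assumptions):
# target_cube = load_target_grid("wflow_dem.map")
# pr_regridded = regrid_to_target(pr_cube, target_cube)
```

ESMValCore's esmvalcore.preprocessor.regrid offers the same area_weighted scheme, but as noted above it cannot be configured from the recipe here, because the target grid has to be read from a .map file.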

@SarahAlidoost changed the title from "Replace lazy_regrid.py with esmvalcore.preprocessor.regrid for wflow diagnostics" to "Refactor lazy_regrid.py for wflow diagnostics" on Feb 16, 2021
@SarahAlidoost (Member Author) commented:

Comparing the performance of the wflow recipe before (wflow_master) and after (wflow_pr) refactoring the lazy_regrid script (a sketch of the comparison is included after the figures below):

  • Checked the differences in one of the outputs, wflow_ERA-Interim_Meuse_1990_2001.nc, which includes three variables; as shown below, the differences are zero or very small.
  • Checked the cube shapes; they are the same in both runs.
  • Checked the file sizes in the work directory; as can be seen below, the .nc and .xml files are smaller after refactoring.
  • Checked resource_usage.txt; as can be seen below, running the recipe after refactoring takes about half an hour more.

The differences in one of the outputs, wflow_ERA-Interim_Meuse_1990_2001.nc:
[figure: wflow_diff_pr — differences in pr]
[figure: wflow_diff_tas — differences in tas]
[figure: wflow_diff_pet — differences in pet]

The cube shapes:
[figure: wflow_diff_cubes]

The file sizes:
[figure: wflow_diff_file_sizes]

The resource_usage.txt:
before resource_usage.txt and after resource_usage.txt (attached)
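
For reference, a minimal sketch of how the before/after comparison of the outputs could be reproduced; the directory layout and variable names are assumptions, not the exact commands used:

```python
# Compare the wflow output produced before (wflow_master) and after
# (wflow_pr) the refactoring; paths and variable names are assumptions.
import iris
import numpy as np

VARIABLES = ["pr", "tas", "pet"]
BEFORE = "wflow_master/work/wflow_ERA-Interim_Meuse_1990_2001.nc"
AFTER = "wflow_pr/work/wflow_ERA-Interim_Meuse_1990_2001.nc"

for var in VARIABLES:
    cube_before = iris.load_cube(BEFORE, var)
    cube_after = iris.load_cube(AFTER, var)
    # The cube shapes should be identical before comparing values.
    assert cube_before.shape == cube_after.shape
    max_diff = np.abs(cube_before.data - cube_after.data).max()
    print(f"{var}: max absolute difference = {max_diff:.3e}")
```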

@bouweandela (Member) commented:

@SarahAlidoost Thanks for the nice report! What are your conclusions from this?

@bouweandela (Member) commented:

Ah, sorry, I didn't see the conclusions, but they are there already. It's a bit worrying that there is a difference at all in runtime and memory use. Do you think the runtimes could vary per run?

@Peter9192 (Contributor) commented:

Nice comparison @SarahAlidoost! Half an hour more on 1-1.5 hours in total is quite a big difference, I'd say. I'd like to know how this is possible, but that might be out of scope for your current objective, or not?

@SarahAlidoost (Member Author) commented Feb 17, 2021

> Ah, sorry, I didn't see the conclusions, but they are there already. It's a bit worrying that there is a difference at all in runtime and memory use. Do you think the runtimes could vary per run?

With the new commits, the performance has improved. Now the difference in the variable pet is zero too (see below). It seems that the runtime varies per run; this time it took less than one hour (see the new resource_usage.txt).

The differences in one of the outputs, wflow_ERA-Interim_Meuse_1990_2001.nc:
[figure: wflow_diff_pet_2 — differences in pet]

@SarahAlidoost (Member Author) commented:

> Nice comparison @SarahAlidoost! Half an hour more on 1-1.5 hours in total is quite a big difference, I'd say. I'd like to know how this is possible, but that might be out of scope for your current objective, or not?

Please see my comment here.

@bouweandela (Member) commented Feb 17, 2021

> It seems that the runtime varies per run; this time it took less than one hour

OK, that's good news. Because the implementation I did in iris is almost identical to what was here in ESMValTool, I would expect it to be only slightly more efficient, so it would have been strange if there were big differences in runtime. Maybe the difference in runtime/memory use is because not all the nodes have the same hardware (assuming you're running on Cartesius: https://userinfo.surfsara.nl/systems/cartesius/usage/batch-usage#heading7), or because other users are also accessing the shared file system.
