CDAT Migration Phase 2: Refactor core utilities and lat_lon set #677

Merged
merged 110 commits into from
Oct 10, 2023
Commits
f5af934
Add `dataset_new.py` and failing unit tests
tomvothecoder Mar 17, 2023
e718fc4
Add `dataset_new.py` with updated Dataset class
tomvothecoder Mar 17, 2023
1c3d166
Refactor Dataset class
tomvothecoder Mar 21, 2023
585a0bd
Update new `climo()` with DataArray parsing
tomvothecoder Mar 22, 2023
e00ab57
Add names to derived variable formula functions
tomvothecoder Mar 22, 2023
51d24f7
Update `climo()` to support derived variables
tomvothecoder Mar 22, 2023
fbfd472
Update refactored climo function with "ANN" averaging
tomvothecoder Apr 3, 2023
b0bd99a
Refactor new climatology func for cleaner code
tomvothecoder May 15, 2023
9fa2a17
Add `climo_xr.py` as a temp solution
tomvothecoder May 16, 2023
f568174
Fix conditional in `climo_xcdat.py`
tomvothecoder May 16, 2023
03e2834
Update `climo()` to operate on `xr.Dataset`
tomvothecoder May 18, 2023
f02e9b9
Add `has_z_axis()` to replace `getLevel()`
tomvothecoder May 18, 2023
e6874c2
Replace `_get_var_from_timeseries()` with `_get_timeseries_dataset()`
tomvothecoder May 18, 2023
79ec578
Completely nuke the Dataset class and refactor
tomvothecoder May 25, 2023
5d9dcca
Refactor derived variable methods
tomvothecoder May 26, 2023
1d3fe57
Refactor conditionals in `run_diags`
tomvothecoder May 26, 2023
e7ae459
Refactor Dataset `__init__` and `_get_derived_vars_map` methods
tomvothecoder May 30, 2023
2a4d00c
Refactor Dataset methods and add tests
tomvothecoder May 31, 2023
bc56cc6
Refactor `_get_time_series_filepath`
tomvothecoder Jun 1, 2023
0ccf76a
Add tests for deriving vars
tomvothecoder Jun 1, 2023
aa08d0b
Refactor and rename climo methods
tomvothecoder Jun 1, 2023
3ff78ed
Fix tests and update `climo_xcdat.py`
tomvothecoder Jun 5, 2023
cd82556
Add more tests for `get_climo_dataset()`
tomvothecoder Jun 5, 2023
fb42c7d
Clean up docstrings
tomvothecoder Jun 15, 2023
c0aafd5
Add ipykernel for interactive debugging
tomvothecoder Jun 28, 2023
bb34f3b
Add `std_xr`
tomvothecoder Jun 29, 2023
713fa71
Add tests for `spatial_avg`
tomvothecoder Jun 29, 2023
835b9dc
Add correlation() and rmse() functions
tomvothecoder Jul 5, 2023
c5c424a
Add tests for correlation
tomvothecoder Jul 6, 2023
55f5702
Add tests for `rmse()`
tomvothecoder Jul 6, 2023
83b520e
Add `serialize` param to metrics functions
tomvothecoder Jul 6, 2023
e38725e
Rename `general_xr.py` to `regrid.py`
tomvothecoder Jul 7, 2023
1219345
Add `_write_vars_to_netcdf` to replace `save_ncfiles()`
tomvothecoder Jul 7, 2023
39240b3
Add functions for z axis to pressure levels
tomvothecoder Jul 11, 2023
d48bc42
Refactor conditional for subsetting and metrics generation
tomvothecoder Jul 11, 2023
fd172b5
Update docstrings and comments
tomvothecoder Jul 12, 2023
6d501cc
Add tests for `regrid_z_axis_to_plevs`
tomvothecoder Jul 13, 2023
a6f9714
Fix conditional order bug in `regrid_z_axis_to_plevs()`
tomvothecoder Jul 17, 2023
f806d33
Fix `_pressure_to_plevs`
tomvothecoder Jul 19, 2023
14eddc1
Remove and update comments
tomvothecoder Jul 19, 2023
ee469a8
Add `regrid_to_lower_res()`
tomvothecoder Jul 20, 2023
38138ef
Add comment about not regridding same res datasets
tomvothecoder Jul 21, 2023
5a4e0be
Add initial implementation of `select_region()`
tomvothecoder Jul 21, 2023
5ff7925
Replace `mask_by` with simpler logic in `select_region()`
tomvothecoder Jul 25, 2023
3812e61
Update `select_region` logic for masking and docs
tomvothecoder Jul 26, 2023
c710e80
Add `_apply_land_sea_mask()` and `_subset_on_region`
tomvothecoder Aug 3, 2023
b728a3b
Update functions for creating and saving metrics and data
tomvothecoder Aug 4, 2023
e7ef1f2
Update `_get_land_sea_mask()` to `Dataset` class method
tomvothecoder Aug 7, 2023
829d65c
Update CoreParameter.backend default to "cartopy"
tomvothecoder Aug 7, 2023
bc03dc7
Update `lat_lon_plot.py` to support Xarray DataArrays
tomvothecoder Aug 8, 2023
dab4df1
Fix is_lon_full bool
tomvothecoder Aug 8, 2023
ba35904
Update vscode workspace settings
tomvothecoder Aug 9, 2023
e7feae8
Fix proj missing
tomvothecoder Aug 9, 2023
3174a58
Refactor `lat_lon_plot.py`
tomvothecoder Aug 15, 2023
ad7abca
Fix mypy errors
tomvothecoder Aug 15, 2023
ac5daa9
Add logging to `_write_vars_to_netcdf()`
tomvothecoder Aug 15, 2023
5d07f07
Fix `std` conditional
tomvothecoder Aug 22, 2023
f6b7c6d
Replace `mask_by` with `_apply_land_sea_mask`
tomvothecoder Aug 22, 2023
6644139
Update `dev.yml` to use xcdat=0.6.0rc1
tomvothecoder Aug 23, 2023
ef0124c
Split up `acme_new.py` into three smaller files
tomvothecoder Aug 23, 2023
1a36fba
Fix remaining variables not producing
tomvothecoder Aug 28, 2023
22f4be4
Update VS Code workspace to use black formatter
tomvothecoder Aug 29, 2023
ef9cae6
Update `create_metrics()` to make it more readable and maintainable
tomvothecoder Aug 29, 2023
ae7149d
Move check for `parameter.save_netcdf` to parent function
tomvothecoder Aug 30, 2023
c24895a
Add metrics comparison script
tomvothecoder Aug 30, 2023
a76dfe1
Refactor for loops and functions in `lat_lon_driver.py`
tomvothecoder Sep 6, 2023
ea6c8f3
Add docstrings to functions in `lat_lon_driver.py`
tomvothecoder Sep 6, 2023
179766d
Add comments to metrics comparison script
tomvothecoder Sep 6, 2023
c47e66c
Add code to keep dev and prod results
tomvothecoder Sep 11, 2023
f3187c9
Add all regions to `default_regions_xr.py`
tomvothecoder Sep 11, 2023
86ec317
Update docstrings with references to legacy modules
tomvothecoder Sep 11, 2023
bf4c100
Add type annotations in `formulas.py`
tomvothecoder Sep 11, 2023
baed314
Add and update type annotations and comments
tomvothecoder Sep 11, 2023
f4a69de
Apply suggestions from code review
tomvothecoder Sep 11, 2023
65216e3
Delete `test_climo_xcdat.py`
tomvothecoder Sep 11, 2023
a9bc1b3
Move `setup.cfg` configs to `pyproject.toml`
tomvothecoder Sep 11, 2023
6d216cb
Update pyproject.toml
tomvothecoder Sep 12, 2023
8502eaf
Update pre-commit and mypy configs
tomvothecoder Sep 15, 2023
f824faa
Rename `Dataset.type` to `Dataset.data_type`
tomvothecoder Sep 25, 2023
c02d701
Update `is_time_series` to raise RuntimeError
tomvothecoder Sep 25, 2023
334f644
Address Ryan's PR comments
tomvothecoder Sep 28, 2023
6d595db
Update comments and workspace settings for mypy
tomvothecoder Sep 28, 2023
77bea6f
Address Jill's PR review comments
tomvothecoder Sep 29, 2023
6428949
Update `_get_climo_filepath()` docstring
tomvothecoder Sep 29, 2023
ba830d6
Update order of labels in dev.yml
tomvothecoder Oct 2, 2023
54c43be
Add `dev.yml` with no MPI version of `esmf`
tomvothecoder Oct 2, 2023
9085c33
Update `pre-commit` hooks
tomvothecoder Oct 3, 2023
b34a88c
Update `flake8-isort` version
tomvothecoder Oct 3, 2023
a19e9a4
Simplify metrics APIs by removing unnecessary `serialize` arg
tomvothecoder Oct 3, 2023
e544788
Update `_write_vars_to_netcdf()`
tomvothecoder Oct 3, 2023
5f8d27f
Add tests for `Dataset._get_land_sea_mask()`
tomvothecoder Oct 3, 2023
9af60d0
Refactor `_get_attr_from_climo()`
tomvothecoder Oct 3, 2023
baf4a2e
Refactor `Dataset.get_name_and_yrs_attr()`
tomvothecoder Oct 4, 2023
3363bd9
Add tests to cover `_get_output_dir()` and regridding functions
tomvothecoder Oct 4, 2023
36ed8bf
Silence logger messages for `test_io.py` tests
tomvothecoder Oct 4, 2023
c18c46f
Silence warnings in regridding tests
tomvothecoder Oct 4, 2023
659cd8f
Add excel sheet for metrics comparison
tomvothecoder Oct 4, 2023
7749900
Fix bug introduced in `rmse()` function call
tomvothecoder Oct 4, 2023
ae0d649
Regenerate excel metrics sheet
tomvothecoder Oct 4, 2023
6cfc3fc
Add TODO: for fixing up logic in `lat_lon_driver.run_diags()`
tomvothecoder Oct 5, 2023
e7644a9
Add type annotation for `CoreParameter.regrid_tool`
tomvothecoder Oct 5, 2023
9c6081c
Update `metrics_regression_test.py` script
tomvothecoder Oct 5, 2023
762b493
Update regression test script name with comments
tomvothecoder Oct 9, 2023
bdc02d1
Refactor `_save_and_plot_metrics_dict()` into smaller functions
tomvothecoder Oct 10, 2023
bb7a3f9
Extract `_run_diags_2d` and `_run_diags_3d` from `run_diags`
tomvothecoder Oct 10, 2023
56b8163
Fix `_get_metrics_by_region` not returning values
tomvothecoder Oct 10, 2023
d1badfa
Add legacy logic for `align_grids_to_lower_res()`
tomvothecoder Oct 10, 2023
d530e30
Delete excel sheet
tomvothecoder Oct 10, 2023
cee20e3
Update script comments
tomvothecoder Oct 10, 2023
9bf1dac
Fix regridding tests
tomvothecoder Oct 10, 2023
15 changes: 8 additions & 7 deletions .pre-commit-config.yaml
@@ -4,14 +4,14 @@ fail_fast: true

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml

- repo: https://github.com/psf/black
rev: 22.10.0
rev: 23.9.1
hooks:
- id: black

@@ -23,15 +23,16 @@ repos:
# Need to use flake8 GitHub mirror due to CentOS git issue with GitLab
# https://github.com/pre-commit/pre-commit/issues/1206
- repo: https://github.com/pycqa/flake8
rev: 6.0.0
rev: 6.1.0
hooks:
- id: flake8
args: ["--config=setup.cfg"]
additional_dependencies: [flake8-isort]
additional_dependencies: [flake8-isort==6.1.0]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.991
rev: v1.5.1
hooks:
- id: mypy
args: ["--config=setup.cfg"]
additional_dependencies: [types-pyYAML==6.0.12.6]
args: ["--config=pyproject.toml"]
additional_dependencies:
[dask, numpy>=1.23.0, xarray>=2023.3.0, types-PyYAML]
31 changes: 25 additions & 6 deletions .vscode/e3sm_diags.code-workspace
@@ -8,6 +8,9 @@
"path": ".."
}
],
// ===========================
// VS Code Workspace Settings.
// ===========================
"settings": {
// ===================
// Editor settings
@@ -22,17 +25,14 @@
"editor.rulers": [80, 88, 120],
"editor.wordWrap": "wordWrapColumn",
"editor.wordWrapColumn": 120,
"editor.defaultFormatter": "ms-python.python"
"editor.defaultFormatter": "ms-python.black-formatter"
},
// Code Formatting and Linting
// ---------------------------
"python.formatting.provider": "black",
"python.linting.flake8Enabled": true,
"python.linting.flake8Args": ["--config=setup.cfg"],
"flake8.args": ["--config=setup.cfg"],
// Type checking
// ---------------------------
"python.linting.mypyEnabled": true,
"python.linting.mypyArgs": ["--config=setup.cfg"],
"mypy-type-checker.args": ["--config=pyproject.toml"],
// Testing
// ---------------------------
// NOTE: Debugger doesn't work if pytest-cov is enabled, so set "--no-cov"
@@ -49,5 +49,24 @@
"editor.wordWrap": "wordWrapColumn",
"editor.wordWrapColumn": 120
}
},
// =====================================
// VS Code Python Debugger Configuration
// =====================================
"launch": {
"version": "0.2.0",
"configurations": [
{
"name": "Python: Current File",
"type": "python",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"justMyCode": true,
"env": {
"PYTHONPATH": "${workspaceFolder}"
}
}
]
}
}
171 changes: 171 additions & 0 deletions auxiliary_tools/cdat_regression_test.py
@@ -0,0 +1,171 @@
# %%
"""
This script checks for regressions between the refactored and `main` branches
of a diagnostic set.

How it works
------------
It computes the absolute and relative differences between two sets of
`.json` metrics files in two separate directories, one for the refactored code
and the other for the `main` branch. This script will generate an Excel file
containing:
  1. The raw metrics by each branch for each variable.
  2. The absolute and relative differences of each variable between branches.
  3. The highest relative differences (threshold > 2% difference)

How to use
----------
  1. mamba env create -f conda-env/dev.yml -n e3sm_diags_dev_<gh-issue-#>
  2. mamba activate e3sm_diags_dev_<gh-issue-#>
  3. Update `DEV_PATH` and `PRO_PATH` in `/auxiliary_tools/cdat_regression_test.py`
  4. python auxiliary_tools/cdat_regression_test.py
  5. Excel file generated in `/auxiliary_tools`

Tips
----
Relative differences should be given more weight than absolute differences.
  - Relative differences show the scale using a percentage unit.
  - Absolute differences are just raw numbers that don't factor in the
    magnitude of the values (e.g., 100.00 vs. 0.0001), which can be misleading.
"""
import glob
import logging
import os
import time
from typing import List

import pandas as pd

log_format = (
"%(asctime)s [%(levelname)s]: %(filename)s(%(funcName)s:%(lineno)s) >> %(message)s"
)
logging.basicConfig(format=log_format, filemode="w", level=logging.INFO)
logger = logging.getLogger(__name__)

# TODO: Update DEV_PATH and PRO_PATH.
# ------------------------------------------------------------------------------
DEV_PATH = "/global/cfs/cdirs/e3sm/www/vo13/examples_658/ex1_modTS_vs_modTS_3years/lat_lon/model_vs_model"
PRO_PATH = "/global/cfs/cdirs/e3sm/www/vo13/examples/ex1_modTS_vs_modTS_3years/lat_lon/model_vs_model"
# ------------------------------------------------------------------------------

if not os.path.exists(DEV_PATH):
    raise ValueError(f"DEV_PATH does not exist ({DEV_PATH})")
if not os.path.exists(PRO_PATH):
    raise ValueError(f"PRO_PATH does not exist ({PRO_PATH})")

DEV_GLOB = sorted(glob.glob(DEV_PATH + "/*.json"))
PROD_GLOB = sorted(glob.glob(PRO_PATH + "/*.json"))

TIME_STR = time.strftime("%Y%m%d-%H%M%S")
EXCEL_FILENAME = f"{TIME_STR}-metrics-diffs.xlsx"


def get_metrics(filepaths: List[str]) -> pd.DataFrame:
    """Get the metrics using a glob of `.json` metric files in a directory.

    Parameters
    ----------
    filepaths : List[str]
        The filepaths for metrics `.json` files.

    Returns
    -------
    pd.DataFrame
        The DataFrame containing the metrics for all of the variables in
        the results directory.
    """
    metrics = []

    for filepath in filepaths:
        df = pd.read_json(filepath)

        filename = filepath.split("/")[-1]
        var_key = filename.split("-")[1]

        # Add the variable key to the MultiIndex and update the index
        # before stacking to make the DataFrame easier to parse.
        multiindex = pd.MultiIndex.from_product([[var_key], [*df.index]])
        df = df.set_index(multiindex)
        # NOTE: The result of `stack()` is not assigned, so this call does not
        # change `df`.
        df.stack()

        metrics.append(df)

    df_final = pd.concat(metrics)

    # Reorder columns and drop "unit" column (string dtype breaks Pandas
    # arithmetic).
    df_final = df_final[["test", "ref", "test_regrid", "ref_regrid", "diff", "misc"]]

    return df_final
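
The variable key above comes from the second hyphen-separated token of each metrics filename. The following is a minimal standard-library sketch of that parsing step, not the script's own code. The helper name `parse_var_key` and the example filename are hypothetical. Unlike the script, it strips the extension first and raises a `ValueError` when the filename has no hyphen.

```python
import os


def parse_var_key(filepath: str) -> str:
    """Return the second hyphen-separated token of a metrics filename."""
    filename = os.path.basename(filepath)
    stem, _ext = os.path.splitext(filename)
    tokens = stem.split("-")
    if len(tokens) < 2:
        # Without a hyphen there is no second token to use as the key.
        raise ValueError(f"Unexpected metrics filename: {filename}")
    return tokens[1]


# Hypothetical filename; the layout is an assumption for illustration only.
print(parse_var_key("/tmp/results/ERA5-PRECT-ANN-global.json"))
```

The split is only reliable when the first token itself contains no hyphen, which is worth checking against the actual filenames in the results directories.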


def get_diffs(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Get the metrics differences between two DataFrames.

    Parameters
    ----------
    df_a : pd.DataFrame
        The first DataFrame representing "actual" results (aka development).
    df_b : pd.DataFrame
        The second DataFrame representing "reference" results (aka production).

    Returns
    -------
    pd.DataFrame
        The DataFrame containing absolute and relative differences between
        the metrics DataFrames.
    """
    # Absolute difference: abs(actual - reference)
    df_abs = abs(df_a - df_b)
    df_abs = df_abs.add_suffix("_abs")

    # Relative difference: abs(actual - reference) / abs(actual)
    df_rel = abs(df_a - df_b) / abs(df_a)
    df_rel = df_rel.add_suffix("_rel")

    # Combine both DataFrames
    df_final = pd.concat([df_abs, df_rel], axis=1, join="outer")

    return df_final
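
To illustrate why the script's tips favor relative over absolute differences, here is a small plain-Python sketch of the same two formulas applied to scalar values. The helper names `abs_diff` and `rel_diff` and the sample numbers are made up for illustration. Returning NaN when the "actual" value is zero is an assumption of this sketch, not something the script defines.

```python
import math


def abs_diff(actual: float, reference: float) -> float:
    """Absolute difference: abs(actual - reference)."""
    return abs(actual - reference)


def rel_diff(actual: float, reference: float) -> float:
    """Relative difference: abs(actual - reference) / abs(actual)."""
    if actual == 0:
        # Assumption for this sketch: report NaN instead of dividing by zero.
        return math.nan
    return abs(actual - reference) / abs(actual)


# Two hypothetical metric pairs with roughly the same absolute difference.
pairs = [(100.0001, 100.0000), (0.0002, 0.0001)]
for actual, reference in pairs:
    print(
        f"actual={actual} reference={reference} "
        f"abs={abs_diff(actual, reference):.6f} "
        f"rel={rel_diff(actual, reference):.4%}"
    )
```

Both pairs differ by about 0.0001 in absolute terms, but only the second pair is a large change relative to its own magnitude.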


# %% Get the metrics DataFrames.
df_dev = get_metrics(DEV_GLOB)
df_prod = get_metrics(PROD_GLOB)

# %% Combine metrics DataFrames.
df_dev_pref = df_dev.add_prefix("dev_")
df_prod_pref = df_prod.add_prefix("prod_")
df_metrics = pd.concat([df_dev_pref, df_prod_pref], axis=1, join="outer")
# %%
# Sort the columns
df_metrics = df_metrics[
    [
        "dev_test",
        "prod_test",
        "dev_ref",
        "prod_ref",
        "dev_test_regrid",
        "prod_test_regrid",
        "dev_ref_regrid",
        "prod_ref_regrid",
        "dev_diff",
        "prod_diff",
        "dev_misc",
        "prod_misc",
    ]
]

# %% Get differences between metrics.
df_diffs = get_diffs(df_dev, df_prod)


# %%
with pd.ExcelWriter(EXCEL_FILENAME) as writer:
    df_metrics.to_excel(writer, sheet_name="metrics")
    df_diffs.to_excel(writer, sheet_name="metric_diffs")


# %% Only get the metrics where the absolute and relative differences are
# greater than a specific threshold (>1%)
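
The final cell above is only a comment, with no code after it. As a rough idea of what a threshold filter could look like, here is a plain-Python sketch. It is not the author's implementation. The function name `over_threshold`, the `(variable key, metric name)` key layout, and the sample values are all assumptions for illustration. The 1% default follows the comment above.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: (variable key, metric name) -> relative difference.
Key = Tuple[str, str]


def over_threshold(
    rel_diffs: Dict[Key, float], threshold: float = 0.01
) -> List[Tuple[Key, float]]:
    """Return entries whose relative difference exceeds `threshold`, largest first."""
    flagged = [
        (key, value)
        for key, value in rel_diffs.items()
        # NaN entries (undefined relative difference) are skipped.
        if not math.isnan(value) and value > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up values for illustration only.
example = {
    ("PRECT", "mean"): 0.004,
    ("PRECT", "rmse"): 0.035,
    ("TREFHT", "max"): 0.012,
    ("TREFHT", "min"): math.nan,
}
for key, value in over_threshold(example):
    print(key, f"{value:.1%}")
```

Sorting from largest to smallest puts the most suspicious metrics first, which matches the "highest relative differences" item in the script's docstring.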
2 changes: 2 additions & 0 deletions conda-env/ci.yml
@@ -26,6 +26,8 @@ dependencies:
- numpy >=1.23.0
- shapely >=2.0.0,<3.0.0
- xarray >=2023.02.0
- xcdat >=0.5.0
- xesmf >=0.7.0
# Testing
# ==================
- scipy
61 changes: 61 additions & 0 deletions conda-env/dev-nompi.yml
@@ -0,0 +1,61 @@
# Conda development environment for testing local source code changes to `e3sm_diags` before merging them to production (`master` branch).
# This version contains the no MPI version of `esmf` as a workaround for allowing VS Code's testing API to work.
# The MPI version of `esmf` is usually installed by default, but it breaks VS Code's testing API because it throws a mysterious
# `yaksa` warning.
# More info: https://github.com/E3SM-Project/e3sm_diags/issues/737
name: e3sm_diags_dev_nompi
channels:
- conda-forge/label/xcdat_dev
- conda-forge
- defaults
dependencies:
# Base
# =================
- python >=3.9
- pip
- beautifulsoup4
- cartopy >=0.17.0
- cartopy_offlinedata
- cdp 1.7.0
- cdms2 3.1.5
- cdutil 8.2.1
- dask
- esmf >=8.4.0 nompi*
- esmpy >=8.4.0
- genutil 8.2.1
- lxml
- mache >=0.15.0
- matplotlib-base
- netcdf4
- numpy >=1.23.0
- shapely >=2.0.0,<3.0.0
- xarray >=2023.02.0
- xskillscore >=0.0.20
- xcdat==0.6.0rc1
- xesmf >=0.7.0
# Testing
# =======================
- scipy
- pytest
- pytest-cov
# Documentation
# =======================
- sphinx
- sphinx_rtd_theme
- sphinx-multiversion
# Quality Assurance Tools
# =======================
# Run `pre-commit autoupdate` to get the latest pinned versions of 'rev' in
# `.pre-commit-config.yaml`, then update the pinned versions here.
- black=23.9.1
- flake8=6.1.0
- flake8-isort=6.1.0
- isort=5.12.0
- mypy=1.5.1
- pre-commit >=3.0.0
- types-PyYAML >=6.0.0
# Developer Tools
# =======================
- tbump=6.9.0
- ipykernel
prefix: /opt/miniconda3/envs/e3sm_diags_dev
36 changes: 20 additions & 16 deletions conda-env/dev.yml
@@ -1,11 +1,12 @@
# Conda development environment for testing local source code changes to `e3sm_diags` before merging them to production (`master` branch).
name: e3sm_diags_dev
channels:
- conda-forge/label/xcdat_dev
- conda-forge
- defaults
dependencies:
# Base
# =================
# =======================
- python >=3.9
- pip
- beautifulsoup4
@@ -24,29 +25,32 @@ dependencies:
- numpy >=1.23.0
- shapely >=2.0.0,<3.0.0
- xarray >=2023.02.0
- xskillscore >=0.0.20
- xcdat==0.6.0rc1
- xesmf >=0.7.0
# Testing
# ==================
# =======================
- scipy
- pytest
- pytest-cov
# Documentation
# =================
# =======================
- sphinx
- sphinx_rtd_theme
- sphinx-multiversion
# Quality Assurance Tools
# =======================
# Run `pre-commit autoupdate` to get the latest pinned versions of 'rev' in
# `.pre-commit-config.yaml`, then update the pinned versions here.
- black=23.9.1
- flake8=6.1.0
- flake8-isort=6.1.0
- isort=5.12.0
- mypy=1.5.1
- pre-commit >=3.0.0
- types-PyYAML >=6.0.0
# Developer Tools
# =================
# If versions are updated, also update 'rev' in `.pre-commit.config.yaml`
- black=22.10.0
- flake8=6.0.0
- flake8-isort=5.0.3
- isort=5.11.3
- mypy=0.991
- pre-commit=2.20.0
- pytest=7.2.0
- pytest-cov=4.0.0
- types-PyYAML=6.0.12.6
# Developer Tools
# =================
# =======================
- tbump=6.9.0
- ipykernel
prefix: /opt/miniconda3/envs/e3sm_diags_dev