These notebooks are designed to verify that the time series files we generate are bit-for-bit identical with the history files produced by the model. Right now, the notebooks rely on `diag_metadata.yaml` to determine which variables are compared, which means:

- Only a subset of variables from `pop.h` are checked
- For the 3D fields listed in the YAML file, we only check a subset of the vertical levels
- The other streams (`pop.h.nday1`, `pop.h.nyear1`, `cice.h`, `cice.h1`) are not checked at all

Perhaps a smart parallelization technique would make it feasible to check all variables across all streams?
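For concreteness, the variable list could be pulled from `diag_metadata.yaml` along these lines; the file's actual schema isn't shown in this issue, so the `varname` key below is an assumption:

```python
import yaml

# hypothetical reader: the real diag_metadata.yaml schema may differ
with open("diag_metadata.yaml") as f:
    diag_metadata = yaml.safe_load(f)

# one entry per pop.h variable to compare
vars_to_check = [entry["varname"] for entry in diag_metadata]
```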
As of d604c92 in #29 I am no longer running `da.identical()` to compare data, but I am verifying that time series files exist for every variable in the CESM history files. This is done for all five streams: `pop.h`, `pop.h.nday1`, `pop.h.nyear1`, `cice.h`, and `cice.h1`.
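A minimal sketch of that existence check (the helper names mirror the snippet below and are assumptions on my part, not necessarily what #29 does):

```python
import os

import xarray as xr

# case, year, and open_mfdataset_kwargs come from the notebook context
streams = ["pop.h", "pop.h.nday1", "pop.h.nyear1", "cice.h", "cice.h1"]

for stream in streams:
    ds_hist = xr.open_mfdataset(case.get_history_files(year, stream), **open_mfdataset_kwargs)
    for var in ds_hist.data_vars:
        missing = [
            f
            for f in case.get_timeseries_files(year, stream, var)
            if not os.path.exists(f)
        ]
        if missing:
            print(f"{stream} / {var}: {len(missing)} time series file(s) missing")
```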
I tried running
```python
import xarray as xr

# case, year, and stream are defined earlier in the notebook
history_filenames = case.get_history_files(year, stream)

# open_mfdataset_kwargs: data_vars="minimal", compat="override", coords="minimal", parallel=True
ds_hist = xr.open_mfdataset(history_filenames, **open_mfdataset_kwargs)

# vars_to_check = [var for var in ds_hist.data_vars if "time" in ds_hist[var].coords and var != "time_bound"]
vars_to_check = ["TEMP"]

for var in vars_to_check:
    timeseries_filenames = case.get_timeseries_files(year, stream, var)
    ds_ts = xr.open_mfdataset(timeseries_filenames, **open_mfdataset_kwargs)
    # limiting the comparison to a single level works fine:
    # da_hist = ds_hist[var].isel(z_t=0)
    # da_ts = ds_ts[var].isel(z_t=0)
    # comparing the full 3D field blows memory, even with dask (cluster.scale(12))
    da_hist = ds_hist[var]
    da_ts = ds_ts[var]
    if da_hist.identical(da_ts):
        print(f"{var} is the same in history and time series")
    else:
        print(f"{var} is DIFFERENT in history and time series")
```
and, as the inline comments indicate, comparing the full 3D field blew memory even with `cluster.scale(12)`, while comparing a single level was fine in serial or parallel. In fact, I saw modest performance gains from running in parallel:
```
with isel(z_t=0)
----
Parallel, cluster.scale(n=8):
CPU times: user 4.28 s, sys: 92.3 ms, total: 4.38 s
Wall time: 16.4 s

Serial:
CPU times: user 19.7 s, sys: 3.17 s, total: 22.9 s
Wall time: 25.1 s
```
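One possible way to make the full-field check tractable (a sketch on my part, not something tested here) is to keep the bit-for-bit comparison but feed it one vertical level at a time, so dask only ever materializes 2D slices:

```python
import xarray as xr

def identical_by_level(da_hist: xr.DataArray, da_ts: xr.DataArray, dim: str = "z_t") -> bool:
    """Bit-for-bit comparison of two DataArrays, one level along `dim` at a time."""
    if dim not in da_hist.dims:  # 2D field: compare directly
        return da_hist.identical(da_ts)
    if dict(da_hist.sizes) != dict(da_ts.sizes):
        return False
    return all(
        da_hist.isel({dim: k}).identical(da_ts.isel({dim: k}))
        for k in range(da_hist.sizes[dim])
    )
```

Since each level is independent, the per-level checks could also be farmed out to dask workers, which may be the kind of parallelization needed to cover all variables across all streams.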