Efficiently reading 1-d datasets, and more… #37

samsrabin · 2021-11-01T18:24:48Z

Over the past few weeks, I've been working on Python functions to efficiently read 1-d (i.e., not lat-lon gridded) CTSM outputs. I figured these might be useful to the wider community. I'm sure people have developed their own functions outside this repo, so I'm happy to take suggestions for improvements!

Some highlights:

import_ds(). This is the big one. Given a list of files (or a single file), it reads and concatenates them all along the time dimension. Efficiency is achieved by specifying a list of variables and/or vegetation types to import (as optional arguments myVars and myVegtypes, respectively; leave off arguments to import everything). Anything not listed in one of those will not be read into memory. Helps with concatenating monthly history files #32.
grid_one_variable(). Makes a geographically gridded DataArray (with dimensions time, vegetation type [as string], lat, lon) of one variable within a Dataset. Optionally subset by time index (integer) or slice() to improve efficiency; there's no need to grid your entire timeseries if you only need to make a map of one timestep!
xr_flexsel(). Subsets an xarray object (Dataset or DataArray) along the time and/or patch dimension (see caveat below) using single integer indices, strings (for dates/times), or slices thereof. A more flexible selection method than either .sel() or .isel(), which index by coordinate label and by integer position, respectively.
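To illustrate the two ideas above (read only the variables you ask for, and pick .isel() or .sel() based on the type of the selector), here is a minimal sketch. It is not the code in this PR: keep_vars(), flexsel_time(), toy_file(), and the variable names VAR_A and VAR_B are all made up for the example, and small in-memory Datasets stand in for history files. With real files, a function like keep_vars() could be passed as the preprocess argument of xr.open_mfdataset().

```python
import numpy as np
import pandas as pd
import xarray as xr


def keep_vars(ds, my_vars=None):
    """Hypothetical helper: keep only the named data variables (None keeps all)."""
    if my_vars is None:
        return ds
    return ds[list(my_vars)]


def flexsel_time(obj, time=None):
    """Hypothetical helper: subset along time by integer position or date string."""
    if time is None:
        return obj
    # Inspect the slice bounds (or the bare value) to decide how to index.
    bounds = [time.start, time.stop] if isinstance(time, slice) else [time]
    bounds = [b for b in bounds if b is not None]
    if all(isinstance(b, (int, np.integer)) for b in bounds):
        return obj.isel(time=time)  # positional selection
    if all(isinstance(b, str) for b in bounds):
        return obj.sel(time=time)  # label-based selection
    raise TypeError("time must be an int, a str, or a slice of one of those")


def toy_file(start):
    """Stand-in for one history file: 3 monthly timesteps x 4 patches."""
    times = pd.date_range(start, periods=3, freq="MS")
    data = np.arange(12.0).reshape(3, 4)
    return xr.Dataset(
        {"VAR_A": (("time", "patch"), data), "VAR_B": (("time", "patch"), -data)},
        coords={"time": times},
    )


# Drop unwanted variables from each piece, then concatenate along time.
pieces = [toy_file("2000-01-01"), toy_file("2000-04-01")]
ds = xr.concat([keep_vars(p, ["VAR_A"]) for p in pieces], dim="time")

first_step = flexsel_time(ds["VAR_A"], 0)  # single integer index
first_two = flexsel_time(ds["VAR_A"], slice(0, 2))  # slice of integers
spring = flexsel_time(ds["VAR_A"], slice("2000-03-01", "2000-05-01"))  # slice of dates
```

Note that integer slices exclude the stop index while label slices include both endpoints, which is standard xarray behavior and worth keeping in mind when mixing the two styles.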

One big caveat: My functions rename the pft dimension, and all like-named variables (e.g., pft1d_itype_veg_str), to use patch instead. For compatibility, this can later be reversed using my patch2pft() function.
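For anyone wondering what that rename and its reversal might look like, here is a rough sketch using Dataset.rename(). It is an assumption about the approach, not the actual patch2pft() implementation: swap_names() is a hypothetical helper, the substring-replacement rule is a guess at what "like-named" means, and VAR_A and the veg_* strings are placeholder data.

```python
import numpy as np
import xarray as xr


def swap_names(ds, old, new):
    """Hypothetical helper: rename every dimension and variable containing `old`."""
    names = set(ds.sizes) | set(ds.variables)
    mapping = {name: name.replace(old, new) for name in names if old in name}
    return ds.rename(mapping)


# Toy dataset with a "pft" dimension and one pft-named variable.
ds = xr.Dataset(
    {
        "VAR_A": (("time", "pft"), np.zeros((2, 3))),
        "pft1d_itype_veg_str": (("pft",), np.array(["veg_a", "veg_b", "veg_c"])),
    }
)

renamed = swap_names(ds, "pft", "patch")  # pft -> patch
restored = swap_names(renamed, "patch", "pft")  # and back again
```

The round trip only restores the original names if nothing in the file already contained "patch" before the first rename; a real implementation would need to guard against that.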

See notebooks/SamRabin_examples.ipynb for some simple examples of how to use my functions.


review-notebook-app · 2021-11-01T18:24:51Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

wwieder

Thanks for contributing this PR @samsrabin! I think we should bring this into the repo for others to use / improve. @danicalombardozzi how does this functionality compare to efforts you've made for a similar tool?

samsrabin added 30 commits October 19, 2021 11:55

Added Jupyter notebook and regular Python script.

50eb92f

Functionized extraction of a variable to DataArray.

f008f52

Added cell: Print sowing and harvest date arrays for each crop.

9d51124

1d: Reworked variable plotting.

b2f88d5

Added 2d_crop_work.py.

bfbfd74

Added import_ds_from_filelist() to utils.py.

96413c0

Import a dataset that's spread over multiple files, only including specified variables. Concatenates by time.

myVars now optional in import_ds_from_filelist().

2f25d49

If unspecified, will import all variables.

Read in ALL dimensions in import_ds_from_filelist().

40ac1e0

Added to utils.py: List of PFTs used in CLM (pftlist).

f6e018a

Added to utils.py: function to get PFT of each patch as integer and string.

19a9dda

import_ds_from_filelist() now also returns vegtypes dictionary.

f3c34a0

Added to utils.py: function get_thisVar_da().

20fbeef

Return a DataArray, with defined coordinates (PFT as string), for a given variable in a dataset.

Added to utils.py: is_this_mgd_crop().

febd079

Given a PFT, returns False if it's a tree, grass, shrub, unmanaged, or not vegetated. True otherwise.

Added to utils.py: is_each_mgd_crop().

1c9ce06

Given a list of PFTs, returns a list with True for managed crops and False otherwise.

Added to utils.py: trim_to_mgd_crop().

36d62d8

Given a DataArray, remove all PFTs except managed crops.

Added to utils.py: grid_one_timestep().

e6ad064

Make a geographically gridded DataArray (with PFT dimension) of one timestep in a given variable within a DataSet.

Updated 2d_crop_work.py to use new functions in utils.py.

3033062

Removed managed-crop restriction step from grid_one_timestep().

3083724

grid_one_timestep() is now grid_one_variable().

d0f043e

Instead of requiring one timestep specified by an integer, now allows (optionally) integer, str, or slice of either.

Changes to 1d_crop_work.ipynb.

af96ea9

Start of comparing read-in sowing dates in 1d script.

6d7a6a1

Added clm_yield_conv.ipynb.

d9ba88f

Map now uses an actual crop.

4e1c235

import_ds_from_filelist() now renames dimension "pft" to "patch".

ee42569

Along with all pft-named variables.

Commented out bit about dates in matplotlib format.

c57e0af

import_ds_from_filelist() now just import_ds(): Can provide just 1 file.

d564668

import_ds() now avoids expanding dimensions where unnecessary.

3ee2b8a

Cleaning up 2d_crop_work.py.

d360d84

pftlist now produced by a function.

b730936

samsrabin added 15 commits October 29, 2021 17:09

Generalized functions to find matching (or NOT matching) vegtypes.

e1fb811

Moved is_*_vegtype() functions.

d4e21b7

import_ds() can now handle specified exact vegtype names to import. INCOMPLETE. Need to add handling of vegtype "names" when specified as (slice of) integers.

f8fc966

vegtype selection in xr_flexsel() now uses integers for efficiency.

4e4df29

xr_flexsel() can now handle more types of vegtype input.

797706f

Integer, list of integers, or list of booleans. Also improved efficiency when specifying myVegtypes in xr.open_mfdataset() in import_ds().

Added function define_mgdcrop_list().

7926263

Returns the subset of CLM pft names that are managed crops.

To-do/comment changes re: xr_flexsel().

fd081f4

Moved and improved description of check_sel_type().

78b403f

import_ds() now ensures filelist is sorted.

11c232f

As suggested by @andersy005 in NCAR#32 (NCAR#32 (comment)).

Added function patch2pft() to restore original "pft" dim/var names.

d28885c

Correction to call of trim_da_to_mgd_crop().

2e8a967

Commenting improvements.

713c758

is_this_vegtype() now checks data type of this_vegtype.

453d0d0

Added SamRabin_examples notebook.

79b27d7

Moved dev scripts into ignore/ directory.

0be1e01

wwieder approved these changes Nov 2, 2021


wwieder merged commit 2730c78 into NCAR:master Nov 2, 2021
