Efficiently reading 1-d datasets, and more… #37
Import a dataset that's spread over multiple files, only including specified variables. Concatenates by time.
If unspecified, will import all variables.
Return a DataArray, with defined coordinates (PFT as string), for a given variable in a dataset.
Given a PFT, returns False if it's a tree, grass, shrub, unmanaged, or not vegetated. True otherwise.
Given a list of PFTs, returns a list with True for managed crops and False otherwise.
Given a DataArray, remove all PFTs except managed crops.
Make a geographically gridded DataArray (with PFT dimension) of one timestep in a given variable within a Dataset.
Instead of requiring one timestep specified by an integer, now allows (optionally) integer, str, or slice of either.
Along with all pft-named variables.
Flexibly subset from an xarray Dataset or DataArray, to avoid having to choose between .sel() and .isel(). Selections can be individual values or slices. Similar to what was already in grid_one_variable(), but can also take a selection of vegtypes (not yet tested).
…NCOMPLETE. Need to add handling of vegtype "names" when specified as (slice of) integers.
Integer, list of integers, or list of booleans. Also improved efficiency when specifying myVegtypes in xr.open_mfdataset() in import_ds().
Returns the subset of CLM pft names that are managed crops.
As suggested by @andersy005 in NCAR#32 (comment).
Thanks for contributing this PR @samsrabin! I think we should bring this into the repo for others to use / improve. @danicalombardozzi how does this functionality compare to efforts you've made for a similar tool?
Over the past few weeks, I've been working on Python functions to efficiently read 1-d (i.e., not lat-lon gridded) CTSM outputs. I figured these might be useful to the wider community. I'm sure people have developed their own functions outside this repo, so I'm happy to take suggestions for improvements!
Some highlights:
- `import_ds()`. This is the big one. Given a list of files (or a single file), it reads and concatenates them all along the time dimension. Efficiency is achieved by specifying a list of variables and/or vegetation types to import (as optional arguments `myVars` and `myVegtypes`, respectively; leave off these arguments to import everything). Anything not listed in one of those will not be read into memory. Helps with #32 (Concatenating monthly history files).
- `grid_one_variable()`. Makes a geographically gridded DataArray (with dimensions time, vegetation type [as string], lat, lon) of one variable within a Dataset. Optionally subset by time index (integer) or `slice()` to improve efficiency; there's no need to grid your entire timeseries if you only need to make a map of one timestep!
- `xr_flexsel()`. Subsets an `xarray` object (Dataset or DataArray) along the `time` and/or `patch` dimension (see caveat below) using single integer indices, strings (for dates/times), or slices thereof. This is more flexible than either `.sel()` or `.isel()`, which select only by label (e.g., date strings) or only by integer position, respectively.

One big caveat: My functions rename the `pft` dimension, and all like-named variables (e.g., `pft1d_itype_veg_str`), to be named like `patch`. For compatibility, this can later be reversed using my `patch2pft()` function.

See `notebooks/SamRabin_examples.ipynb` for some simple examples of how to use my functions.
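The core idea behind a flexible selector like `xr_flexsel()` can be sketched as a type-based dispatch: integers (or slices of integers) go to `.isel()`, and anything else (date strings, slices of them) goes to `.sel()`. This is a hypothetical illustration under that assumption, not the PR's function; `flexsel_sketch()` and `_is_positional()` are made-up names, and the sketch does not handle vegetation-type names.

```python
import numpy as np
import xarray as xr


def _is_positional(sel):
    """True if a selection should be routed to .isel() (an int, or a slice of ints)."""
    if isinstance(sel, slice):
        bounds = [b for b in (sel.start, sel.stop) if b is not None]
        return all(isinstance(b, (int, np.integer)) for b in bounds)
    return isinstance(sel, (int, np.integer))


def flexsel_sketch(obj, **selections):
    """Hypothetical: apply each dim=selection with .isel() or .sel() as appropriate."""
    for dim, sel in selections.items():
        obj = obj.isel({dim: sel}) if _is_positional(sel) else obj.sel({dim: sel})
    return obj
```

For example, `flexsel_sketch(ds, time=slice("2000-01-01", "2000-12-31"), patch=0)` would take one year by label and the first patch by position in a single call. Note that label slices passed to `.sel()` include both endpoints, while integer slices passed to `.isel()` exclude the stop index.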
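Gridding a 1-d variable, as `grid_one_variable()` does, amounts to scattering each patch's values into a (vegetation type, lat, lon) cell. The sketch below shows one way to do that with NumPy advanced indexing. It is an assumption-laden illustration, not the PR's implementation: `grid_patch_variable_sketch()` is hypothetical, it assumes the caller passes per-patch 1-based (Fortran-style) longitude/latitude indices and integer vegetation types (in CTSM files these would come from variables such as `pfts1d_ixy`, `pfts1d_jxy`, and `pfts1d_itype_veg`), and it labels vegetation types with integers rather than strings.

```python
import numpy as np
import xarray as xr


def grid_patch_variable_sketch(da, ixy, jxy, itype_veg, nlon, nlat):
    """Hypothetical: scatter a (time, patch) DataArray onto (time, vegtype, lat, lon)."""
    itype_veg = np.asarray(itype_veg)
    vegtypes = np.unique(itype_veg)  # sorted unique vegetation types present
    v_idx = np.searchsorted(vegtypes, itype_veg)  # position of each patch's vegtype
    # Assumes 1-based (Fortran-style) grid indices; convert to 0-based.
    i_idx = np.asarray(ixy) - 1
    j_idx = np.asarray(jxy) - 1

    values = da.transpose("time", "patch").values
    # Cells with no patch of a given vegtype stay NaN.
    out = np.full((values.shape[0], vegtypes.size, nlat, nlon), np.nan)
    # Advanced indexing: each patch writes its whole time series into one grid cell.
    out[:, v_idx, j_idx, i_idx] = values
    return xr.DataArray(
        out,
        dims=("time", "vegtype", "lat", "lon"),
        coords={"time": da["time"].values, "vegtype": vegtypes},
    )
```

The output array has size time × vegtypes × lat × lon, so subsetting `da` in time before gridding keeps memory use down, which is the same efficiency argument made for `grid_one_variable()` above.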
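Reversing the `pft` → `patch` renaming described in the caveat can be done with a single `Dataset.rename()` call, since it accepts variable, coordinate, and dimension names together. The following is a hypothetical sketch of what a helper like `patch2pft()` might do for a Dataset, not its actual code; `patch2pft_sketch()` is a made-up name, and its plain substring replacement assumes no unrelated names contain "patch".

```python
import xarray as xr


def patch2pft_sketch(ds):
    """Hypothetical: rename every 'patch'-named dimension/variable back to 'pft'."""
    # Collect variable, coordinate, and dimension names; the set removes duplicates.
    names = set(ds.variables) | set(ds.sizes)
    renames = {name: name.replace("patch", "pft") for name in names if "patch" in name}
    return ds.rename(renames)
```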