Efficiently reading 1-d datasets, and more… #37

Merged
merged 45 commits into from
Nov 2, 2021


@samsrabin (Member) commented Nov 1, 2021

Over the past few weeks, I've been working on Python functions to efficiently read 1-d (i.e., not lat-lon gridded) CTSM outputs. I figured these might be useful to the wider community. I'm sure people have developed their own functions outside this repo, so I'm happy to take suggestions for improvements!

Some highlights:

  • import_ds(). This is the big one. Given a list of files (or a single file), it reads them and concatenates them along the time dimension. It gains efficiency by letting you specify which variables and/or vegetation types to import (optional arguments myVars and myVegtypes, respectively; omit them to import everything). Anything not listed is never read into memory. Helps with concatenating monthly history files #32.
  • grid_one_variable(). Makes a geographically gridded DataArray (with dimensions time, vegetation type [as string], lat, lon) of one variable within a Dataset. Optionally subset by time index (integer) or slice() to improve efficiency: there's no need to grid your entire time series if you only need a map of one timestep!
  • xr_flexsel(). Subsets an xarray object (Dataset or DataArray) along the time and/or patch dimension (see caveat below) using single integer indices, strings (for dates/times), or slices of either. This is a more flexible selection method than xarray's .sel() or .isel(), which take labels (such as date strings) and integer positions, respectively.
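
To illustrate the idea behind grid_one_variable(), here is a minimal pure-Python sketch of scattering 1-d per-patch values onto a vegetation type by lat by lon grid. It is not the code from this PR: the function name grid_patches and its arguments are hypothetical, and it assumes each patch carries 1-based longitude and latitude grid indices plus an integer vegetation-type index. Cells with no patch stay NaN.

```python
import math

def grid_patches(values, ixy, jxy, vegtype, n_lat, n_lon, n_veg):
    """Scatter 1-d per-patch values onto a [vegtype][lat][lon] nested list.

    ixy and jxy are assumed to be 1-based longitude and latitude indices
    (Fortran-style); vegtype is a 0-based vegetation-type index.
    """
    # Start with an all-NaN grid so cells without a patch are clearly missing.
    grid = [[[math.nan] * n_lon for _ in range(n_lat)] for _ in range(n_veg)]
    for value, i, j, v in zip(values, ixy, jxy, vegtype):
        grid[v][j - 1][i - 1] = value  # convert 1-based indices to 0-based
    return grid

# Three patches on a 2 x 2 grid with two vegetation types.
gridded = grid_patches(
    values=[1.0, 2.0, 3.0],
    ixy=[1, 2, 2],
    jxy=[1, 1, 2],
    vegtype=[0, 0, 1],
    n_lat=2, n_lon=2, n_veg=2,
)
```

Gridding only the timesteps you need keeps both this loop and the resulting array small, which is the motivation for the optional time subsetting.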

One big caveat: My functions rename the pft dimension and all like-named variables (e.g., pft1d_itype_veg_str) to use patch instead. For compatibility, this can later be reversed using my patch2pft() function.
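
A rough sketch of how such a rename can be kept reversible. It assumes a plain substring replacement of pft with patch, which may not match what the functions in this PR do in every case; the helper names are made up for illustration. The resulting dictionaries are in the form that xarray's Dataset.rename() accepts.

```python
def pft_to_patch_names(names):
    """Map every name containing 'pft' to the same name with 'patch' instead."""
    return {name: name.replace("pft", "patch") for name in names if "pft" in name}

def invert_names(mapping):
    """Swap keys and values so the rename can be undone exactly."""
    inverse = {new: old for old, new in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("rename is not reversible: two names share a target")
    return inverse

# Dimension and variable names as they might appear in a 1-d output file.
names = ["pft", "pft1d_itype_veg_str", "time", "lat"]
forward = pft_to_patch_names(names)
backward = invert_names(forward)
# With an xarray Dataset ds, one would then call ds.rename(forward)
# and, later, ds.rename(backward) to restore the original names.
```

Storing the forward mapping and inverting it, instead of replacing patch with pft on the way back, avoids touching any name that contained patch to begin with.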

See notebooks/SamRabin_examples.ipynb for some simple examples of how to use my functions.

Import a dataset that's spread over multiple files, only including specified variables. Concatenates by time.
If unspecified, will import all variables.
Return a DataArray, with defined coordinates (PFT as string), for a given variable in a dataset.
Given a PFT, returns False if it's a tree, grass, shrub, unmanaged, or not vegetated. True otherwise.
Given a list of PFTs, returns a list with True for managed crops and False otherwise.
Given a DataArray, remove all PFTs except managed crops.
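
The managed-crop test described in the commits above can be sketched as a substring check. This is a guess at the rule based on the commit messages, not the merged code; the function names and the example PFT names are illustrative assumptions.

```python
# Substrings that mark a PFT as NOT a managed crop (from the commit message).
NOT_MANAGED_CROP = ("tree", "grass", "shrub", "unmanaged", "not_vegetated")

def is_managed_crop(pft_name):
    """Return True unless the name contains one of the non-crop markers."""
    return not any(marker in pft_name for marker in NOT_MANAGED_CROP)

def managed_crop_mask(pft_names):
    """Return one boolean per PFT name: True for managed crops."""
    return [is_managed_crop(name) for name in pft_names]

# Hypothetical PFT names, used only to exercise the functions.
example_pfts = [
    "not_vegetated",
    "needleleaf_evergreen_temperate_tree",
    "c3_arctic_grass",
    "temperate_corn",
    "spring_wheat",
]
mask = managed_crop_mask(example_pfts)
```

A boolean list like this can then be used to keep only the managed-crop entries along the PFT dimension of a DataArray.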
Make a geographically gridded DataArray (with PFT dimension) of one timestep in a given variable within a Dataset.
Instead of requiring one timestep specified by an integer, now allows (optionally) an integer, a str, or a slice of either.
Flexibly subset an xarray Dataset or DataArray, to avoid having to choose between .sel() and .isel(). Selections can be individual values or slices. Similar to what was already in grid_one_variable(), but can also take a selection of vegtypes (not yet tested).
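
A minimal sketch of the dispatch idea, assuming integers mean positions and strings mean labels. The name flexible_select is hypothetical and this is not the implementation of xr_flexsel(); it relies only on the object having .isel() and .sel() methods that accept a dict of indexers, as xarray Datasets and DataArrays do.

```python
from numbers import Integral

def _method_for(value):
    """Return 'isel' for integer positions and 'sel' for string labels."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("bool is not a valid selection")
    if isinstance(value, Integral):
        return "isel"
    if isinstance(value, str):
        return "sel"
    raise TypeError(f"unsupported selection type: {type(value).__name__}")

def flexible_select(obj, dim, selection):
    """Subset obj along dim, choosing .isel() or .sel() from the selection type."""
    if isinstance(selection, slice):
        bounds = [b for b in (selection.start, selection.stop) if b is not None]
        methods = {_method_for(b) for b in bounds}
        if len(methods) > 1:
            raise TypeError("slice bounds must be all integers or all strings")
        # An open slice such as slice(None) selects everything either way.
        method = methods.pop() if methods else "isel"
    else:
        method = _method_for(selection)
    return getattr(obj, method)({dim: selection})

# Example calls, given an xarray Dataset ds with a time dimension:
#   flexible_select(ds, "time", 0)
#   flexible_select(ds, "time", slice("2000-01-01", "2000-12-31"))
```

One thing to keep in mind with any wrapper like this: in xarray, label-based slices passed to .sel() include both endpoints, while integer slices passed to .isel() exclude the stop index, so the two kinds of slice are not interchangeable.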

@wwieder (Collaborator) left a comment

Thanks for contributing this PR @samsrabin! I think we should bring this into the repo for others to use / improve. @danicalombardozzi how does this functionality compare to efforts you've made for a similar tool?

@wwieder merged commit 2730c78 into NCAR:master on Nov 2, 2021