-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xarray wrapper for linear_regression #119
Comments
Nice. My thoughts (but happy to go the other way if you'd prefer):
|
Reading a bit more about ufunc, perhaps not using ufunc is the better choice. The docs (https://numpy.org/doc/stable/reference/ufuncs.html) say, "A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion". Our regression doesn't operate element by element, it operates 'all at once'. That suggest to me that ufunc might actually be inappropriate in this case (on the other hand maybe it doesn't matter that we're doing all at once, using xarray's ufunc just does the co-ordinate handling for us and we shouldn't get too worried about whether it's the exact perfect use case or not). |
Agree - we should not get hung up on the semantics. Having said that - it might be a generalized ufunc, which operates vector-wise. Or maybe not - because it only works only along a very specific axis... Thanks for the feedback - next round:
from typing import Optional, Mapping
def linear_regression_xr(
predictors: Mapping[str, xr.DataArray] = None,
target: xr.DataArray = None,
dim: str = None,
weights: Optional[xr.DataArray] = None,
):
# TODO: check predictors are 1D & contain `dim`
# TODO: check target is 2D and contains `dim`
# TODO: check weights is 1D and contains `dim`
predictors_concat = xr.concat(tuple(predictors.values()), dim="predictor")
target_dim = list(set(tgt.dims) - {dim})[0]
out = linear_regression(
predictors_concat.transpose(dim, "predictor"),
target.transpose(dim, target_dim),
weights,
)
keys = ["intercept"] + list(predictors)
dataarrays = {key: (target_dim, out[:, i]) for i, key in enumerate(keys)}
out = xr.Dataset(dataarrays, coords=target.coords).drop(dim)
if weights is not None:
out["weights"] = weights
return out Example: linear_regression_xr(predictors={"tas": pred0}, target=tgt, dim="time")
linear_regression_xr(predictors={"tas": pred0, "hfds": pred1}, target=tgt, dim="time")
linear_regression_xr(
predictors={"tas": pred0, "hfds": pred1},
target=tgt,
weights=weights,
dim="time",
) |
Nice, I think that is what it is (I'm not sure it matters that it's meant to be used on a specific axis, it could in theory be used on any axis).
I agree. Passing DataSets is probably better, particularly for unit handling.
This would be solved if we used DataSets instead right? |
I also think
Yes - would you pass all predictors in one Dataset or as several Datasets? I have a slight preference to pass every predictor individually. predictors={"tas": pred0, "hfds": pred1}
predictors=xr.Dataset({"tas": pred0, "hfds": pred1})
Shouldn't the unit attribute be on the DataArray? Or why is it better for unit handling? |
Yes in my head we would tell it which variables to use
One dataset was what I was thinking (just so they share co-ordinates...)
Yes, ignore me |
#116 introduces a numpy version for
linear_regression
- as discussed with @znicholls this morning/ evening we should add a xarray wrapper for this. I created a prototype for this but there are two open decisions:intercept
be a separateDataArray
xr.apply_ufunc
or not?proof of concept implementation
create test data
Without
xr.apply_ufunc
:With
xr.apply_ufunc
:observations
xr.apply_ufunc
requires less code - needs to be tested how easy it works when using more real world coordsestimate
function.edit: renamed the test variables
The text was updated successfully, but these errors were encountered: