Add public API for Dataset._copy_listed #3894
Comments
Would that be different from ensuring the input is a list?
I very much empathize with the pain from methods being type unstable; indeed, I think that's one of the biggest benefits of xarray over pandas. Here, though, it's stable for inputs of the same type: if supplied with a list, it returns a Dataset; otherwise, it returns a DataArray (or am I missing something?).
Is there a way in mypy we could use something like
What's the reasoning for not returning a Dataset when
I agree, this API is too overloaded. It would be better to have an explicit method for subsetting In early versions of xarray (back when it was called xray), we actually had a
The current check uses hashability to determine whether to try to make a DataArray. In theory, you could put a variable with the name
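A minimal sketch of the hashability-based dispatch described above. It uses a hypothetical ToyDataset class in place of xarray.Dataset (this is not xarray's actual code) and shows why a tuple of names is looked up as one variable name while a list selects several variables. Dict-like keys, which xarray routes to positional indexing, are not modeled.

```python
def is_hashable(value: object) -> bool:
    """Return True if hash(value) succeeds, roughly the check described above."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


class ToyDataset:
    """Hypothetical stand-in for xarray.Dataset: maps variable names to values."""

    def __init__(self, variables):
        self.variables = dict(variables)

    def __getitem__(self, key):
        # A hashable key is treated as ONE variable name
        # (xarray would build a DataArray here).
        if is_hashable(key):
            return self.variables[key]
        # Anything else is treated as a collection of names
        # (xarray would return a new Dataset here).
        return ToyDataset({name: self.variables[name] for name in key})


ds = ToyDataset({"a": 1, "b": 2, "c": 3})
print(type(ds[["a", "b"]]).__name__)  # ToyDataset
print(ds["a"])  # 1
try:
    ds[("a", "b")]  # a tuple is hashable, so it is looked up as a single name
except KeyError as err:
    print("KeyError:", err)  # KeyError: ('a', 'b')
```

The tuple case is the trap: the key is a perfectly reasonable collection of names, but because it hashes, it never reaches the subsetting branch.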
That is correct. The output type is predictable from the input types. With #4144,
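One way to make that input-to-output mapping visible to mypy is typing.overload. The sketch below uses hypothetical ToyDataset and ToyDataArray classes rather than xarray's real annotations, which may differ. It only illustrates how overloads tie the return type of __getitem__ to the type of the key.

```python
from collections.abc import Hashable
from typing import List, Union, overload


class ToyDataArray:
    """Hypothetical stand-in for a single variable."""

    def __init__(self, name: Hashable) -> None:
        self.name = name


class ToyDataset:
    """Hypothetical stand-in for a dataset; only the signatures matter here."""

    def __init__(self, names: List[Hashable]) -> None:
        self.names = list(names)

    # For a type checker: a list key yields a dataset ...
    @overload
    def __getitem__(self, key: List[Hashable]) -> "ToyDataset": ...

    # ... and any other hashable key yields a single variable.
    @overload
    def __getitem__(self, key: Hashable) -> ToyDataArray: ...

    def __getitem__(
        self, key: Union[List[Hashable], Hashable]
    ) -> Union["ToyDataset", ToyDataArray]:
        # Runtime dispatch that matches the overloads above.
        if isinstance(key, list):
            missing = [k for k in key if k not in self.names]
            if missing:
                raise KeyError(missing)
            return ToyDataset(key)
        if key not in self.names:
            raise KeyError(key)
        return ToyDataArray(key)


ds = ToyDataset(["a", "b", "c"])
subset = ds[["a", "b"]]  # mypy infers ToyDataset
single = ds["a"]  # mypy infers ToyDataArray
```

In this sketch a tuple of names matches the Hashable overload, so a type checker would infer a single-variable result: the type is predictable, but it is not what someone passing a tuple of names usually intends.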
I agree.
Or maybe "get", since it's a synonym of "select" that isn't overloaded with spatial indexing in the codebase.
NVM,
We did a similar splitting of functionality recently with So this would leave us with:
The naming doesn't have an obvious pattern here, which seems non-ideal. I can't think of anything much better at the moment, but perhaps it would help to avoid reusing
I think avoiding how about
I do think having a I recognize that a hashable iterable (e.g.
A level down, re the name: I thought
Would this proposal mean that subsetting variables with
IIUC, what we're discussing here is adding a new method that treats all sequences the same (
At most, I would require using the new method if you want your code to type-check properly.
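A sketch of what an explicit, type-stable subsetting method could look like. The method name select_variables and the ToyDataset class are hypothetical placeholders, since the thread has not settled on a name. The point is that every iterable of names is handled the same way and the declared return type never changes, so a type checker can rely on it.

```python
from collections.abc import Hashable, Iterable
from typing import Dict, List


class ToyDataset:
    """Hypothetical stand-in for xarray.Dataset holding name -> value pairs."""

    def __init__(self, variables: Dict[Hashable, object]) -> None:
        self.variables = dict(variables)

    def select_variables(self, names: Iterable[Hashable]) -> "ToyDataset":
        """Hypothetical type-stable subset method (the name is illustrative only).

        Lists, tuples, sets, and generators are all treated identically,
        and the result is always a ToyDataset.
        """
        if isinstance(names, (str, bytes)):
            # A bare string is itself iterable; reject it rather than split it.
            raise TypeError("pass a collection of names, not a single string")
        name_list: List[Hashable] = list(names)  # materialize once
        missing = [n for n in name_list if n not in self.variables]
        if missing:
            raise KeyError(f"variables not found: {missing}")
        return ToyDataset({n: self.variables[n] for n in name_list})


ds = ToyDataset({"a": 1, "b": 2, "c": 3})
from_list = ds.select_variables(["a", "b"])
from_tuple = ds.select_variables(("a", "b"))  # same result as the list
```

Rejecting a bare string is one possible design choice; a real API would need to decide how to treat hashable iterables such as strings and tuples.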
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the
I think the issue is still valid; we just couldn't think of what to name the new API.
In my data pipelines, I have been repeatedly burned by using indexing notation to grab a few variables from a dataset in the following way:
Moreover, because Dataset.__getitem__ is type unstable, this kind of error is hard to detect with mypy, so it often surfaces 30 minutes into a long data pipeline. It would be great to have a type-stable method that takes any sequence of variable names and returns a Dataset consisting only of those variables and their coordinates. In fact, this method already exists, but it is currently not public API. Could we make it so? Thanks.
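To make "those variables and their coordinates only" concrete, here is a simplified, hypothetical model in which each variable is reduced to its dimension names. It assumes a coordinate is kept when all of its dimensions are used by the selected variables; that rule is an assumption about how the private Dataset._copy_listed behaves, so check the xarray source for the exact behavior (for example, its handling of virtual variables).

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical simplified model: every name maps to its dimension names.
Dims = Tuple[str, ...]


def subset_with_coords(
    data_vars: Dict[str, Dims],
    coords: Dict[str, Dims],
    names: Iterable[str],
) -> Tuple[Dict[str, Dims], Dict[str, Dims]]:
    """Return (selected data variables, coordinates kept alongside them)."""
    # Raises KeyError if a requested name is not a data variable.
    selected = {name: data_vars[name] for name in names}
    # Collect the dimensions actually used by the selected variables.
    needed_dims: Set[str] = set()
    for dims in selected.values():
        needed_dims.update(dims)
    # Keep a coordinate only if every one of its dimensions is still in use;
    # a scalar coordinate (no dimensions) trivially satisfies this.
    kept = {name: dims for name, dims in coords.items() if set(dims) <= needed_dims}
    return selected, kept


data_vars = {"temp": ("time", "lat"), "station_id": ("station",)}
coords = {"time": ("time",), "lat": ("lat",), "station": ("station",), "height": ()}
selected, kept = subset_with_coords(data_vars, coords, ["temp"])
print(selected)  # {'temp': ('time', 'lat')}
print(kept)  # {'time': ('time',), 'lat': ('lat',), 'height': ()}
```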