API: bool(pd.NA) #38224
Comments
cc @pandas-dev/pandas-core if anyone has a reference to this original discussion. |
bool(nan) is dictated by Python, not NumPy:
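A minimal sketch of that point, using only the Python standard library (no NumPy or pandas involved): a NaN is just a nonzero float, so Python's ordinary truth test returns True for it, even though NaN does not compare equal to itself.

```python
# Truthiness of a float NaN comes from Python's float type itself.
nan = float("nan")

print(bool(nan))    # True: only 0.0 and -0.0 are falsy floats
print(bool(0.0))    # False
print(nan == nan)   # False: NaN is not equal to itself
```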
I doubt it was picked very intentionally, though. In my opinion, the current behavior for |
Agreed with Stephan. Even if we had a typed Since Python requires that |
As for the history of it / original discussion: this behaviour was already included in the PR originally adding pd.NA (#29597), but apart from one comment of Tom (#29597 (comment)), there was not much discussion about this specific aspect (also not in the main NA issue #28095). Personally, I would rather say that the behaviour of A general interpretation of Yes, it will be annoying to deal with in some cases. But at least it ensures you deal with it explicitly. |
I really support this reasoning. |
I agree with everything said so far about the intended semantics. But I'm also concerned about the raising example from the OP. I think we're going to be chasing down corner cases for years:
Caveat: I have not had occasion to use pd.NA in my own work, so my experience with it is exclusively in debugging problems it has caused, which is bound to jaundice my view. |
@jbrockmendel That seems right to me. NaNs are not equal either. Missing values should also be not equal, in the sense that "objects that are not equal" are False. |
The specific example that sparked the discussion was
Note that this is actually also buggy for pandas/pandas/core/ops/common.py Lines 115 to 119 in 4c8d66e
So also for np.nan not being equal to itself, we need special handling in certain places. Again, the fact that it doesn't raise (in this specific example) is friendlier for the user, but for us as developers, having it raise can actually help to find those cases. There will certainly be a good amount of "chasing down" those cases where we need special NA handling; that will be inevitable, I think. But thanks to your refactorings, the example of "name resolution" for ops is now much more centralized, which should make it easier to fix this.
Indeed, that should be NA instead of False. But because of the assignment, you actually have an object dtype Series, and it is a known issue that NA in object type doesn't yet work: #32931 |
removing milestone |
Request for clarification: is bool(pd.NA) raising intrinsically inseparable from having pd.NA propagate in ops? |
Can you clarify your question? Has it been mentioned before that those two issues are inseparable? I think they are two aspects of the NA design, and this issue is about the |
At first glance I think it would be possible to change |
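To make the separability question concrete, here is a toy sketch in plain Python. ToyNA is a hypothetical stand-in written for illustration only, not pandas' implementation of pd.NA. It only shows that, at the language level, propagation in comparisons (`__eq__`) and raising on truth-testing (`__bool__`) are controlled by two separate methods.

```python
class ToyNA:
    """Hypothetical missing-value marker (illustration only, not pd.NA)."""

    def __eq__(self, other):
        # Propagation: comparing with a missing value yields a missing value.
        return self

    def __ne__(self, other):
        return self

    def __bool__(self):
        # Truth-testing is a separate hook; here it refuses to guess.
        raise TypeError("boolean value of ToyNA is ambiguous")

    def __repr__(self):
        return "<ToyNA>"


NA = ToyNA()

result = NA == 1      # propagation alone does not raise
print(result)         # <ToyNA>

try:
    if NA == 1:       # the `if` calls bool() on the propagated result
        pass
except TypeError as exc:
    print("raised:", exc)
```

In this sketch, changing `__bool__` to return False would leave the propagation in `==` untouched; whether that would be a desirable design is exactly what the thread is debating.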
xref #38102 (comment)
For np.nan we define bool(np.nan); however, we don't do this for pd.NA.
I think this is pretty odd and not very useful.