Pandas Enhancement Proposals? #28568
We have a new section in our roadmap that I think covers this: https://dev.pandas.io/docs/development/roadmap.html#roadmap-evolution
In the case of #24046, it was exactly this point that was in question, and that is why I opened this issue. I admit I hadn't seen that section about roadmap evolution, but the question remains (to me): has a more formal process been considered? I think pandas is big enough to warrant one.

The way I see it, there's a flood of issues and PRs, and with scarce core-dev resources, any given issue or PR will mostly receive only as much attention as is absolutely necessary (which is fine; no judgement about that). But this means that for larger topics, it's very hard to fully lay out a proposal and discuss its merits: people jump into threads (that get longer and longer) at different points, and it's far too easy to get caught up in discussions about semantics or other side issues. Eventually it becomes very hard to keep track of the status, and people spend their scarce time elsewhere.

A PEP-like format could take a longer-form approach, where feedback is requested on the whole proposal, not just on the latest comment in the GH issue. Maybe I'm too naive about how this would work out, but it seems to work reasonably well for PEPs/NEPs. Don't get me wrong, life would go on without PAEPs, but I'd be more hesitant to invest my time in (re-)running the gauntlet in a format that has (from my POV) failed several times already.
You can check the discussion on the roadmap PR, where the evolution process was discussed a bit. I'm not sure what all was proposed as alternatives to design issues.
PEPs/NEPs are clearly not a bad thing in absolute terms, but I don't see how they would increase the free time of devs, which is the really scarce resource. If anything, a new procedure will increase the burden. @h-vetinari You start from the assumption that #24046 would have benefited from more attention from the devs, but it really got a lot. Something I would be happy to recommend is that, in any API-related issue, one comment (e.g., the second in the discussion) is always kept updated to reflect the status of the current proposal(s). But I wouldn't make the process any more formal than this. I perfectly understand the frustration when you feel you have wasted time. The way I (slowly learned to) try to avoid this in my pandas contributions that affect user behaviour is to i) discuss first, then propose a PR, and ii) proceed one step at a time (first settle one point, then another). Plenty of issues on GitHub require only limited discussion; others need much more, and note that those are not necessarily more important, just more complicated.
Sorry for taking so long to respond. @TomAugspurger
I brought up the question of pandas PEPs/NEPs in #24046 as well, where a long discussion thread has just been resolved; that thread could be considered documentation. However, I think far-reaching or controversial decisions should aim for a higher standard (following the PEP model):
@toobaz
This is (emphatically) not my assumption. My assumption is that there was in fact so much discussion that people lost track of the current status. I believe that discussing a design document would have focused the discussion, rather than stretching it out to the point where people gave up participating.
Dev time is precious, I agree (and IMO a proposal process would help there too; see above). But reviewing a proposal would not necessarily be more difficult than reviewing and merging a PR that introduces it (well before the implementation starts). And for PRs from devs where everyone else already agrees, you could simply skip the proposal process.
I really want to insist that the idea behind the proposal process goes far beyond (indeed precedes) the decision in #24046, and is not connected to frustration on my part (actually, I'm happier that it's resolved than unhappy with the outcome).
I've had to learn this as well (and still am), and I have tried to have those discussions. It's precisely the issues where the discussion stalls or becomes too complicated or dispersed that I want to address. Some examples that affected me:
But I really don't believe I'm special in this regard, and I think the process would help other corners of the API, e.g. around serialisation and other interfaces. In short: the more intricate the API implications, the harder they are to discuss in GitHub comments (because there are usually too many things to consider at the same time, or the comments/threads get ridiculously long, or both). That does not mean the change in question lacks merit, just that it's (likely) too hard to discuss in a thread format.
I was wondering whether there has been any discussion about having dedicated RFCs / proposals for larger changes to pandas?
For example, after a rather heated discussion in #24046, @wesm asked:
I wanted to come back to this and maybe write such a design document, and therefore wanted to check what the core team thinks about this. Needless to say, the obvious candidates to model this after would be Python's PEPs or NumPy's NEPs, but maybe there could/should be pandas-specific departures.
I think this could benefit several high-impact design questions (e.g. around extension arrays or release plans). In fact, I have 2-3 other bigger PRs that eventually stalled in design discussion (and a few ideas of similar magnitude), where I could imagine such a vehicle helping to move the discussion forward.