-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PDEP-1: Purpose and guidelines for pandas enhancement proposals #47444
Conversation
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
This would probably close #28568 |
Co-authored-by: Matthew Roeschke <emailformattr@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't immediately see this, but it would be good to explain how an accepted/rejected decision is made. This looks to be Numpy's: https://numpy.org/neps/nep-0000.html#how-a-nep-becomes-accepted
For example, it would be good to define:
- How consensus is defined: 100%, 2/3rds, 51% approval from core devs?
- How to collect consensus: PDEP PR? Monthly dev meeting?
- Should a "call to vote" action be available to unblock perpetual discussion?
Very good point. I thought this was quite a big topic, and maybe more related to general governance than specific to PDEPs. That's why for now I decided to just link to the governance documentation. Based on the current governance, decisions are made by consensus, with the BDFL (Wes) having the final word when consensus is not possible. I think the governance documentation is 7 years old. Many things changed since then, including the structure of the team (from like 3 active core devs to like 20, Python moved out of a BDFL model...). Also, besides the decision making, I think our governance seems outdated. I personally don't know most of the people in the pandas CoC committee, and I assume they are not performing any task, and probably don't even know they are part of it. While I fully agree on what you say, I'd personally keep the discussions separate, as I think both are quiet big, and can be taken care of separately. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. Agreed that general decision making details can go under governance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG overall. Just had a few questions as I was going through it.
|
||
#### Accepted PDEP | ||
|
||
A PDEP will be accepted by the core development team, and decisions will be made |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reword "will" to something else? This language makes it feel as if all PDEP's will be accepted.
#### Submitting a PDEP | ||
|
||
Proposing a PDEP is done by creating a PR adding a new file to `web/pdeps/accepted/`. | ||
The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it would be convenient to have an actual blank template PDEP file to be filled out.
It might also be nice to have a script that generates the next PDEP number for you(I anticipate it might be hard to find the next PDEP number if a lot of PDEPs are submitted in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally leave that for later, once we need it. Those sound like good ideas to me, but I wouldn't implement them initially, I would wait to see how things work first. If we merge like PDEP per month, I think checking the last PDEP number before merging is easier than a system to autogenerate them. And for a template, I'd wait to have few actual PDEPs before deciding if it helps, or if PDEPs are too different from one to another.
Does it make sense to you to start simple and iterate later as we have more experience?
## Evolution of this PDEP | ||
|
||
While most PDEPs aren't expected to change after accepted, in some cases like this | ||
PDEP, they will be updated to contain its latest version, if things evolve. A log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably clarify when a PDEP can be revised v.s. when a new PDEP has to be submitted superseding the old PDEP.
The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference | ||
for the expected format. | ||
|
||
By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to also have a public comment period on PDEP's like tensorflow has with their RFC's?
We should probably also clarify when voting happens(define what proportion of core team members need to consider a PDEP ready before voting like @mroeschke stated) and how long core team members have to vote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to discuss and hear other opinions. But to me personally, I wouldn't have a voting period or deadline.
I see PDEPs more like ongoing discussions, with feedback and updates, than just a voting process. Also, I assume tensorflow core devs are mainly google employees, so imposing a deadline makes more sense, as they are supposed to be working on the project X hours. But for a mostly volunteer developed project like pandas, I'd leave it more open.
But again, I'd start by seeing how things work, and if we see PDEPs keep open for too long, and seems like having a timeframe should help, surely worth giving it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But again, I'd start by seeing how things work, and if we see PDEPs keep open for too long, and seems like having a timeframe should help, surely worth giving it a try.
I agree, just having a PDEP is akin to a new process, so having a process for the PDEP is maybe like introducing another meta process at the same time. So in response to this comment and others related to the process, I agree with @datapythonista that to begin with we just discuss whether we want to use PDEPs, what we want out of a PDEP, and roughly what it should contain and iterate on the gaps over time. The only caveat here, is that once someone has spent effort preparing a PDEP and it is approved that we don't have "blockers" implementing, so that does affect the approval process to some degree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree - we need a voting process here - we could emulate that of NEP which is pretty simple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a voting process here
I think what Marc proposed above is to defer a discussion about the decision process (basically our governance model) for a subsequent discussion.
(which sounds good to me to keep the discussion manageable)
for the expected format. | ||
|
||
By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done | ||
in the `accepted` directory and contain `Status: Accepted`. If a PDEP is finally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could alsof default to start a PDEP in a "draft" status (or "under discussion", I see there is a section for those)?
To me that seems to better signal the actual status (when going through the diff of this one, I wondered why it was saying " accepted" since we are still discussing the proposal). When we decide to merge, it's then only a small change to update the status before merging. And for larger PDEPs it might also be useful to merge it already in draft status (to have a rendered version to link to, like they do with PEPs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
Regarding merging PDEPs as draft, feels like it could make discussions more difficult. Like where to have the discussion once the PR is merged. Via new PRs? Or in a second PR, not being able to comment in parts not in the diff. But I guess in some cases it can be useful, maybe a PR with different parts, and merging as a Draft once the first part has been discussed, and before working on other parts.
I'll update the document if there are no objections to using Under discussion status (or Draft if people prefer this name) until PR is Accepted. And to consider merging PRs before they are accepted when it's useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO I think the current guidance stated here is clear as long as a PDEP is an open PR, it should be considered "under discussion" and a draft PDEP can be a draft PR for example. I agree with @datapythonista that merging intermediate states of a PDEP would fragment the discussion and there shouldn't be more than one page/PR/location where a PDEP can be discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding merging PDEPs as draft, feels like it could make discussions more difficult.
We will probably have to see how it goes in practice. For many PDEPs a single PR with proposal, discussion, acceptance all in one will quite likely be sufficient.
But I can imagine for larger/longer-running discussions (I am thinking eg about the copy/view proposal), that it could be useful to already have a hosted version of PDEP that is still under discussion.
As some references: numpy seems to already merge NEP PRs if they are still under discussion (eg numpy/numpy#18456), but they also have substantial discussion on the mailing list. Similarly in Python, PEPs are merged quickly, but also there the discussion is mostly happening on mailing list or discourse. That's of course different with the current way in pandas where most discussion typically happens on GitHub.
I updated the PR to incorporate the feedback from comments, and I replied to the ones I didn't directly incorporate. I also fixes couple of visualization issues, and improved a bit the styles, so the roadmap and also other pages are clearer. Let me know of additional feedback, and feel free to approve when you're happy with getting this merged, or add a |
@pandas-dev/pandas-core any other comment before merging this? If there are no objections I'll merge this in couple of days, and we can keep iterating as needed in separate PRs. |
The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference | ||
for the expected format. | ||
|
||
By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure this is true - sure a lot will easily be accepted but some might be controversial and ultimately not accepted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. maybe just qualify this by changing By default, we expect a PDEP will be accepted
to something like By default, we expect a PDEP (with some preliminary discussion) will be accepted
before this text we have...
A PDEP should be used for changes that are not
immediate and not obvious, and are expected to require a significant amount of
discussion and require detailed documentation before being implemented.
so the PEP is part of the detailed documentation and an issue should perhaps be the initial part of the discussion. (just as I would normally expect say a bug fix PR to have an associated issue)
and we also have
If you are unsure if you should be opening
an issue or creating a PDEP, it's probably safe to start by
opening an issue, which can
be eventually moved to a PDEP.
I think that maybe issues should always be opened before submitting a PEP, either as a specific issue or an existing issue concluding that a PEP is required. Maybe we also need to ensure that @pandas-dev/pandas-core
is also notified in these cases.
If we have some form of gate for opening a PEP, then most should be accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just leave out the "By default, we expect a PDEP will be accepted" ? To me it doesn't seem to add much, rather than a potentially wrong/confusing message (in general PDEPs are for topics that are not trivial and thus will not always be accepted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @datapythonista generally lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to somewhere list the possible "status" values? (now those seem to be included in the different workflow sections, but it's not very explicit)
|
||
## PDEP definition, purpose and scope | ||
|
||
A PDEP (pandas enhancement proposal) is a proposal to a **major** change in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A PDEP (pandas enhancement proposal) is a proposal to a **major** change in | |
A PDEP (pandas enhancement proposal) is a proposal for a **major** change in |
? (not fully sure, but "proposal to" sounds like it should be followed by a verb
|
||
- The core development team, who will have the final decision on whether a PDEP | ||
is approved or not | ||
- Developers of pandas and other related projects, and experienced users. Their |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Developers of pandas and other related projects, and experienced users. Their | |
- Contributors to pandas and other related projects, and experienced users. Their |
? (might be a bit more generic wording, as "developer" sounds more code-centric?)
The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference | ||
for the expected format. | ||
|
||
By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just leave out the "By default, we expect a PDEP will be accepted" ? To me it doesn't seem to add much, rather than a potentially wrong/confusing message (in general PDEPs are for topics that are not trivial and thus will not always be accepted)
#### Rejected PDEP | ||
|
||
A PDEP can be rejected when the final decision is that its implementation is | ||
not the best for the interests of the project. They are as useful as accepted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "accepted" seems a bit strange here, since this is about PDEPs that are not accepted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to contrast rejected PDEPs against accepted PDEPs; i.e. "even rejected PDEPs are useful". I think the wording is okay, but maybe "Rejected PDEPs are just as useful as accepted PDEPs..." would make it more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed the first "as" in the sentence, so I read it as "they are useful as accepted PDEP", instead of "they are as useful as accepted PDEPs", hence the confusion.
@@ -0,0 +1,121 @@ | |||
# PDEP-1: Purpose and guidelines | |||
|
|||
- Date: 21 June 2022 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Created" instead of "Date"? Since this will typically the date when the process is started, and not when it was eg accepted, "created" might denote that more correctly (and both NEPs and PEPs seem to use that)
Related to my previous comment (#47444 (comment)) about starting a new PDEP as draft / under discussion: this has been changed in the text, but at the same time we still say to create the new PDEP file in the Taking one step back: do we actually need those different directories? (I suppose they are currently used to simplify the script to automate the website content to list the PDEPs) |
Thanks all for the feedback, very interesting points. For the voting, I'm working on the governance documents in parallel, and I'll propose changes there to have a way to decide when a PDEP is ready to merge. For the name to use, there are several comments about names, and many valid points. It's probably not so important, but to find something that most people are happy with, I think we can run a quick poll. I'll write here different options including the ones already proposed. Feel free to suggest more, and when we're ready I'll send a link to the pandas-dev list with a poll.
I'll be addressing the rest of the comments in this PR. |
May I suggest adding a template for new proposals as part of this PR, or as a follow up if that's preferable? 😇 I find templates useful to help facilitate structured discussions, and that important points relevant for consideration aren't missed by the proposal author(s) before submitting them, e.g. how would this affect backwards compatibility. For reference I include these two templates which could be relevant as a starting point:
|
Thanks all for the feedback. Seems like the preferred name is PDEP, so keeping it. I addressed all the comments, there are two pending things, that I think make more sense to address in future PRs:
Let me know if any other comment. @jreback you reviewed with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @datapythonista
web/pandas/about/roadmap.md
Outdated
|
||
<div class="alert alert-warning" role="alert"> | ||
pandas is in the process of moving roadmap points to PDEPs (implemented in | ||
June 2022). During the transition, some roadmap points will exist as PDEPs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
June 2022). During the transition, some roadmap points will exist as PDEPs, | |
July 2022). During the transition, some roadmap points will exist as PDEPs, |
to match the PDEP created date? but probably August.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah switch to August :->
status will be changed to `Status: Implemented`, so there is visibility that the PDEP | ||
is not part of the roadmap and future plans, but a change that it already | ||
happened. The first pandas version in which the PDEP implementation is | ||
available will also be included in the PDEP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not yet relevant, but would this be just a note in the revision history or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some open comments by @simonjayhawkins
web/pandas/about/roadmap.md
Outdated
|
||
<div class="alert alert-warning" role="alert"> | ||
pandas is in the process of moving roadmap points to PDEPs (implemented in | ||
June 2022). During the transition, some roadmap points will exist as PDEPs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah switch to August :->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding putting up PDEPs under discussion as open PRs, the one negative aspect is that they aren't protected. The author can force-push making changes hard to follow. I think this will work well assuming no bad actors, and I think that's an okay assumption to make, but wanted to be explicit about it. Also, this downside is outweighed by the upside of being able to have all discussion in a consolidated place in my opinion.
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Thanks all for the reviews and feedback! |
An unexpected & pleasant surprise that this old issue of mine got fixed. Not sure I have the time/energy to pour those old PDEP-sized ideas into documents, but glad to know that there'd now be a way to do it. |
Thanks @datapythonista for the PR.
Thanks @h-vetinari. feel free to comment on those issues that a PDEP may be a good idea as others may think that they are worth pushing now. |
and if you have already put a lot of time and effort in the discussion, you may have already made it alot easier to progress. |
Closes #28568
Initial PDEP to define purpose and guidelines for pandas enhancement proposals (equivalent to PEPs or NEPs). Feedback very welcome.
This PR also makes the PDEPs public in the roadmap page of our website.