Formal governance document #5676
I have one suggestion that might sound too radical. Why not simply put data.table under the umbrella of rOpenSci?
Hi @waynelapierre, thanks for your comment. Can you clarify what the "well defined governance structure" of rOpenSci is? I found https://softdev4research.github.io/4OSS-lesson/04-contributions/, which lists some general recommendations about what an open-source project governance document should contain, but I was not able to find a document describing the governance of the rOpenSci project.
Hi Toby, thank you for this! I won't rush to comment, save for one initial thought regarding triage of both issues and PRs. Triage is a somewhat hidden layer between contributors and reviewers. I often see Jan diligently handling this, and I think it would be good if this process were captured in the document.
Would the reviewers have commit rights to the main branch? The chart implies that they don't and combined with:
That keeps the current problems around... I would recommend reading the Substrait governance, which is based on the ASF system of a group of committers and a PMC, with some changes to make room for automatic releases and such (which are currently not possible for ASF projects).
Perhaps a survey would help in accumulating feedback faster and from more people?
It would indeed be nice to hear from @mattdowle whether he has any strict requirements for the new governance, i.e., proposals/guidelines to which he would not give final approval. Regarding timelines, it would be good to build in an acceleration mechanism, whereby we can move to the next phase of approval once sufficient consensus is achieved (e.g. certainly among @tdhock, @jangorecki and myself, perhaps with a few more). It seems surprising to me that we expect it to take a year to establish the new governance and only then move to releasing new data.table code.
Hi @sluga, thanks for sharing. The survey sounds like a good idea; it would help data.table developers and contributors get an idea of which features users would like to prioritize. Would you be willing to set that up? (Maybe this could be an additional role mentioned in the governance document: survey manager.)
I would like to invite the following people to comment on this proposal, because they are listed among the top contributors: https://github.com/Rdatatable/data.table/graphs/contributors
I would like to have a release ASAP (which is quite obvious, considering how many of my PRs are queued to be merged), but I don't think we should push for it in the current situation. It is more important to get the project thriving again. Having a release now or a year later is a secondary problem. Matt is doing hot fixes for CRAN whenever needed, and thanks to that the package is still on CRAN. Let's appreciate that this gives us time, and let's look for solutions that will work in the long run, not just release ASAP. We have this amazing opportunity thanks to Toby's initiative. Personally I would like this process to be faster, say one month for each step, but I trust Toby, because I feel he knows this better. IMO let's try to move faster, but be ready to wait if Toby is not satisfied with the outcome of a step. FYI: I (and other mods) will be cleaning this thread from time to time of messages that do not add much to the discussion.
Hi @tdhock, will do. I'll make a draft and post it here as a separate issue, so that you and others can contribute. I'm also happy to then run the survey and write up the results, though I'd ask for some help with survey distribution.
Several people have asked whether Matt could comment on this issue, to express approval of the proposed process, and/or whether he has any strict requirements for the new governance. I emailed Matt last week to tell him about this issue and to ask if I could publish his letters of collaboration/support; he agreed, so here they are. I believe these letters provide convincing evidence of Matt's support; for example, he wrote "I wholeheartedly support this POSE application and will collaborate with Toby's team."
Thank you, Toby, for initiating this. My kudos go to @mattdowle for all the work he put into Since it seems that Matt does not have the time needed to fully commit, I'm strongly in favor of establishing such a hierarchical review model. I'm convinced that there are long-time contributors to the project (who may also have earned author status), like @jangorecki and @MichaelChirico, who know large parts of our codebase well enough to review and approve PRs to the best of their knowledge, similar to what Matt did. Regarding releases, I think that we should release soon, once most of the known hard issues in the dev version are fixed. I know that we are all used to working with the dev version, but getting things out on CRAN has a much bigger impact, and I think that there are many features
This looks good to me, but what is with this whole exaggerated emphasis on diversity/inclusion? Were there issues regarding that topic in the package development process? Why is it present in literally every sentence of the proposal?
I am not aware of any issue regarding diversity/inclusion in the package development process. In fact, I believe Matt was doing a great job with inclusion, as he had a policy of adding people to the team https://github.com/orgs/Rdatatable/teams/project-members (permission to add branches/PRs in this repo rather than in forks) immediately after they submitted their first PR. The focus of the NSF POSE program is expanding the ecosystem of users and contributors, so I believe it is important that we be intentional about making diversity/inclusion a priority in the governance document, to attract and retain the maximum number of contributors in the future.
Thank you @tdhock for driving this. From the letters you posted, this seems to have been a long time in the making. Like others, I had grown a bit worried over the months that data.table was being abandoned. It is, however, a very mature package. Huge props to @mattdowle and Arun for developing and maintaining this package for so long. I've learned a lot from both the project setup and Matt's direct feedback on my PRs. I think Matt's expertise and experience with R internals will be hard to replace.

I feel that a governance document is necessary given the funding secured, and I guess with a view towards actual growth. But at the core, we should strive to keep most of the development model and design fundamentals. There is de facto a core team around Matt, so let's write this down in a way that doesn't get in the way later. The crucial aspects we need to tackle are commit rights to master and releases to CRAN. Pre-submission testing, and especially revdep testing, seem to be very complex, so having one or more people who can handle the release process seems essential.

Overall, the proposal seems fine to me. I'm not sure how strictly we should pin down responsible people, so as not to restrict ourselves needlessly. Given Toby's comments, I don't think we should rely on Matt for commits or releases going forward. But if he were available to consult on tricky C questions, that would benefit all of us. If he can't, we'll do our best.

A few key points regarding the design that we should pin down, imo, are the long and careful life-cycle management of features (and the consequent wariness of adding new features) and backwards compatibility (although we might revisit the extent of the latter). Also, adding people as contributors as soon as their first PR gets merged, etc. Basically, the front page needs to stay.
And since I'm already at it, I'll chuck out a couple more thoughts: data.table needs some marketing, and we need to do some housekeeping on the issue tracker and open PRs. Again, I'm not sure if this needs to be written down in the governance document, but I think something needs to be done here.
Thanks @tdhock for spearheading this! Do you have an estimated timeline? It feels like this process is dragging. Perhaps it needs less formal discussion and more unilateral action? Or perhaps some concrete dates, so that we don't suffer from analysis paralysis. Anyway, I don't mean to complain; I really appreciate your work. I'll happily make a small donation to the team if it helps, and I'd love to promote data.table if/when new releases start happening.
Just came across this document written by cURL creator/maintainer Daniel Stenberg. Sharing as relevant:
What a great read! Thanks for sharing, Michael. Here are some relevant passages. On "Contributors will not stick around": "My experience says that you will have better success in getting more maintainers if you (as an existing maintainer) ask those you consider being contenders, rather than waiting and hoping for them to ask." On code quality: https://un.curl.dev/code/quality#how-do-you-achieve-good-code-quality. On roles (https://un.curl.dev/maintain): BDFL?, security?, release manager (use a checklist), web, reviewing, support, blog, debug, merging, feature dev, doc writers, event planning, stickers, presenters, world monitoring (surveys).
Hi, just a friendly reminder to check on the progress of the new governance. We are about one month from the deadline proposed by @tdhock. Do we already have a core team for the future data.table? Are the funding issues settled? Will we reclaim the throne of speed from other packages such as collapse?
The end of November was the first milestone date mentioned by Toby, so we still have time. As for benchmarks against collapse, I invite you to submit a new issue for each task where DT is slower than collapse. As benchmarking is off-topic for this issue, I kindly ask that you not continue that topic here, but create a new issue if needed.
In my original post, I asked the questions, What roles and/or permissions should we define? What process must a contributor follow in order to obtain special roles and/or permissions? Here are some more detailed answers to these questions. Please comment constructively, discuss strengths/weaknesses of this structure, and propose concrete alternatives.
'Committer', as used by the ASF, is quite descriptive imo.
To keep up development velocity, I would suggest >= 1 👍 and no 👎 for smaller day-to-day changes (bugfixes, enhancements, etc.) and a higher requirement for more important/influential changes (e.g. changes to the core API/implementation). This could also be a discussion + vote outside of the PR (e.g. for Arrow we have to vote on the mailing list for things like format changes/additions).
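To make the suggested thresholds concrete, here is a minimal sketch of the merge rule in Python. It is not an existing data.table or GitHub tool: the names `Review`, `can_merge`, and `major_min_approvals` are hypothetical, the value 2 for "some higher requirement" is an assumption, and so is counting only each reviewer's latest vote and applying the "no 👎" veto to major changes as well.

```python
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str
    approve: bool  # True for a 👍, False for a 👎


def can_merge(reviews, is_major_change, major_min_approvals=2):
    """Hypothetical merge rule sketched from the suggestion above.

    Minor changes: at least one 👍 and no 👎.
    Major changes: an assumed higher number of 👍 (default 2) and no 👎.
    """
    # Assumption: only each reviewer's most recent vote counts.
    latest = {}
    for r in reviews:
        latest[r.reviewer] = r.approve
    approvals = sum(1 for vote in latest.values() if vote)
    rejections = sum(1 for vote in latest.values() if not vote)
    if rejections > 0:
        return False  # any outstanding 👎 blocks the merge
    required = major_min_approvals if is_major_change else 1
    return approvals >= required


# A single 👍 is enough for a bugfix, but not for a core API change.
print(can_merge([Review("alice", True)], is_major_change=False))  # True
print(can_merge([Review("alice", True)], is_major_change=True))   # False
```

Keeping the threshold as a parameter reflects that the suggestion leaves the exact requirement for influential changes open; the same check could equally be run on the outcome of a vote held outside the PR.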
For CRAN maintainer, I think it would be most suitable to choose a person whose time is already funded by NSF (or by any other company/foundation willing to sponsor it). Preparing releases and communicating with CRAN often involve short deadlines and can be quite time-consuming, so relying on a volunteer to handle that does not seem fair.
data.table has no formal governance document; Matt Dowle, the original author and only current maintainer, has commit permissions on GitHub, and he submits the package to CRAN. The only other author, Arun, has been inactive for several years.
Matt has done a fantastic job of creating a highly efficient and widely used R package, and he continues to submit "patch" releases to CRAN (containing minimal fixes so that the package continues to pass CRAN checks). data.table has many contributors, who have submitted PRs, which have been reviewed and merged by Matt. This form of project leadership/governance is similar to the former Python model, Benevolent Dictator For Life (BDFL). This form of governance can handle only as many contributions/PRs as the BDFL (Matt) has time to review and merge. The purpose of this issue is to discuss alternative forms of governance which may be able to handle more contributions, from a larger and more diverse group of contributors.
I therefore propose that we use this issue to write constructive, thoughtful, respectful, and inclusive comments containing concrete propositions for the future governance of the data.table project. At the end of three months (end of November 2023), I will synthesize the comments into a draft governance document, which I will publish as a PR that creates a new GOVERNANCE.md file. After that, there will be a one-month period of open comments, where people can discuss the strengths/weaknesses of the draft, and we can discuss possible edits to make. At the end of that second period (end of Dec 2023), I will consider the consensus of the comments, make corresponding edits, and publish a second draft (by editing the PR). If that draft is sufficient, I will ask contributors to sign that document by adding their names to the GOVERNANCE.md file in that PR. If there are still significant concerns, we can use another one-month period of comments to resolve them, after which I will publish another (hopefully final) draft at the end of Jan 2024. Some significant questions that I suggest we try to answer in the governance document (others are welcome too):
data.table principles #5693
We are not the first open-source project to have a governance document; here is a reading list about open-source governance, which can inform our discussion:
About the roles to define, I suggest replacing the current flat leadership model (one maintainer role at the top, many contributors at the bottom) with a hierarchical model containing intermediate "reviewers," similar to the successful model of subsystem maintainers in the Linux kernel project. See the figure below, but note that the names are totally arbitrary (for example, I do not expect Kelly to be release manager, but it would be nice to have someone take the role of release manager).
I would suggest that each intermediate "reviewer" volunteer to be in charge of reviewing and merging PRs for specific features/files, as defined in the CODEOWNERS file (#5629). So far, only @ben-schwen, @MichaelChirico and @jangorecki have volunteered to be reviewers.
In addition to the reviewer role in the figure above, there could be at least five other roles (with responsibilities):
I would volunteer to be reverse dependency manager, as I have set up the revdep-check system.
I would nominate @MichaelChirico for translation manager and @jangorecki for binary manager.
Who would volunteer for release manager and performance testing manager?