Proposal to reduce inactive collaborator duration #1524
Back in 2018, at a collaboration summit, I proposed revoking the commit bit if it had not been used to commit anything in 3 months: https://github.com/joyeecheung/talks/blob/master/node_collab_summit_201805/core_collaboartors_status_and_scope.pdf (when I investigated in 2018, ~50% of the collaborators would have been affected had we done that). Collaborators could request to come back if they started sending PRs again. I didn't follow up because there was some pushback. |
As a counter point I don't think this was ever abused. Releasing and security issues which are the riskier bits both require different steps from being a collaborator (e.g. release team, tsc) that have shorter inactivity deadlines and removing people doesn't buy us too much as a result. Before we do this, I want to make sure it doesn't alienate people with periods of inactivity - can we run a query to check what active collaborators had a period of inactivity and returned from such a period? (Not sure how hard it is to check since a bunch of people mostly review) |
+1 on changing this. based on https://github.com/nodejs/node/blob/main/tools/find-inactive-collaborators.mjs
if we change to months of inactivity:
Note Ian Sutherland has already passed the 18 months inactivity period: nodejs/node#52300 |
I don't think reducing the inactivity period will reduce the risk of rogue collaborators. We should document how we are mitigating that risk (code reviews, release reviews, a minimum of 2 weeks before a commit goes to an LTS release, a tight group of releasers). |
Strongly Agree
While Firstly, there are two CI systems (https://ci.nodejs.org/ and https://ci-release.nodejs.org/) with different Project-based Matrix Authorization Strategies. In the While in the On the other hand, even if a malicious collaborator is capable of landing commits abusing Moreover, even if this change goes unnoticed, the commit won't be released automatically to Node.js users. As with any release in Node.js, we are required to follow these 20 steps. As part of these steps, the releasers first cherry-pick the commits based on certain criteria and then prepare a PR that anyone can review (see). These PRs tend to stay open for days and are quite well reviewed and commented on by the collaborators. My two cents: as @mcollina said, the key here is to document and review the protections and processes that we already have in place to prevent bad things (accidental stuff, human errors, malicious actors...) 😊 |
the benefit of this suggestion might not be huge, but if it is going to affect ±20 people there is not a big downside either |
One of the hopes of the commit queue was to perhaps reduce the need to grant so many people write access to the repository. |
@UlisesGascon thanks for providing the outline of some of the things that we do that reduce the risk of a rogue collaborator. If we do make a change I think 6 months is too short but would be ok with 1 year for inactivity. |
@mhdawson Why do you think a year is too short? Can you elaborate? |
@UlisesGascon Having write access (meaning executing/running CI jobs) gives users the ability to run arbitrary code on the proposed infrastructure. We also talked about |
@anonrig I can easily see people taking a break (for example to care for a new child etc.) who will come back getting removed by the 6 month check. A 12 month check makes this a lot less likely in my opinion. |
If, as @mcollina mentioned, this measure doesn't address the security issues mentioned in the motivation, what's the remaining motivation for the change? |
Reducing the number of collaborators as a means to better communicate to the public the number of people actively maintaining the project, I guess? |
I went ahead and ran an analysis on the Node.js codebase. Using the 8232 commits (that's 20% of all the commits that have landed on
{
  "oneTimeContributors": 517,
  "sampleSize": 324,
  "25-percentile": "3 weeks",
  "median": "11 weeks",
  "75-percentile": "6 months",
  "85-percentile": "9.5 months",
  "90-percentile": "12.9 months",
  "99-percentile": "22.5 months",
  "max": "28.9 months"
}
N.B.: this includes all returning contributors, not just collaborators. The vast majority of contributors contribute only once to the project. For 90% of contributors, 56 weeks (which is just a bit more than 12 months) is their longest period of inactivity before coming back to the project. Sorry if the presentation of the results is not ideal. Someone who knows R, please make a graph 🙈
Code
#!/usr/bin/env node
// Identify inactive collaborators. "Inactive" is not quite right, as the things
// this checks for are not the entirety of collaborator activities. Still, it is
// a pretty good proxy. Feel free to suggest or implement further metrics.

import cp from 'node:child_process';
import readline from 'node:readline';

// Raw identity -> promise of the mailmap-resolved identity.
const cacheIdentities = new Map();
// Resolved identity -> list of commit timestamps (seconds since epoch).
const tableOfContributions = { __proto__: null };

// Normalize a "Name <email>" identity using the repository's mailmap.
async function resolveIdentity(identity) {
  const childProcess = cp.spawn('git', ['check-mailmap', identity], {
    cwd: new URL('..', import.meta.url),
    encoding: 'utf8',
    stdio: ['inherit', 'pipe', 'inherit'],
  });
  return (await childProcess.stdout.toArray()).join('').trim();
}

// Read the git log and record, for each author, co-author, and reviewer,
// the timestamp of every commit they were involved in.
async function runGitCommand() {
  const childProcess = cp.spawn('git', ['log', '-8232', '--before=1 year ago', '--pretty=format:{"hash":"%H","date":%ct,"actors":["%an <%ae>","%(trailers:only,valueonly,key=Co-authored-by,separator="%x2C")","%(trailers:only,valueonly,key=Reviewed-by,separator="%x2C")"]}'], {
    cwd: new URL('..', import.meta.url),
    encoding: 'utf8',
    stdio: ['inherit', 'pipe', 'inherit'],
  });
  const lines = readline.createInterface({
    input: childProcess.stdout,
  });
  const errorHandler = new Promise(
    (_, reject) => childProcess.on('error', reject),
  );

  // Bail out early if the child process has already failed.
  await Promise.race([errorHandler, Promise.resolve()]);

  const allPromises = [];
  for await (const line of lines) {
    await Promise.race([errorHandler, Promise.resolve()]);
    // Escape quoted nicknames in names so that the line stays valid JSON.
    const { date, actors } = JSON.parse(line.replace(/ "([a-zA-Z]+)" /, ' \\u0022$1\\u0022 '));
    for (const actor of actors) {
      if (actor) {
        let actualIdentity = cacheIdentities.get(actor);
        if (actualIdentity == null) {
          actualIdentity = resolveIdentity(actor);
          cacheIdentities.set(actor, actualIdentity);
        }
        allPromises.push(actualIdentity.then((identity) => {
          if (identity) {
            (tableOfContributions[identity] ??= []).push(date);
          } else {
            console.error({ line });
          }
        }));
      }
    }
  }

  return Promise.race([errorHandler, Promise.all(allPromises)]);
}

await runGitCommand();

let oneTimeContributors = 0;
let sampleSize = 0;
// Longest gap (in seconds) between two consecutive contributions, one entry
// per returning contributor.
const maxDelaysBetweenContributions = [];
const sortNumbers = (a, b) => a - b;
for (const contributor in tableOfContributions) {
  const contributions = tableOfContributions[contributor].sort(sortNumbers);
  if (contributions.length === 1) {
    oneTimeContributors++;
    delete tableOfContributions[contributor];
    continue;
  }
  sampleSize++;
  let max = Number.MIN_SAFE_INTEGER;
  for (let i = 1; i < contributions.length; i++) {
    const diff = contributions[i] - contributions[i - 1];
    if (diff > max) max = diff;
  }
  maxDelaysBetweenContributions.push(max);
}
maxDelaysBetweenContributions.sort(sortNumbers);

// Index of the given percentile (nearest-rank method) in the sorted array.
const percentile = (percentile) => Math.ceil((percentile / 100) * maxDelaysBetweenContributions.length) - 1;
const secondsToDays = (seconds) => seconds / 3600 / 24;
const formatTime = (seconds) => {
  const days = secondsToDays(seconds);
  if (days < 7) {
    return `${Math.round(days)} days`;
  }
  if (days < 150) {
    return `${Math.round(days / 7)} weeks`;
  }
  // Months, rounded to one decimal place.
  return `${Math.round(days / 36.525 * 12) / 10} months`;
};

console.log(JSON.stringify({
  oneTimeContributors,
  sampleSize,
  '25-percentile': formatTime(maxDelaysBetweenContributions[percentile(25)]),
  'median': formatTime(maxDelaysBetweenContributions[percentile(50)]),
  '75-percentile': formatTime(maxDelaysBetweenContributions[percentile(75)]),
  '85-percentile': formatTime(maxDelaysBetweenContributions[percentile(85)]),
  '90-percentile': formatTime(maxDelaysBetweenContributions[percentile(90)]),
  '99-percentile': formatTime(maxDelaysBetweenContributions[percentile(99)]),
  'max': formatTime(maxDelaysBetweenContributions.at(-1)),
}, null, 2));
Results if I run the analysis on the last 20 000 commits that have landed on
|
This (25% of collaborators inactive for 26 weeks returning, and 10% after 56 weeks) aligns with my intuition and strengthens my concern this will alienate a significant amount of contributions/reviews in the project. |
Note that this covers contributors, not collaborators (which may or may not be a close enough proxy, not sure). Filtering for collaborators would be a bit harder, but certainly possible if someone is motivated to adapt the script. |
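One way the script could be adapted is sketched below. It is only an illustration: `filterToCollaborators` and `extractEmail` are hypothetical helpers that do not exist in the scripts in this thread, the list of collaborator identities is assumed to be obtained elsewhere (how to build it is not covered here), and matching on the email address is an assumption that will miss collaborators who commit under an address not in that list.

```javascript
// Hypothetical helper: pull the email out of a "Name <email>" identity,
// falling back to the whole string when no <...> part is present.
const extractEmail = (identity) =>
  identity.match(/<([^>]+)>\s*$/)?.[1].toLowerCase() ?? identity.toLowerCase();

// Hypothetical helper: keep only the entries of a table shaped like
// `tableOfContributions` whose email matches a known collaborator.
// `collaboratorIdentities` is an assumed input, e.g. an array of
// "Name <email>" strings collected by some other means.
function filterToCollaborators(contributionsByIdentity, collaboratorIdentities) {
  const emails = new Set([...collaboratorIdentities].map(extractEmail));
  const filtered = { __proto__: null };
  for (const [identity, timestamps] of Object.entries(contributionsByIdentity)) {
    if (emails.has(extractEmail(identity))) {
      filtered[identity] = timestamps;
    }
  }
  return filtered;
}
```

The filtered table could then be fed to the same percentile computation in place of the full one.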
I've re-run the analysis, with different parameters:
The results are more or less consistent with what I found earlier, except for the 99th percentile, but I think that can be explained simply by the fact that I looked at more commits.
Code
#!/usr/bin/env node
import cp from 'node:child_process';
import readline from 'node:readline';

// Raw identity -> promise of the mailmap-resolved identity.
const cacheIdentities = new Map();
// Resolved identity -> list of commit timestamps (seconds since epoch).
const tableOfContributions = { __proto__: null };

// Normalize a "Name <email>" identity using the repository's mailmap.
async function resolveIdentity(identity) {
  const childProcess = cp.spawn('git', ['check-mailmap', identity], {
    cwd: new URL('..', import.meta.url),
    encoding: 'utf8',
    stdio: ['inherit', 'pipe', 'inherit'],
  });
  return (await childProcess.stdout.toArray()).join('').trim();
}

// Read the git log and record, for each author, co-author, and reviewer,
// the timestamp of every commit they were involved in.
async function runGitCommand() {
  const childProcess = cp.spawn('git', ['log', '-20000', '--pretty=format:{"hash":"%H","date":%ct,"actors":["%an <%ae>","%(trailers:only,valueonly,key=Co-authored-by,separator="%x2C")","%(trailers:only,valueonly,key=Reviewed-by,separator="%x2C")"]}'], {
    cwd: new URL('..', import.meta.url),
    encoding: 'utf8',
    stdio: ['inherit', 'pipe', 'inherit'],
  });
  const lines = readline.createInterface({
    input: childProcess.stdout,
  });
  const errorHandler = new Promise(
    (_, reject) => childProcess.on('error', reject),
  );

  // Bail out early if the child process has already failed.
  await Promise.race([errorHandler, Promise.resolve()]);

  const allPromises = [];
  for await (const line of lines) {
    await Promise.race([errorHandler, Promise.resolve()]);
    // Escape quoted nicknames in names so that the line stays valid JSON.
    const { date, actors } = JSON.parse(line.replace(/ "([a-zA-Z]+)" /, ' \\u0022$1\\u0022 '));
    for (const actor of actors) {
      if (actor) {
        let actualIdentity = cacheIdentities.get(actor);
        if (actualIdentity == null) {
          actualIdentity = resolveIdentity(actor);
          cacheIdentities.set(actor, actualIdentity);
        }
        allPromises.push(actualIdentity.then((identity) => {
          if (identity) {
            (tableOfContributions[identity] ??= []).push(date);
          } else {
            console.error({ line });
          }
        }));
      }
    }
  }

  return Promise.race([errorHandler, Promise.all(allPromises)]);
}

// Nearest-rank percentile of an already sorted array.
const percentile = (array, percentile) => array[Math.ceil((percentile / 100) * array.length) - 1];

await runGitCommand();

let skippedContributors = 0;
let sampleSize = 0;
// For each contributor, the 90th/95th/99th percentile and the maximum (100)
// of the delays between their consecutive contributions.
const maxDelaysBetweenContributions = {
  __proto__: null,
  90: [],
  95: [],
  99: [],
  100: [],
};
// Contributors with fewer contributions than this are skipped.
const contributionThreshold = 68;
const sortNumbers = (a, b) => a - b;

// const countContrib = Object.values(tableOfContributions).map((c) => c.length).sort(sortNumbers);
// console.log(Object.fromEntries(Array.from({ length: 99 }, (_, i) => [i + 1, percentile(countContrib, i + 1)])));

for (const contributor in tableOfContributions) {
  const contributions = tableOfContributions[contributor].sort(sortNumbers);
  if (contributions.length < contributionThreshold) {
    skippedContributors++;
    delete tableOfContributions[contributor];
    continue;
  }
  sampleSize++;
  const delayBetweenContributions = Array(contributions.length - 1);
  for (let i = 0; i < contributions.length - 1; i++) {
    delayBetweenContributions[i] = contributions[i + 1] - contributions[i];
  }
  // Sort once so that the percentile lookups and `.at(-1)` (the maximum)
  // do not depend on the iteration order below.
  delayBetweenContributions.sort(sortNumbers);
  for (const p in maxDelaysBetweenContributions) {
    maxDelaysBetweenContributions[p].push(
      p === '100' ?
        delayBetweenContributions.at(-1) :
        percentile(delayBetweenContributions, Number(p)),
    );
  }
}

const secondsToDays = (seconds) => seconds / 3600 / 24;
const formatTime = (seconds) => {
  const days = secondsToDays(seconds);
  if (days < 2) {
    return `${Math.round(days * 240) / 10} hours`;
  }
  if (days < 14) {
    return `${Math.round(days)} days`;
  }
  if (days < 150) {
    return `${Math.round(days / 7)} weeks`;
  }
  // Months, rounded to one decimal place.
  return `${Math.round(days / 36.525 * 12) / 10} months`;
};

// Rows: percentile across contributors. Columns: per-contributor percentile.
const percentiles = [25, 50, 75, 80, 85, 90, 95, 99, 100];
const table = { __proto__: null };
for (const p in maxDelaysBetweenContributions) {
  const _p = p === '100' ? 'max' : p === '50' ? 'median' : `${p}th-percentile`;
  const delays = maxDelaysBetweenContributions[p].sort(sortNumbers);
  for (const x of percentiles) {
    const _x = x === 100 ? 'max' : x === 50 ? 'median' : `${x}th-percentile`;
    table[_x] ??= { __proto__: null };
    table[_x][_p] = formatTime(x === 100 ? delays.at(-1) : percentile(delays, x));
  }
}

console.log(JSON.stringify({
  contributionThreshold,
  skippedContributors,
  sampleSize,
}, null, 2));
console.table(table);
Percentiles of the number of contributions per contributor for the last 20k commits
{
median: 1,
'67': 1,
'68': 2,
'79': 2,
'80': 3,
'84': 3,
'85': 4,
'86': 5,
'87': 5,
'88': 6,
'89': 7,
'90': 9,
'91': 12,
'92': 16,
'93': 27,
'94': 42,
'95': 68,
'96': 110,
'97': 172,
'98': 389,
'99': 879
} |
So based on this, had the inactive duration been 12 or 9 or 6 months over the past few years, how many currently active collaborators would have been moved to emeritus? |
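A rough way to approach that question is sketched below, under stated assumptions. `wouldHaveBeenRemoved` is a hypothetical helper, not part of the scripts above. It takes a table shaped like `tableOfContributions` (identity to commit timestamps in seconds), treats someone as "currently active" if their last recorded contribution is more recent than the threshold, and flags them if any gap between two consecutive contributions exceeded the threshold. It looks at commit activity only, so it ignores whatever else the real inactivity check counts, and it does not restrict the table to collaborators.

```javascript
// Average month length in seconds (365.25 days / 12).
const SECONDS_PER_MONTH = (365.25 / 12) * 24 * 3600;

// Hypothetical helper: list the currently active identities that had at
// least one gap longer than `thresholdMonths` between two consecutive
// contributions. `now` is a Unix timestamp in seconds.
function wouldHaveBeenRemoved(contributionsByIdentity, thresholdMonths, now) {
  const threshold = thresholdMonths * SECONDS_PER_MONTH;
  const affected = [];
  for (const [identity, timestamps] of Object.entries(contributionsByIdentity)) {
    const sorted = [...timestamps].sort((a, b) => a - b);
    // Skip people who are not currently active (assumed definition).
    if (now - sorted.at(-1) > threshold) continue;
    for (let i = 1; i < sorted.length; i++) {
      if (sorted[i] - sorted[i - 1] > threshold) {
        affected.push(identity);
        break;
      }
    }
  }
  return affected;
}

// Usage with made-up data (timestamps expressed in days for readability).
const day = 24 * 3600;
const sample = {
  __proto__: null,
  alice: [0, 100, 500, 990].map((d) => d * day),
  bob: [700, 800, 900, 995].map((d) => d * day),
};
for (const months of [6, 9, 12]) {
  console.log(months, wouldHaveBeenRemoved(sample, months, 1000 * day));
}
```

Running it for 6, 9, and 12 months on a collaborator-only table would give the three counts being asked about.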
The current security incidents around Linux made me realize that we should look at the Node.js organization from a different perspective.
The current inactive collaborator duration is 18 months (1.5 years). Collaborators have write access to the repository, access to the collaborator private repository, and can run CI at any time (and run any code on our infrastructure).
The requirement to keep collaborator status is:
I'd like to reduce the inactive collaborator duration to 6 months or 12 months.
My reasons are:
request-ci or custom tasks) pose serious security risk.
My logic might be flawed, and 6 months or 12 months might not be the correct duration, but I think we have the obligation (to the users of Node.js) to think about collaborator membership and its security soon.
I believe that we should at least discuss/think about this in the short term.
cc @nodejs/tsc