-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Reimplements part of the integrity check #4122
Conversation
b101665
to
2f479d9
Compare
4978a20
to
1eabd01
Compare
The tests are failing consistently on CI, but not in local oõ |
3fc70b1
to
d3185c4
Compare
**Summary** The integrity file doesn't currently support workspaces, which leads to yarnpkg#3863 (removing a workspace node_modules folder and running yarn doesn't do anything, because Yarn incorrectly assumes that everything is already done). This PR rewrites a good chunk of the integrity logic to support and fix this behavior. **Test plan** Tests are passing, I still have to write an automated test before merging this PR.
d3185c4
to
0765d26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty great (and seems to be more efficient than what we had).
I was able to reproduce the failure on my local so let's fix that and I'm good.
src/integrity-checker.js
Outdated
} else if (this.config.workspaceRootFolder) { | ||
return this.config.workspaceRootFolder; | ||
} else { | ||
return path.join(this.config.lockfileFolder, 'node_modules'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use this.config.registries[name].folder
instead of 'node_modules'
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to avoid using this.config.registry
, since its semantic isn't clear (cf #4109). However I can put node_modules
in the constants.js
file, if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we actually act on #4109 we should keep things the same way so the clean up itself is easier too.
src/integrity-checker.js
Outdated
} else { | ||
locationFolder = await this._getModuleLocation(usedRegistries); | ||
return path.join(this.config.lockfileFolder, 'node_modules'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: let's use already defined constants instead of 'node_modules'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, shouldn't we include this.config.workspaceRootFolder
in this chain too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, workspaceRootFolder
should only be used for resolving relative paths. In workspaces it will be the package root folder, and we don't want to store the .yarn-integrity
file there.
@@ -118,23 +109,59 @@ export default class InstallationIntegrityChecker { | |||
} | |||
|
|||
/** | |||
* returns a list of files recursively in a directory sorted | |||
* Get the list of the directories that contain our modules (there might be multiple such folders b/c of workspaces). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/might/may
src/integrity-checker.js
Outdated
if (this.config.modulesFolder) { | ||
locations.push(this.config.modulesFolder); | ||
} else { | ||
locations.push(path.join(this.config.lockfileFolder, 'node_modules')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No 'node_modules'
please.
src/integrity-checker.js
Outdated
const files = []; | ||
|
||
const recurse = async dir => { | ||
if (!await fs.exists(dir)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dir
might not exists at the root. I'll move it inside the l160 loop, since it only matter there.
|
||
// If we run "yarn" after "yarn --check-files", we shouldn't fail the less strict validation | ||
if (actual.flags.indexOf('checkFiles') === -1) { | ||
relevantExpectedFlags = relevantExpectedFlags.filter(flag => flag !== 'checkFiles'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like 'checkFiles'
and possible other flags should be in constants.
const actualFiles = await this._getFilesDeep(locationFolder); | ||
if (actualFiles.length > 0) { | ||
return 'FILES_MISSING'; | ||
// Early bailout if we expect more files than what we have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smart
src/integrity-checker.js
Outdated
} | ||
|
||
// If we've reached the end of the actual array, the file is missing | ||
if (v === actual.files.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another optimization: whenever actual.files.length - v
< expected.files.length - u
we're done.
src/integrity-checker.js
Outdated
workspaceLayout, | ||
); | ||
let integrityMatches = this._compareIntegrityFiles(actual, expected, flags.checkFiles, workspaceLayout); | ||
console.log(actual, expected, integrityMatches); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to remove this guy :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm debugging why it breaks
src/integrity-checker.js
Outdated
this.config.lockfileFolder, | ||
await fs.exists(path.join(this.config.lockfileFolder, modulesFolder)), | ||
); | ||
if (!await fs.exists(path.join(this.config.lockfileFolder, modulesFolder))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs comment since I don't follow the logic. This should have been caught by the checks above already, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope
-
the
files
entry contains the list of files, but is only there if we use the--check-files
option. Otherwise it does no check at all. We still want to check if at the very least the directories exist. -
the
modulesFolders
entry is a static list of directory that isn't baked by the filesystem. They are not the actual physical folders, they are the theoretical folders. So when we compare them in_compareIntegrityFiles
, we don't actually check they exist. That's done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still want to check if at the very least the directories exist.
See, we need to add a comment about this :D
the modulesFolders entry is a static list of directory that isn't baked by the filesystem. They are not the actual physical folders, they are the theoretical folders. So when we compare them in _compareIntegrityFiles, we don't actually check they exist. That's done here.
Wow. Again, needs comment so any future person would know this when they look at the code.
src/integrity-checker.js
Outdated
@@ -185,7 +185,21 @@ export default class InstallationIntegrityChecker { | |||
artifacts, | |||
}; | |||
|
|||
result.topLevelPatterns = patterns.sort(sortAlpha); | |||
result.topLevelPatterns = patterns.sort(sortAlpha).filter(p => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do filter
first and sort
second to reduce the runtime of sort.
src/integrity-checker.js
Outdated
@@ -185,7 +185,21 @@ export default class InstallationIntegrityChecker { | |||
artifacts, | |||
}; | |||
|
|||
result.topLevelPatterns = patterns.sort(sortAlpha); | |||
result.topLevelPatterns = patterns.sort(sortAlpha).filter(p => { | |||
return !workspaceLayout || !workspaceLayout.getManifestByPattern(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since workspaceLayout
won't ever change during the lifetime of this function, you can do the following and speed up the cases where workspaces are not used:
const topLevelPatterns = workspaceLayout ? patterns.filter(p => !workspaceLayout.getManifestByPattern(p)) : patterns;
result.topLevelPatterns = topLevelPatterns.sort(sortAlpha);
src/integrity-checker.js
Outdated
if (workspaceLayout) { | ||
for (const name of Object.keys(workspaceLayout.workspaces)) { | ||
if (workspaceLayout.workspaces[name].loc) { | ||
result.topLevelPatterns = result.topLevelPatterns.concat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks sorting, right? Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still sorted, just not sorted alphabetically. As long as it has a deterministic order, it's should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object.keys(workspaceLayout.workspaces)
doesn't have a deterministic order AFAIK so you may wanna sort the keys first. Also we should add a comment about this into the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess. It's actually a bit more complex: the sorting order is de facto implementation-defined, so it will stay the same for any given engine, which is all we need. I thought about sorting the array but it wouldn't change anything ... except be a bit safer in some weird edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still you're right, I'll add it, just in case. Better be safe :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think assuming engine internals will stay the same across runs is safe enough, yeah. Thanks! :)
src/integrity-checker.js
Outdated
// Skip over files that have been added | ||
while (v < actual.files.length && actual.files[v] !== expected.files[u]) { | ||
// Number of iterations after which there won't be enough entries remaining for the arrays to match | ||
const max = actual.files.length - expected.files.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this out of the loop? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this originally but I find it more readable for this variable and its comment to be close to the place we're using it. Since this is something that could/should be optimized by a decent compiler I think it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you can move it to the initialization part of the for
loop if you wanna keep it cozy? :D
src/integrity-checker.js
Outdated
v += 1; | ||
} | ||
|
||
// If we've reached the end of the actual array, the file is missing | ||
if (v === actual.files.length) { | ||
if (v === max) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this logic is correct. max
is just a "small" difference where v
is the absolute index which has to go past max
. If you kept a separate advance
variable and calculated v
as u + advance
and did if (advance === max)
then the logic would be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rejecting because of non-deterministic Object.keys()
comment. Also I'd love for you to address my other comments, especially around not using the string 'node_modules'
but other optimizations and clarifications as well.
src/integrity-checker.js
Outdated
if (workspaceLayout) { | ||
for (const name of Object.keys(workspaceLayout.workspaces)) { | ||
if (workspaceLayout.workspaces[name].loc) { | ||
result.topLevelPatterns = result.topLevelPatterns.concat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object.keys(workspaceLayout.workspaces)
doesn't have a deterministic order AFAIK so you may wanna sort the keys first. Also we should add a comment about this into the code.
|
||
// Since we know the "files" array is sorted (alphabetically), we can optimize the thing | ||
// Instead of storing the files in a Set, we can just iterate both arrays at once. O(n)! | ||
for (let u = 0, v = 0; u < expected.files.length; ++u) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly think the following is easier to follow:
const maxDiff = actual.files.length - expected.files.length;
if (maxDiff < 0) {
return 'FILES_MISSING';
}
const advance = 0;
for (const u = 0; u < expected.files.length; ++u) {
while (advance < maxDiff && actual.files[u + advance] !== expected.files[u]) {
advance += 1;
}
if (advance === maxDiff) {
return 'FILES_MISSING';
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think your code won't early exit when the check cannot succeed anymore. It will always iterate on all of actual
(minus advance
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will always iterate on all of actual (minus advance).
Not really. It keeps incrementing advance
as long as there are non-matching so if the non-matching files are at the start of the array, as soon as we exhaust them (maxDiff
iterations) we stop.
@BYK Code is now using a constant. Good to merge? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Let's start enjoying the faster and more comprehensive integrity checks!
for (const workspaceName of Object.keys(workspaceLayout.workspaces)) { | ||
const loc = workspaceLayout.workspaces[workspaceName].loc; | ||
|
||
if (loc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after my fixes, we don't need the emptiness check here anymore.
Referring to these changes:
} | ||
|
||
for (const modulesFolder of this._getModulesFolders({workspaceLayout})) { | ||
if (await fs.exists(modulesFolder)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not entirely getting how we may have missing folders in workspaces. Can you provide more details about what this is protecting us from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _getModulesFolders
function only returns the theoretical folders, it does not check that those folders actually exist on the disk. So for example, if we have a workspace named foobar
, the function will return foobar/node_modules
without checking that this folder actually exists.
Summary
The integrity file doesn't currently support workspaces, which leads
to #3863 (removing a workspace node_modules folder and running yarn
doesn't do anything, because Yarn incorrectly assumes that everything
is already done).
This PR rewrites a good chunk of the integrity logic to support and
fix this behavior.
Test plan
Tests are passing, I still have to write an automated test before
merging this PR.