Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Reimplements part of the integrity check #4122

Merged
merged 6 commits into from
Aug 14, 2017

Conversation

arcanis
Copy link
Member

@arcanis arcanis commented Aug 8, 2017

Summary

The integrity file doesn't currently support workspaces, which leads
to #3863 (removing a workspace node_modules folder and running yarn
doesn't do anything, because Yarn incorrectly assumes that everything
is already done).

This PR rewrites a good chunk of the integrity logic to support and
fix this behavior.

Test plan

Tests are passing, I still have to write an automated test before
merging this PR.

@arcanis arcanis force-pushed the check-workspace-files branch from b101665 to 2f479d9 Compare August 8, 2017 18:15
@arcanis arcanis changed the title Reimplements part of the integrity check Fix: Reimplements part of the integrity check Aug 8, 2017
@arcanis arcanis requested a review from BYK August 8, 2017 18:15
@arcanis arcanis force-pushed the check-workspace-files branch 2 times, most recently from 4978a20 to 1eabd01 Compare August 9, 2017 12:18
@arcanis
Copy link
Member Author

arcanis commented Aug 9, 2017

The tests are failing consistently on CI, but not in local oõ

@arcanis arcanis force-pushed the check-workspace-files branch 2 times, most recently from 3fc70b1 to d3185c4 Compare August 9, 2017 14:06
**Summary**

The integrity file doesn't currently support workspaces, which leads
to yarnpkg#3863 (removing a workspace node_modules folder and running yarn
doesn't do anything, because Yarn incorrectly assumes that everything
is already done).

This PR rewrites a good chunk of the integrity logic to support and
fix this behavior.

**Test plan**

Tests are passing, I still have to write an automated test before
merging this PR.
@arcanis arcanis force-pushed the check-workspace-files branch from d3185c4 to 0765d26 Compare August 9, 2017 14:23
Copy link
Member

@BYK BYK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty great (and seems to be more efficient than what we had).

I was able to reproduce the failure on my local so let's fix that and I'm good.

} else if (this.config.workspaceRootFolder) {
return this.config.workspaceRootFolder;
} else {
return path.join(this.config.lockfileFolder, 'node_modules');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use this.config.registries[name].folder instead of 'node_modules' here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid using this.config.registry, since its semantic isn't clear (cf #4109). However I can put node_modules in the constants.js file, if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we actually act on #4109 we should keep things the same way so the clean up itself is easier too.

} else {
locationFolder = await this._getModuleLocation(usedRegistries);
return path.join(this.config.lockfileFolder, 'node_modules');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: let's use already defined constants instead of 'node_modules'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't we include this.config.workspaceRootFolder in this chain too?

Copy link
Member Author

@arcanis arcanis Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, workspaceRootFolder should only be used for resolving relative paths. In workspaces it will be the package root folder, and we don't want to store the .yarn-integrity file there.

@@ -118,23 +109,59 @@ export default class InstallationIntegrityChecker {
}

/**
* returns a list of files recursively in a directory sorted
* Get the list of the directories that contain our modules (there might be multiple such folders b/c of workspaces).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/might/may

if (this.config.modulesFolder) {
locations.push(this.config.modulesFolder);
} else {
locations.push(path.join(this.config.lockfileFolder, 'node_modules'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No 'node_modules' please.

const files = [];

const recurse = async dir => {
if (!await fs.exists(dir)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dir might not exists at the root. I'll move it inside the l160 loop, since it only matter there.


// If we run "yarn" after "yarn --check-files", we shouldn't fail the less strict validation
if (actual.flags.indexOf('checkFiles') === -1) {
relevantExpectedFlags = relevantExpectedFlags.filter(flag => flag !== 'checkFiles');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like 'checkFiles' and possible other flags should be in constants.

const actualFiles = await this._getFilesDeep(locationFolder);
if (actualFiles.length > 0) {
return 'FILES_MISSING';
// Early bailout if we expect more files than what we have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart

}

// If we've reached the end of the actual array, the file is missing
if (v === actual.files.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another optimization: whenever actual.files.length - v < expected.files.length - u we're done.

workspaceLayout,
);
let integrityMatches = this._compareIntegrityFiles(actual, expected, flags.checkFiles, workspaceLayout);
console.log(actual, expected, integrityMatches);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to remove this guy :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm debugging why it breaks

this.config.lockfileFolder,
await fs.exists(path.join(this.config.lockfileFolder, modulesFolder)),
);
if (!await fs.exists(path.join(this.config.lockfileFolder, modulesFolder))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs comment since I don't follow the logic. This should have been caught by the checks above already, right?

Copy link
Member Author

@arcanis arcanis Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope

  • the files entry contains the list of files, but is only there if we use the --check-files option. Otherwise it does no check at all. We still want to check if at the very least the directories exist.

  • the modulesFolders entry is a static list of directory that isn't baked by the filesystem. They are not the actual physical folders, they are the theoretical folders. So when we compare them in _compareIntegrityFiles, we don't actually check they exist. That's done here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want to check if at the very least the directories exist.

See, we need to add a comment about this :D

the modulesFolders entry is a static list of directory that isn't baked by the filesystem. They are not the actual physical folders, they are the theoretical folders. So when we compare them in _compareIntegrityFiles, we don't actually check they exist. That's done here.

Wow. Again, needs comment so any future person would know this when they look at the code.

@BYK BYK self-assigned this Aug 9, 2017
@@ -185,7 +185,21 @@ export default class InstallationIntegrityChecker {
artifacts,
};

result.topLevelPatterns = patterns.sort(sortAlpha);
result.topLevelPatterns = patterns.sort(sortAlpha).filter(p => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do filter first and sort second to reduce the runtime of sort.

@@ -185,7 +185,21 @@ export default class InstallationIntegrityChecker {
artifacts,
};

result.topLevelPatterns = patterns.sort(sortAlpha);
result.topLevelPatterns = patterns.sort(sortAlpha).filter(p => {
return !workspaceLayout || !workspaceLayout.getManifestByPattern(p);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since workspaceLayout won't ever change during the lifetime of this function, you can do the following and speed up the cases where workspaces are not used:

const topLevelPatterns = workspaceLayout ? patterns.filter(p => !workspaceLayout.getManifestByPattern(p)) : patterns;
result.topLevelPatterns = topLevelPatterns.sort(sortAlpha);

if (workspaceLayout) {
for (const name of Object.keys(workspaceLayout.workspaces)) {
if (workspaceLayout.workspaces[name].loc) {
result.topLevelPatterns = result.topLevelPatterns.concat(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks sorting, right? Is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still sorted, just not sorted alphabetically. As long as it has a deterministic order, it's should be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object.keys(workspaceLayout.workspaces) doesn't have a deterministic order AFAIK so you may wanna sort the keys first. Also we should add a comment about this into the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess. It's actually a bit more complex: the sorting order is de facto implementation-defined, so it will stay the same for any given engine, which is all we need. I thought about sorting the array but it wouldn't change anything ... except be a bit safer in some weird edge cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still you're right, I'll add it, just in case. Better be safe :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think assuming engine internals will stay the same across runs is safe enough, yeah. Thanks! :)

// Skip over files that have been added
while (v < actual.files.length && actual.files[v] !== expected.files[u]) {
// Number of iterations after which there won't be enough entries remaining for the arrays to match
const max = actual.files.length - expected.files.length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this out of the loop? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this originally but I find it more readable for this variable and its comment to be close to the place we're using it. Since this is something that could/should be optimized by a decent compiler I think it's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you can move it to the initialization part of the for loop if you wanna keep it cozy? :D

v += 1;
}

// If we've reached the end of the actual array, the file is missing
if (v === actual.files.length) {
if (v === max) {
Copy link
Member

@BYK BYK Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this logic is correct. max is just a "small" difference where v is the absolute index which has to go past max. If you kept a separate advance variable and calculated v as u + advance and did if (advance === max) then the logic would be correct.

Copy link
Member

@BYK BYK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rejecting because of non-deterministic Object.keys() comment. Also I'd love for you to address my other comments, especially around not using the string 'node_modules' but other optimizations and clarifications as well.

if (workspaceLayout) {
for (const name of Object.keys(workspaceLayout.workspaces)) {
if (workspaceLayout.workspaces[name].loc) {
result.topLevelPatterns = result.topLevelPatterns.concat(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object.keys(workspaceLayout.workspaces) doesn't have a deterministic order AFAIK so you may wanna sort the keys first. Also we should add a comment about this into the code.


// Since we know the "files" array is sorted (alphabetically), we can optimize the thing
// Instead of storing the files in a Set, we can just iterate both arrays at once. O(n)!
for (let u = 0, v = 0; u < expected.files.length; ++u) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly think the following is easier to follow:

const maxDiff = actual.files.length - expected.files.length;
if (maxDiff < 0) {
    return 'FILES_MISSING';
}
const advance = 0;
for (const u = 0; u < expected.files.length; ++u) {
  while (advance < maxDiff && actual.files[u + advance] !== expected.files[u]) {
    advance += 1;
  }
  if (advance === maxDiff) { 
    return 'FILES_MISSING';
  }
}

Copy link
Member Author

@arcanis arcanis Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think your code won't early exit when the check cannot succeed anymore. It will always iterate on all of actual (minus advance).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will always iterate on all of actual (minus advance).

Not really. It keeps incrementing advance as long as there are non-matching so if the non-matching files are at the start of the array, as soon as we exhaust them (maxDiff iterations) we stop.

@arcanis
Copy link
Member Author

arcanis commented Aug 14, 2017

@BYK Code is now using a constant. Good to merge? :)

Copy link
Member

@BYK BYK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Let's start enjoying the faster and more comprehensive integrity checks!

for (const workspaceName of Object.keys(workspaceLayout.workspaces)) {
const loc = workspaceLayout.workspaces[workspaceName].loc;

if (loc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

for (const modulesFolder of this._getModulesFolders({workspaceLayout})) {
if (await fs.exists(modulesFolder)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not entirely getting how we may have missing folders in workspaces. Can you provide more details about what this is protecting us from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _getModulesFolders function only returns the theoretical folders, it does not check that those folders actually exist on the disk. So for example, if we have a workspace named foobar, the function will return foobar/node_modules without checking that this folder actually exists.

@BYK BYK merged commit 50508b2 into yarnpkg:master Aug 14, 2017
@Bnaya
Copy link

Bnaya commented Aug 14, 2017

Thanks @arcanis @BYK :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants