Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scoped packages 403 on yarn install with empty cache #4027

Merged
merged 10 commits into from
Jul 28, 2017
Merged

Scoped packages 403 on yarn install with empty cache #4027

merged 10 commits into from
Jul 28, 2017

Conversation

lukeggchapman
Copy link
Contributor

Summary

Fix scoped packages not adding correct auth headers (#4016)

Test Plan

Having set up config for a scoped registry with an auth token. It should be possible to run yarn cache clean followed by yarn install without having the scoped packages 403

Fix scoped packages not adding correct auth headers (#4016)

**Test Plan**

Run yarn cache clean followed by yarn install when using scoped packages that have a non-default registry and token without receiving 403
Copy link
Member

@BYK BYK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, great start! I've added some comments and will merge once you address them.

Thanks for the debugging and the very quick PR :)

@@ -48,6 +48,11 @@ function isPathConfigOption(key: string): boolean {
return PATH_CONFIG_OPTIONS.indexOf(key) >= 0;
}

function isScopedPackage(packageIdent: string): boolean {
// scoped package names will begin with '@', scoped path names will contain '/@'
return !!packageIdent.match(/^@|\/@/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use /^@|\/@/.test(packageIdent) instead please.

return !packageName || packageName[0] !== '@' ? '' : packageName.split(/\/|%2f/)[0];
getScope(packageIdent: string): string {
// Matches '@scope' in a packageName or pathname, otherwise returns ''
if (packageIdent && isScopedPackage(packageIdent)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the isScopedPackage check since the below matcher will fail anyways.

@@ -218,21 +221,27 @@ export default class NpmRegistry extends Registry {
}
}

getScope(packageName: string): string {
return !packageName || packageName[0] !== '@' ? '' : packageName.split(/\/|%2f/)[0];
getScope(packageIdent: string): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function better written as:

getScope(packageIdent: string): string {
    const match = packageIdent.match(/(^@|(?!\/)@).+?((?=%2f)|(?=\/))/);
    return match && match[1] || '';
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked the regex so this is possible.

// Matches '@scope' in a packageName or pathname, otherwise returns ''
if (packageIdent && isScopedPackage(packageIdent)) {
const matched = packageIdent.match(/(^@|(?!\/)@).+?((?=%2f)|(?=\/))/);
return (matched && matched[0]) || '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want matched[1] if you want the first group, not matched[0]

Copy link
Contributor Author

@lukeggchapman lukeggchapman Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want the complete match (matched[0]), not any of the groups. The first group is '^@' or '/@'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, let's make those groups non-capturing then? (haven't seen the latest code yet)

getScope(packageIdent: string): string {
// Matches '@scope' in a packageName or pathname, otherwise returns ''
if (packageIdent && isScopedPackage(packageIdent)) {
const matched = packageIdent.match(/(^@|(?!\/)@).+?((?=%2f)|(?=\/))/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regular expression is very hard to follow, can use some comments. Also what's with the %2f part? We should unescape the URL and run the matcher on it instead of doing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I matched on %2f because I didn't have the domain knowledge to know for what reason it was escaped in the pathname. If it's not to differentiate them from other forward slashes I could add an extra .replace('%2f', '/') call in this line to make the regex more readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need decodeURIComponent(str) instead.

@@ -175,6 +175,24 @@ describe('request', () => {

expect(requestParams.headers.authorization).toBe('Bearer testAuthToken');
});

test('should add authorization header with correct token for scoped package', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is very similar to the previous one so I'd either make them a bit more differentiated or fix the previous one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it is.
This tests that a scope configured with a non-default registry is receiving the token specified for that scope.
The previous test is testing the use of the default _authToken when fetching from a scope that has no configured registry.
I'm happy to remove the previous test if you believe it does not add value.

@BYK BYK self-assigned this Jul 26, 2017
@lukeggchapman
Copy link
Contributor Author

Thanks for the review @BYK! I've cleaned up some things and It's ready for you to review again.

@soda0289
Copy link
Contributor

soda0289 commented Jul 27, 2017

@lukeggchapman Thanks for fixing this and adding some more tests!

I have a question about how we determine if a package is scoped and what that scope is.
Using the regex: /(^@|(?!\/)@).+?((?=%2f)|(?=\/))/
Won't this mark the following url as having a scope:
http://my-registry.com/@registry/foo/bar.tgz
where registry could be: http://my-registry.com/@registry/foo

OR this url having an incorrect scope:
http://my-registry.com/test@test/@foo/bar.tgz
where registry could be http://my-registry.com/test@test/ and the actual scope @foo

Not sure there can be a perfect solution or if it can be improved. Maybe for URL we only selected the last / as scope and package name separator but that might not work if package name contains a /, which I'm not sure is allowed

EDIT: I noticed your patch is using /(^@|\/@)(.+?)(?=\/)/ which does a better job but has some conflict if the registry url contains an @ character as mentioned in the first example

@lukeggchapman
Copy link
Contributor Author

lukeggchapman commented Jul 27, 2017

@soda0289 You're welcome, happy to contribute.

I've recently refactored the regex to be /(^@|\/@)(.+?)(?=\/)/ which will resolve the problem with http://my-registry.com/test@test/@foo/bar.tgz.

With the first issue I've made the assumption that scoped packages will only ever have one /@ based off an existing assumption.

It would be possible to rewrite the regex to grab only the last @scope/pkg.tgz, but with registries that have an /@ already always being marked as scoped then more work would need to be done to support them properly.

@BYK
Copy link
Member

BYK commented Jul 27, 2017

It would be possible to rewrite the regex to grab only the last @scope/pkg.tgz, but with registries that have an /@ already always being marked as scoped then more work would need to be done to support them properly

This is a bit concerning. Should we file an issue for this so we don't forget? :(


return '';
// Matches strings beginning with @ or contain /@ up until the next forward slash
const match = packageIdent.replace('%2f', '/').match(/(^@|\/@)(.+?)(?=\/)/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this closely, I really can't understand a few things:

  • Why do we even have URL-encoded parts in the ident? That doesn't look right so it should be fixed where the value originally comes from.
  • What does the ? at the end of (.+?) part stand for? Is it for the query string delimiter? If so, why do we need to include it in the match?

I feel like the following version is clearer and more accurate:

// Matches the first path segment that starts with an `@`. The RegEx is constructed as follows:
// `(?:^|\/)` Match either the start of the string or a `/` but don't capture
// `(@[^?\/]+)` Match a string starting with an `@` and not having any `/` or `?` in it and capture
// `(?:[?\/]|$)/)` Match a path delimiter, a query string delimiter or the end of the string but don't capture
const match = packageIdent.match(/(?:^|\/)(@[^?\/]+)(?:[?\/]|$)/);
return match && match[1] || '';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It's only the forward slashes in the package name (eg (https://private.npm.repo.com/@foo%2fbar) and It would be good to find where the pathname is getting stored/retrieved from to see if it is necessary to keep it in that format. I assume it is related to this.
  • I'm using the ? after a quantifier for non-greedy matching, so it matches only up to the next forward slash that is in a lookahead so it doesn't get captured.
  • I haven't seen a query string in a pathname yet, so it is not something I am catering for. Is that possible? Do you have some examples I could add to the test cases?

Your regex passes the test cases so I'll update it with that, with the added benefit of you having considered the query strings. 👍

@@ -175,6 +175,24 @@ describe('request', () => {

expect(requestParams.headers.authorization).toBe('Bearer testAuthToken');
});

test('should add authorization header with token for scope if pathname is to registry and is scoped package', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's explicitly say "custom registry" in the test name:
should add authorization header with token for custom registries with a scoped package

@lukeggchapman
Copy link
Contributor Author

lukeggchapman commented Jul 27, 2017

I'll create an issue for registries that contain /@ always being marked as scoped tomorrow.
Also I can look more into where the escaped forward slashes in package names within paths are coming from.

Copy link
Member

@BYK BYK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we sort out the %2f thing, good to go.

Great patch, thank you!

@@ -271,7 +271,7 @@ describe('getScope functional test', () => {
test('in pathname', () => {
const pathnames = [
['http://foo.bar:80/foo/bar/baz', ''],
['http://foo.bar:80/@scopedNoPkg', ''],
['http://foo.bar:80/@scopedNoPkg', '@scopedNoPkg'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this was intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was on the fence about this one anyway. Semantically getScope is correctly getting the scope, but there is no package and will likely error when trying to fetch the package from the endpoint.

return !packageName || packageName[0] !== '@' ? '' : packageName.split(/\/|%2f/)[0];
getScope(packageIdent: string): string {
// removing escaped / from package names in full paths
packageIdent = packageIdent.replace('%2f', '/');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to say this normalization should not be done here, and actually should not be done at all. http://foo.bar:80/@scope%2fbar/baz is not equivalent to http://foo.bar:80/@scope/bar/baz. See https://stackoverflow.com/a/14658209/90297

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it is not ideal and smells. Hopefully I'll have some time tomorrow (I'm in Australia) to follow the pathname around and see why it has it in the first place. So far this is the only file I've looked at in the yarn codebase.

@lukeggchapman
Copy link
Contributor Author

So 2%f was originally introduced when support for scoped packages was added.

In my case It is being added to the package name when initially added with yarn add and then stored in the path when stored and fetched from the lock file.

Doing a search on the repo for NpmRegistry.escapeName will see that it is common practice to replace the forward slash in our package names as this is how the npm registry expects our scoped package names to be formatted.

@lukeggchapman
Copy link
Contributor Author

Knowing that forward slashes in scoped package names are always escaped then I can rewrite both the isScopedPackage check and the getScope method. I'll update the PR shortly.

@lukeggchapman
Copy link
Contributor Author

@BYK Let me know what you think of this latest solution, it's a different approach to the one I was taking before.
@soda0289 latest fix should cover your concerns for registries with /@

@BYK
Copy link
Member

BYK commented Jul 28, 2017

Oh wow, great detective work @lukeggchapman. I'll be honest, I'm not a fan of this latest solution but seems like the "real" solution is not very relevant to this patch and we have passing tests. So I'll merge this and then follow up with a PR in the hopes of fixing this madness.

NPM itself says the following:

A scope follows the usual rules for package names (URL-safe characters, no leading dots or underscores).

So we should already unescape package names when working on them?

// `(?:^|\/)` Match either the start of the string or a `/` but don't capture
// `[^\/?]*?` Match any character that is not '/' or '?' and capture, up until the first occurance of:
// `%2f` Match '%2f' the escaped forward slash and don't capture
const SCOPED_PKG_REGEXP = /(?:^|\/)(@[^\/?]*?)(?=%2f)/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches an empty scope like @%2f. Also, matches @scope%2fwithout the package name. Gonna merge anyways since I don't think this is a problem but I'm quite confused about the *? part in (@[^\/?]*?). Why not just (@[^\/?]+) instead. What does even *? mean since both of them are quantifiers and you're not supposed to use two quantifiers back to back.

Goes away, reads about *? making * not greedy.

I'm now convinced that we need + instead of *? since our aim is to match a scope and the full name of the scope. So gonna replace that and then merge it. LMK if you disagree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if a eg @scope%2ffoo%2fbar should have scope @scope or @scope%2ffoo so your knowledge is appreciated there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out we need +? so it stops matching at the first sight of %2f. TIL

@lukeggchapman
Copy link
Contributor Author

@BYK I agree that the solution is not ideal as it would be much cleaner if the escape name was done only once in one place. If you search for request(NpmRegistry.escapeName( it gets used quite frequently.

This sums up my knowledge on the presence of the %2f in npm package names:
npm/npm#11738 (comment)
The extensive discussion was a little too extensive for me today 😆

@BYK
Copy link
Member

BYK commented Jul 28, 2017

@BYK I agree that the solution is not ideal as it would be much cleaner if the escape name was done only once in one place. If you search for request(NpmRegistry.escapeName( it gets used quite frequently.

Yup, gonna fix that later on.

This sums up my knowledge on the presence of the %2f in npm package names:
npm/npm#11738 (comment)
The extensive discussion was a little too extensive for me today 😆

Yeah, I've also traced back to the weird behavior of NPM registry. I think it is madness but it is the world we live in :D Sorry for all the trouble. Also fixing your PR soon. I messed it up :D

@BYK BYK merged commit cbb27f4 into yarnpkg:master Jul 28, 2017
@lukeggchapman
Copy link
Contributor Author

Hey @BYK, sorry to pester you but are you able to give me an indication of when this will be released?
With npm 5 also having similar issues It's a blocker for using private modules in our builds, I'll need to use my fork until it's released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants