Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] workspace root .gitignore supersed every .npmignore and .gitignore #106

Closed
1 task done
louis-bompart opened this issue May 27, 2022 · 13 comments · Fixed by #108
Closed
1 task done

[BUG] workspace root .gitignore supersed every .npmignore and .gitignore #106

louis-bompart opened this issue May 27, 2022 · 13 comments · Fixed by #108
Labels
Bug thing that needs fixing Needs Triage needs an initial review

Comments

@louis-bompart
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Given that I have a gitignore at the root level of my workspaces, it will supersed the .gitignore and the .npmignore at the workspace level.

This means that when I run npm publish at the level of a single package, it ends up using the root .gitignore to filter out files.

Expected Behavior

I would expect the 'ignore file resolution' order to be a such:

  1. 'ws-local' .npmignore
  2. 'ws-local' .gitignore
  3. 'ws-root' .npmignore
  4. 'ws-root' .gitignore

Steps To Reproduce

  1. In this environment:
.
├── .gitignore // file content: dist
├── .npmignore // empty file
├── package.json
└── packages
    └── a
        ├── dist
        │   └── test.js
        ├── .gitignore // empty file or anything but dist
        ├── .npmignore // empty file
        └── package.json
  1. With npm 8.11.0
  2. Run npm publish --dry-run while cwd is packages/a
  3. See that dist/test.js is not included.

Environment

  • npm: 8.11.0
  • Node: 16.15.0
  • OS: W10/Debian(WSL)
  • platform: N.A.
@louis-bompart louis-bompart added Bug thing that needs fixing Needs Triage needs an initial review labels May 27, 2022
@louis-bompart
Copy link
Author

Relates prolly to #102/ 12c23ad (sorry @nlf for the 👈. 😂 , just wanna give y'all as much intel as possible)

@louis-bompart
Copy link
Author

After checking 12c23ad a bit more in-depth, here are a few thoughts:

1. ignorefile combination order

Combining the ignorefile of the same type but across the level is interesting, however, instead of starting from the leaves and going to the root, I would propose the opposite because of how ! works in ignorefile.

Let's say I want to ignore all the .foo of my workspace, except for one package. At the root level, my ignorefile could have .foo, and in the specific package where I want to include the file, I would put !.foo.

This is what git is already doing for hierarchical .gitignore.

2. Respect the .npmignore priority.

On the readme of this repo is listed a rule set that is not followed by the out-of-tree-ignorefiles. Indeed, the content of both gitignore and npmignore is concatenated together instead of the npmignore being prioritized.

I think that it should be done as such:

  1. Find the ignorefile at the leaf package (i.e. is it a gitignore, a npmignore, or nothing)
  2. If an ignorefile is found at the leaf level, do the current algorithm, but only for the file type found at the leaf level, and instead of appending the content of the files, prepend it.

@nlf
Copy link
Contributor

nlf commented May 27, 2022

if an .npmignore file exists, it will be used and the .gitignore file will be ignored entirely, that's what this bit does https://github.com/npm/npm-packlist/blob/main/lib/index.js#L395-L397. your .gitignore is only being respected at all because the .npmignore file is empty, making its value falsey. that's a bug we should fix, but if we do it means in your example you would have no ignore rules applied at all.

a nested .gitignore/.npmignore layers on top of the root one, and the root one still applies within the nested directory unless the nested directory has a rule that overrides it. this function here: https://github.com/npm/npm-packlist/blob/main/lib/index.js#L36-L60 does exactly what you're describing for the ignore files that live outside of the workspace's directory tree, it starts at the root and builds a set of ignore rules by walking deeper into the file tree toward the workspace.

to tweak your example to work as you'd expect:

.
|_ .gitignore (contains 'dist')
|_ # .npmignore (delete this, it's not helping and would hurt if we fix the bug described above)
|_ package.json
|_ packages
|___ a
|_____ dist
|_______ index.js
|_____ .gitignore (contains '!dist')
|_____ # .npmignore (again, delete this)
|_____ package.json

this will cause the dist directory within packages/a to be included, but any other dist directories to be ignored, which i think is what you're asking for

@nlf
Copy link
Contributor

nlf commented May 27, 2022

alternatively if you want the dist directory to be ignored by git, but included by npm, keep the root .gitignore with dist and use a .npmignore that has !dist within packages/a instead of a .gitignore

another option, and arguably the better one, would be to edit packages/a/package.json and add "files": ["dist"] to it, which is the most explicit means of telling npm that you want the directory published

@nlf
Copy link
Contributor

nlf commented May 27, 2022

2. Respect the .npmignore priority.

On the readme of this repo is listed a rule set that is not followed by the out-of-tree-ignorefiles. Indeed, the content of both gitignore and npmignore is concatenated together instead of the npmignore being prioritized.

this is definitely a bug, you're right that the contents of .gitignore should be ignored if a .npmignore is present

edit: #108 fixes this

@louis-bompart
Copy link
Author

if an .npmignore file exists, it will be used and the .gitignore file will be ignored entirely, that's what this bit does https://github.com/npm/npm-packlist/blob/main/lib/index.js#L395-L397. your .gitignore is only being respected at all because the .npmignore file is empty, making its value falsey. that's a bug we should fix, but if we do it means in your example you would have no ignore rules applied at all.

Concerning the empty .npmignore, personally, I think it should play the role of saying to npm 'btw, please do not consider the gitignore and include everything except the default exclusion list'.

So yeah, in short, I did expect no ignore rules to be applied in my case.

a nested .gitignore/.npmignore layers on top of the root one, and the root one still applies within the nested directory unless the nested directory has a rule that overrides it. this function here: https://github.com/npm/npm-packlist/blob/main/lib/index.js#L36-L60 does exactly what you're describing for the ignore files that live outside of the workspace's directory tree, it starts at the root and builds a set of ignore rules by walking deeper into the file tree toward the workspace.

Yes, I misread when taking only the root and the leaf. However, I reckon https://github.com/npm/npm-packlist/blob/main/lib/index.js#L40
should instead be: result = ignoreContent + '\n' + result because, if I did understand properly the algorithm of the method, we're starting from the leaf and go down till we reach the root.

So if you have let's say root/packages/a with a gitignore at each level, you would end up with a 'compiled gitignore' that would look like this:

root
a
packages

Instead of

root
packages
a

@louis-bompart
Copy link
Author

louis-bompart commented May 27, 2022

2. Respect the .npmignore priority.

On the readme of this repo is listed a rule set that is not followed by the out-of-tree-ignorefiles. Indeed, the content of both gitignore and npmignore is concatenated together instead of the npmignore being prioritized.

this is definitely a bug, you're right that the contents of .gitignore should be ignored if a .npmignore is present

edit: #108 fixes this

Left a comment on #108, I think you need to carry "the type of ignorefile to use" in the recursion as soon as you found it, otherwise, you'll end up mixing npmignore and gitignore (potentially).

@nlf
Copy link
Contributor

nlf commented May 27, 2022

should instead be: result = ignoreContent + '\n' + result because, if I did understand properly the algorithm of the method, we're starting from the leaf and go down till we reach the root.

this would cause us to build a set of ignore rules that lists the root at the bottom, which is the reverse of what we want.

as written, based on your example directory structure, we iterate ./ -> ./packages and return a string that contains

contents-of-root
contents-of-packages

ignore rules are applied sequentially, starting with the root, this is how an ignore file in a subdirectory can re-include something that an ignore file at the root ignored, so we need the final rule list to include the contents of each file starting at the root and working deeper into the tree structure up to and including the parent of the workspace

@louis-bompart
Copy link
Author

should instead be: result = ignoreContent + '\n' + result because, if I did understand properly the algorithm of the method, we're starting from the leaf and go down till we reach the root.

this would cause us to build a set of ignore rules that lists the root at the bottom, which is the reverse of what we want.

as written, based on your example directory structure, we iterate ./ -> ./packages and return a string that contains

contents-of-root
contents-of-packages

ignore rules are applied sequentially, starting with the root, this is how an ignore file in a subdirectory can re-include something that an ignore file at the root ignored, so we need the final rule list to include the contents of each file starting at the root and working deeper into the tree structure up to and including the parent of the workspace

I reread the code, yes, my bad 😅 I misread how the path was constructed when reading the file

@nlf
Copy link
Contributor

nlf commented May 27, 2022

I reread the code, yes, my bad 😅 I misread how the path was constructed when reading the file

no worries! it happens to all of us, and often! i really appreciate your attention to this and making sure we get it right, it's a huge help

i'll follow up with another pull request that ensures we ignore a .gitignore if a .npmignore exists and is empty, i think that's the last of the remaining work here.

thanks again!

@louis-bompart
Copy link
Author

louis-bompart commented May 27, 2022

Thanks a lot @nlf

If I may bug you a bit more, I'm curious about #108 (comment). In short err, "why?"

I personally think that if we have a npmignore at the workspace level (in the example, in the folder a), we should only consider npmignore for workspace and root, because I see that as a package owner "yep, I'm using the npmignore flow, no gitignore should impact my packaging flow."

What do you think?

@nlf
Copy link
Contributor

nlf commented May 27, 2022

i can see why you would want it that way, and the answer as for why it is the way it is is the trickiest one to work around. it's the way it is because that's the way its always been, and changing it is definitely a breaking change and one that would take some care from our users to fix.

the ignore behavior was initially built using only the .gitignore so we respected those from very early on in npm's life. the .npmignore and then the files array in package.json were added later to provide more granularity.

if the concern were ignoring too many files, we would likely take the risk. the worst case there is an incomplete package gets published. the author can correct the problem, publish a new version, and deprecate the broken one and all is well.

the concern with stopping the behavior of respecting a root .gitignore and a subdirectory's .npmignore (or the reverse) is that we ignore not enough files, which means we're creating a risk of sensitive information being published as part of the package. this is still something that can be corrected, but it's trickier, and as i like to say "what was once public on the internet, is forever public on the internet" so you can never be certain that your sensitive data is actually gone

because of this i think we're probably stuck where we are. if you feel passionate about it though, please feel free to open an RRFC issue at https://github.com/npm/rfcs and we can discuss it in our weekly openrfc call

@louis-bompart
Copy link
Author

Wow, thanks for this comprehensive answer @nlf 😄

I'm not especially passionate about this issue, and, with this history, I do understand and would do take the same decision.
I was just really eager to understand the whys.

Thanks a lot for taking the time to explain things are going this way, it was really enlightening.

@nlf nlf closed this as completed in #108 May 31, 2022
louis-bompart added a commit to coveo/cli that referenced this issue Sep 13, 2022
https://coveord.atlassian.net/browse/CDX-1010

<!-- For Coveo Employees only. Fill this section.

CDX-1010

-->

## Proposed changes

I discussed with a developer from the npm team here
npm/npm-packlist#106 and got a good
understanding of their philosophy moving forward with the concept of
workspaces and their multi-level ignore files.

In short, npm will try to acquire an ignorefile at each directory level
and concatenate them. The peculiarity (at least IMO), is that if it
acquire a npmignore first, this will have no incidence for the
acquisition of the other ignorefile. So, we can end up with a mix of
`gitignore` and `npmignore`. I discussed this behavior with the npm
team, and they are kinda stuck, so its their least worst option here, so
we just have to adapt and make it work.

This is what this PR is about.

## Testing

 - Ran `npm publish --dry-run` on every package and checked them out.
 
 ## Prerequisites
 
 - [x] npm/npm-packlist#108
 - [ ] `npm-cli` released with ☝️
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug thing that needs fixing Needs Triage needs an initial review
Projects
None yet
2 participants