Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When running reuse via pre-commit, ignore untracked files (or: create lint-file) #578

Closed
vHanda opened this issue Aug 25, 2022 · 19 comments
Closed
Labels
enhancement New feature or request

Comments

@vHanda
Copy link
Contributor

vHanda commented Aug 25, 2022

Hello.

I sometimes have some random files which I haven't committed and don't necessarily plan to, but I don't want to add them to the '.gitignore'. Reuse complains about those files.

It would be awesome if I could run reuse with a --ignore-untracked option, which ignores all files which aren't in git index.


comment by @carmenbianca :

Other linters, pylint etc, when run via pre-commit, only run against staged/tracked files. This is a behaviour of pre-commit, which passes (I think) a list of files to the linters. reuse differs here, because there's presently no way to run the linter against a subset of files.

Let's create lint-file to lint a subset of files, and then allow the pre-commit to be configured to use that instead of lint.

@carmenbianca
Copy link
Member

Tempted to mark this wontfix as feature creep. pre-commit can handle this for you. When you commit some files using pre-commit, it quickly stashes the untracked files, runs the pre-commit stuff, then un-stashes the untracked files.

@fsfe fsfe deleted a comment from carmenbianca Aug 25, 2022
@ryandesign
Copy link
Contributor

Fine, then please update the documentation to provide the complete set of steps needed to accomplish this. Assume I didn't know anything about pre-commit before discovering the reuse tool.

@ryandesign
Copy link
Contributor

ryandesign commented Oct 7, 2022

To clarify: I have set up pre-commit to the best of my ability following the directions currently in the reuse documentation. Pre-commit runs the reuse tool when I commit anything. It prevents any commit from going through if there are any files in the directory that don't have proper license information, even if those files are untracked and are not part of the commit.

@carmenbianca
Copy link
Member

Ah, you may be right. pre-commit stashes and unstashes unstaged tracked files.

We'll need to look at this issue some time then. I'm inclined towards a solution that doesn't introduce git-specific interactions into the tool. I think the solution might be to only lint the files tracked by pre-commit. There's an issue open for only linting specified files, but I'm on my phone and can't easily find it.

@Wuestengecko
Copy link

Wuestengecko commented Oct 24, 2022

@carmenbianca

I'm inclined towards a solution that doesn't introduce git-specific interactions into the tool.

It has those already:

def _find_all_ignored_files(self) -> Set[Path]:
"""Return a set of all files ignored by git. If a whole directory is
ignored, don't return all files inside of it.
"""
command = [
GIT_EXE,
"ls-files",
"--exclude-standard",
"--ignored",
"--others",
"--directory",
# TODO: This flag is unexpected. I reported it as a bug in Git.
# This flag---counter-intuitively---lists untracked directories
# that contain ignored files.
"--no-empty-directory",
# Separate output with \0 instead of \n.
"-z",
]
result = execute_command(command, _LOGGER, cwd=self.project.root)
all_files = result.stdout.decode("utf-8").split("\0")
return {Path(file_) for file_ in all_files}

Looking at the ls-files invocation there and its man page, I found this interesting note (emphasis mine):

-i, --ignored

Show only ignored files in the output. When showing files in the index, print only those matched by an exclude pattern. When showing "other" files, show only those matched by an exclude pattern.
Standard ignore rules are not automatically activated, therefore at least one of the --exclude* options is required.

This issue is about exactly this set of files: Not tracked by git (yet), but also not on any ignore list.

Unfortunately, simply dropping --ignore entirely would result in all intentionally .gitignore-d files to not be listed. So the options as far as I can see are:

  • Two calls to ls-files (one with and one without the flag)
  • Inverting the current behavior: Listing all tracked files with git ls-files -z and inverting the check in is_ignored.

I presume the latter might become a memory issue on very large repos, but then again those probably have a lot of ignored files laying around as well anyways.

@nicorikken
Copy link
Member

For reference I think @carmenbianca was referring to issue #512

@Wuestengecko
Copy link

Hi! Is there any news about this issue? I had opened a PR which resolves this in #626, but there has been no movement on it for over a month now.

@mxmehl mxmehl self-assigned this Jan 11, 2023
@mxmehl
Copy link
Member

mxmehl commented Jan 11, 2023

We just had out first maintainers call this year and discussed the issue in length.

First of all, let me say that we all feel the pain by linters vs. untracked files that you actually never want to track. So thank you for making us think about it and even proposing a PR! But we are convinced that somehow ignoring it would lead to more issues that it solves.

  1. We checked some other linters, e.g. black and pylint, and they all lint untracked files. It's somehow an expected behaviour, even if it isn't ideal.
  2. We think it's confusing if reuse lint may return different output after git add --all. After all, we cannot guess which untracked files the user actually would like to add later, and which ones are just test script or temporary artifacts.
  3. We discussed adding this as optional behaviour via a flag to lint but decided against it to keep this subcommand as simple as it currently is.
  4. If we implemented this feature, we also would have to add the same magic to the Mercurial function. Certainly doable, but not desirable as it's sometimes not trivial to align fine-grained details in both VCS.
  5. There are other ways to fix this experience.

Ad 5., here is how we would make one's life easier with this specific issue:

  1. You could stash the test files, run lint, and unstash again. Similar to how pre-commit does it.
  2. You could make such temporary test files be ignored on your machine. For this, create a ~/.gitignore in which you define a custom pattern of files which you would like to ignore, e.g. *.myignorepattern* matching files like test.myignorepattern.sh. Then, make your git read this gitignore file via git config --global core.excludesfile '~/.gitignore'

All of this said, we'd like to close #626 as well.

@mxmehl mxmehl closed this as not planned Won't fix, can't repro, duplicate, stale Jan 11, 2023
@Wuestengecko
Copy link

I fully understand the reasoning behind this decision.

As for fixing the problem, I went with option 3: A wrapper script around the git hook that asks whether I want to ignore the errors and continue anyway. This has served me well both with REUSE complaints as well as several other occasional false-positives. (It's as simple as ${0%/*}/pre-commit.real "$@" || read -q '?Continue anyway? ' in zsh.)

@carmenbianca
Copy link
Member

@Wuestengecko Interesting! Normally when I encounter an issue like this with failing lints/tests because of unstaged files, I run git commit as normal, and subsequently pre-commit complains. I analyse why pre-commit failed, and if I deem it to be because of the unstaged files, I run git commit --no-verify, which skips pre-commit entirely.

(And if I use --no-verify too eagerly, the CI catches my mistakes, usually :) )

@vHanda
Copy link
Contributor Author

vHanda commented Jan 11, 2023

I really disagree with the conclusions.

Just cause some other linters behave this way, that doesn't mean reuse should. The key difference between typical "linters" and reuse is that reuse applies to all files. So, while you can easily have other random files lying around, and not have that bother other linters, with reuse that isn't doable.

Regarding 5 -

a. Stashing does not affect untracked files. So doing a stash doesn't do anything.
b. The whole point of having untracked files is that you want to see those files in the 'git status' and not have them ignored.

None of your alternatives really work.

Regardless, you've made your decision. My complaints aren't going to change your mind.

Please remember, the more difficult a tool is to use, the lower the chance that someone is going to adopt it or evangelize it.

@Wuestengecko
Copy link

Normally when I encounter an issue like this with failing lints/tests because of unstaged files, I run git commit as normal, and subsequently pre-commit complains. I analyse why pre-commit failed, and if I deem it to be because of the unstaged files, I run git commit --no-verify, which skips pre-commit entirely.

@carmenbianca I used to do it like this too, but having to remember to add -n the second time became annoying. Now I don't have to change the command anymore ;)

It's also convenient when I have a bunch of relatively unrelated changes in the work tree and want to split them into two or three different commits. If I see it's just the untracked files again for the second commit, and everything else was successful, I don't have to re-run at all.

And if I do mess up, I can rely on the CI as well. :)

Stashing does not affect untracked files.

@vHanda Use git stash -u

@vHanda
Copy link
Contributor Author

vHanda commented Jan 13, 2023

@vHanda Use git stash -u

I need to apologize for the slightly annoyed message.

I didn't realize that git stash -u was an option. Now I'm sincerely humbled and appreciate that you all all took the time to address this so thoughtfully. I, now, also completely understand the thought process.

Again, my apologies for the slightly hostile message, and thank you for working on this.

@ryandesign
Copy link
Contributor

Well, I don't completely understand yet, and still don't know how to accomplish what I want.

What I want is: I commit changes to my git repository. The pre-commit hook runs reuse to verify that what's being committed complies. Files that are not part of the commit (or, at least, files that have never been committed) are not checked and noncompliance of those files is irrelevant and does not block the commit from going through.

You speak of git stash -u. Am I supposed to include that instruction in the pre-commit setup somewhere? Where? As I said above, please update the documentation to provide the complete set of steps needed to accomplish this.

@carmenbianca
Copy link
Member

What I want is: I commit changes to my git repository. The pre-commit hook runs reuse to verify that what's being committed complies. Files that are not part of the commit (or, at least, files that have never been committed) are not checked and noncompliance of those files is irrelevant and does not block the commit from going through.

Ah. Let's re-open this. There's a subtle detail here that I think we overlooked when closing this.

Other linters, pylint etc, when run via pre-commit, only run against staged/tracked files. This is a behaviour of pre-commit, which passes (I think) a list of files to the linters. reuse differs here, because there's presently no way to run the linter against a subset of files.

I think the solution lies not in ignoring untracked files; it lies in enabling the prior behaviour. I think there's an issue for this, but I can't find it. I'll rename this issue.

@carmenbianca carmenbianca reopened this Jan 16, 2023
@carmenbianca carmenbianca changed the title Ignore untracked files When running reuse via pre-commit, ignore untracked files (or: create lint-file) Jan 16, 2023
@carmenbianca carmenbianca added the enhancement New feature or request label Jan 16, 2023
@stephanlachnit
Copy link

I'm quite interested in this since I've started using pre-commit as well.

I agree that I don't think having a different behavior for reuse lint that depends on the state of git is something that makes sense. However, what is the problem with a lint-file subcommand or lint --only option? This should be easy to implement. I think a --only subcommand makes more sense since a lint-file indicates that this is something common.

I even think the opposite, --exclude also makes sense, e.g. when you add packaging files on top and want to check the source. But this might be something for a different issue.

@cdwilson
Copy link

cdwilson commented Mar 29, 2024

Use git stash -u

I tried git stash -u but it results in any staged files being removed from the git index (since the index is included in the stash). Since this command is likely being run when the reuse pre-commit hook fails, there are always files in the index when it is run, so removing those staged files from the index is not the desired behavior.

I found that I needed to do git stash -u --keep-index instead which keeps any staged files in the index. However, since the staged files are also included in the stash, after making the commit, I need to run git stash pop --no-index so that git restores any modified/untracked files, but does not try to restore the staged files to the index (since we already just committed them).

While this works, it's annoying. Is there a simpler way to accomplish the same thing?

What I want is: I commit changes to my git repository. The pre-commit hook runs reuse to verify that what's being committed complies. Files that are not part of the commit (or, at least, files that have never been committed) are not checked and noncompliance of those files is irrelevant and does not block the commit from going through.

FWIW, this is exactly the behavior I want (and expected) too. I just want any modified/untracked files to be ignored by the pre-commit hook, and only staged files in the git index to be checked for compliance.

@Wuestengecko
Copy link

While this works, it's annoying. Is there a simpler way to accomplish the same thing?

Well, there is already an open PR for this, which you could just use ;) I do that in some repos, so I can confirm that it actually works.

Just change your .pre-commit-config.yaml entry to:

  - repo: https://github.com/Wuestengecko/reuse-tool
    rev: 7d19244ab46b6a125d1a4dbe4a3ebb9ca767ab1a
    hooks:
      - id: reuse

Do note that you'll have to revert back to the official repo once the PR (or whatever other solution) gets merged, as I'll eventually delete my fork then.

@mxmehl mxmehl removed their assignment Jul 9, 2024
zormit added a commit to ethersync/ethersync that referenced this issue Jul 10, 2024
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
zormit added a commit to lnceballosz/ethersync that referenced this issue Jul 10, 2024
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
zormit added a commit to lnceballosz/ethersync that referenced this issue Aug 14, 2024
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
blinry pushed a commit to lnceballosz/ethersync that referenced this issue Aug 16, 2024
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
@carmenbianca
Copy link
Member

Implemented in #1055, will release one of these days. Thank you!

blinry pushed a commit to lnceballosz/ethersync that referenced this issue Jan 7, 2025
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
blinry pushed a commit to lnceballosz/ethersync that referenced this issue Jan 7, 2025
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
blinry pushed a commit to lnceballosz/ethersync that referenced this issue Jan 14, 2025
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
blinry pushed a commit to ethersync/ethersync that referenced this issue Jan 16, 2025
I decided to use a pinned version that doesn't lint untracked files,
for details see
fsfe/reuse-tool#578 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

8 participants