-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More robust collection of file paths #596
More robust collection of file paths #596
Conversation
Using a null char as separator, which allows files with e.g. commas or end-of-line characters in their names.
Could you please share an example repo with such files? Why there's a need to have commas in file names? I am just curious, because I've never met this before. |
internal/config/command.go
Outdated
Interactive bool `mapstructure:"interactive" yaml:",omitempty" json:"interactive,omitempty" toml:"interactive,omitempty"` | ||
UseStdin bool `mapstructure:"use_stdin" yaml:",omitempty" json:"use_stdin,omitempty" toml:"use_stdin,omitempty"` | ||
StageFixed bool `mapstructure:"stage_fixed" yaml:"stage_fixed,omitempty" json:"stage_fixed,omitempty" toml:"stage_fixed,omitempty"` | ||
FilesFromStdin bool `mapstructure:"files_from_stdin" yaml:",omitempty" json:"files_from_stdin,omitempty" toml:"files_from_stdin,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't imagine a case when you need this option implicitly, because as I know git
does not pass files to STDIN. So, I assume the only use case is when you explicitly run something like echo file1 file2 | lefthook run do-something
. Do I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With implicitly you mean, from configuration?
In the use case of Git hooks, then true. But Lefthook can be invoked in multiple ways for multiple use cases. For example, an organization may prohibit acceptance of paths via --files
(for example, because it does have filenames with comma's in it, like data science CSV batches produced outside of their control) by enforcing this in the team-wide Lefthook config via files_from_stdin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With implicitly I mean from the git hook. If you specify lefthook run do-something --files-from-stdin
, then you probably expect that {files}
or {staged_files}
are replaced from the data parsed from STDIN. But there's no need to have a special template for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for that case, indeed. Please note my previous reply though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An organization may prohibit acceptance of paths via --files (for example, because it does have filenames with comma's in it, like data science CSV batches produced outside of their control) by enforcing this in the team-wide Lefthook config via files_from_stdin
I am sorry, I can't agree on this use case because it doesn't align with the main purpose of lefthook – git hooks management. Adding {files_from_stdin}
template doesn't fit into my vision of what lefthook should support because it adds a very special flow not related to git hooks at all.
I am ok with this change: --files-from-stdin
option forces lefthook to get the file list from STDIN and replace {files}
or {staged_files}
with the files from STDIN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to add a files_from_stdin
template, but a configuration option. Checking that source code again, it does look it's only adding a configuration option.
But coming to think of your interpretation: yes, it may be pretty useful or required to provide the list of files to commands through stdin as well. Not via a run template, I agree. But that should be supported though (either with or without roundtrip through Lefthook scanning stdin).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrexox Can you check again whether I'm truly adding a template {files_from_stdin}
and if so, please point out which source code sections do this, so that I can remove them while keeping a configuration option files_from_stdin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was talking about the template, but I meant the configuration option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An organization may prohibit acceptance of paths via --files (for example, because it does have filenames with comma's in it, like data science CSV batches produced outside of their control) by enforcing this in the team-wide Lefthook config via files_from_stdin
I am sorry, I can't agree on this use case because it doesn't align with the main purpose of lefthook – git hooks management. Adding
{files_from_stdin}
template doesn't fit into my vision of what lefthook should support because it adds a very special flow not related to git hooks at all.
Then the above reflects where we stood, in our discussion.
Data science teams do record data files in Git (e.g., with Git LFS), and then these issues with commas pop up. An organization should be able to enforce files_from_stdin
and the only thing that can be made constant is an immutable config (file).
This official guidelines document warns against using commas in filenames, which means it does happen in some datasets and software should not break (indeed, a linter should be possible that detects this, this is where Lefthook would help).
Anecdotes:
internal/lefthook/run.go
Outdated
@@ -124,6 +125,7 @@ Run 'lefthook install' manually.`, | |||
SkipSettings: logSettings, | |||
DisableTTY: cfg.NoTTY || args.NoTTY, | |||
AllFiles: args.AllFiles, | |||
FilesFromStdin: args.FilesFromStdin, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's no need to pass this argument. You can parse the files here and pass them via Files
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the functionality isn't behind a CLI option, it will break cases where users supply other information to standard input (e.g., using yes
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get it. If the functionality isn't behind a CLI option, then lefthook is called from within a git hook. In this case files from STDIN don't make sense, since git doesn't pass files via STDIN.
And if lefthook is called explicitly, I'd like to make sure that it is called correctly. If the override is required – let's rely on the CLI options.
I want to distinguish the implicit calls that must make sense and explicit calls that tweak lefthook behavior, for instance change the files passed to the commands. That's my vision and I believe it aligns with KISS principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get it. If the functionality isn't behind a CLI option, then lefthook is called from within a git hook. In this case files from STDIN don't make sense, since git doesn't pass files via STDIN.
First, let's discuss Git hook behavior. According to its docs, Git hooks do get information through standard input, see e.g.: https://git-scm.com/docs/githooks#_pre_push
And if lefthook is called explicitly, I'd like to make sure that it is called correctly. If the override is required – let's rely on the CLI options.
That was exactly my intention, but previously you wrote:
I think there's no need to pass this argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's no need to pass this argument.
It was related to Go code. I meant it's better to read from STDIN in internal/lefthook/run.go
and pass the results as Files
option. It's Ok to have --files-from-stdin
CLI option that replaces files in lefthook commands.
Git hooks do get information through standard input
Yes, but it is related to Git while getting files from STDIN is not about git.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but perhaps there's a bit of a language barrier. Can you explain in particular your last comment?
The following I think are the facts:
- Git passes paths via stdin to a hook script, such as Lefthook, in some cases.
- Some Lefthook commands (i.e., Pylint) can read file paths via stdin.
- With or without passing of paths to Lefthook via stdin, Lefthook commands can execute command lines that may output paths via stdout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Git passes paths via stdin to a hook script, such as Lefthook, in some cases.
Git passes data with a special format: https://git-scm.com/docs/githooks#_pre_push. This is not files list separated with newlines. So, an option files_from_stdin
can't be safely used for a git hook. In this case this option is only valid for the explicit call of lefthook. This means we can keep just a CLI argument --files-from-stdin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me show how I see the configuration:
# lefthook.yml
custom-hook:
commands:
lint:
files: git diff --name-only main # custom files
run: python scripts/linter.py {files}
When called like lefthook run custom-hook
, {files}
will be replaced with the results of git diff --name-only main
. When called like find . -name '*.py' -print0 | lefthook run custom-hook --files-from-stdin
{files}
will be replaced from the \0-separated list from STDIN.
Does it sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrexox Only for pre-push, pre-receive and reference-transaction stdin gets used. I was intending for the configuration option to apply even when Lefthook is invoked from the CLI. I prefer to check for those hooks that use stdin in the source code rather than not having the config key.
What you're asking to me is like, can you show an example of someone needing a macro in Word document sent over e-mail? The problem is that you don't know which files people supply, e.g. in a PR branch. These are valid filenames. If you're asking for user stories rather than attacker stories, then as a user, I want to script Lefthook in a cross-platform way, without manipulating e.g., Git tool output to replace null chars or newlines with a comma. Mind that |
I am just curious. I agree that it is a possible case. I don't mind adding the support for getting files from STDIN and passing files via |
@mrexox I'm very keen to get this merged, so that my team has another blocker removed. Would you be so kind to develop tests and to write a (draft) docs/changelog entry to your liking already, even if one thing still must be ironed out? |
I'll be glad to help with polishing the PR and adding documentation and changelog entries when we finish the discussion 👍 |
@mrexox Can we continue this ? |
@sanmai-NL, yes, I plan to add a few edits to the PR:
I can do them myself since I think I've got the vision. |
Yes please continue, that sounds acceptable to me personally (although as discussed, removing |
@sanmai-NL, I've added the tests and refactored some parts. Now overrides are applied not only to I also added parsing files from STDIN by Please, review the change, and if it's OK I will merge the PR and prepare a release next week. |
@mrexox, but splitting on both I'm only now ready to review. Regrettably, it seems I assessed your change differently from how it was implemented. I thought you also added an option to split on newlines (exclusively), but you combined it with the null character based splitting. Splitting on |
@sanmai-NL, having newlines in file names is not conventional (however is possible). Do you have such files, or are you afraid that it can break something for your work? |
The latter, such files can be added to pull request branches and it would break tooling. |
Or rather, it would break Lefthook but not other tooling. |
Alright, I will change this in next fix version to use only |
Closes #564.
⚡ Summary
Lefthook is currently not completely ready for repositories with filenames that contain commas.
--file
as the successor of--files
. This option can be repeated for a list of file paths.--files
is dangerous, e.g., one can create a fileREADME.md,..
and if it gets supplied to a fixer, who knows what happens.--files-from-stdin
. This option reads file paths from stdin, separated by a null character. This allows for file names with a comma or newline, and is interoperable with Git and Unix tools (e.g.,git ls-files -z
,git diff --name-only -z
,find -print0
).@mrexox Can you test and add automated tests? I specifically advise to test interaction with existing stdin-related features.
☑️ Checklist