Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support line filter / diff input #1931

Closed
xmo-odoo opened this issue Jan 17, 2023 · 10 comments
Closed

Support line filter / diff input #1931

xmo-odoo opened this issue Jan 17, 2023 · 10 comments
Labels
core Related to core functionality question Asking for support or clarification

Comments

@xmo-odoo
Copy link

This is probably a use case which is more relevant for large (and old) codebases, when trying to "ratchet" rules progressively: in CI, it's useful to only check / fail on failures of the code which has been modified, as it allows progressively improving code-state without needing massive one-shot rewrites which can make history traversal and blames a lot more complicated.

This can be done by post-filtering finds emitted by ruff, but it would be useful as a pre / builtin feature, as it would avoid having to provide both input (--include/--exclude) and output (lines) filters.

@not-my-profile
Copy link
Contributor

I think it would make more sense to have a dedicated tool for that, which supports one of the output formats of ruff so then you could just pipe the output of ruff to that filter tool e.g.

ruff . --format=junit | junit_filter_git

I see no reason why this should be implemented in ruff directly.

@xmo-odoo
Copy link
Author

I see no reason why this should be implemented in ruff directly.

This objection is literally covered by the second paragraph.

@not-my-profile
Copy link
Contributor

not-my-profile commented Jan 17, 2023

Ruff is fast enough that you could just run it on the whole codebase and have the filter program filter out violations for all other files. If you care about wasted CPU cycles you could just invoke ruff like:

ruff  --format=junit -- ${{needs.changedfiles.outputs.all}} | junit_filter_git

where ${{needs.changedfiles.outputs.all}} would be some variable listing all the changed files, see for example this explanation for GitHub.

Something like this could very well be provided by a GitHub action so you wouldn't even have to configure that yourself. So yes I still see no reason why this should be implemented in ruff directly.

@charliermarsh
Copy link
Member

Thanks for filing :) I see the value in what you're describing, and -- since it's come up here -- in general, I'm open to implementing things directly in Ruff that could be accomplished by chaining together other tools. Not always, since everything we implement comes with some upfront and ongoing cost, but on a case-by-case basis. (One way I think about this: Black has Jupyter support built-in, even though the same behavior can be accomplished with nbQA, and that's useful.)

I think in this case, I'm unlikely to prioritize it if there's a reasonable workaround so I'd love to better understand the use-case. Are you imaging that this would only run over changed files? Or over changed lines? How do you think Ruff should determine the changed files?

@charliermarsh charliermarsh added question Asking for support or clarification core Related to core functionality labels Jan 17, 2023
@xmo-odoo
Copy link
Author

Not always, since everything we implement comes with some upfront and ongoing cost, but on a case-by-case basis. [...] I think in this case, I'm unlikely to prioritize it if there's a reasonable workaround

Certainly fair enough.

so I'd love to better understand the use-case. Are you imaging that this would only run over changed files? Or over changed lines?

I'd assume ruff would need to run over the entire file in order to correctly analyze the contents. Especially as the modified lines might not even provide a valid AST subtree. This'd be more of a "one-stop" shop to filter (include/exclude) and output (lines filter).

How do you think Ruff should determine the changed files?

I know that some tools can take a unified diff as input and extract the filtering from that, but I don't know if there's a ready-made crate for this, so as described above an alternative could be to extend the include/exclude arguments to allow file subranges.

@charliermarsh
Copy link
Member

I'd assume ruff would need to run over the entire file in order to correctly analyze the contents. Especially as the modified lines might not even provide a valid AST subtree. This'd be more of a "one-stop" shop to filter (include/exclude) and output (lines filter).

Ah sorry, what I meant to ask was: would you expect to see errors reported only from changed files (more errors), or from changed files (stricter)?

@xmo-odoo
Copy link
Author

Ah sorry, what I meant to ask was: would you expect to see errors reported only from changed files (more errors), or from changed files (stricter)?

Assuming the second occurrence of files should be line, then that.

And yes I do realise that a change in one line can cause an error in an other (which was not touched), but I think getting this perfect (or as close to as possible) would be complicated as it would be necessary to start from a snapshot of the full output, take a new full output (or at least file-wise in both cases since I assume ruff currently mostly operate on a per-file basis much as flake does), then use the diff's information to adjust all the messages in order to match pre- and post- content to determine whether an error is truly novel or is just a pre-existing error moved a few lines because code was inserted somewhere above it.

@charliermarsh
Copy link
Member

Uhh, yes, second occurrence should be line! Sorry about that.

Ok noted, let's keep this open! (There's also #1149 which accomplishes the same thing IIUC.)

@xmo-odoo
Copy link
Author

Uhh, yes, second occurrence should be line! Sorry about that.

No worries.

Ok noted, let's keep this open! (There's also #1149 which accomplishes the same thing IIUC.)

Oh I missed it sorry, didn't know about the term "baseline" (and almost certainly made my search using keywords like "diff", "ci", "git", ...).

If that's an option / consideration then it would be a lot better than this here proposal (which is a lot more simplistic).

@charliermarsh
Copy link
Member

Oh I missed it sorry, didn't know about the term "baseline" (and almost certainly made my search using keywords like "diff", "ci", "git", ...).

Me neither, before that Issue :)

If that's an option / consideration then it would be a lot better than this here proposal (which is a lot more simplistic).

It is! I'll close this for now in favor of that issue then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to core functionality question Asking for support or clarification
Projects
None yet
Development

No branches or pull requests

3 participants