Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the 'version' command to use Click #24591

Closed
wants to merge 4 commits into from

Conversation

blag
Copy link
Contributor

@blag blag commented Jun 21, 2022

This is part of a series of PRs breaking up #22613 into smaller, more reviewable chunks. The end result will be rewriting the existing airflow CLI to use Click instead of argparse. For motivation, please see #22708.

This PR adds airflow version subcommand to the airflow-ng command. It's exactly the same as the airflow version command, it's just implemented with Click. This PR also adds an additional test module for the new command. I opted to copy/add a new test module instead of keeping a single module and making it compatible with both argparse and Click, because that way it will be easier to remove the tests that rely on argparse once all commands are rewritten, the CLI is completely converted to utilize Click, and the airflow-ng command replaces the airflow command.

While this incorporates the changes from #24590, I will rebase this once that PR is merged to make the relevant changes in this PR smaller and easier to separate out and review. Until then, you can use GitHub's compare view to compare this branch to astronomer:convert-to-click-cli-pr to see just the diff for this PR. I have rebased this against main.

For comparison to the modules this supersedes:

@blag blag force-pushed the convert-version-command-to-click branch 2 times, most recently from 756ac37 to 4f7ee90 Compare June 23, 2022 06:29
kaxil pushed a commit that referenced this pull request Jun 23, 2022
This is the first PR of what will be a series of PRs breaking up #22613 into smaller, more reviewable chunks. The end result will be rewriting the existing `airflow` CLI to use Click instead of argparse. For motivation, please see #22708.

This PR installs Click, adds constraints to Rich_Click so we can rely on some nice features in recent versions of that, adds a new barebones `airflow-ng` console script, and tweaks some CLI internals to be more flexible between argparse and Click.

To see how this initial groundwork will be used by future PRs, see #22613, and to see how some of this will be used please see #24591.
@blag blag force-pushed the convert-version-command-to-click branch from 4f7ee90 to 5c26105 Compare June 23, 2022 21:55
@blag blag changed the title Convert version command to click Rewrite the 'version' command to use Click Jun 24, 2022
@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jun 24, 2022
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@airflow_cmd.command('version')
def version():
"""Displays Airflow version at the command line"""
console = Console()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New Breeze already uses rich click. Maybe it should also use the same architecture, ie define console as singleton with configured theme/settings. See here https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/utils/console.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think it's a good idea to setup a consistent communication approach -> i.e. have custom "message type" tags rather than color tags - having that and documenting why and how to use them (like we did here) https://github.com/apache/airflow/blob/main/dev/breeze/doc/adr/0011-unified-communication-with-the-users.md - brings consistent user experience. And introducing themes makes it easy to switch to a "colour-blind-friendly" mode from the beginning.

There are 8% males with red/green collour defficiency https://en.wikipedia.org/wiki/Color_blindness and males are (for better or worse) most of the airflow users, so it might make sense to implement a consistent "theme" approach and ability to switch, especially that it is rather easy and except the need of retrieving console dynamically, does not bring too much of an overhead in the long run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree that a Console() singleton and a single theme/setting would be ideal. But I tried to scope down these PRs to ensure that converting to Click was as "mechanical" of a process as possible. Similar to the pre-commit hook below, I think that is a great idea to do after we convert over to using Click, and I have expanded the "Styling Customization" subsection in the original omnibus PR description to include that.

But as below, please let me know if you'd like me to do that now, or if we can defer that to an issue for later work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think it's a good idea to setup a consistent communication approach -> i.e. have custom "message type" tags rather than color tags - having that and documenting why and how to use them (like we did here) https://github.com/apache/airflow/blob/main/dev/breeze/doc/adr/0011-unified-communication-with-the-users.md - brings consistent user experience. And introducing themes makes it easy to switch to a "colour-blind-friendly" mode from the beginning.

There are 8% males with red/green collour defficiency https://en.wikipedia.org/wiki/Color_blindness and males are (for better or worse) most of the airflow users, so it might make sense to implement a consistent "theme" approach and ability to switch, especially that it is rather easy and except the need of retrieving console dynamically, does not bring too much of an overhead in the long run.

Sure, but I would not do that in this PR, but convert first, then add theme support as a follow up. a) Focus on one thing at a time, and b) that can be done in parallel to the migration work.

To make it easier we could have

@cache
def get_console() -> Console:
    return Console()

in airflow/cli/__init__.py, just so that everything in the CLI uses that singleton already.

@@ -18,6 +18,7 @@
# under the License.

from airflow.cli import airflow_cmd
from airflow.cli.commands import version # noqa: F401
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use __all__ to avoid the NOQA comment instead.

Also, how much additional overhead is introduced for other commands, due to Click needing to be initiated?

Copy link
Member

@potiuk potiuk Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @uranusjr. Until we have https://peps.python.org/pep-0690/ implemented (which if we are lucky gets into python 3.12 IMHO) - comment from `breeze' experience - we have implemented a check to make sure that only minimum set of imorts is actually done while the @click decorators are being parsed.

This means (and in Breeze we are following it) that most of the imports used in "command" are local imports. But we do not only have to implement it now, but also make sure it will be kept in the future, otherwise anyone committing changes to commands in the future will not fall into a "trap" of adding top-level import and influencing auto-complete to take even several hundred milliseconds - which will make it barely usable.

We have a pre-commit for it:

https://github.com/apache/airflow/blob/main/.pre-commit-config.yaml#L638

And I guess similar pre-commit should be run for all @click commands

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uranusjr Thanks for the pointer with __all__. I don't have comprehensive statistics on additional overhead, although that was discussed starting here in the "omnibus" PR, and improved in a related PR. I think it might be easier to address any slowdowns on a case-by-case basis for each command rewrite instead of all at once in this PR. That being said, here is some not-super-scientific timings for the version subcommand:

argparse

❯ time ( for i in $(seq 10); do airflow version --help 1>/dev/null; done )
( for i in $(seq 10); do; airflow version --help > /dev/null; done; )  4.59s user 0.90s system 99% cpu 5.533 total

Click

❯ time ( for i in $(seq 10); do airflow-ng version --help 1>/dev/null; done )
( for i in $(seq 10); do; airflow-ng version --help > /dev/null; done; )  4.21s user 0.83s system 99% cpu 5.093 total

argparse

❯ time ( for i in $(seq 10); do airflow version 1>/dev/null; done )
( for i in $(seq 10); do; airflow version > /dev/null; done; )  4.57s user 0.90s system 99% cpu 5.513 total

Click

❯ time ( for i in $(seq 10); do airflow-ng version 1>/dev/null; done )
( for i in $(seq 10); do; airflow-ng version > /dev/null; done; )  4.20s user 0.82s system 98% cpu 5.069 total

@potiuk I have tweaked the imports for this subcommand to improve timing, and captured your suggestion in the original omnibus PR description (in the "Future Work" section). I'm happy to also create an issue to ensure that doesn't get lost, but I think that might be out of the scope for this PR. I do think that should be implemented before we finally switch the airflow command to being Click-based, but since this is only modifying a brand new, totally separate airflow-ng command I don't think it should be a requirement for this PR, or other PRs converting other commands, to be merged.

But I'll leave the final resolution up to you. Please let me know if you'd like me to add that pre-commit hook now, or if we can defer it to an issue for later work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think console "accessibility" can be quite easily deferred - it's easy to change later (however it would be great that people who take care about implementing other parts will have an understanding what is our approach of user communication is and how they should write their CLI's - but this does not impact design and implementation approach as much, so it can be done as "next" PR (but it should be done before others get engaged- to make sure we keep consistent approach).

But I'd say pre-commit or some other (not necessary pre-commit) gate for not doing unnecessary imports and making sure our CLI commands are optimized for the way how click works are absolutely necessary as a "founding PR" - mainly to work out which approach will be good and make sure any next command will follow it.

I personally think it is quite important because this will gate contributors from making mistakes that are not at all obvious but they can easily compound into unusable CLI. Once we get out of the gates, it's very difficult to keep it if there is no check in place. If you look at some history, every now and then mainly @ashb performs such ad-hoc clean-ups which will speed-up CLI import by a factor of few times - impacting heavily user performance.
An example of this here: https://github.com/apache/airflow/pull/21438/files (there were a few more complex but I found this one first). I think this is a "known" and "expected" problem, and we should rather prevent this problem from even occuring - it is always cheaper to prevent stuff in the first place than deal with it afterwards.

The thing here is that just implementing such gate might reveal some problems and influence the way our modules are structured, and how they intract with rest of airlfow.

Main problem is that we have not kept it in check from the very beginning and we have no verification - so any new command added might break the "fast import" approach (and well - they did, many times).

Problems with those command and using click, this will get far worse than current implementation of the CLI - if we don't gate it from the beginning, because we have less control over it. Click heavily relies on importing all modules in all click-annotated functions and it's not possible (afaik) to change it. And it is super easy to make even simple mistake that leads to huge slow-down for example by adding a TypeHint to a function in the same module that has an click-annotated command). If we don't gate it, we will likely see the slow-downs rapidly growing if we are not careful.

And it's not "academic" discussion. We've already been there - if you look at the history #6594 - the original CLI implementation of Airflow had the very same problem and it became unbearable and borderline unusable. With click we can't do this lazy import the same way as @mik-laj did, because click will simply make all the "click" annotated methods load at startup. And even worse - alll of them are actually parsed every time user presses for auto-completion. This last point can likely be optimized by pre-computing and caching completion, but AFAIK there is no good implementation of such caching yet ( I thought about writing one but had no time to do it). But the architecture of click currently kind of "forces" you to write optimized command definitions.

And we want click interface/autocomplete to be better than the current one - not to regress.

Also airflow at places has rather "bad legacy" of dragging literally half of the airflow imports once some classes are imported (especially conf). One bad import at top level might add literally seconds to boostrap time. While this is something yet-to-be-solved, this one is rather difficult due to backwards compatibility so we rather make sure that people adding commands will not fall into the same trap.

Those are not isolated airflow concerns - those problem even made "pre-commit" maintainer plainly refused accepting click as a cli solution in order to implement auto-complete (pre-commit/pre-commit#1119 - where multiple people - including myself - offered help in optimizing it). And our CLI is far more complex and elaborated than pre-commit one.

So I believe, we need to make sure our click-annotaed files commands are implemented - from day 0. Otherwise we risk that some decisions made by the authors and the commands are implemented in the way that makes it difficult to change, so basically the author of the original implementation might not realize that decisions made when implementing it have some bad consequences and chooses an "easy path" instead of - for example - refactor some part of code and remove unnecessary couplings. The bad thing is that it will not be immediately visible. The nature of it is that it will get slower with every new command and we will only realise that we need to optimize when it will need a change similar to Kamil's #6594 - +4000,-3000 lines of code.

I do not want to hold it off - so I'd ask what @ashb and @mik-laj think as they were mainly dealing with the CLI problems of Airflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk Honest question then: Given click's architecture (load eveyrthing at once) seems counter to our goals (speedly startup, quick completion) why are we actually using click for airflow?

Did we ever discuss this in a wider audience?

Copy link
Member

@potiuk potiuk Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is even more important actually if we decide to explore mypyc route, to work out the right structure of the code so that it can be "mypyc-ed".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it might be more helpful to comment in #22613 if you are making suggestions for future work. A GitHub or mailing list discussion might be more appropriate if you would like more people to explore restructuring the code to better fit other pursuits.

And if you would like to provide feedback about the version subcommand rewrite, this comment thread, or the general comments for this PR would be great for that.

But I'm having difficulty figuring out what the hard requirements for this PR are, what the softer goals of the rewrite as a whole are, and what are proposals for future work.

I would like to keep the scope of this PR as small as possible. Is this PR a good Click implementation of the version subcommand or not - that's the question here.

If we need to pause the Click CLI rewrite and restructure the CLI to resolve other issues first, that's fine. But it seems like major scope creep to go from this version subcommand conversion all the way to "research and possibly completely restructure the CLI for an entirely different goal".

And if we do have larger re-architecting questions we need to resolve before converting over to Click, then perhaps an AIP and a broader discussion (to capture hard requirements and overall goals of the CLI, and investigate possible alternatives besides just Click) would be appropriate after all. In which case, I will simply close this and all related PRs and wait until there's some consensus on the way forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we do have larger re-architecting questions we need to resolve before converting over to Click, then perhaps an AIP and a broader discussion (to capture hard requirements and overall goals of the CLI, and investigate possible alternatives besides just Click) would be appropriate after all. In which case, I will simply close this and all related PRs and wait until there's some consensus on the way forward.

Yes. I think this is the best way forward.

Copy link
Member

@potiuk potiuk Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the learning from that discussion in this PR is - click introduction is far from "done deal". Merging any PR with click changes when there are so many doubts about whether it will be performant enough.

Seems that at least few of us are extremely concerned about it and seems to be something really important that can break the whole benefit click might bring and make the experience worse.

Merging any click related PR makes no sense before those concerns are addressed IMHO. And we have good set of ideas that could be explored to find out an answer to the question "are we able to make click performance good enough for us". There should be Proof Of Concept that should explore those possibilities and see if this might work for us and likely an AIP describing it - trying to implement some of the most potentially troublesome cases and measuring what are the delays introduced and how we can structure the code (and possibly implement other options like mypyc) speed it up. Some studies involving the acceptable delays can be found and referred to I think.

My proposal @blag - if you want to continue introducing your idea - is to turn your work (and experience you gained so far when preparing this PR and other CLI actions) into this POC that will explore those options and lead the process on proposing, discussing and voting on the AIP based on that POC. I think just merging this PR without doing this exploration makes very little sense.

It's not a "scope creep" - it's just realising that scope of the orignal PR does not bring answers to more general question "do we really want to do it?" which results from the knowledge of "can we make it good enough?".

I think the discussion above was valuable to show that we currently have no answer to either of those questions and merging the PR does not bring us any closer - so we should not do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. Just to encourage you @blag - this very much resembles my story in Airlfow (which was quoted in https://github.com/readme/featured/oss-decision-making). I though I was 100% sure I know how the change to our dependency management shoudl be - and I was sure it is right by talking to a few people - but when I "announced" publicly the change as a "change is coming" and sparked a more wide-spread conversation I was reminded that this needs far more of a discussion, delibration and thoughts.

I rolled my sleeves up then and started to do the ground work. Four implemented AIPs, becoming a committer, PMC member in the meantime and about 2 years later we got a process where we had a good solution to the problem I saw when I started - very, very different from what I announced, but it was handling all the complexities of the problems, I had no idea they existed in the first place.

I hope this story might encourage you to follow up.

@blag blag force-pushed the convert-version-command-to-click branch from 5c26105 to 43a8992 Compare June 29, 2022 09:19
@ashb
Copy link
Member

ashb commented Jun 29, 2022

Does pallets/click#945 help btw?

@potiuk
Copy link
Member

potiuk commented Jun 29, 2022

Does pallets/click#945 help btw?

Yep. This is very much in line with what I thought. We should explore those options and check it as part of the founding PR - because it might impact the way we want to write the commands (and make sure that either we verify that whatever rules we will have to keep it optimized. It does not have to be pre-commit - just "unified" way of approaching the CLI in the way that will keep them optimized and making sure it's verified at PR time (if it is not easily verifiable by review).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CLI okay to merge It's ok to merge this PR as it does not require more tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants