Add SparkKubernetesOperator crd implementation #22253
Conversation
@dstandish - this is yet another example and contribution to the discussion we have here: https://lists.apache.org/thread/m40ljkgcqsnvbxz6vlgnwflygvr6vznz The change here spans across both "airflow/kubernetes" (part of core) and the "cncf.kubernetes" provider, and at a quick glance I have no idea how those two interact. Will the new provider work without upgrading Airflow after the change? Will it require the change to be released? Will the core continue running with the old provider? No idea. This is a far more important issue to solve than just deprecating config parameters. And the root cause of it is that, unlike in all other providers, we have functionality split between "core" and "provider", one uses the other, and there is an implicit dependency. There are two options we could take to solve this problem
The first one might be possible - but we need to know if it really is possible, and whether it brings some backwards-compatibility issues. But ultimately - your proposal about moving core kubernetes settings is in line with this change. The second case is "easier" (it requires no real changes to the core except moving the provider code to the core) but it does not nicely fit our "modular" approach of core and separate providers (Kubernetes is essentially an optional dependency). I think - in order to decide on deprecating the kubernetes settings from core - we need to be sure that 1) is possible, and for that we need to have a plan for how to achieve it. |
@potiuk what you are talking about is code sharing: the provider depending on code in core, having deps intermixed with core deps. But just because the operator depends on core doesn't mean that it has to take its configuration in a non-standard way, in a way that no other operator does, namely from core Airflow configuration. |
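(As a side note, here is a minimal sketch of the "standard way" of configuring an operator that is being referred to above: behaviour is driven by constructor arguments and an Airflow connection rather than by core airflow.cfg settings. The values below are purely illustrative, not taken from this PR.)

```python
# Minimal sketch, assuming a recent cncf.kubernetes provider; the import path has
# moved between provider versions (operators.kubernetes_pod vs operators.pod),
# so adjust as needed.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
    dag_id="standard_operator_config",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    demo_pod = KubernetesPodOperator(
        task_id="demo_pod",
        # Cluster access comes from an Airflow connection, not from core [kubernetes] settings.
        kubernetes_conn_id="kubernetes_default",
        namespace="default",
        name="demo-pod",
        image="python:3.10-slim",
        cmds=["python", "-c", "print('hello')"],
    )
```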
Yeah. I know that aligning it to be "same as others" is a good goal as long as we know that it is going to be a separate provider in the future. To be honest I do not know - see the continuation in https://lists.apache.org/thread/m40ljkgcqsnvbxz6vlgnwflygvr6vznz One of the possible paths for cncf.kubernetes - after seeing how disruptive any changes might be and how unforeseen the compatibility issues with the cncf.kubernetes provider are - is to incorporate it BACK into the core. I (and I think all of us) have not had enough discussion and exploration to understand whether this is a possible or better path - but I think we should take a look at it and make a deliberate decision on its future before we attempt to make any changes in the settings, because we might actually end up incorporating it into the core if we do not decouple the K8S Executor from the core. |
@potiuk @dstandish can you let me know what action should be taken from my side? |
@hamedhsn - first of all: rebase (it has conflicts now). Secondly, make sure that all checks pass. Then ping us. |
You can. See the output of the failed build - it contains detailed instructions right after the error message. And our contribution documentation https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#id7 contains a chapter conveniently named "documentation" which has links to the docs that explain how you can run the docs build locally. |
@potiuk it's not that straightforward, after breeze installation when I run
|
@hamedhsn documentation generation is always a bit tough, especially in the beginning, and I speak from experience. I have added some comments. Make sure to check the error messages carefully because they do say what's wrong. In other circumstances, when the build errors out on whole blocks, you probably need to check for proper indentation (spaces, tabs) and line spacing. |
well, docs are green, just have a few other issues to resolve, it seems 👏 |
Almost there @hamedhsn!
And
|
@bolkedebruin can you approve the build pls? |
@bolkedebruin sounds like all tests are green now |
Jihaa! Thank you for your patience and persistence @hamedhsn |
docs/apache-airflow-providers-google/operators/cloud/kubernetes_engine.rst
Side note @potiuk: this is where I do not like "unresolved conversations". They are hiding in outdated changes :/ |
@bolkedebruin Thanks for the review, glad it finally merged 😌 🎉 |
🎉 |
@bolkedebruin - you just gave the most important reason why enabling "require conversation resolution" is useful. Those unresolved conversations are STILL there. It does not matter whether we have "require conversation resolution" enabled. And you might have MISSED that there are unresolved conversations. By requiring them to be resolved, yes - you have to do a few more clicks to find them, read and resolve. Which is PRECISELY the point. It forces SOMEONE to actually check and take some action to resolve the conversations before merging the PR. You might not like having to do it yourself, but it is good for the project as a whole (and for you as well) because some important points from the conversation could have been missed. I will actually add this comment to the list of positive effects that I am gathering to present at the end of the trial. |
I'm of the opposite opinion here @potiuk. I understand why you would like to have it, but I do not support it, as I think it is an administrative waste tbh. Trust the authors and contributors to do the right thing rather than put this into technology. I have not seen your argument backed up by facts on the number of PRs merged vs errors we have made while merging them. That is my biggest concern with adding this kind of administrative overhead. It seems to be a solution in search of a problem. This PR started over one and a half years ago. Its remarks are tied to that timeline. The world has progressed. The remarks were not surfaced by GitHub and still needed to be resolved while being irrelevant. The risk this generates is that committers will start looking at the conversations and just click "resolve" without reading, to get past the administrative part.
No it wasn't. The point is to prevent errors while merging. The means being experimented with now is to accomplish that by forcing resolution by clicking "resolve". So you're welcome to list it as a positive, I will list it as a negative :-). |
I simply think that an individual's "little" time spent reviewing the comments and resolving them is a very small price to pay for the "project" positive effect of making sure we have not forgotten anything. The value for the project trumps the (small) inconvenience for the individual. I personally review and merge maybe 80 PRs a week. And I still do not see an inconvenience in resolving such unresolved comments. I just find it super-strange that someone finds it inconvenient to do it for the few PRs they are seeing and iterating over during the week. Somehow I cannot believe it's that big of an overhead. And yes, this case is a BIT of an outlier because it accumulated some historical conversations that were not resolved "on the spot" - but hopefully, if we agree to let the conversation requirement go ahead, this will be less and less of a problem as people get used to resolving conversations when they are, well, resolved. Yes. That asks for a change in individual habits and behaviours. But I strongly believe the small inconveniences are nothing compared with the "pause and think before merging and make sure what you are merging has not missed an important comment" benefit for the project. |
Going to agree with you @bolkedebruin. I had similar issues about 5 years ago, and while I think the intention is best, when it results in a PR being open on the order of years and not a more reasonable few months, it gives me pause. I can't comment on the details of the community here because it's been too long since I contributed, but it seems to me that something is off when the PR time is on the order of years. My heart goes out to you @potiuk, I hope you find the right balance, and seek feedback if/when you determine that things might not be working. |
@vsoch :) -> yep, I know what you are talking about, but I think in this case cause and effect are reversed. What we are discussing here (the need to resolve comments to merge a PR) just causes problems when a PR runs long, but I think it has little chance of causing that in the first place IMHO. The experiment here is to see the results of enabling "require conversation resolution" to merge a PR. We enabled it 2 weeks ago, and the problem is that this PR was opened a long time ago, so what @bolkedebruin is complaining about here is that he had to go back and resolve a number of old and stale conversations - but the root cause was really that the PR had some outstanding changes and tests that were addressed just a few days ago (and it took a few minutes for @bolkedebruin - I guess - to resolve all the outstanding, even if old, conversations after the tests passed and the comments were addressed). So I would not really connect your past experience with the discussion we are having :D |
I understood that. The high level pattern is adding developer churn to the process of review, regardless of the specific details. |
I believe it adds more to the process for the reviewer and the one who merges the PR, not the developer. And if you are looking at the stats, about 30%-40% of all PRs is actually ... me .... So what I am proposing here is to add more "pauses" to my own workflow - way more than for anyone else, to be honest (and this is also good, I think, because it will prevent me from merging PRs where I missed that there is an unresolved conversation). |
Currently, the SparkKubernetes operator is based on the spark-on-k8s-operator project.
While it is a good first step, it just accepts a YAML file to create a Spark job. Moreover, the built-in functionality of operators is not available, e.g., error handling, log management, retry, job deletion, timeout, etc. It is also not designed to be used by data scientists and other users who are not familiar with the Kubernetes YAML specification.
This PR reimplements the SparkKubernetes operator to add functionality, simplify the interface, and accept the different parameters in Python. Similar to the KubernetesOperator, we have added logic to wait for the job after submission, manage error handling, retrieve logs from the driver pod, and delete the Spark job. It also supports out-of-the-box Kubernetes functionality such as handling volumes, config maps, secrets, etc.
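A hedged sketch of how the reworked operator might be used from a DAG, based on the description above. The parameter names (image, code_path, get_logs, delete_on_termination) are illustrative assumptions and may differ from the API that was finally merged:

```python
# Illustrative sketch only: parameter names below are assumptions based on the
# PR description, not a confirmed API of the merged SparkKubernetesOperator.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

with DAG(
    dag_id="spark_pi_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    submit_spark_pi = SparkKubernetesOperator(
        task_id="submit_spark_pi",
        namespace="spark-jobs",                    # assumed: target Kubernetes namespace
        image="my-registry/spark-py:3.4.0",        # assumed: Spark driver/executor image
        code_path="local:///opt/spark/app/pi.py",  # assumed: path to the Spark application
        get_logs=True,                  # assumed: stream driver pod logs into the task log
        delete_on_termination=True,     # assumed: delete the SparkApplication when the job finishes
    )
```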