[SPARK-27812][K8S][2.4] Bump K8S client version to 4.6.1 #26152
Conversation
Thank you, @igorcalabria .
ok to test
BTW, @igorcalabria . I highly recommend you do that.
Kubernetes integration test starting
Kubernetes integration test status success
Test build #112247 has finished for PR 26152 at commit
+1, LGTM. Thank you, @igorcalabria and @srowen .
Merged to branch-2.4
### What changes were proposed in this pull request?

Backport of #26093 to `branch-2.4`.

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-27812
https://issues.apache.org/jira/browse/SPARK-27927

We need the fix fabric8io/kubernetes-client#1768, which was released in version 4.6 of the client. The root cause of the problem is explained in more detail in #25785.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This patch was tested manually using a simple PySpark job:

```python
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder.getOrCreate()
```

The expected behaviour of this "job" is that both the Python and JVM processes exit automatically after the main runs. This is the case for Spark versions <= 2.4.0. On version 2.4.3, the JVM process hangs because there are non-daemon threads still running:

```
"OkHttp WebSocket https://10.96.0.1/..." #121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000]
"OkHttp WebSocket https://10.96.0.1/..." #117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000]
```

This is caused by a bug in the `kubernetes-client` library, which is fixed in the version we are upgrading to. When the mentioned job is run with this patch applied, the behaviour from Spark <= 2.4.0 is restored and both processes terminate successfully.

Closes #26152 from igorcalabria/k8s-client-update-2.4.

Authored-by: igor.calabria <igor.calabria@ubee.in>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
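The hang described above is worth spelling out: a JVM only exits once every non-daemon thread has finished, so the leaked OkHttp WebSocket threads pin the driver process even though `main` has returned. Python's `threading` module has the same semantics, so here is a minimal sketch of the mechanism (an illustration added for this write-up, not code from the patch):

```python
import threading
import time

def wait_forever():
    # Stands in for an OkHttp WebSocket thread that is never shut down.
    time.sleep(3600)

# daemon=False (the default) makes the interpreter wait for this thread,
# just as the JVM waits for the lingering kubernetes-client threads.
# With daemon=True, the process exits as soon as the main thread returns.
threading.Thread(target=wait_forever, daemon=False).start()

print("main finished, but the process stays alive until the thread ends")
```

The kubernetes-client fix addresses the JVM side of this analogy: once the WebSocket resources are properly released, no non-daemon thread outlives the application.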
Attempting to start Spark 2.4.0 in Azure using AKS with Kubernetes version 1.14.6, we ran into the error referenced here: kubernetes/kubernetes#82131. I attempted to build an image from the tag `v2.4.0-rc5`.
Is it possible to apply this as a patch to 2.4.0, or will it absolutely require switching to a newer release?
Hi, @sethhorrigan . What do you mean by `v2.4.0-rc5`?
If AKS breaks something in your production environment, please file an issue with them. That's the best way to get the commercial support you paid for.
Put another way, use at least 2.4.4. 2.4.0-rc5 is a release candidate of an old release, not even a final release.
@dongjoon-hyun in the commit referenced above (https://github.com/sethhorrigan/spark/commit/8f96a5ea3d078a205ceb5924bf7aa2af04e6ced1) you can see what I mean. I am aware, @srowen, that the tag 2.4.0-rc5 is only a release candidate. Has the change in this pull request been verified to fix kubernetes/kubernetes#82131, or is that just a hopeful guess?
Edit: reading through the comments on https://issues.jenkins-ci.org/browse/JENKINS-59000 (referenced from kubernetes/kubernetes#82131), I see that upgrading the client resolves it. Hope this solution helps anyone else who stumbles on this thread as well.
@sethhorrigan This PR also fixes the issue with Kubernetes you mentioned (kubernetes-client was upgraded to a higher version). If you are building from source, I recommend that you start with the latest released minor (2.4.4) and apply this patch. Or, you could simply use `branch-2.4`, which already contains the fix.
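For anyone building a patched 2.4.x themselves, one quick sanity check is to ask the driver JVM which kubernetes-client it actually bundles. A hedged sketch via the py4j gateway follows; it assumes a distribution built with the `-Pkubernetes` profile (so the fabric8 jar is on the driver classpath) and relies on the client's `Version.clientVersion()` helper:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reach through the py4j gateway into the driver JVM and read the version
# string the fabric8 kubernetes-client reports about itself. This assumes
# the kubernetes-client jar is on the driver classpath, as it is in
# distributions built with -Pkubernetes.
version = spark._jvm.io.fabric8.kubernetes.client.Version.clientVersion()
print("bundled kubernetes-client:", version)  # expect 4.6.1 with this patch

spark.stop()
```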