spark-submit must work with the local kubectl proxy endpoint #63
Comments
@iyanuobidele Thanks for the pointer to those two issues. I'll investigate further.
Ah nvm. @iyanuobidele, the authentication issue was at my end and unrelated. Looks like it is watches that are broken, and have been for quite a long time. Maybe in the interim, we need some fallback if watches fail.
The exception comes from the okhttp3 client used internally by the fabric8 client in its watcher implementation. I traced it here. Yes, I agree, but that changes a lot of things.
I'm not sure why they used the websockets implementation instead of a plain hanging HTTP GET like a lot of other clients do. Websocket support seems to be somewhat broken.
@foxish What is the path forward for getting submission through the kubectl proxy working? Do you intend to submit a PR to fabric8 and depend on a future release of that library?
@erikerlandson put me in contact with one of the guys who works on fabric8 kubernetes-client, who said that they would look into GKE authentication support. If that lands, we can avoid the kubectl proxy solution. If that does not seem viable, the alternatives are:
I'll have a little bit of discussion with the Fabric8 guys and then we can decide on the best course of action. In either case, I can take this issue.
@foxish What I did was to extract the HTTP client, like we did before, and then simply issue a GET request. We don't really need to send any messages to the server, so a persistent connection with a single sender (the server) should work. However, feel free to pull me into any of these conversations; I will be happy to help. I could work on this and send in a PR for it shortly.
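A minimal sketch of that hanging GET, using okhttp3 (the client fabric8 already uses internally, per the earlier comment). The proxied localhost endpoint, namespace, and pods path are illustrative assumptions; `watch=true` is the standard Kubernetes watch query parameter.

```scala
import okhttp3.{OkHttpClient, Request}
import java.util.concurrent.TimeUnit

// Hanging GET: ?watch=true keeps the connection open, and the API server
// streams one JSON-encoded watch event per line until the connection drops.
val client = new OkHttpClient.Builder()
  .readTimeout(0, TimeUnit.MILLISECONDS) // zero read timeout: the GET hangs by design
  .build()

val request = new Request.Builder()
  .url("http://127.0.0.1:8001/api/v1/namespaces/default/pods?watch=true")
  .build()

val response = client.newCall(request).execute()
val source = response.body().source()
while (!source.exhausted()) {
  Option(source.readUtf8Line()).foreach(println) // one watch event per line
}
```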
@iyanuobidele Thanks! I think that in addition to the hanging GET, we must also do a regular GET first and fetch the latest resource version, then watch providing that resource version. Secondly, we'd need some robust handling of connection failures or interruptions, with a retry loop. I think the change to allow watches using a hanging HTTP GET should live upstream in the fabric8 library itself. One of the authors said that they'd be okay with taking the patch as long as it is not too complicated. I'm planning on trying this today/tomorrow. If it turns out to be complicated, we could implement the "alternate watch" in our Spark fork rather than in fabric8, and in that case, you could send the PR as you described.
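A sketch of that list-then-watch pattern with a retry loop, again at the raw HTTP level rather than through the fabric8 API. The endpoint, the crude regex extraction of `metadata.resourceVersion`, and the fixed one-second backoff are all illustrative assumptions.

```scala
import okhttp3.{OkHttpClient, Request}
import java.util.concurrent.TimeUnit

object ListThenWatch {
  private val client = new OkHttpClient.Builder()
    .readTimeout(0, TimeUnit.MILLISECONDS) // watches hang, so disable the read timeout
    .build()

  // Crude extraction of metadata.resourceVersion; a real implementation would
  // use a JSON parser. In a list response the top-level metadata comes first.
  private val Rv = """"resourceVersion"\s*:\s*"(\d+)"""".r

  def run(base: String): Unit = {
    while (true) {
      try {
        // 1. Regular GET: current state, plus the resourceVersion to watch from.
        val list = client.newCall(new Request.Builder().url(base).build()).execute()
        val rv = Rv.findFirstMatchIn(list.body().string()).map(_.group(1)).getOrElse("0")

        // 2. Hanging GET from that version, so no events are missed between
        //    the list and the start of the watch.
        val watch = client.newCall(
          new Request.Builder().url(s"$base?watch=true&resourceVersion=$rv").build()
        ).execute()
        val source = watch.body().source()
        while (!source.exhausted()) {
          Option(source.readUtf8Line()).foreach(println) // handle each event here
        }
      } catch {
        case _: java.io.IOException =>
          Thread.sleep(1000) // dropped or interrupted: back off, then re-list and re-watch
      }
    }
  }
}

// e.g. ListThenWatch.run("http://127.0.0.1:8001/api/v1/namespaces/default/pods")
```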
Attempted a fix here: fabric8io/kubernetes-client#652. The next release of fabric8's client should unblock us for the release.
Currently, spark-submit fails against the local endpoint obtained by running `kubectl proxy`. Since the fabric8 client does not support the other auth providers, we need it to work against the kubectl proxy endpoint.
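For reference, a hypothetical invocation sketch of the setup this issue targets; the application jar and the rest of the required flags are elided, and the `k8s://` master URL scheme is taken from this fork's submission client.

```sh
# Start a local, unauthenticated proxy to the cluster's API server
# (kubectl proxy listens on 127.0.0.1:8001 by default).
kubectl proxy &

# Point spark-submit at the proxy endpoint via the k8s:// master URL.
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://http://127.0.0.1:8001 \
  ...
```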