Add node selectors for driver and executor pods #355
Conversation
Thanks for the contribution @sandflee !
Can you explain more about the use case you need this for? I can imagine one around wanting to run Spark pods only on nodes labeled as running HDFS, for data locality purposes. Are there others you have in mind?
On annotations, one of the things we've been trying to do in this project is maintain compatibility with two versions of k8s at the same time -- "current" and "previous". Right now that's 1.6 and 1.5. Is the syntax you used supported in kubernetes 1.5?
@@ -430,6 +433,7 @@ private[spark] class KubernetesClusterSchedulerBackend(
      .endMetadata()
      .withNewSpec()
        .withHostname(hostname)
        .withNodeSelector(nodeSelector.asJava)
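For context, a minimal self-contained sketch of what that builder call does, assuming the fabric8 PodBuilder fluent API used in the diff above; the pod name, hostname, and label map are made up for illustration:

```scala
import scala.collection.JavaConverters._
import io.fabric8.kubernetes.api.model.PodBuilder

// Hypothetical label map; in the PR it is parsed from the Spark configuration.
val nodeSelector: Map[String, String] = Map("disktype" -> "ssd")

val pod = new PodBuilder()
  .withNewMetadata()
    .withName("spark-executor-example")      // illustrative name
  .endMetadata()
  .withNewSpec()
    .withHostname("spark-executor-example")  // illustrative hostname
    // An empty map leaves scheduling unconstrained; otherwise the pod is only
    // schedulable onto nodes that carry all of these labels.
    .withNodeSelector(nodeSelector.asJava)
  .endSpec()
  .build()
```

Plain nodeSelector predates Kubernetes 1.5, so this particular syntax should not run into the 1.5-vs-1.6 compatibility question raised above.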
@kimoonkim was this the k8s 1.6+ feature you were referencing in https://github.com/apache-spark-on-k8s/spark/pull/316/files#diff-206ce9343d722622e995071cbb69c330R338 ?
The main reason to add this is easy debugging: with several kubelets in the cluster, I just want the driver and executors to run on a specified machine so I can analyze the processes (for example, look at CPU usage).
We also run a YARN cluster with MR and Spark jobs, where there is a strong need for node labels (e.g. running some jobs on high-memory machines), so I think Spark on k8s jobs need this as well.
For the high-memory machine use case, I would expect that setting an appropriately large memory request on the driver/executor pods would cause the k8s scheduler to place them only on nodes where they fit, i.e. the high-memory machines (a rough sketch of this follows below).
The performance benchmarking use case is a good one, though.
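As a rough illustration of that reasoning (standard Spark memory settings; the sizes are arbitrary), a large memory request already steers the pods toward nodes that can hold them:

```scala
import org.apache.spark.SparkConf

// Spark on k8s turns these into container memory requests, so the k8s scheduler
// will only bind the driver/executor pods to nodes with enough allocatable
// memory -- in practice, the high-memory machines.
val conf = new SparkConf()
  .set("spark.driver.memory", "20g")
  .set("spark.executor.memory", "48g")
```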
High memory is just one example; there are other factors like SSD or PPC hardware.
I do think that restricting the nodes a job runs on is a use case several people will have. But I like the solution of using node affinity (an annotation up to 1.5, a field in 1.6+), because it lets us express a superset of what we can express using node selectors (a rough sketch of the annotation form follows below).
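A rough sketch of the annotation form, assuming the 1.5-era alpha affinity annotation key (scheduler.alpha.kubernetes.io/affinity) and an invented label; the exact key and JSON schema should be checked against the 1.5 docs:

```scala
import io.fabric8.kubernetes.api.model.PodBuilder

// On 1.5 the affinity is carried as JSON in an alpha annotation; on 1.6+ it is
// a first-class field of the pod spec. The annotation key is assumed from the
// 1.5 docs and the label key/value below are invented for illustration.
val affinityJson =
  """{"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution":
    |  {"nodeSelectorTerms": [{"matchExpressions":
    |    [{"key": "disktype", "operator": "In", "values": ["ssd"]}]}]}}}""".stripMargin

val pod = new PodBuilder()
  .withNewMetadata()
    .withName("spark-driver-example")  // illustrative name
    .addToAnnotations("scheduler.alpha.kubernetes.io/affinity", affinityJson)
  .endMetadata()
  .withNewSpec()
  .endSpec()
  .build()
```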
It may be valuable to support just node selectors now and add custom pod YAMLs (#38) for affinity later, but we should have a discussion about this before adding the option.
@ash211 This node selector is not the same as the node affinity that I referred to as a 1.6+ feature. As @foxish mentioned, node affinity is a superset of node selector; k8s added node affinity later to support more general use cases. From the reference doc:
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature, currently in beta, greatly expands the types of constraints you can express.
...
nodeSelector continues to work as usual, but will eventually be deprecated, as node affinity can express everything that nodeSelector can express.
@kimoonkim we probably want to think about how this would interact with the locality work you've been doing. Should the preferences stack on top of each other, or should one replace the other? Curious about your thoughts.
Linking to #38, which is the umbrella issue for fully-generic customization of things like this via user-provided YAML files instead of incrementally adding options for every k8s feature.
@ash211, it seems the main use case of @sandflee is to debug a job, which I think will be useful. Then this, if specified as an option, would take precedence and suppress the locality node affinity for the job. I'd prefer renaming the option to express the debugging intent more clearly, so people are less confused about precedence or how multiple node preferences work together. In terms of implementation, it might be better to switch this code to use node affinity.
I think this is about more than just debugging use cases. Suppose you have a Spark job that requires a GPU, so you have to schedule Spark pods only on nodes with the special GPU hardware. You would still want to use HDFS locality where possible, in addition to the required GPU placement (the job fails without a GPU). So we need to express both a hard requirement (GPU nodes) and a soft preference (HDFS locality) at the same time.
Am I over-thinking this?
I like the @ash211 proposal. The reference doc has an example of combining required and preferred node affinity in this way.
As discussed in our weekly call, I think we can take this for now and support the more complex affinity semantics at a later time. @sandflee can you rebase this PR and resolve conflicts?
@foxish patch rebased
rerun unit tests please
rerun integration test please
@@ -385,6 +385,9 @@ private[spark] class KubernetesClusterSchedulerBackend(
        .withValue(cp)
        .build()
    }
    val nodeSelector = ConfigurationUtils.parseKeyValuePairs(
We should use the new key-value pair structure; see how we handle labels and annotations, i.e. spark.kubernetes.driver.nodeselector.<key>=<value>.
Use the new key-value paradigm.
docs/running-on-kubernetes.md
Outdated
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.node.selector</code></td>
we shouldn't introduce this config at all if it's already deprecated
@@ -82,6 +82,12 @@ private[spark] class KubernetesClusterSchedulerBackend(
    KUBERNETES_EXECUTOR_ANNOTATION_PREFIX,
    KUBERNETES_EXECUTOR_ANNOTATIONS,
    "executor annotation")
  private val nodeSelector =
    ConfigurationUtils.combinePrefixedKeyValuePairsWithDeprecatedConf(
Let's use just ConfigurationUtils.parseKeyValuePairs so we don't introduce the deprecated conf.
ConfigurationUtils.parseKeyValuePairs only parses the deprecated config format. It's probably sufficient to use SparkConf.getAllWithPrefix instead of using anything in ConfigurationUtils.
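A minimal sketch of that suggestion (the prefix string is the one proposed in this review thread, not a finalized config name, and the disktype=ssd label is invented):

```scala
import org.apache.spark.SparkConf

// getAllWithPrefix returns (key, value) pairs with the prefix already stripped,
// so this yields e.g. Map("disktype" -> "ssd"), ready to pass to withNodeSelector.
val sparkConf = new SparkConf()
  .set("spark.kubernetes.driver.nodeselector.disktype", "ssd")

val driverNodeSelector: Map[String, String] =
  sparkConf.getAllWithPrefix("spark.kubernetes.driver.nodeselector.").toMap
```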
@sandflee looks like there's a merge conflict with the latest master -- are you able to fix the conflicts?
LGTM -- @mccheah ?
Will merge once build is green.
rerun integration test please
Thanks @sandflee for the contribution!
Fixes #358
Allows assigning pods to specific nodes. Annotations seem a little complicated to use, so we can simply use a node selector.