This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Add node selectors for driver and executor pods #355

Merged (2 commits) on Jul 18, 2017

Conversation

@FANNG1 commented Jun 22, 2017

Fixes #358

Allows assigning pods to specific nodes. Annotations seem a little complicated to use; we could simply use a node selector.

@ash211 left a comment

Thanks for the contribution @sandflee!

Can you explain more about the use case you need this for? I can imagine one around wanting to run Spark pods only on nodes labeled as running HDFS, for data locality purposes. Are there others you have in mind?

On annotations, one of the things we've been trying to do in this project is maintain compatibility with two versions of k8s at the same time -- "current" and "previous". Right now that's 1.6 and 1.5. Is the syntax you used supported in kubernetes 1.5?

@@ -430,6 +433,7 @@ private[spark] class KubernetesClusterSchedulerBackend(
.endMetadata()
.withNewSpec()
.withHostname(hostname)
.withNodeSelector(nodeSelector.asJava)
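For context on the diff above, here is a minimal sketch (assuming the fabric8 kubernetes-client builder API this scheduler backend already uses) of how a node selector map ends up on a pod spec; the pod name and the disktype=ssd label are illustrative, not values from this PR:

import scala.collection.JavaConverters._
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Illustrative node selector: the pod may only be scheduled onto nodes
// that carry every one of these labels.
val nodeSelector: Map[String, String] = Map("disktype" -> "ssd")

val pod: Pod = new PodBuilder()
  .withNewMetadata()
    .withName("spark-exec-1")                  // hypothetical pod name
    .endMetadata()
  .withNewSpec()
    .withHostname("spark-exec-1")
    .withNodeSelector(nodeSelector.asJava)     // the call added in this diff
    .endSpec()
  .build()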

Author

The main reason to add this is easier debugging: there are several kubelets, and I just want the driver and executors to run on a specified machine so I can analyze the process (for example, look at CPU usage).
We also had a YARN cluster running MR and Spark jobs where node labels were really necessary (e.g., running some jobs on high-memory machines), so I think Spark on k8s jobs need this as well.


For the high-memory machine use case, I would expect that setting an appropriately large memory request on the driver/executor pods would cause the k8s scheduler to place them only where they fit, i.e., on the high-memory machine.

The performance benchmarking use case is a good one.
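To illustrate the point about memory requests, here is a hedged sketch using the fabric8 container builder; the image name and the 48Gi figure are made-up examples, not values from this project:

import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Quantity}

// A large memory request means the k8s scheduler will only bind the pod to a
// node whose allocatable memory can accommodate it, e.g. a high-memory machine.
val executorContainer: Container = new ContainerBuilder()
  .withName("executor")
  .withImage("spark-executor:latest")          // hypothetical image
  .withNewResources()
    .addToRequests("memory", new Quantity("48Gi"))
    .addToLimits("memory", new Quantity("48Gi"))
    .endResources()
  .build()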

Author

High memory is just an example; there are other factors like SSD or PPC.

@foxish (Member) commented Jun 22, 2017

I do think trying to restrict the nodes a job runs on is a use case several people will have. But I like the solution of using the node affinity (annotation till 1.5, field in 1.6+), because it lets us express a superset of what we can express using node selectors.

Member

It may be valuable to just have support for node selectors and then, later, have custom pod YAMLs (#38) for affinity, but we should have a discussion about this before adding this option.

Member

@ash211 This node selector is not the same as the node affinity that I referred to as a 1.6+ feature. As @foxish mentioned, node affinity is a superset of node selector; k8s added node affinity later to support more general use cases. From the reference doc:

nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature, currently in beta, greatly expands the types of constraints you can express.
...
nodeSelector continues to work as usual, but will eventually be deprecated, as node affinity can express everything that nodeSelector can express

@ash211 commented Jun 22, 2017

@kimoonkim we probably want to think about how this would interact with the locality work you've been doing. Should the preferences stack on top of each other, or should one replace the other?

Curious on your thoughts

@ash211 commented Jun 22, 2017

Linking to #38, which is the umbrella issue for fully generic customization of things like this via user-provided YAML files, instead of incrementally adding additional options for every k8s feature.

@kimoonkim (Member)

@ash211, it seems the main use case for @sandflee is debugging a job, which I think will be useful. Then this, if specified as an option, would take precedence and suppress the locality node affinity for the job. I'd prefer renaming the option to express the debugging intent more clearly, so people would be less confused about precedence or about how multiple node preferences work together.

In terms of implementation, it might be better to switch this code to the node affinity requiredDuringSchedulingIgnoredDuringExecution mechanism (a hard limit), so that it is clear that this will suppress any soft-limit node affinities expressed as preferredDuringSchedulingIgnoredDuringExecution (the HDFS locality node affinity uses this). This way, we can combine both node affinities in the pod spec and rely on well-defined k8s behaviors as to what will happen.


@ash211 commented Jun 23, 2017

I think this is for more than just debugging use cases. Suppose you have a Spark job that requires a GPU, so you have to schedule Spark pods only on nodes with the special GPU hardware. You would still want to use HDFS locality where possible though, in addition to the required GPU locality (the job fails without GPU).

So we need:

  • a way to express the full affinity spec, not just the simpler node selector
  • a way to merge user-provided affinity spec with the Spark-generated affinity spec for HDFS locality

Am I over-thinking this?

@kimoonkim (Member)

I like @ash211's proposal. Node affinity can express both "hard" and "soft" constraints stacked on top of each other. Within "soft" constraints, we could use weights to express relative priorities.

See an example from the reference doc below.

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
      - weight: 2
        preference:
          matchExpressions:
          - key: yet-another-node-label-key
            operator: In
            values:
            - yet-another-node-label-value

@foxish (Member) commented Jul 7, 2017

As discussed in our weekly call, I think we can take this for now, and support the more complex affinity semantics at a later time. @sandflee can you rebase this PR and resolve conflicts?

@FANNG1 (Author) commented Jul 7, 2017

@foxish patch rebased

@ash211 commented Jul 11, 2017

rerun unit tests please

@ash211 commented Jul 11, 2017

rerun integration test please

@ash211 changed the title from "add node selector for driver and executor pod" to "Add node selectors for driver and executor pods" on Jul 11, 2017
@@ -385,6 +385,9 @@ private[spark] class KubernetesClusterSchedulerBackend(
.withValue(cp)
.build()
}
val nodeSelector = ConfigurationUtils.parseKeyValuePairs(

We should use the new key-value pair structure. See how we handle labels and annotations.

spark.kubernetes.driver.nodeselector.<key>=<value>
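As a hedged sketch of what that per-key style could look like from the user's side (the property names below illustrate the pattern and are not necessarily the final keys):

import org.apache.spark.SparkConf

// Illustrative only: the same prefixed key-value pattern already used for
// labels and annotations, applied here to node selectors.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.label.team", "analytics")          // existing prefixed style
  .set("spark.kubernetes.driver.nodeselector.disktype", "ssd")     // proposed node-selector style
  .set("spark.kubernetes.driver.nodeselector.gpu", "true")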

@mccheah left a comment

Use the new key-value paradigm.

@apache-spark-on-k8s deleted a comment from mccheah Jul 12, 2017
@FANNG1 (Author) commented Jul 14, 2017

@mccheah @ash211 patch updated

<td><code>spark.kubernetes.node.selector</code></td>

we shouldn't introduce this config at all if it's already deprecated

@@ -82,6 +82,12 @@ private[spark] class KubernetesClusterSchedulerBackend(
KUBERNETES_EXECUTOR_ANNOTATION_PREFIX,
KUBERNETES_EXECUTOR_ANNOTATIONS,
"executor annotation")
private val nodeSelector =
ConfigurationUtils.combinePrefixedKeyValuePairsWithDeprecatedConf(

let's just use ConfigurationUtils.parseKeyValuePairs so we don't introduce the deprecated conf


ConfigurationUtils.parseKeyValuePairs only parses the deprecated config format. It's probably sufficient to use SparkConf.getAllWithPrefix instead of using anything in ConfigurationUtils.
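A hedged sketch of that suggestion (the prefix string here is illustrative):

import org.apache.spark.SparkConf

// SparkConf.getAllWithPrefix returns every (key-suffix, value) pair whose key
// starts with the given prefix, which is all the parsing a node selector needs.
val sparkConf = new SparkConf()
val nodeSelector: Map[String, String] =
  sparkConf.getAllWithPrefix("spark.kubernetes.node.selector.").toMap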

@ash211 commented Jul 15, 2017

@sandflee looks like there's a merge conflict with the latest master -- are you able to fix the conflicts?

@ash211 left a comment

LGTM -- @mccheah ?

@ash211 commented Jul 18, 2017

Will merge once build is green

@ash211 commented Jul 18, 2017

rerun integration test please

@ash211 merged commit 6dbd32e into apache-spark-on-k8s:branch-2.1-kubernetes on Jul 18, 2017
@ash211 commented Jul 18, 2017

Thanks @sandflee for the contribution!

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019