
[SPARK-21694][MESOS] Support Mesos CNI network labels #18910

Closed

susanxhuynh wants to merge 4 commits into apache:master from susanxhuynh/sh-mesos-cni-labels

Conversation

susanxhuynh (Contributor)

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21694

What changes were proposed in this pull request?

Spark already supports launching containers attached to a given CNI network by specifying it via the config spark.mesos.network.name.

This PR adds support to pass in network labels to CNI plugins via a new config option spark.mesos.network.labels. These network labels are key-value pairs that are set in the NetworkInfo of both the driver and executor tasks. More details in the related Mesos documentation: http://mesos.apache.org/documentation/latest/cni/#mesos-meta-data-to-cni-plugins
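For illustration, a minimal sketch of how both options could be set when building a job's SparkConf (the network name and label values below are hypothetical):

```scala
import org.apache.spark.SparkConf

// Hypothetical values: any CNI network name and comma-separated key:value labels.
val conf = new SparkConf()
  .set("spark.mesos.network.name", "my-cni-network")
  .set("spark.mesos.network.labels", "key1:val1,key2:val2")
```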

How was this patch tested?

Unit tests, for both driver and executor tasks.
Manual integration test to submit a job with the spark.mesos.network.labels option, hit the mesos/state.json endpoint, and check that the labels are set in the driver and executor tasks.

@ArtRand @skonto


SparkQA commented Aug 10, 2017

Test build #80498 has finished for PR 18910 at commit 255e519.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -21,6 +21,7 @@ import org.apache.mesos.Protos.{ContainerInfo, Image, NetworkInfo, Parameter, Vo
import org.apache.mesos.Protos.ContainerInfo.{DockerInfo, MesosInfo}

import org.apache.spark.{SparkConf, SparkException}
import org.apache.spark.deploy.mesos.config._
Contributor

Might be better to avoid wildcard import.


+1, this makes it more consistent with MesosClusterScheduler

@@ -162,7 +163,11 @@ private[mesos] object MesosSchedulerBackendUtil extends Logging {
}

conf.getOption("spark.mesos.network.name").map { name =>
val info = NetworkInfo.newBuilder().setName(name).build()
val networkLabels = MesosProtoUtils.mesosLabels(conf.get(NETWORK_LABELS).getOrElse(""))
skonto (Contributor), Aug 10, 2017

Not part of this PR, but shouldn't spark.mesos.network.name have a config constant like the labels do (NETWORK_LABELS)?
At some point I think all Mesos-specific config options should be placed in the config object so they can be easily managed and referenced.


+1 for making the NETWORK_LABELS alias.
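For context, a plausible completion of the hunk above, assuming the parsed labels are attached via NetworkInfo.Builder.setLabels and the resulting info is added to the task's ContainerInfo builder:

```scala
conf.getOption("spark.mesos.network.name").map { name =>
  // Parse "key1:val1,key2:val2" into a Mesos Labels proto; an empty string yields no labels.
  val networkLabels = MesosProtoUtils.mesosLabels(conf.get(NETWORK_LABELS).getOrElse(""))
  val info = NetworkInfo.newBuilder()
    .setName(name)
    .setLabels(networkLabels)  // assumption: labels are set directly on the NetworkInfo
    .build()
  containerInfo.addNetworkInfos(info)
}
```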


skonto commented Aug 10, 2017

@susanxhuynh have you tested how Spark jobs behave with a real CNI network? Is this PR enough for that? Just curious...

@ArtRand left a comment

Thanks @susanxhuynh. Just a question about your doc string and testing.

ConfigBuilder("spark.mesos.network.labels")
.doc("Network labels to pass to CNI plugins. This is a comma-separated list " +
"of key-value pairs, where each key-value pair has the format key:value. " +
"Example: key1=val1,key2=val2")
I thought we use `:` to separate the key-value pairs? So it would say:
`Example: key1:val1,key2:val2`
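With the separator fixed, the whole entry would plausibly read as below, assuming the usual Spark ConfigBuilder tail of .stringConf / .createOptional (a sketch, not the merged text):

```scala
private[spark] val NETWORK_LABELS =
  ConfigBuilder("spark.mesos.network.labels")
    .doc("Network labels to pass to CNI plugins. This is a comma-separated list " +
      "of key-value pairs, where each key-value pair has the format key:value. " +
      "Example: key1:val1,key2:val2")
    .stringConf
    .createOptional
```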


assert(networkInfos.get(0).getLabels.getLabels(0).getKey == "key1")
assert(networkInfos.get(0).getLabels.getLabels(0).getValue == "val1")
assert(networkInfos.get(0).getLabels.getLabels(1).getKey == "key2")
assert(networkInfos.get(0).getLabels.getLabels(1).getValue == "val2")

Is there a way to test that the driver's worker tasks also get the labels? It's important that all of the ContainerInfos associated with a job (not just the driver scheduler) get the labels for CNI to work.

susanxhuynh (Contributor, Author)

@ArtRand I believe this is testing the worker tasks: the setBackend() method (L.572) creates a MesosCoarseGrainedSchedulerBackend, L.577-579 create the resource requirements for an executor, and backend.resourceOffers() (L.580) finds an offer that matches the requirements and launches an executor task.
There is a separate unit test above, in MesosClusterSchedulerSuite.scala, which checks for the labels in the driver itself.
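A condensed sketch of that executor-side test, assuming setBackend, createOffer, and verifyTaskLaunched are existing helpers in the suite (the values match the asserts quoted above):

```scala
import scala.collection.JavaConverters._

test("mesos supports spark.mesos.network.labels") {
  // Create a MesosCoarseGrainedSchedulerBackend with the CNI options set.
  setBackend(Map(
    "spark.mesos.network.name" -> "test-network-name",
    "spark.mesos.network.labels" -> "key1:val1,key2:val2"))

  // Offer enough resources for one executor; the backend launches a task on the offer.
  val (mem, cpu) = (backend.executorMemory(sc), 4)
  backend.resourceOffers(List(createOffer("o1", "s1", mem, cpu)).asJava)

  // The launched executor task's ContainerInfo should carry the network labels.
  val launchedTasks = verifyTaskLaunched(driver, "o1")
  val networkInfos = launchedTasks.head.getContainer.getNetworkInfosList
  assert(networkInfos.size == 1)
  assert(networkInfos.get(0).getLabels.getLabels(0).getKey == "key1")
  assert(networkInfos.get(0).getLabels.getLabels(0).getValue == "val1")
}
```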


SparkQA commented Aug 11, 2017

Test build #80555 has finished for PR 18910 at commit d261593.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

susanxhuynh (Contributor, Author)

@skonto @ArtRand Thanks for the feedback. I have fixed the documentation and added NETWORK_NAME to the config object. Please let me know what you think.

@skonto I have not tested this particular change on a real CNI network. I think the only difference is that a Spark job runs on a different network, but there's no change in the Spark functionality.


SparkQA commented Aug 11, 2017

Test build #80556 has finished for PR 18910 at commit dc09312.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


ArtRand commented Aug 21, 2017

LGTM.

susanxhuynh (Contributor, Author)

@vanzin Would you mind helping review this PR? Thanks.

susanxhuynh (Contributor, Author)

@srowen Sean, would you like to review this PR? Thanks.

@@ -70,4 +70,19 @@ package object config {
"during a temporary disconnection, before tearing down all the executors.")
.doubleConf
.createWithDefault(0.0)

private [spark] val NETWORK_NAME =
srowen (Member)

Tiny nit: could you remove the space after `private`? That would make it consistent with the rest of the code.

I trust your judgment as to whether this is the right functional change for Mesos and the details and style look OK to me otherwise.
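For reference, the cleaned-up entry would plausibly look like the sketch below; the doc wording here is an assumption, not the merged text:

```scala
private[spark] val NETWORK_NAME =
  ConfigBuilder("spark.mesos.network.name")
    .doc("Attach containers to the given named network. If the job is launched " +
      "in cluster mode, also launch the driver in the given named network.")  // assumed wording
    .stringConf
    .createOptional
```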


SparkQA commented Aug 23, 2017

Test build #81051 has finished for PR 18910 at commit 38e3b4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

susanxhuynh (Contributor, Author)

@srowen Thanks for reviewing. I've removed those spaces; please take a look.


srowen commented Aug 24, 2017

Merged to master

asfgit closed this in ce0d3bb on Aug 24, 2017
susanxhuynh deleted the sh-mesos-cni-labels branch on September 19, 2017
susanxhuynh added a commit to d2iq-archive/spark that referenced this pull request Jan 8, 2018
Author: Susan X. Huynh <xhuynh@mesosphere.com>

Closes apache#18910 from susanxhuynh/sh-mesos-cni-labels.