Support MetricNameSpace in executors and include shuffle to shuffleservice metrics #532

matyix · 2017-10-20T17:41:46Z

What changes were proposed in this pull request?

Support MetricNameSpace in shuffle to shuffleservice metrics.

Pass the spark.metrics.namespace configuration to the shuffle service so that it can publish metrics under a custom namespace. For the shuffle service, the shuffle prefix is used in the metric names.
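As a rough illustration of the idea described above (not the code from this PR), the sketch below resolves a namespace from a property map and builds a prefixed metric name. It assumes the fallback described in the Spark documentation, where the namespace defaults to spark.app.id when spark.metrics.namespace is not set. The object, both helper functions, the dotted naming scheme, and the sample metric name are hypothetical.

```scala
object MetricsNamespaceSketch {
  // Hypothetical helper: use spark.metrics.namespace if set,
  // otherwise fall back to spark.app.id.
  def resolveNamespace(props: Map[String, String]): Option[String] =
    props.get("spark.metrics.namespace").orElse(props.get("spark.app.id"))

  // Hypothetical naming scheme: <namespace>.<prefix>.<metric>.
  // The namespace segment is omitted if neither property is present.
  def metricName(props: Map[String, String], prefix: String, metric: String): String =
    (resolveNamespace(props).toSeq ++ Seq(prefix, metric)).mkString(".")

  def main(args: Array[String]): Unit = {
    val props = Map(
      "spark.app.id" -> "app-123",
      "spark.metrics.namespace" -> "spark_pi"
    )
    // The custom namespace wins over the application ID.
    println(metricName(props, "shuffle", "sampleMetric"))
  }
}
```

With a custom namespace, metric names stay stable across application runs, which is what makes them practical to chart in a tool such as Grafana.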

How was this patch tested?

Executed all unit and integration tests.

Manual testing: deployed a Spark cluster, a Prometheus server, and a Pushgateway, ran SparkPi, and checked for the shuffle metrics in Pushgateway/Grafana.


liyinan926 · 2017-10-23T17:27:05Z

Unit test build for this PR seems stuck for almost 3 days.

foxish · 2017-10-23T18:16:53Z

@ssuchter @kimoonkim, should the unit test build timeout eventually?

liyinan926 · 2017-10-23T19:31:42Z

Some tests failed due to OOM. Can someone who has admin access to the Jenkins instance kill the build?

kimoonkim · 2017-10-23T23:25:03Z

@foxish I think we can time out the Jenkins jobs, say, after 8 hours. I'll see if I can make this change to the Jenkins jobs.

@liyinan926 Yes, let me find and kill the hanging build.

kimoonkim · 2017-10-23T23:29:35Z

I just checked. The build was killed already by Jenkins. The build log console says:

...
Build was aborted
Aborted by Jenkins Admin

kimoonkim · 2017-10-23T23:33:22Z

Hmm, the unit test job also already had a timeout set up. The config page says:

Abort the build if it's stuck
Time-out strategy Absolute
    Timeout minutes: 60

I guess it doesn't always work :-(

cvpatel · 2017-10-23T23:53:25Z

@liyinan926 @kimoonkim @foxish Sorry, I forgot to update, but I killed the test earlier and added a 60-minute timeout to the configuration around noon today.

cvpatel · 2017-10-23T23:55:37Z

rerun all tests please

mccheah · 2017-11-09T00:51:46Z

I'm a little confused as to what this property actually does. I see the configuration of spark.metrics.namespace being set, but I don't see it being used in this pull request itself?

mccheah · 2017-11-09T00:55:10Z

core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala

@@ -201,7 +202,8 @@ private[spark] object CoarseGrainedExecutorBackend extends Logging {
        clientMode = true)
      val driver = fetcher.setupEndpointRefByURI(driverUrl)
      val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig(executorId))
-      val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
+      val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId),
+        ("spark.metrics.namespace", metricsNamespace))


Why can't the driver provide this as it would with any other Spark property? i.e. if I set spark.metrics.namespace in my SparkConf then shouldn't that be what's used here?
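To make the question above concrete, here is a minimal sketch of the property merge, under the simplifying assumption that the executor applies the merged properties in order and a later entry replaces an earlier one with the same key. The object, the helper, and the sample values are hypothetical and are not taken from this PR.

```scala
object ExecutorPropsSketch {
  // Assumption: properties are applied one by one, so the last value
  // seen for a key is the one that ends up in the configuration.
  def applyInOrder(props: Seq[(String, String)]): Map[String, String] =
    props.foldLeft(Map.empty[String, String]) {
      case (conf, (key, value)) => conf.updated(key, value)
    }

  def main(args: Array[String]): Unit = {
    // Properties fetched from the driver, including a namespace set by the user.
    val fromDriver = Seq(
      "spark.app.name" -> "SparkPi",
      "spark.metrics.namespace" -> "set_by_user"
    )

    // Before the change: only spark.app.id is appended.
    val before = applyInOrder(fromDriver ++ Seq("spark.app.id" -> "app-123"))

    // After the change: a second namespace entry is appended as well.
    val after = applyInOrder(fromDriver ++ Seq(
      "spark.app.id" -> "app-123",
      "spark.metrics.namespace" -> "from_executor_argument"
    ))

    println(before("spark.metrics.namespace"))
    println(after("spark.metrics.namespace"))
  }
}
```

Under that assumption, the driver-supplied value already reaches the executor without the extra entry, and appending a second entry can replace the value the user set.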


matyix · 2017-11-10T11:09:14Z

Hello @mccheah - thanks for the feedback and review. The PR was originally based on an earlier version of the branch, in which support for a custom metrics namespace in executors was still missing. I have reverted those changes and updated the PR text to reflect the latest commit.

erikerlandson · 2017-11-15T22:22:34Z

Is the idea here to backport the spark support for custom metric namespaces to this fork? I ask because I'd expect this to be resolved either via the upstreaming process or via our next rebase.

matyix · 2017-11-16T14:36:18Z

@erikerlandson After rebasing onto the latest branch-2.2-kubernetes, the scope of the original PR has been reduced to supporting a custom metrics namespace for the external shuffle service only (missing from master).

erikerlandson · 2017-11-16T20:55:31Z

@matyix Oh I see - I was referring to the next time we rebase against an upstream release (spark-2.2.1 or spark-2.3, etc)

matyix · 2017-11-17T13:59:24Z

@erikerlandson shall I push this PR upstream as well?

matyix · 2018-01-03T16:06:17Z

Closing this as it's redundant; PR apache#19775 fixes this one as well.


Support MetricNameSpace in executors and include shuffle to shuffleservice metrics

5befed7

matyix added 2 commits November 1, 2017 09:12

Merge branch 'branch-2.2-kubernetes' into metric-namespace-support

772c37f

Merge branch 'branch-2.2-kubernetes' into metric-namespace-support

a4fb670

mccheah reviewed Nov 9, 2017


Revert passing metrics namespace to executores via environment variable as that is pulled from driver spark conf

4878ae5

Remove unused metrics namespace from Dockerfiles

8c7d903

matyix closed this Jan 3, 2018
