Skip to content
This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Support MetricNameSpace in executors and include shuffle to shuffleservice metrics #532

Closed
wants to merge 5 commits into from

Conversation

matyix
Copy link

@matyix matyix commented Oct 20, 2017

What changes were proposed in this pull request?

Support MetricNameSpace in shuffle to shuffleservice metrics.

Pass spark.metrics.namespace configuration to shuffle such as the shuffleservice can publish metrics with custom namespace. In case of shuffleservice the shuffle prefix is used in the metrics.

How was this patch tested?

Executed all unit and integration tests.

Manual testing through deploying a Spark cluster, Prometheus server, Pushgateway and ran SparkPi - and checking for the shuffle metrics in Pushgateway/Grafana.

@liyinan926
Copy link
Member

Unit test build for this PR seems stuck for almost 3 days.

@foxish
Copy link
Member

foxish commented Oct 23, 2017

@ssuchter @kimoonkim, should the unit test build timeout eventually?

@liyinan926
Copy link
Member

Some test failed due to OOM. Can someone who has admin access to the Jenkins instance kill the build?

@kimoonkim
Copy link
Member

@foxish I think we can time out the Jenkins jobs, say, after 8 hours. I'll see if i can make this change to Jenkins jobs.

@liyinan926 Yes, let me find and kill the hanging build.

@kimoonkim
Copy link
Member

I just checked. The build was killed already by Jenkins. The build log console says:

...
Build was aborted
Aborted by Jenkins Admin

@kimoonkim
Copy link
Member

Hmm, also the unit test had set up timeout already. The config page says:

Abort the build if it's stuck
Time-out strategy Absolute
    Timeout minutes: 60

I guess it doesn't always work :-(

@cvpatel
Copy link
Member

cvpatel commented Oct 23, 2017

@liyinan926 @kimoonkim @foxish Sorry forgot to update, but I killed the test earlier and added a 60 minute timeout to the configuration around noon today.

@cvpatel
Copy link
Member

cvpatel commented Oct 23, 2017

rerun all tests please

@mccheah
Copy link

mccheah commented Nov 9, 2017

I'm a little confused as to what this property actually does. I see the configuration of spark.metrics.namespace being set, but I don't see it being used in this pull request itself?

@@ -201,7 +202,8 @@ private[spark] object CoarseGrainedExecutorBackend extends Logging {
clientMode = true)
val driver = fetcher.setupEndpointRefByURI(driverUrl)
val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig(executorId))
val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId),
("spark.metrics.namespace", metricsNamespace))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't the driver provide this as it would with any other Spark property? i.e. if I set spark.metrics.namespace in my SparkConf then shouldn't that be what's used here?

@matyix
Copy link
Author

matyix commented Nov 10, 2017

Hello @mccheah - thanks for the feedback and review. The starting base for the PR was an earlier version and the custom metrics namespace in executors was missing back then. I have reverted the changes and updated the PR text as well reflecting the latest commit.

@erikerlandson
Copy link
Member

Is the idea here to backport the spark support for custom metric namespaces to this fork? I ask because I'd expect this to be resolved either via the upstreaming process or via our next rebase.

@matyix
Copy link
Author

matyix commented Nov 16, 2017

@erikerlandson After rebasing with latest branch-2.2-kubernetes the original PR has reduced the scope to support custom metrics namespace for external shuffle service only (missing from master).

@erikerlandson
Copy link
Member

@matyix Oh I see - I was referring to the next time we rebase against an upstream release (spark-2.2.1 or spark-2.3, etc)

@matyix
Copy link
Author

matyix commented Nov 17, 2017

@erikerlandson shall I push this PR upstream as well?

@matyix
Copy link
Author

matyix commented Jan 3, 2018

Closing this as it's redundant, this PR apache#19775 fixes this one as well.

@matyix matyix closed this Jan 3, 2018
ifilonenko pushed a commit to bloomberg/apache-spark-on-k8s that referenced this pull request Apr 17, 2019
apache-spark-on-k8s#532)

* [SPARK-25299] Use the shuffle writer plugin for the SortShuffleWriter.

* Remove unused

* Handle empty partitions properly.

* Adjust formatting

* Don't close streams twice.

Because compressed output streams don't like it.

* Clarify comment
ifilonenko pushed a commit to bloomberg/apache-spark-on-k8s that referenced this pull request Jul 10, 2019
apache-spark-on-k8s#532)

* [SPARK-25299] Use the shuffle writer plugin for the SortShuffleWriter.

* Remove unused

* Handle empty partitions properly.

* Adjust formatting

* Don't close streams twice.

Because compressed output streams don't like it.

* Clarify comment
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants