Skip to content
This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Add support for publishing Spark metrics into Prometheus #531

Closed
wants to merge 5 commits into from
Closed

Add support for publishing Spark metrics into Prometheus #531

wants to merge 5 commits into from

Conversation

matyix
Copy link

@matyix matyix commented Oct 20, 2017

What changes were proposed in this pull request?

Publishing Spark metrics into Prometheus - as discussed earlier in #384. Implemented a metrics sink that publishes Spark metrics into Prometheus via Prometheus Pushgateway. Metrics data published by Spark is based on Dropwizard. The format of Spark metrics is not supported natively by Prometheus thus these are converted using DropwizardExports prior pushing metrics to the pushgateway.

Also the default Prometheus pushgateway client API implementation does not support metrics timestamp thus the client API has been ehanced to enrich metrics data with timestamp.

How was this patch tested?

This PR is not affecting the existing code base and not altering the functionality. Nevertheless, I have executed all unit and integration tests. Also this setup has been deployed with Helm and running for a few weeks now for several long and short-running workloads on a K8S cluster (on AWS) and been monitored via Prometheus (Prometheus 1.7.1 + Pushgateway 0.3.1).

Manual testing through deploying a Spark cluster, Prometheus server, Pushgateway and ran SparkPi.

@matyix matyix changed the title Add support for prometheus Add support for publishing Spark metrics into Prometheus Oct 20, 2017
@tnachen
Copy link

tnachen commented Oct 20, 2017

This looks like should be submitted to upstream apache git repo instead.

@foxish
Copy link
Member

foxish commented Oct 20, 2017

Thanks for the PR @matyix. I agree with @tnachen, this doesn't seem k8s specific and might as well be an upstream PR

@matyix
Copy link
Author

matyix commented Oct 21, 2017

Hello @tnachen and @foxish - reason I submitted here is that Prometheus is usually the default monitoring solution on Kubernetes and this is more welcomed/useful by/for the K8S community. Also I had the assumption that the PR is good to land here, as suggested by @foxish and @kimoonkim in the original issue I opened a while ago #384.

Additionally this is more convenient for me as I am using the Spark-on-K8S fork only and all the tests, etc I did was on K8S (as Prometheus, Pushgateway, etc).

Is there any chance it can land here? Ultimately this fork (hopefully) will be merged upstream so the Prometheus based monitoring will be available for all Spark users.

@foxish
Copy link
Member

foxish commented Oct 21, 2017

I'm okay with letting it rest here for the time being, as long as we have a plan to upstream it along with the other parts.

@matyix
Copy link
Author

matyix commented Oct 22, 2017

Yes @foxish I totally agree with you, makes only sense if this gets upstreamed. Meanwhile will fix the checkstyle violation which causes the integration test to fail.

@ash211
Copy link

ash211 commented Oct 23, 2017

Yep, since this is orthogonal to the k8s changes made to Spark this should ideally go upstream. @matyix can you please open an issue upstream at https://issues.apache.org/jira/browse/SPARK so we can make prometheus available to all Spark users, not just k8s users?

@matyix
Copy link
Author

matyix commented Oct 24, 2017

Hello @ash211 - I've created the issue in the Apache Jira - https://issues.apache.org/jira/browse/SPARK-22343 - please let me know what are the next steps.

@aloiscochard
Copy link

Hello @matyix, this is a great work!

I wanted to use it, but our pushgateway use https could you make this a parameter? that would be awesome.

Thanks in advance.

@aloiscochard
Copy link

FYI: I did "monkey patch" by simply replacing the two hardcoded occurrence of the "http" string with "https" and it works!

@matyix
Copy link
Author

matyix commented Oct 31, 2017

Hello @aloiscochard - I have updated the PR with support for changing protocols.

@ash211 @foxish @tnachen - the check-style issue is fixed as well, so now all checks are passing. Let me know if anything else needed to have the PR merged.

@erikerlandson
Copy link
Member

I'm a bit confused from the thread above - I agree with the philosophy that this ought to go upstream independent from spark-on-kube. If so, I'd suggest that this PR be transferred to be directly against apache/spark instead, referencing SPARK-22343

@matyix
Copy link
Author

matyix commented Nov 17, 2017

@erikerlandson @foxish I've pushed the same PR (without K8S specifics upstream), please find it here.

@matyix matyix closed this Jun 29, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants