Add SDK and API metrics to GlueHiveMetastore #15355

Merged
merged 1 commit into from
Nov 2, 2020

pettyjamesm

Add support for AWS SDK client request metrics and Glue API metrics to GlueHiveMetastore.

== RELEASE NOTES ==
Hive Changes
* Add support for AWS SDK client request metrics and Glue API metrics to GlueHiveMetastore

@pettyjamesm

Anything else we need to do here before this can merge @arhimondr ?

@arhimondr arhimondr merged commit 11c8053 into prestodb:master Nov 2, 2020
@pettyjamesm pettyjamesm deleted the glue-metastore-stats branch November 2, 2020 16:31
@caithagoras caithagoras mentioned this pull request Nov 12, 2020
6 tasks
@wjyao0316

wjyao0316 commented Jul 26, 2021

Hello,

@pettyjamesm
Could you provide instructions on how to enable the Glue SDK metrics in Presto?
I tried adding -Dcom.amazonaws.sdk.enableDefaultMetrics to /etc/presto/conf/jvm.config, but I get:

java.lang.ClassNotFoundException: com.amazonaws.metrics.internal.cloudwatch.DefaultMetricCollectorFactory
It looks like the Presto application on EMR is missing CloudWatchClient.

Thanks,
Wenjie

@pettyjamesm

pettyjamesm commented Jul 27, 2021

The metrics here don’t automatically publish to CloudWatch, which seems to be what you’re looking for by setting com.amazonaws.sdk.enableDefaultMetrics.

Instead, they record per-request metrics that can be queried through Presto’s standard metrics interface: the JMX connector.

Collecting metrics in Presto and the AWS SDK client(s) is separate from the CloudWatch publishing interface: choosing to record timings is independent of any mechanism that reports them to a centralized repository. Publishing is a billable choice, and as far as I know Presto has no mechanism to do it automatically via system properties.

If you want debugging metrics for a given cluster, you can enable the JMX connector and query it locally within that cluster to get insights on demand (related: tip 8 in https://aws.amazon.com/blogs/big-data/top-9-performance-tuning-tips-for-prestodb-on-amazon-emr/ and https://prestodb.io/docs/current/connector/jmx.html ). If you want to use the built-in CloudWatch metrics collection, you’ll at least need to add the metrics publisher library to your classpath. Because published metrics are billable, you will also want to understand how many metrics each node publishes automatically, especially on larger clusters. I don’t have figures on hand for how many metrics to expect per node.
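As a minimal sketch of the "enable the JMX connector" step: the Presto documentation describes enabling it by adding a catalog file named jmx.properties that contains `connector.name=jmx`. The directory below is an assumption for illustration; it varies by installation (the thread mentions /etc/presto/conf on EMR), so set CATALOG_DIR to match your deployment.

```shell
# Assumption: CATALOG_DIR points at the Presto catalog directory for this node.
# The default below is a relative placeholder, not a known EMR path.
CATALOG_DIR="${CATALOG_DIR:-./etc/catalog}"

# Create the directory if it does not exist yet.
mkdir -p "$CATALOG_DIR"

# Write the catalog definition that registers the JMX connector as catalog "jmx".
cat > "$CATALOG_DIR/jmx.properties" <<'EOF'
connector.name=jmx
EOF
```

Presto typically needs a restart to pick up a new catalog file. Afterwards, `SHOW TABLES FROM jmx.current` should list the MBeans that can be queried.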

@wjyao0316

wjyao0316 commented Jul 27, 2021

Thanks for the detailed instructions!

I am already using the JMX connector and publishing to CW following the wiki you linked, and I am able to query the Glue metric data. However, I don't quite understand the output format: the result has 21 rows, and the first 20 are NaN or 0.
Do you happen to know how to interpret the metric output here?

sudo presto-cli --catalog hive --execute "SELECT \"gettable.time.oneminute.count\", \"gettable.time.oneminute.p50\" FROM jmx.current.\"com.facebook.presto.hive.metastore.glue:name=hive,type=gluehivemetastore\""
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"1.688727509108893E-8","151.313634"

@pettyjamesm

When you query tables via the JMX connector, each node produces one row (hence the "node" column that you'll find on every table). Since you're querying for stats about Glue getTable calls, you would expect only the coordinator node to have performed any of those calls, which is why every row is NaN (these are your worker nodes) except for one (your coordinator).

In this context you can add a filter to the query based on the coordinator node ID, like WHERE node = '<coordinator node's id>', or filter on some other indirect aspect to eliminate uninteresting rows, like WHERE "gettable.time.oneminute.count" > 0
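The second filter can be sketched by adapting the query from earlier in the thread. The `glue_gettable_stats` function name and the added `node` column in the SELECT list are illustrative choices, not part of the original command; the MBean name and metric columns are copied from the query above.

```shell
# MBean table name, copied from the query posted earlier in this thread.
MBEAN='com.facebook.presto.hive.metastore.glue:name=hive,type=gluehivemetastore'

# Same metrics as before, plus the "node" column, keeping only nodes that
# actually recorded getTable calls in the last minute.
QUERY="SELECT node, \"gettable.time.oneminute.count\", \"gettable.time.oneminute.p50\"
FROM jmx.current.\"${MBEAN}\"
WHERE \"gettable.time.oneminute.count\" > 0"

# Hypothetical helper: runs the query via presto-cli, which must be on PATH.
glue_gettable_stats() {
  presto-cli --catalog hive --execute "$QUERY"
}
```

Calling `glue_gettable_stats` on a cluster node should then return only the rows with recorded calls, which per the explanation above is normally just the coordinator.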

@wjyao0316

That makes a lot of sense. Thank you for your great help!
