Add SDK and API metrics to GlueHiveMetastore #15355

pettyjamesm · 2020-10-26T14:07:00Z

Add support for AWS SDK client request metrics and Glue API metrics to GlueHiveMetastore.

== RELEASE NOTES ==
Hive Changes
* Add support for AWS SDK client request metrics and Glue API metrics to GlueHiveMetastore

pettyjamesm · 2020-11-02T16:29:09Z

Is there anything else we need to do here before this can merge, @arhimondr?

wjyao0316 · 2021-07-26T23:58:21Z

Hello @pettyjamesm,

Could you provide instructions on how to enable the Glue SDK metrics in Presto?
I tried adding -Dcom.amazonaws.sdk.enableDefaultMetrics to /etc/presto/conf/jvm.config, but I get:

java.lang.ClassNotFoundException: com.amazonaws.metrics.internal.cloudwatch.DefaultMetricCollectorFactory

It looks like the Presto application in EMR is missing the CloudWatch client.

Thanks,
Wenjie

pettyjamesm · 2021-07-27T01:05:07Z

The metrics added here don't automatically publish to CloudWatch, which seems to be what you're looking for by setting com.amazonaws.sdk.enableDefaultMetrics.

Instead, they record per-request metrics that can be queried through Presto's standard metrics interface: the JMX connector.

Collecting metrics in Presto and the AWS SDK client(s) is separate from the CloudWatch publishing interface: choosing to record timings is independent of any mechanism that reports them to a centralized repository. Publishing is a billable choice, and as far as I know Presto has no mechanism to do it automatically via system properties.

If you want debugging metrics for a given cluster, you can enable the JMX connector and query it locally within that cluster to get insights on demand (related: tip 8 in https://aws.amazon.com/blogs/big-data/top-9-performance-tuning-tips-for-prestodb-on-amazon-emr/ and https://prestodb.io/docs/current/connector/jmx.html ). If you want to use the built-in CloudWatch metrics collection, you'll at least need to add the metrics publisher library to your classpath. Especially for larger clusters, make sure you understand how many metrics will be published automatically per node, since they are billable. I don't have figures on hand for how many metrics to expect, especially on a per-node basis.

wjyao0316 · 2021-07-27T07:19:48Z

Thanks for the detailed instructions!

I am already using the JMX connector and publishing to CloudWatch following the guide you linked, and I am able to query the Glue metric data. However, I don't quite understand the output format: the result has 21 rows, and the first 20 are NaN or 0.
Do you happen to know how to interpret the metric output here?

sudo presto-cli --catalog hive --execute "SELECT \"gettable.time.oneminute.count\", \"gettable.time.oneminute.p50\" FROM jmx.current.\"com.facebook.presto.hive.metastore.glue:name=hive,type=gluehivemetastore\""
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"0.0","NaN"
"1.688727509108893E-8","151.313634"

pettyjamesm · 2021-07-27T14:26:34Z

When you query tables via the JMX connector, each node produces one row (hence the "node" column that you'll find on every table). Since you're querying for stats about Glue getTable calls, you would expect only the coordinator node to perform any of those calls, which is why every row is 0 or NaN (these are your worker nodes) except for one (your coordinator).

In this case you can add a filter to the query, either on the coordinator's node ID, like WHERE node = '<coordinator node's id>', or on some indirect property that eliminates the uninteresting rows, like WHERE "gettable.time.oneminute.count" > 0.
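
As a rough illustration of the second filter, here is a minimal Python sketch. It assumes the two-column CSV output shown in the earlier comment; the names FILTERED_QUERY and active_rows are hypothetical and introduced only for this example, they are not part of Presto. The query string is the original query with the suggested WHERE clause appended, and the helper applies the same condition client-side to output that has already been captured.

```python
import csv
import io

# The query from the thread with the suggested filter appended.
# Column and table names are copied from the original command.
FILTERED_QUERY = (
    'SELECT "gettable.time.oneminute.count", "gettable.time.oneminute.p50" '
    'FROM jmx.current."com.facebook.presto.hive.metastore.glue:name=hive,type=gluehivemetastore" '
    'WHERE "gettable.time.oneminute.count" > 0'
)


def active_rows(cli_csv_output):
    """Keep only rows whose first column (the one-minute count) is above zero.

    Assumes each line has the form "count","p50", as in the output above.
    """
    rows = []
    for record in csv.reader(io.StringIO(cli_csv_output)):
        if not record:
            continue  # skip blank lines
        count = float(record[0])
        p50 = float(record[1])  # float() accepts the literal "NaN"
        if count > 0:
            rows.append((count, p50))
    return rows


# Sample mirroring the thread: 20 idle worker rows and one coordinator row.
sample = '"0.0","NaN"\n' * 20 + '"1.688727509108893E-8","151.313634"\n'
print(active_rows(sample))
```

Filtering in the query itself is usually preferable, since the idle worker rows are never returned; the client-side helper is only useful when the unfiltered output is all you have.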

wjyao0316 · 2021-07-27T15:50:58Z

That makes a lot of sense. Thank you for your great help!

Add SDK and API metrics to GlueHiveMetastore

6a94ab8

pettyjamesm mentioned this pull request Oct 26, 2020

Add AWS SDK client request metrics to GlueMetastoreStats trinodb/trino#5693

Merged

pettyjamesm requested a review from arhimondr October 26, 2020 15:16

arhimondr approved these changes Oct 26, 2020

arhimondr merged commit 11c8053 into prestodb:master Nov 2, 2020

pettyjamesm deleted the glue-metastore-stats branch November 2, 2020 16:31

caithagoras mentioned this pull request Nov 12, 2020

Add release notes for 0.244 #15433

Merged