Add SDK and API metrics to GlueHiveMetastore #15355
Add support for AWS SDK client request metrics and Glue API metrics to GlueHiveMetastore.
Anything else we need to do here before this can merge, @arhimondr?
Hello, @pettyjamesm
Thanks,
The metrics here don't automatically publish to CloudWatch, which seems to be what you're looking for by setting com.amazonaws.sdk.enableDefaultMetric. Instead, they record per-request metrics so that they can be queried through Presto's standard metrics interface: the JMX connector.

Collecting metrics in Presto and in the AWS SDK client(s) is distinct from the CloudWatch publishing interface, because choosing to record timings is separate from any mechanism that reports them to a centralized repository. Reporting is a billable choice, and as far as I know Presto has no mechanism to perform it automatically via system properties.

If you want debugging metrics for a given cluster, you can enable the JMX connector and query it locally within that cluster to get insights on demand (related: tip 8 in https://aws.amazon.com/blogs/big-data/top-9-performance-tuning-tips-for-prestodb-on-amazon-emr/ and https://prestodb.io/docs/current/connector/jmx.html ). If you want to use the built-in CloudWatch metrics collection, you'll at least need to add the metrics publisher library to your classpath and, especially for larger clusters, make sure you understand how many metrics each node will publish automatically, since those metrics are billable. I don't have figures on hand for how many metrics you might expect, especially on a per-node basis.
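To make the recording-versus-publishing distinction concrete, here is a minimal sketch using only the JDK's javax.management API. GlueApiStats is a hypothetical class, not Presto's actual stats code: it times each call into in-process counters and exposes them as MBean attributes, which is the kind of data a JMX client such as the JMX connector reads. Nothing in it sends data to CloudWatch; that would be a separate, opt-in (and billable) publishing step. The class name, the attribute names, and the ObjectName in the usage note are all assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.LongAdder;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.ReflectionException;

// Hypothetical per-API stats holder (not Presto's real class). It only records
// timings in local counters; publishing them anywhere is a separate concern.
final class GlueApiStats implements DynamicMBean {
    private final LongAdder calls = new LongAdder();
    private final LongAdder failures = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Times one call, counting it (and any failure) before rethrowing.
    <T> T record(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } catch (Exception e) {
            failures.increment();
            throw e;
        } finally {
            calls.increment();
            totalNanos.add(System.nanoTime() - start);
        }
    }

    // JMX clients read these attributes by name.
    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        switch (name) {
            case "Calls": return calls.sum();
            case "Failures": return failures.sum();
            case "TotalTimeNanos": return totalNanos.sum();
            default: throw new AttributeNotFoundException(name);
        }
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList list = new AttributeList();
        for (String name : names) {
            try {
                list.add(new Attribute(name, getAttribute(name)));
            } catch (AttributeNotFoundException ignored) {
                // Unknown names are skipped, as the DynamicMBean contract allows.
            }
        }
        return list;
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only: " + attribute.getName());
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature) throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(action));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attributes = {
            new MBeanAttributeInfo("Calls", "long", "Total calls", true, false, false),
            new MBeanAttributeInfo("Failures", "long", "Calls that threw", true, false, false),
            new MBeanAttributeInfo("TotalTimeNanos", "long", "Summed wall time", true, false, false),
        };
        return new MBeanInfo(GlueApiStats.class.getName(), "Per-request API stats",
                attributes, null, null, null);
    }
}
```

Registering an instance with ManagementFactory.getPlatformMBeanServer().registerMBean(stats, new ObjectName("example.glue:name=GlueApiStats")) makes the counters readable via server.getAttribute(name, "Calls"). DynamicMBean is used rather than a standard MBean interface so the whole sketch fits in package-private classes; Presto exports its real stats objects through its own wiring, which this sketch does not reproduce.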
Thanks for the detailed instructions! I am already using the JMX connector and publishing to CloudWatch following the wiki you linked, and I am able to query the Glue metric data. However, I don't quite understand the output format: the result has 21 rows, and the first 20 are NaN or 0.
When you query tables via the JMX connector, each node produces one row (hence the "node" column that you'll find on every table). Since you're querying for stats about Glue … In this context, you can add a filter to the query based on the coordinator node id, like …
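As a rough illustration of why the node filter helps, the sketch below models a JMX connector result as a list of rows, each tagged with the node that produced it. It is a hypothetical Java stand-in for the query predicate, not Presto code: the class names, the node ids, and the assumption that only the coordinator's row carries Glue data (consistent with the 20 NaN or 0 rows out of 21 reported above) are all illustrative.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical shape of one row from a JMX connector table: each table carries a
// "node" column, so a cluster of N nodes returns N rows for the same MBean.
final class JmxRow {
    final String node;
    final double value;

    JmxRow(String node, double value) {
        this.node = node;
        this.value = value;
    }
}

final class CoordinatorFilter {
    // Mirrors filtering on the coordinator's node id: keep only that node's row
    // and drop the rows from every other node.
    static Optional<JmxRow> coordinatorRow(List<JmxRow> rows, String coordinatorNodeId) {
        return rows.stream()
                .filter(row -> row.node.equals(coordinatorNodeId))
                .findFirst();
    }
}
```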
That makes a lot of sense. Thank you for your great help!