[outputs/stackdriver] Allow to group metrics to bypass MetricDescriptor quota (500) #5567
It seems that 500 metric descriptors should be enough when metrics are using the measurement name and field names properly, such as in the
However, we have some plugins that do not lay out the metrics like this; here is an example of what the output from the prometheus input looks like:
This creates many unique measurement names, which requires many metric descriptors. However, I don't think the right fix is to move the measurement name into the field name; we need to address these issues at the metric creation points (such as #4415). Also, I don't want to have multiple ways for the output to lay out the data, as that is a maintenance and usability headache; we need to pick a style and use it across the board. @fean5959a What is the layout of the metrics that are causing you to go over the 500 metric descriptor limit?
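To make the descriptor count concrete, here is a minimal Go sketch. It assumes the output creates one MetricDescriptor per measurement and field pair; the `Metric` type, the `DescriptorNames` function, and any metric names passed to it are hypothetical and are not part of Telegraf.

```go
package main

import "sort"

// Metric is a hypothetical, simplified view of a Telegraf metric:
// a measurement name plus the names of its fields.
type Metric struct {
	Name   string
	Fields []string
}

// DescriptorNames returns the distinct "measurement/field" pairs,
// under the assumption that each pair needs its own MetricDescriptor.
func DescriptorNames(metrics []Metric) []string {
	seen := map[string]struct{}{}
	for _, m := range metrics {
		for _, f := range m.Fields {
			seen[m.Name+"/"+f] = struct{}{}
		}
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}
```

Under this assumption, a measurement with many fields costs one descriptor per field, while an input that emits one measurement per metric name costs one descriptor per metric, so the count grows with the number of distinct names.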
You're right about the approach, so let me explain my context. I have 2 Vault clusters and 4 Consul clusters (28 VM instances in GCP). I enabled HashiCorp telemetry and use the Stackdriver Agent provided by Google. Vault and Consul send metrics through a statsd-compliant agent. This is a collectd-like agent that organizes metrics into 3 MetricDescriptors (derive, gauge, latency). Here is why I reach the quota with the native Telegraf agent: take the metric "consul.session_ttl.active". In reality I have consul..session_ttl.active, so one per VM instance, and so on. Some others are like the "net" metric in your example, with many field values. For these reasons I can't use the Telegraf agent, which creates 1 measurement per metric and field value (unless I pay to extend my quota). Another example is CPU usage: I have 10 MetricDescriptors per VM instance with the native Telegraf agent. The Stackdriver Agent natively uses the method from my pull request. After many tests to understand how Stackdriver works, I found that a MetricDescriptor has only one value type (integer, double, etc.), so we can't create a metric with many fields of different value types. If I understood correctly, we can have per MetricDescriptor:
The goal of my pull request is just to work like the Stackdriver Agent. Perhaps my pull request should just be another output plugin? I agree with you and your comments; I just want a good solution for my case, and I think other people will hit this problem with the Stackdriver quota. Perhaps a good solution is to create a measurement per value type and then add tags per field?
I ran some more tests today and I confirm that I reach the Stackdriver quota with the latest release of the Telegraf agent. I also tested another version of my code that is more in line with your description: I create a MetricDescriptor per measurement and value type, and put the field in tags. Example:
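A minimal Go sketch of the layout described above, one descriptor per measurement and value type with the field name moved into a label. This is not the code from the pull request: the `Point` type, the `GroupByValueType` function, the `field` label key, and the `custom.googleapis.com/<measurement>/<type>` naming are all assumptions made for illustration.

```go
package main

import (
	"sort"
	"strings"
)

// Point is a hypothetical, simplified stand-in for one data point
// written to Stackdriver.
type Point struct {
	MetricType string            // descriptor name (assumed naming scheme)
	ValueType  string            // "INT64" or "DOUBLE"
	Labels     map[string]string // original tags plus the field name
	Value      interface{}
}

// valueType maps a field value to a descriptor value type.
// Unsupported types report false and are skipped by the caller.
func valueType(v interface{}) (string, bool) {
	switch v.(type) {
	case int, int64:
		return "INT64", true
	case float32, float64:
		return "DOUBLE", true
	default:
		return "", false
	}
}

// GroupByValueType emits one point per field. Fields that share a value
// type share a descriptor; the field name is carried in the "field" label.
func GroupByValueType(measurement string, tags map[string]string, fields map[string]interface{}) []Point {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order

	points := make([]Point, 0, len(keys))
	for _, k := range keys {
		vt, ok := valueType(fields[k])
		if !ok {
			continue
		}
		labels := make(map[string]string, len(tags)+1)
		for tk, tv := range tags {
			labels[tk] = tv
		}
		labels["field"] = k // assumed label key
		points = append(points, Point{
			MetricType: "custom.googleapis.com/" + measurement + "/" + strings.ToLower(vt),
			ValueType:  vt,
			Labels:     labels,
			Value:      fields[k],
		})
	}
	return points
}
```

With this sketch, a `cpu` measurement whose fields are all floats maps to a single descriptor regardless of how many fields it has; a measurement that mixes integer and float fields maps to two.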
Is it even possible to have a single MetricDescriptor that contains multiple time series with different labels? From looking at the documentation it seems to me that each descriptor can only have a single time series.
Well, from what I understood and tested, yes, I think each descriptor can only have a single time series, but a descriptor is characterized by:
So for the same timestamp and a single MetricDescriptor (name: custom.googleapis.com/cpu-gauge, value type: float, kind: gauge) we can retrieve all CPU metrics. I'm not sure about the vocabulary, whether this is considered a single TimeSeries or multiple TimeSeries, but it works, and it makes it possible to stay within the Stackdriver quota. I confirm that the native Stackdriver Agent with the statsd configuration works like this, because when I tested it I had only 3 MetricDescriptors and all my metric data was organized by labels (tags). At the moment I don't run the PR as-is because I changed the code a little, but it works very well. I noticed another problem too: when there are a lot of metrics, the API doesn't allow writing more than 1 point per MetricDescriptor (name + kind + value type + labels) per request. Following my example, it is not possible to write CPU idle for more than 1 timestamp value in a single request.
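One way to work within the one-point-per-request restriction mentioned above is to split buffered points into several requests. The following is a minimal Go sketch under that assumption; the `Sample` type and the `seriesKey` and `SplitRequests` functions are hypothetical names, and no Stackdriver client call is made.

```go
package main

import (
	"sort"
	"strings"
)

// Sample is a hypothetical buffered point waiting to be written.
type Sample struct {
	MetricType string
	Labels     map[string]string
	Timestamp  int64 // Unix seconds
	Value      float64
}

// seriesKey builds a stable identity from the metric type and sorted labels.
func seriesKey(s Sample) string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(s.MetricType)
	for _, k := range keys {
		b.WriteString("|" + k + "=" + s.Labels[k])
	}
	return b.String()
}

// SplitRequests partitions samples into batches so that each series
// appears at most once per batch. Older points of a series land in
// earlier batches, so sending the batches in order keeps time order.
func SplitRequests(samples []Sample) [][]Sample {
	sorted := make([]Sample, len(samples))
	copy(sorted, samples)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Timestamp < sorted[j].Timestamp
	})

	placed := map[string]int{} // series key -> batches already holding it
	var batches [][]Sample
	for _, s := range sorted {
		k := seriesKey(s)
		idx := placed[k]
		if idx == len(batches) {
			batches = append(batches, nil)
		}
		batches[idx] = append(batches[idx], s)
		placed[k] = idx + 1
	}
	return batches
}
```

Each returned batch would then be sent as its own write request, in order. The number of batches equals the largest number of buffered points for any single series.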
Thanks for the info. This is certainly a clever way to get Stackdriver to accept more data, but I'm very hesitant to add this layout since it doesn't feel like we would be structuring the data properly. Maybe it is best to contact Google support about an increased limit as suggested here?
@fean5959a I'm going to close this pull request; I don't think this layout is something we will want to change. If you do contact support about an increased limit, I'd be interested in knowing how it goes, though.
Required for all PRs:
Hello, I propose adding the ability to group metrics by tag, to make it possible to stay under the Stackdriver quota.
MetricDescriptors are limited to 500, so this PR allows switching to registering metrics the way the Stackdriver Agent does, organizing metrics by tags.
The initial logic is preserved.
I didn't write unit tests, but I tested on my GCP environment.