<Question> Getting "One or more TimeSeries could not be written" when attempting to emit metrics following the Java MetricsExporter example here #339
Comments
@psx95 can you take a look?
Hi, I'm also facing the same issue. Basically, I took the example here and modified it a bit so that it runs in a simple Spring application and increments a counter every time a POST request comes in.
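Roughly, the modification looks like this (a sketch with illustrative names, not the exact code; it assumes a `Meter` bean is wired up from an SDK meter provider configured with the Google Cloud metric exporter, as in the linked example):

```java
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CounterController {
  private final LongCounter requestCounter;

  // Assumes a Meter bean is provided elsewhere by the application context.
  public CounterController(Meter meter) {
    this.requestCounter =
        meter
            .counterBuilder("example.post.requests") // hypothetical metric name
            .setDescription("Number of POST requests received")
            .build();
  }

  @PostMapping("/work")
  public String handlePost() {
    requestCounter.add(1); // incremented on every POST request, as described
    return "ok";
  }
}
```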
When I deploy this app to Cloud Run, I always get a ton of the "One or more TimeSeries could not be written" errors from the issue title.
I tried to set the "maximum number of Cloud Run instances" to 1, but the error still occurs. Hope we have a solution soon. Thank you.
Hi @ZiggyZaggyy & @dtd2512,

Thank you for your detailed reports on this issue. I did some investigation on using the exporter in serverless environments (particularly Cloud Run), and based on your reports and my investigation, the root cause is probably that the timeseries emitted from multiple instances of serverless environments (like Cloud Run / Cloud Functions) are indistinguishable. This is due to the lack of an instance-identifying resource attribute. When timeseries from two different instances are indistinguishable, they are either seen as out of order or as being written too many times; these were the errors you were seeing.

@dtd2512, your workaround of setting the max instances of Cloud Run to 1 should technically have worked, as there is then only a single instance. However, Cloud Run does not make any hard guarantees, and particularly at the time of deploying/re-deploying a service there might be more than 1 instance active. If the metric export happens around this time, you could still experience this error. There is a metric that shows you the number of active instances in Cloud Run; did you confirm there was indeed only a single active instance?

This can be fixed with proper resource detection. I was able to replicate this issue on my side when running a sample Spring application with no resource detection. However, after adding proper resource detection, I was able to see an instance-identifying resource attribute on the exported timeseries. I have a draft PR with a sample that showcases this with a manually instrumented Spring application.

TL;DR - Proper resource detection is required to prevent such errors. The linked example shows resource detection in action.

@ZiggyZaggyy, I haven't tried this personally with Cloud Functions yet, but the root cause should be the same: did you set up proper resource detection? Looking at your report, differentiating between the different instances of the Cloud Function was the right intuition, but I'm not sure what value the instance attribute ended up with. I will also try running a sample in Cloud Functions once to make sure it's not something else.

Let me know if this helps.
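For illustration, here is a minimal sketch of one way to wire resource detection into the SDK. The artifact coordinates and exact wiring are assumptions on my part; the draft PR linked above is the authoritative example:

```java
import com.google.cloud.opentelemetry.metric.GoogleCloudMetricExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;

public class OtelSetup {
  // Assumes the autoconfigure extension plus a GCP resource detector artifact
  // (e.g. io.opentelemetry.contrib:opentelemetry-gcp-resources) are on the
  // classpath: the detector is then discovered via SPI and its attributes,
  // including a per-instance identifier on Cloud Run, are merged into the
  // SDK resource automatically.
  public static OpenTelemetrySdk create() {
    return AutoConfiguredOpenTelemetrySdk.builder()
        .addMeterProviderCustomizer(
            (meterProviderBuilder, config) -> {
              try {
                return meterProviderBuilder.registerMetricReader(
                    PeriodicMetricReader.create(
                        GoogleCloudMetricExporter.createWithDefaultConfiguration()));
              } catch (Exception e) {
                throw new IllegalStateException("Failed to create metric exporter", e);
              }
            })
        .build()
        .getOpenTelemetrySdk();
  }
}
```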
@ZiggyZaggyy, regarding Cloud Functions, the reliable way to get metrics out as per the current recommendations is using log-based metrics.

I did some experiments with Cloud Functions, and with proper resource detection you should be able to avoid the out-of-order writes error. However, you may still run into the maximum sampling period error. This is because, when using the exporter with Cloud Functions, you would need to force flush prior to exiting, since you may not have CPU allocated to perform the export once you exit the function. Currently, the minimum resolution supported by Cloud Monitoring is 10 seconds. Theoretically, this means that if you attempt to write metrics more than once in under 10 seconds, you would run into this issue. This is why the current recommendation is to use log-based metrics. Let me know if this makes sense.

@dtd2512, I have updated the metrics example you were referring to with manual resource detection in #370. You should refer to that as well as the sample added in #368.
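To make the force-flush point concrete, here is a minimal sketch of an HTTP-triggered Cloud Function that flushes before returning. The scope and metric names are hypothetical, and the 10-second join timeout is an arbitrary choice:

```java
import com.google.cloud.functions.HttpFunction;
import com.google.cloud.functions.HttpRequest;
import com.google.cloud.functions.HttpResponse;
import com.google.cloud.opentelemetry.metric.GoogleCloudMetricExporter;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
import java.util.concurrent.TimeUnit;

public class FlushingFunction implements HttpFunction {
  // One provider per function instance, reused across invocations.
  private static final SdkMeterProvider METER_PROVIDER;
  private static final LongCounter INVOCATIONS;

  static {
    try {
      METER_PROVIDER =
          SdkMeterProvider.builder()
              .registerMetricReader(
                  PeriodicMetricReader.create(
                      GoogleCloudMetricExporter.createWithDefaultConfiguration()))
              .build();
    } catch (Exception e) {
      throw new ExceptionInInitializerError(e);
    }
    INVOCATIONS =
        METER_PROVIDER
            .get("example-function") // hypothetical instrumentation scope name
            .counterBuilder("example.invocations") // hypothetical metric name
            .build();
  }

  @Override
  public void service(HttpRequest request, HttpResponse response) throws Exception {
    INVOCATIONS.add(1);
    // Flush before returning: once the function exits, the instance may have
    // no CPU allocated, so a pending periodic export might never complete.
    METER_PROVIDER.forceFlush().join(10, TimeUnit.SECONDS);
    response.getWriter().write("ok");
  }
}
```

Note that if invocations arrive more often than every 10 seconds, this per-invocation flush can itself trigger the maximum sampling period error, which is exactly why log-based metrics are the current recommendation.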
**Caveats when using SDK exporters in Cloud Run Service / Jobs**

It is important to note that while the SDK exporters do work in the Cloud Run environment if resource detection is set up properly, this is not the recommended approach. The current recommendation for Cloud Run is to use the OpenTelemetry sidecar approach.

**Why do the SDK exporters work?**

Using SDK exporters in Cloud Run works, but it relies on reporting the Cloud Run monitored resource as a generic task. This reporting is done automatically by the GCP resource detector, because the native Cloud Run monitored resource type does not accept custom metrics written through the exporter, so the detector maps to the generic task resource instead.
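For reference, with the sidecar approach the application sends OTLP to a collector container running alongside it, and the collector forwards the metrics to Cloud Monitoring. A minimal sketch of a collector config for this (the `otlp` receiver, `batch` processor, and `googlecloud` exporter are standard collector components; the endpoint value is illustrative):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  googlecloud:

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]
```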
Closing this issue, please feel free to reopen in case the comments above do not resolve the query or if there are any follow-up questions.
For posterity: here is an example of exporting metrics to Google Cloud from a Google Cloud Functions environment #377
Hi @psx95, after testing with 100 Cloud Run instances simultaneously, the metrics I set up were exported perfectly to GCP Monitoring without any errors 💯
Hi, I am using the exporter in a Cloud Function, and I'm seeing the following two errors showing up in the log, particularly after any new deployment update to the Cloud Function:

```
One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older start time than the most recent point
```

or

```
One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.: generic_node{namespace:,node_id:,location:global} timeSeries[0,1]
```
The rough idea of my metrics and exporter configuration is as follows:
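Roughly, with names simplified, it looks like this (a sketch, not the exact code):

```java
import com.google.cloud.opentelemetry.metric.GoogleCloudMetricExporter;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.MetricExporter;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
import java.time.Duration;

public class MetricsSetup {
  public static SdkMeterProvider createMeterProvider() throws Exception {
    MetricExporter exporter = GoogleCloudMetricExporter.createWithDefaultConfiguration();
    return SdkMeterProvider.builder()
        .registerMetricReader(
            // The reader wakes up on its own thread every 20 seconds,
            // independent of the function's main execution.
            PeriodicMetricReader.builder(exporter)
                .setInterval(Duration.ofSeconds(20))
                .build())
        .build();
  }

  public static LongCounter createCounter(SdkMeterProvider provider) {
    Meter meter = provider.get("my-function-metrics"); // hypothetical scope name
    return meter.counterBuilder("my.custom.count").build(); // hypothetical metric name
  }
}
```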
The Cloud Function is scheduled to run every minute and is guaranteed to finish within that one minute; in fact, in most cases it finishes within the first several seconds of starting. But the scheduled MetricReader apparently goes off on its own thread every 20s, even after the main program has finished its job.
The errors happen most frequently within the first 15 minutes of a new deployment. Typically about 3 or 4 such exceptions are thrown during an execution: 3 of them from the scheduled MetricReader firing every 20s within the 60s execution window before the next execution takes over, and 1 more from the `forceFlush()` method I call from `shutdown()` inside the main execution function body. Sometimes, after running for a while, the exception frequency drops to only once per execution, that once being from the `forceFlush()` before shutdown.

I'm not sure if it exactly helped anything, but putting in more manual resource attribute key configuration (as sketched below) didn't exactly drive the exceptions away, but it did consistently bring them down to at most 1 exception per execution, on `forceFlush()`, whereas previously the frequency sometimes got reduced after a while and sometimes didn't.

Also, the error type seems to alternate between `One or more points were written more frequently than the maximum sampling period` and `One or more of the points specified had an older start time than the most recent point` depending on new deployments (maybe that is related to some slight change in metric setup/dependency configuration, but probably not; I can try to isolate this more if it would be a useful troubleshooting route).

If I change the periodic metric reading interval from 20s to 120s, I no longer see the above exceptions. Because my Cloud Function is bound to finish within one minute, setting the interval to 120s means metrics only get emitted during `forceFlush()`. So it seems like the exceptions are due to conflicts between scheduled metric reporting runs and `forceFlush()`? I'd have thought the exporter would take care of multiple consecutive runs emitting metrics under the same name and instance id (but it looks like every call always invokes `createTimeSeries`, and according to Google's docs you can't do this multiple times with the same name on the same instance).
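The manual resource attribute configuration mentioned above was along these lines (a sketch; the service name and namespace are placeholders, and the per-instance UUID is just one way to make the instance id unique):

```java
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.resources.Resource;
import java.util.UUID;

public class ManualResource {
  // Give each function instance a unique identity so its timeseries are
  // distinguishable from those of other (or previous) instances.
  public static SdkMeterProvider createMeterProvider() {
    Resource resource =
        Resource.getDefault()
            .merge(
                Resource.builder()
                    .put("service.name", "my-cloud-function") // hypothetical
                    .put("service.namespace", "my-namespace") // hypothetical
                    .put("service.instance.id", UUID.randomUUID().toString())
                    .build());
    return SdkMeterProvider.builder()
        .setResource(resource)
        // ... register the PeriodicMetricReader from the earlier sketch ...
        .build();
  }
}
```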
As for the metric itself, it does look like it gets published just fine: comparing it side by side with my logging-based metrics, they largely align, even during executions with 3 or 4 exceptions. So I'm not sure there is any actual metric loss. Of course the two won't align exactly, due to accounting differences or some general background loss, so I can't tell whether those exceptions account for any very minor discrepancies, but suffice to say the difference is minimal if any.
Other than the example usage in the MetricExporter, I can't find any more examples to help me understand the root cause of such exceptions. Internet searches typically point to using the Cloud Monitoring API directly, or to confusion between multiple instances, but in our case it's just one Cloud Function and one instance, with no overlap, even with my attempt above to specify an execution-specific instance id on the resource. Hope you can help with some insights, thank you!
P.S. I found this issue attempting to add a unique identifier to the Java SDK, but it got abandoned because they thought setting `service.instance.id` on the SdkMeterProvider would be equivalent. As shown above, I tried that, but it didn't make the exception go away.