Metrics Start-time resource semantic convention #1273
Comments
Another reason for this semantic convention worth mentioning: there is an interest in translating Prometheus Remote Write streams into OTLP streams, where Cumulative data points SHOULD have a start time. Traditional Prometheus reporting does not include this information, so it uses a reset heuristic to detect when cumulative series are reset. When there is a …
@jmacd we discussed this briefly during today's SIG Spec meeting. Is my understanding correct that today you would go with the "more traditional" approach of a … Maybe we can add both …
Yes. I agree that both specifications are good to have.
It would be nice to establish a semantic connection between these; that is the suggestion made in this issue originally. If you are holding a Span object with a …
I've just noticed that the Elastic Common Schema defines this as `process.start` with a value of e.g. …
I like …
What are you trying to achieve?
There has been some discussion about an Uptime metric. For example, the OpenTelemetry-Go `runtime` instrumentation includes one: https://github.com/open-telemetry/opentelemetry-go-contrib/blob/d1534b84593e617bff9a848454a992a7af49385c/instrumentation/runtime/runtime.go#L122
There is a related request for an `up` metric, meaning something like "was able to produce metrics", in #1078. The uptime metric is different and can be used for monitoring process longevity, for example. There is a question of whether we should standardize a semantic-conventional metric name for uptime.

However, note that when we know the process start time, we are able to deduce the uptime, provided we know that the process was up. Logically, the `up` metric and a `process.start_time` resource combine so that we can synthesize a `process.uptime` metric.
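As a rough illustration of that derivation, here is a minimal consumer-side sketch in Go. It assumes the resource carries the proposed `process.start_time` attribute encoded as an RFC 3339 timestamp; the attribute encoding and the `deriveUptime` helper are assumptions made for this example, not an existing convention.

```go
package main

import (
	"fmt"
	"time"
)

// deriveUptime synthesizes an uptime value from an `up` observation and a
// `process.start_time` resource attribute (RFC 3339 string), without needing a
// dedicated process.uptime metric. It reports false if the process was not up
// or the start time cannot be parsed.
func deriveUptime(up int64, startTimeAttr string, observedAt time.Time) (time.Duration, bool) {
	if up == 0 {
		return 0, false // process was not up; uptime is undefined
	}
	start, err := time.Parse(time.RFC3339, startTimeAttr)
	if err != nil {
		return 0, false
	}
	return observedAt.Sub(start), true
}

func main() {
	// Hypothetical data point: up == 1, observed 15 minutes after the recorded start time.
	observed := time.Date(2020, 12, 1, 10, 15, 0, 0, time.UTC)
	if uptime, ok := deriveUptime(1, "2020-12-01T10:00:00Z", observed); ok {
		fmt.Println("synthesized process.uptime:", uptime) // prints 15m0s
	}
}
```

A backend could apply the same arithmetic at query time, which is what makes a separate uptime metric redundant once both pieces are present.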
I've encountered a reason to prefer the use of a `start_time` resource and an `up` metric as opposed to a `process.uptime` metric, stated as follows.

An `UpDownSumObserver` instrument writes an OTLP Non-Monotonic Cumulative Sum data point; there is a well-defined conversion to Gauge in systems such as Prometheus that do not recognize Non-Monotonic Cumulatives. An `UpDownCounter` instrument writes an OTLP Non-Monotonic Delta Sum data point for the Stateless export configuration, but it is converted to a Cumulative in the default configuration. As long as the state that we maintain in an SDK for Delta-to-Cumulative conversion is never reset, there is no difference to the consumer of an OTLP Non-Monotonic Cumulative Sum (OTLP-NMCS) data point whether it was originally an `UpDownSumObserver` or an `UpDownCounter`.

If we move the Delta-to-Cumulative conversion out of the process (e.g., into a sidecar), then there may be a difference between an OTLP-NMCS that was reset and one that was never reset. We could use the start-time resource to detect this difference. This feels significant because, ultimately, if the user is going to view a Cumulative Sum as its current, total value, then we should know whether it is the cumulative from the beginning of the process or the cumulative from an arbitrary reset point. In a user interface for an OTLP-NMCS timeseries, I would consider generating an error to say that for Non-Monotonic Sums that have been reset you should only use Rate views, not Total views, as sketched below.
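Here is a minimal sketch of that reset check, again in Go and again resting on assumptions: the `CumulativePoint` struct is a stripped-down stand-in for an OTLP Sum data point, the process start time is taken from the proposed `process.start_time` resource, and the one-second tolerance is arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// CumulativePoint is an illustrative stand-in for an OTLP Sum data point.
type CumulativePoint struct {
	StartTimeUnixNano uint64
	TimeUnixNano      uint64
	Value             float64
}

// totalViewAllowed reports whether a Total (as opposed to Rate) view is safe.
// A sidecar that reset the series would stamp a later start time, so a point
// start more than a small tolerance after the process start is treated as a reset.
func totalViewAllowed(p CumulativePoint, processStart time.Time) bool {
	pointStart := time.Unix(0, int64(p.StartTimeUnixNano))
	return pointStart.Sub(processStart) < time.Second
}

func main() {
	processStart := time.Date(2020, 12, 1, 10, 0, 0, 0, time.UTC)
	resetPoint := CumulativePoint{
		StartTimeUnixNano: uint64(processStart.Add(30 * time.Minute).UnixNano()), // reset 30m in
		TimeUnixNano:      uint64(processStart.Add(45 * time.Minute).UnixNano()),
		Value:             12,
	}
	fmt.Println("total view allowed:", totalViewAllowed(resetPoint, processStart)) // false
}
```

With a check like this in place, a UI could fall back to Rate views whenever the total cannot be anchored to the process start.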
Concretely speaking, the proposed semantic convention would be named `process.start_time` and would be documented here.