-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics semantic conventions for timed operations #657
Closed
justinfoote
wants to merge
16
commits into
open-telemetry:master
from
justinfoote:metrics_semantic_conventions
Closed
Changes from 5 commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
f63a729
Add general metrics semantic conventions
justinfoote a73803e
Change duration metric to a `ValueRecorder`
justinfoote f130a79
Specify duration metrics should be in milliseconds
justinfoote ffa2640
Replace the three timed operations metrics with a single ValueRecorder
justinfoote 6ed45ca
Clarify that the duration metric semantic conventions refer to the Va…
justinfoote a3ff441
Update metric-general.md
justinfoote bd1dda1
Update metric-general.md
justinfoote 5bff9b8
Update metric-general.md
justinfoote 753de65
Add clarification of span-based metric categories
justinfoote 9b5446e
Fix markdownlint errors in metric-general semantic conventions
justinfoote 4f71365
Merge remote-tracking branch 'upstream/master' into metrics_semantic_…
justinfoote 7417406
Fix links in metrics-general semantic conventions
justinfoote 3182288
Fix typo in specification/metrics/semantic_conventions/metric-general.md
justinfoote c2283b6
Reduce coupling between metric semantic conventions and sampling API
justinfoote 8ade2de
Update specification/metrics/semantic_conventions/metric-general.md
justinfoote c92d8f5
Update metric-general.md
justinfoote File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
33 changes: 33 additions & 0 deletions
33
specification/metrics/semantic_conventions/metric-general.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# General | ||
|
||
The conventions described in this section are generic; they are not specific to a | ||
particular operation. | ||
|
||
## Summarizing timed operations | ||
|
||
Timed operations that may be more fully described by Spans can be summarized by a single [`ValueRecorder`](../api.md#valuerecorder) recording duration. The instrument name should be prefixed with the Span category and kind. | ||
The resulting name should look like `{category}.{span.kind}.duration`. See [examples](#examples) below. | ||
|
||
For each operation being recorded, the duration of the operation **in milliseconds** should be recorded onto the `duration` instrument. | ||
|
||
#### errors | ||
jmacd marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
The `duration` metric should include an `error` label indicating whether the operation failed. The value of this label should be `true` if the operation ended in a failed state. | ||
|
||
### Examples | ||
|
||
A simple HTTP service would create the metric: | ||
|
||
* `http.server.duration` | ||
justinfoote marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
If this service also calls out to other HTTP services, it would create the additional metric: | ||
|
||
* `http.client.duration` | ||
|
||
If this service also queries one or more databases, it would create the additional metric: | ||
|
||
* `db.client.duration` | ||
|
||
All of these metrics would include labels | ||
* `error=true` | ||
* `error=false` |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that these metrics should be reported in addition to spans and thus duplicate data (since spans already report the operation's start and end timestamp and thus also wall clock duration) or instead of spans?
If in addition, is this meant to provide inexpensive 100% coverage for the duration only when spans would otherwise be sampled?
Either way: Please clarify the intent more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is actually duplicate data, since spans are, in general, sampled, as you point out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent point. As @jkwatson points out, this is not intended to be duplicate data because spans are sampled, and the metrics generated from this instrument would not be.
I had intended (but clearly not actually written) that every recorded span would contribute to the duration metric, regardless of whether it is sampled.
I had tried to avoid entangling this spec too much with Spans, but I'm thinking now that it would be clearer if I made this more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a configuration question. If the user wants metrics instead of spans, they may turn off span reporting and leave metrics reporting enabled. The spans will still be seen by the SDK for the purpose of generating these metrics, but spans wouldn't be exported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we are drifting into configuration a bit here.
I started writing this up this morning, but I'm struggling with the wording. Here's my thought process:
If the span.isRecorded is true and the user has configured that metrics should be generated for the span's category and kind, then the duration must be recorded onto a ValueRecorder with the name
{span.category}.{span.kind}.duration
(regardless of the value of span.sampled).Also, I'm feeling like we need a justification section here, to explain why we'd want metrics that in some cases carry redundant data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a related question, which maybe falls into the scope of this document. In #381 there is the idea that the Tracing SDK should have a configurable way to automatically synthesize metric events corresponding to the span's duration. In that case, I would suggest that the metric labels should be exactly the same as the span attributes, the resource value should be the same (i.e., the corresponding Trace SDK's resource). The instrument kind should be ValueRecorder.
Now, for the controversial part: I think the metric name should equal the span name, which is to say that I think the naming conventions described for metric instruments (open-telemetry/oteps#108) should be adopted for spans as well. This convention addresses the concern about high-cardinality span names that is often discussed in relation to tracing.
(There, I've said the controversial thing I've been meaning to say!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, questions about "Why generate duplicate events" are missing the point, I think. We can synthesize these metric events in the collector on seeing a span (because we have conventions), and by the time the data is exported into a traditional metric system it is no longer duplicate. We need these conventions because metrics data formats are different from span data formats, it shouldn't imply any form of duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinfoote I think we can close this after adding a sentence somewhere to state "These conventions are meant both to enable automatic metric events from spans as well as for application-level timing that may be considered too brief to report Spans."