-
Notifications
You must be signed in to change notification settings - Fork 165
Span Status Improvements (without errors) #133
Conversation
Feedback suggestions Co-authored-by: Armin Ruech <armin.ruech@dynatrace.com>
text/0123-improve-span-status-api.md
Outdated
"gRPC", "HRESULT", "POSIX". The list is end-user-extensible but common status | ||
type names should be standardized. | ||
(Perhaps there is already some standardization we could borrow?) | ||
2. Code - An integer status code. Can be combined with an status message. Either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer is insufficient. E.g. POSIX codes are symbolic names (like ENOENT) which do not have standardized integer values (e.g. ENOENT might have a different value on Linux than on BSD)
text/0123-improve-span-status-api.md
Outdated
2. Code - An integer status code. Can be combined with an status message. Either | ||
a code or message are required. | ||
3. Message - A string status message. Can be combined with a status code. Either | ||
a code or message are required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a code or message are required. | |
a code or message is required. |
text/0123-improve-span-status-api.md
Outdated
freeform message or a textual status code. Adding a 2nd string to the | ||
StatusData would allow both to be collected side by side. This is another example of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above (#133 (comment)).
text/0123-improve-span-status-api.md
Outdated
- [#432](https://github.com/open-telemetry/opentelemetry-specification/pull/432) | ||
- [#521](https://github.com/open-telemetry/opentelemetry-specification/pull/521) | ||
- [#599](https://github.com/open-telemetry/opentelemetry-specification/issues/599) | ||
- [open-telemetry/oteps#123](https://github.com/open-telemetry/oteps/pull/123) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [open-telemetry/oteps#123](https://github.com/open-telemetry/oteps/pull/123) | |
- [open-telemetry/oteps#123](https://github.com/open-telemetry/oteps/pull/123) on which this OTEP is based | |
- [open-telemetry/opentelemetry-specification#697](https://github.com/open-telemetry/opentelemetry-specification/pull/697) for reporting exceptions |
text/0123-improve-span-status-api.md
Outdated
Although UI creators are free experiment with how the data is presented I | ||
expect most presentations would either be the StatusData alone, or the | ||
StatusData qualified with the StatusType and some separator character. For | ||
example StatusData alone might create names like "FileNotFoundException", | ||
"503", "E_FAIL (0x80004005)", "SyntaxError on line 405: Did you forget a | ||
semicolon?", and "BadQuery". | ||
|
||
Exceptions could have progressive level of detail drilling into messages, | ||
stack traces, inner exceptions, links to source, etc if the exporter serialized | ||
sufficient data but how and whether that occurs is out-of-scope in this design. | ||
example StatusData alone might create names like "503", "E_FAIL (0x80004005)", | ||
"Status Code 12: Unimplemented". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated since StatusData
was renamed to Code
and StatusType
to Domain
above and Message
was introduced as a separate parameter.
What is the Error Working Group position on this? Do we remove Status? Do we keep it but improve? Do we have 2 opposing proposals because Error Working Group was not able to come to a concensus on this? I would like to have more clarity on this otherwise I am not sure which of the opposing proposals to actively pursue. The OTEP to remove Status is also submitted: #134 Just to be clear: I am happy if the goal is to discuss 2 proposals in parallel and chose a winning one, just want to understand if that is the intent here. |
We had much more agreement on removing current status code, than on its replacement. So we want to remove Google RPC code bases status before GA, that is what my otep is about. And separately we discuss what should we put instead of it. Either before GA or after. And for that there are several options. One is this PR. Another is open-telemetry/opentelemetry-specification#822 |
I think we need to require having a reasonable replacement for conveying erroneous state in the Span. I believe this is needed before GA. Ideally we will first know what is this replacement before removing the Status. I am also happy if the answer is that we don't need any replacement, but I would like that analysis to also be done before removal of the Status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This proposal definitely solves some of the problems of the current Span.Status. But it leaves one (probably the most) important question open: how to quickly determine if span represents potential failure?
a code or message are required. | ||
|
||
Although UI creators are free experiment with how the data is presented I | ||
expect most presentations would either be the StatusData alone, or the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StatusData
is not defined up to this point. What is it?
If all the data collected by the API is transmitted to the back-end this also | ||
increases the size of transmitted telemetry, but the exporter authors always | ||
retain the freedom to drop or reduce any information they don't believe is | ||
valuable with potentially a slight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cut sentence?
|
||
## Motivation | ||
|
||
Right now OpenTelemetry Status is defined as an enumeration of gRPC status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not gRPC :)) I think I keep saying this. They are Google's codes which are used in gRPC but they are used in http APIs as well that Google expose.
1. Domain - The name of the domain the status data applies to such as "HTTP", | ||
"gRPC", "HRESULT", "POSIX". The list is end-user-extensible but common status | ||
type names should be standardized. | ||
(Perhaps there is already some standardization we could borrow?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google did the same thing, and at the end they decided to remove this and support only what they called "canonical status" because custom translation were hard to keep track of.
One thing that was very important was the notion of "IsOk" on the Status itself, every domain has it's own codes, but in a lot of time was important to know if it is ok or not :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Google canonical codes are a good idea, and am generally favorable towards Span.status
as it currently exists. The main issue is that we don't have guidance on how to map from different domains onto the google canonical space. If we could provide specification that explains how to translate between domains, I think it'd be easier to make the argument that we should keep Span.status
as is.
@mwear please reopen if you change your mind about this |
We recently discussed OTEP #123 at the error working group and agreed it introduced some good ideas, but they were coupled with error reporting. Exception data was handled by open-telemetry/opentelemetry-specification#697 and the
successHint
inspired thiserror.hint
PR: open-telemetry/opentelemetry-specification#822.There are currently proposals to remove Span.status (see: open-telemetry/opentelemetry-specification#706). If we decide to keep it, this adaption of OTEP-123 could be an improvement worth considering.
See the original OTEP for commentary and rationale.