-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify logs vs events vocabulary and usage #2863
Clarify logs vs events vocabulary and usage #2863
Conversation
Within a particular `event.domain`, the `event.name` uniquely defines a particular class | ||
or type of OpenTelemetry Event. OpenTelemetry Events with the same `event.domain` / | ||
`event.name` follow the same schema which assists in analysis in observability platforms. | ||
See also OpenTelemetry Event [semantic conventions](./semantic_conventions/events.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will discuss this more in the SIG call tomorrow, but I feel we should just say Events are LogRecords with these two attributes. We happened to keep the event.domain
attribute in InstrumentationScope for the purpose of API ergonomics - to be able to create multiple Events with a given domain. I also assumed that the InstrumentationScope attributes automatically propagate to the LogRecords inside it, that's why it made sense to me to keep the attribute at the scope level. It turns out this assumption is wrong. However, the backends will move the event.domain
attribute into the LogRecord during storage and hence it makes sense to say Events are LogRecords with these two attributes.
Can you also take a look at the changes to ./semantic_conventions/events.md in my PR #2848?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the backends will move the event.domain attribute into the LogRecord during storage
I think you are making an assumption that is not necessarily true for all backends, maybe it is true for some.
|
||
Not necessarily. Just because the data in an external data source is called an "event" it | ||
does not mean it automatically qualifies as an OpenTelemetry Event and must be shaped like | ||
an OpenTelemetry Event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be very helpful to include guidance on when to produce Events vs Logs, and that I think is based on the expectations of the backend/receiver. For example, at AppDynamics, Events and Logs have some differences in their processing pipelines in the backend. Events are relatively limited in number and are produced by sources creating events of a certain importance, whereas Logs are far large in volume with good amount of noise in them. As an example, Kubernetes Events are (or, can be) Events and Kubernetes Pod logs, by default, are Logs.
What should Go Structured Logs map to in OpenTelemetry? IMO, they should remain as Logs as they are used produced by applications for the general purpose of troubleshooting and are likely to contain a good amount of noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Events are relatively limited in number and are produced by sources creating events of a certain importance, whereas Logs are far large in volume with good amount of noise in them.
I don't think that this distinction makes a difference from the data model or API perspective. The same could be said about different log severity levels. The important point, AFAICT, is that it needs to be possible for a processor of log records to identify records that satisfy some criteria to direct them for separate processing. That is separate and distinct from how we refer to the concepts of "opaque bucket of bits" log records and "structured and schema-validatable" log records.
to OpenTelemetry LogRecords (for example in OpenTelemetry Collector) it is reasonable | ||
to shape them like OpenTelemetry Events. In the given example it is reasonable to map | ||
the `name` field from the data source to `event.name` and the `category` field to | ||
`event.domain`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very unlikely that category
is same as domain
. The data source is a better fit for domain
, because this data source is likely emitting events with different values for category
and that's why they have category
as a field/attribute in their Events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the two things we brought up on the SIG making their way into this document.
-
In .NET we have the EventSource class.
EventSource
emits strongly-typed events. Each event emitted is uniquely identified by theEventSource.Name
and the individual event'sEvent.Id
/Event.Name
. Events also have a message template and may have attributes. All events emitted through an EventSource will the same exact message template + attribute structure.So given the definition/description above, I feel like these
EventSource
events would be appropriate to shim into OpenTelemetry as Events? WithEventSource.Name
=event.domain
&Event.Name
=event.name
. -
In .NET we have the ILogger interface.
ILogger
is a framework for emitting structured logs. AnILogger
is bound to acategoryName
. Typically some type name (ex:"MyCompanyName.MyLoggingClass"
).ILogger
has a whole bunch of different overloads which emit log messages.You can log without any type of strong-name:
myLogger.LogInformation("Something happend for {userId}", userId)
. That will emit a structured log through the defined category.You can also log with a strong-name:
myLogger.LogInformation(new EventId(id: 100, name: "AddUserFailure), "Something happend for {userId}", userId)
. That will emit a structured log through the defined category with the strongly-typed name information.What to do in this case is less clear. We could shim the second call with
EventId
supplied into OpenTelemtry Events withcategoryName
=event.domain
andEventId.Name
=event.name
. It is assumed but not enforced by the API that all logs sharing acategoryName
+EventId
will share structure. I think if some backend relied onevent.domain
+event.name
for structure detection, and I usedILogger
withEventId
to strongly type my logs, I would expect reasonably that the backend would be able to detect my structures?
/cc @alanwest
The spec is currently not very clear about what is a log and what is an event. Here is a for example a discussion that shows that there are different ways to interpret what is written in the spec today: open-telemetry/opentelemetry-collector-contrib#14474 This PR attempts to clarify the spec by defining the vocabulary more strictly and by adding a relevant FAQ.
@@ -208,15 +210,106 @@ Wikipedia’s [definition of log file](https://en.wikipedia.org/wiki/Log_file): | |||
>In computing, a log file is a file that records either events that occur in an | |||
>operating system or other software runs. | |||
|
|||
From OpenTelemetry's perspective LogRecords and Events are both represented | |||
using the same [data model](./data-model.md). | |||
From OpenTelemetry's perspective logs and events conceptually are not different. Both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR seems to be focusing on specification/logs
folder only. While we don't want to boil the ocean, I've noticed a continuous source of confusion regarding "span events" vs. "log events". I feel that if we don't clarify span events vs log events, we'll continue to run in circles. I want to ask some stepping back questions:
- Do we even want to use the term
event
in both places, or we want to eventually get rid of one? - Why do we want to have the term "event" here? Would something like "named log record" be good enough / better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@svrnm I wish to get your opinion here from the documentation perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho, s/span events/span logs/
(as it was in OpenTracing) would address some of the confusion. Framed as span logs, these entries can still be sub-categorized as logs vs. events just like regular log records. It does create a new confusion between span logs and regular logs, which is fundamentally unavoidable as we're forcing the user to make this distinction instead of handling it behind the scenes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method to add a span event/log is called AddEvents
(based on this across languages, so if we start calling it "Span Log" we run into an issue here:
- Removing addEvent is probably not possible, because it is a breaking change, so adding addLog as an alias would be an option, but that increases confusion
- Calling it "Span Log" and then having a method called "addEvent" is really confusing.
My opinion: From a documentation perspective the key point is being conscious about that potential confusion and give end-user guidance here by being transparent and sharing the thought process that went into things, even if it boils down to "today we would call it Span Log, but we had called it Span Event in the past and we don't want breaking changes, so we have to live with that"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we made a mistake when we renamed InstrumentationLibrary
to InstrumentationScope
in stable API surface and data models and I'd want to avoid making that same mistake again, so I would be strongly opposed to renaming span.AddEvent()
or the Span Event
concept and OTLP message.
I have concern that using the term Event
as a term of art for log records that have specific properties is going to lead to (even more) significant confusion in the long run. I'm not sure I have a good proposal for an alternative at this time, but something that denotes that there is additional information available to help interpret the attributes on the log record would be good. SchematicLogRecord
is a mouthful, but it should effectively convey that there is a schema available that describes how to validate and interpret the semantics of the record. Open to alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this vocabulary update should include something about span events and feel having Log Event, Log Record, Span Event will lead to confusion.
The best alternative I could come up with for Log Event is Report, but I also like SemanticLogRecord, it is clearer what it actually is with no definition required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
1c8fba9
to
1ccee58
Compare
|
||
**Who produces OpenTelemetry Events?** | ||
|
||
OpenTelemetry Events are produced using OpenTelemetry Events API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to limit the source of the events to OpenTelemetry Events API only? Can't we have OTel collector to fetch events from another event source?
@@ -124,10 +126,10 @@ languages have established standards for using particular logging libraries. For | |||
example in Java world there are several highly popular and widely used logging | |||
libraries, such as Log4j or Logback. | |||
|
|||
OpenTelemetry defines [events](#events-and-logs) as a type of LogRecord with | |||
OpenTelemetry defines [Events](#events-and-logs) as a type of LogRecord with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to elaborate a bit on my thoughts in the SIG meeting today...
What we framed things this way:
"OpenTelemetry provides a generic way to define additional signal types. These signals are modeled as a type of LogRecord with specific characteristics...."
Then instead of event.domain
and event.name
we use signal.type
and signal.name
.
Examples include
signal.type |
signal.name |
|
---|---|---|
browser | click | well-known type (i.e. the semantics are defined by the OTel spec) |
k8s | pod_start | well-known type (i.e. the semantics are defined by the OTel spec) |
my_custom_type | some_name | not well-known (i.e, the end user defines the semantics) |
metric | http.server.duration | this one is just to illustrate how in an alternate universe we might have modeled metrics using one data model to rule them all |
Additionally, thinking of things in this way alleviates any need to discuss span events in the scope of this conversation. To me span events are still just simply something that happened during the course of a span at a particular point in time. That is, a span event is not some notion of a nested signal type.
This is an alternate to open-telemetry#2863 ## Problem "Event" is a confusing term that is understood differently by different people in different contexts. We would like to avoid using it if possible. ## Proposal The OpenTelemetry Event is defined as a LogRecord that has specific attributes, namely event.name and event.domain. The event.domain here is one of the important elements. It places the event definitions into isolated buckets (domains). We could use the term "Domenized Logs", but it sounds a bit weird, so I want to suggest renaming the concept of "domain" to "category", without changing the semantics. The specially shaped Log Records can be called "Categorized Logs" and have attributes "log.category" (previously known as "event.domain") and "log.name" (previously known as "event.name"). We will refrain from using the term "event" as much as possible to avoid confusion. I am open to other name suggestions for "Categorized Logs". We just want to make sure to avoid the word "events" and avoid inventing completely new terms, so some sort of adjective + "logs" seems to be the best approach. ## What Changes? - "Event" is renamed to "Categorized LogRecord" - "event.name" is renamed to "log.name" - "event.domain" is renamed to "log.category" - "log.category" is an attribute of LogRecord (instead of previously "event.domain" being a Scope attribute) ## What Did We Lose? "event.domain" previously could be recorded as a Scope attribute and be used for efficient batch processing/routing of logrecords. This is no longer possible, but we can add another such attribute in the future that serves the same purpose (e.g. see the [proposal to add "signal.type"](https://github.com/open-telemetry/opentelemetry-specification/pull/2863/files#r993794504)).
This is an alternate to open-telemetry#2863 ## Problem "Event" is a confusing term that is understood differently by different people in different contexts. We would like to avoid using it if possible. ## Proposal The OpenTelemetry Event is defined as a LogRecord that has specific attributes, namely event.name and event.domain. The event.domain here is one of the important elements. It places the event definitions into isolated buckets (domains). We could use the term "Domenized Logs", but it sounds a bit weird, so I want to suggest renaming the concept of "domain" to "category", without changing the semantics. The specially shaped Log Records can be called "Categorized Logs" and have attributes "log.category" (previously known as "event.domain") and "log.name" (previously known as "event.name"). We will refrain from using the term "event" as much as possible to avoid confusion. I am open to other name suggestions for "Categorized Logs". We just want to make sure to avoid the word "events" and avoid inventing completely new terms, so some sort of adjective + "logs" seems to be the best approach. ## What Changes? - "Event" is renamed to "Categorized LogRecord" - "event.name" is renamed to "log.name" - "event.domain" is renamed to "log.category" - "log.category" is an attribute of LogRecord (instead of previously "event.domain" being a Scope attribute) ## What Did We Lose? "event.domain" previously could be recorded as a Scope attribute and be used for efficient batch processing/routing of logrecords. This is no longer possible, but we can add another such attribute in the future that serves the same purpose (e.g. see the [proposal to add "signal.type"](https://github.com/open-telemetry/opentelemetry-specification/pull/2863/files#r993794504)).
All, after the SIG discussion yesterday I came up with an alternate proposal, please see here #2876 I am going to convert this one to draft as well for now. |
All, I made an alternate proposal that attempts to address the problems we experience with events: #2897 |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
The spec is currently not very clear about what is a log and what is an event.
Here is a for example a discussion that shows that there are different ways
to interpret what is written in the spec today: open-telemetry/opentelemetry-collector-contrib#14474
This PR attempts to clarify the spec by defining the vocabulary more strictly
and by adding a relevant FAQ.