-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(otel): add opentelemety utility functions #272
base: main
Are you sure you want to change the base?
Conversation
This PR extracts opentelemetry utility functions from my private project and adds them to this project without calling them. It resolves rabbitmq#43 I'd like a broader discussion about whether these should be automatically called by the library where possible, or if they should simply be provided to clients to use if they so wish. I did my best to follow OpenTelemetry semantic conventions as described here https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/, but they are at times ambiguous for rabbitmq-- e.g. is the destination for a message the Queue or the Consumer Tag the message was delivered to. Given the channel based approaches of this library, it is impossible for the library to know the full execution of a consumer. Unless autoack=false, we cannot actually know when to end the span associated with a delivery, so at least in the consumer case, it's probably best to allow the client to manage spans for themselves. We *can* manage spans on the producer side, and at the very least extract span identifiers to include on published headers automatically, and provide utilities for pulling them back out again. My intention with putting this PR up is to move the conversation forward. Because the PR *only* provides private methods (if I left members public please call them out), it can be safely merged while these questions are worked out.
I converted this to a draft. it's getting dinged for unused methods, which was intentional, so maybe best to leave it in draft unmerged until we resolve some of those design questions |
Hey, thank you for taking the time to contribute to this library. I'll respond in-line to the topics in the OP.
Given that this project is a library, it's a great opportunity to provide automatic native instrumentation. That means we should automatically create spans, and inject/extract context where it makes sense.
I agree that some semantics are ambiguous for RabbitMQ. I would advocate to adhere to the specific conventions for RabbitMQ described in this link, and do our best with ambiguities not covered (and document them!).
Given the "subscription" workflow of RabbitMQ (polling/pulling is highly discouraged), I think we can record the attributes of the subscription (and add more if necessary), and inject those into the spans upon receiving messages, and just before forwarding them into the Go channel. This idea needs validation, but I would prefer this over leaving the consumption instrumentation to the users.
I agree 👍 This is a sensible idea. |
For what it's worth, I suggest looking into how OTel was added to the .NET client. There is a LOT of discussion here: |
I'll wire up what I can. I still think there's some open questions on the consumer side.
okay, sounds good.
The problem is that we need a way to transport the spans, and then an idiomatic way for clients to consume them downstream, so if you've called deliveries, err := c.channel.Consume(
queue.Name, // name
c.tag, // consumerTag,
*autoAck, // autoAck
false, // exclusive
false, // noLocal
false, // noWait
nil, // arguments
) we now need to embed a span or a context (go authors say not to embed contexts, so span?) in the delivery. Is that amenable? Each span is probably best treated as a new root span, and as implemented in my draft, gets a link to the publication that created it.
|
I suppose another way to go would be to provide a |
I admit I didnt go through this. I'm sure it's informative but it's not super accessbile to me (yet?). I haven't written any dotnet... |
@@ -1492,7 +1492,7 @@ func (ch *Channel) Publish(exchange, key string, mandatory, immediate bool, msg | |||
/* | |||
PublishWithContext sends a Publishing from the client to an exchange on the server. | |||
|
|||
NOTE: this function is equivalent to [Channel.Publish]. Context is not honoured. | |||
NOTE: Context termination is not honoured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're now using the context for span propagation in-process.
delivery.go
Outdated
// the appropraite headers set. See [context-propagation] for more details | ||
// | ||
// [context-propagation]: https://opentelemetry.io/docs/concepts/context-propagation/ | ||
func (d *Delivery) Span(ctx context.Context, options ...trace.SpanStartOption) (context.Context, trace.Span) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an okay route-- clients can use their own context + span to indicate boundaries of a batch, and then get child spans for each delivery, with each span linked to the publication.
I also provide access to the Link for a delivery in case they want to combine multiple links into one span for their batch (that's what I would prefer in my use case, but that's because I'm defining a naturally batching consumer).
The tradeoff is that without storing additional state, we're relying on the client to tell us the context when we go to ack nack, which could lead to errors.
If we instead store the span on the delivery, we can close it when we ack, after inserting a child settle
span for the ack itself. This has the implication that every consumer needs to settle
the delivery even if their in autoack mode in order for them to see spans in their telemetry info (spans are usually not sent until they are closed). In autoack
mode the settle
method would just close the span, with no further implications at the wire level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we instead store the span on the delivery, we can close it when we ack, after inserting a child settle span for the ack itself
I prefer this approach, TBH. Relying on the user would make this implementation brittle. I'm ok with having a limitation handling autoack
, because that's really not a recommended practice. autoack
is synonym of YOLO I don't care about my data, just GO!
delivery.go
Outdated
} | ||
} | ||
|
||
func (d *Delivery) Settle(ctx context.Context, response DeliveryResponse, multiple, requeue bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel deeply ambivalent about this approach, but the alternative would seem to be providing (Ack,Nack,Rject)Ctx
methods. TBH that's probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that (Ack,Nack,Reject)Ctx
methods are probably a better alternative, and settle the delivery automagically in those functions.
} | ||
|
||
// extractSpanFromReturn creates a span for a returned message | ||
func extractSpanFromReturn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh I haven't wired the return up yet. Probably gets a similar treatment to Delivery, if that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rabbitmq semconv specs does not mention how message returns should be instrumented, maybe we should open an issue in https://github.com/open-telemetry/semantic-conventions/issues asking for clarification.
I don't think we should add opentelemetry spport directly to the amqp091-go package. It's better to implement the instrumentation in a sub package or separate repo (e.g. goredis instrumentation redisotel and gorm instrumentation go-gorm/opentelemetry), so that users who want to use amqp091-go without otel will not be forced to indirectly depends on otel modules. However, I do think, to support instrumentation, some utilities should be added to the amqp091-go package. for example,
|
I think this is acceptable. We can inject Span, or the necessary attributes to build a span, in the message header or properties.
I like this idea 👍 It's probably easier to reason about using links between spans, than creating a sub-span from a "publish" span. Specially if we consider the use case where a consumer may reject a message and re-queue it. |
I'm ok with having the open telemetry bits in a different package. At the same time, the library should provide automatic instrumentation. I think it's ok to "force" consumers of the library to "depend" on OTEL modules is acceptable, because the API libraries are non-functional/no-op calls without the OTEL SDK. It will be the user's decision to import OTEL SDK to make the API calls functional.
Those suggestions are nice-to-have, but I'm not sure I understand why those utilities are necessary in order to support OTEL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments in the discussions, and above in the main thread.
I'm fine with either approach.
2 is the same thing as this pr's |
} | ||
|
||
func (d Delivery) Settle(ctx context.Context, response DeliveryResponse, multiple, requeue bool) error { | ||
defer settleDelivery(ctx, &d, response, multiple, requeue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding of messaging semconv, (“Settle” spans SHOULD be created for every manually or automatically triggered settlement operation. A single “Settle” span can account for a single message or for multiple messages (in case messages are passed for settling as batches). For each message it accounts for, the “Settle” span MAY link to the creation context of the message.
)
the settle
Span should start before calling Acknowledger.Ack()
etc., and end right after Acknowledger.Ack()
have returned.
if err != nil { | ||
errFn(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errFn
needs to be called regardless of there is an err or not, to properly ends the Span.
if err != nil { | |
errFn(err) | |
errFn(err) | |
if err != nil { |
Maybe also rename errFn
to endFn
to make the intention clearer.
exchange, routinKey string, | ||
immediate bool, | ||
) (context.Context, Publishing, func(err error)) { | ||
spanName := fmt.Sprintf("%s publish", routinKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specs recently changed to The span name SHOULD be {messaging.operation.name} {destination}
.
And, to keep consistency with the example in the specs and .net implementation, messaging.destination.name
attribute should be the exchange.
So, maybe:
destinationName := exchange
if len(destinationName) == 0 {
destinationName = "amq.default"
}
spanName := "publish " + destinationName
...
trace.WithAttributes(
semconv.MessagingDestinationName(destinationName),
semconv.MessagingMessageID(publishing.MessageId), | ||
semconv.MessagingMessageConversationID(publishing.CorrelationId), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think messaging.message.conversation_id
and messaging.message.id
attrs should only be set if non empty, as in .net implementation
semconv.MessagingMessageID(publishing.MessageId), | ||
semconv.MessagingMessageConversationID(publishing.CorrelationId), | ||
semconv.MessagingSystemRabbitmq, | ||
semconv.MessagingClientIDKey.String(publishing.AppId), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AppId
is not the rabbitmq client id, but a application specified per message header.
I think maybe config.Properties["connection_name"]
could be used if set, see https://www.rabbitmq.com/docs/connections#client-provided-names.
Hi folks, I haven't had a chance to circle back on this. Is the consensus at this point that OTEL is an acceptable default choice? @wzy9607 how do you feel about @Zerpet's pushback on moving to a middleware/hooks based approach. @lukebakken any other feedback at this point? |
This PR extracts opentelemetry utility functions from my private project
and adds them to this project without calling them. It partially resolves #43
I'd like a broader discussion about whether these should be
automatically called by the library where possible, or if they should
simply be provided to clients to use if they so wish.
I did my best to follow OpenTelemetry semantic conventions as described
here
https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/,
but they are at times ambiguous for rabbitmq-- e.g. is the destination
for a message the Queue or the Consumer Tag the message was delivered to.
Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.
We can manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.
My intention with putting this PR up is to move the conversation
forward. Because the PR only provides private methods (if I left
members public please call them out), it can be safely merged while
these questions are worked out.