-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for local spans in tracing #75
Conversation
38560e4
to
bb6ba1f
Compare
'test-span-id', | ||
None, | ||
0, | ||
'test') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing a newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
'test-component', | ||
) | ||
|
||
def test_init_local_component(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests have the same name, is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, copy/pasta error. thanks for catching!
General question: why treat local spans as a completely different thing altogether? the difference is in annotations only, right? should client spans made inside a local span be children of the local span? Should graphite metrics etc. apply to the spans too (via the regular span infrastructure)? 💅 for discussion |
Along with the lack of server/client-specific annotations (cs, cr, ss, sr), the local span binary annotation with component and operation names need to be added directly. There's also a slight metadata difference with the
Yes they should for purposes of modeling and visualization. I can't say for sure, but openzipkin/brave#114 is leading me to believe weird things will happen if we have client spans in local ones without explicitly setting the local span as the parent of the client span. I had only considered local spans as being used for non-client-call blocks of code, but this case should be supported. This means we'll need to do a bit more work to support this entirely (local spans without client requests inside them will be fine as things are right now) since server spans are used directly to make child spans in Baseplate contexts. Initial thoughts are that we'd need to allow all Span objects, not just ServerSpans, to make child spans, and then pass along the notion of being in a local span as opposed to just the server span to contexts without breaking the API too badly.
Not sure what you mean by this, could you provide an example? |
💇 |
🐟 lgtm |
Yeah, that makes sense here. It ends up making sense in the metrics case (below) too since we'd likely want those in a separate timer namespace (not
Yup. Agreed. Before having local spans, there was no reason to have multi-level span ancestry in the same service since all spans were considered remote. But if we have local spans, it seems like we should allow for nesting; if you don't want to do this right now we can punt it and make it happen during the near-future structural improvements push.
Right now, anything wrapped in a span gets a timer metric sent to graphite automatically via the MetricsObserver stuff. It'd be nice to have the metrics observer support this new span type so we get automatic timers for local spans as well. 💅 to continue discussion. feel free to tell me you want to move ahead like this since I think we'd only be making additive changes to the public API. |
Thanks for the feedback and explanations - I think it's straightforward to get everything in, and I'd rather the next Baseplate release have all of that functionality included. To be continued... |
7ec23cc
to
93ff06a
Compare
"I think it's straightforward to get everything in" one month later.... Major changes for supporting nested local spans:
This is a lot of surgery, more testing is needed. Also I might have missed a simpler way to handle all this, open to any and all suggestions! |
💇 opening this up for initial feedback as I continue testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good. I've added some comments in various places; it's a tough problem to build this right so I'm not convinced of any of my own arguments here and would love to debate the pros and cons.
The base Span class can now create local and child spans. I didn't explicitly subclass anything but might add some attributes to signal if a span is a local or child span so that it's explicitly not possible to create a local span from a child span. Open to thoughts here.
I'd definitely be in favor of making it impossible to accidentally build a bad tree of spans that makes no sense to zipkin.
Spans now require as an argument and track the context in which they were created. This is mainly to simplify the make_local_span call and not require the caller to pass in the context at creation.
Totally agreed on this decision. Those constructors should be internal to baseplate anyway.
Open q: Is it worth doing the request context wrapping in the framework integration code or just have it in the server span creation part and make it known that the context returned from a make_local_span request is a wrapped version of the "native" context object?
I think it'd be better to have server/local spans take a raw context object and wrap them; no need for the framework integrations to know about wrapping and one fewer thing to branch on in the constructors.
How about the following class hierarchy
- Span (doesn't store context, doesn't have make_* methods)
- LocalSpan (has make_child, make_local_child, also carries context)
- ServerSpan (same as LocalSpan, but subclassed to be more obvious for things like the sqlalchemy client and for future flexibility)
- LocalSpan (has make_child, make_local_child, also carries context)
baseplate/context/__init__.py
Outdated
@@ -53,3 +53,7 @@ def __init__(self, name, context_factory): | |||
def on_server_span_created(self, context, server_span): | |||
context_attr = self.context_factory.make_object_for_context(self.name, server_span) | |||
setattr(context, self.name, context_attr) | |||
|
|||
def on_local_span_created(self, context, span, component_name): | |||
context_attr = self.context_factory.make_object_for_context(self.name, span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this would probably confuse the SQLAlchemySessionContextFactory: https://github.com/reddit/baseplate/blob/master/baseplate/context/sqlalchemy.py#L102 as it'd close the session at the end of a local span before the server span had finished.
We probably need to look at the type of the span (ew?) and only do the session closing on server spans or something else smarter there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we'd be creating a new session when a local span is created ( https://github.com/reddit/baseplate/blob/master/baseplate/context/sqlalchemy.py#L101 ) and a new observer to go with it to close it, this shouldn't inadvertently close the server span's session I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with my changes all of the references to server_span
and ServerSpan
in the context class definitions should be Span
since objects for contexts won't just be created for server spans anymore.
class WrappedRequestContext(object): | ||
def __init__(self, context, trace=None): | ||
self._context = context | ||
self.__dict__.update(self._context.__dict__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why copy over the dict? Doesn't this make the __getattr__
do nothing essentially? I think doing just the getattr would be cleanest since it keeps the wrapper free of any stuff in upstream stuff (e.g. does pyramid's request have a clone
that would shadow WrappedRequestContext
's native clone?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was mainly to support passing in the wrapped context to existing framework integrations (which make some direct calls that were troublesome). I think I understand enough of how all of this is working now to no longer need this + move to only having wrapped contexts in baseplate components.
baseplate/core.py
Outdated
@@ -317,19 +333,44 @@ def __exit__(self, exc_type, value, traceback): | |||
else: | |||
self.finish() | |||
|
|||
def make_local(self, component_name, op_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i keep dithering on if this should be its own method or just a boolean flag on make_child
. that'd mean we'd need to figure out a new signature for make_child
which just returns the new context or something and you have to get at the span from context.trace
? whatcha think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are enough differences between the two types that its own method is warranted. Child spans and local spans are more sibling types than subtypes, and I think using the boolean flag implies local spans are a specific type of child span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed my mind. More explanation to follow in commit
baseplate/diagnostics/metrics.py
Outdated
@@ -48,6 +48,10 @@ def on_child_span_created(self, span): # pragma: nocover | |||
observer = MetricsSpanObserver(self.batch, "clients." + span.name) | |||
span.register(observer) | |||
|
|||
def on_local_span_created(self, context, span, component_name): # pragma: nocover | |||
observer = MetricsSpanObserver(self.batch, component_name + "." + span.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the issue with the sqlalchemy guy, maybe we can just rely on checking types of spans for generating the namespace? then we probably don't need a separate observer method for just local spans which frees up the class hierarchy a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, updating types now to see how it all feels.
return getattr(self._context, attr) | ||
|
||
def clone(self): | ||
new_context = copy.copy(self._context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we actually want to copy the underlying object ever? or just keep referencing it since we're just a neat wrapper.
i'm concerned that we'd do something like
with request.trace.make_local("...") as (local_request, local):
local_request.response_code = 302
and that would get lost because we were copying the underlying request object instead of just referencing.
(it'd probably never happen so dumbly in real code, but i could see injecting a local span around a block of code that passes a request object around various function calls getting messed up)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this was just me running circles around myself. I was worried about the flip side -> inadvertent changing of underlying context attributes on accident. I think it's better to keep the reference to existing context object and do some logging every time we access the underlying context instead of changing our wrapped attrs or adding new ones.
fd30034
to
b92742b
Compare
Updates for this revision pass:
Having
Context object factories still need some variable renaming; |
💇 tests/comments are needed, would appreciate feedback on changes as that's happening! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a big step in the right direction, just a few little things left but overall structurally looks good. I like the new class hierarchy.
Having ServerSpan subclass LocalSpan seemed a bit weird, but I didn't want to overcomplicate the hierarchy and logically it technically works -> the server span can be seen as a top level local span for the server. In the future this might need to be split out a bit more but I'm okay with this.
I totally agree with this and don't even think it's weird.
baseplate/context/sqlalchemy.py
Outdated
session = Session(bind=engine) | ||
server_span.register(SQLAlchemySessionServerSpanObserver(session)) | ||
span.register(SQLAlchemySessionSpanObserver(session)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think making a new session for local spans would be pretty confusing, particularly if they rollback or whatever at the end of the local span which might be pretty unexpected behaviour. It also makes local spans much more heavyweight of an operation if you have sqlalchemy set up. What do you think of this happening only for server spans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see your point. Definitely easy to shoot yourself in the foot if you don't know that local spans create another session. Better to stick to one request -> one session.
baseplate/core.py
Outdated
"""Return a child span representing an outbound service call.""" | ||
|
||
class LocalSpan(Span): | ||
# pylint: disable=invalid-name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, what's the invalid name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't think there's one. This was copypasta'd from the definition in Span
, but I just removed and ran linting and there wasn't complaining. Removing.
baseplate/core.py
Outdated
span = LocalSpan(self.trace_id, self.id, span_id, self.sampled, | ||
self.flags, name, context_copy) | ||
if component_name is None: | ||
raise SpanError("Cannot create local span without component name.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueError
should be fine here
Raised when a built-in operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.
what is component_name
anyway? (it'd be good to have param docs here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on ValueError
. Those docs got accidentally removed while shuffling things around, adding back in. To answer here, the component_name
is a secondary namespace of the local span. The span name itself should be the specific operation, but you might want to query similar operations under different namespaces ("components"), e.g. if you had "query" named spans for both "cassandra" and "mysql" components. http://javadox.com/io.zipkin.java/zipkin/1.1.4/zipkin/Constants.html#LOCAL_COMPONENT
|
||
def __setattr__(self, attr, value): | ||
if attr in self._context.__dict__: | ||
self.logger.debug("Assigning attr=%s of wrapped request context.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a bit spammy for running apps in debug mode. Do you think it's generally useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the case where we're logging for changing the wrapped request context will definitely be spammy (since we assign a ton of context objects when a new wrapped context gets created for local spans), removing that one. The reason I added the one for setting attributes of the underlying context is to provide explicit messaging in case developers are inadvertently messing with the context itself when they want to manipulate WrappedContext attrs and trying to debug whether or not that's the case.
baseplate/core.py
Outdated
if component_name is None: | ||
raise SpanError("Cannot create local span without component name.") | ||
span.component_name = component_name | ||
context_copy.trace = span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like context_copy
isn't actually going anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's getting passed into the span (https://github.com/reddit/baseplate/pull/75/files/b92742b01897b2921a5ae78f065b5ee0dab6a8b9#diff-e23a2f0193db987b0509d66964675524R362) which is how the copied context will be accessed within the span block.
baseplate/context/__init__.py
Outdated
@@ -53,3 +56,8 @@ def __init__(self, name, context_factory): | |||
def on_server_span_created(self, context, server_span): | |||
context_attr = self.context_factory.make_object_for_context(self.name, server_span) | |||
setattr(context, self.name, context_attr) | |||
|
|||
def on_child_span_created(self, span): | |||
if isinstance(span, LocalSpan): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is span
here the child span? if so can it be called child_span
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -223,6 +228,43 @@ def record(self): | |||
self.recorder.send(self) | |||
|
|||
|
|||
class TraceLocalSpanObserver(TraceSpanObserver): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If in the future we want to create a new type of observer how do we decide whether to make a completely new class like this vs. just updating on_child_span_created
like in MetricsServerSpanObserver
? Is this separate class just needed here because traces/spans are so coupled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This separate class is needed because the serialization for tracing these local spans is different from the base spans (which I think you're alluding to with your second question). In general you'll want to follow the on_child_span_created
path. These observers are strictly for new types of spans, and only if there's some weird serialization that's needed for tracing.
💇 getting there! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fantastic! I think this covers all my concerns. Thanks for all your work on this rather complex rework. We should be pretty cagey when testing it out; activity service volunteers to be tribute if you need one.
Two non-blocking nits.
baseplate/context/__init__.py
Outdated
from ..core import BaseplateObserver | ||
from ..core import ( | ||
LocalSpan, | ||
BaseplateObserver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: trailing comma please (or just put 'em on one line which is fine if it's short enough)
baseplate/core.py
Outdated
which is then split up into operations which could each be wrapped | ||
in local spans. | ||
|
||
:param str name: Name to identify the operation this span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i don't think these are supposed to be extra-indented (per pep-257)
c269848
to
b373fd1
Compare
In order to support local span creation and context management in local spans, some major restructuring of the baseplate span system was done: * A codified heirarchy of Span -> LocalSpan -> ServerSpan to allow arbitrary levels of local spans while protecting against nonsensical remote span creation. * Context factory updates so context integrations create new local-span-aware clients. * Introduction of WrappedRequestContext to be able to swap local-span-aware context integrations into the request context.
This adds baseplate support for tracing local in-request spans. It adheres to the "spec" outlined in the following:
openzipkin/zipkin#808
openzipkin/zipkin#821
This interface is intended to be able to generate local spans in various contexts from the main server span. The differences in serialization, use case, and explicit annotations (or lack thereof) were enough for me to lean towards making a new observer and API to specifically handle local spans. This allows us the freedom to customize Baseplate local spans as we see fit moving forward, e.g. force all local spans to carry certain tags, etc.