Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for local spans in tracing #75

Merged
merged 1 commit into from
Mar 27, 2017

Conversation

ckwang8128
Copy link
Contributor

This adds baseplate support for tracing local in-request spans. It adheres to the "spec" outlined in the following:

openzipkin/zipkin#808
openzipkin/zipkin#821

This interface is intended to be able to generate local spans in various contexts from the main server span. The differences in serialization, use case, and explicit annotations (or lack thereof) were enough for me to lean towards making a new observer and API to specifically handle local spans. This allows us the freedom to customize Baseplate local spans as we see fit moving forward, e.g. force all local spans to carry certain tags, etc.

@ckwang8128
Copy link
Contributor Author

👓 @spladug @bsimpson63

'test-span-id',
None,
0,
'test')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing a newline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

'test-component',
)

def test_init_local_component(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests have the same name, is that ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, copy/pasta error. thanks for catching!

@spladug
Copy link
Contributor

spladug commented Jan 30, 2017

General question: why treat local spans as a completely different thing altogether? the difference is in annotations only, right? should client spans made inside a local span be children of the local span? Should graphite metrics etc. apply to the spans too (via the regular span infrastructure)?

💅 for discussion

@ckwang8128
Copy link
Contributor Author

ckwang8128 commented Jan 30, 2017

why treat local spans as a completely different thing altogether? the difference is in annotations only, right?

Along with the lack of server/client-specific annotations (cs, cr, ss, sr), the local span binary annotation with component and operation names need to be added directly. There's also a slight metadata difference with the duration being calculated and passed (Zipkin would use the timestamps of the CS/CR/SS/SR annotations to do that server-side for the other span types). There were enough serialization and use-case differences that I felt making a separate observer would be more simple and extensible than pushing that differentiation into the Span object itself. Having all of the local span logic isolated in the custom observer means that if there's future differentiation and customization of local spans vs. other span types it'll be more straightforward to add in.

should client spans made inside a local span be children of the local span?

Yes they should for purposes of modeling and visualization. I can't say for sure, but openzipkin/brave#114 is leading me to believe weird things will happen if we have client spans in local ones without explicitly setting the local span as the parent of the client span. I had only considered local spans as being used for non-client-call blocks of code, but this case should be supported. This means we'll need to do a bit more work to support this entirely (local spans without client requests inside them will be fine as things are right now) since server spans are used directly to make child spans in Baseplate contexts. Initial thoughts are that we'd need to allow all Span objects, not just ServerSpans, to make child spans, and then pass along the notion of being in a local span as opposed to just the server span to contexts without breaking the API too badly.

Should graphite metrics etc. apply to the spans too (via the regular span infrastructure)?

Not sure what you mean by this, could you provide an example?

@ckwang8128
Copy link
Contributor Author

💇

@bsimpson63
Copy link
Contributor

🐟 lgtm

@spladug
Copy link
Contributor

spladug commented Jan 31, 2017

Having all of the local span logic isolated in the custom observer means that if there's future differentiation and customization of local spans vs. other span types it'll be more straightforward to add in.

Yeah, that makes sense here. It ends up making sense in the metrics case (below) too since we'd likely want those in a separate timer namespace (not clients). 👍

Initial thoughts are that we'd need to allow all Span objects, not just ServerSpans, to make child spans, and then pass along the notion of being in a local span as opposed to just the server span to contexts without breaking the API too badly.

Yup. Agreed. Before having local spans, there was no reason to have multi-level span ancestry in the same service since all spans were considered remote. But if we have local spans, it seems like we should allow for nesting; if you don't want to do this right now we can punt it and make it happen during the near-future structural improvements push.

Should graphite metrics etc. apply to the spans too (via the regular span infrastructure)?

Not sure what you mean by this, could you provide an example?

Right now, anything wrapped in a span gets a timer metric sent to graphite automatically via the MetricsObserver stuff. It'd be nice to have the metrics observer support this new span type so we get automatic timers for local spans as well.

💅 to continue discussion. feel free to tell me you want to move ahead like this since I think we'd only be making additive changes to the public API.

@ckwang8128
Copy link
Contributor Author

Thanks for the feedback and explanations - I think it's straightforward to get everything in, and I'd rather the next Baseplate release have all of that functionality included. To be continued...

@ckwang8128 ckwang8128 force-pushed the feature/local-spans branch 2 times, most recently from 7ec23cc to 93ff06a Compare March 2, 2017 23:57
@ckwang8128
Copy link
Contributor Author

ckwang8128 commented Mar 3, 2017

"I think it's straightforward to get everything in" one month later....

Major changes for supporting nested local spans:

  • The base Span class can now create local and child spans. I didn't explicitly subclass anything but might add some attributes to signal if a span is a local or child span so that it's explicitly not possible to create a local span from a child span. Open to thoughts here.

  • Spans now require as an argument and track the context in which they were created. This is mainly to simplify the make_local_span call and not require the caller to pass in the context at creation.

  • Request contexts are wrapped in the Pyramid + Thrift integration level OR at the server span creation level into a WrappedRequestContext object (in cases where full integrations aren't used, like r2) which proxies to the underlying wrapped request context and has the ability to copy itself. The copying is necessary for when we want to attach new local-span-aware clients to a context for use within the span, but want to preserve the outside-of-the-span context elsewhere. Open q: Is it worth doing the request context wrapping in the framework integration code or just have it in the server span creation part and make it known that the context returned from a make_local_span request is a wrapped version of the "native" context object?

  • The ContextFactory class handles local span creation events by attaching a new span-aware client to the provided context, which should be a copy of the outer WrappedRequestContext object.

  • Local-span-aware context objects are returned as part of the make_local_span call; code within the local span will need to enforce using that context.

This is a lot of surgery, more testing is needed. Also I might have missed a simpler way to handle all this, open to any and all suggestions!

@ckwang8128
Copy link
Contributor Author

💇 opening this up for initial feedback as I continue testing.

Copy link
Contributor

@spladug spladug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good. I've added some comments in various places; it's a tough problem to build this right so I'm not convinced of any of my own arguments here and would love to debate the pros and cons.

The base Span class can now create local and child spans. I didn't explicitly subclass anything but might add some attributes to signal if a span is a local or child span so that it's explicitly not possible to create a local span from a child span. Open to thoughts here.

I'd definitely be in favor of making it impossible to accidentally build a bad tree of spans that makes no sense to zipkin.

Spans now require as an argument and track the context in which they were created. This is mainly to simplify the make_local_span call and not require the caller to pass in the context at creation.

Totally agreed on this decision. Those constructors should be internal to baseplate anyway.

Open q: Is it worth doing the request context wrapping in the framework integration code or just have it in the server span creation part and make it known that the context returned from a make_local_span request is a wrapped version of the "native" context object?

I think it'd be better to have server/local spans take a raw context object and wrap them; no need for the framework integrations to know about wrapping and one fewer thing to branch on in the constructors.

How about the following class hierarchy

  • Span (doesn't store context, doesn't have make_* methods)
    • LocalSpan (has make_child, make_local_child, also carries context)
      • ServerSpan (same as LocalSpan, but subclassed to be more obvious for things like the sqlalchemy client and for future flexibility)

@@ -53,3 +53,7 @@ def __init__(self, name, context_factory):
def on_server_span_created(self, context, server_span):
context_attr = self.context_factory.make_object_for_context(self.name, server_span)
setattr(context, self.name, context_attr)

def on_local_span_created(self, context, span, component_name):
context_attr = self.context_factory.make_object_for_context(self.name, span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this would probably confuse the SQLAlchemySessionContextFactory: https://github.com/reddit/baseplate/blob/master/baseplate/context/sqlalchemy.py#L102 as it'd close the session at the end of a local span before the server span had finished.

We probably need to look at the type of the span (ew?) and only do the session closing on server spans or something else smarter there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we'd be creating a new session when a local span is created ( https://github.com/reddit/baseplate/blob/master/baseplate/context/sqlalchemy.py#L101 ) and a new observer to go with it to close it, this shouldn't inadvertently close the server span's session I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with my changes all of the references to server_span and ServerSpan in the context class definitions should be Span since objects for contexts won't just be created for server spans anymore.

class WrappedRequestContext(object):
def __init__(self, context, trace=None):
self._context = context
self.__dict__.update(self._context.__dict__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why copy over the dict? Doesn't this make the __getattr__ do nothing essentially? I think doing just the getattr would be cleanest since it keeps the wrapper free of any stuff in upstream stuff (e.g. does pyramid's request have a clone that would shadow WrappedRequestContext's native clone?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was mainly to support passing in the wrapped context to existing framework integrations (which make some direct calls that were troublesome). I think I understand enough of how all of this is working now to no longer need this + move to only having wrapped contexts in baseplate components.

@@ -317,19 +333,44 @@ def __exit__(self, exc_type, value, traceback):
else:
self.finish()

def make_local(self, component_name, op_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i keep dithering on if this should be its own method or just a boolean flag on make_child. that'd mean we'd need to figure out a new signature for make_child which just returns the new context or something and you have to get at the span from context.trace? whatcha think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are enough differences between the two types that its own method is warranted. Child spans and local spans are more sibling types than subtypes, and I think using the boolean flag implies local spans are a specific type of child span.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed my mind. More explanation to follow in commit

@@ -48,6 +48,10 @@ def on_child_span_created(self, span): # pragma: nocover
observer = MetricsSpanObserver(self.batch, "clients." + span.name)
span.register(observer)

def on_local_span_created(self, context, span, component_name): # pragma: nocover
observer = MetricsSpanObserver(self.batch, component_name + "." + span.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the issue with the sqlalchemy guy, maybe we can just rely on checking types of spans for generating the namespace? then we probably don't need a separate observer method for just local spans which frees up the class hierarchy a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, updating types now to see how it all feels.

return getattr(self._context, attr)

def clone(self):
new_context = copy.copy(self._context)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we actually want to copy the underlying object ever? or just keep referencing it since we're just a neat wrapper.

i'm concerned that we'd do something like

with request.trace.make_local("...") as (local_request, local):
    local_request.response_code = 302

and that would get lost because we were copying the underlying request object instead of just referencing.

(it'd probably never happen so dumbly in real code, but i could see injecting a local span around a block of code that passes a request object around various function calls getting messed up)

Copy link
Contributor Author

@ckwang8128 ckwang8128 Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this was just me running circles around myself. I was worried about the flip side -> inadvertent changing of underlying context attributes on accident. I think it's better to keep the reference to existing context object and do some logging every time we access the underlying context instead of changing our wrapped attrs or adding new ones.

@ckwang8128 ckwang8128 force-pushed the feature/local-spans branch from fd30034 to b92742b Compare March 15, 2017 18:23
@ckwang8128
Copy link
Contributor Author

Updates for this revision pass:

  • Spans are more hierarchically organized, Base Span -> Local Span -> Server Span:
    -> Base Spans represent outgoing request wrapping; the Span class cannot create either child base spans or local spans.
    -> LocalSpans are for local sets of operations within the server request handling pipeline and can create both child base spans (for outgoing requests) or local spans (for sub-division of operations).
    -> ServerSpans wrap an entire request handled by the server and can create both child base spans and local spans

Having ServerSpan subclass LocalSpan seemed a bit weird, but I didn't want to overcomplicate the hierarchy and logically it technically works -> the server span can be seen as a top level local span for the server. In the future this might need to be split out a bit more but I'm okay with this.

  • make_local has been consolidated with make_child with the addition of some flags to specify whether or not a local span is needed. I think this is a better abstraction moving forward as it communicates that the span being created is always a child of the span make_child is invoked on.

  • Context wrapping has been cleaned up -> contexts are wrapped initially in the server span and passed on from there to children. Attribute setting of the core context object is also propagated.

Context object factories still need some variable renaming; server_span doesn't apply to most of the make_object_for_context calls since local spans might also be passed in.

@ckwang8128
Copy link
Contributor Author

💇

tests/comments are needed, would appreciate feedback on changes as that's happening!

Copy link
Contributor

@spladug spladug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a big step in the right direction, just a few little things left but overall structurally looks good. I like the new class hierarchy.

Having ServerSpan subclass LocalSpan seemed a bit weird, but I didn't want to overcomplicate the hierarchy and logically it technically works -> the server span can be seen as a top level local span for the server. In the future this might need to be split out a bit more but I'm okay with this.

I totally agree with this and don't even think it's weird.

session = Session(bind=engine)
server_span.register(SQLAlchemySessionServerSpanObserver(session))
span.register(SQLAlchemySessionSpanObserver(session))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making a new session for local spans would be pretty confusing, particularly if they rollback or whatever at the end of the local span which might be pretty unexpected behaviour. It also makes local spans much more heavyweight of an operation if you have sqlalchemy set up. What do you think of this happening only for server spans?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point. Definitely easy to shoot yourself in the foot if you don't know that local spans create another session. Better to stick to one request -> one session.

"""Return a child span representing an outbound service call."""

class LocalSpan(Span):
# pylint: disable=invalid-name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, what's the invalid name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think there's one. This was copypasta'd from the definition in Span, but I just removed and ran linting and there wasn't complaining. Removing.

span = LocalSpan(self.trace_id, self.id, span_id, self.sampled,
self.flags, name, context_copy)
if component_name is None:
raise SpanError("Cannot create local span without component name.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueError should be fine here

Raised when a built-in operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.

what is component_name anyway? (it'd be good to have param docs here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on ValueError. Those docs got accidentally removed while shuffling things around, adding back in. To answer here, the component_name is a secondary namespace of the local span. The span name itself should be the specific operation, but you might want to query similar operations under different namespaces ("components"), e.g. if you had "query" named spans for both "cassandra" and "mysql" components. http://javadox.com/io.zipkin.java/zipkin/1.1.4/zipkin/Constants.html#LOCAL_COMPONENT


def __setattr__(self, attr, value):
if attr in self._context.__dict__:
self.logger.debug("Assigning attr=%s of wrapped request context.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit spammy for running apps in debug mode. Do you think it's generally useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case where we're logging for changing the wrapped request context will definitely be spammy (since we assign a ton of context objects when a new wrapped context gets created for local spans), removing that one. The reason I added the one for setting attributes of the underlying context is to provide explicit messaging in case developers are inadvertently messing with the context itself when they want to manipulate WrappedContext attrs and trying to debug whether or not that's the case.

if component_name is None:
raise SpanError("Cannot create local span without component name.")
span.component_name = component_name
context_copy.trace = span
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like context_copy isn't actually going anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting passed into the span (https://github.com/reddit/baseplate/pull/75/files/b92742b01897b2921a5ae78f065b5ee0dab6a8b9#diff-e23a2f0193db987b0509d66964675524R362) which is how the copied context will be accessed within the span block.

@@ -53,3 +56,8 @@ def __init__(self, name, context_factory):
def on_server_span_created(self, context, server_span):
context_attr = self.context_factory.make_object_for_context(self.name, server_span)
setattr(context, self.name, context_attr)

def on_child_span_created(self, span):
if isinstance(span, LocalSpan):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is span here the child span? if so can it be called child_span?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -223,6 +228,43 @@ def record(self):
self.recorder.send(self)


class TraceLocalSpanObserver(TraceSpanObserver):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If in the future we want to create a new type of observer how do we decide whether to make a completely new class like this vs. just updating on_child_span_created like in MetricsServerSpanObserver? Is this separate class just needed here because traces/spans are so coupled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This separate class is needed because the serialization for tracing these local spans is different from the base spans (which I think you're alluding to with your second question). In general you'll want to follow the on_child_span_created path. These observers are strictly for new types of spans, and only if there's some weird serialization that's needed for tracing.

@ckwang8128
Copy link
Contributor Author

💇 getting there!

Copy link
Contributor

@spladug spladug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic! I think this covers all my concerns. Thanks for all your work on this rather complex rework. We should be pretty cagey when testing it out; activity service volunteers to be tribute if you need one.

Two non-blocking nits.

from ..core import BaseplateObserver
from ..core import (
LocalSpan,
BaseplateObserver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: trailing comma please (or just put 'em on one line which is fine if it's short enough)

which is then split up into operations which could each be wrapped
in local spans.

:param str name: Name to identify the operation this span
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i don't think these are supposed to be extra-indented (per pep-257)

@ckwang8128 ckwang8128 force-pushed the feature/local-spans branch 4 times, most recently from c269848 to b373fd1 Compare March 27, 2017 18:32
In order to support local span creation and context management
in local spans, some major restructuring of the baseplate span
system was done:

* A codified heirarchy of Span -> LocalSpan -> ServerSpan to allow
  arbitrary levels of local spans while protecting against
  nonsensical remote span creation.
* Context factory updates so context integrations create new
  local-span-aware clients.
* Introduction of WrappedRequestContext to be able to swap
  local-span-aware context integrations into the request context.
@ckwang8128 ckwang8128 merged commit b373fd1 into reddit:master Mar 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants