Make log an extension or rename it? #26

Closed
codefromthecrypt opened this issue Jan 16, 2016 · 14 comments
Comments

@codefromthecrypt
Contributor

Each feature we add to OpenTracing comes at a cost. I feel the logging aspect is the heaviest feature, and also the most curious.

At least in Java, almost all tracing projects I have seen integrate with logging through context properties (e.g. MDC). This keeps the systems decoupled while still allowing a query to join them together (e.g. by span id). It means Zipkin doesn't also have to take on log-system duties, and vice versa.

Logging APIs are hard to get right, and while we have a lot of tracing folks in the room, we are notably missing the designers of logging APIs like SLF4J etc.

If we had a better-named version of 'annotation' (like event) which had no aspiration to be a logging API, I'd feel more comfortable. I really don't want to accidentally inherit logging responsibilities, and I think having logging APIs is a slippery slope to that.

In other words, Zipkin is already difficult to sustain, and ancillary features such as logging and attachments worry me. Tell me why I shouldn't be!

@bhs
Contributor

bhs commented Jan 16, 2016

@adriancole I would argue strenuously for the inclusion of a simple, straightforward logging API that – at a minimum – allows for a per-log-record error bit and optional structured payload. Per some of the things we discussed at the distributed tracing workshop in November, tracing is really just specialized logging. With Dapper we saw many cases where per-span "micrologs" were essential in explaining why a particular span took too long or led to an error (etc).

Integration with existing application logging frameworks is all well and good – and I certainly don't object to it – but in a single process handling high concurrency, process-level logging frameworks rarely have the right (i.e., "fine enough") level of detail to explain what's happening with individual spans. Distributed tracing is special/noteworthy because – with both context propagation and sampling – very fine-grained logging does not lead to signal:noise issues or prohibitive cost.

As for the API itself, I would love to keep it simple... both because that's always a good rule of thumb, and also because more complex logging APIs often suffer from a "combinatorial explosion" of methods... e.g., http://www.slf4j.org/apidocs/org/slf4j/Logger.html .

Do all Tracer implementations need to honor everything in every log record? Absolutely not... but some systems make great use of message and payload logging, and I know from speaking with trace consumers at various credible companies that microlog data is compelling. (How much – or whether – Zipkin supports it is very much Zipkin's business, needless to say... the spec does not say where Log data goes, and /dev/null is a viable option for any impl)
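For concreteness, the per-span "microlog" surface described above might look like the following sketch: timestamped records carrying a per-record error bit, a message, and an optional structured payload. All names here (`InMemorySpanLog`, `Record`) are illustrative assumptions, not part of any OpenTracing draft.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory span log: each record carries a timestamp, an
// error bit, a message, and an optional structured payload. A real tracer
// might ship these records somewhere, or send them to /dev/null.
final class InMemorySpanLog {
    static final class Record {
        final long timestampMicros;
        final boolean error;     // per-log-record error bit
        final String message;    // human-readable message
        final Object payload;    // optional structured payload; may be null

        Record(long timestampMicros, boolean error, String message, Object payload) {
            this.timestampMicros = timestampMicros;
            this.error = error;
            this.message = message;
            this.payload = payload;
        }
    }

    private final List<Record> records = new ArrayList<>();

    void info(String message, Object payload)  { add(false, message, payload); }
    void error(String message, Object payload) { add(true, message, payload); }

    private void add(boolean error, String message, Object payload) {
        records.add(new Record(System.currentTimeMillis() * 1000, error, message, payload));
    }

    List<Record> records() { return records; }
}
```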

@yurishkuro
Member

I also prefer to keep the logs. In addition to the logging level issue Ben mentioned (and also log sampling),

  • making information presented by the tracing system dependent on centralized logging deployment is going to create a barrier to entry for people to deploy distributed tracing
  • multi-platform integration is a nightmare, there are 100s of logging frameworks in different languages, MDC support is far from universal

I am happy with the very simple (message, ..payload) API we have now. The one thing that's missing is clarity on the formatting language in the message. Do we reuse the syntax common for a given platform, if there even is such a thing? Or do we come up with our own syntax that all tracing systems will be required (encouraged) to support? I liked Ben's original suggestion of path-based expressions like ${payload.something.something}, but it does not sound very straightforward to support across languages.
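As a sketch of what the path-based expression idea might look like, assuming a nested-map payload (the class name and the regex are mine, not a proposal from the thread):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for ${payload.a.b}-style expressions against a
// nested-map payload. Unresolvable paths render as "null".
final class PathFormat {
    private static final Pattern EXPR = Pattern.compile("\\$\\{payload\\.([\\w.]+)}");

    static String render(String message, Map<String, Object> payload) {
        Matcher m = EXPR.matcher(message);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object cur = payload;
            for (String key : m.group(1).split("\\.")) {
                cur = (cur instanceof Map) ? ((Map<?, ?>) cur).get(key) : null;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(cur)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Even this toy version hints at the cross-language problem: the expression grammar, the traversal rules, and the null handling would all have to be specified identically on every platform.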

@bhs
Contributor

bhs commented Jan 17, 2016

Sorry if this is off-topic for the issue (well, let's be honest: s/if/that/), but the path-based syntax I initially suggested comes with two big problems in my mind:

  1. it goes against the grain with the OTAPIs in general insofar as it doesn't do "when in Rome" appropriation of platform-idiomatic patterns... what I suggested is kinda/sorta reasonable in python but doesn't feel familiar in Go.
  2. for what it's worth, the fact that it's ...payload (i.e., that payload is a vararg) is problematic, as the first deref in the path-based expression really needs to be an array index. Which is awkward. (e.g., "${[0].something.something}") We could require that there be zero or one payload args, but that also seems onerous.

@yurishkuro
Member

let's fork the message format question to #30

@codefromthecrypt
Contributor Author

I'm unconvinced that logging with formats and attachments is the simplest API for the job. I'm also unconvinced that this will reduce the burden folks already have with logging APIs.

I could be convinced, though. I'd like to hear from some non-authors directly (i.e., speaking for themselves): e.g., those who aren't already authoring OpenTracing and would switch to it because of this feature.

Would google, netflix, twitter, lyft, naver, soundcloud, appneta, or others from the tracing workshop use this feature? Ack that the three primary authors will.

Would those who are doing integration (as opposed to most libraries built-in house) indeed have less work if we add a couple log methods with formatting and attachments here?

I suspect most decisions will be made before the next tracing workshop, otherwise, I'd offer it up as a discussion topic there.

Regardless, convincing me isn't a gate as this is a historical decision anyway.

@codefromthecrypt
Contributor Author

To be concise, I'd love an API to log an event, e.g. "cs", in a timeline, without an API contract that implies the message needs to be parsed for tokens, evaluated against a log level, or resolved against attachments.

The closest I can find, in Java anyway, is to overload the info method, but without varargs for attachments. This saves an array allocation, but more importantly implies the string should be left alone, as there are no attachments to resolve against. We can, in some implementers' guide or FAQ, suggest this is how you simply add an event to a timeline.

I'm still quite unhappy about having to drag in log semantics, and with it problems like #30, but those who wish to use OpenTracing can likely succeed with the above.
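A sketch of the overload described above, with hypothetical names: Java overload resolution prefers the exact `info(String)` match over the varargs form, so a bare event like "cs" allocates no array and the string is recorded verbatim, with no implied token parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical span with two overloads. span.info("cs") binds to the
// non-varargs method: no array allocation, and the message is recorded
// as-is, never resolved against attachments.
final class TimelineSpan {
    final List<String> annotations = new ArrayList<>();

    void info(String event) {
        annotations.add(event);  // recorded verbatim
    }

    void info(String message, Object... payload) {
        annotations.add(message + " payload=" + Arrays.toString(payload));
    }
}
```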

@bhs
Contributor

bhs commented Jan 18, 2016

@bogdandrutu are you able to do some very basic/quick codesearch stuff within google to get a sense of the number of groups that use TRACEPRINTF and/or its Java (or Python/Go) equivalents? Need not be precise numbers, but just a sense. This stuff has faded from memory a bit, but I think (?) that all StatsRequest logs also ended up in Dapper? I know that those StatsRequest log messages (in dapper) were essential when debugging certain bigtable performance issues, for instance.

(@adriancole: TRACEPRINTF is Dapper's version of the logging API being discussed here)

I also want to re-point to what I said higher up in the thread, as it's important and more of a first-principles argument than a matter of (my) personal opinion... the bit beginning with "in a single process handling high concurrency..."

Finally, @adriancole again: are you advocating for something like span.event("cr") in addition to or instead of a message-oriented API like span.info()? (I do get the sense that you'd like to record cr on the Span timeline somewhere... i.e., something other than the setTag API which is timestamp-free) If so, you mentioned "extension" in the issue title: did you have something in mind for that? (I.e., how would such an extension work from a software-eng perspective?)

@dkuebric
Contributor

Because of the potential diversity of tracers, this is an interesting thread. @bensigelman makes a compelling argument for a trace as a structured collection of "micro-logs". On the other hand, it seems most existing tracing implementations are mostly concerned with timing and structure. I think both are valid, but potentially different.

I'm more concerned about how instrumentation authors will use this, and how that maps to tracers, than about the exact interface for logging. To make sure the API is used in a way that can be effective with all types of tracers, we'll have to be as specific as possible about the intention of the data provided. A "logging" API comes with the expectation that tracers will treat indexing and access patterns like logging models today: throw in whatever strings you want, perhaps human-readable, not necessarily structured for parsing.

On the other hand, a more "X-Trace-style" event has a particular structure that a subset of tracers would then rely on to perform more advanced analysis. For instance, expect that a username or other ID is provided in a particular way so it is easy for the tracer to provide special treatment if it wants to.

The "error bit" suggested above is a bit of a nod to the importance of the latter: it's structured to the point of signaling one type of attribute, but the text itself is unstructured.

I like the idea of adding logging, but I fear it will reduce compliance with a more structured semantic API. It places more burden on post-processing intelligence, for instance potentially needing to handle the formats used by different instrumenters to report the same data across multiple similar frameworks. I admit this could be a problem even with a more structured API system as well, but the ease of logging lends itself directly to it. If we do add logging, it should be with expectations about what data gets logged vs. what gets promoted to semantic key/value status a la HTTP spans. (Call me a semantic structuralist?) Thoughts?

@codefromthecrypt
Contributor Author

I just had a chat with @bogdandrutu. He can probably clear this up better, but the summary seems to be that the lazy formatting API at Google is scoped to in-process. For example, format string + args is materialized to a string if ever sent on the wire; in other words, attachments are never exported.

The OTAPI is a mix: some aspects are lowest-common-denominator features, and others, like formatted logging, are new (to most implementations outside Google). Implementing this logging feature now, in 1.0, is a choice to make, but it is a thick one. E.g., is this just a way to more efficiently address lazily formatted timeline annotations? Or is this an intention to send attachments on the wire?

You can build a lazy formatter on top of an API that doesn't include it. E.g., I can write a weakly-referenced formatter in any existing tracer. By making this API part of 1.0, we are front-loading a very big feature that could be deferred. We know we need to record timestamped strings regardless of how, when, or if tracers add lazy formatters.
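The layering described above can be sketched as follows (names are hypothetical): the core API only records strings, and a wrapper defers formatting until it knows the record will actually be kept.

```java
import java.util.function.Supplier;

// Hypothetical lazy-formatting wrapper over a strings-only core API.
// The Supplier is invoked only when the span is sampled, so unsampled
// spans pay no formatting cost.
final class LazyLog {
    interface EventSink { void info(String message); }

    static void infoLazy(EventSink sink, boolean sampled, Supplier<String> message) {
        if (sampled) {
            sink.info(message.get());  // format string + args materialized here
        }
    }
}
```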

@bhs
Contributor

bhs commented Jan 19, 2016

I'll argue against the inclusion of lazy-formatting for v1.0 (or v1.1 for that matter). Not because it's not important (it is!), but because:

  1. it's a subtle API to get right, and
  2. it's a performance optimization

I don't mean to suggest that performance isn't imperative for what's ultimately a monitoring API, just that I would rather figure it out once OpenTracing is deployed in multiple high-throughput / low-latency production environments. Then it will be feasible to take credible measurements and compare different approaches.

@bcronin

bcronin commented Jan 19, 2016

I believe I agree with Adrian (esp. "You can build a lazy formatter on top of an api that doesn't include it"): if the core API supports a basic info(message, payload), it's easy to write convenience libraries of the user's choice based on chaining, printf, vararg-style payload capture, or (with a bit more work) lazy-logging / lazy-formatting.

Bottom line for me: that allows a lot of the more subtle logging-centric decisions to be pushed outside of nailing down 1.0, which I believe would be a good thing.
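For example, a printf-style convenience helper can live entirely outside the core API, needing nothing from it beyond info(message, payload). All names here are hypothetical:

```java
// Hypothetical printf-style wrapper layered on a core info(message, payload)
// method, in the spirit of the convenience libraries described above.
final class Fmt {
    interface CoreLog { void info(String message, Object payload); }

    static void infof(CoreLog log, String fmt, Object... args) {
        log.info(String.format(fmt, args), null);  // formatting done in the wrapper
    }
}
```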

@bhs
Contributor

bhs commented Jan 26, 2016

@adriancole I think we have a consensus on how we're proceeding with this; namely, to get the LogEvent/LogEventWithPayload (in Go parlance) in, then try to figure out if there's a path forward for formatted message strings. Agreed? If so, I would like to close this.

@bhs
Contributor

bhs commented Feb 10, 2016

(it's been 15 days with no concerns raised -- closing, though if anyone is uneasy about the current direction, please be in touch / reopen / etc)

@bhs bhs closed this as completed Feb 10, 2016
@codefromthecrypt
Contributor Author

codefromthecrypt commented Feb 13, 2016 via email
