Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

Introduce carrier format identifiers for parity with other platforms #31

Closed
yurishkuro opened this issue Jul 12, 2016 · 20 comments
Closed

Comments

@yurishkuro
Copy link
Member

yurishkuro commented Jul 12, 2016

This is driven by several concerns:

  1. The implementation to locate the appropriate injectors/extractors based on runtime information of the passed-in carrier type is ugly and inefficient. Having a unique identifier for a format, like opentracing.TextMap in Go, makes the implementation as simple as a map lookup, and remove ambiguity of dealing with potentially multiple superclasses and interfaces implemented by a carrier
  2. It is currently impossible to reuse the same carrier for different formats. For example, the TextMapReader/Writer interfaces in Go can be used for both plain TextMap format as well as for upcoming HTTPHeader format. It is much easier to define another constant for the format than to duplicate the reader/writer interfaces.
  3. One earlier concern was that having both a format and the carrier creates a possibility of type mismatches. The proposal below addresses that at compile time.
interface TextMapReader {}
interface TextMapWriter {}

interface Format<R, W> {}

public class Formats<R, W> implements Format<R, W> {
    public final static Format<TextMapReader, TextMapWriter> TEXT_MAP = new Formats<>();
    public final static Format<TextMapReader, TextMapWriter> HTTP_HEADER = new Formats<>();
    public final static Format<ByteBuffer, ByteBuffer> BINARY= new Formats<>();
}

interface Span {}
interface Tracer {
    <W> void inject(Span span, Format<?, W> format, W carrier);
    <R> Span extract(Format<R, ?> format, R carrier);
}

public void test(Tracer tracer) {
    Span span = tracer.extract(Formats.TEXT_MAP, new TextMapReader(){});
    tracer.inject(span, Formats.TEXT_MAP, new TextMapWriter(){});
}
@bhs
Copy link
Contributor

bhs commented Jul 12, 2016

(Note that this will be s/Span/SpanContext/g soon)

@bhs
Copy link
Contributor

bhs commented Jul 12, 2016

FWIW, I really like this direction. I'm planning to do an RFC for a SpanContext change in Java today and will try to incorporate this direction in the PR.

@bhs
Copy link
Contributor

bhs commented Jul 13, 2016

Alright, I sent out #32 ... comments welcome.

@michaelsembwever
Copy link
Contributor

michaelsembwever commented Jul 13, 2016

I can see the discussion and project has evolved to SpanContext, and i'm a lot more comfortable with it now.

I'm not sold on the ideas of Formats though (not yet at least).

I disagree with (1) being either ugly and inefficient.
Do we have implementing code that demonstrates this?
Can this be just inexperience with coding such things in java?

And (2) just doesn't make sense.

It is currently impossible to reuse the same carrier for different formats.

This is a nonsensical statement. Sorry for being blunt. But how does one re-use carriers when the notion of different formats doesn't explicitly exist? And what is preventing you from re-using carriers for different formats undefined but implicitly existing? Can you explain this a bit better for me please.

@yurishkuro
Copy link
Member Author

I disagree with (1) being either ugly and inefficient. Do we have implementing code that demonstrates this? Can this be just inexperience with coding such things in java?

https://github.com/opentracing/opentracing-java/blob/master/opentracing-impl-java8/src/main/java/io/opentracing/propagation/AbstractTracer.java#L67

        public <T> Injector<T> getInjector(Class<T> carrierType) {
            Class<?> c = carrierType;
            // match first on concrete classes
            do {
                if (injectors.containsKey(c)) {
                    return injectors.get(c);
                }
                c = c.getSuperclass();
            } while (c != null);
            // match second on interfaces
            for (Class<?> iface : carrierType.getInterfaces()) {
                if (injectors.containsKey(iface)) {
                    return injectors.get(iface);
                }
            }
            throw new AssertionError("no registered injector for " + carrierType.getName());
        }

vs.

        public <W> Injector<W> getInjector(Format<?, W> format) {
            return injectors.get(format);
        }

I stand by my qualification that the former is "ugly and inefficient", and not due to inexperience of the author, but due to restrictions imposed by the API.


It is currently impossible to reuse the same carrier for different formats.

This is a nonsensical statement. Sorry for being blunt. But how does one re-use carriers when the notion of different formats doesn't explicitly exist? And what is preventing you from re-using carriers for different formats undefined but implicitly existing? Can you explain this a bit better for me please.

What I meant was "It is currently impossible to reuse the same carrier type for different formats."

Given the above example, if I defined an interface for a carrier, e.g.

public interface TextMapWriter {
    void put(String key, String value)
}

then class-based injector lookup will always return the same injector when passed an instance of TextMapWriter. Both HTTP headers and xRPC headers (for some new RPC framework) can be represented as carriers by the exact same TextMapWriter interface. Yet the behavior of the tracer must be different: HTTP headers impose restrictions on the keys and values, xRPC framework can take arbitrary utf8 strings. So the tracer needs to return two different injectors for the same carrier type, which is impossible in the current API.

Basically, the carrier type defines an API for how data is injected into it. The Format defines what data can be injected.

@michaelsembwever
Copy link
Contributor

(1) This is not code we expose to the user or client implementors. I'm in favour of making the users life simpler for the sake of 9 more lines of code internally. Simplicity here boils down to a trade-off between adding the Format contruct and the caveat that concrete classes will be matched before interfaces.

(2)That's for the clarification. And that argument is valid.
So my question from here is do we actually give users simplicity and less lines of code by introducing Formats as opposed to them just extending/wrapping carriers in such situations?

@yurishkuro
Copy link
Member Author

yurishkuro commented Jul 13, 2016

Agreed on (1).

On (2), what we changing for the users is asking them to pass one extra parameter. Not really "lines of code", but extra typing for sure. What they are getting in return is:

  1. Clarity as to what exactly will happen, given by the Format spec.
  2. Self-documentation of OpenTracing API - the Format constants are typed to r/w carrier formats, and API provides exactly 3 built-in formats, making it pretty obvious to the user what types of carriers they can pass in. Right now it's open-ended <T> void inject(Span span, T carrier), the user needs to look at some documentation to know what objects can be passed in as carriers (and that documentation does not exist because we did not have the notion of standard carriers).
  3. Decoupling between user code and tracer implementation for extended carriers. For example, I was just implementing OpenTracing support in TChannel, and besides the standard 2 formats I needed one more special format to allow user code to retrieve Zipkin-style trace IDs. I used a string constant as the Format, and a plain Map as the carrier. That required no direct dependency between user code and tracer implementation, user code only depends on OT API. If I had to do it via some bespoke carrier interface, then I would've needed to define that interface in the tracer implementation and make user code depend on it, creating tight coupling.
  4. Minor point, but it also brings the API in line with implementations in other platforms.

@michaelsembwever
Copy link
Contributor

michaelsembwever commented Jul 13, 2016

(3) Sounds like enough good reasoning has been provided.
My vote then goes from "-1" to "+0".

If the numbers are behind going with this then do it.

(@bensigelman i'm still in favour of two separate PRs, since this and the SpanContext deserve to be separate atomic commits.)

@bhs
Copy link
Contributor

bhs commented Jul 13, 2016

@michaelsembwever noted, I'm working on it :)

@bhs
Copy link
Contributor

bhs commented Jul 14, 2016

Ok, #32 has the SpanContext stuff without the Format changes. I will try (?) to get another PR built off of that base that reintroduces the Format stuff later this evening; hopefully I'll have time.

@adriancole curious to hear what you think about this. I'm mainly interested in Yuri's point (2) above... namely that inject/extract can be subtle, and self-documenting code is preferable to "maximally concise" code.

The concerns of the Tracer implementor are perhaps a tie-breaking vote if we need one, but as long as it's not going to show up in a profiler I don't really care about the verbosity/awkwardness there... I'm much more interested in how it looks to the caller.

I'm a +0.5 vote in favor of @yurishkuro's proposal (after normalizing for my lack of recent experience in Java).

@codefromthecrypt
Copy link
Contributor

@bensigelman as made in the other PR, Format<?, W> format is a fake. It is like putting a racing stripe on an old civic.. looks cool but doesn't really make it faster.

If a user "chooses" to pass the same format object to both read and write calls, then they have accomplished their goal. There's nothing about Format<?, W> format that adds more type safety than a single-parameter type key. In either case, the user has to choose to do the right thing.

Since this is up to the user anyway, if we must expose an implementation detail like this generically, I'd suggest we focus on the type that's in use by the method, in this case W.

here's one example, but these things abound as many libraries have a need to qualify a type and/or capture a generic type.

https://google.github.io/guice/api-docs/latest/javadoc/index.html?com/google/inject/Key.html

@codefromthecrypt
Copy link
Contributor

More concretely to this use-case, there's no generic type resolution problem, as the standard types have no generic type parameters. To solve the problem of having two different formats (or versions) of the same type, we minimally need a parameter for the qualifier.

Ex.

// I don't care.. let the system decide
<T> Injector<T> getInjector(Class<T> carrierType);

// I expect to use generic parameters, but don't want to define how this is implemented
<T> Injector<T> getInjector(Type carrierType, Annotation/Object/String/etc format);

// I expect to use generic parameters, and I want to define how this is implemented
<T> Injector<T> getInjector(Format<W> format)

We lost the battle on defining Injector, so talking about that api is distracting. Here's the api being discussed.

was
<W> void inject(Span span, W carrier);

now
<W> void inject(Span span, Format<?, W> format, W carrier);

suggestion to drop the racing stripes and keep the old api since it is less complex
<W> void inject(Span span, W carrier, Format<W> format);

@codefromthecrypt
Copy link
Contributor

ps Format<W> only includes a type parameter to capture the generic type (as opposed to a subclass or a raw generic type). If we don't care about this, it can just be a marker (annotation string, whatever)

@yurishkuro
Copy link
Member Author

@bensigelman

I'm mainly interested in Yuri's point (2) above... namely that inject/extract can be subtle, and self-documenting code is preferable to "maximally concise" code.

For me it's the (3) that is a show stopped, it creates a dependency that is impossible to break without either introducing a 3rd library purely for a unique interface class, or making the user code dependent on specific tracing implementation classes (even thought the the same behavior can be supported by multiple tracers).

@adriancole

ps Format only includes a type parameter to capture the generic type (as opposed to a subclass or a raw generic type). If we don't care about this, it can just be a marker (annotation string, whatever)

The reason the format is generically typed is to prevent wrong combinations like

tracer.inject(span, Format<ByteArray>, new HashMap<String, String>())

But generics are just a candy, for type safety. The main points are (2) and (3) in my earlier comment today, which are not addressed by a carrier-only syntax of inject(SpanContext, Object) and extract(Object).

@bhs bhs changed the title Introduce carrier format idnetifiers for parity with other platforms Introduce carrier format identifiers for parity with other platforms Jul 15, 2016
@bhs
Copy link
Contributor

bhs commented Jul 22, 2016

So there are basically four options on the table IIUC (other ideas welcome):

  • The status quo (no explicit format param, impls must use instanceof to infer one, etc
  • Adding a String format parameter... partly to switch off of in impls, but also to make calling code more readable and self-documenting
  • Adding a Format<T>, where T would be a reader-writer carrier like HTTPHeaderCarrier or TextMapCarrier or whichever... this would afford some of the benefits of @yurishkuro's proposal (below) with incrementally less Java generics boilerplate, and
  • Format<R, W> which is the most flexible and type-aware of the explicit-format-param options; but it is also perhaps daunting for the common-case user

Am I missing anything? Anybody want to propose a strong endorsement and/or a veto (and ideally "why" if you didn't already make it clear in the discussion above).

Thanks.

@yurishkuro
Copy link
Member Author

It's worth mentioning that options 2-4 are only different in the method signature. From the end-user point of view, the invocations of the API look exactly the same.

My preference is 4, 3, 2, and a strong No to 1, because it disallows reuse of common types like Map or ByteBuffer for different formats, forcing instrumentation and tracing implementation to agree on some carrier interface that needs to be released as an external dependency for both.

@bhs
Copy link
Contributor

bhs commented Jul 24, 2016

PTAL at #33.

@maxwindiff
Copy link

maxwindiff commented Aug 3, 2016

Hello there, I just started looking at the OpenTracing project and got interested in it.

I wonder why Format is designed as a marker object instead of actually carrying the implementation of injection/extraction logic? For example the TextMap class could actually contain methods for put-ing SpanContext and Baggage into a Map<String, String>. Right now it seems the format is only used as a key to look up the actual implementation inside the tracer binding.

Putting the logic inside Format will makes it easier to extend injection/extraction support to different types of carriers. More importantly, it allows end-users to implement formats for custom carriers, many of which may not be visible to tracer implementations (e.g. new RPC libraries, proprietary carrier types).

A potential problem of concrete Format classes is that they may need implementation-dependent information to do their job. For example if the TextMap format is responsible for serializing a SpanContext into a Map<String, String>, it need to choose what header names to store the traceId and spanId under. I assume OpenTracing is not interested in standardizing these header names.

One solution is to define a standard set of formats that must be provided by all tracer implementations (basically the TextMap and Binary formats stated in the spec). The OpenTracing API could then define some methods (e.g. Tracer.textMapFormat(), Tracer.binaryFormat()) for obtaining an tracer-implementation-specific instance for a standard format.

@bhs
Copy link
Contributor

bhs commented Aug 3, 2016

@maxwindiff thanks for the note, and this is a good question.

TL;DR, supporting things like Tracer.textMapFormat() and Tracer.binaryFormat() clutters up the API surface area, and that indirection is needed (as you said, the Tracer impl needs control over specific keys, header names, etc).

Also, imagine that some new IPC technology called BetteRPC takes the world by storm, and thus gets its own (externally-defined) OpenTracing format... we would have the same situation as with TextMap or Binary, except that we can't add an abstract method to the Tracer interface called Tracer.betteRPCFormat(). With the current scheme, the BetteRPC devs can create a format identifier and define the expectation around the carrier Object, and Tracer implementations can choose to support it (or not) without anyone changing the OpenTracing spec.

@cwe1ss
Copy link
Member

cwe1ss commented Aug 22, 2016

I think it's wrong to have different formats with the same type signature and no other difference. This now means that HttpHeaders and TextMap are only distinguishable by its object reference. Given a format object, the only way to detect the actual format is to do a reference comparison with the static variables. If (for whatever reason) you don't have access to this variable or someone created the object manually, you are lost.

The format interface should implement a public name string and equals() should use it for comparison. This way, implementations can depend on the string instead of an object reference.

In general, I'm not yet convinced by this typed concept but that might be my bad. 😀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants