Skip to content
This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

Add support for dynamic loading. #45

Merged
merged 52 commits into from
Feb 22, 2018
Merged

Add support for dynamic loading. #45

merged 52 commits into from
Feb 22, 2018

Conversation

rnburn
Copy link
Contributor

@rnburn rnburn commented Dec 20, 2017

This PR adds support for dynamic loading to allow C++ applications instrumented with OpenTracing to dynamically load in tracing libraries at runtime instead of requiring them to be linked in at compile-time.

The basic idea is tracing libraries will implement the function opentracing_make_tracer_factory that constructs and returns a pointer to a TracerFactory which can be used construct the library's tracers from configuration strings. To dynamically load a tracing library, an application calls the function dynamically_load_tracing_library which takes in the name of the library, calls the library's opentracing_make_tracer_factory function, and returns the TracerFactory.

Returning a factory like this instead of an actual tracer will allow for users to potentially reconfigure tracers by reconstructing a new replacement tracer from a different configuration string.

The opentracing_make_tracer_factory function is declared to have weak linkage so that multiple tracing libraries can be linked in without getting a multiple definition error. I know there was some concern about the portability of weak linkage, but its broadly supported by compilers on unix platforms and something similar can be done on windows with __declspec(selectany). Still, I'm open to other approaches if anyone thinks there's a better way to solve this.

I currently have an example of dynamic loading working with a branch of envoy and jaeger, which can be demoed with this docker application. I also plan on adding a mock tracer library into opentracing-cpp which will support dynamic loading and can be used to add testing coverage for this new code.

Fixes #28

@rnburn
Copy link
Contributor Author

rnburn commented Jan 9, 2018

Something that's come up in the envoy review: envoyproxy/data-plane-api#386 (comment)

Should we standardize on a tracer's configuration format being in json/yaml? Doing so would allow applications to specify the configuration inline.

cc @yurishkuro @isaachier

@yurishkuro
Copy link
Member

@rnburn my 2 cents

"standadizing" is a difficult thing because each tracer's configuration is unique. If envoy needs a place in its own config that can be opaquely passed to the tracer, then it's envoy's business. E.g. it can say "tracerConfig" property is what contains the config, and stop "standardizing" at that point.

json/yaml again sounds like the concern of the containing application, not the tracer. In Jaeger clients we expose a Configuration object that can be initialized from a config file, but doesn't have to be. In fact, in Python/JS that Configuration simply takes a dict / object as constructor argument, which is idiomatic way of passing nested configuration. It's harder with compiled langs like Java/Go, where we expose an object/struct that contains say yaml annotations. But it would be better if there was some neutral configuration API where you can just say cfg.getFloat("sampler.probability"). Unfortunately, this is not something that's natively part of the language(s).

Finally, where standardizing does really make sense (aside from that cfg.getX API), is in passing that configuration to tracer factory, which is necessary for dynamic loading.

So in sumary

  • an API for reading config
  • a tracer factory API that takes the config reader
  • possible yet another factory API that takes static (tracer-level) tags if there is a need to standardize on those

@rnburn
Copy link
Contributor Author

rnburn commented Jan 10, 2018

By standardize, I only meant to specify that the configuration is in json or yaml, not say anything about what's in the configuration.

Currently, the factory class for the tracer takes a string representing the configuration. What form the string takes is the concern of both the application and the tracer since they have to agree on the format.

Ideally, in envoy's configuration you'd write something like

...
tracerConfig:
    opt1: val11
    opt2:
        - val21
          val22
...

and envoy would call the factory method with the string

opt1: val11
opt2:
    - val21
       val22

by agreeing that the tracer's configuration is in something like yaml, it can be easily embedded into the rest of envoy's configuration and at least someone validated (i.e. checked that it's valid yaml) before passing it to the tracer.

@yurishkuro
Copy link
Member

By standardize, I only meant to specify that the configuration is in json or yaml, not say anything about what's in the configuration.

I would split the concern of the config format from the config values. At minimum Tracer factory needs to have an API for a programmatically accessible config, such as cfg.Get("sampler.priority") . Then the actual format of the config (json, yaml, toml, xml, whatever) is no longer of concern for Tracer, but for the application that uses the config file.

@rnburn
Copy link
Contributor Author

rnburn commented Jan 10, 2018

Couldn't the parsing of the configuration be handled by a library (e.g. yaml-cpp) linked in by the tracer? It allows an accessible interface like what you're describing.

@yurishkuro
Copy link
Member

My point is that I would make the parsing of the config separate from providing the config to the tracer. The latter is best done via some API. The parsing will typically be done by the enclosing application anyway, so why make it a tracer responsibility (with all side effects like extra dependencies, different formats, etc.)?

@rnburn
Copy link
Contributor Author

rnburn commented Jan 10, 2018

But that would complicate the API since tracers would need to be constructed from some sort of object that allows it to query the configuration instead of a simple string.

Also, applications need not have to parse the tracer's configuration. If it's specified by a file, it can be read into a string and passed to the tracer library via a string.

@yurishkuro
Copy link
Member

Also, applications need not have to parse the tracer's configuration.

Yes, by in all scenarios that I've seen they do. Config files are typically managed by some infrastructure, which may include merging multiple files, overriding parts of them, doing variable substitution, etc. In some cases configurations are not even files, but some config store, who knows. Tracer doesn't need to know anything about that. The special case of tracer having its own config file is easily included in this approach, the file name just becomes a property passed to the tracer via the overall config mechanism.

But that would complicate the API since tracers need to be constructed from some sort of object that allows it to query the configuration instead of a simple string.

That's the whole point. It's an abstraction that allows for a lot of flexibility. The config object's API can be very simple, a single Get method that takes nested property name (e.g. as done in spf13/viper). If you instead have a simple string instead of config object, then each tracer needs to compile with some parsing library (likely more than one), increasing dependencies hell and limiting portability if the enclosing application happen to use a different config file format or dynamic source. Also, if your API is just an opaque string, potentially read from another file, then what standardization do we even need? The string could be base64 encoding for all we care, only tracer needs to parse it.

NB: the implementations of the config API for different config file formats could be a shared util in open tracing, if there is a demand for it, so that envoy and nginx, for example, would not need to re-implement them (as long as they use supported file formats).

@rnburn
Copy link
Contributor Author

rnburn commented Jan 11, 2018

What about nested configurations or configurations with values other than string? Once you account for that the API gets a lot bigger than the simple key-value lookup you're proposing.

And I don't see linking in a json parsing library as being that burdensome -- it's likely something that tracer's would want to do anyways (Jaeger's already linking to yaml-cpp btw).

@yurishkuro
Copy link
Member

I am proposing that the Get function handles nested properties, Get("sampler.type"). Extending this API to return different data types is debatable.

Linking parsing libraries is a problem imo. One of the important design principles for tracing libraries is to bring as few dependencies as possible. Unless the dependencies are vendored (which sucks to maintain), more dependencies create more dependency hell (diamond deps).

I wasn't following jaeger-cpp that closely, would have argued against linking to yaml lib. No other jaeger lib in other languages includes parsing, it's simply not the responsibility of the tracer to parse all possible formats of config files.

@tedsuo
Copy link
Member

tedsuo commented Jan 17, 2018

@yurishkuro I'd prefer a string serialized into a standard data format for this API. A couple of reasons:

This API is separate from the tracer config API. The later is tracer-specific. It makes sense for that to be a non-serialized object which does not require JSON or other parsing. But this is a generalized transport that connects an application with any tracer, and any kind of option any tracer may want.

Any non-standard API will create difficulty for application developers, as they will have to manually implement the interface and translate all of the tracer options from their former into it. That sounds like a lot of work that can't easily be wrapped up in a shared helper library. Limiting the API to less than JSON - only supporting non-nested string types for example - will increase that burden. Likewise on the tracer side of the fence - the tracer will have more work to do converting from a limited format into whatever format it wants. That problem could be compounded by conversions which have some ambiguity around their string representation. It would be much simpler to let the application re-serialize the tracer portion of the configuration file as JSON and pass it on.

Regarding the issue of tracers bundling in a JSON parser - the parser is only needed when the tracer is included as a shared object, not compiled in directly, so it should be possible to ship a version of your tracer that does not include a parsing library, and wrap it in a version that handles parsing and exposes a TracerFactory. If it turns out we can standardize on something more restrictive than JSON, we could write a small parsing library that tracers could include instead of a generalized JSON parser, and avoid bundling a larger dependency.

Export maps can be used on linux to only expose the Tracer API, and ensure there are no dependency conflicts.

@rnburn
Copy link
Contributor Author

rnburn commented Feb 21, 2018

@isaachier -- does this answer everything for you?

Copy link
Contributor

@isaachier isaachier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rnburn rnburn merged commit 3fb91d3 into opentracing:master Feb 22, 2018
htuch pushed a commit to envoyproxy/envoy that referenced this pull request Feb 28, 2018
This PR adds support for dynamically loading tracers into Envoy. It will allow authors of OpenTracing compliant tracers to support envoy without requiring envoy to bake their tracers into its build. Tracing libraries can be enabled by adding a section like this to envoy's configuration:

  tracing:
    http:
      name: envoy.dynamic.ot
      config:
        library: /path/to/tracing/library
        config: {
          ....
        }
With this change, Envoy will be able to use Jaeger's new C++ tracing library. Some features this will allow that aren't available when using Jaeger as a collector for Zipkin:

Propagation will work correctly with other services using Jaeger without requiring them to make configuration changes.
Jaeger's different sampling strategies can be used.
More efficient tranports are available.
While it might also make sense sometime in the future to offer a version of Jaeger or other tracers built into Envoy, decoupled plug-in support will allow for quicker on-boarding of tracers and more experimentation in the area without making Envoy's external dependencies become unmanageable.

Notes:

One challenge with using dynamically loaded libraries in Envoy is that the standard c++ library is statically compiled in with the option -static-libstdc++. If plugin libraries are linked to the libstdc++ in a standard way, this will create problems as two versions of the standard library will be loaded and calls to their functions from the plugin will be intermingled between them, breaking many of the standard library functions. The approach I took with Jaeger was to have it also use -static-libstdc++ and then use an export map to ensure that it's version of the standard library is used without any intermingling (see dsohowto for background on the approach). It does mean that there will be multiple copies of libstdc++ running, but from what I've heard this can be made to work. Another approach could be to add a config option to link libstdc++ dynamically.
This depends on opentracing/opentracing-cpp#45 being merged in to opentracing-cpp, but I wanted to get a PR in early to get feedback on the approach in case we need to make changes to that PR.
Risk Level: Medium

Testing:
I have a small example that demoes this functionality using docker and a branch of jaeger.

I also plan to add unit test coverage using a mock tracer library that will be added to opentracing-cpp.

Docs Changes:

envoyproxy/data-plane-api#386

Release Notes:

Included in PR.

Signed-off-by: Ryan Burn <ryan.burn@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants