Define sdk extension component support in file configuration #3802
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
One NIT, but it's mostly about possible misunderstanding of the spec and is non-blocking.
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
If I understand correctly, in case of SDK extensions:
- the component provider accepts a file configuration model (a.k.a. a parsed YAML tree node for the extension point) as input, in the Create() operation
There is no way to parse YAML into an in-memory configuration model, or to build an in-memory configuration model entirely programmatically, and then have the Create() operation take this in-memory model as input.
I am aware that supporting a memory configuration model for SDK extensions has some extra difficulties, like:
- have a generic representation of an "exporter extension" in the memory configuration model (which is built-in), that contains a representation of the yaml fragment for a given (generic, not built-in) exporter
- have the exporter extension Create() operation take this generic representation as input
Still, separating parse from create, as in:
- Parse takes a file configuration as input, returns a memory configuration as output
- Create takes a memory configuration as input, instantiates the proper objects in the SDK extension
is better, because it allows one to:
- process and adjust the memory configuration model, if needed, between parse and create.
- build a memory configuration model programmatically, instead of parsing yaml.
A possibility is to also expand the SDK extension point to provide a CreateModel() operation, where the model returned is a subclass (per language idiomatic constructs) of an abstract extension point (for exporters, samplers, etc)
Also, please do not let this stall. |
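To illustrate the parse/create separation proposed in this comment, here is a minimal Python sketch. It is not taken from the PR or from any SDK: `ExtensionModel`, `parse`, and `create` are hypothetical names, and `create` returns a string where a real implementation would instantiate an SDK component.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExtensionModel:
    """Hypothetical generic in-memory node for a non-built-in component."""
    kind: str  # extension point, e.g. "exporter" or "sampler"
    name: str  # name the extension is referenced by in the file
    properties: Dict[str, Any] = field(default_factory=dict)


def parse(file_node: Dict[str, Any], kind: str) -> ExtensionModel:
    """Turn a parsed YAML fragment like {"my_exporter": {...}} into a model."""
    if len(file_node) != 1:
        raise ValueError("expected exactly one component name")
    name, props = next(iter(file_node.items()))
    return ExtensionModel(kind=kind, name=name, properties=dict(props or {}))


def create(model: ExtensionModel) -> str:
    """Stand-in for Create: describes the component instead of building it."""
    return f"{model.kind}:{model.name}:{sorted(model.properties)}"


# Parse, adjust the in-memory model, then create.
model = parse({"my_exporter": {"endpoint": "http://localhost:4318"}}, "exporter")
model.properties["timeout_ms"] = 5000
from_file = create(model)

# Or skip parsing and build the model programmatically.
from_code = create(ExtensionModel("sampler", "my_sampler", {"ratio": 0.5}))
```

Because `create` only sees the in-memory model, it does not care whether that model came from a file or from code.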
The component provider accepts a
I don't understand. There is a way to do this. The
It will be up to each language to determine how to represent references to non-built-in types in the in-memory representation of the model. For example in Java, this will likely be a |
Ok, so it seems we both agree on how it works, based on these clarifications, thanks for the details. My comment is on the wording then, as some parts are unclear, at least to me. In the PR, in section I would like some details about what Then, Likewise, in the PR text: "When Create is called to interpret a file configuration model", this is a shortcut.
I think this will add clarity, unless I totally missed the picture. Looking forward to implementing this in C++. |
I think this PR tries to give special treatment to plugins when we could handle them in the same way that we handle the rest of the configuration components. AFAIK what file configuration does is:
What I think is being proposed here is:
We could do something different:
The value I see in this is that the approach is general for everything and not particular for plugins and we do the configuration-against-schema checking only once and not multiple times. Also, having create plugin methods would be troublesome for the current Python implementation because there are no particular places where create functions are called (the Python implementation doesn't have any specific "create" functions), but everything is treated in the same way when the configuration tree is being traversed recursively. |
In practice, at least in the Java implementation I've been prototyping, there is no loading of a schema. Instead, types are generated statically corresponding to the schema, and the configuration file is parsed to those statically generated types using off-the-shelf tooling. This detail is important because you suggest having each plugin provide its own schema. Presumably they would provide their schemas using JSON schema for symmetry with the rest of the schema. This would necessitate interpreting JSON schemas at runtime. I think this is a bad idea because in my experience, the implementations of JSON schema are all over the map in terms of quality and the JSON schema draft versions they support. I would not want to rely on the Java implementations at runtime. If you buy that, then we don't have a reliable way for components to describe their schema which the parse operation can consume and conform to. This is what led me down the path of instead having component providers be responsible for enforcing their schema during the "Create Plugin" operation. I think there's not a lot of difference from the user's perspective:
Most users will combine parse and create in a single operation, so they'll see an error in either case. An additional argument is that requiring components to describe their schema in some syntax like JSON schema increases the threshold for writing one of these things. It's comparatively simpler to have a contract in which a component provider implementation is just expected to read config out of a generic properties container, and either succeed in returning a component or error. |
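As a rough illustration of that contract, the following Python sketch shows a provider that reads from a generic properties mapping and either returns a component or raises. All names here (`ConfigError`, `ExampleExporter`, `ExampleExporterProvider`, `create_plugin`, and the `endpoint` and `timeout_ms` keys) are assumptions made for the example, not part of the specification or of any SDK.

```python
from typing import Any, Mapping


class ConfigError(Exception):
    """Raised when a provider cannot build a component from its properties."""


class ExampleExporter:
    """Hypothetical custom component; not a real OpenTelemetry exporter."""

    def __init__(self, endpoint: str, timeout_ms: int) -> None:
        self.endpoint = endpoint
        self.timeout_ms = timeout_ms


class ExampleExporterProvider:
    """Hypothetical provider that enforces its own schema at creation time."""

    def create_plugin(self, properties: Mapping[str, Any]) -> ExampleExporter:
        endpoint = properties.get("endpoint")
        if not isinstance(endpoint, str) or not endpoint:
            raise ConfigError("endpoint must be a non-empty string")
        timeout_ms = properties.get("timeout_ms", 10000)
        # bool is a subclass of int in Python, so reject it explicitly.
        if (isinstance(timeout_ms, bool) or not isinstance(timeout_ms, int)
                or timeout_ms <= 0):
            raise ConfigError("timeout_ms must be a positive integer")
        return ExampleExporter(endpoint, timeout_ms)
```

No schema document is exchanged here: the provider's validation code is the only place where its configuration shape is enforced.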
Yes, I suggest so but this PR also suggests that, right?
Hmm, I'm a bit confused here. So, it is bad to interpret JSON schemas for the plugins but it is ok to interpret JSON schemas for the normal components?
Hmm, this seems to me like a critical problem that we need to solve first. So, when implementing this in Python I used a third-party library to do the validation of the configuration file against the JSON schema. I don't know about the situation in Java, but in Python I am just trusting that this component does a proper validation. Maybe I should not trust it? If we don't have a reliable way of validating the schema, then what? Should we write our own JSON schema validator? (I really hope not)
Do these 2 sentences above mean that it is possible for a plugin schema to be defined in a way that is not a JSON schema? So we can have a different type of schema for every plugin?
I am confused here. It seems like once a configuration file is checked against the JSON schema (and passes the checking) there can be additional checks (like checking a string is a proper URL). I was under the impression that the only checking that the configuration component was supposed to do was the checking against the JSON schema. Maybe there is a motivation to catch errors as soon as possible, but now I don't see a clearly defined set of responsibilities for the configuration component. SDK components do checking of input values too so who's supposed to do the checking now? Now, for the Python implementation, I'm not sure how mandatory it is to have Create Plugin or Component Provider. This doesn't fit our implementation where we don't have a function for the creation of any specific component but for every leaf in the schema tree. Can the way the component for plugins is created be left to the way that fits each language best? |
The java implementation I've been working on generates static types from the JSON schema. There is no JSON schema artifact present at runtime. At runtime, we use a generic YAML parsing library to parse the incoming YAML to the statically generated types. If the YAML does not conform to the structure of the statically generated types, the binding to the static types will fail.
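The same idea can be approximated in Python for readers following along with the Python prototype. This is a sketch under assumptions, not the Java implementation: `OtlpExporterModel` stands in for a statically generated type, `bind` is a hypothetical helper, and only key names are checked, not value types.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclass
class OtlpExporterModel:
    """Hypothetical stand-in for a type generated from the schema."""
    endpoint: str
    timeout: int = 10000


def bind(cls: Type[T], node: Dict[str, Any]) -> T:
    """Bind a parsed YAML mapping to a static type, failing on mismatch."""
    known = {f.name for f in fields(cls)}
    unknown = set(node) - known
    if unknown:
        raise TypeError(f"{cls.__name__}: unknown keys {sorted(unknown)}")
    # A missing required key makes the dataclass constructor raise TypeError.
    return cls(**node)


bound = bind(OtlpExporterModel, {"endpoint": "http://localhost:4318"})
```

No schema artifact is consulted at runtime; the shape of the static type is what rejects a non-conforming mapping.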
I'm imagining that a plugin expresses its schema by way of examples, and possibly in docs and internally using JSON schema, but it doesn't need to provide its schema to the implementation of file configuration at all as a part of being a Component Provider implementation. The component provider reads configuration from the I suppose I would be open to adding an optional component provider operation where a particular implementation (like Python) could require component providers to describe their schema in JSON schema format, but I think it's important that this is not required across all implementations: OpenTelemetry implementations are low level libraries and should generally be extremely discerning about taking dependencies on external libraries. It's entirely reasonable that for a given language, there is not a ubiquitous JSON schema implementation available which supports the 2020-12 draft version we've been using. I think it's critical that we don't force these languages to build JSON schema parsers.
I was referring to the checks being done by the SDK components themselves. I don't think that the create operation needs to perform additional validation beyond those done by SDK components. My point was only that a piece of YAML conforming to the schema does not guarantee that we'll be able to generate configured SDK components without error. |
non-blocking comments
@ocelotl What did you have in mind? This concept will definitely vary in implementation by language and I want to be sure that the language is specific enough to make the intent clear (i.e. I don't want implementers to come away thinking this is impossible because there's no detail), but open enough to give implementers flexibility. |
Apparently I did something wrong while adding a comment yesterday so here I go again. The Python prototype now supports SDK extension components.
The Python prototype uses a JSON schema checker to check the configuration file. If it does not match, it will raise an exception.
The Python prototype does merge the subschema of the plugin into the larger OTel schema. The component provider reads configuration from the
The Python prototype does not have a
I agree, I intentionally am not adding any other check to the file configuration component that is not JSON schema checking. |
The Python prototype is using standard entry points which is the normal Python mechanism to dynamically add some external component to a Python project. In this example I created a plugin that adds a sampler that only samples on Mondays but only sometimes. This sampler is an example of a custom component any user may define and add for it to work with OpenTelemetry. We don't have any "Create Plugin" or "Component Provider" per se, but we do have the plugin class |
@ocelotl it looks like the implementation is similar. The Wondering what kind of changes you envision for this PR such that you feel comfortable with your implementation? My point of view is that as long as the net effect is the same, I don't think it matters much that there is uniformity among languages with this API. Maybe I can adjust the normative language to only include very general "MUSTs", but leave the specifics with "MAY" such that implementations can vary as needed. |
LGTM. Thanks for the clarifications on Parse().
Nice ✌️ I am now under the impression that your intention for things as component providers and such is to define a mechanism that adds components dynamically. I am ok with this PR right now as it is, I would only recommend to add a clarification that says the exact nature of component providers and the create plugin interface depends on the specific mechanisms every language has for this particular purpose. Thank you for the clarification 👍 |
Will merge Monday 1/29/24 if no further comments. |
…lemetry#3802)
Part of incorporating [OTEP open-telemetry#225](open-telemetry/oteps#225) into the specification.
Followup to open-telemetry#3744.
This defines how file configuration works with custom SDK extension components (Samplers, Exporters, etc).
It defines the concept of a Component Provider:
- Component providers are registered with the type of extension component they provide and a name. Component providers are registered automatically or manually based on what is idiomatic in the language.
- Component providers have a Create Plugin method, which passes configuration properties as a parameter and returns the configured component
- When Create is called to interpret a file configuration model, and it comes across a reference to an extension component which is not built-in, it invokes Create Plugin on the corresponding component provider. If no corresponding component provider exists, or if Create Plugin returns an Error, Create returns an error.
Prototype implementation in java here: open-telemetry/opentelemetry-java-examples#227
cc @open-telemetry/configuration-maintainers
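The registration and dispatch rules summarized in this commit message can be sketched in Python as follows. This is only an illustration under assumptions: `register`, `create_extension`, and the module-level `_providers` registry are hypothetical names, manual registration is just one of the options the text allows, and errors are modeled as exceptions.

```python
from typing import Any, Callable, Dict, Mapping, Tuple

# A provider is modeled as its Create Plugin callable.
CreatePlugin = Callable[[Mapping[str, Any]], Any]

# Hypothetical registry keyed by (component type, name).
_providers: Dict[Tuple[str, str], CreatePlugin] = {}


def register(component_type: str, name: str, create_plugin: CreatePlugin) -> None:
    """Manually register a provider for one extension type and name."""
    _providers[(component_type, name)] = create_plugin


def create_extension(component_type: str, name: str,
                     properties: Mapping[str, Any]) -> Any:
    """What Create might do on meeting a non-built-in component reference."""
    try:
        create_plugin = _providers[(component_type, name)]
    except KeyError:
        raise LookupError(
            f"no component provider for {component_type} '{name}'") from None
    # Any exception raised by Create Plugin propagates, so Create fails too.
    return create_plugin(properties)


register("sampler", "my_sampler", lambda props: ("my_sampler", dict(props)))
component = create_extension("sampler", "my_sampler", {"ratio": 0.5})
```

In a language with a plugin discovery mechanism, the `register` call could be replaced by automatic discovery without changing `create_extension`.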