Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define sdk extension component support in file configuration #3802

Merged
merged 10 commits into from
Jan 29, 2024

Conversation

jack-berg
Copy link
Member

@jack-berg jack-berg commented Dec 22, 2023

Part of incorporating OTEP #225 into the specification.

Followup to #3744.

This defines how file configuration works with custom SDK extension components (Samplers, Exporters, etc).

It defines the concept of a Component Provider:

  • Component providers are registered with the type of extension component they provide and a name. Component providers are registered automatically or manually based on what is idiomatic in the language.
  • Component providers have a Create Plugin method, which passes configuration properties as a parameter and returns the configured component
  • When Create is called to interpret a file configuration model, and it comes across a reference to a extension component which is not built-in, it invokes Create Plugin on the corresponding component provider. If no corresponding component provider exists, or if Create Plugin returns an Error, Create returns an error.

Prototype implementation in java here: open-telemetry/opentelemetry-java-examples#227

cc @open-telemetry/configuration-maintainers

@jack-berg jack-berg requested review from a team December 22, 2023 17:36
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 30, 2023
Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One NIT, but it's mostly about possible misunderstanding of the spec and is non-blocking.

specification/configuration/file-configuration.md Outdated Show resolved Hide resolved
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 18, 2024
Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, in case of SDK extensions:

  • the component provider accepts a file configuration model (a.k.a a yaml parsed tree node for the extension point) as input, in the Create() operation

There is no way to parse yaml into a memory configuration model, or make up completely a memory configuration model programmatically, and then have the Create() operation take this memory model as input.

I am aware that supporting a memory configuration model for SDK extensions has some extra difficulties, like:

  • have a generic representation of an "exporter extension" in the memory configuration model (which is built-in), that contains a representation of the yaml fragment for a given (generic, not built-in) exporter
  • have the exporter extension Create() operation take this generic representation as input

Still, separating parse from create, as in:

  • Parse takes a file configuration as input, returns a memory configuration as output
  • Create takes a memory configuration as input, instantiates the proper objects in the SDK extension

is better, because it allows to:

  • process and adjust the memory configuration model, if needed, between parse and create.
  • build a memory configuration model programmatically, instead of parsing yaml.

A possibility is to also expand the SDK extension point to provide a CreateModel() operation, where the model returned is a subclass (per language idiomatic constructs) of an abstract extension point (for exporters, samplers, etc)

@marcalff
Copy link
Member

Also, please do not let this stall.

@jack-berg jack-berg removed the Stale label Jan 18, 2024
@jack-berg
Copy link
Member Author

the component provider accepts a file configuration model (a.k.a a yaml parsed tree node for the extension point) as input, in the Create() operation

The component provider accepts a properties as a parameter for the Create Plugin operation.

There is no way to parse yaml into a memory configuration model, or make up completely a memory configuration model programmatically, and then have the Create() operation take this memory model as input.

I don't understand. There is a way to do this. The Create operation accepts a configuration parameter, which is an in memory representation of the configuration model which is either parsed from yaml or created programmatically by some other means. If in the course of interpreting the configuration parameter, the Create operation comes across a reference to an extension component, it is expected to translate configuration for that component into a generic properties representation, and call Create Plugin of the respective component provider.

have a generic representation of an "exporter extension" in the memory configuration model (which is built-in), that contains a representation of the yaml fragment for a given (generic, not built-in) exporter

It will be up to each language to determine how to references to non-built-in types in the in memory representation fo the model. For example in java, this will likely be a Map<String, Object>, which we can translate to a properties representation, which provides type safe accessors, before calling Create Plugin.

@marcalff
Copy link
Member

There is no way to parse yaml into a memory configuration model, or make up completely a memory configuration model programmatically, and then have the Create() operation take this memory model as input.

I don't understand. There is a way to do this. The Create operation accepts a configuration parameter, which is an in memory representation of the configuration model which is either parsed from yaml or created programmatically by some other means. If in the course of interpreting the configuration parameter, the Create operation comes across a reference to an extension component, it is expected to translate configuration for that component into a generic properties representation, and call Create Plugin of the respective component provider.

Ok, so it seem we both agree on how it works, based of these clarifications, thanks for the details.

My comment is on the wording then, as some parts are unclear, at least to me.

In the PR, in section SDK Extension Components, the description gives some yaml with an extension point, and then jumps directly to talk about what Create() does.

I would like some details about what Parse() does first, when encountering an SDK extension, such as creating a Properties object in the in memory configuration model, that represents the extension configuration bits.

Then, Create() takes this Properties object and invokes the plugin as described.

Likewise, in the PR text: "When Create is called to interpret a file configuration model", this is a shortcut.

Parse() is called to interpret a file configuration model.
What Create() is invoked with is an in memory configuration model.

I think this will add clarity, unless I totally missed the picture.

Looking forward to implement this in C++.

@ocelotl
Copy link
Contributor

ocelotl commented Jan 18, 2024

I think this PR tries to give special treatment to plugins when we could handle them in the same way that we handle the rest of the configuration components.

AFAIK what file configuration does is:

  1. Load a schema.
  2. Load a configuration file.
  3. Check that the configuration file matches the schema.
  4. Create the objects using the information from the configuration.

What I think is being proposed here is:

  1. Load a schema.
  2. Load a configuration file.
  3. Check that the configuration file matches the schema.
  4. Load plugins which come with their own sub-schema.
  5. Load the configurations from the plugin objects.
  6. Create the non-plugin objects using the information from the configuration.
  7. When a plugin object is created, the plugin object sub-schema is checked and an error is raised if there is a mismatch between the sub-schema and the plugin object configuration.

We could do something different:

  1. Load a schema.
  2. Load the plugin object subschemas.
  3. Add the plugin object subschemas to the right point in the schema to make a new, bigger schema.
  4. Load the configuraiton.
  5. Load the plugin object configurations.
  6. Add the plugin object configurations to the right point in the configuration to make a new, bigger configuration.
  7. Check that the new, bigger configuration matches the new, bigger schema.
  8. Create the objects using the information from the new, bigger configuration.

The value I see in this is that the approach is general for everything and not particular for plugins and we do the configuration-against-schema checking only once and not multiple times. Also, having create plugin methods would be troublesome for the current Python implementation because there are no particular places where create functions are called (the Python implementation doesn't have any specific "create" functions), but everything is treated in the same way when the configuration tree is being traversed recursively.

@jack-berg
Copy link
Member Author

jack-berg commented Jan 19, 2024

  1. Load a schema.
  2. Load a configuration file.
  3. Check that the configuration file matches the schema.
  4. Create the objects using the information from the configuration.

In practice, at least in the java implementation I've been prototyping, there is no loading of a schema. Instead, types are generated statically corresponding to the schema, and the configuration file is parsed to those statically generated types using off the shelf tooling.

This detail is important because you suggest having each plugin provide its own schema. Presumably they would provide their schemas using JSON schema for symmetry with the rest of the schema. This would necessitate interpreting JSON schemas at runtime. I think this is a bad idea because in my experience, the implementations of JSON schema are all over the map in terms of quality and the JSON schema draft versions they support. I would not want to rely on the java implementations at runtime.

If you buy that, then we don't have a reliable way for components to describe their schema which the parse operation can consume and conform to. This is what lead me down the path of instead having component providers be responsible for enforcing their schema during the "Create Plugin" operation.

I think there's not a lot of difference from the users perspective:

  • If a component provider describes its own schema, and parse is responsible for enforcing it, then a configuration that fails to conform to the schema triggers an error in parse. But some parts of a schema won't be able to be enforced until create is called anyway - imagine a component provider with an endpoint argument with a string type. Parse would be able to reject configurations which set the endpoint to a number, but you probably can't tell that a string endpoint is invalid until create plugin is called and tries to parse the string as a URL.
  • If a component provider enforces its schema only during create plugin, then certain simple schema violations make it past the parse operation, but trigger failures during create.

Most users will combine parse and create in a single operation, so they'll see an error in either case.

An additional argument is that requiring components to describe their schema in some syntax like JSON schema increases the threshold for writing one of these things. Its comparatively simpler to have a contract in which a component provider implementation is just expected to read config out of a generic properties container, and either succeed in returning a component or error.

@ocelotl
Copy link
Contributor

ocelotl commented Jan 19, 2024

This detail is important because you suggest having each plugin provide its own schema.

Yes, I suggest so but this PR also suggests that, right?

The set of properties a component provider accepts, along with their requirement level and expected type, comprise a configuration schema.

Presumably they would provide their schemas using JSON schema for symmetry with the rest of the schema. This would necessitate interpreting JSON schemas at runtime. I think this is a bad idea because in my experience, the implementations of JSON schema are all over the map in terms of quality and the JSON schema draft versions they support. I would not want to rely on the java implementations at runtime.

Hmm, I'm a bit confused here. So, it is bad to interpret JSON schemas for the plugins but it is ok to interpret JSON schemas for the normal components?

If you buy that, then we don't have a reliable way for components to describe their schema which the parse operation can consume and conform to.

Hmm, this seems to me like a critical problem that we need to solve first. So, when implementing this in Python I used a third-party library to do the validation of the configuration file against the JSON schema. I don't know about the situation in Java, but in Python I am just trusting that this component does a proper validation. Maybe I should not trust it? If we don't have a reliable way of validating the schema, then what? Should we write our own JSON schema validator? (I really hope not)

Presumably they would provide their schemas using JSON schema for symmetry with the rest of the schema.
If a component provider describes its own schema...

Do these 2 sentences above mean that it is possible for a plugin schema to be defined in a way that is not a JSON schema? So we can have a different type of schema for every plugin?

But some parts of a schema won't be able to be enforced until create is called anyway - imagine a component provider with an endpoint argument with a string type. Parse would be able to reject configurations which set the endpoint to a number, but you probably can't tell that a string endpoint is invalid until create plugin is called and tries to parse the string as a URL.

I am confused here. It seems like once a configuration file is checked against the JSOM schema (and passes the checking) there can be additional checks (like checking a string is a proper URL). I was under the impression that the only checking that the configuration component was supposed to do was the checking against the JSON schema. Maybe there is a motivation to catch errors as soon as possible, but now I don't see a clearly defined set of responsibilities for the configuration component. SDK components do checking of input values too so who's supposed to do the checking now?

Now, for the Python implementation, I'm not sure how mandatory is to have Create Plugin or Component Provider. This doesn't fit our implementation where we don't have a function for the creation of any specific component but for every leaf in the schema tree. Can the way the component for plugins is created be left to the way that fits each language best?

@jack-berg
Copy link
Member Author

If we don't have a reliable way of validating the schema, then what? Should we write our own JSON schema validator? (I really hope not)

The java implementation I've been working on generates static types from the JSON schema. There is no JSON schema artifact present at runtime. At runtime, we use a generic YAML parsing library to parse the incoming YAML to the statically generated types. If the YAML does not conform to the structure of the statically generated types, the binding to the static types will fail.

Do these 2 sentences above mean that it is possible for a plugin schema to be defined in a way that is not a JSON schema?

I'm imagining that a plugin expresses its schema by way of examples, and possibly in docs and internally using JSON schema, but it doesn't need to provide its schema to the implementation of file configuration at all as a part of being a Component Provider implementation. The component provider reads configuration from the properties file. If all the configuration it needs is present and passes any internal validation performed by the component provider passes, then you can say that the properties argument conformed to the schema. If a piece of required configuration is missing, or some internal validation fails, then the properties argument did not conform to its schema.

I suppose I would be open to adding an optional component provider operation where a particular implementation (like python) could require components providers to describe their schema in JSON schema format, but I think its important that this is not required across all implementations: OpenTelemetry implementations are low level libraries and should generally be extremely discerning about taking dependencies on external libraries. Its entirely reasonable that for a given language, there is not a ubiquitous JSON schema implementation available which supports the 2020-12 draft version we've been using. I think its critical that we don't force these languages to build json schema parsers.

It seems like once a configuration file is checked against the JSOM schema (and passes the checking) there can be additional checks (like checking a string is a proper URL). I was under the impression that the only checking that the configuration component was supposed to do was the checking against the JSON schema. Maybe there is a motivation to catch errors as soon as possible, but now I don't see a clearly defined set of responsibilities for the configuration component. SDK components do checking of input values too so who's supposed to do the checking now?

I was referring to the checks being done by the SDK components themselves. I don't think that the create operation needs to perform additional validation beyond those done by SDK components. My point was just that just because a piece of YAML conforms to the schema doesn't mean that we'll be able to generate configured SDK components without error.

Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking comments

specification/configuration/file-configuration.md Outdated Show resolved Hide resolved
@jack-berg
Copy link
Member Author

jack-berg commented Jan 25, 2024

Now, for the Python implementation, I'm not sure how mandatory is to have Create Plugin or Component Provider. This doesn't fit our implementation where we don't have a function for the creation of any specific component but for every leaf in the schema tree. Can the way the component for plugins is created be left to the way that fits each language best?

@ocelotl What did you have in mind? This concept will definitely vary in implementation by language and I want to be sure that the language is specific enough to make it the intent clear (i.e. I don't want implementers to come away thinking this is impossible because there's no detail), but open enough to give implementers flexibility.

@ocelotl
Copy link
Contributor

ocelotl commented Jan 25, 2024

Apparently I did something wrong while adding a comment yesterday so here I go again.

The Python prototype now supports SDK extension components.

The java implementation I've been working on generates static types from the JSON schema. There is no JSON schema artifact present at runtime. At runtime, we use a generic YAML parsing library to parse the incoming YAML to the statically generated types. If the YAML does not conform to the structure of the statically generated types, the binding to the static types will fail.

The Python prototype uses a JSON schema checker to check the configuration file. If it does not match, it will raise an exception.

I'm imagining that a plugin expresses its schema by way of examples, and possibly in docs and internally using JSON schema, but it doesn't need to provide its schema to the implementation of file configuration at all as a part of being a Component Provider implementation.

the Python prototype doees merge the subschema of the plugin into the larger OTel schema.

The component provider reads configuration from the properties file. If all the configuration it needs is present and passes any internal validation performed by the component provider passes, then you can say that the properties argument conformed to the schema.

I suppose I would be open to adding an optional component provider operation where a particular implementation (like python) could require components providers to describe their schema in JSON schema format...

The Python prototye does not have a properties file. Each plugin specifies its schema with a dictionary that is equivalent to a JSON schema. This dictionary is then merged into the larger OTel schema and the resulting bigger schema is used to validate the configuration file. Here is an example of how this subschema is defined. Each plugin defines also a path that defines where in the larger OTel schema this subschema is to be added.

I was referring to the checks being done by the SDK components themselves. I don't think that the create operation needs to perform additional validation beyond those done by SDK components. My point was just that just because a piece of YAML conforms to the schema doesn't mean that we'll be able to generate configured SDK components without error.

I agree, I intentionally am not adding any other check to the file configuration component that is not JSON schema checking.

@ocelotl
Copy link
Contributor

ocelotl commented Jan 25, 2024

@ocelotl What did you have in mind? This concept will definitely vary in implementation by language and I want to be sure that the language is specific enough to make it the intent clear (i.e. I don't want implementers to come away thinking this is impossible because there's no detail), but open enough to give implementers flexibility.

The Python prototype is using standard entry points which is the normal Python mechanism to dynamically add some external component to a Python project. In this example I created a plugin that adds a sampler that only samples on Mondays but only sometimes. This sampler is an example of a custom component any user may define and add for it to work with OpenTelemetry. We don't have any "Create Plugin" or "Component Provider" per se, but we do have the plugin class SometimesMondaysOnSamplerPlugin which "adds" or "provides" this plugin. Does this fit what this PR is trying to specify?

@jack-berg
Copy link
Member Author

@ocelotl it looks like the implementation is similar. The schema_path in your implementation is akin to the type in this PR. The name in this PR is provided by the schema your implementation (i.e. the sometimes_mondays_on portion ensures that this plugin is only used when its selected).

Wondering what kind of changes you envision for this PR such that you feel comfortable with your implementation? My point of view is that as long as the net affect is the same, I don't think it matters much that there is uniformity among languages with this API. Maybe I can adjust the normative language to only include very general "MUSTs", but leave the specifics with "MAY" such that implementations can vary as needed.

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the clarifications on Parse().

@ocelotl
Copy link
Contributor

ocelotl commented Jan 26, 2024

@ocelotl it looks like the implementation is similar. The schema_path in your implementation is akin to the type in this PR. The name in this PR is provided by the schema your implementation (i.e. the sometimes_mondays_on portion ensures that this plugin is only used when its selected).

Wondering what kind of changes you envision for this PR such that you feel comfortable with your implementation? My point of view is that as long as the net affect is the same, I don't think it matters much that there is uniformity among languages with this API. Maybe I can adjust the normative language to only include very general "MUSTs", but leave the specifics with "MAY" such that implementations can vary as needed.

Nice ✌️ I am now under the impression that your intention for things as component providers and such is to define a mechanism that adds components dynamically. I am ok with this PR right now as it is, I would only recommend to add a clarification that says the exact nature of component providers and the create plugin interface depends on the specific mechanisms every language has for this particular purpose. Thank you for the clarification 👍

@jack-berg
Copy link
Member Author

Will merge Monday 1/29/24 if no further comments.

@jack-berg jack-berg merged commit a6ca2fd into open-telemetry:main Jan 29, 2024
7 checks passed
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024
…lemetry#3802)

Part of incorporating [OTEP
open-telemetry#225](open-telemetry/oteps#225) into the
specification.

Followup to open-telemetry#3744.

This defines how file configuration works with custom SDK extension
components (Samplers, Exporters, etc).

It defines the concept of a Component Provider:
- Component providers are registered with the type of extension
component they provide and a name. Component providers are registered
automatically or manually based on what is idiomatic in the language.
- Component providers have a Create Plugin method, which passes
configuration properties as a parameter and returns the configured
component
- When Create is called to interpret a file configuration model, and it
comes across a reference to a extension component which is not built-in,
it invokes Create Plugin on the corresponding component provider. If no
corresponding component provider exists, or if Create Plugin returns an
Error, Create returns an error.

Prototype implementation in java here:
open-telemetry/opentelemetry-java-examples#227

cc @open-telemetry/configuration-maintainers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants