Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metrics] MeterConfig disabled does not follow Collector enabled convention #4344

Open
mx-psi opened this issue Dec 17, 2024 · 17 comments
Open
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted

Comments

@mx-psi
Copy link
Member

mx-psi commented Dec 17, 2024

The MeterConfig's disabled parameter (introduced in #3877) conflicts with the Collector convention which is to use enabled (this is not documented, this search shows many places where this is used in the YAML configuration). Using enabled should be considered for consistency across the whole project.

Another argument against disabled: false is that it has a double negative which is harder to parse. IIRC this is the reason we use enabled instead in the Collector configuration.

disabled: false however is more idiomatic for Go because of false being the zero value of booleans.

cc @jack-berg @codeboten

@marcalff
Copy link
Member

The fact that boolean environment variables have a default of false comes from the spec.

See this conversation:

@mx-psi
Copy link
Member Author

mx-psi commented Dec 17, 2024

@marcalff Thanks for the link! This is not an environment variable though, I am not sure I see the relation

@jack-berg
Copy link
Member

We've carried the boolean env var convention into the naming / defaults of component properties. The boolean env var convention is sensible enough, and following it when introducing new component properties (i.e. not env vars) is the path of least resistance for contributors.

Another example of following this guidance where the property doesn't have a standard env variable are these prometheus exporter properties: without_units, without_type_suffix, without_scope_info, without_target_info

This is not an environment variable though, I am not sure I see the relation

By this logic, the collector YAML config you reference is unrelated theMeterConfig.disabled, since YAML config != programmatic config 😉. I think the fact of the matter is that programmatic config, YAML config, and env var config are all intertwined with each other, and conventions that apply to one tend to carry into others.

For example, we carry many of the characteristics of programmatic and env var properties into declarative config, including naming and defaults.

@mx-psi
Copy link
Member Author

mx-psi commented Dec 17, 2024

By this logic, the collector YAML config you reference is unrelated theMeterConfig.disabled, since YAML config != programmatic config 😉. I think the fact of the matter is that programmatic config, YAML config, and env var config are all intertwined with each other, and conventions that apply to one tend to carry into others.

I guess the point I didn't understand is that we still intend to add environment variables to configure the SDK even when we have the declarative configuration. If that were not the case, then it could make sense to not consider the "env argument".

I think this only makes sense to reconsider if:

  1. This is the first disabled example
  2. We don't intend to add further environment variables

I suspect (2) is not the case, but could you confirm?

@pellared
Copy link
Member

conflicts with the Collector convention which is to use enabled

Isn't the case for the collector that all usages of enabled are false by default so the user has to explicitly enable them? If so then Collector also follows the spec:

The fact that boolean environment variables have a default of false comes from the spec.

@trask trask added the triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted label Dec 17, 2024
@pellared
Copy link
Member

Another argument against disabled: false is that it has a double negative which is harder to parse. IIRC this is the reason we use enabled instead in the Collector configuration.

For the same reason the OpenTelemetry .NET Automatic Instrumentation has _ENABLED env vars even if the default value is true. E.g. OTEL_DOTNET_AUTO_RESOURCE_DETECTOR_ENABLED. See: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/docs/config.md. We got enough feedback to not go with the _DISABLED env vars.

@trask
Copy link
Member

trask commented Dec 19, 2024

my 2c from #2755 (comment):

I prefer *enabled flags over *disabled flags (even if that means the default value is true), and we use this pattern for all boolean flags in the otel java agent

@jack-berg
Copy link
Member

So there are two questions that seem relevant to answer:

  1. Do we want to re-litigate the boolean env var naming convention decided in Definition of boolean environment variables on the SDK #2755?
  2. Do we like following the env var guidance in the programmatic configuration interface?

From my perspective, having different conventions for different configuration interfaces (env vars vs. programmatic) is a bad design. I'd like to see us consistent (to the extent it makes sense), which here seems to mean either accepting the convention, or re-litigating the discussion. On re-ligating: the environment variable spec doc along with the boolean convention is stable. The specific sentence that we're arguing about uses normative "SHOULD". Coupled with the fact that there are very few env vars which follow this advice, I think there's technically some wiggle room to reverse course.

All Boolean environment variables SHOULD be named and defined such that false is the expected safe default behavior. Renaming or changing the default value MUST NOT happen without a major version upgrade.

With that said, I don't think re-litigating this is a good use of time, so I would personally not participate in that discussion (or block any result).

@mx-psi
Copy link
Member Author

mx-psi commented Dec 20, 2024

From my perspective, having different conventions for different configuration interfaces (env vars vs. programmatic) is a bad design

We are not just building the specification, we are also building the Collector, the Java Agent, the .NET instrumentation... I think based on your argument, any option is a bad design for OpenTelemetry as a whole, since we are bound to have an inconsistency somewhere (be it within the specification or within the wider official OTel ecosystem).

@jack-berg
Copy link
Member

I think based on your argument, any option is a bad design for OpenTelemetry as a whole, since we are bound to have an inconsistency somewhere (be it within the specification or within the wider official OTel ecosystem).

Any inconsistency is bad design, IMO. But the degree of bad depends on the component. I.e. its more problematic if we're not even be able to maintain consistency between configuration interfaces defined within the spec. Ideally, SIGs which are not explicitly bound to the guidance of the specification reach the conclusion that consistency is important.

@trask
Copy link
Member

trask commented Dec 20, 2024

just sharing a data point, I suspect we won't change OTEL_INSTRUMENTATION_[NAME]_ENABLED in the Java agent, since some of the instrumentations are enabled by default and some are disabled by default, and so it would feel weird (and inconsistent in a different way) to have some specific instrumentations using OTEL_INSTRUMENTATION_[NAME]_ENABLED and other specific instrumentations using OTEL_INSTRUMENTATION_[NAME]_DISABLED

(and an additional wrinkle is that we have occasionally changed the default for specific instrumentations from enabled to disabled in major version bumps)

@github-actions github-actions bot added the triage:followup Needs follow up during triage label Jan 1, 2025
@danielgblanco danielgblanco removed the triage:followup Needs follow up during triage label Jan 6, 2025
@mx-psi
Copy link
Member Author

mx-psi commented Jan 7, 2025

We discussed this on the Specification SIG meeting on 2025-01-07. Some arguments raised (other than the ones already raised on this thread):

  • On the 'default should be false' convention for environment variables
    • The number of environment variables that follow the 'default should be false' convention, excluding language-specific conventions, is just one: OTEL_SDK_DISABLED. We also have one environment variable that does not follow this convention (OTEL_EXPORTER_OTLP_INSECURE), that probably precedes the convention
    • We are not obligated to follow the 'default should be false' convention for declarative configuration. There would not be inconsistency between the declarative configuration and the environment variable configuration for this because there will not be a MeterConfig.disabled environment variable configuration
    • The spec wording is a 'SHOULD', so we can have exceptions (e.g. for ENABLED, to avoid double negation).
    • To some people this convention is less important than being clear on 'enabled'
  • We could change any environment variables that use 'DISABLED' to 'ENABLED' by having both and establishing what should happen when both environment variables are defined. This would be backwards compatible and ensure consistency.
  • Every option probably leads to some inconsistency, we need to figure out what is best for end-users.
  • "Telemetry should be easy" is an argument for prioritizing solving this inconsistency (inconsistency is hard!)

@mx-psi
Copy link
Member Author

mx-psi commented Jan 8, 2025

I filed #4351 which is one of the approaches discussed yesterday. I am not sure if this would need a sponsor or if I should file a separate issue for this change, feel free to take a look at the PR, but I will keep it as draft until someone can help me with clarifying sponsorship/process.

@trask
Copy link
Member

trask commented Jan 8, 2025

I am not sure if this would need a sponsor

this is a relatively small (if not uncontroversial) change, so I don't believe a sponsor is needed

@MrAlias
Copy link
Contributor

MrAlias commented Jan 10, 2025

We discussed this on the Specification SIG meeting on 2025-01-07. Some arguments raised (other than the ones already raised on this thread):

  • On the 'default should be false' convention for environment variables

    • The number of environment variables that follow the 'default should be false' convention, excluding language-specific conventions, is just one: OTEL_SDK_DISABLED. We also have one environment variable that does not follow this convention (OTEL_EXPORTER_OTLP_INSECURE), that probably precedes the convention
    • We are not obligated to follow the 'default should be false' convention for declarative configuration. There would not be inconsistency between the declarative configuration and the environment variable configuration for this because there will not be a MeterConfig.disabled environment variable configuration
    • The spec wording is a 'SHOULD', so we can have exceptions (e.g. for ENABLED, to avoid double negation).
    • To some people this convention is less important than being clear on 'enabled'
  • We could change any environment variables that use 'DISABLED' to 'ENABLED' by having both and establishing what should happen when both environment variables are defined. This would be backwards compatible and ensure consistency.

  • Every option probably leads to some inconsistency, we need to figure out what is best for end-users.

  • "Telemetry should be easy" is an argument for prioritizing solving this inconsistency (inconsistency is hard!)

This is a very one-side summary of the meeting conversation. It should be noted for readers here there were dissenting views to many of these points that are not captured. It should not be inferred that consensus was reached.

@mx-psi
Copy link
Member Author

mx-psi commented Jan 13, 2025

This is a very one-side summary of the meeting conversation. It should be noted for readers here there were dissenting views to many of these points that are not captured. It should not be inferred that consensus was reached.

It was not my intention to claim that there was consensus. As I say on my message I only made a summary of some arguments, "other than the ones already raised on this thread". The main argument against was the re-litigation (which is already mentioned above). If I am missing something else important, feel free to let me know and I can edit my message.

@danielgblanco danielgblanco removed the triage:followup Needs follow up during triage label Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted
Projects
None yet
Development

No branches or pull requests

7 participants