Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opentelemetry-collector] Remove memory_ballast extension from default template and replace by GOMEMLIMIT #891

Closed
mx-psi opened this issue Sep 13, 2023 · 27 comments
Assignees
Labels
chart:collector Issue related to opentelemetry-collector helm chart

Comments

@mx-psi
Copy link
Member

mx-psi commented Sep 13, 2023

We want to remove the memory_ballast extension in favor of Go 1.19+ GOMEMLIMIT environment variable as stated in open-telemetry/opentelemetry-collector/issues/8343.

The first step to verify that we can effectively remove the extension is to change this on the OpenTelemetry Collector Helm chart.
We can replace the extension by a value of GOMEMLIMIT that is something like 80-90% of the total memory limit for the Collector container. Once we release this change and wait some time to gather feedback, we can move on with deprecation of the memory ballast extension.

@mx-psi
Copy link
Member Author

mx-psi commented Sep 13, 2023

cc @dmitryax @TylerHelmuth

@TylerHelmuth
Copy link
Member

@mx-psi @dmitryax i think we should implement this via a feature-gate-like config.

The first PR can add the option to use the Go setting instead of the extension, but disabled by default. After some time, we can enable it by default, but still allow users who want to use the extension to disable it. Then we'd wait until the extension is deprecated to remove the option.

@mx-psi
Copy link
Member Author

mx-psi commented Sep 14, 2023

i think we should implement this via a feature-gate-like config.

Have we done anything similar before on the Collector Helm chart? The plan sounds reasonable to me

@mx-psi
Copy link
Member Author

mx-psi commented Sep 22, 2023

@TylerHelmuth should this be a usual configuration option or is there an alternative mechanism in the Helm chart for temporary toggles?

@TylerHelmuth
Copy link
Member

We dont have anything like feature gates in the chart, we normally do it via a configuration option (or 2).

@povilasv
Copy link
Contributor

Question: what are we planning to do we do when request limits are not set?

ATM we default memorybalast to (https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_config.tpl#L25C1-L27C1) :

      memory_ballast:
        size_in_percentage: 40

and requests / limits to empty:

resources: {}
# resources:
#   limits:
#     cpu: 250m
#     memory: 512Mi

I think the only thing we can do is not set the GOMEMLIMIT env?

This would be a different behaviour from current one.

@mx-psi
Copy link
Member Author

mx-psi commented Sep 26, 2023

I think the only thing we can do is not set the GOMEMLIMIT env?

I am personally fine with that, if you explicitly unset resource limits then you should be on your own to configure this

@TylerHelmuth TylerHelmuth self-assigned this Sep 27, 2023
@TylerHelmuth TylerHelmuth added the chart:collector Issue related to opentelemetry-collector helm chart label Sep 27, 2023
@TylerHelmuth
Copy link
Member

I have started work on this

@mx-psi
Copy link
Member Author

mx-psi commented Oct 9, 2023

How much should we wait until we enable the GOMEMLIMIT option by default? Is something like one month enough?

@TylerHelmuth
Copy link
Member

@mx-psi is this helm chart feature critical for the decision on whether or not to deprecate the extension or has the decision already been made to deprecate it?

@mx-psi
Copy link
Member Author

mx-psi commented Oct 9, 2023

@mx-psi is this helm chart feature critical for the decision on whether or not to deprecate the extension or has the decision already been made to deprecate it?

I would want to validate the approach with the Helm chart before committing to deprecate the extension. I don't feel like I personally have enough knowledge about the Go garbage collector to ensure this replacement will be satisfactory for all use cases (it probably will, that's why I pushed for this, but I want to be sure before deprecating the extension).

If anybody else feels like this is clear cut, then we don't have to wait for this, but personally I feel like this is safer.

@TylerHelmuth
Copy link
Member

In that case I think we need to leave it around as long as it take to make a decision. We won't move forward with turning it always on until we're sure we are deprecating the extension.

I can enable it on our internal use of the collector, but we'll need other examples as well.

@mx-psi
Copy link
Member Author

mx-psi commented Oct 11, 2023

@JaredTan95 based on #912 it seems like you are testing this out. Would you mind sharing any findings about the setting (memory and CPU usage patterns before/after?)

I also posted a call out for more testing on a few channels on the CNCF Slack. Once we have at least two I think we can proceed with the deprecation and enable this by default.

@mx-psi
Copy link
Member Author

mx-psi commented Nov 3, 2023

We have been running Collectors using the Helm chart with useGOMEMLIMIT enabled for a few weeks on Datadog's infrastructure with a similar CPU usage as with the memory ballast and lower memory consumption (because of the memory ballast).

@krisztianfekete
Copy link
Contributor

Can you share some stats/numbers on the improvements?

@TylerHelmuth
Copy link
Member

We see similar stats at Honeycomb

@JaredTan95
Copy link
Member

JaredTan95 commented Nov 4, 2023

@JaredTan95 based on #912 it seems like you are testing this out. Would you mind sharing any findings about the setting (memory and CPU usage patterns before/after?)

I also posted a call out for more testing on a few channels on the CNCF Slack. Once we have at least two I think we can proceed with the deprecation and enable this by default.

We switched to this(useGOMEMLIMIT) configuration because of an issue with memory ballast's compatibility with our arm architecture. We haven't verified this configuration in production, but in our test environment, this feature solves the arm compatibility problem and the effect is similar to that of the original memory ballast.

So, I support it.

@TylerHelmuth
Copy link
Member

@mx-psi I'd still like to see the extension officially deprecated in Core before we make this a default value in the helm chart.

@dmitryax
Copy link
Member

dmitryax commented Nov 6, 2023

@JaredTan95 can you please share more details about memory ballast's compatibility issues with our arm architecture? It be nice to post them in open-telemetry/opentelemetry-collector#8343 as well

@mx-psi
Copy link
Member Author

mx-psi commented Nov 6, 2023

Based on the discussion on open-telemetry/opentelemetry-collector#8343 and this issue, I filed open-telemetry/opentelemetry-collector#8803 to deprecate the memory ballast extension.

@dmitryax
Copy link
Member

dmitryax commented Dec 4, 2023

I'd prefer the next step before deprecating the memory_ballast extension to be switching the useGOMEMLIMIT feature gate to true in the helm chart. That way we are not forced to do it and will have a chance to get some more feedback before the actual deprecation. I'd think we should keep useGOMEMLIMIT: true for a couple of versions, then deprecate memory_ballast extension. @mx-psi @JaredTan95 @povilasv @TylerHelmuth WDYT?

@mx-psi
Copy link
Member Author

mx-psi commented Dec 4, 2023

I am fine with that, although I think @TylerHelmuth wanted to go the other way around 😄 Up to the Helm chart maintainers I guess :)

@TylerHelmuth
Copy link
Member

Ya I was imagining after the initial testing that the helm chart would follow the guidance from Core. My though being that I didn't want to disrupt users until the component was actually deprecated.

But we could enable it by default and use helm chart users as testers (unless they switch it back to false).

@dmitryax
Copy link
Member

dmitryax commented Dec 4, 2023

But we could enable it by default and use helm chart users as testers (unless they switch it back to false).

That's what I thought. I wanted to gather as much feedback as possible before the deprecation. I'm thinking of useGOMEMLIMIT as an improvement anyway

@JaredTan95
Copy link
Member

I'd think we should keep useGOMEMLIMIT: true for a couple of versions, then deprecate memory_ballast extension

I agree! 👏

@JaredTan95
Copy link
Member

JaredTan95 commented Dec 6, 2023

@JaredTan95 can you please share more details about memory ballast's compatibility issues with our arm architecture? It be nice to post them in open-telemetry/opentelemetry-collector#8343 as well

The compatibility problem we encountered was only tested on the Chinese operating system(Kylin OS), so I think it has little reference value for users using international operating systems, so I did not put it up :-P

bogdandrutu added a commit to open-telemetry/opentelemetry-collector that referenced this issue Dec 19, 2023
**Description:** 

Based on user reports on
open-telemetry/opentelemetry-helm-charts/issues/891 and the discussion
on #8343, we can deprecate the memory ballast extension in favor of
using `GOMEMLIMIT`. This PR:

- Deprecates the memory ballast extension in the README
- Removes references to the memory ballast extension on docs
- Updates k8s example to use `GOMEMLIMIT` with the same approach as in
the Helm chart (80% of memory limit)
- Deprecates the memory ballast extension Go module

Once this PR is accepted,
open-telemetry/opentelemetry-helm-charts/issues/891 can move ahead with
enabling `useGOMEMLIMIT` by default on the Helm chart.

Other issues will be opened for opentelemetry.io, the Opentelemetry
Operator and other parts of the OpenTelemetry project to remove
references to this extension once the PR is merged.

No explicit timeline is given for removal of the extension.

**Link to tracking Issue:** Updates #8343

---------

Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
@mx-psi
Copy link
Member Author

mx-psi commented Dec 20, 2023

open-telemetry/opentelemetry-collector/pull/8803 got merged, we can revert before Jan 8th if we find issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chart:collector Issue related to opentelemetry-collector helm chart
Projects
None yet
Development

No branches or pull requests

6 participants