New ECS health check extension #5144
# Conflicts: # go.mod
# Conflicts: # go.mod # internal/components/components.go
# Conflicts: # extension/awsecshealthcheckextension/go.mod # extension/awsecshealthcheckextension/go.sum # internal/components/components.go
Hello, open question about this interesting new extension!
LGTM.
So basically OT already has a health check extension for general usage, and we created this new one because we want to set some ECS-specific health conditions. That's why it's an ECS health check extension.
}

func (hc *ecsHealthCheckExtension) Start(_ context.Context, host component.Host) error {
	hc.logger.Info("Starting ECS health check extension", zap.Any("config", hc.config))
Does this really need to emit the entire config at Info? Maybe have a separate entry with it at Debug if it is useful?
I think it's fine; it just tells the user that this extension is starting.
I'll restate:
This does not need to emit the entire config at Info? Have a separate entry with it at Debug if it is useful?
	hc.exporter.mu.Lock()
	defer hc.exporter.mu.Unlock()

	return hc.config.ExporterErrorLimit >= len(hc.exporter.exporterErrorQueue)
Don't access internals of the exporter like this. Add an accessor that can handle the locking itself.
This exporter is embedded in the extension and not used anywhere else, so I think it's OK to lock and unlock like this.
That may be the current situation, but it is not good practice and it may not always be the case. If it is changed in the future then the person who changes it may not be aware of this use.
In that case, let me copy in my comment. I think there is a recurring theme that people wonder why we are adding this extension since it doesn't have ECS-specific logic at all. We should all work together to improve the default health check instead as a community.
# Conflicts: # cmd/configschema/go.mod # go.mod
Moreover, healthcheckextension is now on contrib instead of core, so it's probably more acceptable to make some eventual breaking changes to make it more complete and adapted to any situation.
8321cd0 to ff20722
ff20722 to bdad8cf
I have to echo the sentiments of @anuraaga and others. I don't see why this is positioned as an ECS health check and not an extension to the general health check. This provides a seemingly general capability and should not take the risk of deterring potential users by mislabeling itself.
@skyduo I agree with @Aneurysm9, @anuraaga, and others that this is a good addition to the existing healthcheck extension. I don't think it needs to be AWS ECS specific. We don't want to be in the business of maintaining custom components wherever possible.
@alolita @Aneurysm9 @anuraaga @skyduo I agree that this should be an addition/extra feature inside the existing health check extension. IIRC, this is what we proposed when we first discussed our need/use case in a SIG meeting. However, the exact use case of making the health check fail if the exporter has errors was said to be very ECS specific by the folks in that meeting, so we were asked to pursue this route. If I remember correctly, @tigrannajaryan and @bogdandrutu recommended that approach. I am happy if the community prefers we put all health check functionality in one extension, but I want to double check that all stakeholders have weighed in.
Yeah, I have no objection to making it a general one, as Wesley said, if everyone agrees with it.
@PettitWesley @skyduo - thanks for clarifying. Will bring this up for discussion with the maintainers.
// initExporter function could register the customized exporter
func (hc *ecsHealthCheckExtension) initExporter() error {
	hc.exporter = newECSHealthCheckExporter()
	view.RegisterExporter(hc.exporter)
Can this be unregistered when the extension is shut down?
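A sketch of the lifecycle this question implies: register on start, unregister on shutdown. The OpenCensus view package does provide view.UnregisterExporter alongside view.RegisterExporter; to keep the example self-contained, a small in-file registry stands in for that global state. Every type and method name here is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// exporter is a hypothetical stand-in for the PR's health check exporter.
type exporter struct{ name string }

// registry stands in for the global exporter set that
// view.RegisterExporter and view.UnregisterExporter manage.
type registry struct {
	mu        sync.Mutex
	exporters map[*exporter]struct{}
}

func newRegistry() *registry {
	return &registry{exporters: make(map[*exporter]struct{})}
}

func (r *registry) register(e *exporter) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.exporters[e] = struct{}{}
}

func (r *registry) unregister(e *exporter) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.exporters, e)
}

func (r *registry) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.exporters)
}

// extension pairs the registration in Start with a removal in Shutdown.
type extension struct {
	reg      *registry
	exporter *exporter
}

func (hc *extension) Start() {
	hc.exporter = &exporter{name: "health-check"}
	hc.reg.register(hc.exporter)
}

// Shutdown is safe to call more than once and before Start.
func (hc *extension) Shutdown() {
	if hc.exporter == nil {
		return
	}
	hc.reg.unregister(hc.exporter)
	hc.exporter = nil
}

func main() {
	reg := newRegistry()
	hc := &extension{reg: reg}
	hc.Start()
	fmt.Println("registered exporters after Start:", reg.count())
	hc.Shutdown()
	fmt.Println("registered exporters after Shutdown:", reg.count())
}
```

In the PR's code the equivalent change would presumably be a view.UnregisterExporter(hc.exporter) call in the extension's shutdown path, so a stopped extension does not keep receiving view data.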
Based on the discussion in the SIG meeting, we will move the new health check strategy to the existing health check extension.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Should this be closed in favor of the other PR?
This PR was marked stale due to lack of activity. It will be closed in 7 days.
I'm closing this in favor of the other PR.
Because, most likely for the moment only otlp exporter/receiver is using this package, we can remove this after just one version, no third-party deps. Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Description:
This is a new health check extension for AWS ECS. We are going to monitor the OT collector's health through the status returned by the new extension's endpoint. The detailed design is in the design doc below.
Link to tracking Issue:
open-telemetry/opentelemetry-collector#2573
Testing:
make otelcontribcol
and run the executable file
Documentation:
https://docs.google.com/document/d/1SpUMsWA2DeaoVazeQ8uEc1Wvu5LphmQU_TjzLmuJ4QM/edit#heading=h.rs1luwizct2w
Original PR, which has some discussion with @bogdandrutu, and previous PR #4451