ingester: don't log errors that cause OOMs, using interface #5581
LGTM. I like this approach compared to the other one (adding `DoNotLogError` in common).
@@ -17,6 +17,11 @@ const (
	errorKey = "err"
)

// If an error implements Observe(), it will get called and GRPCServerLog will do nothing.
type Observer interface {
Can we document (or codify) some of the expectations of implementers of this? For example, we expect if you implement `Observer` you also implement `error` and have a `.Unwrap()` method.
> we expect if you implement Observer you also implement error and have a .Unwrap() method

I don't think we expect that, right? You'll have to implement `error` because this is checked on an error, but we don't need that. And there's no need for the `Unwrap()` method.
Also looking at #5584, I’m not a big fan of `Observe()`. We’re going to have the logging logic scattered around. I’m wondering if we could simply achieve the same with a function which tells whether the error should be logged or not (return bool), instead of delegating the log to the error's `Observe()`.
How about |
pkg/distributor/distributor.go
Outdated
@@ -45,6 +44,7 @@ import (
	"github.com/grafana/mimir/pkg/mimirpb"
	"github.com/grafana/mimir/pkg/util"
	"github.com/grafana/mimir/pkg/util/globalerror"
	log_util "github.com/grafana/mimir/pkg/util/log"
`util_log` for consistency.
Looks better to me. Do we need to pass |
I took the five parameters from the previous version, and tried to imagine which ones could be used. I know it's YAGNI, but we can never change that interface (that would silently break all users). |
This allows us to decorate them with extra information for gRPC responses and our logging middleware (to prevent them from being logged, increasing resource usage). Related #5581 Related weaveworks/common#299 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@@ -91,9 +83,6 @@ func (l Log) Wrap(next http.Handler) http.Handler {
	statusCode, writeErr := wrapped.getStatusCode(), wrapped.getWriteError()

	if writeErr != nil {
		if errors.Is(writeErr, DoNotLogError{}) {
Isn't this required anymore? Can you explain why it was removed from here?
This error is the return from `Write()`; I don't see how it can be one of our wrapped errors.
Also I looked at the high-volume log lines that have been problematic and they were all from grpc_logging, none from http request logging.
I pushed a tiny change to a test, to improve it. Hope you don't mind: 0108bb6 |
LGTM (modulo a question)
|
I agree (also mentioned in the related design doc). I was waiting for the next PR that will introduce the sampling, to stress about this ;) |
Upstream PR in weaveworks/common has merged, so I rebased. Please check I didn't drop any of your commits.
PR #5584 is sampling 6 or so errors. |
// Use a fork of weaveworks/common while we work out if there is a better design for https://github.com/weaveworks/common/pull/293
replace github.com/weaveworks/common => github.com/weaveworks/common v0.0.0-20230714173453-d1f8877b91ce
Note to reviewers: removed because weaveworks/common#299 has been merged.
Using weaveworks/common#299 which is a more flexible approach. Note the check disappeared from `logging.go`, because it was a mistake to check that error. It comes from `io.Writer`, so it won't be an app-level error.
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…gging for them (#5585) This allows us to decorate them with extra information for gRPC responses and our logging middleware (to prevent them from being logged which is expensive). Related #5581 Related weaveworks/common#299
What this PR does
Not-ready and over-max-inflight errors can be emitted when the server is overloaded, in which case we don't want to spend more resources logging the fact.
The max-inflight errors will be recorded in metrics added in #5551.
Temporarily using a branch from upstream weaveworks/common, to check Mimir still passes CI.
Update on #5494
Checklist
`CHANGELOG.md` updated