-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loki: Add a ring to the query scheduler to allow discovery via the ring as an alternative to DNS #4424
Conversation
9469fcd
to
d5a754e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea but there seems to be an awful lot of copy pasta. What is the reason not to add it to Cortex?
@@ -0,0 +1,75 @@ | |||
package frontend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the package path stuttering? Why not pkg/loki/frontend/config.go
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is done mostly for simplicity, lokifrontend/config.go already existed and hopefully when we upstream these changes and removed the forked code it can just remain the way it was.
) | ||
|
||
// GrpcRoundTripper is similar to http.RoundTripper, but works with HTTP requests converted to protobuf messages. | ||
type GrpcRoundTripper interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something we could add to dskit instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could eventually be moved to dskit (as it was taken verbatim from cortex), but I don't think that should be in scope for this PR.
@@ -0,0 +1,352 @@ | |||
package v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the change to the Cortex version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the question.
|
||
"github.com/cortexproject/cortex/pkg/frontend/v2/frontendv2pb" | ||
"github.com/cortexproject/cortex/pkg/querier/stats" | ||
"github.com/cortexproject/cortex/pkg/ring" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using ring from dskit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because Loki hasnt been updated to use the ring from dskit as a whole, that should be done as a separate step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking my approval of the docs part of this PR.
At some point, it'd be super to correct all the grammar and punctuation and improve the wording of all the configuration knobs. We'll do that in the code and then autogenerate this list of knobs.
087b824
to
c6ae5a4
Compare
c6ae5a4
to
8c14e31
Compare
I tried to explain in the description, copying is intentional to prove the concept before upstreaming, also with the refactoring taking place with dskit it would be difficult to make this change upstream and revendor it within the timeframe I'd like to see this working for Loki. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-review: I've added quite a few suggestions. They are not required, but would be very nice in the documentation for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, excited to get this merged!
My biggest question is around the change to module dependencies. I'm curious why we have to make the Querier and Frontend dependent on the Scheduler? If we need to do this, there are some configurations we're accounting for in other places that are no longer possible. It also seems to remove the need for the v1 Frontend altogether?
pkg/loki/loki.go
Outdated
@@ -432,16 +433,16 @@ func (t *Loki) setupModuleManager() error { | |||
Distributor: {Ring, Server, Overrides, TenantConfigs}, | |||
Store: {Overrides}, | |||
Ingester: {Store, Server, MemberlistKV, TenantConfigs}, | |||
Querier: {Store, Ring, Server, IngesterQuerier, TenantConfigs}, | |||
Querier: {QueryScheduler, Store, Ring, Server, IngesterQuerier, TenantConfigs}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is confusing to me. Do we in fact always want to spin up a scheduler when we spin up a querier? I'm missing the technical reason why, as it seems like the worker config can take a nil
for the query schedule ReadRing
. Is it an order thing, where when we do have both a querier and a scheduler we need to make sure the scheduler starts up first?
By making this change is seems like we would prevent the need to check for cfg.FrontendAddress
as you would always have a scheduler, so you would probably want to either use the scheduler ring or scheduler address on localhost if none other was specified.
This change seems to make it impossible to have a frontend + querier configuration, or to have frontend + scheduler pods be separate from querier pods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does that mean querierRunningStandalone
will never return true
anymore?
pkg/loki/loki.go
Outdated
QueryFrontendTripperware: {Server, Overrides, TenantConfigs}, | ||
QueryFrontend: {QueryFrontendTripperware}, | ||
QueryScheduler: {Server, Overrides}, | ||
QueryFrontend: {QueryScheduler, QueryFrontendTripperware}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again here, I'm confused why the scheduler needs to become a dependency of the frontend?
pkg/loki/modules.go
Outdated
@@ -416,7 +417,7 @@ func (t *Loki) initQueryFrontend() (_ services.Service, err error) { | |||
FrontendV1: t.Cfg.Frontend.FrontendV1, | |||
FrontendV2: t.Cfg.Frontend.FrontendV2, | |||
DownstreamURL: t.Cfg.Frontend.DownstreamURL, | |||
}, disabledShuffleShardingLimits{}, t.Cfg.Server.GRPCListenPort, util_log.Logger, prometheus.DefaultRegisterer) | |||
}, t.queryScheduler.ReadRing(), disabledShuffleShardingLimits{}, t.Cfg.Server.GRPCListenPort, util_log.Logger, prometheus.DefaultRegisterer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this call site is getting pretty big, how would you feel about pulling frontend.CombinedFrontendConfig
into a local variable and/or moving the arguments to their own lines?
// is configured, Loki will default to using the frontend on localhost on it's own GRPC listening port. | ||
if (*cfg.QuerierWorkerConfig).FrontendAddress == "" && (*cfg.QuerierWorkerConfig).SchedulerAddress == "" { | ||
if cfg.SchedulerRing == nil && (*cfg.QuerierWorkerConfig).FrontendAddress == "" && (*cfg.QuerierWorkerConfig).SchedulerAddress == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depending on how we resolve the dependency changes (ie. if the querier really does depend on the scheduler), then we may want to set the scheduler address instead of the frontend address on this condition.
if r.Worker.FrontendAddress == "" && | ||
r.Worker.SchedulerAddress == "" && | ||
r.Frontend.FrontendV2.SchedulerAddress == "" { | ||
r.QueryScheduler.UseSchedulerRing = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add a quick test for this in config_wrapper_test.go
?
@@ -87,5 +95,6 @@ func applyMemberlistConfig(r *ConfigWrapper) { | |||
r.Ingester.LifecyclerConfig.RingConfig.KVStore.Store = memberlistStr | |||
r.Distributor.DistributorRing.KVStore.Store = memberlistStr | |||
r.Ruler.Ring.KVStore.Store = memberlistStr | |||
r.QueryScheduler.SchedulerRing.KVStore.Store = memberlistStr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add a quick test for this in config_wrapper_test.go
? there's a few other memberlist tests so maybe just adding to an existing for copy/pasing?
@slim-bean I reworked the module dependencies, and have been testing locally where I have the |
d6d3387
to
84fb418
Compare
@slim-bean ok, now I think it's actually ready to go, got the nil pointer issue resolved when running the querier or frontend without a co-located scheduler. I'll babysit this through the CI checks, but then we should be good for some more reviews. |
16ae43c
to
61d035f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving my own changes to the module dependency issue, but would like a review from others.
…worker and querier workers to find the scheduler address as an alternative to using DNS
…ince #4312 was merged after the original fork was created.
Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>
Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>
A few doc fixes Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
5b83af8
to
2553167
Compare
To support this change, several packages were forked from upstream Cortex. This is intended to be a transient fork to validate these changes in Loki before upstreaming.
The first commit is forking the files, the second commit includes all the changes.
The newly added ring watcher leverages the existing infrastructure which used DNS for finding the scheduler, there are some interfaces and config options I re-used for the the new ring watcher which are now not the most consistently named.
For example, the poll interval on the ring still uses the existing DNS named config options for polling, e.x.
dns_lookup_duration
I debated introducing new config options but currently am of the opinion this just makes the config more confusing and cumbersome. Instead explaining the dual functionality in the documentation.
Checklist