-
Notifications
You must be signed in to change notification settings - Fork 542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing generalized tree queue with node state implemented by tenant-querier assignments #7873
Conversation
a580731
to
b2ed920
Compare
d8691b8
to
64b1ac8
Compare
ccfadcf
to
6294be7
Compare
7aaea6f
to
f5f461a
Compare
pkg/scheduler/queue/tree_queue.go
Outdated
} else if dequeueNode == nil { | ||
// no dequeue-able child found; reset checked children to 0, | ||
// as we won't update state before moving on | ||
n.childrenChecked = 0 | ||
return path, v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO childrenChecked
should be algorithm state, not tree state - all uses of it are in the algorithm, not the tree and we can just call to update the state here before this early return here.
It's the algo's job rather than the tree's job to a) decide if it's done selecting nodes and b) set itself to the correct state after a successful or failed dequeue.
Also there's some awkwardness with having !checkedAllNodes
be a condition on the outerloop, but it never comes into play there because we will never return to the top of the loop with a case where where v is nil and checkedAllNodes
is true - because we early return here.
We should do one or the other - don't early return here, or don't check that flag in the outer loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered keeping childrenChecked
in the state, but decided against it in the end. With tenantQuerierAssignments
, we explicitly take advantage of things in state being shared across all nodes at the same level. Things like childrenChecked
(and, e.g., queuePosition
) apply to/are maintained by individual nodes, not shared state, so it seemed counterintuitive and possibly error-prone for implementers of DequeueAlgorithm
to put them in algorithm state.
I think you're right about not returning early, I'll fix that.
388fdb6
to
ed1d14f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs LGTM
1175dfd
to
6cba727
Compare
(done looking at the smaller stuff for now, now to deep dive on the tests) |
15897c1
to
c46f9ff
Compare
c46f9ff
to
a05f971
Compare
tenant *queueTenant | ||
} | ||
|
||
func assertExpectedValuesOnDequeue(t *testing.T, qb *queueBroker, lastTenantIndex int, querierID QuerierID, expectedVals []dequeueVal) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
big fan of this approach
3a7af96
to
c56cdfc
Compare
pkg/scheduler/queue/tenant_queues.go
Outdated
if useMultiAlgoTreeQueue { | ||
tree, err = NewTree( | ||
tqas, // root; QueuingAlgorithm selects tenants | ||
&roundRobinState{}, // tenant queues; QueuingAlgorithm selects query component | ||
&roundRobinState{}, // query components; QueuingAlgorithm selects query from local queue | ||
) | ||
// An error building the tree is fatal; we must panic | ||
if err != nil { | ||
panic(fmt.Sprintf("error creating the tree queue: %v", err)) | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting to see another case for when prioritizeQueryComponents
is true. Is that missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Franco (or maybe you) gave some feedback that this version of the tree should be a complete no-op, and we shouldn't introduce a config option without at least some test coverage. IIRC, the conclusion was, after we verify that the new tree works under production load/migrating to using it, to:
- introduce
prioritizeQueryComponents
as a config option (along with comprehensive test coverage which doesn't currently exist) - eventually deprecate the current
TreeQueue
data structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
ca72570
to
a121631
Compare
a121631
to
bae99c7
Compare
02d975e
to
a12ded4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly just looking for more comments / clarification on some test cases
I am not sure I have looked quite closely enough to confidently approve considering how big it is, I think maybe just running it locally with and without the flag would get me there, but in any case I am most comfortable with Patrick being the one to give final approval
What this PR does
This PR heavily edits the
TreeQueue
data structure -- nodes now contain aDequeueAlgorithm
interface, which, when implemented, manages the bulk of logic used to select nodes to dequeue from, and how to update state after the dequeue operation.Why do we want to do this?
This makes it easy for us to configure the ordering of query components vs. tenant queues in the tree. The previous state is that, if
additionalQueueDimensionsEnabled
, the tree would be built as such:The problem with this is that we will always dequeue from a tenant, regardless of the component. Thus, we cannot, for instance, reliably prioritize
store-gateway
queries when ingesters are experiencing heavy load.With this change, if
additionalQueueDimensionsEnabled
andprioritizeQueryComponents
, the tree would look like:With this tree configuration, we can ensure that something is always dequeued from underloaded components if queues exist for those components. Tenant fairness is less guaranteed, since each component may have a different subset of tenant queues, but we mitigate this by having query component nodes share a global tenant order.
Note that this PR does not change the tree prioritization -- prioritizeQueryComponents is currently false everywhere
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.