Implementing generalized tree queue with node state implemented by tenant-querier assignments #7873

Merged
merged 30 commits into from
Jul 5, 2024

chencs
Contributor

@chencs chencs commented Apr 10, 2024

What this PR does

This PR heavily reworks the TreeQueue data structure: each node now holds a DequeueAlgorithm, an interface whose implementations contain most of the logic for selecting which node to dequeue from and for updating state after the dequeue operation.
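
As a rough illustration of that split, here is a minimal sketch in which the tree only walks nodes and the algorithm decides which child to try and how to advance afterwards. The names `dequeueAlgorithm`, `selectChild`, `updateState`, `node`, and `roundRobin` are assumptions made for this example; they are not the identifiers or signatures used in the PR.

```go
package main

import "fmt"

// dequeueAlgorithm is a hypothetical, simplified stand-in for the interface
// described above; the method names are assumptions for this sketch.
type dequeueAlgorithm interface {
	// selectChild returns the next child to try, or nil if there is none.
	selectChild(n *node) *node
	// updateState is called after every dequeue attempt on a child.
	updateState(n *node, dequeued bool)
}

// node is a simplified tree node: leaves hold items, inner nodes hold children.
type node struct {
	name     string
	items    []string
	children []*node
	algo     dequeueAlgorithm
}

// roundRobin tries children in order and moves on after every attempt.
type roundRobin struct{ pos int }

func (r *roundRobin) selectChild(n *node) *node {
	if len(n.children) == 0 {
		return nil
	}
	return n.children[r.pos%len(n.children)]
}

func (r *roundRobin) updateState(n *node, _ bool) {
	r.pos = (r.pos + 1) % len(n.children)
}

// dequeue pops from a leaf, or delegates child selection to the algorithm.
func (n *node) dequeue() (string, bool) {
	if len(n.children) == 0 {
		if len(n.items) == 0 {
			return "", false
		}
		v := n.items[0]
		n.items = n.items[1:]
		return v, true
	}
	// Try each child at most once per dequeue call.
	for tried := 0; tried < len(n.children); tried++ {
		child := n.algo.selectChild(n)
		if child == nil {
			break
		}
		v, ok := child.dequeue()
		n.algo.updateState(n, ok)
		if ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	root := &node{
		name: "root",
		algo: &roundRobin{},
		children: []*node{
			{name: "tenant-1", items: []string{"a1", "a2"}},
			{name: "tenant-2", items: []string{"b1"}},
		},
	}
	for {
		v, ok := root.dequeue()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```

Swapping `roundRobin` for another implementation changes the dequeue order without touching the tree traversal, which is the property the description relies on.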

Why do we want to do this?

This makes it easy for us to configure the ordering of query components vs. tenant queues in the tree. Previously, if additionalQueueDimensionsEnabled was set, the tree was built like this:

root
├─ tenant-1
│  ├─ ingester
│  ├─ store-gateway
│  ├─ ingester-and-store-gateway
...
├─ tenant-n
│  ├─ ingester
│  ├─ store-gateway
│  ├─ ingester-and-store-gateway

The problem is that dequeuing always selects a tenant first, regardless of the component. As a result we cannot, for instance, reliably prioritize store-gateway queries when ingesters are experiencing heavy load.
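
The limitation can be reproduced with a small sketch of a tenant-first lookup. The `tenantQueues` type, `componentOrder`, and `dequeueTenantFirst` are hypothetical names for this illustration only, and the nested map is a simplification of the real tree.

```go
package main

import "fmt"

// tenantQueues is a hypothetical stand-in for the tenant-first tree:
// tenant -> query component -> queued queries.
type tenantQueues map[string]map[string][]string

// componentOrder is the order in which a tenant's component queues are tried.
var componentOrder = []string{"ingester", "store-gateway", "ingester-and-store-gateway"}

// dequeueTenantFirst is handed an already selected tenant and only then looks
// at that tenant's component queues, so the tenant choice limits the component.
func dequeueTenantFirst(q tenantQueues, tenant string) (component, query string, ok bool) {
	for _, c := range componentOrder {
		if items := q[tenant][c]; len(items) > 0 {
			q[tenant][c] = items[1:]
			return c, items[0], true
		}
	}
	return "", "", false
}

func main() {
	q := tenantQueues{
		"tenant-1": {"ingester": {"q1"}},
		"tenant-2": {"store-gateway": {"q2"}},
	}
	// Suppose ingesters are overloaded and it is tenant-1's turn: the only
	// work queued under tenant-1 is an ingester query, so that is what is
	// returned, even though a store-gateway query waits under tenant-2.
	c, query, _ := dequeueTenantFirst(q, "tenant-1")
	fmt.Println(c, query)
}
```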

With this change, if both additionalQueueDimensionsEnabled and prioritizeQueryComponents are set, the tree looks like this:

root
├─ ingester
│  ├─ tenant-1
│  ├─ ...
│  ├─ tenant-n
├─ store-gateway
│  ├─ tenant-1
│  ├─ ...
│  ├─ tenant-n
├─ ingester-and-store-gateway
│  ├─ tenant-1
│  ├─ ...
│  ├─ tenant-n

With this tree configuration, we can ensure that something is always dequeued from underloaded components if queues exist for those components. Tenant fairness is harder to guarantee, since each component may hold a different subset of tenant queues, but we mitigate this by having all query component nodes share a global tenant order.

Note that this PR does not change the tree prioritization: prioritizeQueryComponents is currently false everywhere.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@chencs chencs force-pushed the casie/tree-queue-sketch branch 2 times, most recently from a580731 to b2ed920 Compare June 3, 2024 20:56
@chencs chencs force-pushed the casie/tree-queue-sketch branch 2 times, most recently from d8691b8 to 64b1ac8 Compare June 10, 2024 22:15
@chencs chencs changed the title DRAFT: Implementing tree queue nodes that are aware of tenant-querier relationship Implementing generalized tree queue that with node state implemented by tenant-querier assignments Jun 10, 2024
@chencs chencs force-pushed the casie/tree-queue-sketch branch from ccfadcf to 6294be7 Compare June 10, 2024 23:22
@chencs chencs marked this pull request as ready for review June 10, 2024 23:29
@chencs chencs requested review from a team and jdbaldry as code owners June 10, 2024 23:29
@chencs chencs force-pushed the casie/tree-queue-sketch branch 4 times, most recently from 7aaea6f to f5f461a Compare June 11, 2024 19:17
Comment on lines 211 to 215
} else if dequeueNode == nil {
	// no dequeue-able child found; reset checked children to 0,
	// as we won't update state before moving on
	n.childrenChecked = 0
	return path, v
Member

@francoposa francoposa Jun 11, 2024

IMHO childrenChecked should be algorithm state, not tree state: all uses of it are in the algorithm, not the tree, and we can just call the state update before this early return.

It's the algo's job rather than the tree's job to a) decide if it's done selecting nodes and b) set itself to the correct state after a successful or failed dequeue.

Also there's some awkwardness with having !checkedAllNodes be a condition on the outer loop, but it never comes into play there because we will never return to the top of the loop with a case where v is nil and checkedAllNodes is true, because we return early here.
We should do one or the other - don't early return here, or don't check that flag in the outer loop

Contributor Author

@chencs chencs Jun 11, 2024

I considered keeping childrenChecked in the state, but decided against it in the end. With tenantQuerierAssignments, we explicitly take advantage of things in state being shared across all nodes at the same level. Things like childrenChecked (and, e.g., queuePosition) apply to/are maintained by individual nodes, not shared state, so it seemed counterintuitive and possibly error-prone for implementers of DequeueAlgorithm to put them in algorithm state.

I think you're right about not returning early, I'll fix that.
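
The distinction between per-node bookkeeping and state shared across a level can be sketched as follows. The `levelState` type and the exact fields are assumptions for illustration; only the names childrenChecked and queuePosition come from the discussion above.

```go
package main

import "fmt"

// levelState is a hypothetical stand-in for algorithm state such as
// tenantQuerierAssignments: one value shared by every node at a tree level.
type levelState struct {
	tenantOrder []string
}

// node keeps its own traversal bookkeeping next to a pointer to shared state.
type node struct {
	name            string
	childrenChecked int // per node: children tried during the current dequeue
	queuePosition   int // per node: where this node resumes its rotation
	shared          *levelState
}

func main() {
	shared := &levelState{tenantOrder: []string{"tenant-1"}}
	ingester := &node{name: "ingester", shared: shared}
	storeGateway := &node{name: "store-gateway", shared: shared}

	// A change to shared state is visible from every node at the level.
	ingester.shared.tenantOrder = append(ingester.shared.tenantOrder, "tenant-2")
	// Per-node counters do not leak to sibling nodes.
	ingester.childrenChecked = 2

	fmt.Println(len(storeGateway.shared.tenantOrder), storeGateway.childrenChecked)
}
```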

@chencs chencs force-pushed the casie/tree-queue-sketch branch from 388fdb6 to ed1d14f Compare June 11, 2024 22:45
Member

@jdbaldry jdbaldry left a comment

Docs LGTM

@chencs chencs force-pushed the casie/tree-queue-sketch branch 3 times, most recently from 1175dfd to 6cba727 Compare June 12, 2024 19:35
@francoposa
Member

(done looking at the smaller stuff for now, now to deep dive on the tests)

@chencs chencs force-pushed the casie/tree-queue-sketch branch from 15897c1 to c46f9ff Compare June 12, 2024 21:26
@chencs chencs force-pushed the casie/tree-queue-sketch branch from c46f9ff to a05f971 Compare June 12, 2024 21:33
tenant *queueTenant
}

func assertExpectedValuesOnDequeue(t *testing.T, qb *queueBroker, lastTenantIndex int, querierID QuerierID, expectedVals []dequeueVal) int {
Member

big fan of this approach

@chencs chencs force-pushed the casie/tree-queue-sketch branch from 3a7af96 to c56cdfc Compare June 26, 2024 19:33
pkg/scheduler/queue/tenant_queues.go (outdated, resolved)
Comment on lines 57 to 67
if useMultiAlgoTreeQueue {
	tree, err = NewTree(
		tqas,               // root; QueuingAlgorithm selects tenants
		&roundRobinState{}, // tenant queues; QueuingAlgorithm selects query component
		&roundRobinState{}, // query components; QueuingAlgorithm selects query from local queue
	)
	// An error building the tree is fatal; we must panic
	if err != nil {
		panic(fmt.Sprintf("error creating the tree queue: %v", err))
	}
} else {
Contributor

I was expecting to see another case for when prioritizeQueryComponents is true. Is that missing?

Contributor Author

I think Franco (or maybe you) gave feedback that this version of the tree should be a complete no-op, and that we shouldn't introduce a config option without at least some test coverage. IIRC, the conclusion was that, once we have verified that the new tree works under production load and have migrated to it, we would:

  • introduce prioritizeQueryComponents as a config option (along with comprehensive test coverage which doesn't currently exist)
  • eventually deprecate the current TreeQueue data structure

pkg/scheduler/queue/tenant_queues.go (resolved)
@chencs chencs changed the title Implementing generalized tree queue that with node state implemented by tenant-querier assignments Implementing generalized tree queue with node state implemented by tenant-querier assignments Jun 27, 2024
Contributor

@tacole02 tacole02 left a comment

LGTM :)

@chencs chencs force-pushed the casie/tree-queue-sketch branch 2 times, most recently from ca72570 to a121631 Compare July 1, 2024 22:53
@chencs chencs force-pushed the casie/tree-queue-sketch branch from a121631 to bae99c7 Compare July 1, 2024 23:00
CHANGELOG.md (outdated, resolved)
@chencs chencs force-pushed the casie/tree-queue-sketch branch from 02d975e to a12ded4 Compare July 2, 2024 19:58
Member

@francoposa francoposa left a comment

mostly just looking for more comments / clarification on some test cases

I am not sure I have looked closely enough to confidently approve, considering how big it is. I think running it locally with and without the flag would get me there, but in any case I am most comfortable with Patrick being the one to give final approval.

@tacole02 tacole02 added the type/docs Improvements or additions to documentation label Jul 2, 2024
@chencs chencs merged commit 40cfafd into main Jul 5, 2024
29 checks passed
@chencs chencs deleted the casie/tree-queue-sketch branch July 5, 2024 19:29
Labels
release/notified-changelog-cut type/docs Improvements or additions to documentation
6 participants