Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Scheduled Actions V2] state machine protobufs WIP #6901

Merged
merged 5 commits into from
Jan 16, 2025
Merged

Conversation

lina-temporal
Copy link
Contributor

@lina-temporal lina-temporal commented Nov 27, 2024

What changed?

  • initial protobufs for the V2 scheduler

Why?

  • these are incomplete, and written as-needed in tandem with the implementation. I believe this is a good starting point, as it's the state I need to handle the basic case of buffering scheduled actions.

How did you test it?

  • No tests for the protobufs themselves

Potential risks

  • New fields in new structs, except for RequestId on BufferedStart. As a completely new field the V1 scheduler won't look at, I don't think there's a risk.

Comment on lines 63 to 64
// Generator is buffering actions.
SCHEDULER_GENERATOR_STATE_BUFFERING = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my review of #6905, you don't need this separate state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will remove.

@@ -47,6 +47,47 @@ enum SchedulerState {
SCHEDULER_STATE_EXECUTING = 2;
}

enum Scheduler2State {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need ScheduleState anymore. It's part of Tianyu's work that we're replacing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was planning to remove Tianyu's prototype as part of a later PR; I can send it now and fix this up so we don't have the naming conflict.

Comment on lines 71 to 76
// Executor is awaiting actions to be buffered and eligible for execution.
SCHEDULER_EXECUTOR_STATE_WAITING = 1;
// Executor is starting actions.
SCHEDULER_EXECUTOR_STATE_EXECUTING = 2;
// Executor is backing off from executing actions.
SCHEDULER_EXECUTOR_STATE_BACKING_OFF = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need two distinct states for waiting and backing off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think they can be removed.

Comment on lines 83 to 88
// Backfiller is awaiting backfill to be requested.
SCHEDULER_BACKFILLER_STATE_WAITING = 1;
// Backfiller is actively starting actions.
SCHEDULER_BACKFILLER_STATE_EXECUTING = 2;
// Backfiller is backing off from starting actions.
SCHEDULER_BACKFILLER_STATE_BACKING_OFF = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@@ -48,6 +48,9 @@ message BufferedStart {
temporal.api.enums.v1.ScheduleOverlapPolicy overlap_policy = 3;
// Trigger-immediately or backfill
bool manual = 4;
// An ID generated when the action is buffered for deduplication during
// execution. Only used by the V2 Scheduler (otherwise left empty).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth calling the scheduler "state machine scheduler" instead of v2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's probably a more lasting name for it :)

@@ -155,3 +158,59 @@ message HsmSchedulerState {
google.protobuf.Timestamp next_invocation_time = 3;

}

// V2 Scheduler state
message HsmSchedulerV2State {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove Hsm from the name since it's an implementation detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also remove the name State from all of these messages to avoid confusion with the enums that are similarly named.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will fix both

message HsmSchedulerV2State {
temporal.server.api.enums.v1.Scheduler2State state = 1;

// scheduler request parameters and metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please be consistent in docstrings. It's good practice to capitalize first letters in sentences and use punctuation.

string schedule_id = 7;

// Implemented as a sequence number. Useful for substate machines to
// invalidate transactions based on update requests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'd say we're invalidating transactions, more like invaliding "work" or task, right? but also used as an optimistic locking mechanism for concurrent update requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used for optimistic locking, yeah; I'm trying to give an example of the specific sort of condition that would bump the token in each different struct. If that's confusing, I can simplify the comment.

Comment on lines 191 to 193
// Implemented as a sequence number. Useful for invalidating a stale
// Generator persisted state write.
int64 conflict_token = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... can we use the conflict token of the scheduler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to avoid making an assumption that we can have transactions across multiple columns/documents in CHASM, since I don't think we have that in HSM today (with MachineTransition operating on a single item). If our framework lets me do a transaction across multiple state machines, we could use only the one in the top-level scheduler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MachineTransition is a single transition within a transaction on the entire tree. You can rely on that here. CHASM will have similar semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - will update!

@@ -47,6 +47,42 @@ enum SchedulerState {
SCHEDULER_STATE_EXECUTING = 2;
}

enum Scheduler2State {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems redundant if we only have one state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove (here and for the other mono-state machines).

}

// State for the state machine scheduler's Generator.
enum SchedulerGeneratorState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

}

// State for the state machine scheduler's Backfiller.
enum SchedulerBackfillerState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I thought the backfiller is just another type of generator that sends requests to the executor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are, but because backfills have a shared backoff timer and batch size, my inclination is to have a single Backfiller node responsible for all ongoing backfills. Since the Backfiller would then be around for 0..n possible backfills, I think we'd want the distinct "wait an indeterminate amount of time" sleep state. Once a backfill is active, I'll structure it similar to the Generator, where Backfiller will set a delay on a repeated timer task until complete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI don't necessarily take that backfill logic as hard requirements. It was just some vaguely plausible thing I came up with, that was a strict improvement over "all backfills run synchronously".

Feel free to do whatever is more natural for this structure, which could be one node per concurrent backfill. The only real requirement is that the total rate limit is respected, and ongoing backfills don't "interfere" with regularly scheduled runs, i.e. they get a lower effective priority. I did that with the "half the buffer size" thing and BackfillsPerIteration but there may be better ways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hadn't realized it wasn't a requirement. In that case, multiple backfillers is probably a lot more ergonomic. I'll update the protos.

Comment on lines +167 to +165
// Scheduler request parameters and metadata.
temporal.api.schedule.v1.Schedule schedule = 2;
temporal.api.schedule.v1.ScheduleInfo info = 3;
temporal.api.schedule.v1.SchedulePatch initial_patch = 4;

// State common to all generators is stored in the top-level machine.
string namespace = 5;
string namespace_id = 6;
string schedule_id = 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these docstrings will only be attached to the field in the following line.
You'll want to ensure that all fields have docstrings that directly apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware of this, but it's a pervasive style throughout the codebase (even in this same file); what would you suggest instead? Freely floating the comment above the block so it isn't applied?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but I think that should work.

// Scheduler request parameters and metadata.
temporal.api.schedule.v1.Schedule schedule = 2;
temporal.api.schedule.v1.ScheduleInfo info = 3;
temporal.api.schedule.v1.SchedulePatch initial_patch = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... why is this the initial patch? Where do we store the "latest" patch? Is that required? I'm not super familiar with the business logic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial_patch is only set through the handler's CreateWorkflow operation, so there's no concept of a "latest" patch. Past CreateWorkflow, a user would start a backfill instead.

}

// State machine scheduler's Backfiller internal state.
message BackfillerInternal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was imagining that we'd have multiple backfillers and each would have their own backfill config attached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, ended up updating it to multiple backfillers.

@@ -155,3 +159,55 @@ message HsmSchedulerState {
google.protobuf.Timestamp next_invocation_time = 3;

}

// State machine scheduler internal state.
message SchedulerInternal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you probably don't want to type Internal every time in the Go code, it should be implied IMHO. Up to you.

@lina-temporal lina-temporal marked this pull request as ready for review January 6, 2025 23:07
@lina-temporal lina-temporal requested review from bergundy and dnr January 6, 2025 23:08
Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the name Internal for these messages, maybe Info or Data? I kinda wish we'd used Status for enums and State for data but not blocking the PR.

@lina-temporal lina-temporal enabled auto-merge (squash) January 16, 2025 22:47
@lina-temporal lina-temporal merged commit 2497e58 into main Jan 16, 2025
50 checks passed
@lina-temporal lina-temporal deleted the sched2_proto branch January 16, 2025 23:06
stephanos pushed a commit to stephanos/temporal that referenced this pull request Jan 17, 2025
…6901)

## What changed?
- initial protobufs for the V2 (CHASM) scheduler

## Why?
- initial pass at internal protobufs for CHASM scheduler

## How did you test it?
- No tests for the protobufs themselves

## Potential risks
- New fields in new structs, **except** for `RequestId` on
`BufferedStart`. As a completely new field the V1 scheduler won't look
at, I don't think there's a risk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants