[Scheduled Actions V2] state machine protobufs WIP #6901

lina-temporal · 2024-11-27T23:29:11Z

What changed?

Initial protobufs for the V2 scheduler.

Why?

These are incomplete and were written as needed, in tandem with the implementation. I believe this is a good starting point, as it's the state I need to handle the basic case of buffering scheduled actions.

How did you test it?

No tests for the protobufs themselves

Potential risks

New fields in new structs, except for RequestId on BufferedStart. Since it's a completely new field that the V1 scheduler won't look at, I don't think there's a risk.

bergundy · 2024-12-02T19:13:44Z

proto/internal/temporal/server/api/enums/v1/common.proto

+    // Generator is buffering actions.
+    SCHEDULER_GENERATOR_STATE_BUFFERING = 2;


Based on my review of #6905, you don't need this separate state.

Agreed, will remove.

bergundy · 2024-12-02T19:15:22Z

proto/internal/temporal/server/api/enums/v1/common.proto

@@ -47,6 +47,47 @@ enum SchedulerState {
    SCHEDULER_STATE_EXECUTING = 2;
 }

+enum Scheduler2State {


We don't need SchedulerState anymore. It's part of Tianyu's work that we're replacing.

Sure, I was planning to remove Tianyu's prototype as part of a later PR; I can send it now and fix this up so we don't have the naming conflict.

bergundy · 2024-12-02T19:17:34Z

proto/internal/temporal/server/api/enums/v1/common.proto

+    // Executor is awaiting actions to be buffered and eligible for execution.
+    SCHEDULER_EXECUTOR_STATE_WAITING = 1;
+    // Executor is starting actions.
+    SCHEDULER_EXECUTOR_STATE_EXECUTING = 2;
+    // Executor is backing off from executing actions.
+    SCHEDULER_EXECUTOR_STATE_BACKING_OFF = 3;


Why do you need two distinct states for waiting and backing off?

Yeah, I think they can be removed.

bergundy · 2024-12-02T19:17:47Z

proto/internal/temporal/server/api/enums/v1/common.proto

+    // Backfiller is awaiting backfill to be requested.
+    SCHEDULER_BACKFILLER_STATE_WAITING = 1;
+    // Backfiller is actively starting actions.
+    SCHEDULER_BACKFILLER_STATE_EXECUTING = 2;
+    // Backfiller is backing off from starting actions.
+    SCHEDULER_BACKFILLER_STATE_BACKING_OFF = 3;


Same as above.

bergundy · 2024-12-02T19:18:26Z

proto/internal/temporal/server/api/schedule/v1/message.proto

@@ -48,6 +48,9 @@ message BufferedStart {
    temporal.api.enums.v1.ScheduleOverlapPolicy overlap_policy = 3;
    // Trigger-immediately or backfill
    bool manual = 4;
+    // An ID generated when the action is buffered for deduplication during
+    // execution. Only used by the V2 Scheduler (otherwise left empty).


Maybe worth calling the scheduler "state machine scheduler" instead of v2?

Sure, that's probably a more lasting name for it :)
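As a rough illustration of the docstring above, here is a minimal Go sketch of an ID that is generated when an action is buffered and later used for deduplication during execution. Everything except the `BufferedStart` name and its `manual` and request ID fields is an assumption made for illustration: `Buffer`, `Executor`, and the in-memory `started` map are hypothetical stand-ins, not the server's actual implementation.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// BufferedStart mirrors only the fields relevant to this sketch.
type BufferedStart struct {
	Manual    bool
	RequestID string // assigned once, at buffering time
}

// newRequestID returns a random 128-bit identifier as hex (a stand-in for a UUID).
func newRequestID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// Buffer (hypothetical) assigns the request ID when the action is buffered,
// so every retry of the same buffered start carries the same ID.
func Buffer(manual bool) BufferedStart {
	return BufferedStart{Manual: manual, RequestID: newRequestID()}
}

// Executor (hypothetical) stands in for whatever component starts the action.
type Executor struct {
	started map[string]bool
	Starts  int
}

func NewExecutor() *Executor { return &Executor{started: map[string]bool{}} }

// Start reports true if the action was started, and false if the request ID
// has already been seen and the attempt was dropped as a duplicate.
func (e *Executor) Start(s BufferedStart) bool {
	if e.started[s.RequestID] {
		return false
	}
	e.started[s.RequestID] = true
	e.Starts++
	return true
}

func main() {
	e := NewExecutor()
	s := Buffer(false)
	fmt.Println(e.Start(s)) // first attempt
	fmt.Println(e.Start(s)) // retry of the same buffered start
	fmt.Println(e.Starts)
}
```

The point of assigning the ID at buffering time rather than at execution time is that a retried execution cannot mint a fresh ID and start the same action twice.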

bergundy · 2024-12-02T19:19:19Z

proto/internal/temporal/server/api/schedule/v1/message.proto

@@ -155,3 +158,59 @@ message HsmSchedulerState {
    google.protobuf.Timestamp next_invocation_time = 3;

 }
+
+// V2 Scheduler state
+message HsmSchedulerV2State {


I'd remove Hsm from the name since it's an implementation detail.

I'd also remove the name State from all of these messages to avoid confusion with the enums that are similarly named.

Sure, will fix both

bergundy · 2024-12-02T19:23:25Z

proto/internal/temporal/server/api/schedule/v1/message.proto

+message HsmSchedulerV2State {
+    temporal.server.api.enums.v1.Scheduler2State state = 1;
+
+    // scheduler request parameters and metadata. 


Could you please be consistent in docstrings? It's good practice to capitalize the first letter of each sentence and to use punctuation.

bergundy · 2024-12-02T19:24:41Z

proto/internal/temporal/server/api/schedule/v1/message.proto

+    string schedule_id = 7;
+
+    // Implemented as a sequence number. Useful for substate machines to
+    // invalidate transactions based on update requests


Not sure I'd say we're invalidating transactions; it's more like invalidating "work" or a task, right? It's also used as an optimistic locking mechanism for concurrent update requests.

It's used for optimistic locking, yeah; I'm trying to give an example of the specific sort of condition that would bump the token in each different struct. If that's confusing, I can simplify the comment.
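To make the optimistic locking use concrete, here is a minimal Go sketch of a sequence-number conflict token guarding concurrent update requests. The `Scheduler` struct, its `Paused` field, and the `Update` method are hypothetical and only illustrate the compare-then-bump pattern; they are not the actual scheduler code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when the caller's token is stale.
var ErrConflict = errors.New("conflict token mismatch")

// Scheduler is a hypothetical, pared-down state holder.
type Scheduler struct {
	ConflictToken int64 // implemented as a sequence number
	Paused        bool
}

// Update applies fn only if expected matches the current token, then bumps
// the token so that any other request based on the old state is rejected.
func (s *Scheduler) Update(expected int64, fn func(*Scheduler)) error {
	if expected != s.ConflictToken {
		return ErrConflict
	}
	fn(s)
	s.ConflictToken++
	return nil
}

func main() {
	s := &Scheduler{}

	// Two clients read the same state, and therefore the same token.
	tokenA, tokenB := s.ConflictToken, s.ConflictToken

	errA := s.Update(tokenA, func(s *Scheduler) { s.Paused = true })
	errB := s.Update(tokenB, func(s *Scheduler) { s.Paused = false })

	fmt.Println(errA, errB, s.Paused, s.ConflictToken)
}
```

In this sketch the second request fails rather than silently overwriting the first, and the caller is expected to re-read the state and retry.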

bergundy · 2024-12-02T19:25:27Z

proto/internal/temporal/server/api/schedule/v1/message.proto

+    // Implemented as a sequence number. Useful for invalidating a stale
+    // Generator persisted state write.
+    int64 conflict_token = 4;


Hmm... can we use the conflict token of the scheduler?

I'm trying to avoid making an assumption that we can have transactions across multiple columns/documents in CHASM, since I don't think we have that in HSM today (with MachineTransition operating on a single item). If our framework lets me do a transaction across multiple state machines, we could use only the one in the top-level scheduler.

MachineTransition is a single transition within a transaction on the entire tree. You can rely on that here. CHASM will have similar semantics.

Got it - will update!
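A small Go sketch of the outcome of this thread, under stated assumptions: the Generator carries no conflict token of its own, and its tasks are validated against the token on the top-level scheduler. The types and methods here (`GeneratorTask`, `NewGeneratorTask`, `UpdateSchedule`, `RunGeneratorTask`) are hypothetical and do not reflect the HSM or CHASM APIs; the sketch only shows the staleness check.

```go
package main

import "fmt"

// Generator holds no conflict token; it relies on the scheduler's.
type Generator struct {
	Runs int
}

// Scheduler is the top-level machine that owns the single conflict token.
type Scheduler struct {
	ConflictToken int64
	Generator     Generator
}

// GeneratorTask records the scheduler's token at the time it was created.
type GeneratorTask struct {
	Token int64
}

func (s *Scheduler) NewGeneratorTask() GeneratorTask {
	return GeneratorTask{Token: s.ConflictToken}
}

// UpdateSchedule stands in for any update request that changes the schedule.
func (s *Scheduler) UpdateSchedule() {
	s.ConflictToken++
}

// RunGeneratorTask does the work only if the task is still current. A task
// created before an update is stale and is dropped without side effects.
func (s *Scheduler) RunGeneratorTask(t GeneratorTask) bool {
	if t.Token != s.ConflictToken {
		return false
	}
	s.Generator.Runs++
	return true
}

func main() {
	s := &Scheduler{}
	stale := s.NewGeneratorTask()
	s.UpdateSchedule()
	fresh := s.NewGeneratorTask()

	fmt.Println(s.RunGeneratorTask(stale)) // created before the update
	fmt.Println(s.RunGeneratorTask(fresh)) // created after the update
	fmt.Println(s.Generator.Runs)
}
```

This only works if the scheduler and its sub-machines are written in one transaction, which is the guarantee described above for MachineTransition on the entire tree.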

bergundy · 2024-12-18T04:23:25Z

proto/internal/temporal/server/api/enums/v1/common.proto

@@ -47,6 +47,42 @@ enum SchedulerState {
    SCHEDULER_STATE_EXECUTING = 2;
 }

+enum Scheduler2State {


Seems redundant if we only have one state.

Will remove (here and for the other mono-state machines).

bergundy · 2024-12-18T04:23:30Z

proto/internal/temporal/server/api/enums/v1/common.proto

+}
+
+// State for the state machine scheduler's Generator.
+enum SchedulerGeneratorState {


bergundy · 2024-12-18T04:24:35Z

proto/internal/temporal/server/api/enums/v1/common.proto

+}
+
+// State for the state machine scheduler's Backfiller.
+enum SchedulerBackfillerState {


Hmm... I thought the backfiller is just another type of generator that sends requests to the executor.

Yes, it is, but because backfills have a shared backoff timer and batch size, my inclination is to have a single Backfiller node responsible for all ongoing backfills. Since the Backfiller would then be around for 0..n possible backfills, I think we'd want the distinct "wait an indeterminate amount of time" sleep state. Once a backfill is active, I'll structure it similarly to the Generator, where the Backfiller will set a delay on a repeated timer task until complete.

FYI don't necessarily take that backfill logic as hard requirements. It was just some vaguely plausible thing I came up with, that was a strict improvement over "all backfills run synchronously".

Feel free to do whatever is more natural for this structure, which could be one node per concurrent backfill. The only real requirement is that the total rate limit is respected, and ongoing backfills don't "interfere" with regularly scheduled runs, i.e. they get a lower effective priority. I did that with the "half the buffer size" thing and BackfillsPerIteration but there may be better ways.

Ah, hadn't realized it wasn't a requirement. In that case, multiple backfillers is probably a lot more ergonomic. I'll update the protos.
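One possible reading of the "half the buffer size" idea, sketched in Go: backfills may only add to the buffer while it is under half full, and never more than a fixed number per iteration, which leaves the other half for regularly scheduled runs. The function name, its parameters, and the exact arithmetic are assumptions for illustration; the existing scheduler's logic may differ.

```go
package main

import "fmt"

// BackfillAllowance (hypothetical) returns how many backfill actions may be
// buffered right now. Backfills are limited to the lower half of the buffer
// and to perIteration actions per pass, so regularly scheduled runs always
// have room left.
func BackfillAllowance(capacity, buffered, perIteration int) int {
	room := capacity/2 - buffered
	if room < 0 {
		room = 0
	}
	if room > perIteration {
		room = perIteration
	}
	return room
}

func main() {
	fmt.Println(BackfillAllowance(1000, 0, 10))   // empty buffer
	fmt.Println(BackfillAllowance(1000, 495, 10)) // nearly at the halfway mark
	fmt.Println(BackfillAllowance(1000, 600, 10)) // past the halfway mark
}
```

With one node per concurrent backfill, a shared check like this (or a shared rate limiter) is one way to keep the total rate bounded across all of them.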

bergundy · 2024-12-18T04:27:10Z

proto/internal/temporal/server/api/schedule/v1/message.proto

+    // Scheduler request parameters and metadata. 
+    temporal.api.schedule.v1.Schedule schedule = 2;
+    temporal.api.schedule.v1.ScheduleInfo info = 3;
+    temporal.api.schedule.v1.SchedulePatch initial_patch = 4;
+
+    // State common to all generators is stored in the top-level machine.
+    string namespace = 5;
+    string namespace_id = 6;
+    string schedule_id = 7;


Note that these docstrings will only be attached to the field in the following line.
You'll want to ensure that all fields have docstrings that directly apply.

I'm aware of this, but it's a pervasive style throughout the codebase (even in this same file); what would you suggest instead? Freely floating the comment above the block so it isn't applied?

Not sure, but I think that should work.

bergundy · 2024-12-18T04:28:04Z

proto/internal/temporal/server/api/schedule/v1/message.proto

+    // Scheduler request parameters and metadata. 
+    temporal.api.schedule.v1.Schedule schedule = 2;
+    temporal.api.schedule.v1.ScheduleInfo info = 3;
+    temporal.api.schedule.v1.SchedulePatch initial_patch = 4;


Hmm... why is this the initial patch? Where do we store the "latest" patch? Is that required? I'm not super familiar with the business logic here.

initial_patch is only set through the handler's CreateWorkflow operation, so there's no concept of a "latest" patch. Past CreateWorkflow, a user would start a backfill instead.

bergundy · 2024-12-18T04:29:05Z

proto/internal/temporal/server/api/schedule/v1/message.proto

+}
+
+// State machine scheduler's Backfiller internal state.
+message BackfillerInternal {


I was imagining that we'd have multiple backfillers and each would have their own backfill config attached.

Yep, ended up updating it to multiple backfillers.
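For illustration, a minimal Go sketch of the "multiple backfillers" shape: one Backfiller per backfill request, each carrying its own request and progress. The struct and method names are hypothetical and are not the generated protobuf types.

```go
package main

import (
	"fmt"
	"time"
)

// BackfillRequest is a simplified, hypothetical stand-in for a backfill request.
type BackfillRequest struct {
	Start, End time.Time
}

// Backfiller tracks exactly one backfill request and its own progress.
type Backfiller struct {
	ID      string
	Request BackfillRequest
	Attempt int
}

// Scheduler owns zero or more Backfillers, keyed by ID.
type Scheduler struct {
	Backfillers map[string]*Backfiller
}

func NewScheduler() *Scheduler {
	return &Scheduler{Backfillers: map[string]*Backfiller{}}
}

// AddBackfill creates a dedicated Backfiller for the request.
func (s *Scheduler) AddBackfill(id string, req BackfillRequest) *Backfiller {
	b := &Backfiller{ID: id, Request: req}
	s.Backfillers[id] = b
	return b
}

// CompleteBackfill removes the Backfiller once its request is done.
func (s *Scheduler) CompleteBackfill(id string) {
	delete(s.Backfillers, id)
}

func main() {
	s := NewScheduler()
	day := time.Date(2024, 12, 1, 0, 0, 0, 0, time.UTC)
	s.AddBackfill("bf-1", BackfillRequest{Start: day, End: day.Add(24 * time.Hour)})
	s.AddBackfill("bf-2", BackfillRequest{Start: day.Add(48 * time.Hour), End: day.Add(72 * time.Hour)})
	s.CompleteBackfill("bf-1")
	fmt.Println(len(s.Backfillers))
}
```

Because each Backfiller exists only while its request is active, no separate "waiting for a backfill to be requested" state is needed in this shape.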

bergundy · 2024-12-18T04:30:27Z

proto/internal/temporal/server/api/schedule/v1/message.proto

@@ -155,3 +159,55 @@ message HsmSchedulerState {
    google.protobuf.Timestamp next_invocation_time = 3;

 }
+
+// State machine scheduler internal state.
+message SchedulerInternal {


nit: you probably don't want to type Internal every time in the Go code, it should be implied IMHO. Up to you.

bergundy

Not sure about the name Internal for these messages, maybe Info or Data? I kinda wish we'd used Status for enums and State for data but not blocking the PR.

…6901)

## What changed?
- initial protobufs for the V2 (CHASM) scheduler

## Why?
- initial pass at internal protobufs for CHASM scheduler

## How did you test it?
- No tests for the protobufs themselves

## Potential risks
- New fields in new structs, **except** for `RequestId` on `BufferedStart`. As a completely new field the V1 scheduler won't look at, I don't think there's a risk.

lina-temporal assigned bergundy and yycptt Nov 27, 2024

lina-temporal requested a review from a team as a code owner November 27, 2024 23:29

lina-temporal marked this pull request as draft November 27, 2024 23:29

lina-temporal mentioned this pull request Nov 27, 2024

[Scheduled Actions V2] WIP Common/util Package #6903

Closed

bergundy reviewed Dec 2, 2024


lina-temporal unassigned bergundy and yycptt Dec 3, 2024

bergundy reviewed Dec 18, 2024


lina-temporal marked this pull request as ready for review January 6, 2025 23:07

lina-temporal requested review from bergundy and dnr January 6, 2025 23:08

bergundy approved these changes Jan 8, 2025


lina-temporal added 5 commits January 16, 2025 13:55

WIP Scheduler V2 state machine protobufs

b9cc862

protobuf feedback

b57271e

protobuf feedback pt2

57439a1

update backfiller protos to represent 1:1 mapping with backfill requests

8df4873

cleanup rebase

b4a9fd0

lina-temporal force-pushed the sched2_proto branch from 13fc17a to b4a9fd0 Compare January 16, 2025 22:17

lina-temporal enabled auto-merge (squash) January 16, 2025 22:47

lina-temporal merged commit 2497e58 into main Jan 16, 2025
50 checks passed

lina-temporal deleted the sched2_proto branch January 16, 2025 23:06
