-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Scheduled Actions V2] state machine protobufs WIP #6901
Conversation
// Generator is buffering actions. | ||
SCHEDULER_GENERATOR_STATE_BUFFERING = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my review of #6905, you don't need this separate state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, will remove.
@@ -47,6 +47,47 @@ enum SchedulerState { | |||
SCHEDULER_STATE_EXECUTING = 2; | |||
} | |||
|
|||
enum Scheduler2State { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need ScheduleState
anymore. It's part of Tianyu's work that we're replacing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was planning to remove Tianyu's prototype as part of a later PR; I can send it now and fix this up so we don't have the naming conflict.
// Executor is awaiting actions to be buffered and eligible for execution. | ||
SCHEDULER_EXECUTOR_STATE_WAITING = 1; | ||
// Executor is starting actions. | ||
SCHEDULER_EXECUTOR_STATE_EXECUTING = 2; | ||
// Executor is backing off from executing actions. | ||
SCHEDULER_EXECUTOR_STATE_BACKING_OFF = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need two distinct states for waiting and backing off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think they can be removed.
// Backfiller is awaiting backfill to be requested. | ||
SCHEDULER_BACKFILLER_STATE_WAITING = 1; | ||
// Backfiller is actively starting actions. | ||
SCHEDULER_BACKFILLER_STATE_EXECUTING = 2; | ||
// Backfiller is backing off from starting actions. | ||
SCHEDULER_BACKFILLER_STATE_BACKING_OFF = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
@@ -48,6 +48,9 @@ message BufferedStart { | |||
temporal.api.enums.v1.ScheduleOverlapPolicy overlap_policy = 3; | |||
// Trigger-immediately or backfill | |||
bool manual = 4; | |||
// An ID generated when the action is buffered for deduplication during | |||
// execution. Only used by the V2 Scheduler (otherwise left empty). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth calling the scheduler "state machine scheduler" instead of v2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's probably a more lasting name for it :)
@@ -155,3 +158,59 @@ message HsmSchedulerState { | |||
google.protobuf.Timestamp next_invocation_time = 3; | |||
|
|||
} | |||
|
|||
// V2 Scheduler state | |||
message HsmSchedulerV2State { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove Hsm
from the name since it's an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also remove the name State
from all of these messages to avoid confusion with the enums that are similarly named.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will fix both
message HsmSchedulerV2State { | ||
temporal.server.api.enums.v1.Scheduler2State state = 1; | ||
|
||
// scheduler request parameters and metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please be consistent in docstrings. It's good practice to capitalize first letters in sentences and use punctuation.
string schedule_id = 7; | ||
|
||
// Implemented as a sequence number. Useful for substate machines to | ||
// invalidate transactions based on update requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I'd say we're invalidating transactions, more like invaliding "work" or task, right? but also used as an optimistic locking mechanism for concurrent update requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for optimistic locking, yeah; I'm trying to give an example of the specific sort of condition that would bump the token in each different struct. If that's confusing, I can simplify the comment.
// Implemented as a sequence number. Useful for invalidating a stale | ||
// Generator persisted state write. | ||
int64 conflict_token = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... can we use the conflict token of the scheduler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to avoid making an assumption that we can have transactions across multiple columns/documents in CHASM, since I don't think we have that in HSM today (with MachineTransition
operating on a single item). If our framework lets me do a transaction across multiple state machines, we could use only the one in the top-level scheduler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MachineTransition is a single transition within a transaction on the entire tree. You can rely on that here. CHASM will have similar semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - will update!
@@ -47,6 +47,42 @@ enum SchedulerState { | |||
SCHEDULER_STATE_EXECUTING = 2; | |||
} | |||
|
|||
enum Scheduler2State { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems redundant if we only have one state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove (here and for the other mono-state machines).
} | ||
|
||
// State for the state machine scheduler's Generator. | ||
enum SchedulerGeneratorState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
} | ||
|
||
// State for the state machine scheduler's Backfiller. | ||
enum SchedulerBackfillerState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I thought the backfiller is just another type of generator that sends requests to the executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are, but because backfills have a shared backoff timer and batch size, my inclination is to have a single Backfiller
node responsible for all ongoing backfills. Since the Backfiller
would then be around for 0..n
possible backfills, I think we'd want the distinct "wait an indeterminate amount of time" sleep state. Once a backfill is active, I'll structure it similar to the Generator
, where Backfiller
will set a delay on a repeated timer task until complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI don't necessarily take that backfill logic as hard requirements. It was just some vaguely plausible thing I came up with, that was a strict improvement over "all backfills run synchronously".
Feel free to do whatever is more natural for this structure, which could be one node per concurrent backfill. The only real requirement is that the total rate limit is respected, and ongoing backfills don't "interfere" with regularly scheduled runs, i.e. they get a lower effective priority. I did that with the "half the buffer size" thing and BackfillsPerIteration
but there may be better ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, hadn't realized it wasn't a requirement. In that case, multiple backfillers is probably a lot more ergonomic. I'll update the protos.
// Scheduler request parameters and metadata. | ||
temporal.api.schedule.v1.Schedule schedule = 2; | ||
temporal.api.schedule.v1.ScheduleInfo info = 3; | ||
temporal.api.schedule.v1.SchedulePatch initial_patch = 4; | ||
|
||
// State common to all generators is stored in the top-level machine. | ||
string namespace = 5; | ||
string namespace_id = 6; | ||
string schedule_id = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these docstrings will only be attached to the field in the following line.
You'll want to ensure that all fields have docstrings that directly apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware of this, but it's a pervasive style throughout the codebase (even in this same file); what would you suggest instead? Freely floating the comment above the block so it isn't applied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but I think that should work.
// Scheduler request parameters and metadata. | ||
temporal.api.schedule.v1.Schedule schedule = 2; | ||
temporal.api.schedule.v1.ScheduleInfo info = 3; | ||
temporal.api.schedule.v1.SchedulePatch initial_patch = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... why is this the initial patch? Where do we store the "latest" patch? Is that required? I'm not super familiar with the business logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial_patch
is only set through the handler's CreateWorkflow
operation, so there's no concept of a "latest" patch. Past CreateWorkflow
, a user would start a backfill instead.
} | ||
|
||
// State machine scheduler's Backfiller internal state. | ||
message BackfillerInternal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was imagining that we'd have multiple backfillers and each would have their own backfill config attached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, ended up updating it to multiple backfillers.
@@ -155,3 +159,55 @@ message HsmSchedulerState { | |||
google.protobuf.Timestamp next_invocation_time = 3; | |||
|
|||
} | |||
|
|||
// State machine scheduler internal state. | |||
message SchedulerInternal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you probably don't want to type Internal
every time in the Go code, it should be implied IMHO. Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the name Internal
for these messages, maybe Info
or Data
? I kinda wish we'd used Status
for enums and State
for data but not blocking the PR.
13fc17a
to
b4a9fd0
Compare
…6901) ## What changed? - initial protobufs for the V2 (CHASM) scheduler ## Why? - initial pass at internal protobufs for CHASM scheduler ## How did you test it? - No tests for the protobufs themselves ## Potential risks - New fields in new structs, **except** for `RequestId` on `BufferedStart`. As a completely new field the V1 scheduler won't look at, I don't think there's a risk.
What changed?
Why?
How did you test it?
Potential risks
RequestId
onBufferedStart
. As a completely new field the V1 scheduler won't look at, I don't think there's a risk.