Workflow support #183

Merged
merged 15 commits into from
Jan 14, 2025

Member

@cretz cretz commented Jan 7, 2025

What was changed:

Added full workflow support including:

  • Full workflow worker with deterministic fiber scheduler
  • All workflow functionality/features
  • Workflow async via "future"s
  • Checks to disallow known non-deterministic calls from inside workflows
  • Updates to client to support providing direct workflow, signal, query, and update definitions
  • Time-skipping workflow test environment
  • Sigs and tests for all
  • 💥 BREAKING CHANGE - Activity classes now have to extend from Temporalio::Activity::Definition instead of just Temporalio::Activity

Not implemented yet:

Want to help review?

Great! We welcome all reviews/feedback from everyone. If the PR gets too many comments on it, we may create a new PR for another round of comments. I know the 160-file-count can seem daunting, but a lot of it is generated or unimportant code.

What type of reviewer do you want to be?

I want to review high-level design only

Review README.md (rendered here).

I want to review the workflow features but do not care about the implementation

In addition to README.md, also review files starting with temporalio/test/worker_workflow_.

I want to review the Ruby implementation but do not want to dig into the Rust side

Review everything but what's in temporalio/ext.

I want to help review everything including the Rust side

Review everything.

@cretz cretz force-pushed the workflows branch 3 times, most recently from f355a1d to fd35a1b Compare January 8, 2025 19:23
@cretz cretz marked this pull request as ready for review January 8, 2025 20:23
@cretz cretz requested a review from a team January 8, 2025 21:37
@yuandrew yuandrew left a comment

Only reviewed the README.md, looks great! Mostly a few minor grammar comments. Progress is looking good 😊


A time-skipping `Temporalio::Testing::WorkflowEnvironment` can be started via `start_time_skipping` which is a
reimplementation of the Temporal server with special time skipping capabilities. This too lazily downloads the process
to run when first called. Note, this class is not thread safe nor safe for use with independent tests. It can be reused,

with 2 "Note"s back to back, should they be on separate lines, for clarity?

Member Author

Not following. I have a short paragraph for non-time-skipping and a short paragraph for time-skipping.

The original thought was 2 sentences back to back starting with "Note," felt a little strange, but the more I think about it, I think it's fine

Comment on lines 147 to 148
client = Temporalio::Client.connect('localhost:7233', 'my-namespace')

Do you want to add something about error management (maybe a comment that it throws?)

Member Author

Error management of which part, workers? Not really needed in the README any more than it is in the Python or .NET READMEs. Worker lifecycle including shutdown can get a bit too complex for the README, but is documented in the API docs.

It was more like: if connect fails, do we throw a standard Temporal error, or a vanilla Ruby error, or do we just return nil, or ...

Member Author

What if connect fails in any other SDK? Their READMEs aren't expected to provide that information either. Like all other SDKs, client connection failure is represented like any other method error (so in Ruby's case it raises an exception, API docs are clear that this cannot return nil). We can get specific in API docs on which error if needed, though most SDKs don't for client connect.

Comment on lines +358 to +359
workflows: [MyModule::MyWorkflow],
# There are various forms an activity can take, see "Activities" section for details

Is there a way to add Options, i.e., RegisterWithOptions...

Member Author

Details of customizing activities and workflows are in their respective sections in the README

@@ -309,21 +372,493 @@ Notes about the above code:

* A worker uses the same client that is used for other Temporal things.
* This just shows providing an activity class, but there are other forms, see the "Activities" section for details.
* The `workflow_executor` defaults to `Temporalio::Worker::WorkflowExecutor::Ractor.instance` which intentionally does

Why is it intentional that it does not work?

Member Author

@cretz cretz Jan 9, 2025

Because we haven't completely built it but if/when we do (still some decisions to make, see #184), we would want it to be the default. So we can't default to the working one today then just switch defaults on the user later. So during beta we require the user to explicitly set this value since the default does not work yet. This is kinda touched on in the PR description, though not much.

Comment on lines +394 to +395
Workflows are defined as classes that extend `Temporalio::Workflow::Definition`. The entry point for a workflow is
`execute` and must be defined. Methods for handling signals, queries, and updates are marked with `workflow_signal`,

What about the constructor? Do we need one? Does it have any special requirements?

Member Author

@cretz cretz Jan 9, 2025

No SDK that uses class-based workflows requires a constructor (and other SDK READMEs also don't need to clarify this since it's the default situation). However, an advanced workflow_init feature does exist and is touched on below. This README, like the Python and .NET ones, does not go into detail on the advanced workflow init constructor.

Comment on lines +429 to +432
workflow_update
def update_greeting_params(greeting_params_update)
  @greeting_params_update = greeting_params_update
end

It is not clear what is returned from the update; maybe be more explicit, or add a comment...

Member Author

@cretz cretz Jan 9, 2025

This is normal Ruby code and it is clear to Ruby developers for these basic snippets. It's the value of the last statement, which in this case is the same update passed in. We could add nil as the last statement to return nil. We do not want to add full-blown YARD docs spelling out param and return types to all of the code snippets here, though, as it detracts from the small code.
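
For readers less familiar with Ruby's implicit return, here is a minimal sketch of the behavior described above. `GreetingState` and `update_greeting_params_quietly` are hypothetical stand-ins for illustration, not part of the SDK; the class is plain Ruby and does not extend any Temporal type.

```ruby
# Hypothetical stand-in for a workflow class; plain Ruby, not the Temporalio API.
class GreetingState
  # The assignment is the last expression, so its value (the argument) is returned.
  def update_greeting_params(greeting_params_update)
    @greeting_params_update = greeting_params_update
  end

  # Ending the method with an explicit nil makes it return nil instead.
  def update_greeting_params_quietly(greeting_params_update)
    @greeting_params_update = greeting_params_update
    nil
  end
end

state = GreetingState.new
params = { salutation: 'Aloha', name: 'John' }

returned = state.update_greeting_params(params)
quiet = state.update_greeting_params_quietly(params)

puts returned.equal?(params) # true: the very object that was passed in
puts quiet.nil?              # true
```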

Comment on lines +494 to +495
To start a workflow from a client, you can `start_workflow` and use the resulting handle:

Can you talk about error handling? What is thrown, best practices...

Member Author

@cretz cretz Jan 9, 2025

Assuming you're talking about error handling from the start and execute workflow calls, we do not have that detail in any of the SDK READMEs and would not like to expand the README to that level. I would expect official docs and API docs to both clarify that though.

Comment on lines +509 to +512
handle.execute_update(
  GreetingWorkflow.update_greeting_params,
  { salutation: 'Aloha', name: 'John' }
)

Can you add a return value? It looks the same as the signal below...

Member Author

This is an update that basically has no important return. Sometimes updates do look like signals and the return is void and/or unimportant to the caller, only the action of the update.

I understand, but if people are not familiar with signal vs update they will see them similarly if update does not return something useful in the example...

Member Author

@cretz cretz Jan 9, 2025

I think we can delegate that to official docs and samples instead of forcing the README to do it. Some of our READMEs don't have update at all. This README is not meant to be the full docs. For this use, yes, you could basically use a signal. Maybe I will switch to signal and leave update out entirely.

Comment on lines +568 to +569
#### Workflow Fiber Scheduling and Cancellation

A tiny example with cancellation will be useful here to understand how it works...

Member Author

I don't think this or any SDK README needs to get into that level of detail. But would expect official docs to do so.

Comment on lines +589 to +590
safe wrapper around `Fiber.schedule` for starting and `Workflow.wait_condition` for waiting.

A comment that it is deterministic during replay may help people not familiar with how Temporal works...

Member Author

Not following, can you clarify a bit what you mean? All of our Temporal utilities are deterministic during replay (unless they are in the "unsafe" area), this utility is not unique and I wouldn't expect to have to make such a statement in each section.

In that section you have Future.any_of, and people get confused about why it will always get the same one on replay.

Member Author

I'm not sure they get confused about that any more than why the response is the same one for every utility on replay. Not sure how this one utility's return value is special there.

Member

@Sushisource Sushisource left a comment

Hoo boy that is long.

Mostly focused on the worker/event loop parts. Looks good to me. Definitely feels a bit simpler than some of the others which is nice.

One thing I definitely do not love about Ruby: class field initialization happening in any ol' place instead of well-defined fields.

Comment on lines +468 to +469
The following protected class methods can be called just before defining instance methods to customize the
definition/behavior of the method:
Member

Isn't "called just before" really something like "taking instance methods as an argument"?

Might be a tad more clear, if Ruby docs don't usually say it that way then no prob.

These are method decorators, so it wouldn't be correct to say the instance method is an argument.

I don't think Ruby docs have a common way of describing this (because it's pretty uncommon), but I think "called just before" works well.

Member Author

Yeah, this is not the most common "Ruby-ism" but after discussion in the proposal, we found it was the best way to both "color" a method and make its definition available as a class method (as opposed to something like workflow_signals :foo, :bar at the top of the class). I figured Ruby developers would understand the concept of "called just before".
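
As a rough illustration of the "called just before" pattern discussed in this thread, the sketch below shows one way a class method can mark the next instance method that gets defined, using Ruby's `method_added` hook. The `Markable` module, `mark_signal`, and `marked_methods` are hypothetical names invented for this example. This is not the SDK's implementation, and unlike the SDK it does not expose the definition as a class method.

```ruby
# Hypothetical module illustrating the "called just before" pattern.
# Assumption: this only mimics the idea; it is not how Temporalio implements it.
module Markable
  # Class method invoked just before `def`; it remembers a pending marker.
  def mark_signal
    @pending_mark = :signal
  end

  # Ruby calls this hook on the class each time an instance method is defined.
  def method_added(method_name)
    super
    return unless @pending_mark

    marked_methods[method_name] = @pending_mark
    @pending_mark = nil # the marker applies only to the very next method
  end

  # Registry of marked instance methods, keyed by method name.
  def marked_methods
    @marked_methods ||= {}
  end
end

class ExampleWorkflow
  extend Markable

  mark_signal
  def approve(name)
    @approved = name
  end

  # Not preceded by a marker call, so it is not recorded.
  def helper; end
end

p ExampleWorkflow.marked_methods
```

Only `approve` ends up in the registry, because the marker is cleared as soon as one method has consumed it.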


#### Timers and Conditions

* A timer is represented by `Temporalio::Workflow.sleep`.
Member

Wahooo explicit

Member Author

@cretz cretz Jan 13, 2025

Yeah, needed for things like summary and cancellation override, but per a couple of bullets below, technically we support regular sleep/timeout and just discourage it. Ruby devs kinda expect those to work and they are delegated to the scheduler.

Comment on lines +565 to +566
* Each wait conditions accepts a `Cancellation`, but if none is given, it defaults to
`Temporalio::Workflow.cancellation`.
Member

Maybe make a mention of what happens with 0 value timers since that discussion came up again recently and seems a periodic point of confusion

Member Author

I do make this clear in the API docs:

    # @param duration [Float, nil] Time to sleep in seconds. `nil` represents infinite, which does not start a timer and
    #   just waits for cancellation. `0` is assumed to be 1 millisecond and still results in a server-side timer. This
    #   value cannot be negative. Since Temporal timers are server-side, timer resolution may not end up as precise as
    #   system timers.

This is like .NET and gets rid of that concern of causing non-determinism when moving between zero and non-zero. Not sure we need to clarify in the README though.
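
The duration rules quoted from the API docs can be sketched in plain Ruby as follows. `normalize_timer_duration` is a hypothetical helper written only to restate those rules; it is an assumption for illustration and not a function in the SDK.

```ruby
# Hypothetical helper restating the documented duration rules; not SDK code.
def normalize_timer_duration(duration)
  # nil means infinite: no timer is started, the caller just waits for cancellation.
  return nil if duration.nil?
  raise ArgumentError, 'Duration cannot be negative' if duration.negative?

  # Zero is treated as 1 millisecond so a server-side timer is still created.
  duration.zero? ? 0.001 : duration
end

p normalize_timer_duration(nil)
p normalize_timer_duration(0)
p normalize_timer_duration(2.5)
```

Because zero and non-zero durations both result in a timer, changing a value between the two does not change whether a timer exists in history, which is the non-determinism concern mentioned above.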

Member

👍 That's good then

Comment on lines +312 to +313
# TODO(cretz): Use the details somehow?
@cancellation_proc.call(reason: 'Workflow canceled')
Member

May as well include them in another field or something

Member Author

Unfortunately these aren't really wired up to cancellation client call or server event in any way I can see so I can't really test it

Comment on lines +543 to +548
rescue Exception => e # rubocop:disable Lint/RescueException
  if top_level
    on_top_level_exception(e)
  else
    @current_activation_error ||= e
  end
Member

Is it possible for multiple errors in different scheduled fibers to clobber each other in the same activation?

Member Author

@cretz cretz Jan 13, 2025

Yes, definitely possible, though each non-top-level scheduled fiber is expected to take care of its own errors (which is only query and update atm). But yes, we only track/return/log the first one raised (other newer SDKs do similar).

Member

It might be worth pushing them into a list so they can all be shown, but, probably not a huge deal.
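
To make the trade-off in this thread concrete, here is a small sketch contrasting "keep only the first error" (the `||=` in the quoted diff) with "push them into a list". `ActivationErrors` and its methods are hypothetical names for illustration and do not reflect the SDK's internals.

```ruby
# Hypothetical tracker; names are illustrative, not the SDK's internals.
class ActivationErrors
  attr_reader :first_error, :all_errors

  def initialize
    @first_error = nil
    @all_errors = []
  end

  def record(error)
    # `||=` only assigns when nothing is stored yet, so later errors are dropped.
    @first_error ||= error
    # The suggested alternative: retain every error so all can be reported.
    @all_errors << error
  end
end

errors = ActivationErrors.new
errors.record(RuntimeError.new('first'))
errors.record(RuntimeError.new('second'))

puts errors.first_error.message
puts errors.all_errors.map(&:message).join(', ')
```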

README.md Outdated
end

# Wait for them all to complete
Temporalio::Workflow.Future.all_of(fut1, fut2, fut3).wait

suggestion: typo

Suggested change
Temporalio::Workflow.Future.all_of(fut1, fut2, fut3).wait
Temporalio::Workflow::Future.all_of(fut1, fut2, fut3).wait
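
The difference matters because Ruby parses `Workflow.Future` as a method call named `Future`, while `Workflow::Future` is a constant lookup. A minimal sketch with hypothetical stand-in modules (the `Example` namespace below is invented and is not the real Temporalio code):

```ruby
# Hypothetical stand-in modules; not the real Temporalio classes.
module Example
  module Workflow
    class Future
      # Simplified placeholder that just counts its arguments.
      def self.all_of(*futures)
        futures.length
      end
    end
  end
end

# `::` resolves the Future constant, so the class method is found.
count = Example::Workflow::Future.all_of(:a, :b, :c)

# `.Future` is parsed as a call to a method named Future, which does not exist.
dot_error = nil
begin
  Example::Workflow.Future.all_of(:a, :b, :c)
rescue NoMethodError => e
  dot_error = e
end

puts count
puts dot_error.class
```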

Member Author

@cretz cretz Jan 13, 2025

Thanks! Fixing now. EDIT: Fixed

As a side note, thanks for reviewing! We also welcome any general/high-level feedback you may have on our design/approach here or in #ruby-sdk at https://t.mp/slack or via any avenue you'd like.

Member Author

cretz commented Jan 14, 2025

Merging, but reviews can still be made in here if anyone would like, and they will still be read/applied (they just aren't marked "approved" or not).

@cretz cretz merged commit 4512224 into temporalio:main Jan 14, 2025
6 checks passed
@cretz cretz deleted the workflows branch January 14, 2025 16:02
@@ -72,6 +74,10 @@ def skip_if_fibers_not_supported!
  skip('Fibers not supported in this Ruby version')
end

def skip_if_not_x86!
  skip('Test only supported on x86') unless RbConfig::CONFIG['host_cpu'] == 'x86_64'
Contributor

nit: I'd suggest making this predicate more targeted, i.e. skip_if_test_server_not_supported and allow arm64 on macos.

Most devs on the team are working on Apple M* CPUs, and as it is now, they wouldn't be running those tests locally even though they technically could. I know there are very few tests that depend on that condition, but still…

Member Author

Makes sense, will change
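
One possible shape for the more targeted predicate suggested above is sketched below. The name `test_server_supported?`, the `'arm64'` CPU string, and the `'darwin'` OS prefix are assumptions for illustration; the actual change made in the PR may differ.

```ruby
require 'rbconfig'

# Hypothetical predicate; which platforms count as supported is an assumption
# taken from the review comment (x86_64 anywhere, arm64 only on macOS).
def test_server_supported?(host_cpu: RbConfig::CONFIG['host_cpu'],
                           host_os: RbConfig::CONFIG['host_os'])
  return true if host_cpu == 'x86_64'

  host_cpu == 'arm64' && host_os.start_with?('darwin')
end

puts test_server_supported?(host_cpu: 'x86_64', host_os: 'linux-gnu')
puts test_server_supported?(host_cpu: 'arm64', host_os: 'darwin23')
puts test_server_supported?(host_cpu: 'aarch64', host_os: 'linux-gnu')
```

A test helper could then call `skip(...)` unless this predicate returns true, in place of the x86-only check.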

logger: Logger.new($stdout),
dev_server_extra_args: [
# Allow continue as new to be immediate
'--dynamic-config-value', 'history.workflowIdReuseMinimalInterval="0s"'
Contributor

No, really?!? TIL!

workflows: [MyModule::MyWorkflow],
# There are various forms an activity can take, see "Activities" section for details
activities: [MyModule::MyActivity],
# During the beta period, this must be provided explicitly, see below for details
Contributor

Not sure about the "beta period" expression (it is used in a few places in this README file). It is not part of our usual terminology, and it's not clear to me how that would map to our actual version scheme. And anyway, is it really bound to a specific release, i.e. it will be this way until 1.0, then change from that point on? Or is it just something not yet implemented? If the latter, might just say "For now".

Member Author

The concept of a "beta" library is well known to developers, especially Ruby ones. We need to decide what to do with this default by 1.0. This will not be a required field in 1.0, but we are not sure what will be the default yet, so we are requiring it be explicitly set.

⚠️ Workflows cannot yet be implemented Ruby.
#### Workflow Definition

Workflows are defined as classes that extend `Temporalio::Workflow::Definition`. The entry point for a workflow is
Contributor

are marked ... just before the method is defined

nit: Doesn't the Ruby community have some common term for that pattern? e.g. annotations?

Member Author

Not really, this is not that common of a pattern. Also discussed a bit in another comment thread: #183 (comment)

6 participants