-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Worker client replacement #721
Conversation
client/src/lib.rs
Outdated
@@ -296,7 +296,7 @@ pub struct ConfiguredClient<C> { | |||
options: Arc<ClientOptions>, | |||
headers: Arc<RwLock<ClientHeaders>>, | |||
/// Capabilities as read from the `get_system_info` RPC call made on client connection | |||
capabilities: Option<get_system_info_response::Capabilities>, | |||
capabilities: Arc<Option<get_system_info_response::Capabilities>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capabilities is just a bunch of bools. No need to Arc it. In fact, you could change the proto build.rs to add a derive Copy to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember the problem now. The issue is that WorkerClient::capabilities()
wants to return a reference, but that is now on a replaceable client so I can't return a reference. Should I clone/copy for every invocation of WorkflowClient::capabilites()
call and change that to return a copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's totally fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Will make this call return a clone of the proto on each invocation and change back everywhere else to continue doing the same.
core/src/worker/mod.rs
Outdated
); | ||
let worker_key = client.workers().register(Box::new(provider)); | ||
let worker_key = Mutex::new(client.workers().register(Box::new(slot_provider.clone()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provider shouldn't need to be cloned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is a non-cloneable struct that is moved into this "register" call. But I need to call this "register" call with it again so now I need to have multiple references to this slot provider, so I had to make it cloneable and store it on the worker. I can wrap it in an Arc and update everywhere its used or I can clone it upon use here (or do more advanced stuff w/ references and lifetimes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, I see since it is getting put into Self
below, I missed that initially.
I think the alternative here would be to not instantiate the worker key here any longer, since you can create it from within self now that it owns the provider. It probably ends up being the same issue though.
Imo it probably does make sense to wrap the provider in an arc rather than the channel inside it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to get away with as little of changes to the existing worker registration/eager code as I could, but I fear we're basically saying I may need to do a bit more to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that much change, and it's nice to maintain some good stewardship where then structure makes sense rather than a minimum possible change, not like this is in a huge rush or anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched from Box to Arc, you were correct, not a big change (417b709).
slot_provider: Arc<SlotProvider>, | ||
/// Registration key to enable eager workflow start for this worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cretz Sorry for being so late in the game (already merged...) I wonder if holding a reference to slot_provider will hang the worker shutdown in some cases. slot_provider contains a reference to external_wft_tx
and I think we need to drop all refs to it, so that the rx part closes, and that unblocks the worker shutdown...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this have failed tests that rely on worker shutdown to complete (or are there none or would worker shutdown only trigger in certain ways)? Is the SlotProvider
drop needed in time for shutdown
or finalize_shutdown
? The latter consumes this worker and therefore this slot provider should be dropped.
Can we write a test for this? Do you have alternative suggestions on how to handle this so I can still use the same slot provider upon client replacement? Maybe we can make unregister()
return the slot provider it removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have opened #725. I think that is a cleaner approach anyways and I wish I thought of it before.
What was changed
Added
Worker::replace_client
. Notes:Checklist