-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add server config option to spawn handlers detached from client disconnect behavior #701
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a change like this, I think it'd be useful to test-integrate it into Omicron before we land it here. What do you think? (I'm afraid you're going to wind up undoing some changes I did recently to remove ..Default::default()
from a whole bunch of ConfigDropshot
structures. Sorry about that.)
For the record: I confirmed that the current behavior is expected from hyper. I wanted to confirm that because it does seem counter-intuitive to a bunch of us and if we're going so far out of our way to do things differently, I wanted to make sure we weren't just holding it wrong. (I'm not sure why it's the behavior and I remain frustrated that cancellation appears to be undocumented in the canonical Rust and async/await references.)
dropshot/src/server.rs
Outdated
// Ignore errors from `send()`: if it fails, it's because the | ||
// outer `http_request_handle` future was cancelled while | ||
// waiting for this result. | ||
_ = tx.send(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a silly question, but why not use the Future's output as provided by awaiting on the spawn handle?
Given that we're not, is it at least worth warning in this case? (The answer might be "no", but it seems potentially useful to learn this.)
Come to think of it: I feel like it would be useful to to get a "warn"-level log message any time a client disconnects while we're running a request handler. What do you think? (That can definitely be deferred, too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a silly question, but why not use the Future's output as provided by awaiting on the spawn handle?
I... have no idea. I think it's just habit to use oneshot channels like this; normally when spawning I think I'm not in a position where I can await the future directly.
I do think adding a warning here would be handy, and I think easy? I'll give it a shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added logging (both here and around the .unwrap()
below) in efef6f9.
dropshot/tests/test_config.rs
Outdated
// We weren't cancelled: mem::forget() drop_counter so its drop impl | ||
// doesn't run. This leaks a reference to our drop_count, but we're in a | ||
// unit test so won't worry about it. | ||
mem::forget(drop_counter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two new tests feel more complicated than they need to be (caveat: I'm not sure about that!). Part of it is that feels a little weird to test Drop
so explicitly when it's really a proxy for some other behavior that's what we really care about. Especially given that we're using this out to bypass the normal Drop behavior.
One implementation would be to just panic at this point. Of course, that wouldn't let you test that when not cancelled it runs to completion, so I guess we could be worried that it's not panicking by accident.
How about this: what if you had a handler with two explicit counters: one bumped before receiving the message from the main task and one bumped after. I think you could use the same handler and ServerContext. And I think you could drive them the same way? You could have a helper function that's given a value for the config, constructs a server with this handler, starts it, runs one request to normal completion, then starts another request but cancels it, and then shuts down the server and waits for it to shut down. Then it could return the final counter values. Besides hopefully being simpler, an advantage would be that we know we're testing exactly the same thing in both cases.
I haven't fleshed this out any more than that so it's definitely possible there's something I'm missing. The code here looks correct. If you think what's here is a better path, that's fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a stab at simplifying them in 8946d44 - I didn't go with the two counters idea, but I think what I did makes what is being tested much more obvious. LMK what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I mentioned above but I think got lost: I think it could be quite valuable if, when using CancelOnDisconnect
we emitted a warning message when the Future was being dropped while in a request handler. This can definitely be a follow-on change (and not asking you to do that) but I figured I'd mention it. The easiest way I can think to do this is to store a bit (AtomicBool?) that starts false, gets set to true
after we finish the request handler, and then create something in the same scope with a Drop
handler (maybe scope_guard
can help here?) that checks if the boolean is false? Seems kind of cheesy though.
dropshot/src/server.rs
Outdated
error!(request_log, "handler panicked"); | ||
panic!("handler panicked before sending response"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to handle this. It'd be nice to report the error in these cases (or else is it lost completely?). I'm not clear that we can because we don't have anything usefully printable.
This example is interesting:
https://docs.rs/tokio/latest/tokio/task/struct.JoinError.html#examples-1
Maybe that's the way to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, that looks great. It's a little more work to get the error out; added in 204c111
Co-authored-by: David Pacheco <dap@oxidecomputer.com>
I added a TODO to this effect, but yeah, I'm not sure how best to do this either at the moment. With the default changing it doesn't seem super urgent at the moment.
(Lifting this up since github resolved the conversation when I took your CHANGELOG edit) I don't think that's true: we're using at least |
That's possible. It's not obvious to me that they would. |
Adds a new
default_handler_disposition
field to bothDropshotConfig
andServerConfig
. The default value for this (inDropshotConfig
'sDefault
impl) isHandlerDisposition::CancelOnDisconnect
, which matches the current behavior of dropshot: if a client disconnects, the future returned by the endpoint handler is dropped (and therefore cancelled). If the config is insteadHandlerDisposition::DetachFromClient
, all endpoint handler futures aretokio::spawn()
'd, and they will therefore always run to completion.I phrased this field as
default_handler_disposition
under the assumption that we'd (eventually) provide a way for individual endpoints to override this choice, but am very much not wed to the naming here - any suggestions are welcome.