-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrade to hyper v1 #1028
upgrade to hyper v1 #1028
Conversation
I currently make fairly extensive use of: Body::wrap_stream(tokio_stream::wrappers::ReceiverStream::new(rx)) Will that still work with the new code? |
It'd be pretty straight forward to add such a constructor to |
ffe15a3
to
6e1ed6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking very promising. I'd love to get @sunshowers and @davepacheco to take a look. Can you please include the design document here? Also I would suggest taking a stab at migration content in the changelog as we try to help consumers figure out the path from previous versions. Thank you!
impl Default for Body { | ||
fn default() -> Body { | ||
Body::empty() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may prefer people use ::empty() in order to be explicit about their intention... what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a Default
impl makes it easier to compose Body
in other types that impl Default
. For example, if you have this struct:
#[derive(Default)]
struct Bodies {
inner: HashMap<i32, dropshot::Body>
}
You couldn't use the derive(Default)
without the impl Default for Body
here. You'd have to write a manual impl Default
and set inner
to HashMap::new()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah true: can you envision a use case for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also used by Client::get(url)
, where it creates a "default" body that is assumed to be empty. And I believe that was used in some tests or docs.
In hindsight, default doesn't necessarily mean empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I too prefer the explicitness of empty()
over default()
if the semantics are "an empty body". I went and looked at what would break if we just removed this Default
impl and I see what you were saying @seanmonstar. It's stuff like:
error: could not compile `dropshot` (test "test_config") due to 9 previous errors
warning: build failed, waiting for other jobs to finish...
error[E0277]: the trait bound `dropshot::Body: Default` is not satisfied
--> dropshot/tests/test_tls.rs:388:12
|
388 | client.get(uri.clone()).await.unwrap_err();
| ^^^ the trait `Default` is not implemented for `dropshot::Body`
|
note: required by a bound in `hyper_util::client::legacy::Client::<C, B>::get`
--> /home/dap/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/client.rs:164:12
|
162 | pub fn get(&self, uri: Uri) -> ResponseFuture
| --- required by a bound in this associated function
163 | where
164 | B: Default,
| ^^^^^^^ required by this bound in `Client::<C, B>::get`
Since GET requests must have an empty body, hyper's already implicitly assuming that "default" means "empty", and I see that hyper::Body
in 0.14 impl'd Default
in the same way. So I don't love this but it's probably not worse than before.
If we did want to avoid impl'ing Default
here, would there be an easy way to fix up the tests? I don't really see one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can convert all calls to .get()
to .request(req_with_explicitly_empty_body)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's good to know. I could go either way on this -- @ahl what do you think?
let fr = maybefr?; | ||
if let Ok(buf) = fr.into_data() { | ||
nbytesread += buf.len(); | ||
} | ||
} | ||
|
||
// TODO-correctness why does the is_end_stream() assertion fail? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the commented-out assertion here was probably failing because, after discarding all the DATA
frames, an HTTP/2 body was terminated by a TRAILERS
, so the stream was not EOF. I wonder if this assertion can be put back, now that the new implementation will also read and discard any TRAILERS
frame that's sent as well as the DATA
frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your hard work on this!
@seanmonstar this is looking great. I would like @davepacheco to take a pass before we merge it, but, @davepacheco, feel free to interpret that with whatever degree of rigor you consider appropriate. |
Sorry I'm late to this one. I haven't had a chance to look yet. If there are breaking changes here, as it sounds like there are, then I'd like to get the changelog update in this PR. Our general approach is to provide super clear instructions for people to understand how to know if they're affected by a breaking change and exactly what changes to make, as concretely as we can. There are some examples here, from release 0.9.0. @ahl also mentioned a design doc? That might help review? |
I forgot to add: for breaking changes, I usually like to do a test update of one or more of our important consumers. We've found a lot of issues by doing that. Have we converted any consumers? Ideally it'd be great to convert Omicron's consumers -- is that too much to ask? |
I followed up with dave privately with the document. We'll want to make sure it's recorded in this PR once we have appropriate ownership and permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! I'm still working through server.rs but wanted to post these in the meantime.
impl Default for Body { | ||
fn default() -> Body { | ||
Body::empty() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I too prefer the explicitness of empty()
over default()
if the semantics are "an empty body". I went and looked at what would break if we just removed this Default
impl and I see what you were saying @seanmonstar. It's stuff like:
error: could not compile `dropshot` (test "test_config") due to 9 previous errors
warning: build failed, waiting for other jobs to finish...
error[E0277]: the trait bound `dropshot::Body: Default` is not satisfied
--> dropshot/tests/test_tls.rs:388:12
|
388 | client.get(uri.clone()).await.unwrap_err();
| ^^^ the trait `Default` is not implemented for `dropshot::Body`
|
note: required by a bound in `hyper_util::client::legacy::Client::<C, B>::get`
--> /home/dap/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/client.rs:164:12
|
162 | pub fn get(&self, uri: Uri) -> ResponseFuture
| --- required by a bound in this associated function
163 | where
164 | B: Default,
| ^^^^^^^ required by this bound in `Client::<C, B>::get`
Since GET requests must have an empty body, hyper's already implicitly assuming that "default" means "empty", and I see that hyper::Body
in 0.14 impl'd Default
in the same way. So I don't love this but it's probably not worse than before.
If we did want to avoid impl'ing Default
here, would there be an easy way to fix up the tests? I don't really see one.
dropshot/src/server.rs
Outdated
tokio::spawn(graceful) | ||
loop { | ||
tokio::select! { | ||
Ok((sock, remote_addr)) = self.0.accept() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if accept()
returns an error? The docs aren't super explicit but I gather this could happen any time the underlying accept(2) syscall returns an error, which can include transient errors like ECONNABORTED (on illumos, anyway). EMFILE might also be considered transient. (The docs also say "These would terminate the stream if not handled in any way." I'm not sure what that means. This struct is not a Stream
and I don't know how it could know whether you handled the error.)
If this happens, according to the tokio::select!
docs:
Once an returns a value, attempt to apply the value to the provided , if the pattern matches, evaluate and return. If the pattern does not match, disable the current branch and for the remainder of the current call to select!. Continue from step 3.
Step 3 waits for another of the futures to complete. So my interpretation of all of this is: if we encounter a transient error from accept()
, this program will stop accepting new connections. This would be very bad.
Besides fixing this, it makes me a little nervous that we're implementing something so delicate and hard-to-test here. Is there some existing implementation we could use here? Would it help to turn this into a https://docs.rs/tokio-stream/0.1.15/tokio_stream/wrappers/struct.TcpListenerStream.html?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, hyper did handle all those before. But as part of the removal of the Accept
trait, due to all of it's subtle issues, that was removed as well. Something generic might appear again in hyper-util, and when it does, Dropshot could use it.
For now, I can inline the behavior that used to exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an HttpAcceptor
type here that does what hyper used to do.
TokioIo::new(sock), | ||
self.1.make_svc(remote_addr), | ||
); | ||
let fut = graceful.watch(fut.into_owned()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time understanding the various futures involved here and how the new Graceful
mechanism works. Would it be reasonable to add a comment describing this in more detail? And/or using more descriptive names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments.
I believe I've addressed all the feedback. I added some breaking changes notes to the changelog. Upgrading Omicron is currently out of scope for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates.
// These are errors on the individual socket that we | ||
// tried to accept, and so can be ignored. | ||
std::io::ErrorKind::ConnectionRefused | ||
| std::io::ErrorKind::ConnectionAborted | ||
| std::io::ErrorKind::ConnectionReset => (), | ||
|
||
// This could EMFILE implying resource exhaustion. | ||
// Sleep a little bit and try again. | ||
_ => { | ||
warn!(self.log, "accept error"; "error" => e); | ||
tokio::time::sleep(std::time::Duration::from_millis( | ||
100, | ||
)) | ||
.await; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably okay but I'm a little nervous about the error handling. This is definitely an area where the standards and manual pages are good but operational experience is also relevant, which is why I'm still a little uneasy about implementing this behavior here instead of using a hardened library. But I guess hyper 0.x was doing this inline before? Is this the same logic it used?
The ECONNABORTED check is definitely right and it's one a lot of software misses so thanks for getting that. ✔️
I checked out the illumos manual page (our production systems), POSIX, and Linux. I think most of the errors fall into one of these buckets:
- "shouldn't ever happen and if they did you probably don't want to accept any more": EBADF, EFAULT, ENODEV, ENOTSOCK, EOPNOTSUPP, EPROTO, EINVAL, EAGAIN/EWOULDBLOCK
- "can totally happen and should just be retried": ECONNABORTED, EINTR
- "probably shouldn't happen, but definitely can, and there's probably not a great response but we could retry, and maybe pausing would help": EMFILE, ENOMEM, ENOSR, ENOBUFS, EPERM? (Linux)
I don't see ECONNREFUSED or ECONNRESET as possible errors that can be produced here, but the Linux manual page does say any other error for the new socket can happen here. That seems weird to me, but okay.
It seems brittle to encode all of these, so I guess the logic here is saying: assume the "can't happen" stuff really can't happen, handle ECONNABORTED[/ECONNREFUSED/ECONNRESET] explicitly as transient errors that don't require pausing, and treat everything else as a transient error, with a pause to avoid getting stuck in too tight a loop if we're wrong about it being transient. Is that the idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this is essentially the same code that hyper 0.14 was doing (minus some TCP keep-alive options stuff, which Dropshot didn't enable): https://github.com/hyperium/hyper/blob/0.14.x/src/server/tcp.rs#L211-L254
I do think it'd be nice to have this in hyper-util, so server frameworks don't all need a copy of this. There's a couple design questions to make it sufficiently generic (listener type, timer type, how to log, etc), such that I don't expect it to be ready to be depended on immediately.
@@ -327,6 +332,39 @@ impl<C: ServerContext> InnerHttpServerStarter<C> { | |||
} | |||
} | |||
|
|||
/// This accepts connections more gracefully from a TcpListener. | |||
struct HttpAcceptor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're incorporating this pretty tricky functionality into Dropshot, we should probably have some test coverage for it. Any ideas how to do that? It'd be really nice if we could exercise some of those error cases but that seems hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not have a way to programmatically trigger those error kinds. Maybe if it was sufficiently generic, such that some ErroringListener
could be provided. But that won't be available yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. How about the happy path? What about a test that starts a Dropshot server and makes 100 or 1000 connections to it in a row? Just thinking out loud about how best to test this automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I get that, I write a lot of tests and have multiple times refactored large test suites to make it easier to add more. In this case, I don't see much that could be tested. Especially since the accept code happy path is required for the rest of the unit tests to work, too. But I can add one accepting in a loop, like you mentioned.
I think the last items here are:
|
…cer configuration (#1369) Trying to get my arms around syncing nexus and its satellites with oxidecomputer/dropshot#1028
This defines a `dropshot::Body` type that is used for both requests and responses, in place of the now-gone `hyper::Body`.
Co-authored-by: David Pacheco <dap@cs.brown.edu>
127a57d
to
796918e
Compare
@seanmonstar it looks like Cargo.lock needs an upodate |
@seanmonstar tests are failing |
@seanmonstar looks like windows tests are hung? |
Eyyyy! Passing tests! I'm working on merging this into omicron and its vassal repos; on vacation right now, but hope to shake out any kinks in a week or so. |
WOOT |
This defines a
dropshot::Body
type that is used for both requests andresponses, in place of the now-gone
hyper::Body
.Most of the changes are just renames. The interesting files (at least) are:
body.rs
: This contains the new multi-purposeBody
type. It has a few constructors to make things nice. Otherwise, it's not that complex, but could be tweaked or optimized with more usage and/or measurements.server.rs
: hyper got rid of theAccept
trait. So now an accept loop is in this file manually. It also changed how graceful shutdown is coordinated, with a less-magical helper. It allows for adding a "graceful period timeout" if you eventually want.