Client hang with hyper 0.14 (tokio, async-std) #2312
Comments
Thanks for trying this out with the newer version. I wonder about using Have you seen the same issue if you change this to async/await?
#[tokio::main]
async fn main() {
    // env_logger::Builder::from_default_env()
    //     .target(env_logger::fmt::Target::Stdout)
    //     .init();
    tracing_subscriber::fmt::init();
    let client: hyper::Client<hyperlocal::UnixConnector> =
        hyper::Client::builder().build(hyperlocal::UnixConnector);
    for i in 0.. {
        println!("{}", i);
        let _resp = client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events?").into())
            .await;
        client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events?").into())
            .await;
    }
}
|
Thank you for the response! I tried the following version without
#[tokio::main]
async fn main() {
    // env_logger::Builder::from_default_env()
    //     .target(env_logger::fmt::Target::Stdout)
    //     .init();
    tracing_subscriber::fmt::init();
    let client: hyper::Client<hyperlocal::UnixConnector> =
        hyper::Client::builder().build(hyperlocal::UnixConnector);
    for i in 0.. {
        println!("{}", i);
        let _resp = client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events?").into())
            .await;
        let res = client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events?").into())
            .await
            .unwrap();
        tokio::spawn(res.into_body().into_future());
    }
}
The last |
The spawned future ( Probably too many spawned tasks block |
I ran your latest example in an Ubuntu WSL2 install and it ran just fine until I eventually killed it at iteration ~15,000. Are you sure the problem is with hyper and not your Docker driver not responding for some reason? |
Hmm. My WSL2 Box (Ubuntu 20.04.1 LTS (Focal Fossa)) does reproduce the hang. |
I do see the hang with the async-std version after a couple hundred iterations, yeah. |
My coworker reported that the following version randomly stops working when invoked repeatedly:
use futures::prelude::*;
#[tokio::main]
async fn main() {
    let args: Vec<String> = std::env::args().collect();
    env_logger::init();
    let client: hyper::Client<hyperlocal::UnixConnector> =
        hyper::Client::builder().build(hyperlocal::UnixConnector);
    let _resp = client
        .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events").into()) // this uri can be "//"
        .await;
    let resp = client
        .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events").into())
        .await
        .unwrap();
    tokio::spawn(resp.into_body().into_future());
    let _resp = client
        .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events").into()) // this uri can be "//", too
        .await;
    println!("ok: {}", args[1]);
}
I couldn't reproduce it with trace log of the last invocation
|
Another coworker reported that adding
    .pool_idle_timeout(std::time::Duration::from_millis(0))
    .pool_max_idle_per_host(0)
to the client builder is a workaround. |
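To illustrate why this workaround may sidestep the hang, here is a minimal sketch of an idle-connection pool. It is a toy model only: `ToyPool`, `Conn`, `checkout`, `checkin`, and `ids_used` are hypothetical names invented for this illustration and are not hyper's implementation. The sketch assumes, as later comments in this thread suggest, that the hang is tied to reusing a pooled connection; with the idle limit set to 0, nothing is ever kept for reuse.

```rust
use std::collections::HashMap;

/// Toy stand-in for a connection; `id` shows whether one was reused.
#[derive(Debug)]
struct Conn {
    id: u32,
}

/// Hypothetical, heavily simplified idle pool keyed by host.
/// This is NOT hyper's pool, only an illustration of the idea.
struct ToyPool {
    max_idle_per_host: usize,
    idle: HashMap<String, Vec<Conn>>,
    next_id: u32,
}

impl ToyPool {
    fn new(max_idle_per_host: usize) -> Self {
        ToyPool { max_idle_per_host, idle: HashMap::new(), next_id: 0 }
    }

    /// Reuse an idle connection if one exists, otherwise open a new one.
    fn checkout(&mut self, host: &str) -> Conn {
        if let Some(conn) = self.idle.get_mut(host).and_then(|v| v.pop()) {
            return conn;
        }
        self.next_id += 1;
        Conn { id: self.next_id }
    }

    /// Return a connection; it is dropped if the idle list is already full.
    fn checkin(&mut self, host: &str, conn: Conn) {
        let list = self.idle.entry(host.to_string()).or_default();
        if list.len() < self.max_idle_per_host {
            list.push(conn);
        }
    }
}

/// Issue `requests` sequential "requests" and record which connection served each.
fn ids_used(max_idle_per_host: usize, requests: usize) -> Vec<u32> {
    let mut pool = ToyPool::new(max_idle_per_host);
    (0..requests)
        .map(|_| {
            let conn = pool.checkout("docker.sock");
            let id = conn.id;
            pool.checkin("docker.sock", conn);
            id
        })
        .collect()
}

fn main() {
    // One idle slot: the same connection serves every request.
    println!("{:?}", ids_used(1, 3));
    // Zero idle slots: every request gets a fresh connection.
    println!("{:?}", ids_used(0, 3));
}
```

The trade-off is that every request pays for a new connection, so this hides the symptom rather than explaining it.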
Update:
|
From my debugging, it looks like there is no initial Connection poll (and so, through ProtoClient -> Dispatcher -> Conn, the request is never encoded), but I am having a surprisingly hard time figuring out where that poll should be initiated in order to investigate further. |
I have added
When the problem arises, it is always with the last dropped/pooled Connection from the previous round, in this case id=2. Any pointers on where to look, @seanmonstar?
|
Ah, there is a difference. The round before every successful one ends with a chunked-encoding response, and that accidentally prevents the connection from being reused. Notice that reqwest returns before the chunked body is complete, but on the client side the body is consumed:
let res = self.client.post(self.url.as_ref()).json(&payload).send().await?;
let text = res.text().await?;
Every round is spawned inside a stream, but debugged with
|
When comparing the problematic content-length body request at trace debug level, the hang occurs if it is the last completed request and the round ends with
but in the successful case (when it is not the last completed request in the round) there is this last poll later, with a third flush (notice that it happened after
It looks to me like this should be close to enough information for someone with a deep understanding of the internals to fix this. Happy to investigate further. |
I'm experiencing a similar issue. In an application I'm working on, I'm sending multiple requests (from a shared reqwest According to a Wireshark trace, the fourth request is never sent to the network. I added debug logging deep down into hyper, but didn't really find the source of the issue. I suspected that it's related to connection reuse (because the first three requests work), and just as described above, setting In case it's useful, here are a few screenshots from my debugging session. The reqwest library delegates the HTTP request to hyper: When looking at the program using |
I just wasted my entire day because of this. This also affects
std::thread::spawn(move || {
    let a = tokio::runtime::Builder::new_multi_thread().build().unwrap();
    a.block_on(
        async {
            // reqwest call here then `.await`
        }
    );
}).join().expect("Thread panicked")
But, |
This is a workaround for hyperium/hyper#2312
is there any timeline for fixing this or have we given up? |
I have a two-step container build process to reduce container size. In the initial build container everything works fine. However, in my second container it breaks with the same behaviour as described here. Could this be the same issue, or is it the Docker driver issue that @sfackler described above?
|
As indicated in Azure#1549, there is an issue with hyper (the underlying layer used by reqwest) that hangs in some cases on connection pools. This PR uses a commonly discussed workaround of setting `pool_max_idle_per_host` to 0. Ref: hyperium/hyper#2312
@jevolk Just curious: I saw you pushed a commit in a fork (jevolk@56a64cf) that seems to have "[fixed] the deadlock". Would you mind shedding some light on this issue? |
The patch appears to prevent the deadlock for the test https://github.com/pandaman64/hyper-hang-tokio but it is not the best solution. I've observed some regressions when using it under normal circumstances which is why I never opened a PR. I left it for future reference. I should note my application makes an arguably exotic use of reqwest and hyper (legacy) client pools by communicating with thousands of remote hosts in a distributed system, often with several connections each, in an async+multi-threaded process: we do not set |
@pandaman64 out of curiosity, does this repro if you read the response of the first request in your example?
println!("{}", i);
let _resp = client
    .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events?").into())
    .await;
let res = client
    .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events?").into())
    .await
    .unwrap();
tokio::spawn(_resp.into_body().into_future());
tokio::spawn(res.into_body().into_future());
|
@juliusl I no longer have the environment that reproduced the issue, so I cannot comment on this, but I believe some other examples linked in this issue do read the body. |
I appreciate that many have linked to this issue, and that some have included details. But the maintainers have so far been unable to reproduce in a way that we can investigate this. The code in question is also run under large load in many systems where it doesn't seem to occur. And, because of the generic title, I suspect many link here with issues that are probably separate. So, my plan here is to close this specific issue. This isn't to say that the issue can't be in hyper. We unfortunately do write bugs. But if this is an issue in your system, if you can help debug where exactly the hang is, that would greatly help us help you. New issues with unique details are always welcome. |
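For anyone trying to narrow down where a hang occurs, one generic approach is to put a deadline around each step and report the first step that misses it. The following is a minimal, std-only sketch of that idea; `run_with_deadline` is a hypothetical helper written for this illustration, and the closure is a stand-in for a real request rather than actual hyper code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a helper thread and waits at most `limit` for its result.
/// Returns `None` if the operation did not finish in time (a suspected hang).
/// Hypothetical helper for illustration only.
fn run_with_deadline<T, F>(limit: Duration, op: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline passed; ignore that.
        let _ = tx.send(op());
    });
    rx.recv_timeout(limit).ok()
}

fn main() {
    for i in 0..3u32 {
        // Stand-in for one reproducer iteration; iteration 2 simulates a hang.
        let result = run_with_deadline(Duration::from_millis(200), move || {
            if i == 2 {
                thread::sleep(Duration::from_secs(5));
            }
            i
        });
        match result {
            Some(v) => println!("iteration {} completed", v),
            None => {
                println!("iteration {} missed its deadline", i);
                break;
            }
        }
    }
}
```

In the async reproducers from this thread, `tokio::time::timeout` around each `.await` would play the same role; combined with trace-level logs for the step that times out, this gives a concrete starting point for a new report.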
Context: we are investigating if upgrading hyper to 0.13 fixes #2306, and it seems not.
Steps to reproduce
Prerequisites:
ulimit -n 65536 (increasing open file limits)
cargo run
Expected behavior
The program should run until the system resource is exhausted.
Actual behavior
It hangs after an indeterminate number of iterations.
Log (last several iterations)
Reproducer
I'm pasting the reproducer here for ease of reference.
Notes
The URI ("//events?") does matter. We need two requests with different URIs.