Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rt: cancel all in-flight ops on driver drop #158

Merged
merged 3 commits into from
Nov 3, 2022

Conversation

Noah-Kennedy
Copy link
Contributor

This fixes an immediate issue where ops like Accept can cause runtime shutdown to hang.

This may need to be revisited once we have a real cancellation story. It may be better to do cancel-on-drop for ops.

@Noah-Kennedy
Copy link
Contributor Author

Hmm, it doesn't work as well as expected.

@FrankReh
Copy link
Collaborator

FrankReh commented Oct 30, 2022 via email

@Noah-Kennedy
Copy link
Contributor Author

Figured out what's wrong. For this to really work, we need to drop the runtime tasks on process exit.

@Noah-Kennedy
Copy link
Contributor Author

I need #157 in first to finish this.

@Noah-Kennedy Noah-Kennedy force-pushed the noah/shutdown-cancellation branch from 4f515a6 to f81ba93 Compare October 30, 2022 21:10
@FrankReh
Copy link
Collaborator

Is this meant to fix the problem caused by exit being called from within the runtime too? If so, I can give it a test.

@Noah-Kennedy
Copy link
Contributor Author

Yes, but it doesn't work yet.

@FrankReh
Copy link
Collaborator

I don’t think you will find operations in the submission queue. That queue is empty even when operations are in flight.

I see. That loop was just to flush any pending sqe's.

@FrankReh
Copy link
Collaborator

Yes, but it doesn't work yet.

Don't know if you are looking for ideas. Have you tried setting the slab's entry's lifecycle to ignored so when the complete gets called by tick, it will signal a drop to the slab?

@Noah-Kennedy
Copy link
Contributor Author

The issue it currently has is that we need to drop the runtime before we can drop this, because all tasks need to be moved to the "ignored" state. So exit still fails right now.

@FrankReh
Copy link
Collaborator

I don't follow. The driver drop is synchronous so tasks won't run will they? You mean the ops lifecycles need to be moved to the ignored state? The driver drop is already breaking invariants. Why not have its loop change the lifecycles it finds to ignored? I may be missing something. I'll let you think about it. You can ask me to have a look later if you want. I'm sure you'll figure it out quickly without me looking over your shoulder.

@FrankReh
Copy link
Collaborator

FrankReh commented Nov 1, 2022

@Noah-Kennedy are you waiting on a review on this?

@fasterthanlime
Copy link
Contributor

This PR makes https://github.com/hapsoc/hring tests pass again.

(They still fail when std::process:exit is called, but that's fine).

@FrankReh
Copy link
Collaborator

FrankReh commented Nov 2, 2022

@fasterthanlime Ironically, it is a destructor calling back into our drop that the libc exit is calling for our thread local storage that is causing the problem. Until we looked at the stack, we didn't know there was a libc cleanup per thread when exiting.

@Noah-Kennedy Noah-Kennedy force-pushed the noah/shutdown-cancellation branch from f81ba93 to 79bcfef Compare November 2, 2022 23:10
This fixes an immediate issue where ops like Accept can cause runtime shutdown to hang.

Unfortunately, it also stalls some work related to multi-shot ops.
@Noah-Kennedy Noah-Kennedy force-pushed the noah/shutdown-cancellation branch from 79bcfef to 558a1c3 Compare November 2, 2022 23:11
@Noah-Kennedy
Copy link
Contributor Author

@FrankReh this is ready for review.

@Noah-Kennedy
Copy link
Contributor Author

CC @fasterthanlime

@fasterthanlime
Copy link
Contributor

CC @fasterthanlime

I won't be able to review this for at least 12 hours but if the hring rest suite passes with it, it's good with me 👍

@Noah-Kennedy
Copy link
Contributor Author

@fasterthanlime I'm hitting a few errors when trying to run your tests:

$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.03s
     Running unittests src/lib.rs (target/debug/deps/hring-f8b5b0f235a94d7b)

running 25 tests
test buffet::roll::tests::test_roll_iter ... ok
test buffet::roll::tests::test_roll_keep_different_buf - should panic ... ok
test buffet::roll::tests::test_roll_keep_different_box - should panic ... ok
test buffet::roll::tests::test_roll_keep_different_type - should panic ... ok
test buffet::roll::tests::test_roll_keep_before_buf - should panic ... ok
test buffet::roll::tests::test_roll_keep_before_box - should panic ... ok
test buffet::roll::tests::test_roll_iobuf ... ok
test buffet::roll::tests::test_roll_keep ... ok
test buffet::roll::tests::test_roll_put_does_not_fit ... ok
test buffet::roll::tests::test_roll_put_then_grow ... ok
test buffet::roll::tests::test_roll_realloc ... ok
test buffet::roll::tests::test_roll_put ... ok
test buffet::roll::tests::test_roll_realloc_big ... ok
test buffet::roll::tests::test_roll_readfrom_start ... ok
test buffet::tests::size_test ... ok
test buffet::roll::tests::test_roll_reserve ... ok
test h1::parse::tests::test_h1_parse_various_lowlevel_functions ... ok
test io::chan::tests::test_chan_reader ... ok
test buffet::roll::tests::test_roll_nom_sample ... ok
test buffet::roll::tests::test_roll_take_at_most ... ok
test buffet::tests::split_test ... ok
test buffet::roll::tests::test_roll_take_all_empty - should panic ... ok
test buffet::tests::freeze_test ... ok
test buffet::roll::tests::test_roll_take_at_most_panic - should panic ... ok
test buffet::roll::tests::test_roll_take_all ... ok

test result: ok. 25 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

     Running tests/integration_test.rs (target/debug/deps/integration_test-3295f65df25cafaa)

running 9 tests
test curl_echo_body_noproxy_content_len ... FAILED
test curl_echo_body_noproxy_chunked ... FAILED
test proxy_echo_body_chunked ... FAILED
test curl_echo_body_content_len ... FAILED
test request_api ... FAILED
test proxy_statuses ... FAILED
test proxy_echo_body_content_len ... FAILED
test serve_api ... FAILED
test curl_echo_body_chunked ... FAILED

failures:

---- curl_echo_body_noproxy_content_len stdout ----
thread 'curl_echo_body_noproxy_content_len' panicked at 'called `Result::unwrap()` on an `Err` value: InstallError', tests/integration_test.rs:620:73

---- curl_echo_body_noproxy_chunked stdout ----
thread 'curl_echo_body_noproxy_chunked' panicked at 'called `Result::unwrap()` on an `Err` value: InstallError', tests/integration_test.rs:620:73
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- proxy_echo_body_chunked stdout ----
thread 'proxy_echo_body_chunked' panicked at 'called `Result::unwrap()` on an `Err` value: InstallError', tests/integration_test.rs:375:73

---- curl_echo_body_content_len stdout ----
thread 'curl_echo_body_content_len' panicked at 'called `Result::unwrap()` on an `Err` value: InstallError', tests/integration_test.rs:536:73

---- request_api stdout ----
thread 'request_api' panicked at 'failed to set global default subscriber: SetGlobalDefaultError("a global default trace dispatcher has already been set")', /home/noah/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.3.16/src/util.rs:91:14

---- proxy_statuses stdout ----
thread 'proxy_statuses' panicked at 'failed to set global default subscriber: SetGlobalDefaultError("a global default trace dispatcher has already been set")', /home/noah/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.3.16/src/util.rs:91:14

---- proxy_echo_body_content_len stdout ----
thread 'proxy_echo_body_content_len' panicked at 'failed to set global default subscriber: SetGlobalDefaultError("a global default trace dispatcher has already been set")', /home/noah/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.3.16/src/util.rs:91:14

---- serve_api stdout ----
thread 'serve_api' panicked at 'failed to set global default subscriber: SetGlobalDefaultError("a global default trace dispatcher has already been set")', /home/noah/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.3.16/src/util.rs:91:14

---- curl_echo_body_chunked stdout ----
thread 'curl_echo_body_chunked' panicked at 'Error: No such file or directory (os error 2)', tests/helpers/mod.rs:10:13


failures:
    curl_echo_body_chunked
    curl_echo_body_content_len
    curl_echo_body_noproxy_chunked
    curl_echo_body_noproxy_content_len
    proxy_echo_body_chunked
    proxy_echo_body_content_len
    proxy_statuses
    request_api
    serve_api

test result: FAILED. 0 passed; 9 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--test integration_test`

@Noah-Kennedy
Copy link
Contributor Author

The block_on_twice test was already failing.

@fasterthanlime
Copy link
Contributor

@Noah-Kennedy yeah sorry, only individual tests will run with "cargo test" — the whole suite requires "cargo nextest run" (since it does one process per test)

@Noah-Kennedy
Copy link
Contributor Author

@fasterthanlime thanks, now I'm seeing:

thread 'proxy_statuses' panicked at 'a handler must always be installed if the `auto-install` feature is disabled', tests/testbed.rs:19:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Copy link
Collaborator

@FrankReh FrankReh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very neat. Do you mean to commit this before the CompletionList issues are solved too? I wouldn't mind that.

src/driver/mod.rs Show resolved Hide resolved
src/driver/mod.rs Show resolved Hide resolved
src/driver/mod.rs Show resolved Hide resolved
src/driver/mod.rs Show resolved Hide resolved
src/driver/mod.rs Show resolved Hide resolved
src/driver/mod.rs Show resolved Hide resolved
src/driver/mod.rs Show resolved Hide resolved
@Noah-Kennedy
Copy link
Contributor Author

@FrankReh I'd like us to get this in and then figure out what to do about multishot ops. This blocks us getting a release out.

src/driver/mod.rs Show resolved Hide resolved
@Noah-Kennedy Noah-Kennedy merged commit 4b0e07e into master Nov 3, 2022
@Noah-Kennedy Noah-Kennedy deleted the noah/shutdown-cancellation branch November 3, 2022 04:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants