Move finalized state database and filesystem operations to blocking threads to avoid hangs #2188

Closed
Tracked by #3322 ...
teor2345 opened this issue May 24, 2021 · 1 comment
Labels
  • A-rust (Area: Updates to Rust code)
  • C-bug (Category: This is a bug)
  • I-hang (A Zebra component stops responding to requests)
  • I-slow (Problems with performance or responsiveness)
  • I-usability (Zebra is hard to understand or use)

Comments

@teor2345
Contributor

teor2345 commented May 24, 2021

Previous Work

We tried to fix this in PR #4199.

It worked, but it didn't help performance much, and the extra threading caused some fragile tests to intermittently fail.

Scheduling

We should only do this ticket if we fix the other hang bugs and Zebra still hangs frequently.

Is your feature request related to a problem? Please describe.

Zebra runs finalized state database and filesystem operations on async threads, which can block:

  • async code running concurrently in the same task, and
  • async code from other tasks on the same thread.

Describe the solution you'd like

Instead, Zebra should:

  • use tokio's spawn_blocking function to run these tasks, and
  • make all callers into async functions (or access the executor directly)

https://docs.rs/tokio/1.6.0/tokio/task/fn.spawn_blocking.html

Describe alternatives you've considered

We can use tokio's block_in_place function, and move test code into a tokio::test. This function only blocks concurrent code in the same task. This makes it suitable for initialization or finalization tasks, because we're not doing much concurrent work at that time.

We could also use block_in_place as we transition code.

Additional context

Blocking functions might be the source of hangs or slowdowns in the inbound service, or state service, or unrelated tasks.

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code S-needs-triage Status: A bug report needs triage P-Medium I-hang A Zebra component stops responding to requests I-slow Problems with performance or responsiveness I-usability Zebra is hard to understand or use labels May 24, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label May 31, 2021
@teor2345 teor2345 changed the title from "Move finalized state database and filesystem operations to blocking threads" to "Move finalized state database and filesystem operations to blocking threads to avoid hangs" Jan 5, 2022
@teor2345 teor2345 added P-Low and removed P-Medium labels Jan 5, 2022
@teor2345
Contributor Author

teor2345 commented Mar 1, 2022

This doesn't seem to cause bugs in practice, but it could cause:
