`task_mgr::spawn`'s `shutdown_process_on_error` doesn't work reliably [P:3] [S:0] #3402

Closed
LizardWizzard opened this issue Jan 23, 2023 · 2 comments
Labels
c/storage/pageserver Component: storage: pageserver c/storage Component: storage t/bug Issue Type: Bug triaged bugs that were already triaged

Comments

@LizardWizzard
Contributor

LizardWizzard commented Jan 23, 2023

Steps to reproduce

This was discovered via #3387.

Apparently, the synthetic size calculation task hit an error that triggered a shutdown. Synthetic size calculation shouldn't lead to a pageserver shutdown; this is fixed in #3392. But shutdown on error should still work, even when it is triggered erroneously. That is what this issue is about.

Expected result

The pageserver restarts.

Actual result

The pageserver was stuck in a semi-alive state: some tasks had stopped while others kept running. The Postgres protocol listener had been shut down, which resulted in "connection refused" errors during basebackups.

Environment

prod.

Logs, links

@LizardWizzard LizardWizzard added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver c/storage Component: storage labels Jan 23, 2023
@LizardWizzard LizardWizzard changed the title task_mgr::spawn's shutdown_process_on_error` doesnt work reliably task_mgr::spawn's shutdown_process_on_error` doesnt work reliably [P3:S0] Feb 28, 2023
@LizardWizzard LizardWizzard changed the title task_mgr::spawn's shutdown_process_on_error` doesnt work reliably [P3:S0] task_mgr::spawn's shutdown_process_on_error` doesnt work reliably [P:3] [S:0] Feb 28, 2023
@shanyp
Contributor

shanyp commented Mar 23, 2023

Consider having a timeout for this one.
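A minimal sketch of what such a timeout could look like, assuming a thread-per-task model with a shared stop flag: request shutdown, wait up to a deadline, and report how many tasks are still running. `shutdown_with_timeout` is a hypothetical name, not part of the pageserver's `task_mgr`, and the sketch uses plain `std::thread` rather than an async runtime to stay self-contained.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical helper: signal all tasks to stop, then wait at most `timeout`
/// for them to finish. Returns how many tasks were still running when the
/// deadline passed (0 means graceful shutdown completed in time).
pub fn shutdown_with_timeout(
    stop: &AtomicBool,
    mut pending: Vec<JoinHandle<()>>,
    timeout: Duration,
) -> usize {
    // Cooperative tasks are assumed to poll this flag and return when it is set.
    stop.store(true, Ordering::SeqCst);
    let deadline = Instant::now() + timeout;
    loop {
        // Drop handles of tasks that have already exited.
        pending.retain(|h| !h.is_finished());
        if pending.is_empty() || Instant::now() >= deadline {
            // Any handles left here belong to tasks that ignored the stop
            // request; dropping them detaches those threads.
            return pending.len();
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A caller would treat a non-zero return as a failed graceful shutdown and fall back to `std::process::exit(1)`, so a single stuck task cannot leave the process half-alive. `JoinHandle::is_finished` requires Rust 1.61 or newer.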

@jcsp
Collaborator

jcsp commented Mar 15, 2024

Fixed in #6105: we now `exit(1)` in the `shutdown_process=true` case.
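A minimal sketch of the exit-on-error approach, assuming a thread-per-task model: when a task spawned with the shutdown flag returns an error, the process exits with status 1 instead of attempting a graceful shutdown that might hang. `spawn_task`, `decide` and `OnTaskExit` are hypothetical names for illustration, not the actual `task_mgr` API or the code in #6105.

```rust
use std::process;
use std::thread::{self, JoinHandle};

/// Hypothetical: what to do once a spawned task has returned.
#[derive(Debug, PartialEq, Eq)]
pub enum OnTaskExit {
    /// The task succeeded, or its failure is not fatal to the process.
    Continue,
    /// Terminate the whole process immediately with this exit code.
    ExitProcess(i32),
}

/// Pure policy function, so it can be tested without killing the process.
pub fn decide(result: &Result<(), String>, shutdown_process_on_error: bool) -> OnTaskExit {
    match result {
        Err(_) if shutdown_process_on_error => OnTaskExit::ExitProcess(1),
        _ => OnTaskExit::Continue,
    }
}

/// Hypothetical spawn wrapper: run `task` on its own thread and, if it fails
/// while `shutdown_process_on_error` is set, exit(1) right away.
pub fn spawn_task<F>(name: &'static str, shutdown_process_on_error: bool, task: F) -> JoinHandle<()>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    thread::spawn(move || {
        let result = task();
        if let Err(e) = &result {
            eprintln!("task {name} failed: {e}");
        }
        if let OnTaskExit::ExitProcess(code) = decide(&result, shutdown_process_on_error) {
            // No graceful shutdown: an external supervisor (assumed) restarts the process.
            process::exit(code);
        }
    })
}
```

Keeping the policy in `decide` makes it testable without terminating the test process; only `spawn_task` actually calls `std::process::exit`. Note that `process::exit` does not run destructors on any thread, which is the point here: nothing left half-running can block the exit.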

@jcsp jcsp closed this as completed Mar 15, 2024

3 participants