Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate sporadic quit() calls. #6881

Open
wants to merge 17 commits into
base: unstable
Choose a base branch
from
Open

Eliminate sporadic quit() calls. #6881

wants to merge 17 commits into from

Conversation

cheatfate
Copy link
Contributor

@cheatfate cheatfate commented Jan 28, 2025

This PR converts all the established quit calls into raiseDefect statements and also establishes place where all this specific Defects being handled and appropriate exit code returned.

The only quit() left is Doppelganger one, i make it available for 2 reasons.

  1. Is it ok to convert it to Defect?
  2. 129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129.

Copy link

github-actions bot commented Jan 28, 2025

Unit Test Results

       15 files  ±0    2 285 suites  ±0   1h 14m 50s ⏱️ + 8m 31s
  5 368 tests ±0    5 021 ✔️ ±0  347 💤 ±0  0 ±0 
37 069 runs  ±0  36 579 ✔️ ±0  490 💤 ±0  0 ±0 

Results for commit 4b8d93d. ± Comparison against base commit 67ac6fc.

♻️ This comment has been updated with latest results.

@kdeme
Copy link
Contributor

kdeme commented Jan 29, 2025

2. 129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129

I don't remember exactly but wasn't that number chosen (arbitrary?) so that for example a service could be set up not to restart the process when the program terminated with that specific exit code?

template withDefectsHandlers*(pbody: untyped) =
try:
pbody
except ExitProcessDefect as exc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching Defect is undefined behavior - this is not something we want to use/rely on broadly - it can change meaning in future nim versions and/or not behave properly .

Rather than defect, in order to avoid quit, the fix would be to return errors from the functions and deal with them or let the defect be raised and not caught.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its first time i see that catching Defect is undefined behavior, because in such case generating Defect is also undefined behavior and code which was written with Result.expect() and quit X in mind should be totally rewritten before attempts to combine CL and EL into single binary. Otherwise undefined behavior created by CL could cause significant losses in EL (e.g. database could be left in undefined state).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise undefined behavior created by CL could cause significant losses in EL (e.g. database could be left in undefined state).

ELs should not be able to put into undefined states by anything the engine API does. That's an EL bug.

Copy link
Member

@arnetheduck arnetheduck Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generating Defect is also undefined behavior

a raised Defect is guaranteed to terminate the application - how it does that is undefined - ie except Defect may or may not work, ie you can't expect that quit(exc.exitCode) or anything inside this exception handler will actually run.

This undefined behavior is why we don't use except: and except Exception: - they are both undefined as a consequence of catching Defect being undefined (I consider it a design flaw in the language).

The third correct option is to turn ExitProcessDefect into a CatchableError.

From a code perspective, it's also bad to have untracked exceptions / hidden control flows like this - this goes back to our original decision to use raises consistently - raises tracking does not work with Defect either.

expect is the same as doAssert, but with a better error message - it works as intended, ie it terminates the application.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(e.g. database could be left in undefined state)

generally, all our database interactions happen in a transaction - the worst thing that can happen is a rollback.

@arnetheduck
Copy link
Member

arnetheduck commented Jan 29, 2025

129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129.

I don't remember the reasoning either, but it would break a lot of installations to change it now, because it's also part of the systemd service file that we recommend. If SIGHUP can really cause this, we should probably install a handler to ignore it.

@tersec
Copy link
Contributor

tersec commented Jan 29, 2025

129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129.

I don't remember the reasoning either, but it would break a lot of installations to change it now, because it's also part of the systemd service file that we recommend. If SIGHUP can really cause this, we should probably install a handler to ignore it.

It addresses #3973 where previously it had been 1031, which was not a valid choice, and it doesn't conflict with https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Process%20Exit%20Codes.

https://nimbus.guide/doppelganger-detection.html documents this:

If any activity is detected, the node shuts down with exit code 129.

@cheatfate
Copy link
Contributor Author

It doesn't matter where we used 129 - we should change this number to any number which is < 128.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants