fix(baseapp): signal then panic at halt-height #338
You can compare to my attempt at a fix: https://github.com/agoric-labs/cosmos-sdk/tree/8326-exit-on-halt
Also note that it will fix Agoric/agoric-sdk#8326.
Here is my annotated version of the code I see there today:

// halt attempts to gracefully shutdown the node via SIGINT and SIGTERM falling
// back on os.Exit if both fail.
func (app *BaseApp) halt() {
	app.logger.Info("halting node per configuration", "height", app.haltHeight, "time", app.haltTime)
	p, err := os.FindProcess(os.Getpid())
	if err == nil {
		// attempt cascading signals in case SIGINT fails (os dependent)
		sigIntErr := p.Signal(syscall.SIGINT)
		sigTermErr := p.Signal(syscall.SIGTERM)
		if sigIntErr == nil || sigTermErr == nil {
			// mfig: sigIntErr and sigTermErr are always nil if the process
			// was found and the signal was sent. It has nothing to do with
			// how the process handles the signal, thus just returning here
			// does not prevent new blocks from being produced. A panic
			// is the most appropriate way to prevent further progress.
			return
		}
	}
	// Resort to exiting immediately if the process could not be found or killed
	// via SIGINT/SIGTERM signals.
	app.logger.Info("failed to send SIGINT/SIGTERM; exiting...")
	os.Exit(0) // mfig: this should not be a 0 (success) exit code
}
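The point in the annotation above (a nil error from `Signal` only means the signal was sent, not that the process stopped) can be illustrated with a small standalone program. This is a minimal sketch, not code from the PR: `sendAndObserve` is a hypothetical helper, and it assumes a Unix-like OS where a process can signal itself.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// sendAndObserve (hypothetical helper) installs a handler that merely records
// SIGINT, sends SIGINT to the current process, and reports whether the handler
// saw it along with the error returned by Signal.
func sendAndObserve() (delivered bool, err error) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGINT)
	defer signal.Stop(ch)

	p, err := os.FindProcess(os.Getpid())
	if err != nil {
		return false, err
	}
	// A nil error here only says the signal was sent.
	sendErr := p.Signal(syscall.SIGINT)

	select {
	case <-ch:
		return true, sendErr
	case <-time.After(2 * time.Second):
		return false, sendErr
	}
}

func main() {
	delivered, err := sendAndObserve()
	// The process is still running at this point even though Signal succeeded.
	fmt.Println("handler saw signal:", delivered, "send error:", err)
}
```

Because the handler swallows the signal, execution continues past the `Signal` call, which is the situation the annotation warns about when `halt` simply returns.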
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
As noted, I'm uncomfortable with using something other than block time to establish "now". Is there a way to get a block context into the function?
baseapp/abci.go
Outdated
@@ -312,6 +314,37 @@ func (app *BaseApp) deliverTxWithoutEventHistory(req abci.RequestDeliverTx) (res
	}
}

// checkHalt checkes if height or time exceeds halt-height or halt-time respectively.
- // checkHalt checkes if height or time exceeds halt-height or halt-time respectively.
+ // checkHalt forces a process exit if height or time exceeds halt-height or halt-time respectively.
Thanks, I've rephrased.
baseapp/abci.go
Outdated
case app.haltHeight > 0 && uint64(height) > app.haltHeight:
	halt = true

case app.haltTime > 0 && time.Unix() > int64(app.haltTime):
Isn't this invalid? `time.Unix` requires POSIX epoch timestamp input in the form of `sec` and `nsec` arguments, and returns a `time.Time` which is incomparable with `int64`.
- case app.haltTime > 0 && time.Unix() > int64(app.haltTime):
+ case app.haltTime > 0 && time.Now().Unix() > int64(app.haltTime):
(but I'm uncomfortable with using local time rather than block time for this)
Ah. You were misled by the fact the call wasn't using a static function of the `time` package, because that package was shadowed by `checkHalt`'s `time` argument. So, it was actually using the argument's instance method, `Time.Unix`. To avoid this confusion, I've renamed the argument to `tm` instead.
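The shadowing described above is easy to reproduce in isolation. The following is a minimal sketch with hypothetical function names (`unixOfShadowed`, `unixOfRenamed`, `demo`); it is not the PR's code, only an illustration of why `time.Unix()` compiled when the parameter was named `time`.

```go
package main

import (
	"fmt"
	"time"
)

// unixOfShadowed mirrors the original naming: inside the body, the identifier
// "time" refers to the time.Time parameter, not to the time package.
func unixOfShadowed(time time.Time) int64 {
	return time.Unix() // method Time.Unix on the argument, not the package function
}

// unixOfRenamed does the same with the parameter renamed, so the time package
// stays visible in the body and the call reads unambiguously.
func unixOfRenamed(tm time.Time) int64 {
	return tm.Unix()
}

// demo feeds both helpers the same instant.
func demo() (int64, int64) {
	t := time.Unix(1700000000, 0)
	return unixOfShadowed(t), unixOfRenamed(t)
}

func main() {
	a, b := demo()
	fmt.Println(a, b) // prints "1700000000 1700000000"
}
```

Both functions behave identically; the rename only removes the ambiguity for readers.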
baseapp/abci.go
Outdated
var halt bool
switch {
case app.haltHeight > 0 && uint64(height) > app.haltHeight:
	halt = true

case app.haltTime > 0 && time.Unix() > int64(app.haltTime):
	halt = true
}
`switch` seems too verbose here.
- var halt bool
- switch {
- case app.haltHeight > 0 && uint64(height) > app.haltHeight:
- 	halt = true
- case app.haltTime > 0 && time.Unix() > int64(app.haltTime):
- 	halt = true
- }
+ halt := (app.haltHeight > 0 && uint64(height) > app.haltHeight) ||
+ 	(app.haltTime > 0 && time.Unix() > int64(app.haltTime))
or
- var halt bool
- switch {
- case app.haltHeight > 0 && uint64(height) > app.haltHeight:
- 	halt = true
- case app.haltTime > 0 && time.Unix() > int64(app.haltTime):
- 	halt = true
- }
+ var halt bool
+ if app.haltHeight > 0 && uint64(height) > app.haltHeight {
+ 	halt = true
+ } else if app.haltTime > 0 && time.Unix() > int64(app.haltTime) {
+ 	halt = true
+ }
Thanks!
The CometBFT block header as supplied to `app.checkHalt(req.Header.Height, req.Header.Time)`
baseapp/abci.go
Outdated
// checkHalt forces a state machine halt and attempts to kill the current
// process if height or tm exceeds halt-height or halt-time respectively.
func (app *BaseApp) checkHalt(height int64, tm time.Time) {
Suggestions for further clarification:
- // checkHalt forces a state machine halt and attempts to kill the current
- // process if height or tm exceeds halt-height or halt-time respectively.
- func (app *BaseApp) checkHalt(height int64, tm time.Time) {
+ // checkHalt forces a state machine halt and attempts to kill the current
+ // process if block height or timestamp exceeds halt-height or halt-time respectively.
+ func (app *BaseApp) checkHalt(blockHeight int64, blockTime time.Time) {
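To make the block-height and block-time semantics concrete, the halt condition discussed in this review can be pulled into a pure predicate. This is only a sketch: `shouldHalt` and `demo` are hypothetical names, the `uint64` types for the halt settings are an assumption inferred from the conversions in the reviewed code, and the strict `>` comparisons are copied from that code.

```go
package main

import (
	"fmt"
	"time"
)

// shouldHalt (hypothetical helper) reports whether a block with the given
// height and timestamp is past the configured halt height or halt time.
// A zero haltHeight or haltTime disables the respective check.
func shouldHalt(haltHeight, haltTime uint64, blockHeight int64, blockTime time.Time) bool {
	return (haltHeight > 0 && uint64(blockHeight) > haltHeight) ||
		(haltTime > 0 && blockTime.Unix() > int64(haltTime))
}

// demo evaluates the predicate for a few representative inputs.
func demo() []bool {
	blockTime := time.Unix(1000, 0)
	return []bool{
		shouldHalt(0, 0, 50, blockTime),    // both checks disabled
		shouldHalt(100, 0, 100, blockTime), // at the halt height, not past it
		shouldHalt(100, 0, 101, blockTime), // past the halt height
		shouldHalt(0, 999, 1, blockTime),   // block time past the halt time
	}
}

func main() {
	fmt.Println(demo()) // prints "[false false true true]"
}
```

Because the predicate depends only on its arguments, the result is the same on every node that processes the same block, which is the property the reviewer wants from using block time rather than local time.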
baseapp/abci.go
Outdated
	return
}

app.logger.Info("halt per configuration", "height", app.haltHeight, "time", app.haltTime)
- app.logger.Info("halt per configuration", "height", app.haltHeight, "time", app.haltTime)
+ app.logger.Info("halt per configuration", "haltHeight", app.haltHeight, "haltTime", app.haltTime, "blockHeight", height, "blockTime", tm)
Merged commit 3768f9c into mfig-v0.46.16-alpha.agoric.2.4
Overall LGTM, sorry for dropping the ball on this review.
Description
Refs: #337, #305
Closes: Agoric/agoric-sdk#8326
Implements both the newer Cosmos panic and the Agoric-compatible signal behaviour when `--halt-height` is reached. First, try to `SIGINT` self, then `SIGTERM` self, followed by the `CONSENSUS FAILURE!!!` panic. This will stop the state machine in all cases and should exit the process as Agoric expects.
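The signal-then-panic sequence from the description can be sketched as a standalone program. This is an illustration under stated assumptions, not the merged implementation: `signalThenPanic` and `haltRecovered` are hypothetical names, the panic message is a placeholder, and the demo assumes a Unix-like OS. It ignores both signals up front to model the worst case where signalling has no effect.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// signalThenPanic (hypothetical) sends SIGINT and then SIGTERM to the current
// process, then panics unconditionally so the caller cannot make progress
// even if both signals are ignored or handled without exiting.
func signalThenPanic(msg string) {
	if p, err := os.FindProcess(os.Getpid()); err == nil {
		_ = p.Signal(syscall.SIGINT)
		_ = p.Signal(syscall.SIGTERM)
	}
	panic(msg)
}

// haltRecovered ignores both signals for the rest of the process lifetime
// (demo only), runs signalThenPanic, and returns the recovered panic value.
func haltRecovered() (recovered interface{}) {
	signal.Ignore(syscall.SIGINT, syscall.SIGTERM)
	defer func() { recovered = recover() }()
	signalThenPanic("halt height reached")
	return nil
}

func main() {
	// Real code would let the panic propagate; recovering here just makes
	// the outcome visible.
	fmt.Println("recovered:", haltRecovered())
}
```

The signals give a well-behaved process the chance to shut down gracefully, while the panic is what guarantees that execution does not continue.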