-
Notifications
You must be signed in to change notification settings - Fork 20.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #20859 added unsafe shutdown detector #21133
Conversation
Not sure why it's failing on ARM or OSX. |
@muhammednagy Don't worry, these tests are known to be flaky and fail sometimes |
cmd/geth/main.go
Outdated
@@ -378,13 +378,25 @@ func startNode(ctx *cli.Context, stack *node.Node) { | |||
} | |||
ethClient := ethclient.NewClient(rpcClient) | |||
|
|||
var ethService *eth.Ethereum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @muhammednagy this is unfortunately not valid. You can run a light-node without having the eth.Ethereum
service started. @ligi correct me if I'm wrong but we mostly care about crashes in full client not in light clients, so I think its valid to move this code into the if ctx.GlobalInt...
on line 383.
To repro failure:
> make geth
> ./build/bin/geth --syncmode=light --goerli
...
Fatal: Failed to retrieve ethereum service: unknown service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea - it would mainly be interesting for full nodes - so I think we should go with what you proposed. But if there is a way to have it for both this would also be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using a different database? is it possible?
We could probably store them in a file instead which will work for whatever type of node we are running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely possible, but I think we want to have it in the same database. Otherwise we just wind up with a separate set of problems, where we need to care about consistency over many files, and keeping that file in sync with when e.g people do removedb
or removes the db manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay i think in that case a file would work best for all regardless of the type of node.
Should i go ahead with having a file?
If not then do you know which db i can use that would work for all kinds of nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use the leveldb instance we already have throughout the code for everything.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
What about if it's not serving LES requests? |
You most definitely have a database. It's in the |
Hey, |
We've discussed this today. This PR is a good first step in the right direction, but it is a bit lacking. We would like to be able to show, in the startup log, when the last unclean shutdown happened. In order to do that, we cannot delete the |
@@ -567,6 +577,9 @@ func (s *Ethereum) Stop() error { | |||
s.lesServer.Stop() | |||
} | |||
|
|||
if err := s.chainDb.Delete([]byte("unsafe-shutdown")); err != nil { | |||
log.Error("err", err) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to move this down, right above chaindb.Close()
. The biggest offender for an unclean shutdown is the blockchain shutdown mechanism.
// Create an invalid shutdown in database in case the app crashed | ||
if err = chainDb.Put([]byte("unsafe-shutdown"), []byte(time.Now().String())); err != nil { | ||
log.Warn("Failed to record possible future unsafe shutdown", "err", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR uses a single unsafe-shutdown
key to store a single timestamp. That's a good start, but we need a bit more information, otherwise quirky scenarios get lost. Notably, it's important to retain past crashes too, so a user might see if there was a fault recently, not just in the last restart (user's tend to restart their nodes a few times until attempting to investigate an issue, so a restart cannot delete the cause).
A simplistic suggestion would be to use a slice of timestamps instead of a single timestamp as the value of the entry. When Geth starts, you can append the current time, and on shutdown, you can pop off the tail. That way on startup you can actually list all the recent crashes, not just the last one.
} | ||
// Create an invalid shutdown in database in case the app crashed | ||
if err = chainDb.Put([]byte("unsafe-shutdown"), []byte(time.Now().String())); err != nil { | ||
log.Warn("Failed to record possible future unsafe shutdown", "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safe to return nil, err
here. Although it's not a critical system, if writing to the database failed something's going to go belly up either way.
// Check for invalid shutdown | ||
invalidShutdown, _ := chainDb.Get([]byte("unsafe-shutdown")) | ||
if invalidShutdown != nil { | ||
log.Error("unsafe shutdown detected", "time", string(invalidShutdown)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please capitalize the log message. Also please use a proper time value, not a pre-stringified version.
log.Error("unsafe shutdown detected", "time", string(invalidShutdown)) | ||
} | ||
// Create an invalid shutdown in database in case the app crashed | ||
if err = chainDb.Put([]byte("unsafe-shutdown"), []byte(time.Now().String())); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls use time.Now().Unix() to store the current timestamp instead of a localized string
@@ -231,6 +232,15 @@ func New(ctx *node.ServiceContext, config *Config) (*Ethereum, error) { | |||
return nil, err | |||
} | |||
|
|||
// Check for invalid shutdown | |||
invalidShutdown, _ := chainDb.Get([]byte("unsafe-shutdown")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're using unsafe-shutdown
in quite a few places and in different packages, it might make sense to move this into core/rawdb/schema.go
. We have a lot of other metadata keys defined there.
You can't show the error message per se, because you don't have it, but I'm thinking you can show something like There's no need to list every single one |
This PR seems to be abandoned. If you feel like taking it from here, please go ahead. |
Hey |
Thanks for getting started on this @muhammednagy! I took your work and extended it with the additional requirements we had, which is implemented in #21893. I'm closing this PR in favour of that one |
Added unsafe shutdown detector in case the app crashed.
fixes: #20859