Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Looong startup time with no indication in log of what's happening. #2778

Open
dan-da opened this issue Jun 29, 2019 · 2 comments
Open

Looong startup time with no indication in log of what's happening. #2778

dan-da opened this issue Jun 29, 2019 · 2 comments

Comments

@dan-da
Copy link

dan-da commented Jun 29, 2019

environment: Running against spruned on ubuntu. Both spruned and lightning running over tor.

When I start lightningd, it does not quickly exit back to command-line (enter daemon mode) as it is supposed to. Instead, it just seems to hang. In debug.log, there are only two entries re "creating db" and "missing gossip messages".

After trying several things, including removing ~/.lightning entirely, I just let it stay that way. After approx 15 minutes, there is a line that estimatesmartfee rpc exited with status 1 and finally the daemon starts up.

2019-06-28T16:59:37.178Z lightningd(15089): Creating database
2019-06-28T16:59:37.463Z lightning_gossipd(15101): We seem to be missing gossip messages
2019-06-28T17:14:38.443Z lightningd(15089): bitcoin-cli: finished bitcoin-cli -datadir=/home/c-lightning/.bitcoin estimatesmartfee 100 ECONOMICAL (900034 ms)
2019-06-28T17:14:38.443Z lightningd(15089): bitcoin-cli -datadir=/home/c-lightning/.bitcoin estimatesmartfee 100 ECONOMICAL exited with status 1
2019-06-28T17:14:39.730Z lightningd(15089): --------------------------------------------------
2019-06-28T17:14:39.730Z lightningd(15089): Server started with public key ##, alias ### (color ####) and lightningd v0.7.1rc4

Okay, so a few observations here:

  1. Seemingly spruned is taking a long time to serve estimatesmartfee for some reason. though if I try the same command now, it is instant.

  2. I can't be certain that the time is spent waiting for spruned, as no message is logged when the rpc call begins, only upon return. And further, the log message does not indicate start or duration time of the call, only the end time. This is useless for profiling. edit: I see that duration in milliseconds is included in the log message that is (finally) printed after the call note: using log-level=debug.

  3. Clearly lightningd is making rpc call to bitcoind/spruned before it forks into daemon mode. I'm not sure why that is necessary. Anyway, it has the potential to stall lightning-cli (rpc api) for an arbitrary amount of time at startup.

Suggestions:

  1. when log-level=debug, make a log entry before each rpc call, as well as after. Thus, it becomes more clear what is happening if something gets stuck.

  2. If possible, move the call to estimatesmartfee to after the point at which the daemon forks.

And of course I am open to any ideas/explanations what may be going on here to make things so slow, and how to fix it on my end.

@cdecker
Copy link
Member

cdecker commented Jun 29, 2019

Thanks @dan-da for the extensive report, I really appreciate the time you took to root-cause this.

  1. Seemingly spruned is taking a long time to serve estimatesmartfee for some reason. though if I try the same command now, it is instant.

This might be related to the DDoS attack that electrum servers are currently suffering from, I'm not too familiar with how spruned selects the electrum server to contact, but if it is one of the affected ones, that might be the root-cause for that.

  1. I can't be certain that the time is spent waiting for spruned, as no message is logged when the rpc call begins, only upon return. And further, the log message does not indicate start or duration time of the call, only the end time. This is useless for profiling. edit: I see that duration in milliseconds is included in the log message that is (finally) printed after the call note: using log-level=debug.

We might instead opt for a lower -rpcclienttimeout when calling out to bitcoin-cli (currently set to 900 seconds by default, resulting in the 15 minute wait you are experiencing above). Would exposing that through the CLI options be a solution for you?

  1. Clearly lightningd is making rpc call to bitcoind/spruned before it forks into daemon mode. I'm not sure why that is necessary. Anyway, it has the potential to stall lightning-cli (rpc api) for an arbitrary amount of time at startup.

Some calls are definitely necessary, i.e., that we are on the correct chain or what the current blockheight so we know from where to start, so moving all calls below the fork might not be such a good idea (it'd be forking and exiting immediately, without the option of displaying an error message to the user, whereas now it'll tell you some things before forking which is a better user experience imho). But the call to estimatesmartfee can definitely be moved after the daemonization. While it doesn't really solve the issue of hanging, we might at least already be in parallel operation mode, allowing us to make some progress :-)

@dan-da
Copy link
Author

dan-da commented Jun 29, 2019

@cdecker thx for looking at this.

This might be related to the DDoS attack that electrum servers are currently suffering from, I'm not too familiar with how spruned selects the electrum server to contact, but if it is one of the affected ones, that might be the root-cause for that.

ok, I'll check into that if it keeps happening, thx for the lead.

We might instead opt for a lower -rpcclienttimeout when calling out to bitcoin-cli (currently set to 900 seconds by default, resulting in the 15 minute wait you are experiencing above). Would exposing that through the CLI options be a solution for you?

I think that a bitcoind-rpc-timeout option would be a good idea, yes.

However, I would like to focus your attention on what I perceive to be the biggest issue here, which is that the log was not giving me any indication of what operation was being attempted (and blocking). So I wasted a bunch of time googling "We seem to be missing gossip messages", restarting many times, and generally going in circles. A log message prior to each call would fix this. I realize it is a lot more verbose, but hopefully could be an option. Or it could be that at lower log levels, only the pre rpc-call messages are logged, and at log-level=debug the end calls (with duration) are also printed.

Some calls are definitely necessary, i.e., that we are on the correct chain or what the current blockheight so we know from where to start, so moving all calls below the fork might not be such a good idea (it'd be forking and exiting immediately, without the option of displaying an error message to the user, whereas now it'll tell you some things before forking which is a better user experience imho).

I don't see any other rpc-calls being made prior to forking in the log on this particular startup, but I take your word for it.

regarding user experience, it is a bad one when it just hangs without reporting anything in the shell or log. I think it would be just fine if it just forked and exited immediately so that the json-rpc api is available. Any bitcoind (comm) errors can be reported via getinfo call or via debug.log. So I would suggest that the main error to report at the shell is if the something prevents the fork and/or rpc server from coming up.

But the call to estimatesmartfee can definitely be moved after the daemonization. While it doesn't really solve the issue of hanging, we might at least already be in parallel operation mode, allowing us to make some progress :-)

yes please! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants