Bad commit_sig signature #4152

Closed
darosior opened this issue Oct 22, 2020 · 8 comments · Fixed by #4480

@darosior

az0re on IRC just reported it. Unfortunately, GitHub won't let them get here because they use Tor.
I told them to send the (private) signature, transaction, and feerate by email.


cdecker commented Oct 26, 2020

Very interesting indeed. Looking at the transaction whose signature failed, it seems the transaction that ultimately made it on-chain is identical to the one whose signature failed. The fact that the unilateral close was ours suggests that the two consecutive states look identical, narrowing this down to a feerate change being the cause of the new commitment. However, we also add and remove an HTLC, and the two are identical except for the payment_hash:

       "htlcs": [
	  {
	     "direction": "out",
	     "id": 95,
	     "msatoshi": 125000125,
	     "amount_msat": "125000125msat",
	     "expiry": 653923,
	     "payment_hash": "hashA",
	     "state": "SENT_ADD_ACK_REVOCATION"
	  },
	  {
	     "direction": "out",
	     "id": 96,
	     "msatoshi": 125000125,
	     "amount_msat": "125000125msat",
	     "expiry": 653923,
	     "payment_hash": "hashB",
	     "state": "RCVD_REMOVE_HTLC"
	  }
       ]

So it could be that the unilateral close was really the one with hashB, and the new (failed) commitment state was the one with hashA. The counterparty is an lnd node, with which we have had synchronization issues in the past during rapid successions of HTLCs, suggesting that we might have desynced in between the two commitments.
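
As a minimal, hypothetical sketch (Python; not c-lightning's actual commitment construction), this is why a one-HTLC disagreement produces a "Bad commit_sig signature": if the two peers build the commitment transaction from different HTLC sets, the digests they sign differ, and each side's signature fails to verify against the other's view of the state.

    import hashlib

    def commitment_digest(feerate, payment_hashes):
        # Stand-in for the sighash of the commitment transaction: any
        # difference in the HTLC set changes the digest being signed.
        preimage = "feerate=%d|htlcs=%s" % (feerate, ",".join(sorted(payment_hashes)))
        return hashlib.sha256(preimage.encode()).hexdigest()

    # Our view: HTLC 96 (hashB) already removed, HTLC 95 (hashA) still live.
    # (The feerate value here is purely illustrative.)
    ours = commitment_digest(253, ["hashA"])
    # A desynced peer still signing the state that contains both HTLCs:
    theirs = commitment_digest(253, ["hashA", "hashB"])

    assert ours != theirs  # the peer's signature matches `theirs`, not `ours`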


0e319b6 commented Nov 18, 2020

This happened again with a different peer, this time with an empty htlcs field, using git cd7d5cd. I will send another email with the listpeers object and relevant log excerpt.


0e319b6 commented Apr 14, 2021

This happened yet again, using git bbfcae6, this time with completely full HTLC slots. All but two of those HTLCs were incoming, in state RCVD_REMOVE_REVOCATION, and the two outgoing HTLCs were in state SENT_ADD_ACK_REVOCATION. This one might be a simple race, where both sides see max - 1 HTLCs in flight and so simultaneously think it's OK to add one more. They then both issue an HTLC and are unable to deal with the resulting channel state, which has more than the maximum number of allowed in-flight HTLCs. Being a bit flexible about the maximum number of allowed in-flight HTLCs might help.
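
A hypothetical illustration of that race (Python; the limit of 30 is the default mentioned later in this thread, the rest is made up): each peer checks the limit before seeing the other's crossed update_add_htlc, so both adds go through.

    MAX_HTLCS = 30        # default concurrent-HTLC limit
    htlcs_in_flight = 29  # both peers agree: exactly one slot left

    # Each side checks *before* seeing the other's update_add_htlc...
    we_can_add = htlcs_in_flight < MAX_HTLCS    # True
    they_can_add = htlcs_in_flight < MAX_HTLCS  # also True

    if we_can_add and they_can_add:
        # ...so the two adds cross on the wire, and once both are applied
        # the channel holds one HTLC more than the negotiated maximum.
        htlcs_in_flight += 2

    print(htlcs_in_flight, "in flight; limit was", MAX_HTLCS)  # 31 > 30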

This bug has cost me a ton of money, and fixing it is clearly not a priority. Despite my significant investment in C-Lightning infrastructure, I have started planning to migrate to an LND node. I am not willing to keep burning satoshis like this. I will also warn other people not to use C-Lightning on mainnet until this bug is fixed so they don't get burned, either.

@rustyrussell rustyrussell self-assigned this Apr 16, 2021
@rustyrussell rustyrussell added this to the v0.10.1 milestone Apr 16, 2021
@rustyrussell

OK, the good news is I've caught one of these in my logs, too! Finally...

@cdecker's idea that it's around feerate changes seems to be borne out here. It looks like we were updating fees, the two sides were at different levels, and then when we tried updating fees again (while the first change was still in flux), something b0rked.

There's a simple workaround, however: we can avoid changing fees again until they're completely quiescent. We know the fee logic works in the simple cases; otherwise we wouldn't keep channels open at all.

That means even if LND's or our fee logic doesn't work in complex cases (and I'm still tracking this down to try to get the exact cause here), we won't trigger it in the meantime.
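
A sketch of the shape such a guard could take (assumed names and structure, not the actual patch; the commits referenced below contain the real change): refuse to send a new update_fee until the previous feerate change has been committed and revoked by both sides.

    class FeeState:
        """Tracks at most one feerate change in flight per channel."""

        def __init__(self, feerate):
            self.feerate = feerate
            self.pending = None  # feerate sent but not yet quiescent

        def try_update_fee(self, new_feerate):
            if self.pending is not None:
                return False  # a change is still in flight: don't stack another
            self.pending = new_feerate
            # ... send update_fee(new_feerate) and commitment_signed here ...
            return True

        def on_change_quiescent(self):
            # Both sides have committed and revoked the new rate.
            self.feerate, self.pending = self.pending, None

With this one-in-flight rule, the overlapping fee changes seen in the logs above would simply be deferred until the previous one settled, rather than stacking while still in flux.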

@rustyrussell

> This bug has cost me a ton of money, and fixing it is clearly not a priority. Despite my significant investment in C-Lightning infrastructure, I have started planning to migrate to an LND node. I am not willing to keep burning satoshis like this. I will also warn other people not to use C-Lightning on mainnet until this bug is fixed so they don't get burned, either.

I totally understand! It's a nasty bug, and because it wasn't happening all the time, it didn't hit the top of our TODO list. However, now that I've got some more clues, I am pretty sure I can work around it (for existing peers) and make sure that we're actually doing the right thing (fixing LND or c-lightning, whichever is wrong).


cdecker commented Apr 16, 2021

> There's a simple workaround, however: we can avoid changing fees again until they're completely quiescent. We know the fee logic works in the simple cases; otherwise we wouldn't keep channels open at all.

What are you referring to when you say quiescent? Do you mean no feerate update in flight, no HTLCs in flight, or no commitment in flight? I noticed that the reports often have a couple of HTLCs in flight, and some are even close to the default of 30 concurrent HTLCs. Some of these nodes seem incredibly busy, so waiting for there to be no HTLCs might starve our fee adjustment. Would remembering the last couple of feerates and just trying them out work?

@rustyrussell

Was planning on "no feerates in flight". In my logs, there are multiple fee changes going on at the same time...


0e319b6 commented Apr 16, 2021

> I totally understand! It's a nasty bug, and because it wasn't happening all the time, it didn't hit the top of our TODO list. However, now that I've got some more clues, I am pretty sure I can work around it (for existing peers) and make sure that we're actually doing the right thing (fixing LND or c-lightning, whichever is wrong).

Super glad this might be resolved soon. I really don't want to migrate to LND, so if I stop bleeding satoshis then I am absolutely going to stay on C-Lightning. Thanks for looking at this!

rustyrussell added a commit to rustyrussell/lightning that referenced this issue Apr 18, 2021
There are several reports of desynchronization with LND here; a simple
approach is to only have one feerate change in flight at any time.

Even if this turns out to be our fault, it's been a historic area of
confusion, so this restriction seems reasonable.

Fixes: ElementsProject#4152
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
rustyrussell added a commit that referenced this issue Apr 24, 2021
There are several reports of desynchronization with LND here; a simple
approach is to only have one feerate change in flight at any time.

Even if this turns out to be our fault, it's been a historic area of
confusion, so this restriction seems reasonable.

Changelog-Fixed: Protocol: Don't create more than one feerate change at a time, as this seems to desync with LND.
Fixes: #4152
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>