[bug]: dust HTLC is not failed upstream before downstream channel close is confirmed on-chain #7969

Closed
C-Otto opened this issue Sep 9, 2023 · 3 comments · Fixed by #9068
Labels: bug (Unintended code behaviour), force closes, needs triage, P1 (MUST be fixed or reviewed)

C-Otto commented Sep 9, 2023

Please also have a look at #7683 for a closely related discussion.

Background

I analyzed two related unilateral channel closes caused by a dust HTLC on the route LOOP -> c-otto.de -> xmrk. My node closed the channel to xmrk, but the corresponding transaction did not confirm in time. As a result, LOOP closed the channel to my node.

During the deadline computation the dust HTLC is ignored, and instead another pending HTLC (LightningATM -> c-otto.de -> xmrk) is used to determine the urgency of getting the close transaction confirmed.

I believe that

  • lnd should take dust HTLCs into account when computing the deadline for the close transaction (see the sketch after this list), and
  • fail dust HTLCs upstream no matter what (possibly at a loss).
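
To illustrate the first point, here is a minimal sketch (hypothetical types and names, not lnd's actual implementation) of a deadline computation that takes the minimum refund timeout over all outgoing HTLCs, with a flag showing the effect of skipping dust:

```go
// pendingHTLC is a hypothetical stand-in for lnd's internal HTLC data.
type pendingHTLC struct {
	RefundTimeout uint32 // absolute block height at which the HTLC can be refunded
	IsDust        bool   // true if the HTLC has no output on the commitment transaction
}

// closeDeadline returns the earliest refund timeout among the pending HTLCs.
// With skipDust=true (roughly the behaviour observed below), the 1 sat HTLC
// expiring at 806495 is ignored and the deadline is driven by the 50,003 sat
// HTLC expiring at 806774 instead.
func closeDeadline(htlcs []pendingHTLC, skipDust bool) (deadline uint32, ok bool) {
	for _, h := range htlcs {
		if skipDust && h.IsDust {
			continue
		}
		if !ok || h.RefundTimeout < deadline {
			deadline = h.RefundTimeout
			ok = true
		}
	}
	return deadline, ok
}
```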

Your environment

  • version of lnd: lnd 0.16.4
  • which operating system (uname -a on *Nix): Linux server 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
  • version of btcd, bitcoind, or other backend: bitcoind v25

Steps to reproduce

(All times are CEST. I replaced channel IDs etc. with the corresponding peer names for readability. Raw logs are available upon request.)

Between blocks 806455 and 806456 my node tried to forward two HTLCs to the node xmrk, both of which failed:

2023-09-06 12:41:36.290 [DBG] HSWC: ChannelLink(xmrk): queueing keystone of ADD open circuit: (Chan ID=LOOP, HTLC ID=131805)->(Chan ID=xmrk, HTLC ID=71431)
2023-09-06 12:41:36.359 [DBG] HSWC: ChannelLink(xmrk): removing Add packet (Chan ID=LOOP, HTLC ID=131805) from mailbox
2023-09-06 12:43:29.324 [DBG] HSWC: ChannelLink(xmrk): queueing keystone of ADD open circuit: (Chan ID=LightningATM, HTLC ID=94616)->(Chan ID=xmrk, HTLC ID=71432)
2023-09-06 12:44:29.389 [ERR] HSWC: ChannelLink(xmrk): failing link: unable to complete dance with error: remote unresponsive
2023-09-06 12:44:29.390 [DBG] HSWC: Expiring add htlc with keystone=(Chan ID=LightningATM, HTLC ID=94616) --> (Chan ID=xmrk, HTLC ID=71432)
2023-09-06 12:44:29.391 [DBG] HSWC: ChannelLink(LightningATM): queueing removal of FAIL closed circuit: (Chan ID=LightningATM, HTLC ID=94616)->(Chan ID=0:0:0, HTLC ID=0)
2023-09-06 12:44:29.413 [DBG] HSWC: ChannelLink(LightningATM): removing Fail/Settle packet (Chan ID=LightningATM, HTLC ID=94616) from mailbox

The HTLC coming from LOOP was just 1 sat and had a refund timeout of 806495 (as seen in "RemotePendingHtlcSet", logged when restarting lnd).

The other HTLC, coming from LightningATM, was worth around 50,003 sat, which appears as an output in the resulting close transaction. This HTLC had a refund timeout of 806774, see below.

At block 806495, i.e. ~40 blocks later, my node closed the channel to xmrk. This corresponds to the time lock delta configured by LightningATM on the channel to c-otto.de.

2023-09-06 19:45:47.630 [INF] CNCT: ChannelArbitrator(xmrk): force closing chan
2023-09-06 19:45:47.769 [DBG] CNCT: ChannelArbitrator(xmrk): calculated deadline: 279, using deadlineMinHeight=806774, heightHint=806495
2023-09-06 19:45:47.779 [DBG] CNCT: ChannelArbitrator(xmrk): skipped deadline for dust htlc=DUST_HTLC_WITH_LOOP

Block 806774 confirmed at 2023-09-08 20:34. The logged deadline of 279 blocks corresponds to the refund timeout of the non-dust HTLC coming from LightningATM (806774 − 806495 = 279).

At block 806595 (2023-09-07 14:19:29.58) the close transaction published by LOOP was received by my node. This is 140 blocks after the HTLC from LOOP was processed, which corresponds to LOOP's time lock delta of 144 (I think the close transaction got stuck in mempool for a few blocks).
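
For reference, the block-height arithmetic above as a tiny Go snippet (all heights are copied from this report; nothing here is lnd code):

```go
package main

import "fmt"

func main() {
	const (
		htlcForwardHeight   = 806455 // HTLCs forwarded towards xmrk
		forceCloseHeight    = 806495 // c-otto.de force closes the channel to xmrk
		loopCloseSeenHeight = 806595 // LOOP's close transaction reaches the mempool
		nonDustRefundHeight = 806774 // refund timeout of the LightningATM HTLC
	)

	fmt.Println(forceCloseHeight - htlcForwardHeight)    // 40: the ~40 block delay above
	fmt.Println(loopCloseSeenHeight - htlcForwardHeight) // 140: close to LOOP's delta of 144
	fmt.Println(nonDustRefundHeight - forceCloseHeight)  // 279: the deadline in the CNCT log
}
```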

Close transaction with xmrk (unconfirmed as of now): https://mempool.space/tx/8ecd3f0aca30f404926aa0b1ceae7ef2ad46b40315edfca0f0dd598511156858

Close transaction with LOOP:
https://mempool.space/tx/db20a2ddc9985204b8975cd074434887f5f98801db75a05502fa304fb8e8667b

Expected behaviour

When closing the channel to xmrk, the dust HTLC should have been taken into account to compute the timeout. Furthermore, even though the close transaction was not yet confirmed, the dust HTLC should have been failed upstream to LOOP to prevent them from closing the channel.

Actual behaviour

Only the unrelated HTLC with the later timeout was considered when computing the deadline for the close transaction. Furthermore, the dust HTLC was never failed upstream by my node.

@C-Otto C-Otto added bug Unintended code behaviour needs triage labels Sep 9, 2023
@C-Otto C-Otto changed the title [bug]: dust HTLC is not failed upstream before downstream transaction is settled [bug]: dust HTLC is not failed upstream before downstream transaction is settled on-chain Sep 9, 2023
@C-Otto C-Otto changed the title [bug]: dust HTLC is not failed upstream before downstream transaction is settled on-chain [bug]: dust HTLC is not failed upstream before downstream channel close is confirmed on-chain Sep 9, 2023
@saubyk saubyk added force closes P1 MUST be fixed or reviewed labels Sep 13, 2023
@ziggie1984 (Collaborator) commented:

Good point. I think we should optimistically fail all outgoing HTLCs back as soon as we go on-chain for a channel contract, and keep the calculation for the CPFP fee tied only to real (non-dust) HTLCs.

Apart from that, you can still help yourself right now by disconnecting the incoming peer (see #7701): all HTLCs are reconsidered when reconnecting and failed back if they are within 13 blocks of their timeout.
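
A rough sketch of that split, with invented names (not lnd's actual API) and reusing the pendingHTLC type from the sketch above:

```go
// splitOnForceClose fails dust HTLCs back to the incoming link as soon as we
// decide to go on chain, and returns only the non-dust HTLCs, which are the
// ones that should feed the CPFP/deadline calculation because only they have
// an output that can actually be swept.
func splitOnForceClose(outgoing []pendingHTLC, failBack func(pendingHTLC)) []pendingHTLC {
	nonDust := make([]pendingHTLC, 0, len(outgoing))
	for _, h := range outgoing {
		if h.IsDust {
			failBack(h) // nothing to claim on chain, so cancel upstream right away
			continue
		}
		nonDust = append(nonDust, h)
	}
	return nonDust
}
```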

@Roasbeef (Member) commented:

Related PR: #7685

@Roasbeef (Member) commented:

Yeah, I think this is a good idea; we've lately seen some instances of cascading closes due to people not being able to confirm in time.

The code diff should be relatively minimal: just move the point at which we send the cancel resolution from when we detect something on chain to right when we go to force close.
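
In sketch form (again with invented names, not the actual channel arbitrator code), the change amounts to moving the same loop from the on-chain handler to the force-close path:

```go
// Current behaviour (simplified): dust HTLCs are only cancelled back once the
// force-close transaction has been detected/confirmed on chain.
func onCommitmentConfirmed(dust []pendingHTLC, cancelUpstream func(pendingHTLC)) {
	for _, h := range dust {
		cancelUpstream(h)
	}
}

// Proposed behaviour (simplified): send the cancel resolution as soon as the
// force close is initiated, so the upstream peer (LOOP in this report) never
// has to go to chain over a 1 sat HTLC.
func onForceCloseInitiated(dust []pendingHTLC, cancelUpstream func(pendingHTLC)) {
	for _, h := range dust {
		cancelUpstream(h)
	}
}
```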
