Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add htlc dust ceiling #4837

Merged
merged 9 commits into from
Oct 23, 2021
Merged

Conversation

niftynei
Copy link
Contributor

@niftynei niftynei commented Oct 4, 2021

Implements lightning/bolts#919

@cdecker
Copy link
Member

cdecker commented Oct 5, 2021

Excellent fix @niftynei 👍

ACK 70db581

@niftynei niftynei force-pushed the nifty/htlc_fee_ceiling branch from ff38155 to 70db581 Compare October 5, 2021 16:45
@cdecker
Copy link
Member

cdecker commented Oct 6, 2021

There seem to be a couple of failing tests here:

  • Elements fees seem to behave slightly differently, which causes the following test failures:
    • test_pay_peer to slam into the dust limiter ("Too much dust to update fee"), which in a second run was obscured by a dev_pay failing with "ran out of routes" due to the channel being killed, but the cause is the same
    • test_mpp_presplit also fails with "Too much dust to update fee"
    • test_delpay_payment_split which I'm still looking into, runs out of routes, but doesn't have the dust warning
  • Normal valgrind seems to not like test_lockup_drain which admittedly was disabled due to falkyness. The last line before the crash however suggests that we are running into dust + feerate issues here too: "Feerate: RCVD_ADD_REVOCATION->RCVD_ADD_ACK_COMMIT LOCAL now 35000"

@niftynei
Copy link
Contributor Author

niftynei commented Oct 6, 2021

There seem to be a couple of failing tests here:

  • Elements fees seem to behave slightly differently, which causes the following test failures:

    • test_pay_peer to slam into the dust limiter ("Too much dust to update fee"), which in a second run was obscured by a dev_pay failing with "ran out of routes" due to the channel being killed, but the cause is the same
    • test_mpp_presplit also fails with "Too much dust to update fee"
    • test_delpay_payment_split which I'm still looking into, runs out of routes, but doesn't have the dust warning
  • Normal valgrind seems to not like test_lockup_drain which admittedly was disabled due to falkyness. The last line before the crash however suggests that we are running into dust + feerate issues here too: "Feerate: RCVD_ADD_REVOCATION->RCVD_ADD_ACK_COMMIT LOCAL now 35000"

Hmm. These elements failures are a good illustration of how this dust-bucket limit interacts badly with MPP + high feerate environments.

test_pay_peer fails because we attempt to bump the feerate above our 25% buffer; this would make our dust-bucket exceed the limit so we fail the channel. If I update the buffer to 50%, the channel no longer falls to chain but the MPP payment fails since it eats up the entire bucket budget (at the higher feerate), so we start rejecting HTLC parts.

It might be worthwhile to take this into account when routing MPPs, so as not to route too many 'dusty' payments through the same peer (esp relevant during high fee periods).

I raised the dust-bucket limit for the test to avoid this and changed the channel to "warn" instead of fail if a proposed feerate update will put you above the dust-bucket (though it's worth noting that this will render this channel unusable until feerates go back down). Not sure if this is better or worse than just dropping to chain to be honest -- this problem is hard lol.

@niftynei niftynei force-pushed the nifty/htlc_fee_ceiling branch from 70db581 to ae76ca2 Compare October 6, 2021 17:50
@cdecker
Copy link
Member

cdecker commented Oct 6, 2021

Good point, would it make sense to introduce a global limit on the total number of parts we're willing to create (i.e., a lower limit below which we stop splitting) or is this something that we should be accounted for per-channel? Both are possible, the latter requires slightly more accounting, but could potentially be more permissive. Then again, we won't have as precise an idea of what the real situation at the channel is (i e., Others may also be using the channel and we might just be contributing). Might be worthwhile to open a new issue to discuss this.

@cdecker
Copy link
Member

cdecker commented Oct 6, 2021

By the way in the elements case it'd be totally ok to just mark them as skippable, just want to make sure we don't have remotely triggerable channel closures or faults 😉

@niftynei
Copy link
Contributor Author

niftynei commented Oct 7, 2021

Haha yeah, except in this case I think it's kind of a good idea to 'mark' the test as requiring a change in the max htlc exposure limit. Probably could have been more explicit by only varying it in the case of it being the liquid network, oh well.

Started an issue to discuss strategies etc around the MPP split + dust limit interactions.
#4846

@niftynei niftynei force-pushed the nifty/htlc_fee_ceiling branch from ae76ca2 to 6a5a94a Compare October 7, 2021 18:12
@rustyrussell
Copy link
Contributor

I disagree with the spec PR anyway...

@cdecker
Copy link
Member

cdecker commented Oct 12, 2021

Trying to dig into the issues here. test_delpay_payment_split (with elements) looks like we're causing our own demise:

lightningd-1: 2021-10-12T15:03:44.887Z DEBUG   lightningd: Payment 3/4: 8932346msat PENDING
lightningd-1: 2021-10-12T15:03:44.888Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 0 = SENT_ADD_HTLC/RCVD_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.888Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 0 amount=8972262msat cltv=120 gave CHANNEL_ERR_ADD_OK
lightningd-1: 2021-10-12T15:03:44.888Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.888Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1: 2021-10-12T15:03:44.889Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 1 = SENT_ADD_HTLC/RCVD_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.889Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 1 amount=11666821msat cltv=120 gave CHANNEL_ERR_ADD_OK
lightningd-1: 2021-10-12T15:03:44.889Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.889Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1: 2021-10-12T15:03:44.890Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 2 = SENT_ADD_HTLC/RCVD_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.890Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 2 amount=11620273msat cltv=120 gave CHANNEL_ERR_ADD_OK
lightningd-1: 2021-10-12T15:03:44.890Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.890Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1: 2021-10-12T15:03:44.890Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 3 = SENT_ADD_HTLC/RCVD_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.891Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 3 amount=8932436msat cltv=120 gave CHANNEL_ERR_ADD_OK
lightningd-1: 2021-10-12T15:03:44.891Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.891Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1: 2021-10-12T15:03:44.891Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 4 = SENT_ADD_HTLC/RCVD_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.891Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 4 amount=8808711msat cltv=120 gave CHANNEL_ERR_ADD_OK
lightningd-1: 2021-10-12T15:03:44.892Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1: 2021-10-12T15:03:44.892Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1: 2021-10-12T15:03:44.897Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_WARNING
lightningd-1: 2021-10-12T15:03:44.897Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: billboard perm: Too much dust to update fee (Desired feerate update 11000)
lightningd-1: 2021-10-12T15:03:44.898Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: Too much dust to update fee (Desired feerate up
date 11000)
lightningd-1: 2021-10-12T15:03:44.898Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Status closed, but not exited. Killing

This is lightning-1 creating 5 HTLCs, and then killing the channel when attempting to commit. Two questions here:

  • In this case it's locally triggered, could this be triggered remotely as well?
  • Can we move the dust check earlier, before we attempt to add the dusty HTLC, thus avoiding killing the channel?

The fix here is likely to increase the channel capacity, or reduce the splits by reducing the amount, the highlighted issue is but a side-effect in this case.

@rustyrussell
Copy link
Contributor

Sorry, AFAICT the whole concept of this change is wrong. Maybe because the spec PR is an unreadable mess, but I've left comments there, too.

@cdecker
Copy link
Member

cdecker commented Oct 15, 2021

Turns out that the elements test is failing correctly, with a feerate adjustment after the htlcs have been added, pushing them into the dust bucket (despite 10% safety margin). So we should adjust the test.

@rustyrussell how would you solve this if we could do a complete greenfield implementation?

@niftynei and I had been wondering if putting the channel in a drain mode, in which we fail HTLCs that we'd add to the dust bucket, and accept + fail + remove new HTLCs from remote. This would allow us to be purely reactive, and still drain HTLCs from the channel, thus avoiding stuck payments that could tear down the channel altogether.

@niftynei
Copy link
Contributor Author

Ok so per @rustyrussell's proposal to change the spec, lightning/bolts#919 (comment)

(On sending update_fee): MUST NOT send an update_fee which would increase the total dust on sent HTLCs in the peer's commitment transaction above max_received_dust_msat. MAY send an update-fee which will increase the total dust on its own commitment transaction above max_received_dust_msat.

According to this, we simply would not send the update_fee for the case of the currently failing liquid-regtest.

@niftynei
Copy link
Contributor Author

Ok posted to the other PR as well; if we use @rustyrussell's proposed rule we'd need

  1. a feature flag about this policy for new channel creation
  2. some way to signal an upgrade to this policy for existing channels

I don't think we've shipped 2. yet across the network.

Unclear to me how this offers protection to existing channels right now if we can't upgrade them?

To reduce the surface area of amount of a channel balance that can be
eaten up as htlc dust, we introduce a new config
'--max-dust-htlc-exposure-msat', which sets the max amount that any
channel's balance can be added as dust

Changelog-Added: config: new option --max-dust-htlc-exposure-msat, which limits the total amount of sats to be allowed as dust on a channel
Liquid's threshold for dust is a bit higher, so we bump up the max limit
here so we can actually get the whole MPP'd payment sent over the wire
If we're over the dust limit, we fail it immediatey *after* commiting
it, but we need a way to signal this throughout the lifecycle, so we add
it to htlc_in struct and persist it through to the database.

If it's supposed to be failed, we fail after the commit cycle is
completed.
for every new added htlc, check that adding it won't go over our 'dust
budget' (which assumes a slightly higher than current feerate, as this
prevents sudden feerate changes from overshooting our dust budget)

note that if the feerate changes surpass the limits we've set, we
immediately fail the channel.
And update some behavior to check both sides on receipt of a
update_fee, as per the proposed spec.

lightning/bolts#919
Let's make this a softer launch by just warning on the channel til the
feerates go back down.

You can also 'fix' this by upping your dust limit with
the `max-dust-htlc-exposure-msat` config.
Fails liquid-regtest otherwise; liquid tends to hit the dust limit
earlier than non-liquid tx, and MPP exacerbates this by divvying up
payments into dusty bits then attempting to shove them through the same
channel, hitting the dust max. The MPP then fails as not all the parts
were able to arrive at their destination.
@cdecker cdecker force-pushed the nifty/htlc_fee_ceiling branch from 6a5a94a to a8c061e Compare October 22, 2021 15:46
We were triggering the dust panic.
@cdecker
Copy link
Member

cdecker commented Oct 23, 2021

ACK 8f68dba

@cdecker cdecker merged commit 012c2ec into ElementsProject:master Oct 23, 2021
@whitslack
Copy link
Collaborator

@niftynei: Shouldn't this PR have added max-dust-htlc-exposure-msat to doc/lightningd-config.5.md? I'm seeing that the 0.10.2 release adds this new option without having documented it.

@vincenzopalazzo
Copy link
Contributor

Right @whitslack,

This 272fb72 should fix the make check and fixed the regression on this side by adding the check on the doc with the integration testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants