MPP pay splits into too many subpayments #3926
Can you provide logs? In particular, a message "Split into " in the info-level logs (though debug-level logs would be best). Is the receiver also forwarding or receiving payments at the same time?
The first time I tried this, I ended up with #3846, then #3848, then #3851, and then #3915. The logs always looked like this:
The reason the logs are important is that we have to know whether it is the presplitter that is too aggressive, or the adaptive splitter.
Finally, if you do not want to try paying again for a while due to the extant bugs: how many channels do you have?
And if possible: if the destination is published, how many channels does it have? If the destination is not published, how many routehints were on the invoice? @cdecker, maybe we should also consider the HTLC budget on the payee end, not just on the payer end? In #3916 @thestick613 claims to have over 100 HTLCs pending, if those are from a single …
Logs incoming:
…and it goes on like this for a while. The pay command output …, which I hope means it failed.
The payment size is ~500,000 satoshi. I have over 16 channels, each with that much outgoing capacity available to pay. It just seems odd to split the payment into 51 subpayments.
And surprisingly …
Indeed. It looks like we did not test the case where the payer has a substantially greater number of channels than the payee, or even than the average network node. We should probably get some statistics for the median number of channels a network node has, or maybe the mean, to bias it for the benefit of the majority of the network.

We estimate a certain number of HTLCs that each outgoing channel can carry; if you do not mess around with some of the more obscure options, I think we go with 10 HTLCs per outgoing channel. Since you have 16 channels, that amounts to a limit of 160 HTLCs. Then, at the presplit stage, we take your 500,000-sat invoice and divide it into 10,000-sat lots (the 10,000 sat is hardcoded, based on @cdecker's statistics that 83% of 10,000-sat HTLCs reach random destinations from his controlled nodes). This results in about 50 initial HTLCs. Since that is far lower than the 160-HTLC limit we got from your 16 channels, the presplitter goes ahead with splitting into 10,000-sat lots (if the number of lots were greater than our computed HTLC limit, we would have increased the lot size accordingly). You got 51 lots because we do not actually split into exact 10,000-sat lots; we randomize a bit, so getting 49-51 lots is expected, more rarely a wider spread, but that is approximately what we expect. (See the sketch at the end of this comment.)

But the root cause is really that the payer has a ton of channels, and that made us assume the limit on the number of HTLCs was far greater than what the rest of the network, and/or the receiver, can handle. Our initial HTLC limit should be the lowest among:

- the payer's HTLC budget (number of outgoing channels times HTLCs per channel), and
- the payee's HTLC budget (the number of channels that can enter it, given the max-concurrent-htlcs limit).
This prevents the following issues:
Those are probably the reasons you get these logs:
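To make the presplit arithmetic above concrete, here is a minimal Python sketch of the logic as described in this comment. It is not the actual pay-plugin code (which is in C); the function name, the exact randomization range, and the rounding are assumptions.

```python
# Illustrative sketch only -- not the real c-lightning presplitter.
import random

TARGET_LOT_SAT = 10_000     # hardcoded presplit target mentioned above
HTLCS_PER_CHANNEL = 10      # assumed per-channel HTLC estimate

def presplit(amount_sat, num_channels, seed=0):
    """Return a list of sub-payment amounts (sats) for amount_sat."""
    rng = random.Random(seed)
    htlc_budget = num_channels * HTLCS_PER_CHANNEL   # e.g. 16 * 10 = 160
    lot = TARGET_LOT_SAT
    # If 10,000-sat lots would exceed the HTLC budget, grow the lot size.
    if amount_sat // lot > htlc_budget:
        lot = -(-amount_sat // htlc_budget)          # ceiling division
    parts = []
    remaining = amount_sat
    while remaining > 0:
        # Randomize each lot a bit, which is why ~50 lots can come out as 51.
        part = min(remaining, int(lot * rng.uniform(0.9, 1.1)))
        parts.append(part)
        remaining -= part
    return parts

# 500,000 sat with 16 channels: budget 160, so roughly 50 lots of ~10,000 sat.
print(len(presplit(500_000, 16)))
```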
Not a surprise. Our new …
No, that is incorrect; who the heck wrote that? We only get errors that detailed if and only if it is a local error. Remote errors just get the failure code and a useless "reply from remote" message. So it looks like we tried to pile the 51 sub-payments onto a single channel (probably because it was the most direct route to the destination), which choked. We should not do that. Instead, we should round-robin the outgoing channels, as I suggested in #3894. The reason we should avoid piling such HTLCs onto a single channel is that we want to avoid incorrect capacity estimates of our local channels. Nevertheless, we do see some later …
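For illustration, a minimal sketch of the round-robin idea referenced above (#3894). This is not the actual code; the function name and the channel identifiers are made up.

```python
# Spread presplit parts across all usable outgoing channels, round-robin,
# instead of piling them onto a single first hop.
from itertools import cycle

def assign_first_hops(parts, outgoing_channels):
    """Pair each sub-payment with an outgoing channel, round-robin."""
    return list(zip(parts, cycle(outgoing_channels)))

# 51 parts over 16 channels: at most 4 parts start on any single channel.
parts = [10_000] * 51
channels = [f"chan{i}" for i in range(16)]
assignments = assign_first_hops(parts, channels)
print(max(sum(1 for _, c in assignments if c == ch) for ch in channels))  # 4
```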
…on channels. Fixes: ElementsProject#3926 (probably) Changelog-Fixed: pay: Also limit the number of splits if the payee seems to have a low number of channels that can enter it, given the max-concurrent-htlcs limit.
Hi @thestick613, I wrote #3936 based on your report. If you have the inclination, could you check that PR and see if it fixes your problem?
I just ran into a mega case of this. Why do you split a large payment into so many tiny payments right away, without first trying the large payment? Basically, why run a "presplitter" at all? Isn't the adaptive splitter all you need? I tried to pay a ~3.8-Msat invoice, and C-Lightning foolishly split my payment into 385 sub-payments, which of course was never going to work. Most of the attempts failed with "WIRE_TEMPORARY_CHANNEL_FAILURE: Too many HTLCs." Worse, it was completely unclear to me whether the overall payment attempt had actually been abandoned or was still ongoing in the background. The … How am I supposed to know when C-Lightning has fully abandoned my payment attempt? It is unclear to me whether/when I am guaranteed that the payment failed safely and I will not lose this money.
When … However, you should not execute a new … So: …
Is that cryptographically enforced or merely a typical node policy? Couldn't a receiver say, "oh yeah, I'll claim sub-payments totaling 80% of the invoice amount"? (Ah, I see you edited your reply to acknowledge this possibility further down.)
This warning needs to go in the manpage!
60 seconds? So how do I still have attempts in …
Yeaaahhhhhh.
How would I know when/if that has occurred?
If this were to happen, how would I find out about it?
@whitslack how many channels do you have, and why did the presplitter allow 385?? Wow. @cdecker, I think we should consider not returning immediately once we have decided to stop retrying and a failing payment comes in. The current …
I think the two items above deserve their own issue… The easy fix is to simply:
This also gives us a decent point at which to delete the entire payment tree. This seems to be an argument for having a flat array of payments we consider as pending, as well.
Are they still …
Months later, when your LN node has 0 funds on it. :( In theory you could keep polling the … For now, you might want to use …
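The exact RPC named in the comment above was lost in the quoting, but as a rough illustration of the polling idea, here is a minimal sketch assuming the pyln-client package and the `listpays` RPC; the socket path is an assumption for a default mainnet setup.

```python
# Poll until no part of the payment is still reported as pending.
import time
from pyln.client import LightningRpc

rpc = LightningRpc("/home/user/.lightning/bitcoin/lightning-rpc")  # assumed path

def wait_until_settled(bolt11, poll_secs=30):
    """Block until lightningd reports the payment as complete or failed."""
    while True:
        pays = rpc.listpays(bolt11)["pays"]
        statuses = {p["status"] for p in pays}
        if "pending" not in statuses:
            # Only now has lightningd fully given up (or succeeded).
            return "complete" if "complete" in statuses else "failed"
        time.sleep(poll_secs)
```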
FWIW, I only just noticed this utterly bad behavior now. Sigh.
I have no idea. It seems quite excessive. I get that 3.8 Msat is a larger payment than would typically be sent over the Lightning Network, but I wanted to try out the new multi-part payments functionality, and I figured the whole point of MPP is to allow sending large payments. But splitting 3.8 Msat into ~10-ksat chunks is pretty ridiculous, IMHO.
Thanks for the tip.
730 sub-payment attempts in …
Disconcerting that …
The presplitter allocates 10 HTLCs per outgoing channel (if your default …). The presplitter will not split to lower than 10,000-sat parts (the adaptive splitter will not split lower than 100-sat parts). Since your initial limit is 700 parts, but you were sending 3,800,000 sats, the presplitter went with 380 x 10,000-sat parts (the extra 5 is due to randomizing the splits; you actually get "around" 380 parts, with some randomness). That is a very popular node you are running there. For now, I suggest using …
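Redoing the arithmetic in this comment as a standalone snippet (same assumptions as the hypothetical sketch in the earlier comment: 10 HTLCs per channel, 10,000-sat target lots; the ~70 channels are implied by the 700-part limit above):

```python
CHANNELS = 70                # implied by the 700-part budget mentioned above
HTLCS_PER_CHANNEL = 10       # assumed per-channel HTLC estimate
TARGET_LOT_SAT = 10_000      # presplit target lot
AMOUNT_SAT = 3_800_000

budget = CHANNELS * HTLCS_PER_CHANNEL    # 700 parts allowed
wanted = AMOUNT_SAT // TARGET_LOT_SAT    # 380 parts wanted
print(min(wanted, budget))               # 380: the budget is not the binding limit
```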
… of HTLCs based on payee connectivity. Fixes: ElementsProject#3926 (probably) Changelog-Fixed: pay: Also limit the number of splits if the payee seems to have a low number of channels that can enter it, given the max-concurrent-htlcs limit.
And here I was thinking that I was doing myself a favor by getting my node "well established" in the network while chain fees are still somewhat affordable. I never imagined that a better-connected node would be less able to complete a payment! 😅
Haha, yes. #3936 should help with that issue, if you are willing to get on master and pull in the PR as well. It is very hard to have tests that simulate a heavily-connected node, for the very simple reason that we have to run tons of …
I'll try again when bitcoin fees are lower.
Tried with #3936
In the end, it failed as well.
Tried a different, smaller one (0.01 BTC) with https://boltz.exchange/, and MPP pay seemed to work. The …
15 minutes, heck. Did you feed a … Getting #3917 merged would be much more helpful for figuring out the bad behavior of the algorithm. In particular, we are not currently printing any logs when the adaptive splitter triggers, so the existing logs, while useful, could be improved. Sorry for this. @cdecker @niftynei @rustyrussell I would rather not merge unilaterally; please review #3917? In any case, thanks for the effort @thestick613, sorry for the losses incurred in fees testing this, and yes, please provide logs if possible.

It looks to me like we want to consider having a paymod that merges multiple failing subpayments. But what rule do we use to determine whether we should merge instead of split? Obviously at some point our channel hints are poisoned, since the same error is reported for both insufficient msat capacity and insufficient HTLC capacity: the answer to insufficient msat capacity is to split, but the answer to insufficient HTLC capacity is to merge. This is complicated by the fact that at least some of the HTLC-capacity limit is caused by onchain fees, so if onchain fees are high, every split-out becomes expensive. In particular, there will always exist some N where a single N-amount HTLC can pass through a channel, but two N/2 HTLCs cannot. What we need is to figure out whether our splits are above or below this N. If above, we should split; if below, we should merge. Hmmmmmm.
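A toy illustration of the N threshold described above, with made-up numbers (the per-HTLC fee contribution and the spendable balance are assumptions, not real channel parameters): each HTLC consumes its amount plus a fixed onchain-fee contribution from the channel's spendable budget, so splitting below some size makes the parts stop fitting.

```python
HTLC_FEE_SAT = 660          # assumed cost of one extra HTLC output
SPENDABLE_SAT = 101_000     # assumed spendable balance on the channel

def fits(parts):
    """True if all parts fit in the channel at once."""
    return sum(p + HTLC_FEE_SAT for p in parts) <= SPENDABLE_SAT

N = 100_000
print(fits([N]))            # True:  100,000 + 660 <= 101,000
print(fits([N // 2] * 2))   # False: 100,000 + 2 * 660 > 101,000
```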
I just wrote …
Received, thanks!
Sorry, I don't have logs or anything, just some anecdotal evidence: I tried 3 times over a couple of hours to pay a 1,344,342-sat invoice just now with … Then I tried … The receiver was a node with only 5 channels. Besides that, I don't really have good stats on this, but it seems payments originating from t.me/lntxbot have been failing more since I upgraded to 0.9.x, or at least that's what the number of people complaining suggests (which is never very high, so the increase may be a coincidence). My uninformed opinion is that the splitting is too aggressive.
@ZmnSCPxj: Is claiming a partial payment actually possible? I just came across some messages between you and @Roasbeef discussing how atomic multipath payments would prevent the claiming of any sub-payment until all sub-payments have been routed through. Am I misunderstanding? |
@whitslack yes. It would be irrational to do so if the preimage is somehow inherently valuable, with value equal to the invoice amount (for example, if it unlocks some onchain HTLC, or is the decryption key of some valuable data, or is used as authorization to access some resource, or somebody will beat you up unless you can show a preimage to prove you paid them). However, if the preimage is not valuable, claiming a partial payment may be rational. Consider the case where a custodial service exists on Lightning (custodial services are evil, of course). The evil custodial service could support paying an arbitrary invoice. A user of that custodial service could attack it if the user knows the custodial service performs MPP and treats failure of one sub-payment as failure of the entire payment: for example, the user could have the service pay an invoice the user itself controls, claim only some of the sub-payments, let the others fail, and then demand a refund of the "failed" payment while keeping the parts it already claimed.
Thus, it is required not to consider a payment as definitely failed until all sub-payments have failed. If we consider the payment failed as soon as some sub-payment fails, we risk falling victim to the above attack.
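A minimal sketch of the aggregation rule described above (illustrative only, not the actual pay-plugin code): a payment is reported FAILED only once every sub-payment has failed; any success means the preimage was released, and anything still in flight keeps the whole payment pending.

```python
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

def overall_status(sub_statuses):
    if any(s is Status.SUCCEEDED for s in sub_statuses):
        return Status.SUCCEEDED          # preimage released: payment succeeded
    if all(s is Status.FAILED for s in sub_statuses):
        return Status.FAILED             # only now is it safe to report failure
    return Status.PENDING                # some parts still in flight

print(overall_status([Status.FAILED, Status.PENDING]))   # Status.PENDING
print(overall_status([Status.FAILED, Status.FAILED]))    # Status.FAILED
```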
@whitslack what you linked is not the same as MPP, incidentally. The current MPP system assumes that the invoice preimage is valuable (proof-of-payment). AMP does not have this assumption, but loses proof-of-payment completely (it is basically a multipath …)
@ZmnSCPxj: In your attack scenario, all of the sub-payments are successfully routed all the way to the payee (the attacker), but the attacker chooses to reject some of them. Can the attack also succeed if the attacker does NOT learn of all of the sub-payments — i.e., if only some of the sub-payments route all the way to the attacker while others fail in mid-route due to channel capacity exhaustion or outstanding HTLC limits?
Ahh! I was wondering why the "AMP" acronym was quietly replaced with "MPP." I was thinking that MPP was still atomic, so thank you for the clarification. I think this is a point that does not have enough visibility in the user community. I know that for me personally, the atomicity guarantee is more important than proof of payment, as I'm nearly always paying payees whom I trust; it's the software between us that I don't trust.
Yes.
MPP is still atomic, assuming you value the proof-of-payment, for example in a reverse submarine swap / loop out / whatever the heck they call it these days. The trust in the software can be increased by us reporting failure if and only if all parts are failures. If the payee is trying to abuse your trust, you would see the payment as stuck even if our software has given up paying. I am still trying to grok this part of the new …
I have an outbound channel with a BTC-to-Lightning exchange worth 2,000,000 satoshi, with enough capacity.
I try to swap my 2,000,000 satoshi for the same amount on-chain.
Lightningd splits the payment into too many subpayments and eventually fails, even though the destination is only one hop away and has enough capacity.