Split by fibonacci sequence #3951
Conversation
Force-pushed from 8121cc1 to 851d5b4.
my general sense reading over this and the few other PRs related to splitting is that for certain (i.e. large) payments, splitting at the 10k boundary is infeasible because of htlc-count limits -- too many parts for too little htlc space. which is to say that we're going to move away from the '80% good' number anyway -- using the fibonacci sequence as a scaling factor for payment splits seems as reasonable as any for this.
one thing that confuses me a bit about this PR is that the fibonacci sequence steps, aside from the first step, aren't close to the 10k 80%-passable limit. if we're using a ladder of values, doesn't that push our payment reliability for the parts away from the 80% metric? what level of reliability can we expect at every step in the split ladder?
i do wonder if there's a more intelligent way to split large payments, i.e. one large piece and then a bunch of smaller ones near the 10k limit? though if the larger payment won't get through, the smaller ones are useless.
at what point is pre-emptive splitting really more harmful than helpful? what are we hoping to gain from doing splits on this ladder schedule?
For the steps on the ladder far above the 10k-sat boundary, the point is not so much reliability as it is privacy. Here I describe "standardized multipart splitting amounts": https://lists.linuxfoundation.org/pipermail/lightning-dev/2020-January/002431.html Using standardized split sizes leaks less information if the payment succeeds at higher ladder steps. An idea is to just remove the presplitter entirely as well, but then you leak the initial payment size to some nodes. Note, though, that a lot of the privacy boost of standardized multipart splitting amounts only really materializes once we have path decorrelation.
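As an illustration of the idea (a minimal sketch, not code from this PR; the ladder values, the 10k-sat base bin, and the greedy strategy are my assumptions), splitting an amount into standardized Fibonacci-sized bins might look like:

#include <stdint.h>
#include <stdio.h>

/* Sketch: split an amount into standardized parts drawn from a
 * Fibonacci ladder of bin sizes, so that the parts forwarding
 * nodes see look alike.  Amounts are in millisatoshi. */
#define BASE_MSAT 10000000ULL /* 10k sat in msat */

int main(void)
{
        /* Fibonacci multipliers for the ladder steps. */
        uint64_t fib[] = {1, 2, 3, 5, 8, 13, 21, 34, 55, 89};
        uint64_t amount = 1234567890ULL; /* example payment, msat */
        uint64_t remaining = amount;

        /* Greedily peel off the largest bin that still fits. */
        for (int i = 9; i >= 0; i--) {
                uint64_t bin = fib[i] * BASE_MSAT;
                while (remaining >= bin) {
                        printf("part: %llu msat\n",
                               (unsigned long long)bin);
                        remaining -= bin;
                }
        }
        /* Whatever is left is below the smallest bin; in practice
         * it would be folded into one of the parts above. */
        if (remaining)
                printf("remainder: %llu msat\n",
                       (unsigned long long)remaining);
        return 0;
}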
Force-pushed from 851d5b4 to fd5ddcf.
This PR doesn't do this though. afaict it picks a common target and splits the payment into equal-sized parts. It'd be nice if it did the adaptive fitting that you outlined in your post.
Was thinking of implementing that as well. Give me a few days, am working on multifundchannel...
Force-pushed from 53d38e9 to 9e57d4f.
Changelog-Changed: pay: Adaptive splitter now splits by golden ratio, which should cause us to split approximately by a Fibonacci sequence. This is conjectured to result in fewer splits.
…nd cap the number of presplits. Changelog-Changed: pay: Cap the number of presplits, and use a Fibonacci schedule for presplits.
The fuzz only exists to prevent intermediate nodes from guessing the distance to the payee by subtracting the nearest lower bin, which would give them the fee paid to reach the payee. But since we have a default limit on the max fee budget of 0.5% (which some of our users complain is *too big*), fuzzing around the bin size by +/-1% would be sufficient to obscure that fee. This reduces the presplitter fuzz from +/-25% and the adaptive-splitter fuzz from +/-10% to just 5%, which is more than enough to obscure the fee paid to reach the payee, and could probably be made even smaller.
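A minimal sketch of the fuzzing being argued for here (not the PR's actual code; a tiny xorshift stands in for c-lightning's pseudorand, and the +/-1% figure is the one the comment above claims would suffice):

#include <stdint.h>
#include <stdio.h>

static uint64_t rng_state = 0x123456789abcdefULL;

/* Stand-in PRNG; c-lightning would use pseudorand() instead. */
static uint64_t xorshift64(void)
{
        rng_state ^= rng_state << 13;
        rng_state ^= rng_state >> 7;
        rng_state ^= rng_state << 17;
        return rng_state;
}

/* Fuzz a bin-sized amount by a uniform offset in [-pct%, +pct%]. */
static uint64_t fuzz_amount(uint64_t bin_msat, uint64_t pct)
{
        uint64_t step = bin_msat * pct / 100;
        int64_t offset = (int64_t)(xorshift64() % (2 * step + 1))
                         - (int64_t)step;
        return bin_msat + offset;
}

int main(void)
{
        uint64_t bin = 10000000; /* 10k sat in msat */
        for (int i = 0; i < 5; i++)
                printf("fuzzed: %llu msat\n",
                       (unsigned long long)fuzz_amount(bin, 1));
        return 0;
}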
Force-pushed from 9e57d4f to 0d5145d.
Rebased to adapt to
I think you got carried away with "fibonacci everything", even where there's no good rationale.
Overall I think having a schedule that covers larger amounts is definitely needed, and fibonacci seems at first sight to be just as good as powers of two (they share the same asymptotic behavior, btw), so I'm not opposed to that. However, there are a number of things I don't like in this PR:
- The choice of fibonacci, or any unequal splitting schedule, is based on the assumption that we can guarantee that the larger of the two splits reaches the bottleneck channel before the smaller one. Just queuing them in the correct order does not provide that guarantee even locally, where we have that information, and any ordering we might want to impose breaks down pretty quickly. If we want to take advantage of the larger part preceding the smaller ones, there's little we can do other than send each HTLC sequentially, and even in that case, how do you detect that it is time to start the next one?
- From what I could gather, there are now two splitting schedules that are mixed: a desired-amount split and a fibonacci-sequence split. Please stick to one, especially since the desired-amount split seems to be a special case of the general fibonacci-sequence one.
/* Payment `a` is the larger split, thus less likely
 * to succeed, so start `a` first so that it gets into
 * channels earlier. */
Notice that calling `payment_start` first on `a` does not guarantee that its `sendonion` call will actually happen first (which is what gives the HTLC its ID, and thus the order in which they are applied to channels). Furthermore, even if we order them correctly locally, they might get reordered if the path prefix to the failing channel is not identical:

a -> b -> c -> d [large payment]
a -> c -> d [small payment]

In this case, if `c -> d` is the bottleneck, the smaller payment may obstruct the larger one (notice that this is more related to latency than to path length, so it's hard to discern these locally). Generally speaking, we have very little influence over the order in which HTLCs are applied to remote channels, which account for the vast majority of capacity failures.
I intend to add a path-diversity paymod later which will `getroute` once for the shortest route, then generate alternate routes from there by trial-banning some of the nodes/channels on the original route. That paymod will specifically delay later payments until it has generated alternate routes, as long as the payment goes to the same destination (which is significant for payees with multiple routehints).
The effect is that later splits will go through longer paths, and are thus more likely to be delayed reaching the destination relative to the earlier payments: the paymod triggers immediately on the earliest payment as soon as it receives the shortest path, then delays subsequent payments to the same destination until it has acquired alternate, and presumably longer, paths.
You want path diversity anyway, regardless of using a non-equal or equal split, since dumping all your payments on the same path is a waste. If you presplit into 2 parts and route both parts on the exact same path, like the current paymod system does, you just wasted fees and HTLC capacity.
The rule-of-thumb "earlier started is more likely to get available capacity" does not need to be perfect, just good enough to improve payment success rates.
Or just give up on MPP and use JIT routing, it is not as if we have any decent research on MPP, whereas JIT routing gets you 90% of what you need from MPP, is far simpler, and does not have any of this uncertainty of whether it would work out or not given this or that condition.
> I intend to add a path-diversity paymod later which will `getroute` once for the shortest route, then generate alternate routes from there by trial-banning some of the nodes/channels on the original route. That paymod will specifically delay later payments until it has generated alternate routes, as long as the payment goes to the same destination (which is significant for payees with multiple routehints).
Sounds like you're implementing Yen's k-shortest path algorithm :-)
> The effect is that later splits will go through longer paths, and are thus more likely to be delayed reaching the destination relative to the earlier payments: the paymod triggers immediately on the earliest payment as soon as it receives the shortest path, then delays subsequent payments to the same destination until it has acquired alternate, and presumably longer, paths.
This is far from a guarantee however, given that we don't select by roundtrip latency, but rather by the costs that our payment incurs. So it can very well be that even though a "longer" path has numerically more hops, it is shorter latency-wise. Using the time to compute longer routes as a delay between parts of the same payment also seems strange, given that there are several orders of magnitude between network latency and the time to compute routes, so we'd be back at artificially introducing delays, adding to the overall time to completion.
> The rule-of-thumb "earlier started is more likely to get available capacity" does not need to be perfect, just good enough to improve payment success rates.
I'm not saying it has to be perfect, just wanted to play devil's advocate and highlight this assumption, as it doesn't come with any guarantees.
> You want path diversity anyway, regardless of using a non-equal or equal split, since dumping all your payments on the same path is a waste. If you presplit into 2 parts and route both parts on the exact same path, like the current paymod system does, you just wasted fees and HTLC capacity.
I think the path-diversity discussion is independent of the splitter discussion, but I totally agree that having more diversity is a good thing here. I have a patch that issues the `getroute` calls in sequential order, allowing each call to update the `channel_hints` before collecting exclusions for the next call, which can help avoid oversubscribing channels and introduces path diversity.
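A toy sketch of that sequential flow (hypothetical names and structure, not c-lightning's actual `channel_hint` API): route one part, debit our local capacity estimate, then mark channels that cannot carry the next part as excluded before the next query.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct channel_hint {
        const char *scid;       /* short channel id */
        uint64_t estimate_msat; /* estimated spendable capacity */
        bool excluded;
};

/* A part we just routed consumes part of our estimate. */
static void note_part_sent(struct channel_hint *h, uint64_t amt)
{
        h->estimate_msat = (h->estimate_msat > amt)
                           ? h->estimate_msat - amt : 0;
}

/* Exclude channels whose remaining estimate cannot carry the
 * next part. */
static void collect_exclusions(struct channel_hint *hints, size_t n,
                               uint64_t next_amt)
{
        for (size_t i = 0; i < n; i++)
                if (hints[i].estimate_msat < next_amt)
                        hints[i].excluded = true;
}

int main(void)
{
        struct channel_hint hints[] = {
                { "103x1x0", 15000000, false },
                { "105x2x1", 40000000, false },
        };
        uint64_t parts[] = { 10000000, 10000000 }; /* two 10k-sat parts */

        for (int p = 0; p < 2; p++) {
                /* A real implementation would call getroute here,
                 * passing the channels currently marked excluded. */
                struct channel_hint *used = hints[0].excluded
                                            ? &hints[1] : &hints[0];
                printf("routing part %d over %s\n", p, used->scid);
                note_part_sent(used, parts[p]);
                collect_exclusions(hints, 2, parts[(p + 1) % 2]);
        }
        return 0;
}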
 * 13 seems like a reasonable fibonacci-sequence
 * number to use.
 */
#define REASONABLE_MAX_PRESPLIT 13
Why does this have to be a fibonacci number? I get that aligning part amounts to fibonacci numbers might be sensible, but why the number of splits as well?
No reason. If you can give a better one I am all ears.
You can probably get a better one by asking node operators what their `max-concurrent-htlcs` settings are, asking them for their routing logs to check how often HTLCs get failed due to lack of fees, and figuring out the average path length, the number of channels intermediate nodes tend to have, and so on. The number currently there is derived ex asino anyway.
if (**comment)
        tal_append_fmt(comment, ", ");

/* Append a description of the new child. */
tal_append_fmt(comment, "new partid %"PRIu32" (%s)",
               c->partid,
               type_to_string(tmpctx, struct amount_msat,
                              &child_amount));
These two can be combined to have a single allocation:
/* Append a description of the new child. */
tal_append_fmt(comment, "%snew partid %"PRIu32" (%s)",
               **comment != '\0' ? ", " : "",
               c->partid,
               type_to_string(tmpctx, struct amount_msat,
                              &child_amount));
#define PRESPLIT_MAX_HTLC_SHARE 3
#define MPP_PRESPLIT_PREFERRED_AMOUNT 9227465L
Why not just keep the 10ksat amount? It's unlikely to make any difference since we randomize on each split, so any precision here is wasted.
I do not think the big randomization is valuable, which is why a later commit reduces those randomizations.
I'd argue it is, in particular for the remainder amounts that do not fit nicely into a bucket.
/* Split by equal units of our preferred amount.
 * If dividing the amount by the preferred amount resulted in fewer
 * splits than the HTLC limit, we use this method of splitting, in
 * order to get the theoretical 80% percentile success.
 * Returns a one-line report, allocated off @ctx, of how we split up
 * the payment.
 */
Please choose one way of splitting and do that all the way, no switching between splitting schedules. Any unnecessary complexity will likely come back to bite us later on.
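For reference, a minimal sketch of the equal-units presplit described in the comment block above (the constants are hypothetical stand-ins for the PR's values): divide by the preferred amount, but never produce more parts than our share of HTLC slots allows.

#include <stdint.h>
#include <stdio.h>

#define PREFERRED_MSAT 10000000ULL /* ~10k sat target per part */
#define MAX_PARTS 13               /* cap derived from HTLC slots */

int main(void)
{
        uint64_t amount = 250000000ULL; /* 250k sat in msat */
        uint64_t parts = amount / PREFERRED_MSAT;

        if (parts < 1)
                parts = 1;
        if (parts > MAX_PARTS)
                parts = MAX_PARTS; /* fall back to larger, equal parts */

        uint64_t per_part = amount / parts;
        uint64_t rem = amount % parts;
        printf("%llu parts of ~%llu msat (+%llu msat remainder)\n",
               (unsigned long long)parts,
               (unsigned long long)per_part,
               (unsigned long long)rem);
        return 0;
}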
/* Flip a coin, if it comes up heads, stop the presplit. */
if (pseudorand(2) == 0)
        break;
Why?
Same reason we have a similar randomization in shadow routing.
See: https://lists.linuxfoundation.org/pipermail/lightning-dev/2020-August/002778.html
Closes #3942
I now prefer this over #3942, as I conjecture that a Fibonacci split would be able to fit into channel capacities better (fewer splits and better utilization of capacity).
Also sneaked in a tightening of the split-based amount fuzzing. Larger fuzzes cause us to deviate from the ideal Fibonacci schedule more.
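To make the conjecture concrete, here is a minimal sketch (not the PR's code; the example amount and fixed-point phi approximation are assumptions) of splitting by the golden ratio, where the larger piece of each split is amount/phi, so successive part sizes approximate a Fibonacci ladder:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t amount = 100000000ULL; /* 100k sat in msat */
        /* 1/phi ~= 0.61803, approximated as 61803/100000. */
        for (int i = 0; i < 4; i++) {
                uint64_t large = amount * 61803ULL / 100000ULL;
                uint64_t small = amount - large;
                printf("split %d: %llu + %llu msat\n",
                       i, (unsigned long long)large,
                       (unsigned long long)small);
                amount = large; /* keep splitting the larger part */
        }
        return 0;
}

Each split yields pieces in ratio ~phi:1, the same ratio consecutive Fibonacci numbers converge to, so repeated splitting walks down a Fibonacci-like ladder of sizes.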