Pay: take channel capacity into account for route selection #4771

rustyrussell · 2021-09-08T05:27:41Z

If this seems sane, I'd like to see if @cdecker can help produce any evidence to show that it helps? :)

cdecker · 2021-09-08T08:59:44Z

Needs a changelog, otherwise perfect 👍

ACK rustyrussell@c75f74b

vincenzopalazzo

Super cool 💯

ack c75f74b

rustyrussell · 2021-09-09T05:14:14Z

Needs a changelog, otherwise perfect +1

ACK rustyrussell@c75f74b

But does it work??

Let's coordinate a before/after test using paytest?

rustyrussell · 2021-09-22T11:02:19Z

OK, so using paytest with Christian's node, which I seem to have trouble routing to:

Before:
30,000sat: 92 attempts, failed.
3,000sat: 11 attempts, success.
1,500sat: 6 attempts, success.

After:
30,000sat: 90 attempts, failed.
3,000sat: 4 attempts, success.
1,500sat: 4 attempts, success.

So it does make a difference!

cdecker · 2021-09-22T15:59:54Z

ACK 4b9f4bf

renepickhardt · 2021-09-22T17:44:55Z

Didn't fully review the code yet but please note that the linerazed score is amt*(1/(c+1) + \mu*r) with r=ppm. Now ppm is an integer larger than 1 and 1/(c+1) is several orders of magnitude smaller than 1 so you need to select \mu in a way to have a similar order of magnitude for the capacity term too become significant instead of 1/2. You could try 0.00001 to begin with.

Also note that instead of the linearized amt/(c+1) you could use the actual negative log probabities: -log((c-amt+1)/(c+1). \mu should probably be in a similar size as the linearized version.

rustyrussell · 2021-09-23T03:03:56Z

The question is: what is the capacity influence, vs fee influence? If we make the capacity influence << fee influence, it has only effect as a tiebreaker.

Of course, the current calculation uses the fee for this particular hop to scale the capacity "cost", which is clearly wrong: using the median fee would make more sense. That's currently 1000 msat + 100ppm (though we could actually estimate it with reasonable efficiency in case it changes).

We could simply add to the cost function: (1 - (amt / capacity)) * (1000 + 100 * amt / 1000000). This gives it the same selection power as median fees if capacity is infinite, which splits the difference fairly.

renepickhardt · 2021-09-23T16:57:13Z

Let me try to be very brief (I am happy to elaborate though):

Clarification about your formular

cost = (1 - (amt / capacity)) * (1000 + 100 * amt / 1000000)

the cost will be really low if the amount reaches the capacity as cap/cap = 1 and 1 -1 = 0. We thus get 0 cost for channels that we fully saturate - independently of the fees that we wish to pay. Thus your suggestion seems inherently flawed or am I missing something? I do think however that you have the right intuition by going to the product. Thus I will write about about averages of scores

Combining various scores

If we don't try to do find a min cost flow but just single paths we don't need convex or linear for our cost function thus the question remains which weights to take and how to combine them properly. If I recall correctly you currently use 3 features in your cost function:

x_1 = f = fees coming from the routing fees
x_2 = r = cltv encoded as the risk factor
x_3 = p = success probability encoded in the linear term of the tailor expansion of negative log probabilities.

then what you do is you take the arithmetic mean of the features potentially with some weigh

cost = f + w_r*r + w_p*p

IIRC w_r is encoded in the risk factor via config, cli and default value and w_p is the lagrange multiplier (\mu) from our formular.

In our paper we also suggest the weighted arithmetic mean as it helps for our function to stay linear / convex.

However was noted before in the case of dijkstra we don't need that. I thus propose that instead of the arithmetic mean you use the Harmonic mean for which (as described in the linked wikipedia article) also a weighted version exist.

The reasoning is the same as I provided to the LND autopilot.

In our case the cost from probabilities are always in the [0,1] range and fees as well as cltv deltas can be normalized by dividing with the theoretic max that could be seen on the network.

I think the multiplicative nature of the harmonic mean should give a much better interaction with our 3 different features, especially since they might vary over several orders of manitude without the necessity for us to learn optimal weights w_r and w_p.

fun fact:

In my former live as a data science consultant I was once hired to help with a predictive model as it performed poorly and much worse than expected. After a couple of days of dissecting the existing model and features I came to the conclusion that it was modelled properly and the only problem was that they used the arithmetic mean as an average of their features instead of harmonic mean. After changing this one formula everything performed in a way how it was supposed to and the job was completed. Given that experience I might be slightly biased towards harmonic means in situations like we are in. Of course I cannot tell you that the trick will also work here but I believe the setting / reasons should be similar.

rustyrussell · 2021-09-27T06:39:59Z

Clarification about your formular
cost = (1 - (amt / capacity)) * (1000 + 100 * amt / 1000000)

Doh, brain fart! That 1- is bogus, as you point out. (amt / capacity) * (1000 + 100 * amt / 1000000) is the correct naive formula.

x_1 = f = fees coming from the routing fees
x_2 = r = cltv encoded as the risk factor
x_3 = p = success probability encoded in the linear term of the tailor expansion of negative log probabilities.

then what you do is you take the arithmetic mean of the features potentially with some weigh

cost = f + w_r*r + w_p*p

However was noted before in the case of dijkstra we don't need that. I thus propose that instead of the arithmetic mean you use the Harmonic mean for which (as described in the linked wikipedia article) also a weighted version exist.

This is likely to cause the capacity to dominate, since Harmonic mean emphasizes the smallest value. I'm more comfortable with a simple sum of the three at this stage. Deeper changes would require deeper testing.

In our case the cost from probabilities are always in the [0,1] range and fees as well as cltv deltas can be normalized by dividing with the theoretic max that could be seen on the network.

There is no useful theoretical max, but there is a perfectly reasonable expectation: the medians. For cltv this is 40. At riskfactor 10, this works out to a risk cost of 760msat on a 10,000sat payment. The median fee on that payment is 1001msat.

For the moment I've altered it to simply add "median_fee(amt) * (amt / (capacity + 1)". This makes it comparable to the effect of fees in the normal case, which seems reasonable.

It's a simple heuristic, and I look forward to more sophisticated approaches!

renepickhardt · 2021-09-27T09:22:06Z

First of all I am sorry that I forgot to mention in my last reply that I really liked your approach of using the median instead of the max (as max values can easily have strong outliers) However I am a bit confused now:

You write that you don't like the harmonic mean but that you would rather prefer the arethmetic one (simple sum) however even when omitting your 1- the formula that you propose cost = (amt / capacity) * (1000 + 100 * amt / 1000000) has the structure:

cost = p * f

I don't see the risk factor as a feature in there and I will ignore it for now as my argument and confusion is independent of the 3rd feature. (btw maybe you know something that I don't but I never fully understood why one would want to optimize for a low CLTV anyway.)

The Harmonic mean of two numbers p and f is H = 2*p*f/(p+f) (note the product in the numerator) while the arithmetic is A = 1/2(p+f) what you propose is neither the harmonic nor the arithmetic, though intuitively I believe it will behave closer to the harmonic than it would to the arithmetic mean.

actually what I see is that the Rusty way R = H * A = 2*p*f/(p+f) * 1/2(p+f) = p*f. While I propose Harmonic and you wrote that you like arithmetic it seems that you actually propose the product of harmonic and arithmetic.

BTW looking at your actual code I am even more confused. in LIne 719

lightning/plugins/libplugin-pay.c

Line 719 in d939e35

return fee.millisatoshis /* Raw: complex math & laziness */

you compute the feature with the capacity as the product of the fee times the probability.

but then in line 734 you just linearly add it to the other values

lightning/plugins/libplugin-pay.c

Line 734 in d939e35

+ capacity_bias(global_gossmap, c, dir, cost);

this means your cost function now looks like this:

C = f + r + p*f = f*(1+p) + r

Indepenedently of your actual choices of how to compute the total cost I think it might make sense to not have the median values as constants in the code but rather learn them from gossip either at node start or once in a while. Also I want to emphasize that

cdecker · 2021-09-28T17:28:35Z

I spent the weekend measuring the three variants we have here:

"original": the code currently deployed in v0.10.1
"variant1-4b9f4bf": the first version of this PR
"variant2-d939e3": the current version of the PR

With the help of @renepickhardt's and @rustyrussell's nodes running the paytest plugin we performed 6747 test payments, with varying amounts (1000 * 2**x) and the variants above. The following aresome preliminary plots:

Simple scatterplot of completion times vs amount, not really all that helpful.

Median time to completion for the various variants and amounts. Taking the channel size into consideration definitely helps, except maybe tiny amounts. The hump in the middle is likely a topological issue with my surroundings, resulting in some bad channels being attempted before succeeding.

My favorite plot: we hugely increae the success probability. The dip for the original seems to be the exemptfee causing us to pick smaller & cheaper channels, wasting time on unsuccesful attempts.

And finally the median number of attempts to complete. Again we need fewer attempts, exept for tiny amounts maybe.

More measurements would definitely be nice, to analyze the details of what's going on.

IMHO this can be merged as is, and we can defer tweaking parameters, or do so incrementally.

renepickhardt · 2021-09-28T18:59:42Z

Why not making additional experiemnts? At least one version where you use: -log((c+1-amt)/(c+1)) instead of the linearized amt/(c+1) and also at least one experiment using this and harmonic mean to combine the features in the cost function. I expect much better results if you do so for all the reasons discussed above.

also I am a bit confused by the experimental setup. Are you actually always using rusty's and my node as destination or do you force intermediary nodes? If you always use the same nodes as destinations I would not be surprised that the curves are so close to each other.

While this is obviously for you to decide I would at this point not merge the linearized and arithmetic mean variant that still emphasizes heavily on the fees (even though it seems to be an improvement)

cdecker · 2021-09-29T10:43:21Z

Why not making additional experiemnts? At least one version where you use: -log((c+1-amt)/(c+1)) instead of the linearized amt/(c+1) and also at least one experiment using this and harmonic mean to combine the features in the cost function. I expect much better results if you do so for all the reasons discussed above.

If you code the variant up I'll test it. I don't really trust myself not to introduce a random error into the formula 😉

also I am a bit confused by the experimental setup. Are you actually always using rusty's and my node as destination or do you force intermediary nodes? If you always use the same nodes as destinations I would not be surprised that the curves are so close to each other.

Yes, this is very simplistic, given I only have 2 destinations to test against for now, hopefully more users / devs will run the plugin over time, allowing us to perform more realistic tests with more diversity. As for the second point, in order to compare the results we want the test conditions to be very similar, so any difference can be attributed to our change, I am not following how testing the same scenario by only changing one variable under our control could possibly be bad here.

While this is obviously for you to decide I would at this point not merge the linearized and arithmetic mean variant that still emphasizes heavily on the fees (even though it seems to be an improvement)

Obviously we want the best possible variant to make it into the release, but if we don't have data to corroborate the improvement we'll merge the one that we're confident has shown improvement. It's all a matter of how well we can test the variants.

renepickhardt · 2021-09-29T13:25:49Z

I have just created a small notebook at: https://github.com/renepickhardt/probabilistic-pathfinding/blob/main/Probabilistic%20Pathfinding%20Simulation.ipynb which I will plan to extend with a small simulation of various cost functions before I make a concrete proposal for a mainnet test. I am happy to provide you with the c-code of a modified cost function as those changes don't seem large and rather easy to build in.

Note I also copy the pictures from the current version of the notebook here:

I would argue that we can already see that the harmonic mean has a nice impact on the shape of the cost function but of course that does not say anything about how experiments / simulations will actually work.

renepickhardt · 2021-09-29T15:27:07Z

Ok I have updated the notebook (I am not sure if I compute the risk factor correctly (especially with respect to putting it to the right order of magnitude as fees) other than that I have now tested in a simulation a version with a smoothed harmonic mean (to avoid division by 0 if fees are zero) and the method proposed by rusty.

At least in the simulation the harmonic mean with negative log probabilities works much better than rusties formular that heavily focuses on minimizing fees.

probabilistic with CLTV successrate: 91.87%
probabilistic no CLTV successrate: 91.46%
rusty's method successrate: 71.54%

as we can see the CLTV / risk factor seems not to make a huge difference but as noted before I am not sure if I compute this properly as of now.

What I did in this experiment: I randomly assigned balance values to the LN network (following a uniform distribution, which as of now is the best we can do) for all three cost functions I used the same instance of the randomly assinged balance values. Then I tried to deliver 100'000 sats (1 mBTC) from my node to every node in the current bos-score list. I only did one attempt along the shortest path according to the cost function. While I am not found of the bos score in this quick test it was the easiest way for me to have plausible / active destination nodes.

renepickhardt · 2021-10-04T16:32:07Z

Just leaving a rough (untested) implementation of harmonic mean with probabilistic pathfinding and risk factor.

No normalization needed due to harmonic mean but scores are cast to longs and this might loose accuracy renepickhardt@a24f363 thus some refactoring of method signatures might be necessary

Couldn't test the code right now so please view this barely as a documentation that should be tested on mainnet

ZmnSCPxj · 2021-10-05T08:59:59Z

The question is: what is the capacity influence, vs fee influence? If we make the capacity influence << fee influence, it has only effect as a tiebreaker.

https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-August/003191.html ? Especially if this is "just" Dijkstra and not mincostflow and need to convex (or is it concave?) cost function is not needed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

We bias by channel linearly by capacity, scaled by median fee. This means that we effectively double the fee if we would use the entire capacity, and only increase it by 50% if we would only use 1/2 the capacity. This should drive us towards larger channels. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: Plugins: `pay` now biases towards larger channels, improving success probability.

Changelog-Changed: pay: The route selection will now use the log-propability-based channel selection to increase success rate and reduce time to completion

cdecker · 2021-10-21T18:11:43Z

Updated the pull request to use the log-probabilities, since they perform consistently better than the linear bias of the original version. Rebased on master and happy to merge as soon as it passes CI.

@renepickhardt could you check that by cleaning up the last commit I didn't accdentally mess up the logic?

vincenzopalazzo

LGTM.

ack 8378ed2

renepickhardt

Delighted to write ACK (:

though I have a few nitpicking comments on naming and one issue with the harmonic mean formula which I messed up before our evaluation. but I think this should be up to future evaluation.

plugins/libplugin-pay.c

renepickhardt · 2021-10-22T12:51:11Z

plugins/libplugin-pay.c

+{
+	u64 cmsat = cost.millisatoshis; /* Raw: lengthy math */
+	u64 rmsat = risk.millisatoshis; /* Raw: lengthy math */
+	u64 bias = capacity_bias(global_gossmap, c, dir, cost);


I understand that historically this was thought of as a capacity bias but I would actually for future redability rename this to negative_uniform_logprob

renepickhardt · 2021-10-22T12:52:50Z

plugins/libplugin-pay.c

+}
+
+/* Prioritize costs over distance, but bias to larger channels. */
+static u64 route_score(u32 distance,


nit: maybe rename to cost instead of score just to be closer to terms used in the literature (c.f. min-cost-flow instead of minimum score flow)

I think channel_cost or cost_function would be a better name also at the point where you make the dijsktra call

renepickhardt · 2021-10-22T13:01:25Z

plugins/libplugin-pay.c

+
+/* Prioritize costs over distance, but bias to larger channels. */
+static u64 route_score(u32 distance,
+		       struct amount_msat cost,


nit: here the same this cost I would actually name fees as the fees are just a feature in the final cost function.

rustyrussell requested a review from cdecker September 8, 2021 05:27

rustyrussell force-pushed the guilt/pay-capacity-weight-hack branch from 6bfc039 to c75f74b Compare September 8, 2021 07:16

vincenzopalazzo approved these changes Sep 8, 2021

View reviewed changes

rustyrussell marked this pull request as ready for review September 22, 2021 11:05

rustyrussell force-pushed the guilt/pay-capacity-weight-hack branch from c75f74b to 4b9f4bf Compare September 22, 2021 11:05

rustyrussell added this to the v0.10.2 milestone Sep 22, 2021

rustyrussell force-pushed the guilt/pay-capacity-weight-hack branch from 4b9f4bf to d939e35 Compare September 27, 2021 06:36

rustyrussell added 2 commits October 21, 2021 15:30

common/dijkstra: hand channel direction to path_score callback.

347967d

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

cdecker force-pushed the guilt/pay-capacity-weight-hack branch from d939e35 to bdcb9f6 Compare October 21, 2021 16:09

pay: Use log probability based bias in channel selection

8378ed2

Changelog-Changed: pay: The route selection will now use the log-propability-based channel selection to increase success rate and reduce time to completion

vincenzopalazzo approved these changes Oct 22, 2021

View reviewed changes

renepickhardt approved these changes Oct 22, 2021

View reviewed changes

renepickhardt reviewed Oct 22, 2021

View reviewed changes

cdecker merged commit 0ba1bc3 into ElementsProject:master Oct 22, 2021

renepickhardt mentioned this pull request Dec 7, 2021

Look Into using -log((channel capacity + 1 - HTLC amount ) / (channel capacity +1)) for scoring lightningdevkit/rust-lightning#1172

Closed

renepickhardt mentioned this pull request Jan 19, 2022

Probabilistic channel scoring lightningdevkit/rust-lightning#1227

Merged

renepickhardt mentioned this pull request Mar 14, 2022

Improvement to payment splitting algorithm ACINQ/eclair#1698

Open

renepickhardt mentioned this pull request Jul 25, 2023

renepay: an experimental payment plugin #6376

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pay: take channel capacity into account for route selection #4771

Pay: take channel capacity into account for route selection #4771

rustyrussell commented Sep 8, 2021 •

edited by cdecker

Loading

cdecker commented Sep 8, 2021 •

edited

Loading

vincenzopalazzo left a comment

rustyrussell commented Sep 9, 2021

rustyrussell commented Sep 22, 2021

cdecker commented Sep 22, 2021

renepickhardt commented Sep 22, 2021

rustyrussell commented Sep 23, 2021

renepickhardt commented Sep 23, 2021 •

edited

Loading

rustyrussell commented Sep 27, 2021

Clarification about your formular

renepickhardt commented Sep 27, 2021

cdecker commented Sep 28, 2021

renepickhardt commented Sep 28, 2021

cdecker commented Sep 29, 2021

renepickhardt commented Sep 29, 2021

renepickhardt commented Sep 29, 2021 •

edited

Loading

renepickhardt commented Oct 4, 2021

ZmnSCPxj commented Oct 5, 2021 •

edited

Loading

cdecker commented Oct 21, 2021

vincenzopalazzo left a comment

renepickhardt left a comment

renepickhardt Oct 22, 2021

renepickhardt Oct 22, 2021

renepickhardt Oct 22, 2021

renepickhardt Oct 22, 2021

Pay: take channel capacity into account for route selection #4771

Pay: take channel capacity into account for route selection #4771

Conversation

rustyrussell commented Sep 8, 2021 • edited by cdecker Loading

cdecker commented Sep 8, 2021 • edited Loading

vincenzopalazzo left a comment

Choose a reason for hiding this comment

rustyrussell commented Sep 9, 2021

rustyrussell commented Sep 22, 2021

cdecker commented Sep 22, 2021

renepickhardt commented Sep 22, 2021

rustyrussell commented Sep 23, 2021

renepickhardt commented Sep 23, 2021 • edited Loading

Clarification about your formular

Combining various scores

fun fact:

rustyrussell commented Sep 27, 2021

Clarification about your formular

renepickhardt commented Sep 27, 2021

cdecker commented Sep 28, 2021

renepickhardt commented Sep 28, 2021

cdecker commented Sep 29, 2021

renepickhardt commented Sep 29, 2021

renepickhardt commented Sep 29, 2021 • edited Loading

renepickhardt commented Oct 4, 2021

ZmnSCPxj commented Oct 5, 2021 • edited Loading

cdecker commented Oct 21, 2021

vincenzopalazzo left a comment

Choose a reason for hiding this comment

renepickhardt left a comment

Choose a reason for hiding this comment

renepickhardt Oct 22, 2021

Choose a reason for hiding this comment

renepickhardt Oct 22, 2021

Choose a reason for hiding this comment

renepickhardt Oct 22, 2021

Choose a reason for hiding this comment

renepickhardt Oct 22, 2021

Choose a reason for hiding this comment

rustyrussell commented Sep 8, 2021 •

edited by cdecker

Loading

cdecker commented Sep 8, 2021 •

edited

Loading

renepickhardt commented Sep 23, 2021 •

edited

Loading

renepickhardt commented Sep 29, 2021 •

edited

Loading

ZmnSCPxj commented Oct 5, 2021 •

edited

Loading