renepay: an experimental payment plugin #6376

Lagrang3 · 2023-07-04T10:38:33Z

Overview

This PR about implementing a payment plugin called renepay that seeks to
construct optimal Multi-Path-Payments (o-MPP) in terms of fees and reliability.
The original idea was published by Rene Pickhardt and Stefan Richter [1].
There exists a python implementation of o-MPP by Pickhardt (pickhardtpayments).

This work was possible thanks to a Build on L2 Grant.

Pickhardtpayments

Any node in the Lightning Network (LN) has knowledge of the existence of public
channels and their capacities, but the channel liquidity is unknown unless that
channel is local to the node.
Therefore, when a node constructs a route to send a payment it is assuming that
all channels along the path have enough liquidity to forward the payment amount.
If the payment fails, alternative routes can be tried and the process repeats.

It is reasonable to assume that the smaller the payment amount it is more likely
that the payment attempt succeeds. With the use of MPP one could split the
payment into smaller parts and send it along different routes. However, also the
more routes a MPP has, the more likely it will fail. One must find a compromise
between the number of routes and the size of each payment part as well as an
strategy to select routes if one wishes to maximize the probability of success
of the payment.

Pickhardt-Richter [1] paper formalizes this problem stating that the liquidity
of every channel is a random variable distributed in the integer range between
0 and C, where C is the capacity of the channel. For simplicity we are
neglecting the channel reserved fund.
A MPP can also be modeled as flow in the LN, that is an integer value is
associated to each channel, which corresponds to the amount of satoshis this
channel must forward so that this MPP can be completed. Notice that a channel
could be used by two different routes within the same MPP and the flow
associated to that channel is the sum of all funds that need to get through it.
Formally a flow is an integer function in the domain of the channels.

For a realistic payment the Flow constraints include:

for any channel c the flow cannot exceed the channel's capacity:
0 <= Flow(c) <= Capacity(c);
there is satoshi conservation, hence each node has a satoshi balance computed
as the sum of all incoming flows minus the outgoing flows,
the sum of all nodes balance must be zero.
also for the payment to considered valid one must have that
the destination node have an balance which equals the requested payment
amount.
The payer node have a negative balance and must correspond to the
payment amount plus the routing fees.
Routing nodes will have a positive balance that corresponds to the total fees
they collect.

The probability of success of a certain MPP, or its associated Flow,
is then the multiplication of the
probabilities that each channel is able to forward the corresponding Flow:

    Prob(Flow) = Product_{channel c} Prob( c can forward Flow(c))

Maximizing this probability is equivalent to minimizing the function -log Prob(Flow)

    - log Prob(Flow) = - sum_{channel c} log Prob( c can forward Flow(c))

If we assume that:

the probability distribution of the liquidity of every channel c is such that
-log Prob(c can forward x) is always a convex function of x;
and the routing fees are negligible, hence we can assume that the balance of
every node different from the source and the destination is zero;

then the problem of finding an optimal Flow that satisfies the capacity and balance
constraints, while maximizing Prob(Flow) can be solved efficiently using
polynomial algorithms. Chapter 14 of Ahuja-Magnanti-Orlin's Network Flows book [2]
is dedicated to this class of problems.

References

[1] Rene Pickhardt, Stefan Richter. Optimally Reliable & Cheap Payment Flows on the Lightning Network https://arxiv.org/abs/2107.05322
[2] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.

cdecker

Forgot to submit these comments as usual, sorry if they are already out of date.

I'm not quite through the entire PR, but what I've seen is looking excellent 🚀

plugins/renepay/debug.h

plugins/renepay/dijkstra.c

rustyrussell · 2023-07-25T01:46:12Z

Great, I will rebase this, since db changes broke it...

rustyrussell

This was so nice to read! 🧡

No serious changes, mainly tweaks to things which work, but can improve, and answering some TODO.

For review, I squashed and rebased your commits without any changes, and split the ccan intro into a separate commit. Then I played with some fixups on top for you to look at:

https://github.com/rustyrussell/lightning/tree/guilt/pr-6376

plugins/renepay/dijkstra.c

plugins/renepay/heap.h

plugins/renepay/mcf.c

plugins/renepay/pay.c

rustyrussell · 2023-07-25T04:13:18Z

plugins/renepay/pay.c

+					   fraction);
+	pay_plugin->last_time = now_sec;
+
+	// TODO(eduardo): are there route hints for B12?


No. We have blinded paths instead, which must be used. Though, a node can just include a dummy blinded path to/from itself.

So, if it doesn't, we need to isolate the destination from any real channels, and create fake "channels" from the entry point using the blinded paths (they have info on fees and cltv). But that's just typing, basically, since algo stays the same.

can we have several blinded paths which do not sum to the full but a higher amount? basically like receivers MPP split? In that case the minimum cost flow problem could be redefined as a single source multi destination problem but the constraints become weired as the blinded paths probably expect either the full amount or no amount at all.

Yep, you can offer multiple paths with lesser capacity, it will Just Work. Or multiple paths each with sufficient capacity, and sender chooses how to use them.

This is hard to solve before the release. Can I just drop support for Bolt12 until we fix this?

plugins/renepay/pay.c

rustyrussell · 2023-07-25T04:15:04Z

plugins/renepay/pay.c

+// 	 * notification. */
+// 	// plugin_log(pay_plugin->plugin,LOG_DBG,"received shutdown notification, freeing data.");
+// 	pay_plugin->ctx = tal_free(pay_plugin->ctx);
+// 	return notification_handled(cmd);


Weird, this should work...

But you shouldn't do this anyway.

plugins/renepay/pay.h

rustyrussell · 2023-07-25T04:28:41Z

plugins/renepay/flow.c

+	/* Remove self from map when done */
+	// TODO(eduardo):
+	// Is this desctructor really necessary? the chan_extra will deallocated
+	// when the chan_extra_map is freed. Anyways valgrind complains that the
+	// hash table is removing the element with a freed pointer.
+	// tal_add_destructor2(ce, destroy_chan_extra, chan_extra_map);


If you free the "ce" though, the pointer stays in the hash table! valgrind is right, this can be fatal.

You have two choices:

Remove self from table on free automatically, using destructor. This is simplest!

Remove from table manually. This optimization might happen after benchmarking, only!

All the chan_extra stored in the chan_extra_map are allocated with the map as the parent, their destructor will first remove them from the hash table. The problem IMO is that tal will free the hash table first and then call the destructor of every child which will try to remove every chan_extra_map from a hash table that no longer exists.

I'm stuck here.
One possible solution would be to allocate the chan_extras with another parent context and ensure that that context is freed before the hash table, maybe by having that ctx as parent of the hashtable itself.

renepickhardt

I am not through with reviewing and have only had a high level look at the code and comments and also left only a few high level comments yet.

I am confident that this PR is a huge step in the right direction and I am looking forward to finnish / extend the review soon.

plugins/renepay/pay.c

renepickhardt · 2023-07-25T13:25:10Z

plugins/renepay/flow.h

+/*
+ * mu (μ) is used as follows in the cost function:
+ *
+ *     -log((c_e + 1 - f_e) / (c_e + 1)) + μ fee


yes the paper suggests a weighted arithmetic mean between routing cost and uncertainty cost. However given all the issues that one can see from this I want to open the discussion about other means (like harmonic as deiscussed in #4771 ) or at least adding laplacian smoothing by adding some constant (c.f.: https://en.wikipedia.org/wiki/Additive_smoothing )

I thought of adding prob. cost P and routing cost R using a fraction f (0<=f<=1):

cost = P * f + (1-f) R

so that when f=1 you get the entire costs is related to uncertainty, while with f=0 the costs is only fees. The good thing about this choice is that the interval to search is compact [0,1], and one can do binary search to look for the best value of f.
Then we upgraded this to integer numbers by adopting the current parameter mu that lies in the interval [0,MU_MAX] and

cost = P*mu + R*(MU_MAX-mu)

in essence is the same but with integers.

Allow me some time to study the discussion #4771.

plugins/renepay/payment.h

plugins/renepay/flow.c

renepickhardt · 2023-07-25T13:42:33Z

plugins/renepay/flow.c

+	    bfee = c->half[dir].base_fee,
+	    delay = c->half[dir].delay;
+
+	return pfee + bfee* base_fee_penalty+ delay*delay_feefactor;


should the CLTV still be a penalty in finding payment flows?

How could we consider it otherwise? If the delay is non-important then we set the delay_feefactor=0 or just a small number.

renepickhardt · 2023-07-25T13:48:52Z

plugins/renepay/mcf.c

+ * problem is:
+ *
+ * 	Find a routing solution that pays the least of fees while keeping
+ * 	the probability of success above a certain value `min_probability`.


META COMMENT: -log(Probability) can be defined as an uncertainty cost and routing fees can be considered as a routing cost. In that case we would wish to minimize the total cost which consists of the combination of both routing and uncertainty cost. I think that is a more general and less confusing way to formulate the problem.

Yes. We could formulate the problem in that way, the issue is that we don't know how to combine the uncertainty cost and routing cost, and in practice as a user I am interested in getting my payment through quickly and with low fees. It's a two dimensional optimization problem, in general the cheapest routes will not correspond to the most reliable ones. We thought the easiest user experience will be: get me any solution that satisfies this constraint in uncertainty (eg. prob must be greater than 50%) and routing costs (eg. fees are less than 10sats), in the background I will try to go as cheap as possible, but those bounds have the priority. That was the simplest choice I could make.

This process is taken care of by the binary search in the minflow function and the is_better function decides the priorities (both in mcf.c).

In the future we can improve this.

renepickhardt · 2023-07-25T13:54:37Z

plugins/renepay/mcf.c

+ * 	arc_2 -> [a+(b-a)*f1, a+(b-a)*f2)
+ * 	arc_3 -> [a+(b-a)*f2, a+(b-a)*f3)
+ *
+ * where f1 = 0.5, f2 = 0.8, f3 = 0.95;


I love the non uniform split of the capacity! However I believe 0.5 is a pretty high probability. I fear this will trigger and saturate too large payments. I'd love to have a discussion on the choices. also one does not need to take a uniform distribution for the probability to begin with. LND uses a a bimodal distribution. However that it hard to linearize.

There were just chosen by hand:

4 sections seemed like a nice number because it is a power of two (so that arcs can be identified with bits)

from 0 to 0.5 the uniform distribution of liquidity gives an almost linear cost function, then the following pieces must have ever decreasing capacities to preserve linearity, that's why from 0.5 to the next pivot I chose dx=0.3, resulting in 0.8, while the last segment is 0.8 + 0.15 = 0.95

The last two segments are worst linear approximation than the first.

These are more parameters can we have to tune with experience.

@renepickhardt, there is another detail here that bothers me, it is the fact that we cut the tail of the distribution at 95% of the channel capacity. That means there might be solutions that the algorithm will not see becauseit will think that the channel is already saturated at 95%.
We need either to add more arcs above that increasing the computational and memory burden, or produce a less precise linearization, for example like extending the last bin from 80% to 99%.

This is easy to fix, because these are just parameters.

rustyrussell · 2023-07-28T05:39:22Z

plugins/renepay/mcf.c

-			struct flow *fp = tal(list_ctx,struct flow);
+			struct flow *fp = tal(this_ctx,struct flow);


tallocate this off flows...

Caveat: these are flow candidates, minflow produces different sets flows and selects one. Ownership of the flows to the flow array is elegant and allows me to do:

if(is_better(candidate_flow_array,best_flow_array)) { tmp = best_flow_array; best_flow_array = candidate_flow_array; tal_free(tmp); }

rustyrussell · 2023-07-28T05:39:26Z

plugins/renepay/mcf.c

+	/* Stablish ownership. */
+	for(int i=0;i<tal_count(flows);++i)
 	{
-		flows[pos++] = tal_steal(flows,ld->flow_path);
+		flows[i] = tal_steal(flows,flows[i]);


...and skip this loop.

plugins/renepay/pay.c

[ Split into separate commit --RR ] Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

rustyrussell · 2023-07-31T00:18:32Z

Trivial rebase which merged many of the fixup commits together, made flake8 happy with the final commit, and added a Changelog-Added line.

Ack bbc9591

(Cleanups can go in followup PRs...)

Signed-off-by: Lagrang3 <eduardo.quintana@pm.me> Changelog-Added: Plugins: `renepay`: an experimental pay plugin implementing Pickhardt payments (`renepay` and `renepaystatus`).

Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

- remove internal gheap checks - add check for arc_t.chanidx overflow - remove outdated comments - check the delta flow bounds before augmenting along a path - get_flow_paths uses a dynamic tal array instead of a list. - fix a unit test that depended on the order of returned flows - fix bug: lightnind doesn't like if I reuse the partid of a failed flow, therefore use a higher partid than any of the previous attempts. - plugin_err instead of LOG_BROKEN if sendpay fails and we cannot get a an error code. - fix wrong comments. - remove the background timer. - This is a bugfix. Previous to this the MCF network was built using the knowledge of the min and max liquidity but it didn't take into account pending HTLCs. - Also remove the min_prob_success option but hardcode a 90% value. Removing some options that are not relevant to the user, they're kept for developer mode only: - base_fee_penalty - min_prob_success - prob_cost_factor - remove heap.h, not used Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

The global is an *internal* hack because dijkstra_item_mover doesn't take a context arg! It should be used with care. Easy, since all the accessors exist: we just hand in the struct dijkstra. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

- adopt "const <type> *"convention - remove use_shadow option for some pyln tests - show prob. information of flows into paynotes - show prob. of success of entire payment flow in paynotes - minflow: We were not releasing the memory of flow arrays when replacing them with a new canditate. - use memleak_scan_obj in memleak_check - replace u64 with size_t Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

rustyrussell · 2023-07-31T01:54:31Z

... and fixed a typo in run-mcf-diamond.c, which had string length wrong:

diff --git b/plugins/renepay/test/run-mcf-diamond.c a/plugins/renepay/test/run-mcf-diamond.c
index 3ca75cec6..b764d5b0e 100644
--- b/plugins/renepay/test/run-mcf-diamond.c
+++ a/plugins/renepay/test/run-mcf-diamond.c
@@ -80,10 +80,10 @@ int main(int argc, char *argv[])
 	assert(node_id_from_hexstr("0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518", 66, &l2));
 	assert(node_id_from_hexstr("035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d", 66, &l3));
 	assert(node_id_from_hexstr("0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199", 66, &l4));
-	assert(short_channel_id_from_str("1x2x0", 7, &scid12));
-	assert(short_channel_id_from_str("1x3x0", 7, &scid13));
-	assert(short_channel_id_from_str("2x4x0", 7, &scid24));
-	assert(short_channel_id_from_str("3x4x0", 7, &scid34));
+	assert(short_channel_id_from_str("1x2x0", 5, &scid12));
+	assert(short_channel_id_from_str("1x3x0", 5, &scid13));
+	assert(short_channel_id_from_str("2x4x0", 5, &scid24));
+	assert(short_channel_id_from_str("3x4x0", 5, &scid34));
 
 	mods = gossmap_localmods_new(tmpctx);

And rebased on master to get latest test flake fixes.

rustyrussell · 2023-07-31T01:54:44Z

Ack 5056ebc

cdecker reviewed Jul 6, 2023

View reviewed changes

plugins/renepay/debug.h Outdated Show resolved Hide resolved

plugins/renepay/dijkstra.c Outdated Show resolved Hide resolved

plugins/renepay/dijkstra.c Outdated Show resolved Hide resolved

plugins/renepay/dijkstra.c Outdated Show resolved Hide resolved

rustyrussell added this to the v23.08 milestone Jul 25, 2023

rustyrussell reviewed Jul 25, 2023

View reviewed changes

renepickhardt reviewed Jul 25, 2023

View reviewed changes

rustyrussell reviewed Jul 28, 2023

View reviewed changes

Import lqueue from CCAN

614c905

[ Split into separate commit --RR ] Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

rustyrussell and others added 6 commits July 31, 2023 11:21

renepay: an experimental payment plugin

f4ae111

Signed-off-by: Lagrang3 <eduardo.quintana@pm.me> Changelog-Added: Plugins: `renepay`: an experimental pay plugin implementing Pickhardt payments (`renepay` and `renepaystatus`).

renepay: add information decay linear in time

562f6e8

Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

renepay: add test

5056ebc

Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>

rustyrussell merged commit e5695b3 into ElementsProject:master Jul 31, 2023

Lagrang3 mentioned this pull request Dec 11, 2023

pay: most payments fail #6793

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

renepay: an experimental payment plugin #6376

renepay: an experimental payment plugin #6376

Lagrang3 commented Jul 4, 2023

cdecker left a comment

rustyrussell commented Jul 25, 2023

rustyrussell left a comment

rustyrussell Jul 25, 2023

renepickhardt Jul 25, 2023

rustyrussell Jul 26, 2023

Lagrang3 Jul 28, 2023

rustyrussell Jul 25, 2023

rustyrussell Jul 25, 2023

Lagrang3 Jul 28, 2023

renepickhardt left a comment

renepickhardt Jul 25, 2023

Lagrang3 Jul 26, 2023

renepickhardt Jul 25, 2023

Lagrang3 Jul 27, 2023

renepickhardt Jul 25, 2023

Lagrang3 Jul 26, 2023

renepickhardt Jul 25, 2023

Lagrang3 Jul 26, 2023

Lagrang3 Jul 26, 2023

Lagrang3 Jul 28, 2023 •

edited

Loading

rustyrussell Jul 28, 2023

Lagrang3 Jul 29, 2023

rustyrussell Jul 28, 2023

rustyrussell commented Jul 31, 2023

rustyrussell commented Jul 31, 2023

rustyrussell commented Jul 31, 2023

		struct flow *fp = tal(list_ctx,struct flow);
		struct flow *fp = tal(this_ctx,struct flow);

renepay: an experimental payment plugin #6376

renepay: an experimental payment plugin #6376

Conversation

Lagrang3 commented Jul 4, 2023

Overview

Pickhardtpayments

References

cdecker left a comment

Choose a reason for hiding this comment

rustyrussell commented Jul 25, 2023

rustyrussell left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

renepickhardt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Lagrang3 Jul 28, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rustyrussell commented Jul 31, 2023

rustyrussell commented Jul 31, 2023

rustyrussell commented Jul 31, 2023

Lagrang3 Jul 28, 2023 •

edited

Loading