renepay: an experimental payment plugin #6376
Forgot to submit these comments as usual, sorry if they are already out of date.
I'm not quite through the entire PR, but what I've seen is looking excellent 🚀
Great, I will rebase this, since db changes broke it...
This was so nice to read! 🧡
No serious changes, mainly tweaks to things which work, but can improve, and answering some TODO.
For review, I squashed and rebased your commits without any changes, and split the ccan intro into a separate commit. Then I played with some fixups on top for you to look at:
https://github.com/rustyrussell/lightning/tree/guilt/pr-6376
fraction);
pay_plugin->last_time = now_sec;

// TODO(eduardo): are there route hints for B12?
No. We have blinded paths instead, which must be used. Though, a node can just include a dummy blinded path to/from itself.
So, if it doesn't, we need to isolate the destination from any real channels, and create fake "channels" from the entry point using the blinded paths (they have info on fees and cltv). But that's just typing, basically, since algo stays the same.
Can we have several blinded paths whose capacities sum not to the exact amount but to a higher one, basically like a receiver-side MPP split? In that case the minimum cost flow problem could be redefined as a single-source, multi-destination problem, but the constraints become weird, as each blinded path probably expects either the full amount or no amount at all.
Yep, you can offer multiple paths with lesser capacity, it will Just Work. Or multiple paths each with sufficient capacity, and sender chooses how to use them.
This is hard to solve before the release. Can I just drop support for Bolt12 until we fix this?
// * notification. */
// // plugin_log(pay_plugin->plugin,LOG_DBG,"received shutdown notification, freeing data.");
// pay_plugin->ctx = tal_free(pay_plugin->ctx);
// return notification_handled(cmd);
Weird, this should work...
But you shouldn't do this anyway.
/* Remove self from map when done */
// TODO(eduardo):
// Is this desctructor really necessary? the chan_extra will deallocated
// when the chan_extra_map is freed. Anyways valgrind complains that the
// hash table is removing the element with a freed pointer.
// tal_add_destructor2(ce, destroy_chan_extra, chan_extra_map);
If you free the "ce" though, the pointer stays in the hash table! valgrind is right, this can be fatal.
You have two choices:
- Remove self from table on free automatically, using destructor. This is simplest!
- Remove from table manually. This optimization might happen after benchmarking, only!
All the `chan_extra` stored in the `chan_extra_map` are allocated with the map as the parent, and their destructor will first remove them from the hash table. The problem, IMO, is that tal will free the hash table first and then call the destructor of every child, which will try to remove every `chan_extra` from a hash table that no longer exists.
I'm stuck here.
One possible solution would be to allocate the `chan_extra`s with another parent context and ensure that that context is freed before the hash table, maybe by having that ctx as parent of the hashtable itself.
I am not through with reviewing: so far I have only had a high-level look at the code and comments, and have left only a few high-level comments.
I am confident that this PR is a huge step in the right direction and I am looking forward to finishing and extending the review soon.
/*
* mu (μ) is used as follows in the cost function:
*
* -log((c_e + 1 - f_e) / (c_e + 1)) + μ fee
Yes, the paper suggests a weighted arithmetic mean between routing cost and uncertainty cost. However, given all the issues that one can see from this, I want to open the discussion about other means (like the harmonic mean, as discussed in #4771) or at least adding Laplace smoothing by adding some constant (cf. https://en.wikipedia.org/wiki/Additive_smoothing).
I thought of adding the prob. cost P and the routing cost R using a fraction f (0<=f<=1):
cost = P * f + (1-f) R
so that when f=1 the entire cost is related to uncertainty, while with f=0 the cost is only fees. The good thing about this choice is that the interval to search is compact, [0,1], and one can do binary search to look for the best value of f.
Then we upgraded this to integer numbers by adopting the current parameter `mu`, which lies in the interval `[0,MU_MAX]`, and
cost = P*mu + R*(MU_MAX-mu)
which in essence is the same but with integers.
Allow me some time to study the discussion #4771.
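A minimal sketch of the integer combination described above. The value chosen for `MU_MAX` and the function name `combined_cost` are assumptions made for illustration; they are not taken from the PR.

```c
#include <stdint.h>

/* Assumed upper bound for mu. The name MU_MAX appears in the comment
 * above; the value 128 is only a placeholder for this sketch. */
#define MU_MAX 128

/* Combine an uncertainty cost P and a routing cost R with an integer
 * weight mu in [0, MU_MAX]:
 *   mu == MU_MAX -> only the uncertainty cost counts,
 *   mu == 0      -> only the routing cost (fees) counts. */
int64_t combined_cost(int64_t prob_cost, int64_t fee_cost, int64_t mu)
{
	return prob_cost * mu + fee_cost * (MU_MAX - mu);
}
```

Dividing the result by `MU_MAX` recovers the fractional form `P*f + (1-f)*R` with `f = mu / MU_MAX`, so the ordering of candidate solutions is unchanged.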
bfee = c->half[dir].base_fee,
delay = c->half[dir].delay;

return pfee + bfee* base_fee_penalty+ delay*delay_feefactor;
should the CLTV still be a penalty in finding payment flows?
How could we consider it otherwise? If the delay is not important, then we set `delay_feefactor=0` or just a small number.
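To make the role of `delay_feefactor` concrete, here is a small sketch of the linear fee cost from the quoted hunk. The struct `half_chan_fees`, the function name, and the `proportional_fee` field feeding `pfee` are assumptions for illustration; only `base_fee`, `delay`, `base_fee_penalty` and `delay_feefactor` appear in the hunk.

```c
#include <stdint.h>

/* Hypothetical per-direction channel parameters, mirroring the fields
 * read in the quoted hunk. */
struct half_chan_fees {
	uint32_t proportional_fee; /* assumed source of pfee */
	uint32_t base_fee;
	uint32_t delay;            /* CLTV delta */
};

/* Per-unit cost of an arc: proportional fee, plus the base fee scaled
 * by a penalty, plus the CLTV delay scaled by delay_feefactor. */
double linear_fee_cost(const struct half_chan_fees *h,
		       double base_fee_penalty, double delay_feefactor)
{
	return h->proportional_fee
	     + h->base_fee * base_fee_penalty
	     + h->delay * delay_feefactor;
}
```

With `delay_feefactor` set to 0 the CLTV term vanishes and the cost depends on fees alone, which is the opt-out described in the reply above.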
* problem is:
*
* Find a routing solution that pays the least of fees while keeping
* the probability of success above a certain value `min_probability`.
META COMMENT: `-log(Probability)` can be defined as an uncertainty cost and routing fees can be considered as a routing cost. In that case we would wish to minimize the total cost which consists of the combination of both routing and uncertainty cost. I think that is a more general and less confusing way to formulate the problem.
Yes. We could formulate the problem in that way; the issue is that we don't know how to combine the uncertainty cost and the routing cost, and in practice, as a user, I am interested in getting my payment through quickly and with low fees. It's a two-dimensional optimization problem: in general the cheapest routes will not correspond to the most reliable ones. We thought the easiest user experience would be: get me any solution that satisfies these constraints on uncertainty (e.g. prob must be greater than 50%) and routing costs (e.g. fees are less than 10 sats); in the background I will try to go as cheap as possible, but those bounds have the priority. That was the simplest choice I could make.
This process is taken care of by the binary search in the `minflow` function, and the `is_better` function decides the priorities (both in `mcf.c`).
In the future we can improve this.
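The idea of searching over the weight can be sketched as follows. This is not the code of `minflow`: the callback type `solver_fn`, the struct `outcome`, the toy solver, the value of `MU_MAX`, and the assumption that the success probability never decreases as `mu` grows are all hypothetical and only serve to illustrate a binary search under a probability bound.

```c
#include <stdint.h>

#define MU_MAX 128 /* assumed range of the weight, as in [0, MU_MAX] */

/* Hypothetical summary of the flow found for a given weight mu. */
struct outcome {
	double prob;       /* probability of success of the flow */
	uint64_t fee_msat; /* total routing fee of the flow */
};

/* Hypothetical solver: returns the outcome of the min-cost flow
 * computed with weight mu on the uncertainty cost. */
typedef struct outcome (*solver_fn)(uint32_t mu);

/* Smallest mu in [0, MU_MAX] whose flow reaches min_prob, or -1 if none
 * does. Assumes prob is non-decreasing in mu, so the smallest feasible
 * mu puts the most weight on fees while respecting the bound. */
int find_min_mu(solver_fn solve, double min_prob)
{
	int lo = 0, hi = MU_MAX, best = -1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		if (solve((uint32_t)mid).prob >= min_prob) {
			best = mid;   /* feasible: try a smaller weight */
			hi = mid - 1;
		} else {
			lo = mid + 1; /* infeasible: need more weight */
		}
	}
	return best;
}

/* Toy stand-in for the solver: reliability and fees both grow with mu. */
struct outcome toy_solver(uint32_t mu)
{
	struct outcome o = { (double)mu / MU_MAX, 10u * mu };
	return o;
}
```

In a real search the comparison between candidates would also have to account for the fee bound, which is the kind of priority decision the comment above attributes to `is_better`.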
* arc_2 -> [a+(b-a)*f1, a+(b-a)*f2)
* arc_3 -> [a+(b-a)*f2, a+(b-a)*f3)
*
* where f1 = 0.5, f2 = 0.8, f3 = 0.95;
I love the non-uniform split of the capacity! However, I believe 0.5 is a pretty high probability. I fear this will trigger and saturate too large payments. I'd love to have a discussion on the choices. Also, one does not need to take a uniform distribution for the probability to begin with. LND uses a bimodal distribution. However, that is hard to linearize.
These were just chosen by hand:
- 4 sections seemed like a nice number because it is a power of two (so that arcs can be identified with bits)
- from 0 to 0.5 the uniform distribution of liquidity gives an almost linear cost function, then the following pieces must have ever decreasing capacities to preserve linearity; that's why from 0.5 to the next pivot I chose dx=0.3, resulting in 0.8, while the last segment ends at 0.8 + 0.15 = 0.95
The last two segments are worse linear approximations than the first.
These are more parameters that we can tune with experience.
@renepickhardt, there is another detail here that bothers me: we cut the tail of the distribution at 95% of the channel capacity. That means there might be solutions that the algorithm will not see because it will think that the channel is already saturated at 95%.
We need either to add more arcs above that, increasing the computational and memory burden, or to produce a less precise linearization, for example by extending the last bin from 80% to 99%.
This is easy to fix, because these are just parameters.
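A small sketch of how the pivots translate into arc capacities, and how much of the range is left uncovered. It assumes that `a` and `b` are the lower and upper bounds of the uncertain liquidity range and that the first uncertain arc covers `[a, a+(b-a)*f1)`, which the quoted hunk does not show; the function names are hypothetical, and the pivots are expressed in per-mille to keep the arithmetic in integers.

```c
#include <stdint.h>

#define N_UNCERTAIN_ARCS 3

/* Pivots f1 = 0.5, f2 = 0.8, f3 = 0.95 from the quoted comment,
 * written in per-mille so that no floating point is needed. */
static const uint64_t pivot_permille[N_UNCERTAIN_ARCS] = {500, 800, 950};

/* Fill cap[i] with the width of the i-th uncertain arc, i.e. the
 * distance between consecutive pivots scaled by (b - a).
 * Requires a <= b. */
void arc_capacities(uint64_t a, uint64_t b, uint64_t cap[N_UNCERTAIN_ARCS])
{
	uint64_t prev = 0;

	for (int i = 0; i < N_UNCERTAIN_ARCS; i++) {
		uint64_t bound = (b - a) * pivot_permille[i] / 1000;
		cap[i] = bound - prev;
		prev = bound;
	}
}

/* Part of the range (b - a) that no arc covers: the cut-off tail. */
uint64_t uncovered_tail(uint64_t a, uint64_t b)
{
	uint64_t cap[N_UNCERTAIN_ARCS], covered = 0;

	arc_capacities(a, b, cap);
	for (int i = 0; i < N_UNCERTAIN_ARCS; i++)
		covered += cap[i];
	return (b - a) - covered;
}
```

For a range of 1000 units this gives arcs of 500, 300 and 150 units and an uncovered tail of 50 units, which is the 5% cut discussed above. Changing the last pivot to 990 would shrink the tail to 1% at the price of a coarser last segment.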
plugins/renepay/mcf.c
Outdated
struct flow *fp = tal(list_ctx,struct flow);
struct flow *fp = tal(this_ctx,struct flow);
tallocate this off flows...
Caveat: these are flow candidates; `minflow` produces different sets of flows and selects one. Having the flow array own its flows is elegant and allows me to do:
if (is_better(candidate_flow_array, best_flow_array))
{
	tmp = best_flow_array;
	best_flow_array = candidate_flow_array;
	tal_free(tmp);
}
plugins/renepay/mcf.c
Outdated
/* Stablish ownership. */
for(int i=0;i<tal_count(flows);++i)
{
flows[pos++] = tal_steal(flows,ld->flow_path);
flows[i] = tal_steal(flows,flows[i]);
...and skip this loop.
[ Split into separate commit --RR ] Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>
Trivial rebase which merged many of the fixup commits together, made flake8 happy with the final commit, and added a Changelog-Added line. Ack bbc9591 (Cleanups can go in followup PRs...)
Signed-off-by: Lagrang3 <eduardo.quintana@pm.me> Changelog-Added: Plugins: `renepay`: an experimental pay plugin implementing Pickhardt payments (`renepay` and `renepaystatus`).
Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>
- remove internal gheap checks
- add check for arc_t.chanidx overflow
- remove outdated comments
- check the delta flow bounds before augmenting along a path
- get_flow_paths uses a dynamic tal array instead of a list.
- fix a unit test that depended on the order of returned flows
- fix bug: lightningd doesn't like if I reuse the partid of a failed flow, therefore use a higher partid than any of the previous attempts.
- plugin_err instead of LOG_BROKEN if sendpay fails and we cannot get an error code.
- fix wrong comments.
- remove the background timer.
- This is a bugfix. Previous to this the MCF network was built using the knowledge of the min and max liquidity but it didn't take into account pending HTLCs.
- Also remove the min_prob_success option but hardcode a 90% value.
Removing some options that are not relevant to the user, they're kept for developer mode only:
- base_fee_penalty
- min_prob_success
- prob_cost_factor
- remove heap.h, not used
Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>
The global is an *internal* hack because dijkstra_item_mover doesn't take a context arg! It should be used with care. Easy, since all the accessors exist: we just hand in the struct dijkstra. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
- adopt "const <type> *" convention
- remove use_shadow option for some pyln tests
- show prob. information of flows into paynotes
- show prob. of success of entire payment flow in paynotes
- minflow: We were not releasing the memory of flow arrays when replacing them with a new candidate.
- use memleak_scan_obj in memleak_check
- replace u64 with size_t
Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>
Signed-off-by: Lagrang3 <eduardo.quintana@pm.me>
... and fixed a typo in run-mcf-diamond.c, which had string length wrong:
diff --git b/plugins/renepay/test/run-mcf-diamond.c a/plugins/renepay/test/run-mcf-diamond.c
index 3ca75cec6..b764d5b0e 100644
--- b/plugins/renepay/test/run-mcf-diamond.c
+++ a/plugins/renepay/test/run-mcf-diamond.c
@@ -80,10 +80,10 @@ int main(int argc, char *argv[])
assert(node_id_from_hexstr("0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518", 66, &l2));
assert(node_id_from_hexstr("035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d", 66, &l3));
assert(node_id_from_hexstr("0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199", 66, &l4));
- assert(short_channel_id_from_str("1x2x0", 7, &scid12));
- assert(short_channel_id_from_str("1x3x0", 7, &scid13));
- assert(short_channel_id_from_str("2x4x0", 7, &scid24));
- assert(short_channel_id_from_str("3x4x0", 7, &scid34));
+ assert(short_channel_id_from_str("1x2x0", 5, &scid12));
+ assert(short_channel_id_from_str("1x3x0", 5, &scid13));
+ assert(short_channel_id_from_str("2x4x0", 5, &scid24));
+ assert(short_channel_id_from_str("3x4x0", 5, &scid34));
mods = gossmap_localmods_new(tmpctx);
And rebased on master to get latest test flake fixes.
Ack 5056ebc
Overview
This PR implements a payment plugin called `renepay` that seeks to construct
optimal Multi-Path-Payments (o-MPP) in terms of fees and reliability.
The original idea was published by Rene Pickhardt and Stefan Richter [1].
There exists a python implementation of o-MPP by Pickhardt (pickhardtpayments).
This work was possible thanks to a Build on L2 Grant.
Pickhardtpayments
Any node in the Lightning Network (LN) has knowledge of the existence of public
channels and their capacities, but the channel liquidity is unknown unless that
channel is local to the node.
Therefore, when a node constructs a route to send a payment it is assuming that
all channels along the path have enough liquidity to forward the payment amount.
If the payment fails, alternative routes can be tried and the process repeats.
It is reasonable to assume that the smaller the payment amount, the more likely
it is that the payment attempt succeeds. With the use of MPP one could split the
payment into smaller parts and send them along different routes. However, the
more routes an MPP has, the more likely it is to fail. One must find a compromise
between the number of routes and the size of each payment part, as well as a
strategy to select routes, if one wishes to maximize the probability of success
of the payment.
The Pickhardt-Richter paper [1] formalizes this problem by stating that the
liquidity of every channel is a random variable distributed in the integer range
between 0 and C, where C is the capacity of the channel. For simplicity we are
neglecting the channel reserve.
An MPP can also be modeled as a flow in the LN; that is, an integer value is
associated with each channel, which corresponds to the amount of satoshis this
channel must forward so that this MPP can be completed. Notice that a channel
could be used by two different routes within the same MPP, and the flow
associated with that channel is the sum of all funds that need to get through it.
Formally, a flow is an integer function in the domain of the channels.
For a realistic payment the `Flow` constraints include:
- for every channel `c`, the flow cannot exceed the channel's capacity:
  `0 <= Flow(c) <= Capacity(c)`;
- for every node, its `balance` is computed as the sum of all incoming flows
  minus the outgoing flows, and the sum of all nodes' `balance` must be zero;
- the destination node has a `balance` which equals the requested payment
  amount;
- the payer node has a negative `balance`, whose magnitude must correspond to
  the payment amount plus the routing fees;
- routing nodes have a positive `balance` that corresponds to the total fees
  they collect.
The probability of success of a certain MPP, or its associated `Flow`, is then
the product of the probabilities that each channel is able to forward the
corresponding `Flow`:
$$\mathrm{Prob}(\mathrm{Flow}) = \prod_{c} \mathrm{Prob}(c \text{ can forward } \mathrm{Flow}(c))$$
Maximizing this probability is equivalent to minimizing the function
`-log Prob(Flow)`.
If we assume that:
- every channel `c` is such that `-log Prob(c can forward x)` is always a
  convex function of `x`;
- the `balance` of every node different from the source and the destination is
  zero;

then the problem of finding an optimal `Flow` that satisfies the capacity and
balance constraints, while maximizing `Prob(Flow)`, can be solved efficiently
using polynomial algorithms. Chapter 14 of Ahuja-Magnanti-Orlin's Network Flows
book [2] is dedicated to this class of problems.
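As an illustration of the success probability of a flow, here is a minimal sketch that assumes the liquidity of each channel is uniformly distributed on the integers from 0 to its capacity, so that a channel of capacity `C` forwards `x` with probability `(C + 1 - x) / (C + 1)`. The function names and the array-based representation of a flow are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Probability that a channel of capacity cap can forward x, assuming
 * its liquidity is uniform on {0, ..., cap}. */
double forward_prob(uint64_t cap, uint64_t x)
{
	if (x > cap)
		return 0.0; /* violates 0 <= Flow(c) <= Capacity(c) */
	return ((double)(cap - x) + 1.0) / ((double)cap + 1.0);
}

/* Prob(Flow): product over all channels of the probability that the
 * channel can forward the flow assigned to it. */
double flow_success_prob(const uint64_t *cap, const uint64_t *flow, size_t n)
{
	double prob = 1.0;

	for (size_t i = 0; i < n; i++)
		prob *= forward_prob(cap[i], flow[i]);
	return prob;
}
```

Taking `-log` of this product turns it into a sum of per-channel terms, which is what makes the problem fit a min-cost flow formulation with a separable cost.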
References
[1] Rene Pickhardt, Stefan Richter. Optimally Reliable & Cheap Payment Flows on the Lightning Network https://arxiv.org/abs/2107.05322
[2] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.