-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pay: Avoid clobbering the channel_hints at all costs #4195
pay: Avoid clobbering the channel_hints at all costs #4195
Conversation
As far as I can see the |
I'm now running this on mainnet and will report if this works better in the next days... |
Tried this changes today with a 150$ payment on mainnet (this is impossible with unpatched code for me). I captured stdout/info/debug incase someone needs to look... |
@niftynei I think this may need consideration for rc2? |
👍 I'll try to fix up the failing |
37be428
to
2646e1c
Compare
Ok, now ready to review. Added the following changes:
See end-to-end diff between the last version and current version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I retried this both versions (this and also the one from yesterday 23ab1c5 ) on mainnet in order to get a payment from my node to my Phoenix wallet (not connected to my node, phoenix has their own channel management). I tried first 0.01btc
then 0.006btc
and then 0.003btc
(~50$). Unfortunately all of these attempts failed for both Git revisions today.
Last time, when I paid a bitrefill invoice of ~150$, it worked almost instantly (which does not work on master for me). We should test this version and the master version internally between the devs to see if one or the other is really better.
plugins/libplugin-pay.c
Outdated
if (remove) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function argument remove
is never written to within this function. Thus, if remove
is true
it can be expected that thisif (remove) continue;
statement leads to a pointless(?) iteration through this route and it would be the same as if (remove) break;
just a little slower. Is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, I just didn't want to indent the whole for
-loop into an if
:-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why not break
instead of continue
? But not that important ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an excellent usecase for goto
Kind of hard to tell where the issue is. It could be local, but it could also be in the network (min-cut across all paths from you to the destination doesn't have sufficient capacity), or it might be the recipient not having incoming capacity (though Phoenix opens channels on-demand iirc). I think the PR itself is an improvement, since we reduce the time to completion in all cases (not attempting parts/routes that are doomed to fail). While I was initially thrilled to see that you managed to perform a payment with such a high value, I don't think that or the more recent negative result are representative or indicative of the full effect this PR might have or not. As such I'd judge this PR more on its own merits to increase correctness and efficiency, and consider improvements to the success rate as anecdotal until we can perform really representative measurements. Regarding representative measurements, an idea that came to mind was to create a fake-MPP receiver plugin, that'll pretend unknown HTLCs destined for the local node are part of an MPP payment and hold on to the parts for the 60 second timeout just like a real MPP recipient would, and return different errors depending of whether the MPP payment would have been successful. This way we could use random payment hashes, that are not redemable, to test MPP payments. With a number of devs running this plugin, we could perform MPP tests without actually sending any funds back and forth (always a contentious issue among friends) 😉 |
I made sure the payment would have fit in the channel I already have on phoenix (you can see your dynamic channel list). In any case your right we can't compare the cases, since bitrefill != phoenix and the situation channel wise might have been different suboptimal this time. |
We were delaying the channel_hint update till after the `createonion` call which gave us the same situation we had with concurrent `getroute` calls. Now we update the hints as soon as the plugins have had their say in the route construction. If we still fail, either because a modifier changed the route causing the failure, or because we interleaved the route computation for multiple parts, we reset the attempt and retry inline (i.e., without creating a new sub-payment). Notice that interleaved route computations now only happen if the modifier makes an async call to some RPC or similar.
This allows us to atomically update all channel_hints and determine if we had a collision and therefore should retry.
This adds a new state `PAYMENT_STEP_RETRY_GETROUTE` which is used to retry just that one step, without spawning a completely new attempt. It's a new state so that modifiers do not act on it twice. Changelog-Fixed: pay: Improved the performance of the `pay` command considerably by avoiding conflicting changes to our local network view.
I previously mistyped the rather lengthy conditions for failures, so let's dissect it into its smaller components and add rationale behind the individual parts of the decision.
80ed69f
to
72efbbb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 72efbbb
Turned the continue
to a goto
for maximum readability.
plugins/libplugin-pay.c
Outdated
if (remove) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an excellent usecase for goto
Due to the asynchronous
getroute
andcreateonion
calls we were issuing multiple concurrentgetroute
calls for multi-part payments. This is particularly painful with thepresplit
modifier which causes all of the initial parts to be scheduled at the same time, competing for a route.Since #4093 we no longer have an asynchronous call to
getroute
which alleviates the problem, but doesn't resolve it:createonion
is still asynchronous and we were only updating the channel_hints after it was called. In addition plugins can yield control of theio_loop
if they perform any asynchronous RPC call, resulting in the same issue again.This PR moves the application of the route to the channel_hints up to just before the
createonion
call, which makes the route creation and postprocessing synchronous unless a plugin performs an RPC call.In order to address the final venue for colliding route computations we introduce a new state, that serves only to repeat the getroute call, but nothing else (theoretically a plugin could hook into it as well, but that's unlikely to be of use).