pay: Avoid clobbering the channel_hints at all costs #4195

cdecker · 2020-11-11T18:00:55Z

Due to the asynchronous getroute and createonion calls we were issuing multiple concurrent getroute calls for multi-part payments. This is particularly painful with the presplit modifier which causes all of the initial parts to be scheduled at the same time, competing for a route.

Since #4093 we no longer have an asynchronous call to getroute which alleviates the problem, but doesn't resolve it: createonion is still asynchronous and we were only updating the channel_hints after it was called. In addition plugins can yield control of the io_loop if they perform any asynchronous RPC call, resulting in the same issue again.

This PR moves the application of the route to the channel_hints up to just before the createonion call, which makes the route creation and postprocessing synchronous unless a plugin performs an RPC call.

In order to address the final venue for colliding route computations we introduce a new state, that serves only to repeat the getroute call, but nothing else (theoretically a plugin could hook into it as well, but that's unlikely to be of use).

cdecker · 2020-11-11T18:06:59Z

As far as I can see the test_mpp_adaptive test is the only one failing at the moment, will need to dig into that one.

m-schmoock · 2020-11-12T16:46:27Z

@cdecker

I'm now running this on mainnet and will report if this works better in the next days...

m-schmoock · 2020-11-13T10:28:00Z

Tried this changes today with a 150$ payment on mainnet (this is impossible with unpatched code for me).
With this patch it worked in 5 seconds on the first attempt.

I captured stdout/info/debug incase someone needs to look...

rustyrussell · 2020-11-13T23:09:25Z

@niftynei I think this may need consideration for rc2?

cdecker · 2020-11-16T10:07:16Z

@niftynei I think this may need consideration for rc2?

👍 I'll try to fix up the failing mpp_adaptive test asap

cdecker · 2020-11-16T13:45:44Z

Ok, now ready to review. Added the following changes:

Fixed the route applicability check (I had a parenthesis in the wrong place)
Split the applicability condition into multiple, documented, parts in order to avoid the above.

See end-to-end diff between the last version and current version.

m-schmoock

Today I retried this both versions (this and also the one from yesterday 23ab1c5 ) on mainnet in order to get a payment from my node to my Phoenix wallet (not connected to my node, phoenix has their own channel management). I tried first 0.01btc then 0.006btc and then 0.003btc (~50$). Unfortunately all of these attempts failed for both Git revisions today.

Last time, when I paid a bitrefill invoice of ~150$, it worked almost instantly (which does not work on master for me). We should test this version and the master version internally between the devs to see if one or the other is really better.

m-schmoock · 2020-11-16T16:39:07Z

plugins/libplugin-pay.c

+		if (remove)
+			continue;


The function argument remove is never written to within this function. Thus, if remove is true it can be expected that thisif (remove) continue; statement leads to a pointless(?) iteration through this route and it would be the same as if (remove) break; just a little slower. Is this intended?

I know, I just didn't want to indent the whole for-loop into an if :-)

then why not break instead of continue? But not that important ;)

This seems like an excellent usecase for goto

cdecker · 2020-11-16T18:48:16Z

Today I retried this both versions (this and also the one from yesterday 23ab1c5 ) on mainnet in order to get a payment from my node to my Phoenix wallet (not connected to my node, phoenix has their own channel management). I tried first 0.01btc then 0.006btc and then 0.003btc (~50$). Unfortunately all of these attempts failed for both Git revisions today.

Last time, when I paid a bitrefill invoice of ~150$, it worked almost instantly (which does not work on master for me). We should test this version and the master version internally between the devs to see if one or the other is really better.

Kind of hard to tell where the issue is. It could be local, but it could also be in the network (min-cut across all paths from you to the destination doesn't have sufficient capacity), or it might be the recipient not having incoming capacity (though Phoenix opens channels on-demand iirc).

I think the PR itself is an improvement, since we reduce the time to completion in all cases (not attempting parts/routes that are doomed to fail). While I was initially thrilled to see that you managed to perform a payment with such a high value, I don't think that or the more recent negative result are representative or indicative of the full effect this PR might have or not.

As such I'd judge this PR more on its own merits to increase correctness and efficiency, and consider improvements to the success rate as anecdotal until we can perform really representative measurements.

Regarding representative measurements, an idea that came to mind was to create a fake-MPP receiver plugin, that'll pretend unknown HTLCs destined for the local node are part of an MPP payment and hold on to the parts for the 60 second timeout just like a real MPP recipient would, and return different errors depending of whether the MPP payment would have been successful. This way we could use random payment hashes, that are not redemable, to test MPP payments. With a number of devs running this plugin, we could perform MPP tests without actually sending any funds back and forth (always a contentious issue among friends) 😉

m-schmoock · 2020-11-16T19:57:00Z

I made sure the payment would have fit in the channel I already have on phoenix (you can see your dynamic channel list). In any case your right we can't compare the cases, since bitrefill != phoenix and the situation channel wise might have been different suboptimal this time.

We were delaying the channel_hint update till after the `createonion` call which gave us the same situation we had with concurrent `getroute` calls. Now we update the hints as soon as the plugins have had their say in the route construction. If we still fail, either because a modifier changed the route causing the failure, or because we interleaved the route computation for multiple parts, we reset the attempt and retry inline (i.e., without creating a new sub-payment). Notice that interleaved route computations now only happen if the modifier makes an async call to some RPC or similar.

This allows us to atomically update all channel_hints and determine if we had a collision and therefore should retry.

This adds a new state `PAYMENT_STEP_RETRY_GETROUTE` which is used to retry just that one step, without spawning a completely new attempt. It's a new state so that modifiers do not act on it twice. Changelog-Fixed: pay: Improved the performance of the `pay` command considerably by avoiding conflicting changes to our local network view.

I previously mistyped the rather lengthy conditions for failures, so let's dissect it into its smaller components and add rationale behind the individual parts of the decision.

niftynei

ACK 72efbbb

Turned the continue to a goto for maximum readability.

niftynei · 2020-11-17T00:26:58Z

plugins/libplugin-pay.c

+		if (remove)
+			continue;


This seems like an excellent usecase for goto

cdecker marked this pull request as draft November 11, 2020 18:03

cdecker force-pushed the pay-getroute-retry branch 4 times, most recently from 37be428 to 2646e1c Compare November 16, 2020 12:46

cdecker marked this pull request as ready for review November 16, 2020 13:45

m-schmoock reviewed Nov 16, 2020

View reviewed changes

niftynei added this to the v0.9.2 milestone Nov 16, 2020

cdecker added 4 commits November 16, 2020 19:53

pay: Add a pre-apply check to channel_hint updates

a5d621b

This allows us to atomically update all channel_hints and determine if we had a collision and therefore should retry.

pay: Cleanup the route applicability checks for channel hints

72efbbb

I previously mistyped the rather lengthy conditions for failures, so let's dissect it into its smaller components and add rationale behind the individual parts of the decision.

niftynei force-pushed the pay-getroute-retry branch from 80ed69f to 72efbbb Compare November 17, 2020 01:55

niftynei approved these changes Nov 17, 2020

View reviewed changes

rustyrussell merged commit 313976e into ElementsProject:master Nov 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pay: Avoid clobbering the channel_hints at all costs #4195

pay: Avoid clobbering the channel_hints at all costs #4195

cdecker commented Nov 11, 2020

cdecker commented Nov 11, 2020

m-schmoock commented Nov 12, 2020

m-schmoock commented Nov 13, 2020 •

edited

Loading

rustyrussell commented Nov 13, 2020

cdecker commented Nov 16, 2020

cdecker commented Nov 16, 2020

m-schmoock left a comment •

edited

Loading

m-schmoock Nov 16, 2020 •

edited

Loading

cdecker Nov 16, 2020

m-schmoock Nov 16, 2020

niftynei Nov 17, 2020

cdecker commented Nov 16, 2020

m-schmoock commented Nov 16, 2020

niftynei left a comment

niftynei Nov 17, 2020

pay: Avoid clobbering the channel_hints at all costs #4195

pay: Avoid clobbering the channel_hints at all costs #4195

Conversation

cdecker commented Nov 11, 2020

cdecker commented Nov 11, 2020

m-schmoock commented Nov 12, 2020

m-schmoock commented Nov 13, 2020 • edited Loading

rustyrussell commented Nov 13, 2020

cdecker commented Nov 16, 2020

cdecker commented Nov 16, 2020

m-schmoock left a comment • edited Loading

Choose a reason for hiding this comment

m-schmoock Nov 16, 2020 • edited Loading

Choose a reason for hiding this comment

cdecker Nov 16, 2020

Choose a reason for hiding this comment

m-schmoock Nov 16, 2020

Choose a reason for hiding this comment

niftynei Nov 17, 2020

Choose a reason for hiding this comment

cdecker commented Nov 16, 2020

m-schmoock commented Nov 16, 2020

niftynei left a comment

Choose a reason for hiding this comment

niftynei Nov 17, 2020

Choose a reason for hiding this comment

m-schmoock commented Nov 13, 2020 •

edited

Loading

m-schmoock left a comment •

edited

Loading

m-schmoock Nov 16, 2020 •

edited

Loading