-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3/3]: Blinded Route Error Handling #8485
[3/3]: Blinded Route Error Handling #8485
Conversation
Important Auto Review SkippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis update introduces significant enhancements to the Lightning Network, focusing on improving privacy and functionality through the support of blinded payments. By simplifying configuration and adding new features for route blinding, the changes aim to enhance anonymity and usability. The modifications span across configuration, documentation, feature management, error handling, and testing, reflecting a comprehensive effort to support more secure and private transactions. Changes
Possibly related issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Opening this as a draft because I still want to test this against other implementations, but basic functionality is in place. |
b7b9bc3
to
6b7c6d2
Compare
9d21a0c
to
f430ed4
Compare
Update: ready for review! |
34cb0f6
to
fe61ef2
Compare
f8a2c5b
to
cc09f5f
Compare
d8a1835
to
bb5380f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow I learned a lot here 🤯 Thanks so much for the detailed PR description - it gave me a great blueprint of the relevant switch & link code.
Most (if not all?) of my comments are questions mostly for my understanding.
This all looks really good!!
htlcswitch/switch.go
Outdated
@@ -1456,10 +1456,31 @@ func (s *Switch) checkCircularForward(incoming, outgoing lnwire.ShortChannelID, | |||
// The ciphertext will be derived from the failure message proivded by context. | |||
// This method returns the failErr if all other steps complete successfully. | |||
func (s *Switch) failAddPacket(packet *htlcPacket, failure *LinkError) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still making my way through the diff so this might be answered in a later commit but: what about local switch failures for where we are the intro node? - could this be done in a catch all here by having a isBlinding
field in htlcPacket
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cause at this point we have already parsed the onion so would know if a blinding point is embedded there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is wrong here (sorry!) - by the time we get here we're looking at our outgoing HTLC so the blinding point will always be set if we're part of a blinded route
htlcswitch/link.go
Outdated
// blindedRouteFailure looks up the originally added htlc, and provides a | ||
// switched out failure message if it was part of a blinded route. | ||
func (l *channelLink) blindedRouteFailure(htlcIdx uint64) ( | ||
lnwire.FailureMessage, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do "lnwire.FailureMessage, bool, error" just to stick to the "if a bool is returned, check that before referencing" pattern? just to avoid accidental nil pointer dereferences in future.
htlcswitch/link.go
Outdated
@@ -2036,6 +2036,20 @@ func (l *channelLink) handleUpstreamMsg(msg lnwire.Message) { | |||
} | |||
} | |||
|
|||
// Lookup the original htlc, if it was part of a blinded route | |||
// switch out our error. | |||
blindingFailure, err := l.blindedRouteFailure(msg.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so just checking my understanding: if our peer is behaving correctly, the case lnwire.CodeInvalidBlinding: case above will always be overwritten here right? cause they only send Malformed back if we are also part of the path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if our peer is behaving correctly, the case lnwire.CodeInvalidBlinding: case above will always be overwritten here
Yeah. Didn't want to tie the blinded conversion to getting the Malformed
(in case of buggy/bad peers) so I think this is the simplest way of handling it?
📓 Note to self to add a comment here.
htlcswitch/link.go
Outdated
@@ -2083,9 +2097,36 @@ func (l *channelLink) handleUpstreamMsg(msg lnwire.Message) { | |||
} | |||
} | |||
|
|||
// Lookup the original htlc, if it was part of a blinded route | |||
// switch out our error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another understanding check: if we receive an UpdateFailHTLC
from our peer when we are also in the blinded route, then they are breaking the rules right? Cause they should only send us UpdateFailMalformed
... right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah 👿
a29ea3c
to
4cde71c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow I learned a lot here 🤯 Thanks so much for the detailed PR description - it gave me a great blueprint of the relevant switch & link code.
Can second that, great PR 🙏, especially the itests! Will need to look closer into spec compliance and the general flow.
4cde71c
to
40e969c
Compare
…ypes This commit moves all our validation related to the presence of fields into ValidateParsedPayloadTypes so that we can handle them in a single place. We draw the distinction between: - Validation of the payload (and the context within it's being parsed, final hop / blinded hop etc) - Processing and validation of encrypted data, where we perform additional cryptographic operations and validate that the fields contained in the blob are valid. This helps draw the line more clearly between the two validation types, rather than splitting some payload-releated blinded hop processing into the encrypted data processing part. The downside of this approach (vs doing the blinded path payload check _after_ payload validation) is that we have to pass additional context into payload validation (ie, whether we got a blinding point in our UpdateAddHtlc - as we already do for isFinalHop).
When handling blinded errors, we need to know whether there was a blinding key in our payload when we successfully parsed our payload but then found an invalid set of fields. The combination of parsing and validation in NewPayloadFromReader means that we don't know whether a blinding point was available to us by the time the error is returned. This commit splits parsing and validation into two functions so that we can take a look at what we actually pulled of the payload in between parsing and TLV validation.
We need to know what role we're playing to be able to handle errors correctly, but the information that we need for this is held by our iterator: - Whether we had a blinding point in update add (blinding kit) - Whether we had a blinding point in payload As we're now going to use the route role return value even when our err!=nil, we rename the error to signal that we're using less canonical golang here. An alternative to this approach is to attach a RouteRole to our ErrInvalidPayload. The downside of that approach is: - Propagate context through parsing (whether we had updateAddHtlc) - Clumsy handling for errors that are not of type ErrInvalidPayload
Introduce two wrapper types for our existing SphinxErrorEncrypter that are used to represent error encrypters where we're a part of a blinded route. These encrypters are functionally the same as a sphinx encrypter, and are just used as "markers" so that we know that we need to handle our error differently due to our different role. We need to persist this information to account for restart cases where we've resovled the outgoing HTLC, then restart and need to handle the error for the incoming link. Specifically, this is relevant for: - On chain resolution messages received after restart - Forwarding packages that are re-forwarded after restart This is also generally helpful, because we can store this information in one place (the circuit) rather than trying to reconstruct it in various places when forwarding the failure back over the switch.
Create our error encrypter with a wrapped type if we have a blinding point present. Doing this in the iterator allows us to track this information when we have both pieces of information available to us, compared to trying to handle this later down the line: - Downstream link on failure: we know that we've set a blinding point for out outgoing HTLC, but not whether we're introduction or not - Upstream link on failure: once the failure packet has been sent through the switch, we no longer know whether we were the introduction point (without looking it up / examining our payload again / propagating this information through the switch).
Set obfuscator for use in blinded error handling when we forward failures through the switch.
777b36c
to
c64c012
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉 awesome work!
There are some itest failures, all failing at SendToRouteV2
in the on chain to blinded
test. It seems to be a too long test for the DefaultTimeout
together with the call. A way to circumvent would be to call the SendToRouteV2
on the Router
with a larger timeout.
c64c012
to
cdc18ce
Compare
Bumped here, hopefully a minute is long enough for the GH potato - will see how this run goes and increase further if anything flakes! |
Fixed a race that popped up in the uint tests here (unrelated) here: #8694 |
With that aside, there is a failure of a new itest added in this PR (neutrino backend):
|
cdc18ce
to
696652b
Compare
Thought our timeout might be too slow, bumping up timeout to larger window so that there's time for CI to resolve on-chain. |
For tests where our payments require an on-chain resolution, provide longer timeout and return cancel functions.
696652b
to
e826d5e
Compare
This itest adds a test that we still propagate blinded errors back properly after a restart with an on-chain resolution. The test also updates our sendpayment timeout to longer so that there's time to resolve the on chain claim.
e826d5e
to
6572f5c
Compare
Made #8696 to address that sqlite race. |
This PR is the third and final in the route blinding forwarding series, including:
INVALID_ONION_BLINDING
Depends on #8160
Update: Writing up a summary of my understanding of error handling in LND / how this relates to route blinding because I got this conceptually wrong on my first attempt.
Description
TL;DR for route blinding errors:
UpdateFailHTLC
withinvalid_onion_blinding
UpdateFailMalformedHTLC
withinvalid_onion_blinding
In LND, there are a few places where a HTLC can fail:
processRemoteAdds
fails to add the incoming HLTC ->sendHTLCError
handlePacketForward
fails to transmit the incoming HTLC to the outgoing link ->failAddPacket
handleDownstreamUpdateAddHtlc
fails to add the outgoing HTLC ->FailAdd
handleUpstreamMsg
receives a failure from peer, propagated to incoming link ->FailHTLC
Following review from @bitromortac, approach has been updated to handle all of these errors on the incoming link so that we don't have to scatter handling all over the codebase (see below for previous approach).
Something that's key for this PR is that we need to know when we're in a blinded route when we're handling the error on the incoming link so that we send back the correct message. We could do this with a bool propagated through the switch, but there are some pesky edge cases around restarts/ on-chain resolutions where we don't have this information on hand (started with this approach - full branch here).
This PR takes the approach of using the existing
ErrorEncrypterType
to store a "marker" struct (which just wraps our existing implementation) so that we have the information available to us when we're resolving the HTLC across restarts. We're already storing the type for mock vs real encryptor, so this persistence costs us nothing extra. This trick is very helpful because we can just restore the circuit on the packet on re-forwrad through the switch and we'll have full information about the type of forward this was available.Previous Approach
There are a few subtleties about this error handling that are important: * There are *a lot* of different paths that errors can take, so we have to account for the switch required in route blinding in a few places * With the exception of (1) where we've just handled the onion, we only handle error encryption (when we're the source of the error) _in the switch_, so we can't just handle errors on the incoming link because we don't have our encrypted available (and surfacing it is a bad layering violation) * Malformed errors are handled on the outgoing link, and converted into regular failuresWith all this in mind, the following approach is taken:
UpdateFailHTLC
on the outgoing link, errors will be appropriately encrypted in the switch as requiredUpdateMalformedHTLC
if we are a relaying node, leave as-is otherwise (which will appropriately transmit both introduction node errors and regular unblinded errors)This does lead to an admittedly odd flow for relaying nodes that get an error from their peer: receive malformed -> convert to failure -> convert back to malformed, but we don't transmit malformed errors across the switch (and I don't think that we should). And while we're sending this message to the peer, we handle everything else exactly like a failure so I think it's worth leaving flow as as .