-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable watchtower plugins via the commitment_revocation
hook
#3601
Conversation
This is an initial proposal to brainstorm around, and I am happy to see what else is missing from this. Ping @ZmnSCPxj, @NicolasDorier and @sr-gi since you were involved in either #1353 or #3422.
|
The `commitment_revocation` hook is a chained hook, i.e., multiple plugins can | ||
register it, and they will be called in the order they were registered in. | ||
Plugins should always return `{"result": "continue"}`, otherwise subsequent | ||
hook subscribers would not get called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the use-case be for interactivity of hooks? Is there any reason for a plugin to somehow affect state update? Would not htlc_accepted
(and maybe a future ptlc_accepted
) be sufficient as a hook, since we will largely be manipulating HTLC/PTLCs in the future only? It seems to me more sensible to use a notification for commitment_revocation
event, and use a hook for htlc_accepted
.
We can only emit this event (whether notification or hook) after the counterparty sends a revoke_and_ack
to us. This is pretty much the last step in the update dance (I think we send our own revoke_and_ack
even before we know of the counterparty revoke_and_ack
, as otherwise it would be a chicken-and-egg problem), so I do not see what a hook can affect for this state: it would be "too late" at this point to do anything about this state, so what can a plugin affect for this state? Even if we disconnect at this point, as far as the counterparty is concerned the new state is now the latest (and it will re-send the revocation data on channel re-establish).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.
What it could affect would be dependent processing: i.e. we do not consider any HTLCs introduced by the new state to be "irrevoably committed" until the most recent commitment_revocation
hook returns. This implies adding a new state in the database: we know the revocation of a previous state, but have not yet confirmed with commitment_revocation
that all the plugins have confirmed they have processed this revocation correctly, in which case on a restart we will re-invoke commitment_revocation
on all plugins. It seems safer to implement this first if we are going to make this a hook anyway.
So processing goes this way:
- We receive
revoke_and_ack
. - We set the channel to state "revocation known but
commitment_revocation
not yet done." - Send
commitment_revocation
down the hook chain. - Once all hooked plugins say
{"result": "continue"}
set the state to the "new state" and handle HTLC forwardings and HTLC incoming payments as previous.
In case of a hard crash while commitment_revocation
is still being propagated/handled by plugin(s), then on restart we redo the commitment_revocation
to all plugins, to ensure that it is handled.
I am unsure what we should do in case any hooked plugin says anything that is not {"result": "continuie"}
.... maybe fail all incoming HTLCs on the affected channel with temporary_node_failure
instead of doing any further processing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in case of a failure in commitment_revocation
(i.e. it returns anything other than a non-error result {"result": "continue"}
), then what we should do is to store the failed commitment_revocation
on-database, then set the channel to "always fail" mode for now.
Then future updates on the channel, we:
- We receive
revoke_and_ack
. - Append the new
commitment_revocation
to the list in he database. - For every
commitment_revocation
in the in-database list (in order of appending!):- Send
commitment_revocation
to the plugin hook chain. - If it succeeds, remove that from the in-database list and continue with the next item in the list.
- If it fails, exit this loop and mark all live incoming HTLCs as failing with
temporary_node_failure
.
- Send
- If the above loop completely succeeded, the in-database list would be empty.
- If the above loop completely succeeded, then we can safely handle (forward or claim) any incoming HTLCs.
On startup we perform the above loop as well, and handle incoming HTLCs as normal if the loop completes.
This allows us to keep the node "alive" in some way if its watchtower dies.
If we would want to send out over that channel while its list of pending commitment_revocation
s is not empty, then we refuse to perform that send out and fail it immediately (temporary_node_failure
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the use-case be for interactivity of hooks? Is there any reason for a plugin to somehow affect state update? Would not
htlc_accepted
(and maybe a futureptlc_accepted
) be sufficient as a hook, since we will largely be manipulating HTLC/PTLCs in the future only? It seems to me more sensible to use a notification forcommitment_revocation
event, and use a hook forhtlc_accepted
.
The reason I like it being a hook is that if we decide that notifying watchtowers is as critical as it's assumed to be, we can make sure that the watchtower gets the penalty_tx
before we make progress. This can be achieved by splitting the peer_got_revoke
function, then sending the towire_channel_got_revoke_reply
and save to DB only after the watchtower plugin gave us the all-clear. If the watchtower is currently unreachable it is perfectly fine for a plugin to return a {"result": "fail"}
signalling that we must abort since proceeding could mean that our counterparty publishes this specific state when we're offline, and the watchtower cannot save us. Aborting and restarting means we get another chance later, since our peer will resend the last round of commitment and revocation, at which point the watchtower hopefully is operational again.
I'm not sure myself whether the watchtower hook is that important, but it's nice to have the option :-) 'm not proposing any other state modification, since those are better suited for htlc_accepted
and ptlc_accepted
as you correctly point out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason for an intermediate state. If we receive the revocation and don't complete all the hooks, we will ask the peer for it again, just as if we'd never received it.
If we are treating the penalty transaction as a blackbox, then the
I had two approaches in mind regarding second stage htlc transactions:
If we sent them within the encrypted blob then the tower could start monitoring them after a breach + decryption. The main issue I can see with this approach is that a big chunk of data will be potentially replicated in the tower. Every new commitment will have it's pending htlcs attached to it, so several commitments may share htlcs. This makes the user waste some of the tower space he paid for.
Second stage htlc transaction could be sent to the tower following the same approach as commitment-penalty transactions pairs are. The This approach has the advantage that each |
Not quite: since the This disappears if we are able to switch to |
Oh yeah, you're right. I forgot about the htlc_sigs exchanged on every update. The first approach also has the issue of how the blobs grow between different updates, that may leak the number of htlcs in the blob though.
Not sure I follow you here, this seems an issue for the first approach given how the blob grows, but not for this one. Even if an htlc is added/removed between updates and the user sends one appointment more/less in the next round that it did in the last round, given that all locators will be new, that should be indistinguishable from multiple channels being updated at the same time. |
I was still commenting on your "This approach has the advantage that each In any case giving each HTLC-commitment pair its own appointment seems better than putting them all in a single var-length appointment, we can give an upper bound on blob size and the watchtower can charge based on that. I was trying to devise a safe alternative where we use synthetic locators for the HTLC blobs as well but there is no safe alternative, sigh; it would require giving the watchtower the ability to sign for any penalty on the HTLC, which would allow the watchtower to be bribed by the counterparty by making the penalty on the HTLC split between the coutnerparty and the watchtower, thus not trustless. |
Is having access to this really useful at all? It's a deadline by which the transaction has to be broadcast, and there is little to no leeway in how the transaction is being broadcast (there is no way of adding more fees at the moment). So isn't broadcasting as soon as the revoked commitment is being seen the only thing that can possibly be done? Even if it's too late, what else can a watchtower do?
From this I gather that your watchtower would indeed work if we provide only the
If we have either of those we get almost perfect watchtowers for free. eltoo's We could also emulate
A simple "solution" would be to pad. However that's uncompressible data that the watchtower needs to store in addition to the simple penalty. We could always just treat |
Yeah, not much to do for now. I was specially thinking on fee bumping and priority queues here, but I guess we could leave that for the future.
Agreed, I think that treating them in the same way as commitment/penalties is the best way to go for now. |
If you expect it to be set I'll just add it, since it is quite easy to add anyway. Just seemed like a rather random fact to use in this interface 😉
If you're using the commitment txid as a locator and decryption key all HTLCs could still get collated this way, but we could spread the HTLCs over a number of watchtowers, since they're likely less valuable than the |
Yes, but some might strongly prefer the punitive Poon-Dryja mechanism rather than the Decker-Russell-Osuntokun mechanism even post
That seems like a workable approach as well. For a consistent interface, we could divide blobs into two classes:
An HTLC-timeout/-success tx blob would:
A commitment-tx blob would:
Still, this is less elegant than the simple transaction-reaction that we could do with the proposed scheme where every commitment-tx and every HTLC-success/timeout-tx gets its own blob stored in the watchtower. |
Agreed. Furthermore, from the tower perspective using the same approach for both makes the design cleaner, since any kind of transaction goes trough the same logic, and triggers (either old commitments or htlc-success/timeout) are threaten the same way. |
I've definitely heard a few people expressing that preference, however I think |
51b0edc
to
e087459
Compare
Ok, since no objections have been raised I think we can consider this a working solution. Rebased on top of |
80396ce
to
c27e01c
Compare
I was playing around with trying to neaten this a little, and I think your implementation is wrong? Seems like openingd needs to tell lightnignd two things for the penalty tx: the remote tx, and which output is the to-self output for that. However, you didn't add the remote tx, you're still just giving lightningd the local tx, which doesn't work. The penalty tx creation just needs remote commit txid, outpoint, and amount, and we need to add this to the db if we want to replay penalty creation (or the penalty tx directly, but saving this information is cheaper).
From here:
Now, once we've done all this, we might as well get channeld to generate the penalties as well, which means it needs the penalty feerate (and it needs to be kept up-to-date like the normal feerate). This avoids some of your patches, and I think keeps it simple. I'll push my work for you to look at... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've based this on your work; it's unfinished, but I think the result looks neater?
https://github.com/rustyrussell/lightning/tree/guilt/pr-3601
channeld/commit_tx.c
Outdated
@@ -95,6 +95,7 @@ struct bitcoin_tx *commit_tx(const tal_t *ctx, | |||
const struct htlc **htlcs, | |||
const struct htlc ***htlcmap, | |||
u64 obscured_commitment_number, | |||
int output_index[NUM_SIDES], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the commit which adds this, you don't actually populate it. Why?
lightningd/peer_htlcs.c
Outdated
* next finds the previous one to build the penalty transaction. */ | ||
tal_free(channel->prev_commitment); | ||
channel->prev_commitment = channel->next_commitment; | ||
channel->next_commitment = tal(channel, struct commitment_state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird indent (spaces instead of tabs)?
But this commit leaves next_commitment non-NULL but invalid, which is weird.
channeld/channel_wire.csv
Outdated
@@ -109,6 +109,9 @@ msgdata,channel_got_funding_locked,next_per_commit_point,pubkey, | |||
|
|||
# When we send a commitment_signed message, tell master. | |||
msgtype,channel_sending_commitsig,1020 | |||
msgdata,channel_sending_commitsig,commit_txid,bitcoin_txid, | |||
msgdata,channel_sending_commitsig,commit_to_them_outnum,s16, | |||
msgdata,channel_sending_commitsig,commit_to_them_amount,amount_sat, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use ?u16 and ?amount_msat here IMHO, to mean optional fields?
That's a dumb mistake on my end, I'll fix that asap.
I agree that wrapping this information in a struct would cut down on the changes in the inter-daemon protocol and it'd be a much nicer bundle to hand around. I'll cherry pick the struct into the PR. I'm not sure limiting the output categorization to the
Like I wrote in the PR message I chose to construct the penalty tx in
Also if we need to store the commitment Re: storing the information in the database: I was unsure whether we'd require this, but I see now that there is a gap in my coverage, namely when the remote peer sent the revoke, but we didn't process it, then we won't get the previous commitment on reconnect, which I is what I had mistakenly assumed. Too bad I was hoping to avoid having to drag state around further. |
Clean rebase on top of |
3a56d42
to
06d4e0b
Compare
We could provide the commitment tx without witness data, all we needed was to see the commitment tx outputs in detail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is confusing, because we do too much work in lightningd. Moving the work into channeld makes things simpler, IMHO.
But we definitely need a restart test to make sure...
msg, msg, &pbase)) | ||
channel_internal_error(channel, | ||
"bad channel_resending_commitsig %s", | ||
tal_hex(channel, msg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think technically it's OK if we continue after calling channel_internal_error, but I'd prefer a return here...
# When we reconnect and resend the commitment we also need to tell the master | ||
# the commitment details so it can create the penalty once we get the revoke. | ||
msgtype,channel_resending_commitsig,1130 | ||
msgdata,channel_resending_commitsig,pbase,?penalty_base, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is in the wrong place, leading to us having to keep two values inside lightningd.
See 74c8a21
Have channeld keep the two values, give it at the right time (when we receive the revoke), and we don't need that logic in lightningd. We just need lightningd to save (the single, last pbase) to the db so it can hand it back to channeld when channeld restarts.
Removed the milestone since this isn't ready yet. |
There is now an alternative implementation of this (without changing the public interface) in PR #3645 |
44505c3
to
171d017
Compare
The main daemon, and only the main daemon, needs the ability to sign a transaction on behalf of a channel for the penalty transaction creation.
Suggested-by: Rusty Russell <@rustyrussell>
Changelog-Added: plugin: Added a new `commitment_revocation` hook that provides the plugin with penalty transactions for all revoked transactions.
Added a couple of invariants: - Commitment numberss are incremented by 1 on every commitment, so either two consecutive commitments differ by exactly 1 or one of the commitments isn't set (due to lack of funds on the other side). - Revocations must match the prev_commitment number if one was stashed away.
There might be a way we can get away with storing just one, but I couldn't convince myself that either can be dropped.
Implements the
commitment_revocation
hook for watchtowers. Commitments areforwarded from
openingd
andchanneld
, and tracked inlightningd
. Uponreceiving a revocation
lightningd
generates the penalty transaction andcalls the hook so plugins can postprocess it, e.g., encrypt it, and then push
it to the watchtower.
The result of the hook is currently not used, and we don't actually wait for
the hook call to complete before telling
channeld
that it can continue. Thismatches the save path in the DB, since we'd simply be replaying the
commitment/revocation if we reconnect anyway. Should synchronous operation be
required we can move the
channel_got_revoce_reply
to hook callback, but forthe sake of simplicity I left it were it is for the time being.
The penalty transaction is generated in
lightningd
, notchanneld
whichrequired forwarding information from
channeld
about thecommitment_transaction
tolightningd
, and requireslightningd
to callout to
hsmd
for the signature, instead ofchanneld
. The decision to havelightningd
create the penalty transaction has the following reasons:channeld
may not have all the information about the previous commit,i.e., in case we come from
openingd
the first commitment is negotiated inopeningd
so we'd have to funnel that information throughlightningd
anyway.
lightningd
will also deliver the hook call, and it can avoid building andsigning the penalty tx if there are no subscribers, avoiding the tx
generation and the request to
hsmd
altogether (not yet implemented).Included in the
commitment_revocation
hook call is the fully signed penaltytransaction, as well as the commitment transaction hash the penalty
transaction is spending from, i.e., the transaction whose broadcast should
trigger the release of the penalty transaction. The commitment transaction
hash could also be extracted from the penalty transaction (it's its only input
after all), but this allows treating the penalty transaction as an opaque blob
and avoids parsing the transaction in the plugin.
Not included are the
htlc_success
andhtlc_failure
transactions thatmay also be spending
commitment_tx
outputs. This is because thesetransactions are much more dynamic and have a predictable timeout, allowing
wallets to ensure a quick checkin when the CLTV of the HTLC is about to
expire.
Open Questions
name descriptive for what it does, not what it can be used for, so calling
it
watchtower
seemed inapropriate. I'm open for better suggestions though😉
htlc_success
andhtlc_timeout
transactions, despite not having the full information regarding fees and
aggregations available?
make sense to define a notification instead? I went with the hook for now
mainly so we can eventually make it synchronous.
Closes #3422
Closes #1353
Closes #3645
Closes #3659