Add `next_funding_txid` awareness for dual-fund opens #6824

niftynei · 2023-10-27T20:23:28Z

This commit reworks how reconnects work for dual-funding/v2 channel opens.

We migrate to the next_funding_txid protocol that @t-bast developed (see lightning/bolts@cd3c99e). It works very nicely!

PR also includes some cleanups and tidying in the dualopend.c daemon.

Short overview of changes:

We now create channel_inflights with blank/empty last_tx (the commitment tx field).
We split up saving a channel and saving the commitments for a channel into two inter-daemon calls (dualopend_commit_ready + dualopend_commit_rcvd). commit_ready is called (as the spec requires) before we send our commitment_signed. commit_rcvd is called after we've received the peer's commitment_signed.
We use the next_funding_txid sent by a peer to resend them the requested commitment_signed + tx_signatures, as requested/required.
We can handle receiving commitment_signed out of order (prior to this we'd fail, since we'd only accept a commitment_signed in a specific part of the initial channel flow).
Some other general cleanups, tidying.
Updated the tests to make sure reconnects work for RBFs + initial setup!
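
The commit_ready / commit_rcvd split described in the overview can be pictured as a small state machine. What follows is a simplified sketch, not the code from this PR: the enum, the struct, the two boolean flags and the helper names `on_commit_ready` / `on_commit_rcvd` are all assumptions made for illustration, and the sketch ignores the out-of-order commitment_signed case mentioned in the list.

```c
#include <stdbool.h>

/* Simplified stand-ins; the real state enum in lightningd is larger. */
enum open_state {
	OPEN_INIT,         /* still negotiating, nothing saved yet */
	OPEN_COMMIT_READY, /* channel saved, about to send our commitment_signed */
	OPEN_COMMITTED,    /* peer's commitment_signed received and saved */
};

struct open_channel {
	enum open_state state;
	bool saved_to_db;        /* hypothetical: channel record exists */
	bool have_remote_commit; /* hypothetical: peer's sigs are stored */
};

/* Phase 1: runs before we send our own commitment_signed. */
static bool on_commit_ready(struct open_channel *c)
{
	if (c->state != OPEN_INIT)
		return false;
	c->saved_to_db = true;
	c->state = OPEN_COMMIT_READY;
	return true;
}

/* Phase 2: runs once the peer's commitment_signed has arrived. */
static bool on_commit_rcvd(struct open_channel *c)
{
	if (c->state != OPEN_COMMIT_READY)
		return false;
	c->have_remote_commit = true;
	c->state = OPEN_COMMITTED;
	return true;
}
```

The point of the split is that the channel is already on disk between the two calls, so a disconnect in that window leaves something to resume from.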

It looks like a lot of commits, but they should be fairly easy to review? Here's hoping, anyway.

vincenzopalazzo · 2023-10-27T22:12:50Z

We migrate to using the next_funding_txid protocol

Is there a description of this protocol somewhere? Or does the BOLT already describe this proposal?

niftynei · 2023-10-27T23:42:56Z

Is there a description of this protocol somewhere? Or does the BOLT already describe this proposal?

lightning/bolts@cd3c99e

rustyrussell · 2023-10-30T01:34:17Z

wallet/wallet.h

+		/* We save channels that aren't yet committed :( */
+		BUILD_ASSERT(DUALOPEND_OPEN_INIT == 1);
+		return s;
 	}


Absolutely not. Not again.

You need a new state, DUALOPEND_OPEN_COMMITTED. When you add it, the compiler will tell you everywhere you have to change it.
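
The "compiler will tell you" remark relies on switch statements over the enum having no `default:` label. A minimal sketch of that idea follows, assuming gcc or clang with `-Wall` (which enables `-Wswitch`); the enum and function names here are hypothetical and much shorter than the real state list.

```c
#include <stdbool.h>

/* Hypothetical, shortened state list for illustration only. */
enum chan_state {
	ST_OPEN_INIT,
	ST_OPEN_COMMITTED,
	ST_AWAITING_LOCKIN,
};

/* No `default:` label: with -Wswitch, adding a new enumerator makes the
 * compiler warn here until a case for it is written. */
static bool state_has_commitments(enum chan_state s)
{
	switch (s) {
	case ST_OPEN_INIT:
		return false;
	case ST_OPEN_COMMITTED:
	case ST_AWAITING_LOCKIN:
		return true;
	}
	/* Only reachable if `s` holds a value outside the enum. */
	return false;
}
```

Adding a fourth enumerator without touching `state_has_commitments` would then produce an "enumeration value not handled in switch" warning at that function.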

rustyrussell · 2023-10-30T01:35:30Z

lightningd/peer_control.c

+bool too_early_v2(const struct channel *channel)
+{
+	return channel->state == DUALOPEND_OPEN_INIT
+	       && !channel->last_tx;
+}
+


No, this is exactly the kind of vague nomenclature I had to clean up. If you add a new state, you don't need this.

rustyrussell · 2023-10-30T01:38:50Z

lightningd/dual_open_control.c

+		/* we check validity in dualopend, this is a sanity check */
+		assert(bitcoin_txid_eq(&txid, &inflight_txid));


As a rule (loosely followed!) we prefer to fail the channel if a subd does something weird, rather than kill lightningd.
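
As an illustration of that rule, the check can report the mismatch and fail only the affected channel instead of asserting. This is a sketch under assumptions: `struct txid`, `struct chan`, `fail_channel` and `check_inflight_txid` are hypothetical stand-ins, not the lightningd types or helpers.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the real lightningd types. */
struct txid { unsigned char bytes[32]; };

struct chan {
	bool failed;
	const char *fail_reason;
};

static bool txid_eq(const struct txid *a, const struct txid *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}

/* Marks only this channel as failed; the process keeps running. */
static void fail_channel(struct chan *c, const char *why)
{
	c->failed = true;
	c->fail_reason = why;
}

/* Instead of assert(txid_eq(...)), report the mismatch to the caller. */
static bool check_inflight_txid(struct chan *c,
				const struct txid *got,
				const struct txid *expected)
{
	if (!txid_eq(got, expected)) {
		fail_channel(c, "subdaemon sent unexpected funding txid");
		return false;
	}
	return true;
}
```

The caller can then stop processing the message for that channel while every other channel carries on.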

rustyrussell · 2023-10-30T01:41:35Z

lightningd/dual_open_control.c

+	/* Only after we've updated/saved our psbt do we check
+	 * for peer connected */
+	if (!channel->owner)
+		return command_fail(cmd, FUNDING_PEER_NOT_CONNECTED,
+				    "Peer not connected");
+


!!WARNING!! This will be messy after rebase, since now in master we use param_check! So if we are simply running the check RPC, we still want to do this check.

I'm not sure how to handle this in general. I mean, the command still does something, though it "fails": that just seems... weird.

Agree that it's weird that we save incoming data to disk, but it makes fundchannel work really nicely across reconnects so unsure if that's the worst thing??

Yes, agree that it's best from where we are, but worth thinking for future.

In theory there are two things going on: saving the info, and telling the peer. This is a bit like close, so, you could argue we shouldn't actually here, or even block until peer is told.

rustyrussell · 2023-10-30T01:52:19Z

openingd/dualopend.c

 		return NULL;
 	}
+	tal_free(wscript);



This commit is just weird... the only danger with tmpctx is when you've got an inner io_loop, which is already an anti-pattern, and there's not one here that I can see?

ACK, i'll drop it!

rustyrussell · 2023-10-30T01:53:21Z

openingd/dualopend.c

+static char *do_reconnect_commit_sigs(const struct state *state,
+				      const struct tx_state *tx_state)


If you allocate for the caller, you must use the ctx pattern...
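
The "ctx pattern" means the caller passes in the allocation parent, so it decides how long the returned memory lives. The sketch below shows the shape of that convention only: it does not use tal, and `struct ctx`, `ctx_alloc`, `ctx_free` and `check_nonnegative` are toy names invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy allocation context standing in for a tal parent: everything
 * allocated from it is released together by ctx_free(). */
struct ctx_node { struct ctx_node *next; };
struct ctx { struct ctx_node *head; size_t count; };

static void *ctx_alloc(struct ctx *ctx, size_t len)
{
	struct ctx_node *n = malloc(sizeof(*n) + len);
	if (!n)
		return NULL;
	n->next = ctx->head;
	ctx->head = n;
	ctx->count++;
	return n + 1; /* payload starts right after the list header */
}

static void ctx_free(struct ctx *ctx)
{
	while (ctx->head) {
		struct ctx_node *n = ctx->head;
		ctx->head = n->next;
		free(n);
		ctx->count--;
	}
}

/* Returns NULL when `value` is acceptable, otherwise an error string
 * allocated from `ctx` (or NULL if that allocation itself failed). */
static char *check_nonnegative(struct ctx *ctx, int value)
{
	char *err;

	if (value >= 0)
		return NULL;
	err = ctx_alloc(ctx, 64);
	if (err)
		snprintf(err, 64, "value %d is negative", value);
	return err;
}
```

A function returning an error string, like the one in the diff above, would take the context as its first parameter in the same way, so the message does not end up owned by whatever the callee happened to have at hand.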

rustyrussell · 2023-10-30T01:54:58Z

openingd/dualopend.c

-static void do_reconnect_dance(struct state *state)
+static bool do_reconnect_dance(struct state *state)
 {


I suspect you did this because your previous commit did return false and that didn't compile? :)

i think it's b/c i wanted reconnected to be false if we aborted; regardless i'm getting rid of the idea of returning a bool entirely...

rustyrussell · 2023-10-30T01:57:47Z

Also, valgrind failures in test_multifunding_v1_v2_mixed:

Valgrind error file: valgrind-errors.25972
==25972== Uninitialised byte(s) found during client check request
==25972==    at 0x2019D3: memcheck_ (mem.h:247)
==25972==    by 0x201DBA: db_bind_blob (bindings.c:94)
==25972==    by 0x2023EC: db_bind_signature (bindings.c:206)
==25972==    by 0x1BED18: wallet_channel_save (wallet.c:2202)
==25972==    by 0x1BFD2C: wallet_channel_insert (wallet.c:2482)
==25972==    by 0x145AB3: wallet_commit_channel (dual_open_control.c:1472)
==25972==    by 0x14ADF9: handle_commit_ready (dual_open_control.c:3375)
==25972==    by 0x14B59C: dual_opend_msg (dual_open_control.c:3541)
==25972==    by 0x1A8850: sd_msg_read (subd.c:555)
==25972==    by 0x37A160: next_plan (io.c:59)
==25972==    by 0x37AD95: do_plan (io.c:407)
==25972==    by 0x37ADD7: io_ready (io.c:417)
==25972==  Address 0x6cf0ea8 is 40 bytes inside a block of size 104 alloc'd
==25972==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25972==    by 0x38CAFE: allocate (tal.c:256)
==25972==    by 0x38D179: tal_alloc_ (tal.c:463)
==25972==    by 0x38D345: tal_alloc_arr_ (tal.c:506)
==25972==    by 0x202385: db_bind_signature (bindings.c:202)
==25972==    by 0x1BED18: wallet_channel_save (wallet.c:2202)
==25972==    by 0x1BFD2C: wallet_channel_insert (wallet.c:2482)
==25972==    by 0x145AB3: wallet_commit_channel (dual_open_control.c:1472)
==25972==    by 0x14ADF9: handle_commit_ready (dual_open_control.c:3375)
==25972==    by 0x14B59C: dual_opend_msg (dual_open_control.c:3541)
==25972==    by 0x1A8850: sd_msg_read (subd.c:555)
==25972==    by 0x37A160: next_plan (io.c:59)

t-bast · 2023-10-30T15:33:37Z

Nice work, I just tested that against eclair, and here is the report (there are a few things to fix, but otherwise it's looking good and we've understood the spec the same way).

`commit_sig` sent but not received

This test fails: when cln is not the channel initiator, it waits for the other node to send commit_sig before sending its own commit_sig. There is no reason to do that, both nodes should send commit_sig immediately after exchanging tx_complete? Otherwise it's a missed opportunity to finalize the channel creation on reconnection, because in that case cln hasn't saved the channel and fails it on reconnection.

`commit_sig` received but `tx_signatures` not received

This test works, with one minor issue: on reconnection, cln sends its tx_signatures twice (duplicate?).

`tx_signatures` received by only one node

OK

RBF attempt forgotten by `cln` but not `eclair`

OK -> on reconnection, cln correctly sends tx_abort which lets eclair forget this RBF attempt.

RBF attempt partially signed

OK -> on reconnection, commit_sig and tx_signatures are correctly retransmitted.

crash on `tx_signatures`

If cln receives tx_signatures before commit_sig, it crashes the whole node: it should instead just send an error for that channel.

niftynei · 2023-10-31T05:29:43Z

Ok this still has some bugs + I'm expecting a few tests to fail but:

cln sends its tx_signatures twice (duplicate?).

Nice catch. This was causing a crash in our tests also; knowing to look for this as a root cause was very helpful. Thank you! I've patched it not to do this.
3c8cf20

This test fails: when cln is not the channel initiator, it waits for the other node to send commit_sig before sending its own commit_sig

Oh no you caught me red-handed. 🙈 I did wonder if you'd catch me doing this... seems like the answer was yes! It was a quick fix 80e8503, should send ASAP for both sides now :)

If cln receives tx_signatures before commit_sig, it crashes the whole node: it should instead just send an error for that channel.

I think this has been patched (there was a crash happening that I patched that I think explains this), but haven't tested conclusively.

Also, valgrind failures in test_multifunding_v1_v2_mixed:

Tested + patched with the latest updates.

Thanks @rustyrussell + @t-bast for the extremely fast review!!

cdecker · 2023-10-31T11:40:29Z

.msggen.json

@@ -195,6 +195,7 @@
            "CLOSINGD_SIGEXCHANGE": 4,
            "DUALOPEND_AWAITING_LOCKIN": 10,
            "DUALOPEND_OPEN_COMMITTED": 12,
+            "DUALOPEND_OPEN_COMMIT_READY": 13,


We need to be careful with this: the enum numeric values are stored in the DB, so adding to the end is the right thing to do here (not to reorder states, causing faulty loads from DB), however IIRC there are places where we check for sets of states by using < and > on the numeric values. This would obviously fail.

Can we have a quick check through the code to see if there are numeric comparisons in the code on values for this enum?

iiuc those have all been updated to be comprehensive switch statements in a cleanup Rusty did earlier this year, but I'll do a sweep to double check!

Good news: you cannot get this wrong, as we have channel_state_in_db() which indeed, is patched correctly, AND I removed all those numeric comparisons last release for exactly this reason (when the new splicing state was added, I had the same concerns!).
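
To make the concern concrete: once a state is appended at the end of the enum, a range comparison on the numeric values no longer matches the lifecycle order. The sketch below reuses the numeric values from the .msggen.json hunk above; the enum name, the `SK_` prefix and both helper functions are hypothetical.

```c
#include <stdbool.h>

/* Numeric values as shown in the hunk above; everything else is
 * made up for this example. */
enum state_sketch {
	SK_CLOSINGD_SIGEXCHANGE = 4,
	SK_DUALOPEND_AWAITING_LOCKIN = 10,
	SK_DUALOPEND_OPEN_COMMITTED = 12,
	SK_DUALOPEND_OPEN_COMMIT_READY = 13, /* appended, so stored values stay stable */
};

/* Fragile: assumes numeric order matches lifecycle order. */
static bool is_dualopen_numeric(enum state_sketch s)
{
	return s >= SK_DUALOPEND_AWAITING_LOCKIN
	    && s <= SK_DUALOPEND_OPEN_COMMITTED;
}

/* Robust: every state is named explicitly, with no default label. */
static bool is_dualopen_switch(enum state_sketch s)
{
	switch (s) {
	case SK_DUALOPEND_AWAITING_LOCKIN:
	case SK_DUALOPEND_OPEN_COMMITTED:
	case SK_DUALOPEND_OPEN_COMMIT_READY:
		return true;
	case SK_CLOSINGD_SIGEXCHANGE:
		return false;
	}
	return false;
}
```

Here the range check silently misses the appended state 13, while the switch version covers it and would trigger a compiler warning if a future state were left out.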

t-bast · 2023-10-31T15:19:37Z

Nice catch. This was causing a crash in our tests also; knowing to look for this as a root cause was very helpful. Thank you! I've patched it not to do this.

Confirmed, this is now working fine 👍

Oh no you caught me red-handed. 🙈 I did wonder if you'd catch me doing this... seems like the answer was yes! It was a quick fix 80e8503, should send ASAP for both sides now :)

Heh, did you think you could trick my testing skills?

This is indeed fixed, cln now immediately sends commit_sig! However, it seems like cln is not storing the channel at that point. If we disconnect before cln receives eclair's commit_sig and reconnect, cln has forgotten the channel. It should remember it, we should be able to finalize the signing flow at that point.

I think this has been patched (there was a crash happening that I patched that I think explains this), but haven't tested conclusively.

I'm still seeing that crash, with the following logs:

spenderp: FATAL SIGNAL 11 (version v23.08.1-404-g62ff475-modded)
0x559836dc98ba send_backtrace
	common/daemon.c:33
0x559836dc9951 crashdump
	common/daemon.c:75
0x7f37f42c351f ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f37f441ac92 ???
	../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:83
0x559836db7760 bitcoin_txid_eq
	./bitcoin/tx.h:29
0x559836db7760 collect_sigs
	plugins/spender/openchannel.c:509
0x559836db81de check_sigs_ready
	plugins/spender/openchannel.c:531
0x559836db84dd json_peer_sigs
	plugins/spender/openchannel.c:611
0x559836dbcad7 ld_command_handle
	plugins/libplugin.c:1611
0x559836dbcd9d ld_read_json_one
	plugins/libplugin.c:1721
0x559836dbce29 ld_read_json
	plugins/libplugin.c:1741
0x559836ef3bff next_plan
	ccan/ccan/io/io.c:59
0x559836ef40da do_plan
	ccan/ccan/io/io.c:407
0x559836ef4177 io_ready
	ccan/ccan/io/io.c:417
0x559836ef5b14 io_loop
	ccan/ccan/io/poll.c:453
0x559836dbd48d plugin_main
	plugins/libplugin.c:1948
0x559836db22bf main
	plugins/spender/main.c:35
0x7f37f42aad8f __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7f37f42aae3f __libc_start_main_impl
	../csu/libc-start.c:392
0x559836da3774 ???
	???:0
0xffffffffffffffff ???
	???:0
2023-10-31T15:15:57.458Z INFO    plugin-spenderp: Killing plugin: exited during normal operation
2023-10-31T15:15:57.458Z **BROKEN** plugin-spenderp: Plugin marked as important, shutting down lightningd!
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_break: lightningd_exit
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_loop_with_timers: main
2023-10-31T15:15:57.458Z DEBUG   connectd: REPLY WIRE_CONNECTD_START_SHUTDOWN_REPLY with 0 fds
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_break: connectd_start_shutdown_reply
2023-10-31T15:15:57.458Z DEBUG   021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-dualopend-chan#1: Status closed, but not exited. Killing
2023-10-31T15:15:57.458Z DEBUG   lightningd: Command returned result after jcon close
2023-10-31T15:15:57.458Z INFO    021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-chan#1: Unsaved peer failed. Deleting channel.
2023-10-31T15:15:57.464Z DEBUG   lightningd: io_break: destroy_plugin
2023-10-31T15:15:57.464Z DEBUG   connectd: Shutting down
2023-10-31T15:15:57.464Z DEBUG   gossipd: Shutting down
2023-10-31T15:15:57.464Z DEBUG   hsmd: Shutting down

niftynei · 2023-11-01T02:58:38Z

me, friday: this is a cute, well written, <1k line patch
me today: i am exhausted; why is this patch now 1.7k lines

From the spec: Once peers are ready to exchange commitment signatures, they must remember the details of the funding transaction to allow resuming the signatures exchange if a disconnection happens. Basically this means we add channels to the database before we've gotten commitments for them; it's nice that there's now a state for commitments received but we now save the channel prior to that. This commit makes it possible to track the pre-commit-rcvd but not quite open-init state.

What if the last_tx is empty for the channel? We're about to let the channels not have last_txs at start.

We're going to add the commitment transaction data at a different time than when we init a new inflight. Split them up!

we now save them before we get the commitment data.

If we get an error on a channel that doesn't have commitments yet, we can just delete it.

Here, we split up what was "commit_received" into two phases:
- commit-ready, where we're about to send our commitment tx to peer
- commit-received, when we've gotten the commitment tx from our peer

This lets us do the right thing (as far as the spec is concerned) with returning the correct 'next_funding_txid' on reconnect (later commits).

We don't actually use this internal to this method? Weird. Anyway, if we don't want/need it allow the caller to signal that by passing in NULL, if desired.

When we reconnect, if we get a note from the peer that they don't know about a pending inflight, we need to be able to clean it up so we can restart/re-negotiate a new RBF etc. This adds a cleanup method to remove any inflights for a channel without a last_tx (commitment tx).

If you resend us a commitment tx, and we already have one, we check that it's correct!

(ones that are missing last_txs)

depending on the state, we might
- forget the channel
- drop it to chain
- reconnect via dualopend

We need to keep track of if we've gotten the last negotiation's commitment sigs, for reconnect logic (helps us know what messages to send in the reconnect case)

You can't publish a tx you don't have!

Changelog-Changed: RPC `listpeerchannels`.`inflights` may sometimes not include `scratch_txid` (mandatory -> optional)

We're going to need to reuse this for reconnect; make the method standalone in that it can figure out what to send to HSMD independent of where it's located in the setup call flow.

Let's make it easier to build remote commitments (we're going to need this for reconnects soon!)

Move common code for verifying a commitment sig from peer into one place. On reconnects, we'll need to verify peer's commitments. Changelog-None.

A bit gratuitous, but it's a bit cleaner on a whole?

If we get a commitment-signed message from a peer, outside of a normal flow, process it! We're about to send these during reconnect, so we need to be able to handle them!

If you get the right series of disconnects, it's possible for your peer to send you a tx-sigs even though the current state of the channel open is that you've seen the funding open on chain (your channel_ready[LOCAL] = true) In this case, if we haven't marked that we've seen the tx sigs yet, we go ahead and mark them as seen and just ignore this tx-sigs msg.

In the case where you're echoing back a tx-abort, just let it through. Not doing this causes problems in the case where your node has forgotten about an in-progress open.

This fixes the following problem:
- you send a tx-abort (even tho you have marked tx-sigs as received)
- peer echos it back (we echo back tx-aborts always)
- you throw an error because you're already in a tx-abort unallowed state

In this commit, we allow for echos to come thru no matter our current state and this fixes things/makes them work as expected.

Here we conform to the specification, which requires that we handle next-funding-id in a specific way. Note that we were already sending it, but now we actually correctly handle its presence. Changelog-Changed: Spec: dual-funding now follows the next-funding-id rules.
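
Based only on what this thread describes (resend commitment_signed and tx_signatures when the peer asks for a txid we know, send tx_abort when we don't), the reconnect decision can be sketched roughly as follows. This is a deliberate simplification: the spec has more conditions than this, and `struct txid`, `enum reconnect_action` and `choose_reconnect_action` are hypothetical names, not the dualopend code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct txid { unsigned char bytes[32]; };

enum reconnect_action {
	RECONNECT_NOTHING,     /* peer did not ask for anything */
	RECONNECT_RESEND_SIGS, /* retransmit commitment_signed / tx_signatures */
	RECONNECT_SEND_ABORT,  /* we don't know that txid: send tx_abort */
};

/* peer_next: txid from the peer's reestablish message, or NULL if absent.
 * our_inflight: txid of our pending inflight, or NULL if we have none. */
static enum reconnect_action
choose_reconnect_action(const struct txid *peer_next,
			const struct txid *our_inflight)
{
	if (!peer_next)
		return RECONNECT_NOTHING;
	if (our_inflight
	    && memcmp(peer_next->bytes, our_inflight->bytes,
		      sizeof(peer_next->bytes)) == 0)
		return RECONNECT_RESEND_SIGS;
	return RECONNECT_SEND_ABORT;
}
```

The abort branch corresponds to the "RBF attempt forgotten by cln" case in the interop report earlier in the thread, where tx_abort lets the other side drop the attempt too.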

Makes it easier to see why things are failing in the logs.

rustyrussell

Minor comments only, on the last dev_disconnect hack.

rustyrussell · 2023-11-02T00:52:13Z

.msggen.json

@@ -195,6 +195,7 @@
            "CLOSINGD_SIGEXCHANGE": 4,
            "DUALOPEND_AWAITING_LOCKIN": 10,
            "DUALOPEND_OPEN_COMMITTED": 12,
+            "DUALOPEND_OPEN_COMMIT_READY": 13,


Good news: you cannot get this wrong, as we have channel_state_in_db() which indeed, is patched correctly, AND I removed all those numeric comparisons last release for exactly this reason (when the new splicing state was added, I had the same concerns!).

rustyrussell · 2023-11-02T00:59:14Z

lightningd/dual_open_control.c

+	/* Only after we've updated/saved our psbt do we check
+	 * for peer connected */
+	if (!channel->owner)
+		return command_fail(cmd, FUNDING_PEER_NOT_CONNECTED,
+				    "Peer not connected");
+


Yes, agree that it's best from where we are, but worth thinking for future.

In theory there are two things going on: saving the info, and telling the peer. This is a bit like close, so, you could argue we shouldn't actually here, or even block until peer is told.

rustyrussell · 2023-11-02T01:05:41Z

lightningd/dual_open_control.c

+		if (!updated) {
+			log_info(channel->log, "Already had sigs, skipping notif");
+			return;
+		}


!!Thanks!! 🧡 I saw this intermittently and couldn't figure out what was happening!

rustyrussell · 2023-11-02T01:11:33Z

connectd/multiplex.c

+       case DEV_DISCONNECT_DROP:
+	       /* try again? */
+	       return read_hdr_from_peer(peer_conn, peer);


This is strictly impossible, since our connections always send in order. So now we ignore a msg, that contract is broken. You're using it in a very specific way, so that's OK, but "don't use this" is probably needed.

Can you do it on the send side (i.e. don't send this msg)? Would be significant code reduction...

rustyrussell · 2023-11-02T01:14:44Z

I just want to say that this code was a delight to review. 🧡 !

Let's test that things stay together! One cool thing to note is that now we sort of "magically" recover from pretty brutal disconnects! Very nice!

@t-bast

When we got our peer's sigs, if we were the remote, we would re-notify the plugin, which in turn would re-send the tx-sigs to us.

In the case of CLN, we'd then break, because we'd re-forward the sigs to the `openchannel` plugin, which was then in the wrong state (MULTIFUNDCHANNEL_SIGNED):

spenderp: plugins/spender/openchannel.c:598: json_peer_sigs: Assertion `dest->state == MULTIFUNDCHANNEL_SECURED' failed.
spenderp: FATAL SIGNAL 6 (version 5880d59-modded)

In the case of eclair, they'd just see our 2nd TX_SIGS message and @t-bast would complain:

> This test works, with one minor issue: on reconnection, cln sends its tx_signatures twice (duplicate?).

This commit does two things:
- has the openchannel / spender plugin log a broken instead of crashing when the state is not what we're expecting
- stops us from calling the `funder` plugin if this is a replay/second receipt of commit-sigs.

@t-bast

Originally the accepter waited for the peer to send us their commitment sigs before we send ours; this changes things so that the accepter sends their commitment sigs ASAP.

> This test fails: when cln is not the channel initiator, it waits for the other node to send commit_sig before sending its own commit_sig. There is no reason to do that, both nodes should send commit_sig immediately after exchanging tx_complete? Otherwise it's a missed opportunity to finalize the channel creation on reconnection, because in that case cln hasn't saved the channel and fails it on reconnection.

Reported-By: @t-bast

Now that we save the commitment sigs immediately, we have to drop the connection elsewhere in the flow to get the state where only one peer remembers.

We don't let go of the `msg` on error, which triggers a memleak warning!

lightningd-2 2023-10-31T19:54:06.582Z **BROKEN** lightningd: MEMLEAK: 0x55ae3615b498
lightningd-2 2023-10-31T19:54:06.582Z **BROKEN** lightningd: label=openingd/dualopend_wiregen.c:919:u8[]
lightningd-2 2023-10-31T19:54:06.582Z **BROKEN** lightningd: alloc:
lightningd-2 2023-10-31T19:54:06.685Z **BROKEN** lightningd: ccan/ccan/tal/tal.c:477 (tal_alloc_)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd: ccan/ccan/tal/tal.c:506 (tal_alloc_arr_)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd: openingd/dualopend_wiregen.c:919 (towire_dualopend_send_tx_sigs)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd: lightningd/dual_open_control.c:1122 (openchannel2_sign_hook_cb)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd: lightningd/plugin_hook.c:194 (plugin_hook_call_next)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: lightningd/plugin_hook.c:169 (plugin_hook_callback)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: lightningd/plugin.c:660 (plugin_response_handle)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: lightningd/plugin.c:772 (plugin_read_json_one)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: lightningd/plugin.c:823 (plugin_read_json)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: ccan/ccan/io/io.c:59 (next_plan)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: ccan/ccan/io/io.c:407 (do_plan)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: ccan/ccan/io/io.c:417 (io_ready)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: ccan/ccan/io/poll.c:453 (io_loop)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd: lightningd/lightningd.c:1333 (main)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd: ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd: ../csu/libc-start.c:392 (__libc_start_main_impl)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd: parents:

If we disconnect, we lose the open_attempt record. Which is fine, but we should prevent the user from starting another RBF if the last one isn't done yet!

Here we make sure we can drop the initial tx to chain, and that an inflight txid that's missing its commitment sigs is properly ignored.

results in an error.

@t-bast

We weren't blocking if the tx-sigs arrived before the commitment sigs. This was causing problems in the openchannel (spender plugin):

spenderp: FATAL SIGNAL 11 (version v23.08.1-404-g62ff475-modded)
0x559836dc98ba send_backtrace
	common/daemon.c:33
0x559836dc9951 crashdump
	common/daemon.c:75
0x7f37f42c351f ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f37f441ac92 ???
	../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:83
0x559836db7760 bitcoin_txid_eq
	./bitcoin/tx.h:29
0x559836db7760 collect_sigs
	plugins/spender/openchannel.c:509
0x559836db81de check_sigs_ready
	plugins/spender/openchannel.c:531
0x559836db84dd json_peer_sigs
	plugins/spender/openchannel.c:611
0x559836dbcad7 ld_command_handle
	plugins/libplugin.c:1611
0x559836dbcd9d ld_read_json_one
	plugins/libplugin.c:1721
0x559836dbce29 ld_read_json
	plugins/libplugin.c:1741
0x559836ef3bff next_plan
	ccan/ccan/io/io.c:59
0x559836ef40da do_plan
	ccan/ccan/io/io.c:407
0x559836ef4177 io_ready
	ccan/ccan/io/io.c:417
0x559836ef5b14 io_loop
	ccan/ccan/io/poll.c:453
0x559836dbd48d plugin_main
	plugins/libplugin.c:1948
0x559836db22bf main
	plugins/spender/main.c:35
0x7f37f42aad8f __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7f37f42aae3f __libc_start_main_impl
	../csu/libc-start.c:392
0x559836da3774 ???
	???:0
0xffffffffffffffff ???
	???:0
2023-10-31T15:15:57.458Z INFO plugin-spenderp: Killing plugin: exited during normal operation
2023-10-31T15:15:57.458Z **BROKEN** plugin-spenderp: Plugin marked as important, shutting down lightningd!
2023-10-31T15:15:57.458Z DEBUG lightningd: io_break: lightningd_exit
2023-10-31T15:15:57.458Z DEBUG lightningd: io_loop_with_timers: main
2023-10-31T15:15:57.458Z DEBUG connectd: REPLY WIRE_CONNECTD_START_SHUTDOWN_REPLY with 0 fds
2023-10-31T15:15:57.458Z DEBUG lightningd: io_break: connectd_start_shutdown_reply
2023-10-31T15:15:57.458Z DEBUG 021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-dualopend-chan#1: Status closed, but not exited. Killing
2023-10-31T15:15:57.458Z DEBUG lightningd: Command returned result after jcon close
2023-10-31T15:15:57.458Z INFO 021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-chan#1: Unsaved peer failed. Deleting channel.
2023-10-31T15:15:57.464Z DEBUG lightningd: io_break: destroy_plugin
2023-10-31T15:15:57.464Z DEBUG connectd: Shutting down
2023-10-31T15:15:57.464Z DEBUG gossipd: Shutting down
2023-10-31T15:15:57.464Z DEBUG hsmd: Shutting down

Reported-By: @t-bast

t-bast · 2023-11-02T09:40:12Z

I ran tests against master, and everything is looking good now 👍

niftynei added the spec v1.1 and dualfunding labels Oct 27, 2023

niftynei added this to the v23.11 milestone Oct 27, 2023

niftynei force-pushed the nifty/df-retries branch from 40c05ef to d020f9e Compare October 27, 2023 20:28

vincenzopalazzo self-requested a review October 27, 2023 21:46

rustyrussell requested changes Oct 30, 2023

niftynei force-pushed the nifty/df-retries branch from d020f9e to 96f1bfd Compare October 31, 2023 01:47

niftynei requested a review from cdecker as a code owner October 31, 2023 01:47

cdecker reviewed Oct 31, 2023

niftynei force-pushed the nifty/df-retries branch 2 times, most recently from c332489 to 37fa464 Compare November 1, 2023 02:51

niftynei force-pushed the nifty/df-retries branch from 37fa464 to ec4b34e Compare November 1, 2023 19:38

niftynei added 10 commits November 1, 2023 18:19

wallet: allow the channel to not have a last_tx

e3ff336

What if the last_tx is empty for the channel? We're about to let the channels not have last_txs at start.

inflights: split up adding sigs from making a new inflight

a508381

We're going to add the commitment transaction data at a different time than when we init a new inflight. Split them up!

wallet: allow inflights to have empty last_tx

d018b7b

we now save them before we get the commitment data.

dualfund: add switch for if the incoming channel is "too early"

97a253c

If we get an error on a channel that doesn't have commitments yet, we can just delete it.

init channel: only fill in wscript if requested

7dc4cc1

We don't actually use this internal to this method? Weird. Anyway, if we don't want/need it allow the caller to signal that by passing in NULL, if desired.

nit: spelling error (int -> in)

1533d4f

dualfund: when updating an inflight, check for existing data

3a6d712

If you resend us a commitment tx, and we already have one, we check that it's correct!

niftynei added 14 commits November 1, 2023 18:19

dualfund: if we get an abort, clean up dangling inflights

86476f0

(ones that are missing last_txs)

dualfund: on error, handle different states differently

b78fdc3

depending on the state, we might
- forget the channel
- drop it to chain
- reconnect via dualopend

dualfund: report on whether or not we've gotten commitments

315edae

We need to keep track of if we've gotten the last negotiation's commitment sigs, for reconnect logic (helps us know what messages to send in the reconnect case)

dualfund: when dropping to chain, only drop if we have a commitment tx

2eca4b1

You can't publish a tx you don't have!

listpeerchannels: only add the scratch_txid if it exists

0c8d669

Changelog-Changed: RPC `listpeerchannels`.`inflights` may sometimes not include `scratch_txid` (mandatory -> optional)

dualfund, cleanup: make method for reporting channel state to HSMD

dcc2d7d

We're going to need to reuse this for reconnect; make the method standalone in that it can figure out what to send to HSMD independent of where it's located in the setup call flow.

dualfund, cleanup: move common remote commit tx code into single place

6d0638d

Let's make it easier to build remote commitments (we're going to need this for reconnects soon!)

dualfund, cleanup: reuse code for verifying peer's commitment sigs

9b83ea9

Move common code for verifying a commitment sig from peer into one place. On reconnects, we'll need to verify peer's commitments. Changelog-None.

dualfund, nit: make method for "their_role"

03282eb

A bit gratuitous, but it's a bit cleaner on a whole?

dualfund: handle commitment signed

50e386a

If we get a commitment-signed message from a peer, outside of a normal flow, process it! We're about to send these during reconnect, so we need to be able to handle them!

mfc, nit: print out the error reason when an open fails

94150ed

Makes it easier to see why things are failing in the logs.

niftynei force-pushed the nifty/df-retries branch from ec4b34e to 2866b4e Compare November 1, 2023 23:21

rustyrussell reviewed Nov 2, 2023

niftynei added 9 commits November 1, 2023 20:41

tests: update opening tests for new reconnect behavior

1bd6fbb

Let's test that things stay together! One cool thing to note is that now we sort of "magically" recover from pretty brutal disconnects! Very nice!

dualfund, tests: break out "peer forgets" test

26c4dd3

Now that we save the commitment sigs immediately, we have to drop the connection elsewhere in the flow to get the state where only one peer remembers.

dualfund, bump: when bumping a channel make sure it's in ok state

f6b126e

If we disconnect, we lose the open_attempt record. Which is fine, but we should prevent the user from starting another RBF if the last one isn't done yet!

dualfund, test: add test for dropping to chain during RBF

e435d2f

Here we make sure we can drop the initial tx to chain, and that an inflight txid that's missing its commitment sigs is properly ignored.

dualfund: add test to make sure that tx-sigs sent before commitment

63e1e44

results in an error.

niftynei force-pushed the nifty/df-retries branch from 2866b4e to cd5ccd5 Compare November 2, 2023 02:29

rustyrussell approved these changes Nov 2, 2023

rustyrussell merged commit 3190c26 into ElementsProject:master Nov 2, 2023
38 checks passed
