Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add next_funding_txid awareness for dual-fund opens #6824

Merged
merged 37 commits into from
Nov 2, 2023

Conversation

niftynei
Copy link
Contributor

@niftynei niftynei commented Oct 27, 2023

This commit reworks how reconnects work for dual-funding/v2 channel opens.

We migrate to using the funding_next_txid protocol that @t-bast developed (see lightning/bolts@cd3c99e). It works very nicely!

PR also includes some cleanups and tidying in the dualopend.c daemon.

Short overview of changes:

  • We now create channel_inflights with blank/empty last_tx (the commitment tx field).
  • We split up saving a channel and saving the commitments for a channel into two inter-daemon calls (dualopend_commit_ready + dualopend_commit_rcvd). commit_ready is called (as the spec requires) before we send our commitment_signed. commit_rcvd is called after we've received the peer's commitment_signed.
  • We use the next_funding_txid sent by a peer to resend them the requested commitment_signed + tx_signatures, as requested/required.
  • We can handle receiving commitment_signed out of order (prior to this we'd fail since we'd only accept a commitment_signed in a specific part of the initial channel fow).
  • Some other general cleanups, tidying.
  • Updated the tests to make sure reconnects work for RBFs + initial setup!

It looks like a lot of commits, but they should be fairly easy to review? Here's hoping any way.

@niftynei niftynei added spec v1.1 things for spec v1.1 dualfunding labels Oct 27, 2023
@niftynei niftynei added this to the v23.11 milestone Oct 27, 2023
@vincenzopalazzo vincenzopalazzo self-requested a review October 27, 2023 21:46
@vincenzopalazzo
Copy link
Contributor

We migrate to using the funding_next_txid protocol

There is a description of this protocol somewhere? or the BOLT is already describing this proposal?

@niftynei
Copy link
Contributor Author

There is a description of this protocol somewhere? or the BOLT is already describing this proposal?

lightning/bolts@cd3c99e

wallet/wallet.h Outdated
Comment on lines 316 to 270
/* We save channels that aren't yet committed :( */
BUILD_ASSERT(DUALOPEND_OPEN_INIT == 1);
return s;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely not. Not again.

You need a new state, DUALOPEND_OPEN_COMMITTED. When you add it, the compiler will tell you everywhere you have to change it.

Comment on lines 263 to 268
bool too_early_v2(const struct channel *channel)
{
return channel->state == DUALOPEND_OPEN_INIT
&& !channel->last_tx;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is exactly the kind of vague nomenclature I had to clean up. If you add a new state, you don't need this.

Comment on lines 1338 to 1339
/* we check validity in dualopend, this is a sanity check */
assert(bitcoin_txid_eq(&txid, &inflight_txid));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a rule (loosely followed!) we prefer to fail the channel if a subd does something weird, rahter than kill lightningd.

Comment on lines +2702 to +2759
/* Only after we've updated/saved our psbt do we check
* for peer connected */
if (!channel->owner)
return command_fail(cmd, FUNDING_PEER_NOT_CONNECTED,
"Peer not connected");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!!WARNING!! This will be messy after rebase, since now in master we use param_check! So if we are simply running the check RPC, we still want to do this check.

I'm not sure how to handle this in general. I mean, the command still does something, though it "fails": that just seems... weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it's weird that we save incoming data to disk, but it makes fundchannel work really nicely across reconnects so unsure if that's the worst thing??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree that it's best from where we are, but worth thinking for future.

In theory there are two things going on: saving the info, and telling the peer. This is a bit like close, so, you could argue we shouldn't actually here, or even block until peer is told.

openingd/dualopend.c Show resolved Hide resolved
openingd/dualopend.c Show resolved Hide resolved
return NULL;
}
tal_free(wscript);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is just weird... the only danger with tmpctx is when you've got an inner io_loop, which is already an anti-pattern, and there's not one here that I can see?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, i'll drop it!

Comment on lines 1108 to 1155
static char *do_reconnect_commit_sigs(const struct state *state,
const struct tx_state *tx_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you allocate for the caller, you must use the ctx pattern...

Comment on lines 3926 to 3983
static void do_reconnect_dance(struct state *state)
static bool do_reconnect_dance(struct state *state)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you did this because your previous commit did return false and that didn't compile? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's b/c i wanted reconnected to be false if we aborted; regardless i'm getting rid of the idea of returning a bool entirely...

@rustyrussell
Copy link
Contributor

Also, valgrind failures in test_multifunding_v1_v2_mixed:

Valgrind error file: valgrind-errors.25972
==25972== Uninitialised byte(s) found during client check request
==25972==    at 0x2019D3: memcheck_ (mem.h:247)
==25972==    by 0x201DBA: db_bind_blob (bindings.c:94)
==25972==    by 0x2023EC: db_bind_signature (bindings.c:206)
==25972==    by 0x1BED18: wallet_channel_save (wallet.c:2202)
==25972==    by 0x1BFD2C: wallet_channel_insert (wallet.c:2482)
==25972==    by 0x145AB3: wallet_commit_channel (dual_open_control.c:1472)
==25972==    by 0x14ADF9: handle_commit_ready (dual_open_control.c:3375)
==25972==    by 0x14B59C: dual_opend_msg (dual_open_control.c:3541)
==25972==    by 0x1A8850: sd_msg_read (subd.c:555)
==25972==    by 0x37A160: next_plan (io.c:59)
==25972==    by 0x37AD95: do_plan (io.c:407)
==25972==    by 0x37ADD7: io_ready (io.c:417)
==25972==  Address 0x6cf0ea8 is 40 bytes inside a block of size 104 alloc'd
==25972==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25972==    by 0x38CAFE: allocate (tal.c:256)
==25972==    by 0x38D179: tal_alloc_ (tal.c:463)
==25972==    by 0x38D345: tal_alloc_arr_ (tal.c:506)
==25972==    by 0x202385: db_bind_signature (bindings.c:202)
==25972==    by 0x1BED18: wallet_channel_save (wallet.c:2202)
==25972==    by 0x1BFD2C: wallet_channel_insert (wallet.c:2482)
==25972==    by 0x145AB3: wallet_commit_channel (dual_open_control.c:1472)
==25972==    by 0x14ADF9: handle_commit_ready (dual_open_control.c:3375)
==25972==    by 0x14B59C: dual_opend_msg (dual_open_control.c:3541)
==25972==    by 0x1A8850: sd_msg_read (subd.c:555)
==25972==    by 0x37A160: next_plan (io.c:59)

@t-bast
Copy link

t-bast commented Oct 30, 2023

Nice work, I just tested that against eclair, and here is the report (there are a few things to fix, but otherwise it's looking good and we've understood the spec the same way).

commit_sig sent but not received

This test fails: when cln is not the channel initiator, it waits for the other node to send commit_sig before sending its own commit_sig. There is no reason to do that, both nodes should send commit_sig immediately after exchanging tx_complete? Otherwise it's a missed opportunity to finalize the channel creation on reconnection, because in that case cln hasn't saved the channel and fails it on reconnection.

commit_sig received but tx_signatures not received

This test works, with one minor issue: on reconnection, cln sends its tx_signatures twice (duplicate?).

tx_signatures received by only one node

OK

RBF attempt forgotten by cln but not eclair

OK -> on reconnection, cln correctly sends tx_abort which lets eclair forget this RBF attempt.

RBF attempt partially signed

OK -> on reconnection, commit_sig and tx_signatures are correctly retransmitted.

crash on tx_signatures

If cln receives tx_signatures before commit_sig, it crashes the whole node: it should instead just send an error for that channel.

@niftynei
Copy link
Contributor Author

niftynei commented Oct 31, 2023

Ok this still has some bugs + I'm expecting a few tests to fail but:

cln sends its tx_signatures twice (duplicate?).

Nice catch. This was causing a crash in our tests also; knowing to look for this as a root cause was very helpful. Thank you! I've patched it not to do this.
3c8cf20

This test fails: when cln is not the channel initiator, it waits for the other node to send commit_sig before sending its own commit_sig

Oh no you caught me red-handed. 🙈 I did wonder if you'd catch me doing this... seems like the answer was yes! It was a quick fix 80e8503, should send ASAP for both sides now :)

If cln receives tx_signatures before commit_sig, it crashes the whole node: it should instead just send an error for that channel.

I think this has been patched (there was a crash happening that I patched that think explains this), but haven't tested conclusively.

Also, valgrind failures in test_multifunding_v1_v2_mixed:

Tested + patched with the latest updates.

Thanks @rustyrussell + @t-bast for the extremely fast review!!

@@ -195,6 +195,7 @@
"CLOSINGD_SIGEXCHANGE": 4,
"DUALOPEND_AWAITING_LOCKIN": 10,
"DUALOPEND_OPEN_COMMITTED": 12,
"DUALOPEND_OPEN_COMMIT_READY": 13,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful with this: the enum numeric values are stored in the DB, so adding to the end is the right thing to do here (not to reorder states, causing faulty loads from DB), however IIRC there are places where we check for sets of states by using < and > on the numeric values. This would obviously fail.

Can we have a quick check through the code to see if there are numeric comparisons in the code on values for this enum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc those have all been updated to be comprehensive swtich statements in a clean up Rusty did earlier this year, but I'll do a sweep to double check!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news: you cannot get this wrong, as we have channel_state_in_db() which indeed, is patched correctly, AND I removed all those numeric comparisons last release for exactly this reason (when the new splicing state was added, I had the same concerns!).

@t-bast
Copy link

t-bast commented Oct 31, 2023

Nice catch. This was causing a crash in our tests also; knowing to look for this as a root cause was very helpful. Thank you! I've patched it not to do this.

Confirmed, this is now working fine 👍

Oh no you caught me red-handed. 🙈 I did wonder if you'd catch me doing this... seems like the answer was yes! It was a quick fix 80e8503, should send ASAP for both sides now :)

Heh, did you think you could trick my testing skills?

This is indeed fixed, cln now immediately sends commit_sig! However, it seems like cln is not storing the channel at that point. If we disconnect before cln receives eclair's commit_sig and reconnect, cln has forgotten the channel. It should remember it, we should be able to finalize the signing flow at that point.

I think this has been patched (there was a crash happening that I patched that think explains this), but haven't tested conclusively.

I'm still seeing that crash, with the following logs:

spenderp: FATAL SIGNAL 11 (version v23.08.1-404-g62ff475-modded)
0x559836dc98ba send_backtrace
	common/daemon.c:33
0x559836dc9951 crashdump
	common/daemon.c:75
0x7f37f42c351f ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f37f441ac92 ???
	../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:83
0x559836db7760 bitcoin_txid_eq
	./bitcoin/tx.h:29
0x559836db7760 collect_sigs
	plugins/spender/openchannel.c:509
0x559836db81de check_sigs_ready
	plugins/spender/openchannel.c:531
0x559836db84dd json_peer_sigs
	plugins/spender/openchannel.c:611
0x559836dbcad7 ld_command_handle
	plugins/libplugin.c:1611
0x559836dbcd9d ld_read_json_one
	plugins/libplugin.c:1721
0x559836dbce29 ld_read_json
	plugins/libplugin.c:1741
0x559836ef3bff next_plan
	ccan/ccan/io/io.c:59
0x559836ef40da do_plan
	ccan/ccan/io/io.c:407
0x559836ef4177 io_ready
	ccan/ccan/io/io.c:417
0x559836ef5b14 io_loop
	ccan/ccan/io/poll.c:453
0x559836dbd48d plugin_main
	plugins/libplugin.c:1948
0x559836db22bf main
	plugins/spender/main.c:35
0x7f37f42aad8f __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7f37f42aae3f __libc_start_main_impl
	../csu/libc-start.c:392
0x559836da3774 ???
	???:0
0xffffffffffffffff ???
	???:0
2023-10-31T15:15:57.458Z INFO    plugin-spenderp: Killing plugin: exited during normal operation
2023-10-31T15:15:57.458Z **BROKEN** plugin-spenderp: Plugin marked as important, shutting down lightningd!
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_break: lightningd_exit
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_loop_with_timers: main
2023-10-31T15:15:57.458Z DEBUG   connectd: REPLY WIRE_CONNECTD_START_SHUTDOWN_REPLY with 0 fds
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_break: connectd_start_shutdown_reply
2023-10-31T15:15:57.458Z DEBUG   021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-dualopend-chan#1: Status closed, but not exited. Killing
2023-10-31T15:15:57.458Z DEBUG   lightningd: Command returned result after jcon close
2023-10-31T15:15:57.458Z INFO    021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-chan#1: Unsaved peer failed. Deleting channel.
2023-10-31T15:15:57.464Z DEBUG   lightningd: io_break: destroy_plugin
2023-10-31T15:15:57.464Z DEBUG   connectd: Shutting down
2023-10-31T15:15:57.464Z DEBUG   gossipd: Shutting down
2023-10-31T15:15:57.464Z DEBUG   hsmd: Shutting down

@niftynei niftynei force-pushed the nifty/df-retries branch 2 times, most recently from c332489 to 37fa464 Compare November 1, 2023 02:51
@niftynei
Copy link
Contributor Author

niftynei commented Nov 1, 2023

me, friday: this is a cute, well written, <1k line patch
me today: i am exhausted; why is this patch now 1.7k lines

From the spec:

	Once peers are ready to exchange commitment signatures, they must remember
	the details of the funding transaction to allow resuming the signatures
	exchange if a disconnection happens.

Basically this means we add channels to the database before we've gotten
commitments for them; it's nice that there's now a state for commitments
recevied but we now save the channel prior to that.

This commit makes it possible to track the pre-commit-rcvd but not quite
open-init state.
What if the last_tx is empty for the channel?

We're about to let the channels not have last_txs at start.
We're going to add the commitment transaction data at a different time
than when we init a new inflight. Split them up!
we now save them before we get the commitment data.
If we get an error on a channel that doesn't have commitments yet,
we can just delete it.
Here, we split up what was "commit_received" into two phases:
	- commit-ready, where we're about to send our commitment tx to
	  peer
	- commit-received, when we've gotten the commitment tx from our
          peer

This lets us do the right thing (as far as the spec is concerned) with
returning the correct 'next_funding_txid' on reconnect (later commits).
We don't actually use this internal to this method? Weird.

Anyway, if we don't want/need it allow the caller to signal that by
passing in NULL, if desired.
When we reconnect, if we get a note from the peer that they dont know
about a pending inflight, we need to be able to clean it up so we can
restart/re-negotiate a new RBF etc.

This adds a cleanup method to remove any inflights for a channel without
a last_tx (commitment tx)
If you resend us a commitment tx, and we already have one, we check that
it's correct!
depending on the state, we might
- forget the channel
- drop it to chain
- reconnect via dualopend
We need to keep track of if we've gotten the last negotiation's
commitment sigs, for reconnect logic (helps us know what messages to
send in the reconnect case)
Changelog-Changed: RPC `listpeerchannels`.`inflights` may sometimes not include `scratch_txid` (mandatory -> optional)
We're going to need to reuse this for reconnect; make the method
standalone in that it can figure out what to send to HSMD independent of
where it's located in the setup call flow.
Let's make it easier to build remote commitments (we're going to need
this for reconnects soon!)
Move common code for verifying a commitment sig from peer into one
place.

On reconnects, we'll need to verify peer's commitments.

Changelog-None.
A bit gratuitous, but it's a bit cleaner on a whole?
If we get a commitment-signed message from a peer, outside of a normal
flow, process it!

We're about to send these during reconnect, so we need to be able to
handle them!
If you get the right series of disconnects, it's possible for your peer
to send you a tx-sigs even though the current state of the channel open
is that you've seen the funding open on chain (your channel_ready[LOCAL]
= true)

In this case, if we haven't marked that we've seen the tx sigs yet,
we go ahead and mark them as seen and just ignore this tx-sigs msg.
In the case where you're echoing back a tx-abort, just let it through.

Not doing this causes problems in the case where your node has forgotten
about an in-progress open.

This fixes the following problem:

- you send a tx-abort (even tho you have marked tx-sigs as received)
- peer echos it back (we echo back tx-aborts always)
- you throw an error because you're already in a tx-abort unallowed
  state

In this commit, we allow for echos to come thru no matter our current state and
this fixes things/makes them work as expected.
Here we conform to the specification, which requires that we handle
next-funding-id in a specific way.

Note that we were already sending it, but now we actually correctly
handle its presence.

Changelog-Changed: Spec: dual-funding now follows the next-funding-id rules.
Makes it easier to see why things are failing in the logs.
Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments only, on the last dev_disconnect hack.

@@ -195,6 +195,7 @@
"CLOSINGD_SIGEXCHANGE": 4,
"DUALOPEND_AWAITING_LOCKIN": 10,
"DUALOPEND_OPEN_COMMITTED": 12,
"DUALOPEND_OPEN_COMMIT_READY": 13,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news: you cannot get this wrong, as we have channel_state_in_db() which indeed, is patched correctly, AND I removed all those numeric comparisons last release for exactly this reason (when the new splicing state was added, I had the same concerns!).

Comment on lines +2702 to +2759
/* Only after we've updated/saved our psbt do we check
* for peer connected */
if (!channel->owner)
return command_fail(cmd, FUNDING_PEER_NOT_CONNECTED,
"Peer not connected");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree that it's best from where we are, but worth thinking for future.

In theory there are two things going on: saving the info, and telling the peer. This is a bit like close, so, you could argue we shouldn't actually here, or even block until peer is told.

Comment on lines +3524 to +3542
if (!updated) {
log_info(channel->log, "Already had sigs, skipping notif");
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!!Thanks!! 🧡 I saw this intermittantly and couldn't figure out what was happening!

Comment on lines 1116 to 1118
case DEV_DISCONNECT_DROP:
/* try again? */
return read_hdr_from_peer(peer_conn, peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strictly impossible, since our connections always send in order. So now we ignore a msg, that contract is broken. You're using it in a very specific way, so that's OK, but "don't use this" is probably needed.

Can you do it on the send side (i.e. don't send this msg)? Would be significant code reduction...

@rustyrussell
Copy link
Contributor

I just want to say that this code was a delight to review. 🧡 !

Let's test that things stay together!

One cool thing to note is that now we sort of "magically" recover from
pretty brutal disconnects!

Very nice!
When we got our peer's sigs, if we were the remote, we would re-notify
the plugin, which in turn would re-send the tx-sigs to use.

In the case of CLN, we'd then
- break, because we'd re-forward the sigs to the `openchannel` plugin,
  which was then in the wrong state (MULTIFUNDCHANNEL_SIGNED)

    spenderp: plugins/spender/openchannel.c:598: json_peer_sigs: Assertion `dest->state == MULTIFUNDCHANNEL_SECURED' failed.
    spenderp: FATAL SIGNAL 6 (version 5880d59-modded)

In the case of eclair, they'd just see our 2nd TX_SIGS message and
@t-bast would complain:

	> This test works, with one minor issue: on reconnection, cln sends its tx_signatures twice (duplicate?).

This commit does two things:
	- has the openchannel / spender plugin log a broken instead of
	  crashing when the state is not what we're expecting
	- stops us from calling the `funder` plugin if this is a
	  replay/second receipt of commit-sigs.
Originally the accepter waited for the peer to send us their commitment
sigs before we send ours; this changes things so that the accepter
sends their commitment sigs ASAP.

	This test fails: when cln is not the channel initiator, it waits for the other node to send commit_sig before sending its own commit_sig. There is no reason to do that, both nodes should send commit_sig immediately after exchanging tx_complete? Otherwise it's a missed opportunity to finalize the channel creation on reconnection, because in that case cln hasn't saved the channel and fails it on reconnection.

Reported-By: @t-bast
Now that we save the commitment sigs immediately, we have to drop the
connection elsewhere in the flow to get the state where only one peer
remembers.
We don't let go of the `msg` on error, which triggers a memleak warning!

lightningd-2 2023-10-31T19:54:06.582Z **BROKEN** lightningd: MEMLEAK: 0x55ae3615b498
lightningd-2 2023-10-31T19:54:06.582Z **BROKEN** lightningd:   label=openingd/dualopend_wiregen.c:919:u8[]
lightningd-2 2023-10-31T19:54:06.582Z **BROKEN** lightningd:   alloc:
lightningd-2 2023-10-31T19:54:06.685Z **BROKEN** lightningd:     ccan/ccan/tal/tal.c:477 (tal_alloc_)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd:     ccan/ccan/tal/tal.c:506 (tal_alloc_arr_)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd:     openingd/dualopend_wiregen.c:919 (towire_dualopend_send_tx_sigs)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd:     lightningd/dual_open_control.c:1122 (openchannel2_sign_hook_cb)
lightningd-2 2023-10-31T19:54:06.686Z **BROKEN** lightningd:     lightningd/plugin_hook.c:194 (plugin_hook_call_next)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     lightningd/plugin_hook.c:169 (plugin_hook_callback)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     lightningd/plugin.c:660 (plugin_response_handle)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     lightningd/plugin.c:772 (plugin_read_json_one)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     lightningd/plugin.c:823 (plugin_read_json)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     ccan/ccan/io/io.c:59 (next_plan)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     ccan/ccan/io/io.c:407 (do_plan)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     ccan/ccan/io/io.c:417 (io_ready)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     ccan/ccan/io/poll.c:453 (io_loop)
lightningd-2 2023-10-31T19:54:06.687Z **BROKEN** lightningd:     lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd:     lightningd/lightningd.c:1333 (main)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd:     ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd:     ../csu/libc-start.c:392 (__libc_start_main_impl)
lightningd-2 2023-10-31T19:54:06.688Z **BROKEN** lightningd:   parents:
If we disconnect, we lose the open_attempt record. Which is fine, but we
should prevent the user from starting another RBF if the last one isn't
done yet!
Here we make sure we can drop the initial tx to chain, and that an
inflight txid that's missing its commitment sigs is properly ignored.
We weren't blocking if the tx-sigs arrived before the commitment sigs.

This was causing problems in the openchannel (spender plugin)

spenderp: FATAL SIGNAL 11 (version v23.08.1-404-g62ff475-modded)
0x559836dc98ba send_backtrace
	common/daemon.c:33
0x559836dc9951 crashdump
	common/daemon.c:75
0x7f37f42c351f ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f37f441ac92 ???
	../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:83
0x559836db7760 bitcoin_txid_eq
	./bitcoin/tx.h:29
0x559836db7760 collect_sigs
	plugins/spender/openchannel.c:509
0x559836db81de check_sigs_ready
	plugins/spender/openchannel.c:531
0x559836db84dd json_peer_sigs
	plugins/spender/openchannel.c:611
0x559836dbcad7 ld_command_handle
	plugins/libplugin.c:1611
0x559836dbcd9d ld_read_json_one
	plugins/libplugin.c:1721
0x559836dbce29 ld_read_json
	plugins/libplugin.c:1741
0x559836ef3bff next_plan
	ccan/ccan/io/io.c:59
0x559836ef40da do_plan
	ccan/ccan/io/io.c:407
0x559836ef4177 io_ready
	ccan/ccan/io/io.c:417
0x559836ef5b14 io_loop
	ccan/ccan/io/poll.c:453
0x559836dbd48d plugin_main
	plugins/libplugin.c:1948
0x559836db22bf main
	plugins/spender/main.c:35
0x7f37f42aad8f __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7f37f42aae3f __libc_start_main_impl
	../csu/libc-start.c:392
0x559836da3774 ???
	???:0
0xffffffffffffffff ???
	???:0
2023-10-31T15:15:57.458Z INFO    plugin-spenderp: Killing plugin: exited during normal operation
2023-10-31T15:15:57.458Z **BROKEN** plugin-spenderp: Plugin marked as important, shutting down lightningd!
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_break: lightningd_exit
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_loop_with_timers: main
2023-10-31T15:15:57.458Z DEBUG   connectd: REPLY WIRE_CONNECTD_START_SHUTDOWN_REPLY with 0 fds
2023-10-31T15:15:57.458Z DEBUG   lightningd: io_break: connectd_start_shutdown_reply
2023-10-31T15:15:57.458Z DEBUG   021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-dualopend-chan#1: Status closed, but not exited. Killing
2023-10-31T15:15:57.458Z DEBUG   lightningd: Command returned result after jcon close
2023-10-31T15:15:57.458Z INFO    021ccce7bc396996c8f3b7bfeb1e30c6600269517026a74adfe2217b7187879797-chan#1: Unsaved peer failed. Deleting channel.
2023-10-31T15:15:57.464Z DEBUG   lightningd: io_break: destroy_plugin
2023-10-31T15:15:57.464Z DEBUG   connectd: Shutting down
2023-10-31T15:15:57.464Z DEBUG   gossipd: Shutting down
2023-10-31T15:15:57.464Z DEBUG   hsmd: Shutting down

Reported-By: @t-bast
@rustyrussell rustyrussell merged commit 3190c26 into ElementsProject:master Nov 2, 2023
38 checks passed
@t-bast
Copy link

t-bast commented Nov 2, 2023

I ran tests again master, and everything is looking good now 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dualfunding spec v1.1 things for spec v1.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants