-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple channel support. #5078
Multiple channel support. #5078
Conversation
081c9e3
to
3466058
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After all the pre-work this PR is great. Congrats pulling this one off. I tried my best, but could only find nits, kudos 👍 Now if only CI were as happy as I am xD
ACK 3466058
lightningd/dual_open_control.c
Outdated
channel_state_name(channel)); | ||
|
||
channel = peer_unsaved_channel(peer); | ||
/* FIXME: This is wrong! */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow why this would be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really shouldn't be re-using an unsaved channel here: we should always create a new one I think.
a130ed9
to
7f79f90
Compare
... more flake fixes... |
6748fda
to
f7b3841
Compare
More flakes, and removed an overzealous BROKEN msg. |
f7b3841
to
3b1d5a0
Compare
Rebased, and fixed (actually tested!) test cases. |
ACK 3b1d5a0 |
running this, opened more than one channel to a peer and nothing seems to have caught fire yet. nice! |
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently intuit this by whether there's a subdaemon owning it. But we're about to change the rules and allow connectd to hold idle connections, so we need an explicit flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This happens when we send a warning or lightningd tells us to send a final message then close. Normally io logging is done by the subdaemon that creates it, but this is a special case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use tmpctx, rather than freeing manually everywhere (proof: next patch added a branch and forgot to free it!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still always have 1, but the infrastructure is now in place. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. The notification should be called every time. 2. channel can never be NULL, since it's tested above. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: JSON-RPC: `connect` notification now called even if we already have a live channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
f5a3ebe
to
25a08d3
Compare
The message from lightningd simply acknowleges that we are allowed to discard the peer (because no subdaemons are talking to it anymore). This difference becomes more stark once connectd holds on to idle peers. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd currently holds the connection to idle peers, but we're about to change that: it will only look after peers which are actively opening a connection. We can start this process by disconnecting whenever we have a negotiation failure. We could stay connected if we wanted to, but that would be up to connectd to decide. Right now it's easier if we disconnect from any idle peer once it's been active. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would return success from connect even though the peer was closing; this is technically correct but fairly undesirable. Better is to pass every connect attempt to connectd, and have it block if the peer is exiting (and retry), otherwise tell us it's already connected. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids races in our tests where we assume it's sync (and is kind of nicer). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…nnectd. Simpler, and closes a potential race. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either because lightningd tells us it wants to talk, or because the peer says something about a channel. We also introduce a behavior change: we disconnect after a failed open. We might want to modify this later, but we it's a side-effect of openingd not holding onto idle connections. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't need to hand it to channeld: it will read it! We simply need to tell it to expect it. Similarly, openingd/dualopend will never see it, so remove that logic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means lightningd needs to create the temporary one and tell it to openingd/dualopend, rather than the other way around. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…eer. Now we always have it (either extracted from an unsolicited message, or told to us by lightningd when it tells us it wants to talk), we can always send it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means doing some wire interpretation, and handling the transient case where we switch from temporary to permenant channel_id, but it's not that bad (and required for accurate demux when multiple channels are involved for a single peer). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still don't *have* multiple subds per peer, but now we could! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than intuiting whether this is a new channel / active channel, use the channel_id. This simplifies things and makes them explicit, and prepares for multiple live channels per peer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Generally this means converting a lazy "peer_active_channel(peer)" call into an explicit iteration. 1. notify_feerate_change: call all channels (ignores non-active ones anyway). 2. peer_get_owning_subd remove unused function. 3. peer_connected hook: don't save channel, do lookup and iterate channels. 4. In json_setchannelfee "all" remove useless call to peer_active_channel since we check state anyway, and iterate. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Activate means a specific thing now (connectd said something), so avoid confusing it with this function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Sure, we want to connect (usually) because of an active channel, but it's not specific to the channel itself. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
More efficient to search a known peer than the whole set. Also, move find_channel_by_id() from channel_control.c into channel.c where we'd expect it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is generally verboten now, since there can be multiple. There are a few exceptions: 1. We sometimes want to know if there are *any* active channels. 2. Some dev commands still take peer id when they mean channel_id. 3. We still allow peer id when it's fully determined. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: JSON-RPC: `close` by peer id will fail if there is more than one live channel (use `channel_id` or `short_channel_id` as id arg).
…nels. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: Protocol: we now support opening multiple channels with the same peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise it waits for subds to exit, but they don't. Plus, the others may still be talking! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Update the address and direction as soon as it connects not just when we're about to make it active: we want this even if we don't have an active channel, or if the connect hook rejects it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had some flakes because we returned from `connect`, but we hadn't started subds yet. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We may not see a disconnect instantly: ``` > assert len(l2.rpc.listpeers()['peers']) == 0 E assert 1 == 0 E +1 E -0 ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
... and then dualopend returns, and we access the fread leak_detect struct. ``` lightningd: FATAL SIGNAL 6 (version 065ca1e) 0x55ecd4be8145 send_backtrace common/daemon.c:33 0x55ecd4be81f1 crashdump common/daemon.c:46 0x7f200acab51f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0x7f200acff828 __pthread_kill_implementation ./nptl/pthread_kill.c:44 0x7f200acff828 __pthread_kill_internal ./nptl/pthread_kill.c:80 0x7f200acff828 __GI___pthread_kill ./nptl/pthread_kill.c:91 0x7f200acab475 __GI_raise ../sysdeps/posix/raise.c:26 0x7f200ac917b6 __GI_abort ./stdlib/abort.c:79 0x55ecd4c6827f call_error ccan/ccan/tal/tal.c:93 0x55ecd4c68470 check_bounds ccan/ccan/tal/tal.c:165 0x55ecd4c684c2 to_tal_hdr ccan/ccan/tal/tal.c:175 0x55ecd4c68eb8 tal_free ccan/ccan/tal/tal.c:479 0x55ecd4b8bdd0 finish_report lightningd/memdump.c:138 0x55ecd4b8c115 leak_detect_req_done lightningd/memdump.c:201 0x55ecd4c68664 notify ccan/ccan/tal/tal.c:237 0x55ecd4c68b9e del_tree ccan/ccan/tal/tal.c:402 0x55ecd4c68bf3 del_tree ccan/ccan/tal/tal.c:412 0x55ecd4c68bf3 del_tree ccan/ccan/tal/tal.c:412 0x55ecd4c68f43 tal_free ccan/ccan/tal/tal.c:486 0x55ecd4c5751f io_close ccan/ccan/io/io.c:450 0x55ecd4bbce68 subd_shutdown_remaining lightningd/subd.c:911 0x55ecd4b8724a shutdown_subdaemons lightningd/lightningd.c:541 0x55ecd4b883cc main lightningd/lightningd.c:1207 0x7f200ac92fcf __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 0x7f200ac9307c __libc_start_main_impl ../csu/libc-start.c:409 0x55ecd4b5cc54 ??? ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simply exit, like we do when master daemon_conn exits. ``` Valgrind error file: valgrind-errors.2211908 ==2211908== Invalid read of size 8 ==2211908== at 0x12AC13: daemon_conn_send (daemon_conn.c:137) ==2211908== by 0x113CD9: queue_peer_msg (gossipd.c:118) ==2211908== by 0x11B806: query_channel_range (queries.c:1169) ==2211908== by 0x1250DD: peer_gossip_probe_scids (seeker.c:706) ==2211908== by 0x1253B1: check_firstpeer (seeker.c:788) ==2211908== by 0x1256CA: seeker_check (seeker.c:884) ==2211908== by 0x1366AC: timer_expired (timeout.c:62) ==2211908== by 0x1163D1: main (gossipd.c:1146) ==2211908== Address 0x4cafdf0 is 48 bytes inside a block of size 88 free'd ==2211908== at 0x48460C4: free (vg_replace_malloc.c:872) ==2211908== by 0x1805EA: del_tree (tal.c:421) ==2211908== by 0x1808BE: tal_free (tal.c:486) ==2211908== by 0x12AB25: destroy_dc_from_conn (daemon_conn.c:112) ==2211908== by 0x17FFDF: notify (tal.c:237) ==2211908== by 0x180519: del_tree (tal.c:402) ==2211908== by 0x1808BE: tal_free (tal.c:486) ==2211908== by 0x16EE9A: io_close (io.c:450) ==2211908== by 0x16ECA9: do_plan (io.c:401) ==2211908== by 0x16ED16: io_ready (io.c:417) ==2211908== by 0x1710B2: io_loop (poll.c:453) ==2211908== by 0x1163C5: main (gossipd.c:1144) ==2211908== Block was alloc'd at ==2211908== at 0x484384F: malloc (vg_replace_malloc.c:381) ==2211908== by 0x180064: allocate (tal.c:250) ==2211908== by 0x180634: tal_alloc_ (tal.c:428) ==2211908== by 0x12AB65: daemon_conn_new_ (daemon_conn.c:122) ==2211908== by 0x1155F4: gossip_init (gossipd.c:763) ==2211908== by 0x116014: recv_req (gossipd.c:999) ==2211908== by 0x12A828: handle_read (daemon_conn.c:31) ==2211908== by 0x16E09F: next_plan (io.c:59) ==2211908== by 0x16ECD4: do_plan (io.c:407) ==2211908== by 0x16ED16: io_ready (io.c:417) ==2211908== by 0x1710B2: io_loop (poll.c:453) ==2211908== by 0x1163C5: main (gossipd.c:1144) ==2211908== ```
Breaks when there are no peers: ``` > _fundchannel(l1, l2, amt_normal, None) tests/test_connection.py:1564: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ tests/test_connection.py:1535: in _fundchannel wait_for(lambda: not has_normal_channels(l2, l1)) contrib/pyln-testing/pyln/testing/utils.py:88: in wait_for while not success(): tests/test_connection.py:1535: in <lambda> wait_for(lambda: not has_normal_channels(l2, l1)) tests/test_connection.py:1527: in has_normal_channels for c in only_one(l1.rpc.listpeers(l2.info['id'])['peers'])['channels']]) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ arr = [] def only_one(arr): """Many JSON RPC calls return an array; often we only expect a single entry """ > assert len(arr) == 1 E AssertionError ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make sure it sees disconnect before reconnect, otherwise the next command fails since we're now disconnected. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids reuse which can cause confusion; the long-term fix is to rewrite this to use real channels like dualfunding does. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was missed in e8d2176. ``` > raise ValueError(str(errors)) E ValueError: E Node errors: E - lightningd-2: had bad gossip messages E - lightningd-3: had bad gossip messages E Global errors: contrib/pyln-testing/pyln/testing/fixtures.py:201: ValueError ... 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Ignoring future channel_announcment for 105x1x2 (current block 104) 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 105x1x2/0 ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
25a08d3
to
3a56366
Compare
Rebase on new master with setchannel support... |
May still need some love, but it Works For Me!