Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple channel support. #5078

Merged
merged 40 commits into from
Mar 23, 2022

Conversation

rustyrussell
Copy link
Contributor

May still need some love, but it Works For Me!

@rustyrussell rustyrussell added this to the v0.11 milestone Mar 9, 2022
@rustyrussell rustyrussell requested a review from cdecker as a code owner March 9, 2022 04:33
@rustyrussell rustyrussell force-pushed the guilt/multichannel branch 2 times, most recently from 081c9e3 to 3466058 Compare March 10, 2022 04:33
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After all the pre-work this PR is great. Congrats pulling this one off. I tried my best, but could only find nits, kudos 👍 Now if only CI were as happy as I am xD

ACK 3466058

channel_state_name(channel));

channel = peer_unsaved_channel(peer);
/* FIXME: This is wrong! */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow why this would be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really shouldn't be re-using an unsaved channel here: we should always create a new one I think.

lightningd/peer_control.c Outdated Show resolved Hide resolved
lightningd/peer_htlcs.c Outdated Show resolved Hide resolved
lightningd/peer_htlcs.c Show resolved Hide resolved
@rustyrussell rustyrussell force-pushed the guilt/multichannel branch 4 times, most recently from a130ed9 to 7f79f90 Compare March 16, 2022 02:51
@rustyrussell
Copy link
Contributor Author

... more flake fixes...

@rustyrussell
Copy link
Contributor Author

More flakes, and removed an overzealous BROKEN msg.

@rustyrussell
Copy link
Contributor Author

Rebased, and fixed (actually tested!) test cases.

@cdecker
Copy link
Member

cdecker commented Mar 21, 2022

ACK 3b1d5a0

@jb55
Copy link
Collaborator

jb55 commented Mar 21, 2022

running this, opened more than one channel to a peer and nothing seems to have caught fire yet. nice!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently intuit this by whether there's a subdaemon owning it.
But we're about to change the rules and allow connectd to hold idle
connections, so we need an explicit flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This happens when we send a warning or lightningd tells us to send a
final message then close.  Normally io logging is done by the
subdaemon that creates it, but this is a special case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use tmpctx, rather than freeing manually everywhere (proof: next patch
added a branch and forgot to free it!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still always have 1, but the infrastructure is now in place.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. The notification should be called every time.
2. channel can never be NULL, since it's tested above.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: JSON-RPC: `connect` notification now called even if we already have a live channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The message from lightningd simply acknowleges that we are allowed to
discard the peer (because no subdaemons are talking to it anymore).
This difference becomes more stark once connectd holds on to idle
peers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
openingd currently holds the connection to idle peers, but we're about
to change that: it will only look after peers which are actively
opening a connection.  We can start this process by disconnecting
whenever we have a negotiation failure.

We could stay connected if we wanted to, but that would be up to
connectd to decide.  Right now it's easier if we disconnect from any
idle peer once it's been active.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We would return success from connect even though the peer was closing;
this is technically correct but fairly undesirable.  Better is to pass
every connect attempt to connectd, and have it block if the peer is
exiting (and retry), otherwise tell us it's already connected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids races in our tests where we assume it's sync (and is kind
of nicer).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…nnectd.

Simpler, and closes a potential race.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.

We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but we it's a side-effect of openingd
not holding onto idle connections.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't need to hand it to channeld: it will read it!  We simply
need to tell it to expect it.

Similarly, openingd/dualopend will never see it, so remove that logic.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means lightningd needs to create the temporary one and tell it to
openingd/dualopend, rather than the other way around.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…eer.

Now we always have it (either extracted from an unsolicited message,
or told to us by lightningd when it tells us it wants to talk), we can
always send it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means doing some wire interpretation, and handling the transient
case where we switch from temporary to permenant channel_id, but it's
not that bad (and required for accurate demux when multiple channels
are involved for a single peer).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We still don't *have* multiple subds per peer, but now we could!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than intuiting whether this is a new channel / active channel,
use the channel_id.  This simplifies things and makes them explicit,
and prepares for multiple live channels per peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Generally this means converting a lazy "peer_active_channel(peer)" call
into an explicit iteration.

1. notify_feerate_change: call all channels (ignores non-active ones anyway).
2. peer_get_owning_subd remove unused function.
3. peer_connected hook: don't save channel, do lookup and iterate channels.
4. In json_setchannelfee "all" remove useless call to peer_active_channel
   since we check state anyway, and iterate.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Activate means a specific thing now (connectd said something), so avoid
confusing it with this function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Sure, we want to connect (usually) because of an active channel, but
it's not specific to the channel itself.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
More efficient to search a known peer than the whole set.

Also, move find_channel_by_id() from channel_control.c into channel.c
where we'd expect it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is generally verboten now, since there can be multiple.  There are a
few exceptions:

1. We sometimes want to know if there are *any* active channels.
2. Some dev commands still take peer id when they mean channel_id.
3. We still allow peer id when it's fully determined.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `close` by peer id will fail if there is more than one live channel (use `channel_id` or `short_channel_id` as id arg).
…nels.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we now support opening multiple channels with the same peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise it waits for subds to exit, but they don't.  Plus, the others
may still be talking!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Update the address and direction as soon as it connects not just when
we're about to make it active: we want this even if we don't have
an active channel, or if the connect hook rejects it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had some flakes because we returned from `connect`, but we hadn't started
subds yet.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We may not see a disconnect instantly:

```
>       assert len(l2.rpc.listpeers()['peers']) == 0
E       assert 1 == 0
E         +1
E         -0
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
... and then dualopend returns, and we access the fread leak_detect struct.

```
lightningd: FATAL SIGNAL 6 (version 065ca1e)
0x55ecd4be8145 send_backtrace
	common/daemon.c:33
0x55ecd4be81f1 crashdump
	common/daemon.c:46
0x7f200acab51f ???
	./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f200acff828 __pthread_kill_implementation
	./nptl/pthread_kill.c:44
0x7f200acff828 __pthread_kill_internal
	./nptl/pthread_kill.c:80
0x7f200acff828 __GI___pthread_kill
	./nptl/pthread_kill.c:91
0x7f200acab475 __GI_raise
	../sysdeps/posix/raise.c:26
0x7f200ac917b6 __GI_abort
	./stdlib/abort.c:79
0x55ecd4c6827f call_error
	ccan/ccan/tal/tal.c:93
0x55ecd4c68470 check_bounds
	ccan/ccan/tal/tal.c:165
0x55ecd4c684c2 to_tal_hdr
	ccan/ccan/tal/tal.c:175
0x55ecd4c68eb8 tal_free
	ccan/ccan/tal/tal.c:479
0x55ecd4b8bdd0 finish_report
	lightningd/memdump.c:138
0x55ecd4b8c115 leak_detect_req_done
	lightningd/memdump.c:201
0x55ecd4c68664 notify
	ccan/ccan/tal/tal.c:237
0x55ecd4c68b9e del_tree
	ccan/ccan/tal/tal.c:402
0x55ecd4c68bf3 del_tree
	ccan/ccan/tal/tal.c:412
0x55ecd4c68bf3 del_tree
	ccan/ccan/tal/tal.c:412
0x55ecd4c68f43 tal_free
	ccan/ccan/tal/tal.c:486
0x55ecd4c5751f io_close
	ccan/ccan/io/io.c:450
0x55ecd4bbce68 subd_shutdown_remaining
	lightningd/subd.c:911
0x55ecd4b8724a shutdown_subdaemons
	lightningd/lightningd.c:541
0x55ecd4b883cc main
	lightningd/lightningd.c:1207
0x7f200ac92fcf __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0x7f200ac9307c __libc_start_main_impl
	../csu/libc-start.c:409
0x55ecd4b5cc54 ???
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simply exit, like we do when master daemon_conn exits.

```
Valgrind error file: valgrind-errors.2211908
==2211908== Invalid read of size 8
==2211908==    at 0x12AC13: daemon_conn_send (daemon_conn.c:137)
==2211908==    by 0x113CD9: queue_peer_msg (gossipd.c:118)
==2211908==    by 0x11B806: query_channel_range (queries.c:1169)
==2211908==    by 0x1250DD: peer_gossip_probe_scids (seeker.c:706)
==2211908==    by 0x1253B1: check_firstpeer (seeker.c:788)
==2211908==    by 0x1256CA: seeker_check (seeker.c:884)
==2211908==    by 0x1366AC: timer_expired (timeout.c:62)
==2211908==    by 0x1163D1: main (gossipd.c:1146)
==2211908==  Address 0x4cafdf0 is 48 bytes inside a block of size 88 free'd
==2211908==    at 0x48460C4: free (vg_replace_malloc.c:872)
==2211908==    by 0x1805EA: del_tree (tal.c:421)
==2211908==    by 0x1808BE: tal_free (tal.c:486)
==2211908==    by 0x12AB25: destroy_dc_from_conn (daemon_conn.c:112)
==2211908==    by 0x17FFDF: notify (tal.c:237)
==2211908==    by 0x180519: del_tree (tal.c:402)
==2211908==    by 0x1808BE: tal_free (tal.c:486)
==2211908==    by 0x16EE9A: io_close (io.c:450)
==2211908==    by 0x16ECA9: do_plan (io.c:401)
==2211908==    by 0x16ED16: io_ready (io.c:417)
==2211908==    by 0x1710B2: io_loop (poll.c:453)
==2211908==    by 0x1163C5: main (gossipd.c:1144)
==2211908==  Block was alloc'd at
==2211908==    at 0x484384F: malloc (vg_replace_malloc.c:381)
==2211908==    by 0x180064: allocate (tal.c:250)
==2211908==    by 0x180634: tal_alloc_ (tal.c:428)
==2211908==    by 0x12AB65: daemon_conn_new_ (daemon_conn.c:122)
==2211908==    by 0x1155F4: gossip_init (gossipd.c:763)
==2211908==    by 0x116014: recv_req (gossipd.c:999)
==2211908==    by 0x12A828: handle_read (daemon_conn.c:31)
==2211908==    by 0x16E09F: next_plan (io.c:59)
==2211908==    by 0x16ECD4: do_plan (io.c:407)
==2211908==    by 0x16ED16: io_ready (io.c:417)
==2211908==    by 0x1710B2: io_loop (poll.c:453)
==2211908==    by 0x1163C5: main (gossipd.c:1144)
==2211908== 
```
Breaks when there are no peers:

```
>       _fundchannel(l1, l2, amt_normal, None)

tests/test_connection.py:1564: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_connection.py:1535: in _fundchannel
    wait_for(lambda: not has_normal_channels(l2, l1))
contrib/pyln-testing/pyln/testing/utils.py:88: in wait_for
    while not success():
tests/test_connection.py:1535: in <lambda>
    wait_for(lambda: not has_normal_channels(l2, l1))
tests/test_connection.py:1527: in has_normal_channels
    for c in only_one(l1.rpc.listpeers(l2.info['id'])['peers'])['channels']])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = []

    def only_one(arr):
        """Many JSON RPC calls return an array; often we only expect a single entry
        """
>       assert len(arr) == 1
E       AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make sure it sees disconnect before reconnect, otherwise the next command
fails since we're now disconnected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids reuse which can cause confusion; the long-term fix is to
rewrite this to use real channels like dualfunding does.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This was missed in e8d2176.

```
>           raise ValueError(str(errors))
E           ValueError: 
E           Node errors:
E            - lightningd-2: had bad gossip messages
E            - lightningd-3: had bad gossip messages
E           Global errors:

contrib/pyln-testing/pyln/testing/fixtures.py:201: ValueError

...

0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Ignoring future channel_announcment for 105x1x2 (current block 104)
0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-gossipd: Bad gossip order: WIRE_CHANNEL_UPDATE before announcement 105x1x2/0
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell
Copy link
Contributor Author

Rebase on new master with setchannel support...

@rustyrussell rustyrussell merged commit be75518 into ElementsProject:master Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants