Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Libp2p quic second attempt #2159

Closed
wants to merge 18 commits into from
Closed

Libp2p quic second attempt #2159

wants to merge 18 commits into from

Conversation

dvc94ch
Copy link
Contributor

@dvc94ch dvc94ch commented Jul 26, 2021

Adds support for tls, removes the UnboundedSenders from the endpoint. The endpoint still has UnboundedReceivers, but that's not different than having a VecDeque. The transport dial api for example doesn't allow applying backpressure.

Closes paritytech/substrate#1334

@mxinden
Copy link
Member

mxinden commented Aug 9, 2021

Thanks @dvc94ch for giving this another shot.

As someone not directly involved in the many Rust QUIC efforts, could you compare this implementation on a high level to the many unfinished efforts out there, namely:

//CC @kpp who might be able to add some additional context as well.

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 9, 2021

well first demie wrote an implementation. then tomaka improved on it. it stalled mainly because of rustls/quinn-proto not having a new release for over a year. this implementation builds on previous work and resolves the open todo's. the tls implementation was updated to the latest spec and was tested with other implementations by kpp (correctness).

features it has that the old version didn't have are proper udp socket handling with ecn codes and gso (throughput). support for plugging in your own crypto (security). support for 0rtt encryption (latency). key logging to help debugging with wireshark (maintainability). other than correctness, throughput, latency, security and maintainability not really sure how a generic connection oriented transport can differentiate itself.

@kpp
Copy link
Contributor

kpp commented Aug 9, 2021

@tomaka ^

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 9, 2021

anyone interested in performing a first round of review?

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 11, 2021

@kpp if parity is interested in getting this merged, maybe you could give it a review?

@kpp
Copy link
Contributor

kpp commented Aug 11, 2021

Well, my issue is to finish @tomaka’s attempt.

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 11, 2021

Why do you think that is a better starting point?

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 11, 2021

Are you referring to https://github.com/kpp/rust-libp2p/tree/master/transports/tls-quic/src? It seems incomplete to me but the tls code does look quite neat.

@kpp
Copy link
Contributor

kpp commented Aug 12, 2021

Why do you think that is a better starting point?

That's an issue I was assigned.

the tls code does look quite neat.

Thank you! I will merge Pierre's code from https://github.com/tomaka/libp2p-rs/tree/quiccc-again/transports/quic into it.

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 12, 2021

So parity decided to finish it's branch now. How convenient.

I assume that the most contentious part is that there is noise support in this branch and you probably think it'll make your code much simpler to support just TLS. It doesn't make much difference as you saw in the crypto trait. Although there isn't support for non ed25519 keys, it's probably not that hard to add.

Even in the quiccc branch there are lots of open todos and missing features. Are you planning on fixing those?

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 12, 2021

In that case I'll continue maintaining libp2p-quic. I assume @mxinden will wait a few months to see what happens before deciding anything.

I'd strongly prefer for us to implement something that is already specified (i.e. TLS+QUIC) and not start creating parity-nquic that will get abandoned after 6 months.

Let's hope your branch doesn't get abandonned again in 6 months.

@tomaka
Copy link
Member

tomaka commented Aug 12, 2021

So parity decided to finish it's branch now. How convenient.

My (and Parity's) position has never ever changed: we want QUIC+TLS, not QUIC+Noise, and the reason the branch was stalled is that either nobody is working on it or (for the longest time) cross-dependencies issues w.r.t. quinn-proto and rustls.

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Aug 12, 2021

This branch supports TLS too, although @kpp seems to have done some improvements and cleanup.

@kpp
Copy link
Contributor

kpp commented Aug 16, 2021

So parity decided to finish it's branch now. How convenient.

I talked to Pierre. Let's unite on your PR.

transports/quic/src/tls/certificate.rs Outdated Show resolved Hide resolved
transports/quic/Cargo.toml Show resolved Hide resolved
transports/quic/Cargo.toml Outdated Show resolved Hide resolved
transports/quic/Cargo.toml Outdated Show resolved Hide resolved
transports/quic/src/transport.rs Show resolved Hide resolved
transports/quic/tests/smoke.rs Show resolved Hide resolved
transports/quic/Cargo.toml Outdated Show resolved Hide resolved
transports/quic/src/crypto.rs Outdated Show resolved Hide resolved
transports/quic/src/lib.rs Outdated Show resolved Hide resolved
@dvc94ch
Copy link
Contributor Author

dvc94ch commented Sep 11, 2021

so last missing thing is merging master and getting the build green. great work @kpp and @Demi-Marie

@kpp
Copy link
Contributor

kpp commented Sep 11, 2021

Let's wait #2226, #2227 and possibly #2228 before merging master.

kpp and others added 2 commits September 14, 2021 17:11
* protocols/gossipsub: Fix inconsistency in mesh peer tracking (libp2p#2189)

Co-authored-by: Age Manning <Age@AgeManning.com>

* misc/metrics: Add auxiliary crate to record events as OpenMetrics (libp2p#2063)

This commit adds an auxiliary crate recording protocol and Swarm events
and exposing them as metrics in the OpenMetrics format.

* README: Mention security@ipfs.io

* examples/: Add file sharing example (libp2p#2186)

Basic file sharing application with peers either providing or locating
and getting files by name.

While obviously showcasing how to build a basic file sharing
application, the actual goal of this example is **to show how to
integrate rust-libp2p into a larger application**.

Architectural properties

- Clean clonable async/await interface ([`Client`]) to interact with the
network layer.

- Single task driving the network layer, no locks required.

* examples/README: Give an overview over the many examples (libp2p#2194)

* protocols/kad: Enable filtering of (provider) records (libp2p#2163)

Introduce `KademliaStoreInserts` option, which allows to filter records.

Co-authored-by: Max Inden <mail@max-inden.de>

* swarm/src/protocols_handler: Impl ProtocolsHandler on either::Either (libp2p#2192)

Implement ProtocolsHandler on either::Either representing either of two
ProtocolsHandler implementations.

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

* *: Make libp2p-core default features optional (libp2p#2181)

Co-authored-by: Max Inden <mail@max-inden.de>

* core/: Remove DisconnectedPeer::set_connected and Pool::add (libp2p#2195)

This logic seems to be a leftover of
libp2p#889 and unused today.

* protocols/gossipsub: Use ed25519 in tests (libp2p#2197)

With f2905c0 the secp256k1 feature is
disabled by default. Instead of enabling it in the dev-dependency,
simply use ed25519.

* build(deps): Update minicbor requirement from 0.10 to 0.11 (libp2p#2200)

Updates the requirements on [minicbor](https://gitlab.com/twittner/minicbor) to permit the latest version.
- [Release notes](https://gitlab.com/twittner/minicbor/tags)
- [Changelog](https://gitlab.com/twittner/minicbor/blob/master/CHANGELOG.md)
- [Commits](https://gitlab.com/twittner/minicbor/compare/minicbor-v0.10.0...minicbor-v0.11.0)

---
updated-dependencies:
- dependency-name: minicbor
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): Update salsa20 requirement from 0.8 to 0.9 (libp2p#2206)

* build(deps): Update salsa20 requirement from 0.8 to 0.9

Updates the requirements on [salsa20](https://github.com/RustCrypto/stream-ciphers) to permit the latest version.
- [Release notes](https://github.com/RustCrypto/stream-ciphers/releases)
- [Commits](RustCrypto/stream-ciphers@ctr-v0.8.0...salsa20-v0.9.0)

---
updated-dependencies:
- dependency-name: salsa20
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* *: Bump pnet to v0.22

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>

* *: Dial with handler and return handler on error and closed (libp2p#2191)

Require `NetworkBehaviourAction::{DialPeer,DialAddress}` to contain a
`ProtocolsHandler`. This allows a behaviour to attach custom state to its
handler. The behaviour would no longer need to track this state separately
during connection establishment, thus reducing state required in a behaviour.
E.g. in the case of `libp2p-kad` the behaviour can include a `GetRecord` request
in its handler, or e.g. in the case of `libp2p-request-response` the behaviour
can include the first request in the handler.

Return `ProtocolsHandler` on connection error and close. This allows a behaviour
to extract its custom state previously included in the handler on connection
failure and connection closing. E.g. in the case of `libp2p-kad` the behaviour
could extract the attached `GetRecord` from the handler of the failed connection
and then start another connection attempt with a new handler with the same
`GetRecord` or bubble up an error to the user.

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

* core/: Remove deprecated read/write functions (libp2p#2213)

Co-authored-by: Max Inden <mail@max-inden.de>

* protocols/ping: Revise naming of symbols (libp2p#2215)

Co-authored-by: Max Inden <mail@max-inden.de>

* protocols/rendezvous: Implement protocol (libp2p#2107)

Implement the libp2p rendezvous protocol.

> A lightweight mechanism for generalized peer discovery. It can be used for
bootstrap purposes, real time peer discovery, application specific routing, and
so on.

Co-authored-by: rishflab <rishflab@hotmail.com>
Co-authored-by: Daniel Karzel <daniel@comit.network>

* core/src/network/event.rs: Fix typo (libp2p#2218)

* protocols/mdns: Do not fire all timers at the same time. (libp2p#2212)

Co-authored-by: Max Inden <mail@max-inden.de>

* misc/metrics/src/kad: Set query_duration lowest bucket to 0.1 sec (libp2p#2219)

Probability for a Kademlia query to return in less than 100 milliseconds
is low, thus increasing the lower bucket to improve accuracy within the
higher ranges.

* misc/metrics/src/swarm: Expose role on connections_closed (libp2p#2220)

Expose whether closed connection was a Dialer or Listener.

* .github/workflows/ci.yml: Use clang 11 (libp2p#2233)

* protocols/rendezvous: Update prost (libp2p#2226)

Co-authored-by: Max Inden <mail@max-inden.de>

* *: Fix clippy warnings (libp2p#2227)

* swarm-derive/: Make event_process = false the default (libp2p#2214)

Co-authored-by: Max Inden <mail@max-inden.de>

Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: Age Manning <Age@AgeManning.com>
Co-authored-by: Ruben De Smet <ruben.de.smet@rubdos.be>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: rishflab <rishflab@hotmail.com>
Co-authored-by: Daniel Karzel <daniel@comit.network>
Co-authored-by: David Craven <david@craven.ch>
@dvc94ch
Copy link
Contributor Author

dvc94ch commented Sep 14, 2021

@kpp, @DemiMarie can you approve the PR? I think all issues have been addressed.

@mxinden what are the next steps to get this merged?

Copy link
Contributor

@kpp kpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far so good

Copy link
Contributor

@DemiMarie DemiMarie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend running the tests thousands of times in a tight loop, using netem to simulate packet loss and/or reordering. This caught bugs I was working on my version, both in my code and in quinn-proto.

_dss: &DigitallySignedStruct,
) -> Result<HandshakeSignatureValid, TLSError> {
Err(TLSError::PeerIncompatibleError(
"Only TLS 1.3 certificates are supported".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not reachable as rustls should have aborted the handshake sooner, but panicking here is not critical.

@mxinden
Copy link
Member

mxinden commented Sep 25, 2021

@mxinden what are the next steps to get this merged?

I hope to find some time next week to give this an in-depth review. Thanks for the ping.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry for the delay here.

I still need more time for a full review. I got a couple questions which I am posting ahead of time. Let me know in case you would rather wait for the full review.

@@ -0,0 +1,199 @@
use libp2p_core::PeerId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of consistency, would you mind including license headers here?

/// UDP address to connect to.
addr: SocketAddr,
/// The remotes public key.
public_key: C::PublicKey,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that one can thus not dial a remote without their PeerId?


#[derive(Debug)]
pub struct TransportChannel<C: Crypto> {
tx: mpsc::UnboundedSender<ToEndpoint<C>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, this is the channel from libp2p not from the socket. other than erroring if the queue is full and trying to dial I don't see what we can do

I don't understand why the direction (libp2p -> socket) matters in regards to bounded or unbounded. Can you expand on that @dvc94ch?

public_key,
tx,
};
self.tx.unbounded_send(msg).expect("endpoint has crashed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can one not use a bounded channel here and await both the sending and the result through the oneshot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can do that, but what's the point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point would be to enable backpressure throughout the system. Through backpressure an upper layer component is paused when the lower layer is overwhelmed and woken up when the lower layer is ready to receive new items.

transports/quic/src/endpoint.rs Show resolved Hide resolved

if me.event_slot.is_none() && me.incoming_slot.is_none() {
let mut metas = [RecvMeta::default(); BATCH_SIZE];
let mut iovs = MaybeUninit::<[IoSliceMut; BATCH_SIZE]>::uninit();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming this is an optimization. If so, given the usage of unsafe code, what performance impact did this optimization show in benchmarks?

inner.connection.handle_event(event);
}

let _max_datagrams = inner.endpoint.max_datagrams();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method call have any side effects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this is when using quinn-proto from git which changes the api

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +154 to +155
// Break here so that the noise upgrade can finish.
return Poll::Pending;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on this? Why does this Future have to pause? Where is the waker of the task registered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
Event::Stream(StreamEvent::Readable { id }) => {
tracing::trace!("stream readable {}", id);
if let Some(substream) = inner.substreams.get_mut(&id) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a scenario where one would receive a StreamEvent::Readable but not have a reference to it in inner.substreams? If not, how about an expect here to enforce this assumption?

transports/quic/src/transport.rs Outdated Show resolved Hide resolved
Co-authored-by: Max Inden <mail@max-inden.de>
@dvc94ch
Copy link
Contributor Author

dvc94ch commented Oct 2, 2021

I no longer work for actyx (they downsized to 3 employees). I'm currently working on a startup with @rklaehn and @wngr and as such my priorities have shifted and I won't be able to devote time to this effort for the time being.

@kpp
Copy link
Contributor

kpp commented Oct 2, 2021

I am back next week. 🌴

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comments @dvc94ch!

A couple of follow ups on the backpressure discussions. I am sorry for being stubborn here. Still I think this is an important aspect of the implementation. I am happy to expand further if need be. Thanks for bearing with me.


#[derive(Debug)]
pub struct TransportChannel<C: Crypto> {
tx: mpsc::UnboundedSender<ToEndpoint<C>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't prevent a user from dos'ing himself

How does an upper layer component know whether the lower layer is able to keep up. In other words, how does the upper layer know when to add more items and when to slow down? Backpressure through bounded channels enforces just that.

public_key,
tx,
};
self.tx.unbounded_send(msg).expect("endpoint has crashed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point would be to enable backpressure throughout the system. Through backpressure an upper layer component is paused when the lower layer is overwhelmed and woken up when the lower layer is ready to receive new items.

&self,
id: ConnectionHandle,
) -> (ConnectionChannel<C>, mpsc::Sender<ConnectionEvent>) {
let (tx, rx) = mpsc::channel(12);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether the queue is needed at all. As a rule of thumb I follow the rule of Queues exist only to handle burstiness. Is creating new connections a bursty task? If not, why introduce a queue?

socket: UdpSocket,
crypto_config: Arc<CryptoConfig<C>>,
connections: FnvHashMap<ConnectionHandle, mpsc::Sender<ConnectionEvent>>,
outgoing: VecDeque<udp_socket::Transmit>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queues are used throughout the codebase,

That is correct, yes, though I would argue that those places are either (a) not relevant (e.g. libp2p-ping with little to no bandwidth usage) or (b) a deficiency introduced in the past.

That said, lower layers of libp2p do enforce backpressure today, e.g. the TCP transport (libp2p-tcp) or the Yamux implementation..

@mxinden
Copy link
Member

mxinden commented Oct 6, 2021

I no longer work for actyx (they downsized to 3 employees).

I am sorry to hear that.

I won't be able to devote time to this effort for the time being.

Understandable. I hope to be able to test this pull request sometime this or next week.

@DemiMarie
Copy link
Contributor

@mxinden @dvc94ch I wholeheartedly agree that the unbounded queues and the VecDeque need to go.

@kpp
Copy link
Contributor

kpp commented Oct 14, 2021

I no longer work for actyx (they downsized to 3 employees). I'm currently working on a startup with @rklaehn and @wngr and as such my priorities have shifted and I won't be able to devote time to this effort for the time being.

Should I open a new PR because I can push to this PR but to David'd branch through an intermediate PR?

@dvc94ch David, you didn't add license headers neither to https://github.com/ipfs-rust/libp2p-quic nor here. Would you please post license headers you prefer? I appreciate your work on this project, I can't add "Parity 2021". That would be inconvenient.

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Oct 14, 2021

Feel free to open a new PR. I guess as license header you can add Cloudpeer 2021

@dvc94ch dvc94ch closed this Oct 14, 2021
@kpp
Copy link
Contributor

kpp commented Oct 14, 2021

Copyright 2021 Cloudpeer. ?

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Oct 14, 2021

not officially founded yet, hq will be in romania, but I guess we can start establishing it as a good open source player.

@kpp
Copy link
Contributor

kpp commented Oct 14, 2021

Well, I believe the header should state the company you were working on while writing this code. Or your name if it was your pet project.

@dvc94ch
Copy link
Contributor Author

dvc94ch commented Oct 14, 2021

then David Craven if you prefer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Have the decl_storage! macro import rstd::prelude::*
5 participants