-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce overhead around DKG #3770
Milestone
Comments
This was referenced Feb 6, 2024
Merged
tomaszslabon
added a commit
that referenced
this issue
Feb 8, 2024
Refs: #3770 Closes: #3761 Here we upgrade all libp2p libraries to the recent versions. To make it possible, we were also forced to bump the Go version from 1.18 to 1.20. This is the minimum version supported by recent libp2p packages. I recommend reviewing commit by commit where specific changes are described in detail. Here is a brief summary of what has been done: ### Upgrade Go from 1.18 to 1.20 Upgrade of Go resulted in a need to: - Adjust the return type of the `slices.SortFunc` compare function we are using in one unit test. This is because the version of the `golang.org/x/exp` package had to be bumped up as well. The returned type of the compare function used in `slices.SortFunc` was changed from `bool` to `int` somewhere between (5a980c7) - Fix the `TestCoordinationExecutor_Coordinate` which started to be flaky due to a changed behavior of `ecdsa.GenerateKey`. [Since Go 1.20](golang/go#58637), the returned key no longer depends deterministically on the bytes read from the provided RNG, and may change between calls and/or between versions (2ed7179) - Fix the `TestWalletRegistry_getWalletByPublicKeyHash_NotFound` which used a dummy curve point. Since Go 1.19, such a behavior leads to a panic (50b6bd6) - Reformat code using the new `gofmt` version (3c2274e) - Adjust the Dockerfile (8e07451) - Bump `staticcheck` version used by CI and fix the new warnings about deprecated standard library functions by replacing them as recommended (a87eea3) ### Upgrade of libp2p libraries Upgrade of libp2p packages forced us to: - Adjust `go-libp2p-core` imports to be `go-libp/core` as this package was moved to the `go-libp2p` monorepo (95d60b8) - Adjust our `transport` and `authenticatedConnection` implementations to expose some additional functions required by libp2p interfaces (ac01765) - Set up our `transport` differently due to the changes around libp2p `Security` option (110fbb3, 6953b79)
tomaszslabon
added a commit
that referenced
this issue
Feb 8, 2024
Refs: #3770 Depends on: #3771 Recent libp2p versions (we started to use them in #3771) introduced a way to set the seen messages cache TTL and strategy. Here we leverage those settings to reduce the excessive message flooding effect that sometimes occurs on mainnet. This pull request consists of two steps ### Use longer TTL for pubsub seen messages cache Once a message is received and validated, pubsub re-broadcasts it to other peers and puts it into the seen messages cache. This way, subsequent arrivals of the same message are not re-broadcasted unnecessarily. This mechanism is important for the network to avoid excessive message flooding. The default value used by libp2p is 2 minutes. However, Keep client messaging sessions are quite time-consuming so, we use a longer TTL of 5 minutes to reduce flooding risk even further. Worth noting that this time cannot be too long as the cache may grow excessively and impact memory consumption. ### Use `LastSeen` as seen messages cache strategy By default, the libp2p seen messages cache uses the `FirstSeen` strategy which expires an entry once TTL elapses from when it was added. This means that if a single message is being received frequently and consistently, pubsub will re-broadcast it every TTL, rather than never re-broadcasting it. In the context of the Keep client which additionally uses app-level retransmissions, that often leads to a strong message amplification in the broadcast channel which causes a significant increase in the network load. As the problem is quite common (see libp2p/go-libp2p-pubsub#502), the libp2p team added a new `LastSeen` strategy which behaves differently. This strategy expires an entry once TTL elapses from the last time the message was touched by a cache write (`Add`) or read (`Has`) operation. That gives the desired behavior of never re-broadcasting a message that was already seen within the last TTL period. This reduces the risk of unintended over-amplification.
tomaszslabon
added a commit
that referenced
this issue
Feb 8, 2024
Refs: #3770 Depends on: #3775 The currently used DKG retry mechanism based on random exclusion turned out to be ineffective for a higher number of participating operators. Such retries have a very small chance of success and produce a lot of unnecessary network traffic that consumes bandwidth and CPU excessively. Here we aim to improve the situation. First, we are making DKG a single-shot process that fails fast if the result cannot be produced during the first attempt. Second, we are doubling down the announcement period to maximize participation chances for all selected operators, even those at the edge of the network. Last but not least, we are reducing the submission delay that is preserved between operators attempting to submit the final result on-chain. All those changes combined allow us to achieve shorter DKG iterations that can be timed out quicker. This way, we will be able to repeat DKG more often, with different operator sets. Last but not least, we are also changing the re-transmission strategy for the `resultSigningState` which was still using `StandardRetransmissionStrategy` with retransmissions occurring on each tick. All DKG states use the `BackoffRetransmissionStrategy` strategy which leads to a sparse distribution of retransmissions and is considered more lightweight. There is no point in making an exception for the `resultSigningState`. This should reduce network load in case one of the participants fails at the end of the protocol.
tomaszslabon
added a commit
that referenced
this issue
Feb 9, 2024
Closes: #3770 Here we are tweaking some logs, mostly related to the network layer. The goal here is to make debugging easier. We are trying to achieve that by adding additional logs that may help during that process and limiting the unuseful ones. I recommend reviewing commit by commit. A short summary of changes: - Set libp2p pubsub logs to `warn` by default and make that customizable using the `PUBSUB_LOG_LEVEL` environment variable (b725e1c) - Log members that are not ready during DKG/signing network announcement (551e57a) - Log distinct operators participating in DKG (980c85e) - Execute a ping test for freshly connected peers (b3d97ff)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The extension of the beta operators set made it clear that the current DKG process is resource-heavy and problematic for operators. The current retry scheme is inefficient, a huge amount of network traffic is generated, and logs are often clunky. This task aims to improve the situation.
Tasks
The text was updated successfully, but these errors were encountered: