Offloading messages for async validation #169
Comments
Validation is run asynchronously in a background goroutine.
@vyzo the concern is not with blocking the gossip thread. The use case is that validation of message M is co-dependent on other messages M’ that could’ve arrived previously, but may not have. If they didn’t, the client can pull them from the network. That process can stretch validation of message M to seconds or more. All the while, Gossipsub has a 150ms validation timeout, and also a throttling gadget. Would you mind addressing the questions above so we can all gain more clarity on this scenario? Thanks.
With the current implementation it's not possible. With quite a bit of work it may be possible.
Ok, so the validator would have to fail when it enters the non-deterministic scenario. We’d need a callback for failed validations, so that those messages can be processed separately. Once we’re able to validate the message, we’d have to republish it. What’s the trade-off in terms of amplification and dedup? (It’s still the same message)
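The flow described in this comment (fail validation when a dependency is missing, hand the message to a separate path, retry once the dependency shows up) can be sketched roughly as follows. This is an application-level illustration only, not gossipsub code: `Block`, `Node`, `ErrMissingDependency` and `forwardOrder` are hypothetical names, and forwarding is modelled as a plain callback.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingDependency is a hypothetical sentinel: the message cannot be
// judged yet because something it depends on has not arrived.
var ErrMissingDependency = errors.New("missing dependency")

// Block is a stand-in for an application message that depends on a parent.
type Block struct {
	ID     string
	Parent string // empty for a block with no parent
}

// Node keeps validated blocks and parks those whose parent is unknown.
type Node struct {
	known   map[string]bool
	pending map[string][]Block // parent ID -> blocks waiting on it
	Forward func(Block)        // called once a block validates
}

func NewNode(forward func(Block)) *Node {
	return &Node{known: map[string]bool{}, pending: map[string][]Block{}, Forward: forward}
}

// validate succeeds only if the parent has already been validated.
func (n *Node) validate(b Block) error {
	if b.Parent != "" && !n.known[b.Parent] {
		return ErrMissingDependency
	}
	return nil
}

// Receive validates b. On ErrMissingDependency the block is parked instead of
// dropped; on success it is forwarded and any blocks waiting on it are retried.
func (n *Node) Receive(b Block) {
	if n.known[b.ID] {
		return // already processed
	}
	if err := n.validate(b); errors.Is(err, ErrMissingDependency) {
		n.pending[b.Parent] = append(n.pending[b.Parent], b)
		return
	}
	n.known[b.ID] = true
	n.Forward(b)
	waiting := n.pending[b.ID]
	delete(n.pending, b.ID)
	for _, w := range waiting {
		n.Receive(w)
	}
}

// forwardOrder feeds blocks to a fresh node and returns the IDs in the order
// in which they were forwarded.
func forwardOrder(blocks []Block) []string {
	var out []string
	n := NewNode(func(b Block) { out = append(out, b.ID) })
	for _, b := range blocks {
		n.Receive(b)
	}
	return out
}

func main() {
	// "c" and "b" arrive before their ancestors; nothing is forwarded until "a" lands.
	fmt.Println(forwardOrder([]Block{{"c", "b"}, {"b", "a"}, {"a", ""}}))
}
```

A block whose parent never arrives simply stays parked here; a real client would presumably bound the pending set or expire entries.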
It's a rather complex change to implement. The trade-off is that message propagation would be very slow, as the message wouldn't be forwarded until it could be validated.
I think that trade-off is known and accepted. They basically want nodes to forward only messages whose correctness can be verified against past state (e.g. one block depends on its parent). Since they’re async and eventually consistent, it’s possible that gossiped messages arrive out of order. It’s also possible that gossiped messages never arrive, correct? That’s ok. I’m more worried about the extra amplification: the message cache could’ve slid before the message is republished, so it could reach the entire network again, since gossipsub wouldn’t dedup it; they’d have to dedup in their own logic. When you publish a message, can you force the original message ID?
Re dedup, I don't think any sane eth2 client will rely on libp2p-level dedup - we have a block merkle root by which we identify payloads, both when requesting them and when receiving them from the network - this root is persistent across sessions. I'd regard that part of the protocol as a nice-to-have optimization, nothing else. In fact, I find it hard to imagine an application that relies on once-only ordered delivery on top of a gossip setting and is correct at the same time. Perhaps the right thing to do here is simply not to broadcast the message again. It's kind of natural that broadcasts are ephemeral, and trying to get that behavior from a gossip network goes against its grain somewhat. It does raise an interesting question: how would a sat-link connection with high latency affect the system? How is the cache timeout tuned? The problem can happen naturally, in the wild, as well.
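As a rough illustration of the application-level dedup described in this comment, the sketch below keys a seen-set on an identifier derived from the payload alone. It is a minimal sketch under stated assumptions: SHA-256 over the raw bytes is only a stand-in for the block merkle root, and `Deduper`, `contentID` and `FirstTime` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentID derives an identifier from the payload alone. A real eth2 client
// would use the block's merkle root; SHA-256 is only a stand-in here.
func contentID(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// Deduper remembers which payloads the application has already handled.
type Deduper struct {
	seen map[string]struct{}
}

func NewDeduper() *Deduper { return &Deduper{seen: make(map[string]struct{})} }

// FirstTime reports whether payload is new, and records it if so.
func (d *Deduper) FirstTime(payload []byte) bool {
	id := contentID(payload)
	if _, ok := d.seen[id]; ok {
		return false
	}
	d.seen[id] = struct{}{}
	return true
}

func main() {
	d := NewDeduper()
	fmt.Println(d.FirstTime([]byte("block 1"))) // true
	fmt.Println(d.FirstTime([]byte("block 1"))) // false: same payload, whoever sent it
	fmt.Println(d.FirstTime([]byte("block 2"))) // true
}
```

Because the identifier depends only on the content, it does not matter how often, or how late, the transport delivers the same payload.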
I’m talking about dedup insofar as controlling amplification is concerned @arnetheduck. This is important to prevent cycling.
(Of course apps should ensure idempotency when relying on pubsub.)
yeah, sorry for being unclear there: that's what I was alluding to with the sat-link question - how is the anti-cycling tuned with respect to high-latency links?
Right now it's not adaptive. We should explore this case together ;-) @arnetheduck
Copying over from the ethresearch/p2p Gitter thread:
By popular petition, we need to take this up, see #172. I have a design in mind which I’ll post later as I’m on mobile now.
An async validator feature could look like this:

    type AsyncValidationResult struct {
        msg    *pubsub.Message
        result error
    }

    type AsyncValidator interface {
        // Queue queues a message for future validation. If error is nil, the implementation promises to
        // validate the message and return the result in the supplied channel at a later time.
        //
        // The async validator is responsible for offloading the message from memory when
        // appropriate. It can use a Datastore or some other medium for this.
        Queue(ctx context.Context, msg *pubsub.Message, resp chan<- AsyncValidationResult) error
    }

We'd need to work out how offloading a message would impact message caches and sliding windows.
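To make the proposed interface a bit more concrete, here is one way an implementation might look. It is only a sketch under several assumptions: `Message` is a local stand-in for `pubsub.Message` so the example has no dependencies, the result fields are exported, and `memValidator`, `ErrQueueFull` and `validateAll` are hypothetical. It keeps everything in memory, whereas the proposal suggests offloading to a Datastore.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Message stands in for pubsub.Message so the sketch has no dependencies.
type Message struct {
	ID   string
	Data []byte
}

// AsyncValidationResult mirrors the proposed struct, with exported fields.
type AsyncValidationResult struct {
	Msg    *Message
	Result error
}

// AsyncValidator is the proposed interface, rewritten against Message.
type AsyncValidator interface {
	Queue(ctx context.Context, msg *Message, resp chan<- AsyncValidationResult) error
}

// ErrQueueFull is a hypothetical error for a validator that is at capacity.
var ErrQueueFull = errors.New("async validator queue full")

// memValidator is a hypothetical in-memory implementation: one worker
// goroutine drains a bounded queue of validation jobs.
type memValidator struct {
	check func(*Message) error
	jobs  chan func()
}

func newMemValidator(ctx context.Context, size int, check func(*Message) error) *memValidator {
	v := &memValidator{check: check, jobs: make(chan func(), size)}
	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case run := <-v.jobs:
				run()
			}
		}
	}()
	return v
}

// Queue accepts msg if there is room and fails fast otherwise, so the caller
// is never blocked by slow validation. resp must be buffered or drained.
func (v *memValidator) Queue(ctx context.Context, msg *Message, resp chan<- AsyncValidationResult) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	run := func() { resp <- AsyncValidationResult{Msg: msg, Result: v.check(msg)} }
	select {
	case v.jobs <- run:
		return nil
	default:
		return ErrQueueFull
	}
}

// validateAll queues every payload and reports, per message ID, whether it passed.
func validateAll(payloads map[string]string) map[string]bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	var v AsyncValidator = newMemValidator(ctx, len(payloads), func(m *Message) error {
		if len(m.Data) == 0 {
			return errors.New("empty payload")
		}
		return nil
	})
	resp := make(chan AsyncValidationResult, len(payloads))
	queued := 0
	for id, data := range payloads {
		if v.Queue(ctx, &Message{ID: id, Data: []byte(data)}, resp) == nil {
			queued++
		}
	}
	out := make(map[string]bool)
	for i := 0; i < queued; i++ {
		r := <-resp
		out[r.Msg.ID] = r.Result == nil
	}
	return out
}

func main() {
	fmt.Println(validateAll(map[string]string{"m1": "hello", "m2": ""}))
}
```

The sketch says nothing about what the router does with a result that arrives after its caches have moved on, which is the open question raised in this comment.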
The seen cache would be most severely impacted, as messages can be rebroadcast into the network well after the 120s cache duration.
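The effect on a time-bounded seen cache can be shown with a toy model. This is a simplified stand-in, not the cache used by the library: `seenCache` is a hypothetical type and times are plain seconds so the behaviour is deterministic.

```go
package main

import "fmt"

// seenCache is a simplified stand-in for a time-bounded "seen" cache.
// Times are plain seconds so the example is deterministic.
type seenCache struct {
	ttl     int64
	expires map[string]int64 // message ID -> time at which the entry expires
}

func newSeenCache(ttlSeconds int64) *seenCache {
	return &seenCache{ttl: ttlSeconds, expires: make(map[string]int64)}
}

// Seen reports whether id has an unexpired entry, and records it if not.
func (c *seenCache) Seen(id string, now int64) bool {
	if exp, ok := c.expires[id]; ok && now < exp {
		return true
	}
	c.expires[id] = now + c.ttl
	return false
}

func main() {
	c := newSeenCache(120)
	fmt.Println(c.Seen("msg-1", 0))   // false: first sighting
	fmt.Println(c.Seen("msg-1", 60))  // true: duplicate inside the window
	fmt.Println(c.Seen("msg-1", 300)) // false: the window has passed, so it looks new
}
```

A message that is validated and rebroadcast after the window has passed is indistinguishable from a fresh one, which is why it could propagate through the network a second time.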
In terms of structure, we can add an API for forwarding prepared messages (i.e. messages published by someone else, already signed).
#176 supports long-running validators in the simplest possible manner:
Note that you need to adjust the time cache duration accordingly. On the other hand, there is still a use case for completely offline validators, which could take days to complete.
From @arnetheduck (Nimbus, ETH 2.0 client):
To summarise:
In a nutshell: is it possible to offload a message from the pubsub router for async validation, then resume its gossiping conditionally?