Offloading messages for async validation #169

Open
raulk opened this issue Mar 18, 2019 · 18 comments

raulk commented Mar 18, 2019

From @arnetheduck (Nimbus, ETH 2.0 client):

@raulk we discussed topic validation in libp2p as a way to prevent bad information from spreading across the gossipsub network, though from what I can tell, the block propagation filtering method in libp2p that you pointed me to is synchronous:

err := psubs[0].RegisterTopicValidator("foo", func(context.Context, peer.ID, *Message) bool { ... })

This might not sit well with block validation, where we might want to prevent gossiping a block until we've verified it based on data that we receive later. How would you recommend we approach this?

here's the scenario in detail:

  • over gossip, we receive a block whose parent we're missing
  • worst case, this means we cannot yet tell if it's a good / useful block or not
  • we don't want the block to be gossiped further until we've recovered its parent and made sure it's sane. Once we do know it's sane, we want to pass it on.

To summarise:

  1. Validation can be costly, or not feasible to perform synchronously in some scenarios.
  2. Is it feasible to consume the message, do validation offline, and then republish it? How does that affect message caches and duplicate detection across the network (e.g. if we send the message to peers who had already seen it -- and possibly even propagated it if they had more complete data than us)? Do we generate a new message ID?
  3. What are the differences on the wire between publishing a message afresh and spreading a gossiped message?

In a nutshell: is it possible to offload a message from the pubsub router for async validation, then resume its gossiping conditionally?
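
To make the constraint concrete, here is a rough sketch of how this looks with today's synchronous validator; RegisterTopicValidator is the real API, while decodeBlock, haveParent, fetchParent, pendingBlocks and validateAgainstParent are hypothetical application-side helpers:

err := ps.RegisterTopicValidator("beacon_block", func(ctx context.Context, from peer.ID, msg *pubsub.Message) bool {
    blk := decodeBlock(msg.Data)          // hypothetical decoder
    if !haveParent(blk) {
        // We cannot decide yet. Returning false drops the message, so the
        // application has to remember it and re-inject it once the parent
        // has been fetched and the block checked.
        pendingBlocks.Add(blk)            // hypothetical application-side store
        go fetchParent(ctx, blk)          // hypothetical fetch from the network
        return false
    }
    return validateAgainstParent(blk)     // hypothetical check against past state
})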


vyzo commented Mar 18, 2019

Validation is run asynchronously in a background goroutine.


raulk commented Mar 18, 2019

@vyzo the concern is not with blocking the gossip thread. The use case is that validation of message M is co-dependent on other messages M’ that could’ve arrived previously, but may not have. If they didn’t, the client can pull them from the network. That process can lengthen the validation of message M to seconds or more. All the while, Gossipsub has a 150ms validation timeout, and also a throttling gadget.

Would you mind addressing the questions above so we can all gain more clarity on this scenario? Thanks.
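
For reference, both knobs are per-validator options; the option names below are my best recollection of the go-libp2p-pubsub API and should be treated as an assumption:

// Assumed option names; the short per-message timeout and the bounded number
// of concurrent validations are what is meant above by the validation timeout
// and the throttling gadget.
err := ps.RegisterTopicValidator("beacon_block", validateBlock,
    pubsub.WithValidatorTimeout(150*time.Millisecond), // per-message validation deadline
    pubsub.WithValidatorConcurrency(128),              // cap on concurrent validations
)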


vyzo commented Mar 18, 2019

With the current implementation it's not possible. With quite a bit of work it may be possible.


raulk commented Mar 18, 2019

Ok, so the validator would have to fail when it enters the non-deterministic scenario. We’d need a callback for failed validations, so that those messages can be processed separately.

Once we’re able to validate the message, we’d have to republish it. What’s the trade-off in terms of amplification and dedup? (It’s still the same message)
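
A purely hypothetical shape for that callback and the later republish (none of these names exist in the library; Publish is the real call, but republishing mints a new message, which is exactly the dedup question here):

// Hypothetical: invoked by the router when a validator cannot reach a verdict.
type ValidationFailureHandler func(ctx context.Context, msg *pubsub.Message, reason error)

// Hypothetical re-publish path once the missing data has arrived and the
// message checks out; this creates a new message (new seqno, and with the
// default scheme a new message ID).
func republish(ps *pubsub.PubSub, topic string, msg *pubsub.Message) error {
    return ps.Publish(topic, msg.Data)
}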


vyzo commented Mar 18, 2019

It's a rather complex change to implement. The trade-off is that message propagation would be very slow, as a message wouldn't be forwarded until it could be validated.


raulk commented Mar 18, 2019

I think that tradeoff is known and accepted. They basically want nodes to forward only messages whose correctness can be verified against past state (e.g. one block depends on its parent). Since they’re async and eventually consistent, it’s possible that gossiped stuff arrives out of order. Also it’s possible that gossips never arrive, correct?

That’s ok. I’m more worried about the extra amplification: the message cache could’ve slid before the message is republished, so it could reach the entire network again, as gossipsub wouldn’t dedup it; they’d have to dedup in their own logic.

When you publish a message, can you force the original message ID?

@arnetheduck commented:

Re dedup: I don't think any sane eth2 client will rely on libp2p-level dedup. We have a block merkle root by which we identify the payload, both when requesting blocks and when receiving them from the network, and this root is persistent across sessions.

I'd regard that part of the protocol as a nice-to-have optimization, nothing else. In fact, I find it hard to imagine an application that relies on once-only ordered delivery on top of a gossip setting and is correct at the same time.
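
If dedup is keyed on content (the block root) rather than on the publisher's seqno, the router's message-ID function can be overridden; this sketch assumes the WithMessageIdFn option, which may postdate this discussion, with sha256 of the payload standing in for the block root:

ps, err := pubsub.NewGossipSub(ctx, host,
    pubsub.WithMessageIdFn(func(pmsg *pb.Message) string {
        sum := sha256.Sum256(pmsg.Data) // stand-in for the block's merkle root
        return string(sum[:])
    }),
)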

Perhaps the right thing to do here is simply not to broadcast the message again. It's kind of natural that broadcasts are ephemeral, and trying to get that behavior from a gossip network goes against its grain somewhat.

It does raise an interesting question: how would a sat-link connection with high latency affect the system? How is the cache timeout tuned? The problem can happen naturally, in the wild, as well.


raulk commented Mar 18, 2019

I’m talking about dedup insofar as controlling amplification is concerned, @arnetheduck. This is important to prevent cycling.


raulk commented Mar 18, 2019

(Of course apps should ensure idempotency when relying on pubsub.)

@arnetheduck commented:

I’m talking about dedup insofar as controlling amplification is concerned, @arnetheduck. This is important to prevent cycling.

yeah, sorry for being unclear there: that's what I was alluding to with the sat-link question - how is the anti-cycling tuned with respect to high-latency links?


raulk commented Mar 21, 2019

Right now it's not adaptive. We should explore this case together ;-) @arnetheduck


raulk commented Apr 3, 2019

Copying over from the ethresearch/p2p Gitter thread:

Kevin Mai-Hsuan Chia @mhchia 12:07
We can use a Validator to validate received content and return a boolean to
tell the pubsub whether to relay it or not. IMO in the simple cases the
current structure is enough for our usage. However, as in the situation
pointed out in the discussion, later blocks might be received before earlier
blocks. Then the Validator invocations for those "orphan blocks" will block,
and the Validators will time out. Even without the timeout, the number of
concurrent Validators might grow too large.

Raúl Kripalani @raulk 12:11
@mhchia thanks for rescuing that thread! a change to make validation async
would be welcome. it wouldn’t be too difficult. there’s already an
abstraction for datastores, so you would inject the datastore into the
pubsub router, and have it persist messages it is unable to validate
instantaneously, then spawn the validation job and report the result to
the router later. We’d need some form of GC to drop persisted messages
after a grace period, if the validation result never arrived.
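
A rough sketch of that flow, with an in-memory map standing in for the datastore; every name here is hypothetical:

// Hypothetical sketch: persist messages that cannot be validated right away,
// run validation out of band, and GC entries whose verdict never arrives
// within a grace period.
type deferredValidator struct {
    mu      sync.Mutex
    pending map[string]pendingEntry // keyed by message ID; a Datastore could back this instead
    grace   time.Duration
}

type pendingEntry struct {
    msg      *pubsub.Message
    deadline time.Time
}

func (d *deferredValidator) Defer(id string, msg *pubsub.Message) {
    d.mu.Lock()
    d.pending[id] = pendingEntry{msg: msg, deadline: time.Now().Add(d.grace)}
    d.mu.Unlock()
    go d.validate(id)
}

// validate would run the application's check and report the verdict back to
// the router; the mechanism for that report is exactly what this issue is about.
func (d *deferredValidator) validate(id string) {
    // fetch dependencies, check the message, then tell the router to forward or drop it
}

// gc drops persisted messages whose validation result never arrived in time.
func (d *deferredValidator) gc() {
    d.mu.Lock()
    defer d.mu.Unlock()
    now := time.Now()
    for id, e := range d.pending {
        if now.After(e.deadline) {
            delete(d.pending, id)
        }
    }
}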


raulk commented Apr 11, 2019

By popular demand, we need to take this up; see #172. I have a design in mind which I’ll post later, as I’m on mobile now.


raulk commented Apr 11, 2019

An async validator feature could look like this:

type AsyncValidationResult struct {
    msg     *pubsub.Message
    result  error
}

type AsyncValidator interface {
    // Queue queues a message for future validation. If error is nil, the implementation promises to 
    // validate the message and return the result in the supplied channel at a later time. 
    //
    // The async validator is responsible for offloading the message from memory when
    // appropriate. It can use a Datastore or some other medium for this.
    Queue(ctx context.Context, msg *pubsub.Message, resp chan<- AsyncValidationResult) error
}

We'd need to work out how offloading a message would impact message caches and sliding windows.
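
For illustration, a toy in-memory implementation of the proposed interface (the interface itself is only a proposal above, and the offloading-to-datastore part is elided):

// Toy implementation: runs the application-supplied check in a goroutine and
// reports the verdict on the supplied channel.
type inMemoryAsyncValidator struct {
    check func(*pubsub.Message) error
}

func (v *inMemoryAsyncValidator) Queue(ctx context.Context, msg *pubsub.Message, resp chan<- AsyncValidationResult) error {
    go func() {
        res := AsyncValidationResult{msg: msg, result: v.check(msg)}
        select {
        case resp <- res:
        case <-ctx.Done():
        }
    }()
    return nil
}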


vyzo commented Apr 11, 2019

The seen cache would be most severely impacted, as messages can be rebroadcast into the network way after the 120s cache duration.
We need to consider the effects of this.


vyzo commented Apr 11, 2019

In terms of structure, we can add an API for forwarding prepared messages (i.e. messages published by someone else, already signed).
This way we can offload the message for async validation. When the validator has completed, it can forward the message using the new API.
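
A hypothetical shape for that forwarding API; none of these names exist in go-libp2p-pubsub today:

// Hypothetical: re-inject a message that was published and signed by another
// peer, so it propagates as a relay (same message ID, same signature) rather
// than as a fresh publish.
type PreparedForwarder interface {
    ForwardPrepared(ctx context.Context, msg *pubsub.Message) error
}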

vyzo mentioned this issue Apr 25, 2019
ghost added the in progress label Apr 25, 2019

vyzo commented Apr 26, 2019

#176 supports long-running validators in the simplest possible manner:
it removes the default (short) timeout and allows validators to run arbitrarily long, without any need for API changes or complex contraptions.


vyzo commented Apr 26, 2019

Note that you need to adjust the time cache duration accordingly.

On the other hand, there is still a use case for completely offline validators, which could take days to complete.
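
Concretely, with #176 a validator can simply block until it can reach a verdict, provided the seen-cache window is widened to match; TimeCacheDuration below is my recollection of the package-level knob and should be treated as an assumption:

// Assumed package-level knob for the seen cache (defaults to ~120s); it must
// be set before the router is constructed.
pubsub.TimeCacheDuration = 10 * time.Minute

ps, _ := pubsub.NewGossipSub(ctx, host)

// With #176 there is no default validation timeout, so the validator may block,
// e.g. until a missing parent block has been fetched and checked.
_ = ps.RegisterTopicValidator("beacon_block", func(ctx context.Context, from peer.ID, msg *pubsub.Message) bool {
    return validateWhenParentAvailable(ctx, msg) // hypothetical long-running check
})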
