-
Notifications
You must be signed in to change notification settings - Fork 37
ref impl: driver agent: respond to head events and stay in sync #131
Conversation
Quick question: this is divergent from staging, I assume this is the canonical version? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think the architecture that we conceptually have three concurrent loops for checking l1 head, l2 head and taking a sync step (with the sync loop slowing down each time it doesn't move closer to sync) is pretty neat.
Also something that could use a description. I'm thinking we probably need some markdown implementation notes.
opnode/l2/driver.go
Outdated
e.Log.Error("failed to fetch L2 head info", "err", err) | ||
} | ||
cancel() | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to verify that the L2 head of the execution engine is either:
- something we're already aware of
- congruent with the referenced L1 block (i.e. valid) (in the case this is a head that was synced by the execution engine that we didn't set ourselves)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to track what the engine says it has to determine if we can go through the happy sync path. We trust our engine to have L2 blocks with correct L1 information (since we inserted it and set it with forkchoice updates). FindSyncStart can handle any out-of-sync cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we're making the assumption that we are the one inserting the L2 blocks, what's the purpose of getting the execution engine to tell us about new heads (which we already know about & track).
I guess the point of this is being a stub for later, when we do have L2-level sync, at which point we will need to validate the block, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the point of this is being a stub for later, when we do have L2-level sync, at which point we will need to validate the block, right?
We don't need to validate, state-sync via engine is always towards a block-hash we already trust (i.e. we messaged the engine with a forkchoice update, finalized or head, depending on trust assumptions). And even when the engine has the wrong head, we can just reorg away if it's not canonical.
153473a
to
538412b
Compare
Rebased on base branch |
3415946
to
8c74a7a
Compare
Went back and forth whether or not we need |
ad0b2c8
to
f5d7631
Compare
8c74a7a
to
6314c80
Compare
Rebased on the updated impl-sync-start PR (rebased that on main) |
My recommendation for this is to make the |
@trianglesphere implemented your suggestion (large refactor, no functional changes, but should pay off in test-ability), can you review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on the refactor. I think pulling it apart like this teases out the fundamental structure and makes it a bit easier to understand.
One comment was for more comments, and the other is about what I think's a bug. Feel free to merge this once you think you've got it resolved.
Codecov Report
@@ Coverage Diff @@
## main #131 +/- ##
==========================================
- Coverage 40.56% 34.26% -6.30%
==========================================
Files 8 11 +3
Lines 604 750 +146
==========================================
+ Hits 245 257 +12
- Misses 339 468 +129
- Partials 20 25 +5
Continue to review full report at Codecov.
|
Part of #119: staging -> main migration
This:
EngineDriver
is like a binding to a remote engine, tracking to where it's synced.Drive
is a sync process that can be started/shutdown per remote engine, and responds to head events from L1, and syncs (happy extend-case, or worst-case sync with reorg) the L2 chain by running a driver-step on the right starting-point. There is only a single active driving process at a time per engine.Depends on #130
Review: any team, systems team preferred.
Spec: Do we want to go-indepth, like back-off strategy and failure-cases of sync in the spec? Or maybe just the "if out of sync, run a driver-step on with L1 block
X_n
and L2 block parentY_{n-1}
" summary of driver routine behavior? (we already more or less have the latter)Testing: this is the most time/event/integration heavy of all of the reference implementation. Only a single event hub, but still challenging to unit-test. It works e2e though. Any suggestions/help how to best test this?