-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move settlement queue to the driver #3129
Conversation
/// Archive node URL used to index CoW AMM | ||
archive_node_url: Option<Url>, | ||
} | ||
|
||
fn default_settle_queue_size() -> usize { | ||
3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it makes sense to keep more than 3 pending settlements, at least for mainnet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to still review better the spawning thread logic, but it would be really cool if we could have e2e tests in which we can hit the limit 🙏
Yes, before implementing it, I just wanted to finalize the approach itself to avoid back-and-forth changes. |
This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed. |
@squadgazzz should I review this or aren't we going to have the queue in the end? |
@m-lord-renkse, we still need the queue for solvers that cannot handle multiple settlements in the same block. |
"settle deadline exceeded. unable to return a response" | ||
); | ||
} | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we reach here is because the deadline hit before we could even call /settle
, it could be the first call or it could be the 5th one. Shouldn't we log here for debug purposes? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we reach here, response_sender
successfully sent the error, where receiver logs the error:
services/crates/driver/src/domain/competition/mod.rs
Lines 352 to 355 in c7dd81d
response_rx.await.map_err(|err| { | |
tracing::error!(?err, "Failed to dequeue /settle response"); | |
Error::SubmissionError | |
})? |
.await | ||
.unwrap_or_else(|_| Err(competition::Error::SubmissionError))?) | ||
let sender = state.settle_queue_sender(); | ||
let (response_tx, response_rx) = oneshot::channel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be created within settle_queue_sender
? as we have now, we create sender
and then response_tx
just to be passed to sender
in send()
function. The queue sender should create its corresponding resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume, this is addressed now.
competition::Error::UnableToEnqueue | ||
})?; | ||
|
||
Ok(response_rx.await.map_err(|err| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we have a timeout here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, since the mempool internally checks for the submission deadline and the task will be dropped once it is reached:
services/crates/driver/src/domain/mempools.rs
Lines 124 to 133 in bcd024e
if block.number >= submission_deadline { | |
tracing::info!( | |
?hash, | |
deadline = submission_deadline, | |
current_block = block.number, | |
"tx not confirmed in time, cancelling", | |
); | |
self.cancel(mempool, settlement.gas.price, solver).await?; | |
return Err(Error::Expired); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea for this change makes sense to me but the implementation seems awkward and at the wrong place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; just some small suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, assuming comments above will be addressed.
Must be merged together with #3133 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing major, logic looks good.
BTW, what happens to colocated drivers running reference driver implementation? Could they experience issues if they don't pull the latest changes that contain this PR and we apply the release? Is it a lot of work to add a configuration parameter to autopilot that will switch from queued to non-queued submission? This way we could:
OR
|
The only thing I can think of is that we return to the state where no settlement queue exists for colocated solvers, so some of them can again suffer from receiving settlement requests after the deadline is exceeded, which happens rarely.
Not sure I am following why we need this.
Probably, makes sense. So I'll merge this after the next weekly release to have some time to run it in staging, and solvers will also have time for an update. |
# Conflicts: # crates/driver/src/domain/competition/mod.rs # crates/driver/src/infra/api/mod.rs # crates/driver/src/infra/api/routes/solve/mod.rs # crates/driver/src/infra/config/file/load.rs # crates/driver/src/infra/config/file/mod.rs # crates/driver/src/infra/solver/mod.rs
# Conflicts: # crates/driver/src/infra/config/file/mod.rs
Description
This is a follow-up to #3116, which addresses a suggestion of moving the settlement queue to the driver. This should help avoid increasing the block deadline since it can still be calculated starting from the simulation block.
Changes
https://github.com/cowprotocol/infrastructure/pull/2241 should be reverted.
How to test
Existing tests. More driver tests will be implemented once the approach is approved.