bugfix: check that timer servicing worked #4846

ggreif · 2025-01-14T16:42:40Z

This fixes a disappearing timer situation described by Timo in https://dfinity.slack.com/archives/CPL67E7MX/p1736347600078339.

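It turns out that under high message load the asynchronous timer-servicing routine cannot be run, because the self-call that would run it cannot be enqueued. The fix is simple: check whether the self-call succeeded (a failed send already causes a throw) and, if not, set the global timer to expire very soon so that servicing is retried as soon as possible (in the top-level catch).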
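A minimal OCaml sketch of the retry idea, under stated assumptions: the send queue is modelled as a counter with a fixed capacity, a failed self-call raises a local `Send_failed` exception, and the names `self_call`, `on_global_timer` and `retry_delay_ns` are hypothetical. None of them are identifiers from the moc runtime or the Motoko prelude.

```ocaml
(* Hypothetical model of the fix, not the moc runtime: the outgoing-call
   queue has a fixed capacity and a failed self-call raises an exception. *)
exception Send_failed

type canister = {
  capacity : int;                     (* assumed queue limit *)
  mutable queued : int;               (* self-calls currently enqueued *)
  mutable global_timer : int option;  (* next global-timer expiry, in ns *)
}

(* Enqueue a self-call, raising [Send_failed] when the queue is full. *)
let self_call c =
  if c.queued >= c.capacity then raise Send_failed
  else c.queued <- c.queued + 1

(* Hypothetical "very near" retry delay, in ns. *)
let retry_delay_ns = 1

(* Global-timer handler: submit the servicing self-call; if the send fails,
   re-arm the global timer for an ASAP retry instead of losing the timers. *)
let on_global_timer c ~now =
  match self_call c with
  | () -> `Submitted
  | exception Send_failed ->
      c.global_timer <- Some (now + retry_delay_ns);
      `Retry_scheduled

let () =
  let busy = { capacity = 1; queued = 1; global_timer = None } in
  match on_global_timer busy ~now:100 with
  | `Retry_scheduled ->
      Printf.printf "retry at %d\n" (Option.get busy.global_timer)
  | `Submitted -> print_endline "submitted"
```

With a full queue the handler re-arms the global timer at `now + 1` rather than silently dropping the pending expirations.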

TODO:

- catch send errors for user workers (and mitigate): see Mitigate timer job submission failures #4852
- document that the user thunk may be called more than once and should therefore have no side effects other than submitting the self-call: see doc: add notes that timer jobs should be side-effect-free motoko-base#682
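Why the thunk should be side-effect-free can be seen in a small OCaml sketch. A synchronous `retry` loop stands in for the timer-driven retry, and `make_send`, `retry` and `Send_failed` are hypothetical names: a job that does anything besides submitting its self-call repeats that work on every attempt.

```ocaml
exception Send_failed

(* Hypothetical send queue that rejects the first [fail_first] sends. *)
let make_send ~fail_first =
  let remaining = ref fail_first in
  fun () -> if !remaining > 0 then (decr remaining; raise Send_failed)

(* Re-invoke [thunk] until its self-call is accepted, up to [max_tries]
   attempts; returns whether it was eventually accepted. *)
let rec retry ~max_tries thunk =
  if max_tries <= 0 then false
  else
    match thunk () with
    | () -> true
    | exception Send_failed -> retry ~max_tries:(max_tries - 1) thunk

let () =
  let send = make_send ~fail_first:1 in
  let side_effects = ref 0 in
  (* A job with an extra side effect besides its self-call: *)
  let bad_thunk () = incr side_effects; send () in
  ignore (retry ~max_tries:3 bad_thunk);
  Printf.printf "side effect ran %d times\n" !side_effects
```

One rejected send is enough for the extra side effect to run twice.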

src/prelude/internals.mo

github-actions · 2025-01-14T18:43:28Z

Comparing from 607ecfd to 77fa565:
In terms of gas, no changes are observed in 5 tests.
In terms of size, no changes are observed in 5 tests.


src/mo_frontend/typing.mli

crusso · 2025-01-15T11:53:26Z

src/ir_passes/await.ml

+(* if self-call queue full: expire global timer soon and retry *)
+and t_timer_throw context exp =
+  t_on_throw context exp
+    (blockE [expD (primE


Are we sure that a full queue is the only possible cause of the error? If not, it might make sense to test the error code and only set the global timer when appropriate.

Judging from the generated Wasm, a send failure is the only possibility.
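The error-code check suggested above could look like the following OCaml sketch. The `call_error` type and its constructors are hypothetical stand-ins, not moc's or Motoko's actual error types; since only the send-failure case can arise at this point, the check would be purely defensive.

```ocaml
(* Hypothetical classification of why a timer self-call failed; the
   constructor names are illustrative only. *)
type call_error =
  | Send_failed of int   (* call could not be enqueued; int = error code *)
  | Rejected of string   (* any other rejection reaching the handler *)

(* Only a send failure should re-arm the global timer for a quick retry;
   other errors would be propagated unchanged. *)
let should_retry_soon = function
  | Send_failed _ -> true
  | Rejected _ -> false

(* Hypothetical retry delay, in ns. *)
let retry_delay_ns = 1

(* New global-timer expiry, if any, for a given failure. *)
let next_global_timer ~now err =
  if should_retry_soon err then Some (now + retry_delay_ns) else None
```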

crusso

LGTM so far. What will you do about failure to enqueue tasks?

ggreif · 2025-01-15T13:09:19Z

> LGTM so far. What will you do about failure to enqueue tasks?

A follow-up PR on top of this one, with logic to re-insert the failed expirations at the head of the priority queue (in order).
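The ordered reinsertion can be sketched in OCaml, assuming the priority queue is a plain list sorted by expiration time (the queue in the Motoko prelude may be structured differently). `job`, `by_expiry` and `reinsert_failed` are hypothetical names.

```ocaml
(* Hypothetical model: the timer priority queue is a list of jobs sorted by
   expiration time, earliest first. *)
type job = { expire : int; id : int }

let by_expiry a b = compare a.expire b.expire

(* Re-insert failed jobs in expiration order. [List.merge] keeps elements of
   its first argument ahead of equal elements of the second, so on a tie a
   failed job is retried before a job that has not run yet. Since failed
   jobs were already due, they normally all end up at the head. *)
let reinsert_failed failed queue =
  List.merge by_expiry (List.stable_sort by_expiry failed) queue

let () =
  let queue = [ { expire = 20; id = 3 }; { expire = 30; id = 4 } ] in
  let failed = [ { expire = 10; id = 2 }; { expire = 5; id = 1 } ] in
  reinsert_failed failed queue
  |> List.iter (fun j -> Printf.printf "%d " j.id);
  print_newline ()
```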



berestovskyy

It seems to fix the issue: I can no longer reproduce it on my branch https://github.com/dfinity/ic/commits/andriy/motoko-timers-repro/

This deals with the (unlikely) possibility that the send queue is not full when the timer-servicing action is submitted but becomes full while the user jobs are being submitted. We now catch the failure and re-add the (single-expiration) jobs to the start of the priority queue. This is the missing piece of #4846. The change is incremental, so the happy path is left untouched; a rewrite that collapses job gathering and the self-sends would be justified. There is also an optimisation realised in `@prune`.
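The partial-submission case can be sketched as follows, again in OCaml with hypothetical names (`submit_all`, `Send_failed`) and a send queue modelled as a counter. Submission stops at the first failure, and the unsent jobs, including the one that failed, go back ahead of the jobs that are not yet due. Simply prepending them is only valid under the assumption that every unsent job expires no later than the remaining queue entries.

```ocaml
exception Send_failed

(* Submit each due job's self-call in order. On the first send failure,
   stop and return the jobs not yet submitted (including the one that
   failed) so they can be put back at the front of the priority queue. *)
let rec submit_all (send : 'a -> unit) (jobs : 'a list) : 'a list =
  match jobs with
  | [] -> []
  | j :: rest -> (
      match send j with
      | () -> submit_all send rest
      | exception Send_failed -> jobs)

let () =
  (* Hypothetical send queue with room for only two more calls. *)
  let room = ref 2 in
  let send j =
    if !room = 0 then raise Send_failed
    else (decr room; Printf.printf "submitted job %d\n" j)
  in
  let pending = [ 5; 6 ] in
  let unsent = submit_all send [ 1; 2; 3; 4 ] in
  (* Unsent jobs go back ahead of the jobs that were not yet due. *)
  let queue = unsent @ pending in
  List.iter (fun j -> Printf.printf "queued job %d\n" j) queue
```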

crusso

LGTM

check that servicing worked

4a0c1b1

ggreif changed the title ~~bugfix: check that servicing worked~~ bugfix: check that timer servicing worked Jan 14, 2025

ggreif added 3 commits January 14, 2025 17:48

do the same for @timer_helper too

4aaefba

fix

445a2ba

more fixes

501b533

ggreif commented Jan 14, 2025

src/prelude/internals.mo (outdated, resolved)

make it more efficient

c9b442a

ggreif self-assigned this Jan 14, 2025

ggreif added the build_artifacts label (Upload moc binary as workflow artifacts) Jan 14, 2025

refactor

9793747

ggreif marked this pull request as ready for review January 14, 2025 19:10

ggreif requested a review from a team as a code owner January 14, 2025 19:10

ggreif requested review from crusso and berestovskyy January 14, 2025 21:15

ggreif marked this pull request as draft January 15, 2025 10:26

ggreif added 2 commits January 15, 2025 11:54

WIP: use the right hook for timer-related send failure

d65ed55

run a specific error continuation for timer replica callbacks

021dfcb

that sets the global timer for ASAP retry

ggreif marked this pull request as ready for review January 15, 2025 11:41

crusso reviewed Jan 15, 2025

src/mo_frontend/typing.mli (outdated, resolved)

crusso reviewed Jan 15, 2025


move a few type definitions

980cd23

review feedback

dfinity deleted a comment from github-actions bot Jan 15, 2025

ggreif mentioned this pull request Jan 15, 2025

feat: support IC low Wasm memory hook #4849

Open

berestovskyy previously approved these changes Jan 16, 2025


ggreif mentioned this pull request Jan 16, 2025

Mitigate timer job submission failures #4852

Merged

ggreif removed the build_artifacts label (Upload moc binary as workflow artifacts) Jan 17, 2025

ggreif dismissed berestovskyy’s stale review via af743f3 January 17, 2025 14:49

ggreif requested a review from crusso January 17, 2025 15:00

ggreif added the labels Bug (Something isn't working) and automerge-squash (When ready, merge using squash) Jan 17, 2025

crusso approved these changes Jan 17, 2025


Merge branch 'master' into gabor/timer-check

77fa565

mergify bot merged commit a9bc214 into master Jan 17, 2025
11 checks passed

mergify bot removed the automerge-squash label (When ready, merge using squash) Jan 17, 2025

mergify bot deleted the gabor/timer-check branch January 17, 2025 19:10
