:TX_MEDIATOR_TIMECAST CRIT: Actor# [...] got update from Mediator# 72075186224037894 with LatestStep# 0 previous LatestStep# 1739943505740 #14794

Closed
snaury opened this issue Feb 19, 2025 · 2 comments · Fixed by #14800

snaury commented Feb 19, 2025

I often see messages like these under nemesis:

:TX_MEDIATOR_TIMECAST CRIT: Actor# [50004:7473000454368397226:2064] got update from Mediator# 72075186224037894 with LatestStep# 0 previous LatestStep# 1739943505740

This doesn't look like an actual problem; it is probably a restarted mediator reporting LatestStep# 0. It doesn't look like something that needs the CRIT log level.

snaury self-assigned this Feb 19, 2025

snaury commented Feb 19, 2025

Actually, this does look like a strange problem. We're supposed to receive events with an ever-increasing SubscriptionId, and we filter out old messages. We also set WatchSynced = false every time we resubscribe, and we don't set WatchSynced = true until the latest step matches what we have in memory. Since the mediator sends its AcceptedStep as the latest step, it is never supposed to go backwards, so a message like this is totally unexpected.
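
The expected behaviour described above can be sketched as a small model. This is only an illustration of the invariants, not the actual time cast code: the class, method names and return strings are made up for the example, and "matches what we have in memory" is assumed to mean "has reached at least the step we remember".

```python
from dataclasses import dataclass


@dataclass
class Update:
    subscription_id: int  # grows with every (re)subscription
    latest_step: int      # the mediator's accepted step


class BucketWatcher:
    """Toy model of one subscriber; all names here are illustrative."""

    def __init__(self):
        self.subscription_id = 0
        self.latest_step = 0
        self.watch_synced = False

    def resubscribe(self):
        # A new subscription invalidates everything sent for older ones.
        self.subscription_id += 1
        self.watch_synced = False
        return self.subscription_id

    def on_update(self, update):
        if update.subscription_id < self.subscription_id:
            return "dropped"  # stale message from an older subscription
        if not self.watch_synced:
            if update.latest_step < self.latest_step:
                return "catching-up"  # mediator is still behind our memory
            self.latest_step = update.latest_step
            self.watch_synced = True
            return "synced"
        if update.latest_step < self.latest_step:
            return "unexpected"  # the situation the CRIT message reports
        self.latest_step = update.latest_step
        return "advanced"
```

In this model a restarted mediator reporting step 0 right after a resubscribe lands in the "catching-up" branch, so the "unexpected" branch should only be reachable if the synced flag is stale.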

snaury commented Feb 19, 2025

Oh, we don't set WatchSynced = true when subscribing to a bucket after an initial reconnect. So the following sequence is possible:

  • All tablets in a bucket unregister
  • We lose the connection to the mediator due to its restart
  • We reconnect without subscribing to that bucket
  • A new tablet starts very quickly and subscribes to the bucket

We would then erroneously think the bucket is synced (based on the previous connection) when in reality it is not. The code still handles this the same way as the unsynchronized case, but it spams the log with an unwanted CRIT. This needs to be fixed.
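
The sequence above can be replayed against a toy model in which a bucket's synced flag survives a reconnect when the bucket has no tablets. Everything in it is an assumption made for illustration: the class and method names, the shortened log text, and in particular the `reset_on_subscribe` switch, which is just one hypothetical way to avoid the stale flag and is not taken from the actual fix.

```python
class Bucket:
    def __init__(self):
        self.tablets = set()
        self.latest_step = 0
        self.watch_synced = False


class TimecastModel:
    """Toy model; `reset_on_subscribe` is a hypothetical fix switch."""

    def __init__(self, reset_on_subscribe=False):
        self.reset_on_subscribe = reset_on_subscribe
        self.buckets = {}
        self.log = []

    def register(self, bucket_id, tablet):
        bucket = self.buckets.setdefault(bucket_id, Bucket())
        if not bucket.tablets and self.reset_on_subscribe:
            bucket.watch_synced = False  # first tablet means a new subscription
        bucket.tablets.add(tablet)

    def unregister(self, bucket_id, tablet):
        self.buckets[bucket_id].tablets.discard(tablet)

    def reconnect(self):
        # Only buckets that still have tablets are resubscribed and reset.
        for bucket in self.buckets.values():
            if bucket.tablets:
                bucket.watch_synced = False

    def on_update(self, bucket_id, latest_step):
        bucket = self.buckets[bucket_id]
        if bucket.watch_synced and latest_step < bucket.latest_step:
            self.log.append(
                f"CRIT: LatestStep# {latest_step} "
                f"previous LatestStep# {bucket.latest_step}"
            )
            bucket.watch_synced = False  # then handled like the unsynced case
            return
        if latest_step >= bucket.latest_step:
            bucket.latest_step = latest_step
            bucket.watch_synced = True


def replay(model):
    model.register(1, "tablet-a")
    model.on_update(1, 1739943505740)  # bucket becomes synced
    model.unregister(1, "tablet-a")    # all tablets in the bucket unregister
    model.reconnect()                  # mediator restarted, bucket not resubscribed
    model.register(1, "tablet-b")      # a new tablet subscribes quickly
    model.on_update(1, 0)              # restarted mediator reports step 0
    return model.log
```

Calling `replay(TimecastModel())` leaves one CRIT entry in the model's log, while `replay(TimecastModel(reset_on_subscribe=True))` leaves none, because the first update after the new subscription is treated as ordinary catching up.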
