Investigate test deployments freeze before #2057 #2059

Closed
serban300 opened this issue Apr 21, 2023 · 2 comments

@serban300
Collaborator

Before #2057 I had some issues with the rialto nodes. They consumed ~300MB of memory at startup, then kept growing until they took up the entire memory of the system, at which point the system froze.

We should check whether the rialto parachain not producing blocks was causing this and, if so, whether that's expected or the result of a wrong rialto configuration.

serban300 self-assigned this Apr 21, 2023
@serban300
Collaborator Author

Found the issue. It was caused by the node_impl_version error, not by rialto-parachain not producing blocks.

I'm not very familiar with this logic, but at a high level this is what was happening:

  1. At some point this logic was called, which executes prepare-worker internally. That failed because of the node_impl_version problem, and the call was then retried in an infinite loop, once every 3 seconds. This does not cause a problem by itself.
  2. For each block after that, the PVF logic tried to get the result of this validation, and the code got here because the validation was still in progress (stuck in the infinite loop from point 1). So it called awaiting_prepare.add(...), which basically adds a handler to be called when the validation is ready. Since the validation never became ready, these handlers slowly filled the memory.
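
To make the mechanism in the two points above concrete, here is a minimal sketch of how such a waiting list can grow without bound. It is not the actual PVF host code: `AwaitingPrepare`, `PendingHandler`, `take`, `len` and `simulate` are hypothetical names made up for illustration, and only the `add(...)` call mirrors the `awaiting_prepare.add(...)` mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a handler waiting on a preparation result.
type PendingHandler = Box<dyn FnOnce(Result<(), String>) + Send>;

/// Hypothetical, simplified model of the waiting list (not the real type).
#[derive(Default)]
struct AwaitingPrepare {
    pending: HashMap<u64, Vec<PendingHandler>>,
}

impl AwaitingPrepare {
    /// Registers a handler to be called once preparation finishes.
    fn add(&mut self, artifact_id: u64, handler: PendingHandler) {
        self.pending.entry(artifact_id).or_default().push(handler);
    }

    /// Removes and returns all handlers waiting on `artifact_id`.
    fn take(&mut self, artifact_id: u64) -> Vec<PendingHandler> {
        self.pending.remove(&artifact_id).unwrap_or_default()
    }

    /// Number of handlers currently waiting on `artifact_id`.
    fn len(&self, artifact_id: u64) -> usize {
        self.pending.get(&artifact_id).map_or(0, Vec::len)
    }
}

/// Simulates `blocks` blocks, each requesting the same validation result.
/// Returns how many handlers are still held in memory afterwards.
fn simulate(blocks: usize, preparation_completes: bool) -> usize {
    let mut awaiting = AwaitingPrepare::default();
    let artifact_id = 1;

    for block in 0..blocks {
        // Preparation is still "in progress", so each block only queues a handler.
        awaiting.add(
            artifact_id,
            Box::new(move |_result: Result<(), String>| {
                let _ = block; // a real handler would forward the result
            }),
        );
    }

    if preparation_completes {
        // Only when preparation finishes are the handlers drained and called.
        for handler in awaiting.take(artifact_id) {
            handler(Ok(()));
        }
    }

    awaiting.len(artifact_id)
}
```

In this model, `simulate(1000, false)` leaves 1000 handlers queued, while `simulate(1000, true)` leaves none: the list is only emptied when preparation completes, which the retry loop from point 1 never allowed.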

Maybe this failure could be handled better, but I guess that to some extent this is expected if prepare-worker has issues. I'll think about it a bit more. I'm not sure it's an issue at all.

@serban300
Collaborator Author

Resolving this. I guess that the assumption is that prepare-worker shouldn't fail. Otherwise no worker could start.
