
Limit memory reads for mmaped files #329

Merged · 2 commits · Oct 18, 2018

Conversation

@bloodearnest (Contributor) commented on Oct 18, 2018:

This is a follow-on fix for the initialisation corruption fixed in #328.

When corruption of the master's mmapped file occurred, the corrupted data ended up being read as a large integer, which was then fed into struct.unpack_from as the length to read, and that rightly raised an exception.

However, it also somehow caused a much bigger problem. After this had occurred, Gunicorn would, for reasons unknown, fail to launch new workers and would then exit. We saw this pattern of corruption and subsequent Gunicorn death happen consistently across multiple machines and services.
The issue was transient and we didn't get core dumps, but the suspicion is that trying to read a very large chunk of memory somehow broke Gunicorn.

So this change adds a reasonable bounds check on the read length; if corruption occurs in future, it should blow up earlier, with a better error, and without breaking Gunicorn.
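As an illustration only (a minimal sketch, not the actual client_python code; the helper name and the size cap are hypothetical), the idea is to validate a length decoded from the mmapped buffer against the remaining buffer size and a sanity cap before attempting any large read:

```python
import struct

# Hypothetical sanity cap on a single encoded key; not from the real code.
_MAX_KEY_LEN = 1 << 20


def read_key(data, pos, max_len=_MAX_KEY_LEN):
    """Read a 4-byte length-prefixed UTF-8 key from a memory-mapped buffer.

    The decoded length is bounds-checked before the large read, so a
    corrupt length fails fast with a clear error instead of asking for
    an enormous slice of memory.
    """
    encoded_len = struct.unpack_from('i', data, pos)[0]
    if encoded_len < 0 or encoded_len > max_len or pos + 4 + encoded_len > len(data):
        raise RuntimeError('corrupt length %d at offset %d (buffer is %d bytes)'
                           % (encoded_len, pos, len(data)))
    raw = data[pos + 4:pos + 4 + encoded_len]
    return raw.decode('utf-8'), pos + 4 + encoded_len
```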

Signed-off-by: Simon Davy <simon.davy@canonical.com>
@brian-brazil merged commit 38e9f48 into prometheus:master on Oct 18, 2018
@brian-brazil (Contributor) commented:
Thanks!

@bloodearnest (Contributor, Author) commented:

For posterity: the workers were of course failing because of the corrupted file, which is cached in the master's MultiProcessValue and, until #328, wasn't cleared on fork, so a metric initialising in the worker would try to read its state from the master's corrupt file.
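For illustration, a generic sketch of that fork-detection pattern (not the actual #328 change; the class and cache names here are hypothetical): any cache inherited from the parent is dropped when the pid changes, so a child never reads state through the parent's file handle.

```python
import os


class MmapedDict:
    """Hypothetical stand-in for the memory-mapped value store."""
    def __init__(self, path):
        self.path = path  # in reality this would mmap the file


_pid = os.getpid()
_files = {}  # cache of MmapedDict objects, keyed by file path


def get_file(path):
    """Return the cached mmapped file, resetting the cache after a fork.

    If the current pid differs from the pid that populated the cache,
    the cache was inherited from the parent and is cleared so the child
    opens its own file instead of reusing the parent's.
    """
    global _pid
    if os.getpid() != _pid:
        _files.clear()
        _pid = os.getpid()
    if path not in _files:
        _files[path] = MmapedDict(path)
    return _files[path]
```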

The above is still a useful improvement to the error messages, though; it took us a while to figure out which file was corrupt.
