Exports use a lot of disc space, and don't use S3 #3204
I have a tingly feeling this is the right issue to discuss my failed export on bookwyrm.social.
Yes, almost certainly.
I'm trying to understand what's happening here, and how difficult it will be to fix. A few questions:
A cleanup process for old exports was on the original list but we left it for another day. Possibly that needs to be revisited sooner rather than later (a rough sketch of what such a job might look like is below). Some things I've found that may or may not be helpful for this:
If I am remembering correctly, @dannymate did most of the work setting up Celery jobs, and @CSDUMMI did most of the work setting up the tarfile code, so they may be able to provide some suggestions.
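Regarding the cleanup process mentioned above, here is a very rough sketch of the kind of periodic job that could do it. The exports directory, the retention window, and the task name are assumptions for illustration, not BookWyrm's actual settings or code:

```python
# Sketch of a periodic Celery task that deletes old export archives.
# EXPORTS_DIR and the 7-day retention window are illustrative assumptions.
import time
from datetime import timedelta
from pathlib import Path

from celery import shared_task

EXPORTS_DIR = Path("/app/exports")  # hypothetical location of archive files
RETENTION = timedelta(days=7)


@shared_task
def cleanup_old_exports():
    """Remove export archives whose modification time is past the retention window."""
    cutoff = time.time() - RETENTION.total_seconds()
    for archive in EXPORTS_DIR.glob("*.tar.gz"):
        if archive.stat().st_mtime < cutoff:
            archive.unlink()
```

Scheduled via Celery beat (or any cron-like mechanism), something like this would keep disk usage from accumulating across repeated exports.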
It does unfortunately seem like a sooner-rather-than-later issue. I'm seeing the disk usage creep up and then revert back to previous levels once the task fails -- I believe it's on disk, not in RAM, but anything sysadmin-related is a weaker area of knowledge for me. Traceback:
At the moment, the archive files are stored in
I was wondering about
I'll take a look at this tomorrow.
There are a couple of logging lines in both bookwyrm_export_job and Bookwyrm_import_job (not added by me), if you'd like to go further. It's been a little while, but in terms of error handling and logging, you can override the on_failure method on Celery classes/tasks. If you want generic handling of failures, say to just log the exception, you can put this in job.py > Job class. You can also look at this example. If you want more detailed logging, say to log the size of an export, you can put that closer to the logic, say in BookwyrmExportJob. If you want, you can message me on Matrix or here and I can give it a shot if there's any information you specifically want logged.
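For reference, a minimal sketch of the on_failure hook described above. The base-class and task names here are illustrative, not BookWyrm's actual classes:

```python
# Sketch: a Celery base task that logs any unhandled exception before the
# task is marked as failed. Names are placeholders for illustration.
import logging

from celery import Task, shared_task

logger = logging.getLogger(__name__)


class LoggingTask(Task):
    """Base task that logs full failure details, including the traceback."""

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # einfo carries the formatted traceback for the failure
        logger.error(
            "Task %s (id=%s) failed with %r\nargs=%r kwargs=%r\n%s",
            self.name, task_id, exc, args, kwargs, einfo,
        )
        super().on_failure(exc, task_id, args, kwargs, einfo)


@shared_task(base=LoggingTask)
def export_user_data(user_id):
    # hypothetical task body; the real export logic lives in bookwyrm_export_job
    ...
```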
Ok, I've run a test in dev with Digital Ocean S3 and read @mouse-reeve's traceback. If I'm reading the traceback correctly, this is an error with the database connection: it's saying it couldn't even properly mark the job as failed, because that means updating the database. I've run an (admittedly very small) user export in a dev environment and can confirm:
That leaves open the question of how much disk space is required whilst the export is running. I feel like that may not be the whole story though, because the traceback tells us that this is failing on
Not sure exactly how it's implemented in BookWyrm, but the connection could also be lost because of a simple timeout on the DBMS side, if something takes a very long time and makes no queries or keepalives in the meantime.
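If the deployment uses PostgreSQL via psycopg2 (the usual Django setup), one way to guard against idle connections being dropped during a long-running export would be to pass libpq keepalive parameters through the DATABASES OPTIONS. A sketch, with illustrative values and database name:

```python
# settings.py sketch: enable TCP keepalives so a connection held open (but
# idle) during a long-running export isn't silently dropped. Values are
# illustrative, not recommendations.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "bookwyrm",
        # ... host, user, password as usual ...
        "OPTIONS": {
            "keepalives": 1,            # enable TCP keepalives (libpq)
            "keepalives_idle": 60,      # seconds idle before the first probe
            "keepalives_interval": 10,  # seconds between probes
            "keepalives_count": 5,      # failed probes before dropping the connection
        },
    }
}
```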
Resolved in #3228.
Yay 🎉 Waiting for the new release!
When a user export is running, it creates a very large file that is stored locally (rather than on S3, when S3 is configured). This may be a major blocker to the feature working, as it can hit the disk limit and cancel the task.
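For illustration, one way the finished archive could be written through Django's storage API, so it lands on S3 whenever django-storages is the configured default backend. This is a simplified sketch, not BookWyrm's export job: the function, filename, and payload source are placeholders, and a real export of this size would want a temp file or streaming upload rather than an in-memory buffer:

```python
# Sketch: build a tar.gz archive in memory and save it via the configured
# default storage backend (S3 when django-storages is set up as default).
import tarfile
from io import BytesIO

from django.core.files.base import ContentFile
from django.core.files.storage import default_storage


def save_export_archive(user_id: int, json_payload: bytes) -> str:
    """Write the export archive through the storage API and return its saved path."""
    buffer = BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="archive.json")
        info.size = len(json_payload)
        tar.addfile(info, BytesIO(json_payload))
    path = f"exports/{user_id}.tar.gz"  # hypothetical key
    return default_storage.save(path, ContentFile(buffer.getvalue()))
```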