⚡ Refactor thumbnail generation and scanner IO operations #426
Push up last nights changes
Note to self: the last commit broke thumbnail loading on the client
So I ran a little experiment against these changes:
The initial scan:
Full details
Some thoughts re: potential areas for performance improvements:
TLDR; the scan was slow, but I think it's mostly due to the bottlenecked setup. Memory usage was relatively stable, and there are some potential areas for performance improvements around adding opt-in features for metadata extraction and file hashing. EDIT: I started another run with the hashing and
The thumbnail generation:
Full details
Some thoughts re: memory consumption:
TLDR; the generation was also slow, but I do think it's improved from before. I could not replicate the memory issue reported in #427; in fact, I saw relatively stable and low memory usage throughout the process.
All this has definitely been interesting to investigate, though I wish I had a clearer understanding of the linked issue. I'll try to set up an NFS mount for a follow-up, which hypothetically should result in improved speeds.
Follow-up experiment:
The initial scan:
Notes
Some thoughts re: potential areas for performance improvements:
TLDR; the scan was still slow and memory usage was about the same; any speed difference is likely down to NFS vs SMB. I feel this doesn't really shed any light on the issues reported in #427, as I couldn't replicate the memory issues with either SMB or NFS mounts. I didn't analyze the thumbnail generation, as I canceled it after a few minutes. I assume it would be similar to the scan, i.e. faster overall but with similar memory usage.
This comment #427 (comment) is really important context. A TLDR; is some internals/configurations in the |
I'm going to do a last pass review and aim to merge this in today. I want to call out the warning in the PR description:
I will promote the current |
I couldn't get the plot to work in this bash script, if anyone wants to give it a go. I created this as a quick debugging tool to try and visualize a long-running scan
* ⚡ Refactor thumbnail generation
* ⚡ Refactor scanner IO operations Push up last nights changes
* debug: push to new tag for testing
* async-ify IO operations outside scan context
* debug: add tracing
* wip: rename library options and add fields requires migration still
* wip: rename library options, fix frontend
* handwrite migration 😬
* fix ui
* NON FUNCTIONAL: wip migration kill me
* debug: stop auto pushing
* fix migration?
* wip: support processing options and restructure
* 🥪 lunch allocator experiment
* super wip: distroless image, zip downgrade
* cleanup docker stuff
* Revert "debug: stop auto pushing" This reverts commit bd6da98.
* remove missed feature
* fix job upsert after refresh
* cleanup comments and wip review
* Reapply "debug: stop auto pushing" This reverts commit f43c187.
* cleanup
There is a lot of good and important context in this PR and the referenced issue #427. I won't rewrite it all, but I will summarize the actual changes below:
* rayon usage for heavy IO operations
* STUMP_MAX_SCANNER_CONCURRENCY: The maximum number of concurrent files which may be processed by a scanner (default 200)
* STUMP_MAX_THUMBNAIL_CONCURRENCY: The maximum number of concurrent files which may be processed by a thumbnail generator (default 50)

This PR aims to (hopefully) improve the performance of both the thumbnail generation and scanner jobs. I don't think the changes are quite ready, but I am making the PR to gather feedback and give myself time for proper review and testing.

The main thing to highlight is that I am trying to move away from rayon for anything IO-related, as that is actually a misuse of the library as I am learning, and towards dedicated, blocking threads. I've also added two concurrency-related configuration options in the hope of providing better options for machines of varying power capacities.
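For anyone skimming, here is a rough sketch of the general idea: read a concurrency limit from the environment, then fan blocking work out over a bounded set of dedicated threads. This is not the code in this PR. It uses only the standard library, and the names `parse_limit`, `limit_from_env` and `process_bounded` are hypothetical, made up for illustration. Only the variable names and the defaults (200 and 50) come from the description above.

```rust
use std::env;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical helper: parse a concurrency limit, falling back to `default`
/// when the value is missing, not a number, or zero.
fn parse_limit(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(default)
}

/// Hypothetical helper: read a limit such as STUMP_MAX_SCANNER_CONCURRENCY.
fn limit_from_env(key: &str, default: usize) -> usize {
    parse_limit(env::var(key).ok().as_deref(), default)
}

/// Run blocking `work` over `items` on at most `limit` dedicated threads and
/// return the results in input order.
fn process_bounded<T, R, F>(items: Vec<T>, limit: usize, work: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    // Never spawn more threads than there are items (and always at least one).
    let workers = limit.max(1).min(items.len().max(1));
    // Shared queue of (index, item) pairs that the workers pull from.
    let queue = Arc::new(Mutex::new(items.into_iter().enumerate()));
    let work = Arc::new(work);
    let (tx, rx) = mpsc::channel();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let work = Arc::clone(&work);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement, so the
                // (potentially slow) work below runs without holding it.
                let next = queue.lock().unwrap().next();
                match next {
                    Some((idx, item)) => {
                        if tx.send((idx, work(item))).is_err() {
                            break;
                        }
                    }
                    None => break,
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);

    let mut results: Vec<(usize, R)> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort_by_key(|(idx, _)| *idx);
    results.into_iter().map(|(_, r)| r).collect()
}
```

A scanner-style caller might do something like `process_bounded(paths, limit_from_env("STUMP_MAX_SCANNER_CONCURRENCY", 200), read_file)`, where `read_file` stands in for whatever blocking IO is needed per file. The point of the bound is that a smaller machine can lower the limit without touching code, while the blocking calls stay off any compute-oriented pool such as rayon's.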