Finding helper_source_path in rules/common.smk takes forever on a slow filesystem #879

Closed
2 tasks done
aodenweller opened this issue Jan 19, 2024 · 3 comments · Fixed by #926

@aodenweller
Contributor

Hi everybody,

I found the following bug, which doesn't break anything, but may make the model painfully slow if the filesystem is also slow.

Checklist

Describe the Bug

The line

helper_source_path = [match for match in glob.glob("**/_helpers.py", recursive=True)]

which recursively looks for _helpers.py makes the model very slow in my experience. This is very likely due to a slow filesystem on our cluster at the Potsdam Institute. We've had similar issues with the filesystem in the past. So the bug is more on our side, but still, other users could run into similar issues on slow filesystems, so it might be worth looking into it.

If I revert c3bcaee everything works fine again.
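To put a number on the slowdown, the search can be timed on its own, outside Snakemake. This is only a rough sketch: `timed_helper_search` is a hypothetical helper name (not part of the workflow), and it uses the same `**/_helpers.py` pattern as the line above, anchored at a `root` directory you pass in.

```python
import glob
import os
import time


def timed_helper_search(root="."):
    """Run the recursive _helpers.py search under `root` and time it.

    Hypothetical helper for diagnosis only; it is not part of the workflow.
    """
    start = time.perf_counter()
    # Same pattern as in rules/common.smk, but anchored at `root`.
    pattern = os.path.join(root, "**", "_helpers.py")
    matches = glob.glob(pattern, recursive=True)
    elapsed = time.perf_counter() - start
    return matches, elapsed
```

Calling `timed_helper_search(".")` from the repository root returns the list of matches and the elapsed seconds. The cost grows with the number of directories under `root`, because every one of them has to be listed.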

Tagging @euronion so he's aware.

Thanks!

Error Message

There's no error message.

@fneum
Member

fneum commented Jan 19, 2024

Thanks for letting us know! Is this faster?

import os

helper_source_path = [os.path.join(root, '_helpers.py') for root, dirs, files in os.walk('.') if '_helpers.py' in files]

@aodenweller
Contributor Author

I just tried it and after 5 minutes it still hadn't built the DAG, which is when I cancelled again. So no, unfortunately not. But again, this is simply due to the painfully slow filesystem at PIK.

@euronion
Contributor

@fneum The code may cause significant issues, as it recursively searches the entire directory tree and adds every _helpers.py location it finds to the PATH. That's not good if several copies are present on the same system and contain incompatible code.
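To illustrate the ambiguity: a search-based lookup could at least refuse to continue when it finds more than one file. This is a minimal sketch, assuming plain Python outside Snakemake; `find_single_helper` is a hypothetical name, and it still pays the full cost of walking the tree.

```python
import os


def find_single_helper(root="."):
    """Return the only _helpers.py under `root`, or raise if ambiguous.

    Hypothetical guard for illustration; it still walks the whole tree.
    """
    matches = [
        os.path.join(dirpath, "_helpers.py")
        for dirpath, _dirs, files in os.walk(root)
        if "_helpers.py" in files
    ]
    if len(matches) != 1:
        # Zero or several candidates: do not guess which one to import.
        raise RuntimeError(
            f"expected exactly one _helpers.py under {root!r}, "
            f"found {len(matches)}: {matches}"
        )
    return matches[0]
```

This only makes the failure visible; it does nothing about the slow traversal.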

I don't have a full setup right now to test it with, but could you try to use

helper_source_path = workflow.source_path("scripts/_helpers.py")

instead? That should do the trick without the search and the issues outlined above :)
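For comparison, here is what a direct lookup looks like in plain Python, where the `workflow` object from the Snakefile is not available. This is a sketch of the idea only, not Snakemake's implementation of `workflow.source_path`; `direct_helper_path` and its `base_dir` argument are hypothetical names.

```python
from pathlib import Path


def direct_helper_path(base_dir):
    """Build the path to scripts/_helpers.py relative to a known directory.

    Hypothetical stand-in for a direct lookup: no directory listing and no
    recursion, just a single existence check on one fixed location.
    """
    path = Path(base_dir) / "scripts" / "_helpers.py"
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```

Because the location is fixed, the cost does not depend on how large the surrounding directory tree is, and other copies of `_helpers.py` elsewhere on the system are never picked up.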
