Support recursive globbing (**/) for download paths #123

Closed
jameshadfield opened this issue Jun 16, 2021 · 3 comments · Fixed by #215

@jameshadfield
Member

When downloading assets from large builds (I'm looking at you, nCoV), the --download arg is immensely helpful. Currently, to download the intermediate files of the nCoV workflow, one needs to supply an argument for each directory, e.g.

--download results/* --download results/translations/* --download results/global/* --download results/asia/* ...

How feasible would it be to support recursive globbing, such as --download results/**?

@tsibley
Member

tsibley commented Jun 18, 2021

Yeah, this would be nice!

I wanted ** from the beginning, but support for it isn't built into Python's standard library, so I postponed implementing it. I don't know if there's a decent library focused just on path-matching/glob-parsing out there, but even if not, it should be pretty feasible to implement.
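
To give a rough idea of what implementing it by hand could look like, here is a minimal sketch using only the standard library. It translates a glob into a regular expression in which `*` and `?` stay within one path segment while `**` may cross `/`. The function name `glob_to_regex` is made up for illustration, and the sketch deliberately ignores character classes, braces, and escaping rules, so it is not a full Bash-compatible matcher.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simple glob into a regex (hypothetical helper, sketch only).

    Supported syntax:
      **/  zero or more whole directories
      **   anything, including "/"
      *    anything except "/"
      ?    one character except "/"
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            # Every other character is matched literally.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(path: str, pattern: str) -> bool:
    """True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None
```

With this, `matches("results/asia/tree.nwk", "results/*")` is false because `*` cannot cross the directory separator, while the same path does match `results/**`.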

@tsibley
Member

tsibley commented Sep 7, 2021

Just a small note that I stumbled across Sphinx's path matching routines recently, which include an implementation of ** support. I haven't surveyed what else is available, and that due diligence should probably still be done, but this is one source to look at again in the future. Sphinx is BSD-licensed, so we could easily vendor their implementation as the basis for ours and adapt it as needed.

@tsibley tsibley changed the title Support recursive globbing for download paths Support recursive globbing (**) for download paths Sep 7, 2021
@tsibley tsibley changed the title Support recursive globbing (**) for download paths Support recursive globbing (**/) for download paths Sep 7, 2021
@tsibley
Member

tsibley commented Sep 9, 2021

Following up to say that I think wcmatch is the library we want here: either wcmatch.glob's globmatch() or wcmatch.pathlib's Path.globmatch().

It turns out this sort of matching was going to be really handy for some other work I'm doing, and my searches this time turned up wcmatch. Trying it out, it works great and has minimal dependencies. Its aim is to match Bash's behavior, which is exactly what we want.

Once wcmatch is in the codebase, we can loop back and update the build --download flag to use it.
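
For reference, a small sketch of how wcmatch.glob's globmatch() could be used to filter a list of paths against download patterns. The `select_paths` helper and the file names are made up for illustration and are not how the build --download flag is actually implemented. wcmatch is a third-party package, so the import is guarded; the `GLOBSTAR` flag is what turns on `**` matching across directories.

```python
try:
    from wcmatch import glob as wcglob  # third-party: pip install wcmatch
except ImportError:
    wcglob = None


def select_paths(paths, patterns):
    """Return the paths matching any pattern (hypothetical helper).

    GLOBSTAR enables "**" to match across directory separators.
    """
    if wcglob is None:
        raise RuntimeError("wcmatch is not installed")
    return [
        p for p in paths
        if wcglob.globmatch(p, patterns, flags=wcglob.GLOBSTAR)
    ]


# Made-up file listing, purely for illustration.
remote_files = [
    "results/aligned.fasta",
    "results/asia/tree.nwk",
    "auspice/ncov_asia.json",
]

if wcglob is not None:
    # A single recursive pattern replaces one pattern per directory.
    selected = select_paths(remote_files, ["results/**"])
    print(selected)
```

The point of the sketch is that one `results/**` pattern would stand in for the per-directory `--download results/* --download results/asia/* ...` list from the original report.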

@tsibley tsibley self-assigned this Nov 8, 2021
tsibley added a commit that referenced this issue Aug 2, 2022
This helps reduce the number of patterns needed to download
large-but-specific subsets of a build with many output files (e.g.
ncov).

Resolves <#123>.