-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
review of nextstrain remote behavior #169
Comments
Thanks for the review, @jameshadfield. Really appreciate more 👀 and 🧠 on this stuff! There's a lot here and I'm away for a bit starting Real Soon Now, so I'll come back to this more fully when I return. But I thought I'd jot down a few first thoughts as I read through now. I expect this issue will want to get broken out into several sub-issues with concise descriptions of what to change once we have some discussion and reach consensus. Re: the suggested syntax for multi-resource uploads:
I don't think this syntax is all that appealing for several reasons:
All that said, I do agree that the UX of uploading multi-resources in a single invocation can be improved! I've sketched some ideas out before in my (public) notes and review comments, Slack, etc. and will try to pull them together and summarize here when I'm back. One such improvement suggested also by @joverlee521 is Re: existence checks. I understand the desire to prevent accidental overwrites and am open to reconsidering the default, but I think routinely overwriting a dataset with an updated version is a very common pattern and something that shouldn't require an extra flag (much less a flag scarily-named Uploading a sidecar without a main dataset is indeed a corner case that we could improve. I would want to implement it server-side not client-side but didn't because it wasn't critical and there's some nuance given race conditions introduced by existence pre-checks. Re: surfacing which sidecars exist. Definitely. Something like that was my already my plan, and @joverlee521 and I discusesd it in the nextstrain.org remote review. Re: upload vs. download behaviour. I agree there's more consideration to do here. @joverlee521 and I most recently discussed this a bit in the prior review, and there were many other discussions around this elsewhere beforehand. I see the upsides to your preferred behaviour, though there are downsides too. I think it's worth a larger team discussion over the baseline/typical usage patterns we expect (for both ourselves and others) and what behaviour would best support those and remain straightforward to reason about. It might be as you suggest! but that's not clear to me right now. At this point, such discussion is probably best done as a scheduled, real-time meeting instead of async writing. Let's do that when I'm back! Re: minor notes:
|
@jameshadfield A few weeks ago I changed the behaviour of |
I just now opened a discussion issue for |
Closing as discussion has been had here. We can re-open if more discussion is needed or open new issues. |
This is essentially a review of the
nextstrain remote
functionality and how it interacts with data stored in multi-tenant groups. It comes out of a review of nextstrain/docs.nextstrain.org#104 which documents this functionality.Single-asset vs multi-asset differences
The behavior for uploading a single narrative or sinlge dataset + associated sidecars is great. I think it's intuitive the way the CLI detects that the provided files belong to the same asset (i.e. dataset + sidecar(s)) and then follows the pattern of
[single asset] -> [specified destination]
or[single asset] -> [group URL + filename converted to URL]
depending on whether a base groups URL is provided or a specific URL corresponding to an asset is provided. This is nice.However in multi-asset mode things get confusing. This is because the CLI now uses a different pattern:
[files corresponding to multiple assets] -> [provided destination + asset filenames joined together]
. This is similar to the AWS CLI and I always find this unintuitive and confusingMulti-asset mode is also limiting in that you can't specify the destination for each asset, and you also can't upload narrative + dataset assets together.
suggested solution
Force the user to be explicit, via the syntax:
Benefits:
Downsides:
Existence checks
Currently you can overwrite a file with no warning or log that it exists. We should consider if this is the right move, rather than defaulting to not overwriting files and requiring
--force
or similar. (I think we should require this.)Sidecars without main dataset files
Currently you can upload a sidecar without the main dataset existing. We should check that the main dataset exists in the group (or exists in the files being uploaded) before being able to upload sidecars.
Relatedly, you can't delete a sidecar file if the main dataset doesn't exist, however I think this should be addressed by preventing the sidecar upload in the first place.
Push and pull between files and assets
One of the tensions which the CLI tries to abstract away is the differences between a set of files representing the asset (i.e. main JSON + sidecars) and the asset itself. In some circumstances it's really nice to simply reason about an asset, but in others we need to know which underlying files are there. I only really found this tension problematic in one place:
nextstrain remote list
should convey which assets have which sidecar files. This could be done in a compact format such asupload and download default behavior is different
Most easily seen with an example
I know this has been discussed from time to time, but I really think the behavior here should be to save the asset as
a_b.json
. This has two main benefits for me:nextstrain.org/groups/<name>/flu/seasonal/h3n2/ha/2y
as2y.json
is going to lead to cognitive disassociation between the file and the URL.Minor notes
Things I noticed during testing, but are probably too minor to address. Listed here for completeness.
Slight conflict between
delete
andlist
nextstrain remote list nextstrain.org/groups/blab-private/james
lists all datasets under that URL prefixnextstrain remote rm nextstrain.org/groups/blab-private/james
doesn't work. (And I'm really happy about this!)Multi-asset delete
I found myself wishing you could do things like:
nextstrain remote delete nextstrain.org/groups/<name>/a nextstrain.org/groups/<name>/b
(Wildcard expansion also doesn't work, but I see this as a feature not a bug.)
The text was updated successfully, but these errors were encountered: