Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RESTful API: upload/delete #452

Merged
merged 24 commits into from
Feb 23, 2022
Merged

RESTful API: upload/delete #452

merged 24 commits into from
Feb 23, 2022

Conversation

tsibley
Copy link
Member

@tsibley tsibley commented Jan 25, 2022

Builds on top of #443 to implement the endpoints for upload and delete and a corresponding authorization system.

Works well in my extensive testing and this is ready for review, but I still have some further todos:

  • Authz system doc
  • Unit tests for parts of the authz framework
  • Integration tests for the upload/delete endpoints, if I can work out provisioning for this

@nextstrain-bot nextstrain-bot temporarily deployed to nextstrain-s-restful-ap-1djr0e January 25, 2022 01:12 Inactive
@tsibley
Copy link
Member Author

tsibley commented Jan 25, 2022

(CI failures look like minor oversights and I'll address them soon. Addressed.)

@tsibley tsibley temporarily deployed to nextstrain-s-restful-ap-1djr0e January 25, 2022 20:38 Inactive
@tsibley tsibley requested a review from a team January 25, 2022 20:38
@tsibley
Copy link
Member Author

tsibley commented Feb 1, 2022

Pushed a new doc page to this PR describing the authz system introduced in this PR.

@tsibley
Copy link
Member Author

tsibley commented Feb 1, 2022

…and the RTD site for the Sphinx project added by this PR as well (using a temp. version pointing at this PR branch).

@tsibley tsibley mentioned this pull request Feb 1, 2022
The raw response object gets stringified as "[object Response]" and
naively JSON encoded as a useless object, so do better than that.
Matches the convention used elsewhere in the codebase for Source classes
vs. instances and helps keep the two straight.
…ontent-Type

This is the complement to contentTypesProvided(), which dispatches based
on request Accept (i.e. desired response Content-Type).  It is useful
for PUT/POST requests to handle different Content-Types.
Clients upload individual subresources (files) for a dataset or
narrative.

Routes are defined for all applicable sources (core, staging, Groups)
but are only operational for Nextstrain Groups sources currently.

This commit contains no new authorization checks; these will come later.
Adds a static "Subresource" property defined by each Resource subclass
to customize which class is used.

This lets us implement things like a "subresources()" method in the base
class.
Clients issue a single deletion, which is fanned out by the server to
all subresources.

Routes are defined for all applicable sources (core, staging, Groups)
but are only operational for Nextstrain Groups sources currently.

This commit contains no new authorization checks; these will come later.
Refactors the handling for source authz failures to apply to the whole
server, in anticipation of adding additional authz checks for which we
want the same behaviour.
The intended audience for this redirect is humans following bookmarks,
browser history, or other saved links, which will only ever be GET (and
_maybe_ HEAD).

While other request methods may come from an interactive browser, it's
much more likely they'll come from non-user-navigation requests.
Checking for req.accepts("html") is still an imperfect proxy for "user
navigation in browser", but I think it's still good enough with this
added request method limit.  As an example of the issue, prior to this
change the redirect tripped up an API client issuing an unauthenticated
DELETE because our API doesn't use Accept headers for DELETE requests
(and no Accept header means Accept: */*).
Essentially flips the default/fallback value to be "no" instead of "yes"
by checking for explicit inclusion of text/html instead of if text/html
is merely acceptable.  This is because sending Accept: */* by default is
a very common practice of many clients (browser and non-browser).  ~All
browsers seem to declare text/html explicitly, while other clients
rarely do, so this works better in practice.

Updates a test fetch() to be more browser-like when testing our HTML
error handling.
…ap objects

Allows us to use a Set in req.user objects without breaking res.json()
when its passed the user object.
Users are members of roles.  Objects (e.g. datasets or narratives) have
tags.  Policies define rules which allow a given role to take a set of
actions on any object with a given tag.

Currently the policies, tags, and roles are statically defined, but
these can easily become customized or user-defined in the future.

Cognito group memberships change from being equivalent to Nextstrain
Groups memberships to instead being authorization role groups, including
membership roles for Nextstrain Groups (viewers, editors, owners).

Completely replaces previous interim access control via visibleToUser()
methods.

The Charon API endpoints now return 401/403 instead of 404 to
unauthorized requests since their access control checks now go through
the app-wide error handler.  This centralization is an overall
improvement and doesn't leak existence of private paths any more than we
already were with other endpoints.
During a transition period while we move users from unsuffixed Cognito
groups to role-suffixed Cognito groups, assume the least privileged
Nextstrain Group role (viewer) for each unsuffixed group membership we
find.  This matches the existing capabilities via nextstrain.org before
the existence of group membership roles.

During our migration process (not yet fully determined), it may be that
a user is temporarily a member of both a legacy unsuffixed Cognito group
and a role-suffixed Cognito group for the same Nextstrain Group.  This
is ok as our authz code doesn't care how many roles a user has and our
roles have hierarchical permissions.
Force fixed expiration timestamps and thus stable signed URLs by
dynamically adjusting the requested validity lifetime based on the next
fixed expiration point.  This increases the lifetime of signed URLs from
15 min to a minimum of 1 hour and maximum of 2 hours.

S3 signed URLs incorporate an expiration timestamp in their signed
parameters.  This timestamp is calculated during generation from the
current time + the requested validity lifetime of the signature.  Using
a fixed validity lifetime (e.g. default of 900s) caused the signed URLs
to change every second.  Thus the URLs weren't cacheable in practice
since each URL was new, even for the same request.

Our server-side cache is now able to store and revalidate requests
instead of missing every time.
We've taken a different road due to limitations of scoping permissions
granted by Cognito Identity Pools.
Useful to have this developer/internal information integrated into the
rest of our doc system, and means we can make full use of Sphinx's rST
directives.
The placement at the bottom of the page is less of an interruption, and
although I think there's some way to automatically move footnotes to the
bottom with Sphinx/docutils, it was stretching the purpose of a footnote
anyhow (this isn't a DFW book after all).
Converts from Markdown to rST now that we have a Sphinx project and
removes terms that moved into the general Nextstrain glossary.
Makes it possible to test the range of behaviour with specific policies
that we don't otherwise use.
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good from reading through the code. It was nice to see how things were built on top of the foundation laid out in the previous PR. I left some comments and questions from my read through. I'll actually test tomorrow along with the CLI PR to provide more UX feels feedback there.



function splitGroupRole(cognitoGroup) {
// I wish split("/", 1) worked reasonably. -trs, 20 Dec 2021
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. Never realized the JavaScript split limit worked the way it does until I saw your comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤪

Comment on lines +133 to +138
get authzTags() {
return new Set([
authz.tags.Type.Source,
authz.tags.Visibility.Public,
]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can authz.tags.Type.Source here be replaced with ...super.authzTags? (Not suggesting, just learning OOP 😄 )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. For the authz properties, I opted not to use inheritance by default so that it was crystal clear when looking at each method in isolation exactly what it'd return, without tracing up the parent classes. This also seemed "safer" for sensitive methods, since changes we make to parent classes can make unexpected action-at-a-distance changes to subclasses. In my eye, the tradeoff is different for non-sensitive functionality, which is why the source classes make use of inheritance elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. I was initially confused that PrivateGroupSource didn't have authzTags and then I realized it was just inheriting it from the parent class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's an inconsistency I hadn't noticed earlier.

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from reading through the code, and thanks a bunch for the helpful documentation.

I don't claim to fully understand the tags, as we seem to be comparing the tags in profiles against the authzTags but both are defined on the classes themselves. When I'm back I'll go read the tailscale article to better understand.

// Body of the request as a Node stream
let body = req;

// Compress on the fly to gzip if it's not already gzip compressed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question about server load with the functionality being introduced here - is the idea to scale up heroku instances if this affects performance? Do you expect any noticeable changes to performance given that for the foreseeable future uploading data is much rarer than accessing data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I don't expect ~any or much noticeable change given that uploading is much rarer than viewing. Additionally, nextstrain remote upload will pre-compress the data anyway, so the server won't have to unless someone is making their own direct API requests (e.g. as we might in a future web UI). If this does affect perf (much) farther down the road, scaling up Heroku instances is one of the options on the table.

["application/vnd.nextstrain.dataset.root-sequence+json", putDatasetRootSequence],
["application/vnd.nextstrain.dataset.tip-frequencies+json", putDatasetTipFrequencies],

/* XXX TODO: Support v1 (meta and tree) too? We could, but maybe ok to say
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think covered in another PR, but I don't consider this a todo.

@tsibley
Copy link
Member Author

tsibley commented Feb 16, 2022

I don't claim to fully understand the tags, as we seem to be comparing the tags in profiles against the authzTags but both are defined on the classes themselves.

[Assuming you mean "profiles" → "policies"?]

Yep, Source classes both declare a policy (for the Source + contained Resources) and tags (for the Source itself). Other classes, Dataset and Narrative, declare just tags.

The policies don't have to be declared on Source classes—they could instead, for example, be defined in an authz.policies module or (with some small tweaks) a set of flat files or database—but defining them on the Source ultimately seemed to make the most sense to me when I was making the decision about where to put them. Part of that is that currently these policies are scoped at the per Source level, so it makes sense to me to attach them to the Source, the top of the scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

4 participants