-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RESTful API: upload/delete #452
Conversation
( |
7ebe28f
to
2ebb5dd
Compare
Pushed a new doc page to this PR describing the authz system introduced in this PR. |
…and the RTD site for the Sphinx project added by this PR as well (using a temp. version pointing at this PR branch). |
4ab05f9
to
53054ef
Compare
Not enabled in production.
The raw response object gets stringified as "[object Response]" and naively JSON encoded as a useless object, so do better than that.
Matches the convention used elsewhere in the codebase for Source classes vs. instances and helps keep the two straight.
…ontent-Type This is the complement to contentTypesProvided(), which dispatches based on request Accept (i.e. desired response Content-Type). It is useful for PUT/POST requests to handle different Content-Types.
Clients upload individual subresources (files) for a dataset or narrative. Routes are defined for all applicable sources (core, staging, Groups) but are only operational for Nextstrain Groups sources currently. This commit contains no new authorization checks; these will come later.
Adds a static "Subresource" property defined by each Resource subclass to customize which class is used. This lets us implement things like a "subresources()" method in the base class.
Clients issue a single deletion, which is fanned out by the server to all subresources. Routes are defined for all applicable sources (core, staging, Groups) but are only operational for Nextstrain Groups sources currently. This commit contains no new authorization checks; these will come later.
Refactors the handling for source authz failures to apply to the whole server, in anticipation of adding additional authz checks for which we want the same behaviour.
The intended audience for this redirect is humans following bookmarks, browser history, or other saved links, which will only ever be GET (and _maybe_ HEAD). While other request methods may come from an interactive browser, it's much more likely they'll come from non-user-navigation requests. Checking for req.accepts("html") is still an imperfect proxy for "user navigation in browser", but I think it's still good enough with this added request method limit. As an example of the issue, prior to this change the redirect tripped up an API client issuing an unauthenticated DELETE because our API doesn't use Accept headers for DELETE requests (and no Accept header means Accept: */*).
Essentially flips the default/fallback value to be "no" instead of "yes" by checking for explicit inclusion of text/html instead of if text/html is merely acceptable. This is because sending Accept: */* by default is a very common practice of many clients (browser and non-browser). ~All browsers seem to declare text/html explicitly, while other clients rarely do, so this works better in practice. Updates a test fetch() to be more browser-like when testing our HTML error handling.
…ap objects Allows us to use a Set in req.user objects without breaking res.json() when its passed the user object.
Users are members of roles. Objects (e.g. datasets or narratives) have tags. Policies define rules which allow a given role to take a set of actions on any object with a given tag. Currently the policies, tags, and roles are statically defined, but these can easily become customized or user-defined in the future. Cognito group memberships change from being equivalent to Nextstrain Groups memberships to instead being authorization role groups, including membership roles for Nextstrain Groups (viewers, editors, owners). Completely replaces previous interim access control via visibleToUser() methods. The Charon API endpoints now return 401/403 instead of 404 to unauthorized requests since their access control checks now go through the app-wide error handler. This centralization is an overall improvement and doesn't leak existence of private paths any more than we already were with other endpoints.
During a transition period while we move users from unsuffixed Cognito groups to role-suffixed Cognito groups, assume the least privileged Nextstrain Group role (viewer) for each unsuffixed group membership we find. This matches the existing capabilities via nextstrain.org before the existence of group membership roles. During our migration process (not yet fully determined), it may be that a user is temporarily a member of both a legacy unsuffixed Cognito group and a role-suffixed Cognito group for the same Nextstrain Group. This is ok as our authz code doesn't care how many roles a user has and our roles have hierarchical permissions.
Force fixed expiration timestamps and thus stable signed URLs by dynamically adjusting the requested validity lifetime based on the next fixed expiration point. This increases the lifetime of signed URLs from 15 min to a minimum of 1 hour and maximum of 2 hours. S3 signed URLs incorporate an expiration timestamp in their signed parameters. This timestamp is calculated during generation from the current time + the requested validity lifetime of the signature. Using a fixed validity lifetime (e.g. default of 900s) caused the signed URLs to change every second. Thus the URLs weren't cacheable in practice since each URL was new, even for the same request. Our server-side cache is now able to store and revalidate requests instead of missing every time.
We've taken a different road due to limitations of scoping permissions granted by Cognito Identity Pools.
Useful to have this developer/internal information integrated into the rest of our doc system, and means we can make full use of Sphinx's rST directives.
The placement at the bottom of the page is less of an interruption, and although I think there's some way to automatically move footnotes to the bottom with Sphinx/docutils, it was stretching the purpose of a footnote anyhow (this isn't a DFW book after all).
This is a JS project, after all!
Converts from Markdown to rST now that we have a Sphinx project and removes terms that moved into the general Nextstrain glossary.
Makes it possible to test the range of behaviour with specific policies that we don't otherwise use.
53054ef
to
cbc96fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good from reading through the code. It was nice to see how things were built on top of the foundation laid out in the previous PR. I left some comments and questions from my read through. I'll actually test tomorrow along with the CLI PR to provide more UX feels feedback there.
|
||
|
||
function splitGroupRole(cognitoGroup) { | ||
// I wish split("/", 1) worked reasonably. -trs, 20 Dec 2021 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow. Never realized the JavaScript split limit worked the way it does until I saw your comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤪
get authzTags() { | ||
return new Set([ | ||
authz.tags.Type.Source, | ||
authz.tags.Visibility.Public, | ||
]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can authz.tags.Type.Source
here be replaced with ...super.authzTags
? (Not suggesting, just learning OOP 😄 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. For the authz properties, I opted not to use inheritance by default so that it was crystal clear when looking at each method in isolation exactly what it'd return, without tracing up the parent classes. This also seemed "safer" for sensitive methods, since changes we make to parent classes can make unexpected action-at-a-distance changes to subclasses. In my eye, the tradeoff is different for non-sensitive functionality, which is why the source classes make use of inheritance elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. I was initially confused that PrivateGroupSource
didn't have authzTags
and then I realized it was just inheriting it from the parent class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's an inconsistency I hadn't noticed earlier.
Prompted by @joverlee521's review. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from reading through the code, and thanks a bunch for the helpful documentation.
I don't claim to fully understand the tags, as we seem to be comparing the tags in profiles against the authzTags
but both are defined on the classes themselves. When I'm back I'll go read the tailscale article to better understand.
// Body of the request as a Node stream | ||
let body = req; | ||
|
||
// Compress on the fly to gzip if it's not already gzip compressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question about server load with the functionality being introduced here - is the idea to scale up heroku instances if this affects performance? Do you expect any noticeable changes to performance given that for the foreseeable future uploading data is much rarer than accessing data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I don't expect ~any or much noticeable change given that uploading is much rarer than viewing. Additionally, nextstrain remote upload
will pre-compress the data anyway, so the server won't have to unless someone is making their own direct API requests (e.g. as we might in a future web UI). If this does affect perf (much) farther down the road, scaling up Heroku instances is one of the options on the table.
["application/vnd.nextstrain.dataset.root-sequence+json", putDatasetRootSequence], | ||
["application/vnd.nextstrain.dataset.tip-frequencies+json", putDatasetTipFrequencies], | ||
|
||
/* XXX TODO: Support v1 (meta and tree) too? We could, but maybe ok to say |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think covered in another PR, but I don't consider this a todo.
[Assuming you mean "profiles" → "policies"?] Yep, Source classes both declare a policy (for the Source + contained Resources) and tags (for the Source itself). Other classes, Dataset and Narrative, declare just tags. The policies don't have to be declared on Source classes—they could instead, for example, be defined in an |
Builds on top of #443 to implement the endpoints for upload and delete and a corresponding authorization system.
Works well in my extensive testing and this is ready for review, but I still have some further todos: