RESTful API: upload/delete #452

tsibley · 2022-01-25T01:12:30Z

Builds on top of #443 to implement the endpoints for upload and delete and a corresponding authorization system.

Works well in my extensive testing and this is ready for review, but I still have some further todos:

Authz system doc
Unit tests for parts of the authz framework
Integration tests for the upload/delete endpoints, if I can work out provisioning for this

tsibley · 2022-01-25T01:14:58Z

(~~CI failures look like minor oversights and I'll address them soon.~~ Addressed.)

tsibley · 2022-02-01T00:43:13Z

Pushed a new doc page to this PR describing the authz system introduced in this PR.

tsibley · 2022-02-01T00:51:49Z

…and the RTD site for the Sphinx project added by this PR as well (using a temp. version pointing at this PR branch).

Not enabled in production.

The raw response object gets stringified as "[object Response]" and naively JSON encoded as a useless object, so do better than that.

Matches the convention used elsewhere in the codebase for Source classes vs. instances and helps keep the two straight.

…ontent-Type This is the complement to contentTypesProvided(), which dispatches based on request Accept (i.e. desired response Content-Type). It is useful for PUT/POST requests to handle different Content-Types.

Clients upload individual subresources (files) for a dataset or narrative. Routes are defined for all applicable sources (core, staging, Groups) but are only operational for Nextstrain Groups sources currently. This commit contains no new authorization checks; these will come later.

Adds a static "Subresource" property defined by each Resource subclass to customize which class is used. This lets us implement things like a "subresources()" method in the base class.

Clients issue a single deletion, which is fanned out by the server to all subresources. Routes are defined for all applicable sources (core, staging, Groups) but are only operational for Nextstrain Groups sources currently. This commit contains no new authorization checks; these will come later.

Refactors the handling for source authz failures to apply to the whole server, in anticipation of adding additional authz checks for which we want the same behaviour.

The intended audience for this redirect is humans following bookmarks, browser history, or other saved links, which will only ever be GET (and _maybe_ HEAD). While other request methods may come from an interactive browser, it's much more likely they'll come from non-user-navigation requests. Checking for req.accepts("html") is still an imperfect proxy for "user navigation in browser", but I think it's still good enough with this added request method limit. As an example of the issue, prior to this change the redirect tripped up an API client issuing an unauthenticated DELETE because our API doesn't use Accept headers for DELETE requests (and no Accept header means Accept: */*).

Essentially flips the default/fallback value to be "no" instead of "yes" by checking for explicit inclusion of text/html instead of if text/html is merely acceptable. This is because sending Accept: */* by default is a very common practice of many clients (browser and non-browser). ~All browsers seem to declare text/html explicitly, while other clients rarely do, so this works better in practice. Updates a test fetch() to be more browser-like when testing our HTML error handling.

…ap objects Allows us to use a Set in req.user objects without breaking res.json() when its passed the user object.

Users are members of roles. Objects (e.g. datasets or narratives) have tags. Policies define rules which allow a given role to take a set of actions on any object with a given tag. Currently the policies, tags, and roles are statically defined, but these can easily become customized or user-defined in the future. Cognito group memberships change from being equivalent to Nextstrain Groups memberships to instead being authorization role groups, including membership roles for Nextstrain Groups (viewers, editors, owners). Completely replaces previous interim access control via visibleToUser() methods. The Charon API endpoints now return 401/403 instead of 404 to unauthorized requests since their access control checks now go through the app-wide error handler. This centralization is an overall improvement and doesn't leak existence of private paths any more than we already were with other endpoints.

During a transition period while we move users from unsuffixed Cognito groups to role-suffixed Cognito groups, assume the least privileged Nextstrain Group role (viewer) for each unsuffixed group membership we find. This matches the existing capabilities via nextstrain.org before the existence of group membership roles. During our migration process (not yet fully determined), it may be that a user is temporarily a member of both a legacy unsuffixed Cognito group and a role-suffixed Cognito group for the same Nextstrain Group. This is ok as our authz code doesn't care how many roles a user has and our roles have hierarchical permissions.

Force fixed expiration timestamps and thus stable signed URLs by dynamically adjusting the requested validity lifetime based on the next fixed expiration point. This increases the lifetime of signed URLs from 15 min to a minimum of 1 hour and maximum of 2 hours. S3 signed URLs incorporate an expiration timestamp in their signed parameters. This timestamp is calculated during generation from the current time + the requested validity lifetime of the signature. Using a fixed validity lifetime (e.g. default of 900s) caused the signed URLs to change every second. Thus the URLs weren't cacheable in practice since each URL was new, even for the same request. Our server-side cache is now able to store and revalidate requests instead of missing every time.

We've taken a different road due to limitations of scoping permissions granted by Cognito Identity Pools.

Useful to have this developer/internal information integrated into the rest of our doc system, and means we can make full use of Sphinx's rST directives.

The placement at the bottom of the page is less of an interruption, and although I think there's some way to automatically move footnotes to the bottom with Sphinx/docutils, it was stretching the purpose of a footnote anyhow (this isn't a DFW book after all).

This is a JS project, after all!

Converts from Markdown to rST now that we have a Sphinx project and removes terms that moved into the general Nextstrain glossary.

Makes it possible to test the range of behaviour with specific policies that we don't otherwise use.

joverlee521

This looks good from reading through the code. It was nice to see how things were built on top of the foundation laid out in the previous PR. I left some comments and questions from my read through. I'll actually test tomorrow along with the CLI PR to provide more UX feels feedback there.

src/utils/index.js

src/endpoints/sources.js

joverlee521 · 2022-02-08T23:07:13Z

src/authn/index.js

+
+
+function splitGroupRole(cognitoGroup) {
+  // I wish split("/", 1) worked reasonably.  -trs, 20 Dec 2021


Wow. Never realized the JavaScript split limit worked the way it does until I saw your comment.

joverlee521 · 2022-02-09T00:11:40Z

src/sources/community.js

+  get authzTags() {
+    return new Set([
+      authz.tags.Type.Source,
+      authz.tags.Visibility.Public,
+    ]);
+  }


Can authz.tags.Type.Source here be replaced with ...super.authzTags? (Not suggesting, just learning OOP 😄 )

Yep. For the authz properties, I opted not to use inheritance by default so that it was crystal clear when looking at each method in isolation exactly what it'd return, without tracing up the parent classes. This also seemed "safer" for sensitive methods, since changes we make to parent classes can make unexpected action-at-a-distance changes to subclasses. In my eye, the tradeoff is different for non-sensitive functionality, which is why the source classes make use of inheritance elsewhere.

Ah, okay. I was initially confused that PrivateGroupSource didn't have authzTags and then I realized it was just inheriting it from the parent class.

Yeah, that's an inconsistency I hadn't noticed earlier.

@joverlee521

Prompted by @joverlee521's review. Thanks!

jameshadfield

Looks good from reading through the code, and thanks a bunch for the helpful documentation.

I don't claim to fully understand the tags, as we seem to be comparing the tags in profiles against the authzTags but both are defined on the classes themselves. When I'm back I'll go read the tailscale article to better understand.

jameshadfield · 2022-02-15T03:48:24Z

src/endpoints/sources.js

+    // Body of the request as a Node stream
+    let body = req;
+
+    // Compress on the fly to gzip if it's not already gzip compressed.


General question about server load with the functionality being introduced here - is the idea to scale up heroku instances if this affects performance? Do you expect any noticeable changes to performance given that for the foreseeable future uploading data is much rarer than accessing data?

Right, I don't expect ~any or much noticeable change given that uploading is much rarer than viewing. Additionally, nextstrain remote upload will pre-compress the data anyway, so the server won't have to unless someone is making their own direct API requests (e.g. as we might in a future web UI). If this does affect perf (much) farther down the road, scaling up Heroku instances is one of the options on the table.

jameshadfield · 2022-02-15T03:50:25Z

src/endpoints/sources.js

+  ["application/vnd.nextstrain.dataset.root-sequence+json", putDatasetRootSequence],
+  ["application/vnd.nextstrain.dataset.tip-frequencies+json", putDatasetTipFrequencies],
+
+  /* XXX TODO: Support v1 (meta and tree) too?  We could, but maybe ok to say


I think covered in another PR, but I don't consider this a todo.

tsibley · 2022-02-16T00:13:42Z

I don't claim to fully understand the tags, as we seem to be comparing the tags in profiles against the authzTags but both are defined on the classes themselves.

[Assuming you mean "profiles" → "policies"?]

Yep, Source classes both declare a policy (for the Source + contained Resources) and tags (for the Source itself). Other classes, Dataset and Narrative, declare just tags.

The policies don't have to be declared on Source classes—they could instead, for example, be defined in an authz.policies module or (with some small tweaks) a set of flat files or database—but defining them on the Source ultimately seemed to make the most sense to me when I was making the decision about where to put them. Part of that is that currently these policies are scoped at the per Source level, so it makes sense to me to attach them to the Source, the top of the scope.

nextstrain-bot temporarily deployed to nextstrain-s-restful-ap-1djr0e January 25, 2022 01:12 Inactive

tsibley mentioned this pull request Jan 25, 2022

Nextstrain.org remote nextstrain/cli#141

Merged

tsibley force-pushed the restful-api-rw branch from 7ebe28f to 2ebb5dd Compare January 25, 2022 20:38

tsibley temporarily deployed to nextstrain-s-restful-ap-1djr0e January 25, 2022 20:38 Inactive

tsibley requested a review from a team January 25, 2022 20:38

tsibley mentioned this pull request Feb 1, 2022

RESTful API #443

Merged

tsibley force-pushed the restful-api-rw branch from 4ab05f9 to 53054ef Compare February 2, 2022 21:54

tsibley added 20 commits February 3, 2022 11:33

Add two Groups for development/testing: test and test-private

a1b1be5

Not enabled in production.

Improve internal error message when upstream GET/HEAD fails

3d60a08

The raw response object gets stringified as "[object Response]" and naively JSON encoded as a useless object, so do better than that.

Rename variable to make it clear it's a class not an instance

de6f2b8

Matches the convention used elsewhere in the codebase for Source classes vs. instances and helps keep the two straight.

negotiate: Add contentTypesConsumed() for dispatch based on request C…

dd5847a

…ontent-Type This is the complement to contentTypesProvided(), which dispatches based on request Accept (i.e. desired response Content-Type). It is useful for PUT/POST requests to handle different Content-Types.

sources: Refactor subresource construction into Resource base class

ef13cd6

Adds a static "Subresource" property defined by each Resource subclass to customize which class is used. This lets us implement things like a "subresources()" method in the base class.

Handle authz failures in the app-wide error handler

fe2e5bd

Refactors the handling for source authz failures to apply to the whole server, in anticipation of adding additional authz checks for which we want the same behaviour.

Configure Express JSON serialization with basic support for Set and M…

8514e55

…ap objects Allows us to use a Set in req.user objects without breaking res.json() when its passed the user object.

docs: Remove speculation about the future that has not come to pass

b4efdb9

We've taken a different road due to limitations of scoping permissions granted by Cognito Identity Pools.

docs: Setup Sphinx project to build docs

ddc9296

Useful to have this developer/internal information integrated into the rest of our doc system, and means we can make full use of Sphinx's rST directives.

docs: Retitle Charon API doc for better fit in the TOC

bac8f6a

docs: Set default Sphinx domain and highlight language to JS

7608c35

This is a JS project, after all!

docs: Update glossary

df8f24a

Converts from Markdown to rST now that we have a Sphinx project and removes terms that moved into the general Nextstrain glossary.

tsibley added 3 commits February 3, 2022 11:33

docs: Describe the authz system

4c266b4

authz: Split policy evaluation from authorized()

d2e3351

Makes it possible to test the range of behaviour with specific policies that we don't otherwise use.

tests: Add authz tests

cbc96fb

tsibley force-pushed the restful-api-rw branch from 53054ef to cbc96fb Compare February 3, 2022 19:33

joverlee521 approved these changes Feb 9, 2022

View reviewed changes

Add some clarifying comments based on review feedback

28edaee

Prompted by @joverlee521's review. Thanks!

jameshadfield approved these changes Feb 15, 2022

View reviewed changes

tsibley merged commit af4bbaf into master Feb 23, 2022

tsibley deleted the restful-api-rw branch February 23, 2022 20:07

This was referenced Feb 23, 2022

authn: Upgrade existing sessions which pre-date addition of authzRoles #479

Merged

authn: Resave an on-the-fly updated user object back to the stored session #480

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RESTful API: upload/delete #452

RESTful API: upload/delete #452

tsibley commented Jan 25, 2022 •

edited

Loading

tsibley commented Jan 25, 2022 •

edited

Loading

tsibley commented Feb 1, 2022 •

edited

Loading

tsibley commented Feb 1, 2022

joverlee521 left a comment

joverlee521 Feb 8, 2022

tsibley Feb 10, 2022

joverlee521 Feb 9, 2022

tsibley Feb 10, 2022

joverlee521 Feb 11, 2022

tsibley Feb 11, 2022

jameshadfield left a comment

jameshadfield Feb 15, 2022

tsibley Feb 15, 2022

jameshadfield Feb 15, 2022

tsibley commented Feb 16, 2022



		function splitGroupRole(cognitoGroup) {
		// I wish split("/", 1) worked reasonably. -trs, 20 Dec 2021

RESTful API: upload/delete #452

RESTful API: upload/delete #452

Conversation

tsibley commented Jan 25, 2022 • edited Loading

tsibley commented Jan 25, 2022 • edited Loading

tsibley commented Feb 1, 2022 • edited Loading

tsibley commented Feb 1, 2022

joverlee521 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jameshadfield left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsibley commented Feb 16, 2022

tsibley commented Jan 25, 2022 •

edited

Loading

tsibley commented Jan 25, 2022 •

edited

Loading

tsibley commented Feb 1, 2022 •

edited

Loading