RESTful API #443

tsibley · 2021-12-10T00:12:59Z

See the new documentation page for a description of the API itself. Commit messages and comments contain details on the implementation.

I did extensive manual testing during development and also extended the automated test suite to cover the new endpoints (as much as possible without first adding large new features to our testing infra).

I will continue work on the corresponding PUT and DELETE methods, but wanted to get this foundation out there for review sooner than later.

Send errors as JSON objects instead of an HTML page and respond 403 Unauthorized instead of redirecting to an interactive login page intended for humans. Matches Accept against application/*+json as well as application/json in anticipation of using such extension types for our own resources.

Express sets it automatically for application/json, but not JSON extension types.

Each media type can now be followed by a stack of handlers instead of just one, like Express route handlers (but without the unnecessary array-flattening behaviour). All but the last handler in the stack should call "return next()" or "await next()" in order to chain to the next handler in the stack (unless that is not desired).

joverlee521

LGTM.
One question I had while reviewing: will the RESTful API eventually replace charon? Or do we want to keep "nextstrain.org" specific logic sequestered to charon and have the RESTful API be more general purpose?

src/negotiate.js

joverlee521 · 2021-12-14T00:31:24Z

src/fetch.js

@@ -1,22 +1,76 @@
 const __fetch = require("make-fetch-happen");
 const utils = require("./utils");

+const FETCH_OPTIONS = Symbol("Request options for make-fetch-happen");


I've never seen Symbol before. Is it used here to avoid name clashes with properties in __fetch.Request?

Yes, and also to make it exceedingly annoying to access the options object here outside of the code which has access to the actual Symbol object. You can think of it like a stronger version of the convention of marking private fields with a leading underscore, e.g. foo._private.

So it's a way to say, "This options object is very much internal to this code and shouldn't be used by other code."

(Especially because it seems like it might be tempting to use it outside of this code.)

Cool, thanks for the explanation!

P.S. Jover we use Symbols in auspice too, so you might come across them during your work on multitree!

Dynamic API clients can use this to discover alternate representations, and it's a self-documenting hint to humans looking at responses too.

make-fetch-happen accepts this interface, but doesn't actually support Request options like "cache" when using it. Using Request objects is handy as it avoids needing to a) pass around (url, options) tuples and b) merge and canonicalize options objects in every function receiving that tuple.

Starts to reduce function proliferation, and it seems clearer to use "endpoints" as the housing object instead of extending the pattern of "charon" and "users" to "static" (can't actually use that name) and "sources" (potentially confusing with ./src/sources.js). Source-related endpoints are still destructured into individual variables because they're used in so many routes and repeating "endpoints.sources." all over the place would be harder to read.

…ditional media types Only sends Auspice's entrypoint for text/html-accepting clients (i.e. all browsers) and moves existence check into the endpoint itself. Other media types won't need the existence check as they don't need to prevent Auspice from hitting a 404 when it tries to load the dataset/narrative.

Reorganizes tests within the file into more describe() groups for more readable verbose output given all the new tests. Installs and configures jest-extended to provide expect(…).toBeOneOf(), which is implicitly made available for all tests (like the rest of Jest's functions).

Reflects that it doesn't test just routing anymore, but the whole request/response cycle end-to-end.

Extends GET endpoints for existing dataset and narrative routes to provide the following media types via content negotiation: application/vnd.nextstrain.dataset.main+json (also aliased as just application/json) application/vnd.nextstrain.dataset.root-sequence+json application/vnd.nextstrain.dataset.tip-frequencies+json text/vnd.nextstrain.narrative+markdown (also aliased as just text/markdown) A new documentation page covers how to use the API. I wrote the page in rST instead of Markdown because it's more expressive, and my intention is for the page to be incorporated into docs.nextstrain.org eventually. Not sure yet if this repo eventually becomes another RTD subproject or if we do cross-repo sourcing of the pages instead. Documentation includes prospective support for the PUT and DELETE methods that are not included in this commit but will come later.

This version should be replaced by the next release of make-fetch-happen that includes my PR¹ for fixing caching behaviour related to the "compress" option. Those fixes were a direct result of development on this branch finding bugs. ¹ npm/make-fetch-happen#65

tsibley · 2021-12-14T02:17:13Z

One question I had while reviewing: will the RESTful API eventually replace charon? Or do we want to keep "nextstrain.org" specific logic sequestered to charon and have the RESTful API be more general purpose?

I intentionally didn't set out to replace Charon here, although I think this API could with some additional endpoints (that I don't currently plan on writing in the immediate future). It is worth broader discussion of though, and there would probably be some benefits to doing so in terms of caching, response piping improvements, etc.

Also, there's also two ways to think about "replacing" Charon, if we wanted to do so. One way is to have Auspice redefine the requests it makes to match this interface.¹ I think that's what's probably on your mind? But the other way is to have Auspice continue to use the Charon API, but for nextstrain.org implement Charon by redirecting (either visibly or invisibly to Auspice) to the appropriate endpoint. I can see reasons why we might prefer to do the latter way to avoid churn or breaking existing custom Auspice deploys.

Or the interface of the *.json convenience routes, which aren't in this PR, but would help converge dynamic- vs. static-hosted Auspice. ↩

joverlee521 · 2021-12-14T19:28:17Z

But the other way is to have Auspice continue to use the Charon API, but for nextstrain.org implement Charon by redirecting (either visibly or invisibly to Auspice) to the appropriate endpoint. I can see reasons why we might prefer to do the latter way to avoid churn or breaking existing custom Auspice deploys.

Right, I forget that Charon is a part of Auspice. With that in mind, I think this approach makes sense.

jameshadfield

Looks exciting Tom! I picked a few code paths to dive into and it all makes sense. I didn't fully explore the proxying behavior (e.g. forwarding headers) but I don't have expertise in that area anyway.

jameshadfield · 2022-01-12T00:07:45Z

src/endpoints/sources.js

@@ -105,10 +131,45 @@ const canonicalizeDataset = (pathBuilder) => (req, res, next) => {
 */
 const ifDatasetExists = async (req, res, next) => {
  if (!(await req.context.dataset.exists())) throw new NotFound();
-  next();
+  return next();


Am i correct in thinking that the return statement here simply removes ifDatasetExists from the stack but doesn't change behavior? I.e. without it, when the code resulting from next() is finished we'll be back in this function and exectute the (implicit) return undefined?

The added return does make this a candidate for tail call optimization¹, but the reason I added the return here is because ifDatasetExists() went from being called directly by Express (which really only deals in sync functions, even with the @awaitjs wrapper) to being called by our contentTypesProvided() (which deals in async functions). The latter passes a next() function which is async (unlike Express'), so it needs awaiting:

nextstrain.org/src/negotiate.js

Lines 12 to 16 in 9b14ff8

* More than one handler for a type may be provided, in which case a stack of

* handlers is formed, like Express route handlers (but without any

* array-flattening behaviour). In this case, all but the last handler in the

* stack should call "return next()" or "await next()" in order to chain to the

* next handler in the stack (unless that is not desired).

nextstrain.org/src/negotiate.js

Lines 55 to 69 in 9b14ff8

// Multiple handlers may be given, in which case they must call either

// `return next()` or `await next()`.

const handlers = handlersByType[contentType];

let handlerIdx = 0;

async function nextHandler(err) {

if (err) return next(err);

const handler = handlers[handlerIdx++];

if (!handler) return next();

return await handler(req, res, nextHandler);

}

return await nextHandler();

That awaiting can happen explicitly in the handler with await next(), or implicitly by return next(). The implicit form is still compatible with Express' own next(), so by using it ifDatasetExists can still be used in either place.

¹ Though I don't know that any JS VM actually does tail call optimization.

Ahh yes... promises not returning async functions is such an easy bug to walk into and, in my experience, a tricky one to debug. The comment inside contentTypesProvided is spot on, but I think we need to go one step further and explicitly document each middleware function (in ./src/endpoints/sources.js) because there we have two different types of middleware-like functions with subtle but important differences:

setNarrative is correctly documented as "Express middleware" and so it can simply call next(), which it does.

ifNarrativeExists docstring states "Express middleware function" but it's actually a handler we call ourselves¹ (in the sense that express isn't calling it, we are calling it from an express-called middleware layer). This means it needs return next(), which it does.

setSource (or the function returned from it) is only used as express middleware, yet uses return next()

ditto for setGroupSource

etc

Alternatively, why not simply enforce return next() in all middleware-like functions? (There exists middleware that wants to do work after the chain is complete, but I don't think we have any. If we did, we could use await next() and make sure to add a comment explaining we're doing this on purpose.)

P.S. (from memory) express 5 is going to support async middleware out of the box, so that should make our lives easier.

¹ I'm still a bit foggy on this, but I think it's right. They are handlers all called by the dispatchByContentType middleware layer, and when there are no more handlers then dispatchByContentType calls (express') next() and so we proceed to the next middleware layer.

Alternatively, why not simply enforce return next() in all middleware-like functions? (There exists middleware that wants to do work after the chain is complete, but I don't think we have any. If we did, we could use await next() and make sure to add a comment explaining we're doing this on purpose.)

Yes, it is safe for middleware to always use return next() if you don't need to continue processing after calling next(). I didn't think it was worth changing everywhere to add return (instead opting to add it incidentally or as necessary), but I can do that if we want.

P.S. (from memory) express 5 is going to support async middleware out of the box, so that should make our lives easier.

Yeah… but Express development over the past few years seems to be limited to minimal maintenance. While there are alphas of Express 5, it's not clear to me it'll ever get beyond that.

jameshadfield · 2022-01-12T00:13:14Z

src/app.js

+  canonicalizeDataset,
+  getDataset,
+  getNarrative,
+} = endpoints.sources;


Throughout this file some middleware is referred to in a fashion which conveys some of the details of the endpoint (endpoints.charon.getAvailable, endpoints.static.gatsbyAssets, endpoints.static.sendGatsby404 etc) but the sources endpoints are expanded here so that we don't get this information conveyed when reading through the code. Is there a reason to mix-and-match this syntax rather than being specific? For instance, why not the following?

app.routeAsync(coreBuildRoutes) .all( endpoints.sources.getDataset(req => req.path) endpoints.sources.canonicalizeDataset(path => `/${path}`) ) .getAsync(endpoints.sources.getDataset)

It's merely preference. I noted why in the message of the introducing commit (3d65697):

Source-related endpoints are still destructured into individual
variables because they're used in so many routes and repeating
"endpoints.sources." all over the place would be harder to read.

Hmm, I understand not wanting too much verbosity, but since we have multiple getDataset handlers I think it would really help to convey which endpoint category¹ they come from, especially helpful when doing "find getDataset" across the codebase, for example.

We could remove some of the verbosity like so:

const {charon, sources} = endpoints; app.routeAsync("/charon/getDataset") .getAsync(charon.getDataset); app.routeAsync(coreBuildRoutes) .all(sources.setDataset(req => req.path), sources.canonicalizeDataset(path => `/${path}`)) .getAsync(sources.getDataset) ;

¹ is there a better word for this? I'm referring here to the organisation of the files in endpoints which map nicely to different APIs we have.

Ok. I can do const {charon, sources} = endpoints. Another alternative is splitting up the dataset and narrative-specific bits into separate modules and leaving the shared bits in src/endpoints/sources.js. It'd look something like this: https://gist.github.com/tsibley/af7ef26a07781e00e0e56cf1bd8c246d

is there a better word for this? I'm referring here to the organisation of the files in endpoints which map nicely to different APIs we have.

"endpoint module" or "endpoint collection", perhaps?

@jameshadfield Any preference on which approach you'd like me to take?

jameshadfield · 2022-01-12T00:37:05Z

src/fetch.js

@@ -1,22 +1,76 @@
 const __fetch = require("make-fetch-happen");
 const utils = require("./utils");

+const FETCH_OPTIONS = Symbol("Request options for make-fetch-happen");


P.S. Jover we use Symbols in auspice too, so you might come across them during your work on multitree!

jameshadfield · 2022-01-12T00:53:59Z

src/endpoints/sources.js

+  ["application/vnd.nextstrain.dataset.root-sequence+json", getDatasetRootSequence],
+  ["application/vnd.nextstrain.dataset.tip-frequencies+json", getDatasetTipFrequencies],
+
+  /* XXX TODO: Support v1 (meta and tree) too?  We could, but maybe ok to say


but maybe ok to say "just v2" for these new endpoints

Yes -- more encouragement for folks to move to v2 I think

jameshadfield · 2022-01-12T01:15:20Z

src/negotiate.js

    res.vary("Accept");

+    // Automatically provide a Link: header for alternate types we support,
+    // even if we end up sending a 406 Not Acceptable.


The concept of links seems really powerful, but with this implementation we are sending every possible alternate, regardless of whether that resource exists (e.g. curl http://localhost:5000/zika -I indicates tip-frequencies as an alternate).

Is this a placeholder implementation, and one day we'll filter them using the exists method (e.g.), or is the idea that links are the set of all possible alternates and you need to try to get them to determine if they actually exist?

Yeah, I think ideally we filter them in the future if feasible. For now (and maybe ~forever), it seems acceptable to me to expect clients to gracefully handle the 404 if they try to fetch a non-existent alternate. I didn't want to add filtering now since it'd be best with a stronger caching strategy when determining existence of sidecars, and it would add to the existing request amplification.

jameshadfield · 2022-01-12T01:43:53Z

src/endpoints/sources.js

+  sendSubresource(req => req.context.dataset.subresource(type));
+
+
+const getDatasetSubresource = type => contentTypesProvided([


I found this section hard to get my head around, and I think it'll be tricky to debug when something goes wrong!

We're sometimes calling the (higher order function) contentTypesProvided in a nested fashion,

once to set up a iterator across the provided handlers for a content type (this is getDataset)

once again across a provided handler to set up an iterator across different content-types (getDatasetSubresource)

I must be missing something important here -- why can we not get rid of these helper function and have the following?

const getDataset = contentTypesProvided([ ["text/html", ifDatasetExists, sendAuspiceEntrypoint], ["application/json", sendDatasetSubresource("main")], ["application/vnd.nextstrain.dataset.main+json", sendDatasetSubresource("main")], ["application/vnd.nextstrain.dataset.root-sequence+json", sendDatasetSubresource("root-sequence")], ["application/vnd.nextstrain.dataset.tip-frequencies+json", sendDatasetSubresource("tip-frequencies")],

Ah, good question! You're not missing anything! The simplification you give as an example would work just fine here in this PR, and quite possibly is what I should have committed for this PR.

The extra getDatasetMain, getDatasetRootSequence, etc. handlers (with their own content-type dispatch) exist only because I anticipated using them to handle the convenience routes discussed in the partial design doc), e.g. /ncov/open/global.json, /ncov/open/global_root-sequence.json, etc. I actually have a prototype of those routes which I omitted from this PR because they weren't complete yet and I wanted to get the rest of the work out for review sooner than later.

Ok, I don't fully grasp how exactly those convenience routes would work without seeing code but I do worry about the complexity we are introducing here. A "linear" stack of express middleware is easy to reason about (at least, once one is familiar with the behavior of express!) but we now have a stack of middleware where some layers iterate over a set of (async) handlers¹ and some of those handlers also iterate over their own set of (async) handlers (and maybe those handlers will iterate over their own handlers etc etc).

¹ and these handlers have arguments that make them look like express middleware, which I think is by design so they can be used as express middleware in some circumstances? (I can't find an example of this, but there probably is one.)

The partial prototype is in 9592f5c, if you want to 👀.

I hear your concern about complexity, but it's not one I share here. It seems to me that once you understand Express' handler stacks, understanding nested handler stacks isn't much a leap. (And Express itself nests stacks too.)

There are certainly downsides to using declarative handler stacks for routes instead of only using one imperative handler per route, but that's a bigger topic which goes against Express' design and which I think falls outside the scope of this PR if we wanted to consider switching paradigms.

which I think is by design so they can be used as express middleware in some circumstances?

Yes, by design, not so that we can necessarily use them as Express middleware, but so that our stacks of handlers are consistent (as possible) regardless of if they're called directly or indirectly by Express.

jameshadfield · 2022-01-12T01:54:51Z

src/endpoints/sources.js

+ * @returns {expressEndpointAsync}
+ */
+function sendSubresource(subresourceExtractor) {
+  return async (req, res) => {


Just to double check, this returned function (async (req, res) => ...) is the handler that's called by nextHandler (within dispatchByContentType). This handler ends up running proxyResponseBodyFromUpstream but that's not itself a handler (in the express middleware form of the word)?

tsibley · 2022-02-01T21:59:31Z

Barring any new review comments, my plan is to merge and deploy this tomorrow, with #452 coming sometime later after it gets 👀.

This fixes a key use case of /fetch where AWS S3 signed URLs are given. These signed URLs dictate the permitted S3 action, and the same URL can't support both a HEAD (headObject) and GET (getObject) request. Prior to the merge and deploy of #443 this morning, the /fetch endpoints by design didn't guard Auspice's entrypoint with a HEAD request for the dataset to check existence. This inadvertently changed in "Refactor endpoints for GETing a dataset/narrative to make room for additional media types" (86190b7). Now once again for /fetch we'll always serve up Auspice, which will make a request to Charon, which may return an error if the upstream request fails, and an error from Charon will trigger Auspice's error page. That's unlike everywhere else where we try to avoid Auspice's error page, but it's ok. Resolves #461.

tsibley added 3 commits December 9, 2021 13:38

negotiate: Set a UTF-8 charset for application/*+json media types

c46c9f2

Express sets it automatically for application/json, but not JSON extension types.

tsibley requested a review from a team December 10, 2021 00:13

nextstrain-bot temporarily deployed to nextstrain-s-restful-ap-wipaft December 10, 2021 00:13 Inactive

tsibley force-pushed the restful-api branch from df7677e to 2b078df Compare December 10, 2021 19:29

tsibley temporarily deployed to nextstrain-s-restful-ap-wipaft December 10, 2021 19:30 Inactive

tsibley mentioned this pull request Dec 13, 2021

Explain the design and intent of our source model classes #440

Merged

joverlee521 approved these changes Dec 14, 2021

View reviewed changes

tsibley added 9 commits December 13, 2021 17:59

negotiate: Automatically send a Link header with alternates

61fcb9a

Dynamic API clients can use this to discover alternate representations, and it's a self-documenting hint to humans looking at responses too.

tests: Rename routing.test.js to requests.test.js

d268210

Reflects that it doesn't test just routing anymore, but the whole request/response cycle end-to-end.

docs: Fix typo

592aa7d

tsibley force-pushed the restful-api branch from 2b078df to 9b14ff8 Compare December 14, 2021 02:00

tsibley temporarily deployed to nextstrain-s-restful-ap-wipaft December 14, 2021 02:00 Inactive

jameshadfield reviewed Jan 12, 2022

View reviewed changes

tsibley mentioned this pull request Jan 25, 2022

RESTful API: upload/delete #452

Merged

3 tasks

tsibley merged commit eec2250 into master Feb 2, 2022

tsibley deleted the restful-api branch February 2, 2022 18:39

tsibley mentioned this pull request Feb 3, 2022

Nextstrain.org/fetch broke sometime between yesterday and today #461

Closed

tsibley mentioned this pull request Feb 3, 2022

Don't check existence of datasets and narratives for /fetch/… #462

Merged

tsibley mentioned this pull request Jun 14, 2022

dev: Switch back to upstream releases of make-fetch-happen #555

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RESTful API #443

RESTful API #443

tsibley commented Dec 10, 2021 •

edited by victorlin

Loading

joverlee521 left a comment

joverlee521 Dec 14, 2021

tsibley Dec 14, 2021 •

edited

Loading

joverlee521 Dec 14, 2021

jameshadfield Jan 12, 2022

tsibley commented Dec 14, 2021

joverlee521 commented Dec 14, 2021

jameshadfield left a comment

jameshadfield Jan 12, 2022

tsibley Jan 12, 2022

jameshadfield Jan 12, 2022 •

edited

Loading

tsibley Jan 13, 2022

jameshadfield Jan 12, 2022

tsibley Jan 12, 2022

jameshadfield Jan 12, 2022

tsibley Jan 13, 2022

tsibley Jan 25, 2022

jameshadfield Jan 12, 2022

jameshadfield Jan 12, 2022

jameshadfield Jan 12, 2022

tsibley Jan 12, 2022

jameshadfield Jan 12, 2022

jameshadfield Jan 12, 2022

tsibley Jan 12, 2022

jameshadfield Jan 12, 2022

tsibley Jan 13, 2022

jameshadfield Jan 12, 2022

tsibley Jan 12, 2022

tsibley commented Feb 1, 2022

	* More than one handler for a type may be provided, in which case a stack of
	* handlers is formed, like Express route handlers (but without any
	* array-flattening behaviour). In this case, all but the last handler in the
	* stack should call "return next()" or "await next()" in order to chain to the
	* next handler in the stack (unless that is not desired).

	// Multiple handlers may be given, in which case they must call either
	// `return next()` or `await next()`.
	const handlers = handlersByType[contentType];
	let handlerIdx = 0;

	async function nextHandler(err) {
	if (err) return next(err);

	const handler = handlers[handlerIdx++];
	if (!handler) return next();

	return await handler(req, res, nextHandler);
	}

	return await nextHandler();

		sendSubresource(req => req.context.dataset.subresource(type));


		const getDatasetSubresource = type => contentTypesProvided([

RESTful API #443

RESTful API #443

Conversation

tsibley commented Dec 10, 2021 • edited by victorlin Loading

joverlee521 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsibley Dec 14, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsibley commented Dec 14, 2021

Footnotes

joverlee521 commented Dec 14, 2021

jameshadfield left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jameshadfield Jan 12, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsibley commented Feb 1, 2022

tsibley commented Dec 10, 2021 •

edited by victorlin

Loading

tsibley Dec 14, 2021 •

edited

Loading

jameshadfield Jan 12, 2022 •

edited

Loading