-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RESTful API #443
RESTful API #443
Conversation
Send errors as JSON objects instead of an HTML page and respond 403 Unauthorized instead of redirecting to an interactive login page intended for humans. Matches Accept against application/*+json as well as application/json in anticipation of using such extension types for our own resources.
Express sets it automatically for application/json, but not JSON extension types.
Each media type can now be followed by a stack of handlers instead of just one, like Express route handlers (but without the unnecessary array-flattening behaviour). All but the last handler in the stack should call "return next()" or "await next()" in order to chain to the next handler in the stack (unless that is not desired).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
One question I had while reviewing: will the RESTful API eventually replace charon
? Or do we want to keep "nextstrain.org" specific logic sequestered to charon
and have the RESTful API be more general purpose?
@@ -1,22 +1,76 @@ | |||
const __fetch = require("make-fetch-happen"); | |||
const utils = require("./utils"); | |||
|
|||
const FETCH_OPTIONS = Symbol("Request options for make-fetch-happen"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never seen Symbol
before. Is it used here to avoid name clashes with properties in __fetch.Request
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and also to make it exceedingly annoying to access the options object here outside of the code which has access to the actual Symbol object. You can think of it like a stronger version of the convention of marking private fields with a leading underscore, e.g. foo._private
.
So it's a way to say, "This options object is very much internal to this code and shouldn't be used by other code."
(Especially because it seems like it might be tempting to use it outside of this code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for the explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. Jover we use Symbols in auspice too, so you might come across them during your work on multitree!
Dynamic API clients can use this to discover alternate representations, and it's a self-documenting hint to humans looking at responses too.
make-fetch-happen accepts this interface, but doesn't actually support Request options like "cache" when using it. Using Request objects is handy as it avoids needing to a) pass around (url, options) tuples and b) merge and canonicalize options objects in every function receiving that tuple.
Starts to reduce function proliferation, and it seems clearer to use "endpoints" as the housing object instead of extending the pattern of "charon" and "users" to "static" (can't actually use that name) and "sources" (potentially confusing with ./src/sources.js). Source-related endpoints are still destructured into individual variables because they're used in so many routes and repeating "endpoints.sources." all over the place would be harder to read.
…ditional media types Only sends Auspice's entrypoint for text/html-accepting clients (i.e. all browsers) and moves existence check into the endpoint itself. Other media types won't need the existence check as they don't need to prevent Auspice from hitting a 404 when it tries to load the dataset/narrative.
Reorganizes tests within the file into more describe() groups for more readable verbose output given all the new tests. Installs and configures jest-extended to provide expect(…).toBeOneOf(), which is implicitly made available for all tests (like the rest of Jest's functions).
Reflects that it doesn't test just routing anymore, but the whole request/response cycle end-to-end.
Extends GET endpoints for existing dataset and narrative routes to provide the following media types via content negotiation: application/vnd.nextstrain.dataset.main+json (also aliased as just application/json) application/vnd.nextstrain.dataset.root-sequence+json application/vnd.nextstrain.dataset.tip-frequencies+json text/vnd.nextstrain.narrative+markdown (also aliased as just text/markdown) A new documentation page covers how to use the API. I wrote the page in rST instead of Markdown because it's more expressive, and my intention is for the page to be incorporated into docs.nextstrain.org eventually. Not sure yet if this repo eventually becomes another RTD subproject or if we do cross-repo sourcing of the pages instead. Documentation includes prospective support for the PUT and DELETE methods that are not included in this commit but will come later.
This version should be replaced by the next release of make-fetch-happen that includes my PR¹ for fixing caching behaviour related to the "compress" option. Those fixes were a direct result of development on this branch finding bugs. ¹ npm/make-fetch-happen#65
I intentionally didn't set out to replace Charon here, although I think this API could with some additional endpoints (that I don't currently plan on writing in the immediate future). It is worth broader discussion of though, and there would probably be some benefits to doing so in terms of caching, response piping improvements, etc. Also, there's also two ways to think about "replacing" Charon, if we wanted to do so. One way is to have Auspice redefine the requests it makes to match this interface.1 I think that's what's probably on your mind? But the other way is to have Auspice continue to use the Charon API, but for nextstrain.org implement Charon by redirecting (either visibly or invisibly to Auspice) to the appropriate endpoint. I can see reasons why we might prefer to do the latter way to avoid churn or breaking existing custom Auspice deploys. Footnotes
|
Right, I forget that Charon is a part of Auspice. With that in mind, I think this approach makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks exciting Tom! I picked a few code paths to dive into and it all makes sense. I didn't fully explore the proxying behavior (e.g. forwarding headers) but I don't have expertise in that area anyway.
@@ -105,10 +131,45 @@ const canonicalizeDataset = (pathBuilder) => (req, res, next) => { | |||
*/ | |||
const ifDatasetExists = async (req, res, next) => { | |||
if (!(await req.context.dataset.exists())) throw new NotFound(); | |||
next(); | |||
return next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am i correct in thinking that the return statement here simply removes ifDatasetExists
from the stack but doesn't change behavior? I.e. without it, when the code resulting from next()
is finished we'll be back in this function and exectute the (implicit) return undefined
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added return
does make this a candidate for tail call optimization¹, but the reason I added the return
here is because ifDatasetExists()
went from being called directly by Express (which really only deals in sync functions, even with the @awaitjs
wrapper) to being called by our contentTypesProvided()
(which deals in async functions). The latter passes a next()
function which is async (unlike Express'), so it needs awaiting:
nextstrain.org/src/negotiate.js
Lines 12 to 16 in 9b14ff8
* More than one handler for a type may be provided, in which case a stack of | |
* handlers is formed, like Express route handlers (but without any | |
* array-flattening behaviour). In this case, all but the last handler in the | |
* stack should call "return next()" or "await next()" in order to chain to the | |
* next handler in the stack (unless that is not desired). |
nextstrain.org/src/negotiate.js
Lines 55 to 69 in 9b14ff8
// Multiple handlers may be given, in which case they must call either | |
// `return next()` or `await next()`. | |
const handlers = handlersByType[contentType]; | |
let handlerIdx = 0; | |
async function nextHandler(err) { | |
if (err) return next(err); | |
const handler = handlers[handlerIdx++]; | |
if (!handler) return next(); | |
return await handler(req, res, nextHandler); | |
} | |
return await nextHandler(); |
That awaiting can happen explicitly in the handler with await next()
, or implicitly by return next()
. The implicit form is still compatible with Express' own next()
, so by using it ifDatasetExists
can still be used in either place.
¹ Though I don't know that any JS VM actually does tail call optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yes... promises not returning async functions is such an easy bug to walk into and, in my experience, a tricky one to debug. The comment inside contentTypesProvided
is spot on, but I think we need to go one step further and explicitly document each middleware function (in ./src/endpoints/sources.js
) because there we have two different types of middleware-like functions with subtle but important differences:
setNarrative
is correctly documented as "Express middleware" and so it can simply callnext()
, which it does.ifNarrativeExists
docstring states "Express middleware function" but it's actually a handler we call ourselves¹ (in the sense that express isn't calling it, we are calling it from an express-called middleware layer). This means it needsreturn next()
, which it does.setSource
(or the function returned from it) is only used as express middleware, yet usesreturn next()
- ditto for
setGroupSource
- etc
Alternatively, why not simply enforce return next()
in all middleware-like functions? (There exists middleware that wants to do work after the chain is complete, but I don't think we have any. If we did, we could use await next()
and make sure to add a comment explaining we're doing this on purpose.)
P.S. (from memory) express 5 is going to support async middleware out of the box, so that should make our lives easier.
¹ I'm still a bit foggy on this, but I think it's right. They are handlers all called by the dispatchByContentType
middleware layer, and when there are no more handlers then dispatchByContentType
calls (express') next()
and so we proceed to the next middleware layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, why not simply enforce
return next()
in all middleware-like functions? (There exists middleware that wants to do work after the chain is complete, but I don't think we have any. If we did, we could useawait next()
and make sure to add a comment explaining we're doing this on purpose.)
Yes, it is safe for middleware to always use return next()
if you don't need to continue processing after calling next()
. I didn't think it was worth changing everywhere to add return
(instead opting to add it incidentally or as necessary), but I can do that if we want.
P.S. (from memory) express 5 is going to support async middleware out of the box, so that should make our lives easier.
Yeah… but Express development over the past few years seems to be limited to minimal maintenance. While there are alphas of Express 5, it's not clear to me it'll ever get beyond that.
canonicalizeDataset, | ||
getDataset, | ||
getNarrative, | ||
} = endpoints.sources; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throughout this file some middleware is referred to in a fashion which conveys some of the details of the endpoint (endpoints.charon.getAvailable
, endpoints.static.gatsbyAssets
, endpoints.static.sendGatsby404
etc) but the sources endpoints are expanded here so that we don't get this information conveyed when reading through the code. Is there a reason to mix-and-match this syntax rather than being specific? For instance, why not the following?
app.routeAsync(coreBuildRoutes)
.all(
endpoints.sources.getDataset(req => req.path)
endpoints.sources.canonicalizeDataset(path => `/${path}`)
)
.getAsync(endpoints.sources.getDataset)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's merely preference. I noted why in the message of the introducing commit (3d65697):
Source-related endpoints are still destructured into individual
variables because they're used in so many routes and repeating
"endpoints.sources." all over the place would be harder to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I understand not wanting too much verbosity, but since we have multiple getDataset
handlers I think it would really help to convey which endpoint category¹ they come from, especially helpful when doing "find getDataset" across the codebase, for example.
We could remove some of the verbosity like so:
const {charon, sources} = endpoints;
app.routeAsync("/charon/getDataset")
.getAsync(charon.getDataset);
app.routeAsync(coreBuildRoutes)
.all(sources.setDataset(req => req.path), sources.canonicalizeDataset(path => `/${path}`))
.getAsync(sources.getDataset)
;
¹ is there a better word for this? I'm referring here to the organisation of the files in endpoints
which map nicely to different APIs we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I can do const {charon, sources} = endpoints
. Another alternative is splitting up the dataset and narrative-specific bits into separate modules and leaving the shared bits in src/endpoints/sources.js
. It'd look something like this: https://gist.github.com/tsibley/af7ef26a07781e00e0e56cf1bd8c246d
is there a better word for this? I'm referring here to the organisation of the files in
endpoints
which map nicely to different APIs we have.
"endpoint module" or "endpoint collection", perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameshadfield Any preference on which approach you'd like me to take?
@@ -1,22 +1,76 @@ | |||
const __fetch = require("make-fetch-happen"); | |||
const utils = require("./utils"); | |||
|
|||
const FETCH_OPTIONS = Symbol("Request options for make-fetch-happen"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. Jover we use Symbols in auspice too, so you might come across them during your work on multitree!
["application/vnd.nextstrain.dataset.root-sequence+json", getDatasetRootSequence], | ||
["application/vnd.nextstrain.dataset.tip-frequencies+json", getDatasetTipFrequencies], | ||
|
||
/* XXX TODO: Support v1 (meta and tree) too? We could, but maybe ok to say |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but maybe ok to say "just v2" for these new endpoints
Yes -- more encouragement for folks to move to v2 I think
res.vary("Accept"); | ||
|
||
// Automatically provide a Link: header for alternate types we support, | ||
// even if we end up sending a 406 Not Acceptable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concept of links seems really powerful, but with this implementation we are sending every possible alternate, regardless of whether that resource exists (e.g. curl http://localhost:5000/zika -I
indicates tip-frequencies as an alternate).
Is this a placeholder implementation, and one day we'll filter them using the exists
method (e.g.), or is the idea that links are the set of all possible alternates and you need to try to get them to determine if they actually exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think ideally we filter them in the future if feasible. For now (and maybe ~forever), it seems acceptable to me to expect clients to gracefully handle the 404 if they try to fetch a non-existent alternate. I didn't want to add filtering now since it'd be best with a stronger caching strategy when determining existence of sidecars, and it would add to the existing request amplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
sendSubresource(req => req.context.dataset.subresource(type)); | ||
|
||
|
||
const getDatasetSubresource = type => contentTypesProvided([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this section hard to get my head around, and I think it'll be tricky to debug when something goes wrong!
We're sometimes calling the (higher order function) contentTypesProvided
in a nested fashion,
- once to set up a iterator across the provided handlers for a content type (this is
getDataset
) - once again across a provided handler to set up an iterator across different content-types (
getDatasetSubresource
)
I must be missing something important here -- why can we not get rid of these helper function and have the following?
const getDataset = contentTypesProvided([
["text/html", ifDatasetExists, sendAuspiceEntrypoint],
["application/json", sendDatasetSubresource("main")],
["application/vnd.nextstrain.dataset.main+json", sendDatasetSubresource("main")],
["application/vnd.nextstrain.dataset.root-sequence+json", sendDatasetSubresource("root-sequence")],
["application/vnd.nextstrain.dataset.tip-frequencies+json", sendDatasetSubresource("tip-frequencies")],
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good question! You're not missing anything! The simplification you give as an example would work just fine here in this PR, and quite possibly is what I should have committed for this PR.
The extra getDatasetMain
, getDatasetRootSequence
, etc. handlers (with their own content-type dispatch) exist only because I anticipated using them to handle the convenience routes discussed in the partial design doc), e.g. /ncov/open/global.json
, /ncov/open/global_root-sequence.json
, etc. I actually have a prototype of those routes which I omitted from this PR because they weren't complete yet and I wanted to get the rest of the work out for review sooner than later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I don't fully grasp how exactly those convenience routes would work without seeing code but I do worry about the complexity we are introducing here. A "linear" stack of express middleware is easy to reason about (at least, once one is familiar with the behavior of express!) but we now have a stack of middleware where some layers iterate over a set of (async) handlers¹ and some of those handlers also iterate over their own set of (async) handlers (and maybe those handlers will iterate over their own handlers etc etc).
¹ and these handlers have arguments that make them look like express middleware, which I think is by design so they can be used as express middleware in some circumstances? (I can't find an example of this, but there probably is one.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The partial prototype is in 9592f5c, if you want to 👀.
I hear your concern about complexity, but it's not one I share here. It seems to me that once you understand Express' handler stacks, understanding nested handler stacks isn't much a leap. (And Express itself nests stacks too.)
There are certainly downsides to using declarative handler stacks for routes instead of only using one imperative handler per route, but that's a bigger topic which goes against Express' design and which I think falls outside the scope of this PR if we wanted to consider switching paradigms.
which I think is by design so they can be used as express middleware in some circumstances?
Yes, by design, not so that we can necessarily use them as Express middleware, but so that our stacks of handlers are consistent (as possible) regardless of if they're called directly or indirectly by Express.
* @returns {expressEndpointAsync} | ||
*/ | ||
function sendSubresource(subresourceExtractor) { | ||
return async (req, res) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double check, this returned function (async (req, res) => ...
) is the handler that's called by nextHandler
(within dispatchByContentType
). This handler ends up running proxyResponseBodyFromUpstream
but that's not itself a handler (in the express middleware form of the word)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
Barring any new review comments, my plan is to merge and deploy this tomorrow, with #452 coming sometime later after it gets 👀. |
This fixes a key use case of /fetch where AWS S3 signed URLs are given. These signed URLs dictate the permitted S3 action, and the same URL can't support both a HEAD (headObject) and GET (getObject) request. Prior to the merge and deploy of #443 this morning, the /fetch endpoints by design didn't guard Auspice's entrypoint with a HEAD request for the dataset to check existence. This inadvertently changed in "Refactor endpoints for GETing a dataset/narrative to make room for additional media types" (86190b7). Now once again for /fetch we'll always serve up Auspice, which will make a request to Charon, which may return an error if the upstream request fails, and an error from Charon will trigger Auspice's error page. That's unlike everywhere else where we try to avoid Auspice's error page, but it's ok. Resolves #461.
See the new documentation page for a description of the API itself. Commit messages and comments contain details on the implementation.
I did extensive manual testing during development and also extended the automated test suite to cover the new endpoints (as much as possible without first adding large new features to our testing infra).
I will continue work on the corresponding PUT and DELETE methods, but wanted to get this foundation out there for review sooner than later.