-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to stream data (for non-serverless adapters like adapter-node) #1563
Comments
This comment has been minimized.
This comment has been minimized.
I've thought about this problem before, and my first idea for the API design for this in endpoints is something along these lines: export async function* get() {
yield {
status: 200,
headers: {
"Connection": "keep-alive",
"Content-Type": "multipart/mixed",
"Transfer-Encoding": "chunked",
},
};
// ... every time you get new data to send (so suppose this is in a for/while loop or waiting for another promise)
yield {
headers: {
"Content-Type": "application/json; charset=utf-8",
},
body: newData,
};
// return from the function when done
}
// this was modeled after https://github.com/contrawork/graphql-helix/#basic-usage (Disclaimer: I honestly haven't done streamed responses in Express myself before so I might have missed something about how they work). The original sample has some work with chunk boundaries and This was just an initial thought for someone else to refine / reject; hope it's constructive. |
@babichjacob using an asynchronous generator for the body instead of a Node Stream is a great idea - though it wouldn't send new headers on every iteration, so it would just be for the body contents. I like the idea of leaning on JavaScript and away from Node, so that this would be more universal and compatible with Deno as well (and other non-Node JavaScript runtimes that may arise in the future). UPDATE: it turns out that Node Streams (Node v10+) are already asynchronous iterators! So maybe allowing for an async iterator to be passed to the body would be the sweet spot - allowing existing Node Stream libraries to be used here without jumping through extra hoops. So maybe something like this would work? export async function get() {
return {
status: 200,
headers: {
"Content-Type": "text/csv; charset=utf-8",
},
body: generateCSV() // could also be a Node Stream
};
}
async function* generateCSV() {
yield "first,row,of,a,csv,file\n";
yield "second,row,of,a,csv,file\n";
} |
Converting an asynchronous iterator to a string (for serverless & static adapters) would be as simple as this: async function iteratorToString(iterator) {
let string = '';
for await (let chunk of iterator) {
string += String(chunk);
}
return string;
} Though we may also need to deal separately with binary data (Uint8Array) generators, but I'm not sure about it. |
@benmccann that's great! I'd be very happy to help with the implementation of this - I'll try to put together a pull request in the next week or so. |
Don't forget to wait until multiple team members have commented to figure out what the API should look like—and other things related to that. |
Ah, I had searched for an issue (which didn't exist) and went ahead with an experimental implementation before opening what is now a duplicate issue. Happy to help here if I can. I concur with @jesseskinner's design sensibilities. |
Literally the only missing puzzle piece for me. If there is anything I can do to help, ping me. |
I'm pretty comfortable with JavaScript, but it isn't my primary language, so I haven't used async iterators before and don't have any opinion on whether or not it's the best design choice. If you'd like to send an implementation, please feel free as long as you don't mind getting some feedback about reworking it. If you'd rather have that feedback up front I can try to ask some of the other maintainers about this ticket |
@benmccann feedback up front would be ideal, but I'll probably end up starting an implementation sometime soon even if I don't hear back. Thanks! |
There is an implementation linked in the issue I created. See here: https://github.com/sveltejs/kit/compare/master...kjmph:feature/event-source?expand=1 I didn't open a pull request because it suggests opening an issue prior to a PR. We can expand this beyond just an |
It would be fine to open a PR referencing this issue, but I do think that you should try to make it more generic than |
I would vote for having some way of accessing the raw request object on whatever the platform you are. That would allow not only to use streaming on other environments apart from NodeJS but also allow any other use case that requires access to this objects or even any future development or features that will be available on every particular platform. It would mean that the sveltekit team will not have to think too much when new requests to support new things comes in as having access to this objects means that there will always be ways to workaround any problem until a more "user friendly" solution is develop. In my particular case I want to try streaming on cloudflare workers, but if that doesn't work I might also try vercel. I also need to pipe the requests through a number of transform streams before using the final result. The goal would be to bypass the part in which the |
Hello @benmccann; @jesseskinner and I synced on what a more generic solution would look like. We updated the experimental branch to be generic. Still seen at the same repo/branch: https://github.com/sveltejs/kit/compare/master...kjmph:feature/event-source?expand=1 We'll open a PR referencing this issue after a final review by @jesseskinner. RE: @juanmait; while I understand the comment, I think this issue is focused on using the same function signature with a different returned object to support streaming through async iterators. The suggestion to add the request and/or response object into the function signature would change the design of endpoints in SvelteKit. Perhaps you should open a new issue to address that? |
Having some support is very welcome. I'm wondering if this solution will fit in an environment like cloudflare workers. The stream API is different from nodejs, it's basically the same API that ships in modern browsers. I'll try this option anyways in nodejs. Thanks! |
Access to raw Async iterables are cool, but could cause some hard to find issues, like you can't just write to a stream i.e. for await (const event of rendered.body) {
res.write(event);
} you need to handle backpressure in this case, PS: |
Hello @istarkov, a few points to address in regards to your comment. I'll break them out into a numbered list.
|
I think that we will end up exposing the ability to add middleware in |
Async iterators can be converted to streams for adapters that support the Streams API (we could support Cloudflare Workers and I can add that if we want to see it). I don't know the maintainers philosophy, yet I'm hesitant to suggest a feature that is dependent on non-LTS versions of Node.js or dependencies that would be foisted on downstream developers. Plus, where Node.js 16.6.2 has experimental support, the ReadableStream is still converted to a Node Stream via async iterator support to send the ServerResponse. I am not a maintainer, but I think the SvelteKit design is one that the endpoints are emitting data, and the async iterator promotes that design choice. Regardless of my opinions, here are two demonstrative endpoints for comparison. (Tested on Node.js 16.6.2 on experimental branches in my fork) Streams API: import type { RequestHandler } from '@sveltejs/kit';
import type { Locals } from '$lib/types';
import { ReadableStream } from 'node:stream/web';
const wait = ms => new Promise(resolve => setTimeout(resolve, ms));
// GET /es.json
export const get: RequestHandler<Locals> = async (request) => {
// Handle setup here..
const es = new ReadableStream({
async start(controller) {
while (true) {
controller.enqueue(`event: time\ndata: ${(new Date()).toISOString()}\n\n`);
await wait(1000);
}
},
cancel() {
// Handle cleanup here..
}
});
return {
headers: {
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Content-Type': 'text/event-stream',
},
body: es
};
}; Async Iterators: import type { RequestHandler } from '@sveltejs/kit';
import type { Locals } from '$lib/types';
const wait = ms => new Promise(resolve => setTimeout(resolve, ms));
// GET /es.json
export const get: RequestHandler<Locals> = async (request) => {
// Handle setup here..
async function* es() {
try {
while (true) {
yield `event: time\ndata: ${(new Date()).toISOString()}\n\n`;
await wait(1000);
}
} finally {
// Handle cleanup here..
}
}
return {
headers: {
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Content-Type': 'text/event-stream',
},
body: es()
};
}; The only caveat is that the import of ReadableStream from node would have to be injected by SvelteKit into the global namespace per adapter. I don't know how to accomplish that. I suppose this is another reason why I like async iterators, they are more universally available today.. |
We do that with |
@mrkishi good to see you! long time. Just chiming in to say that I think using async iterators rather than
|
It really seems like a lot of effort is going into making svelte-kits capabilities uniform across the spectrum of adapters. I understand that, to a degree. It makes documenting and migration easier. I think an important question is, what do you lose by doing so? In reading through this issue and related (people requesting middleware or file uploads), I'm sort of left with confusion over what Svelte Kit aims to be. @Rich-Harris, you've called SvelteKit a "serverless framework" and have voiced opposition to exposing the underlying connect-style This makes sense if the objective of the project is to be a serverless framework. I totally understand why, in that context, that you'd rather people abandon node and serverful environments. It's just another thing that needs to be maintained and a deviation from the goal, a distraction. If the objective is for SvelteKit to be agnostic of where a person's application resides, be that a kubernetes cluster, netlify, vercel, or the next trend, then I think you may want to consider how it can be perceived from folks who are not going with serverless environments. It seems as though you are limiting yourself to the lowest common denominator while devoting energy to construct work arounds which conform to that standard. I understand that some of the defacto packages in the node ecosystem can be confusing or less than desirable. In fact, it is somewhat surprising to me that many haven't been supplanted over the years. Having said that, they are clearly still viable and heavily depended upon. This means that adoption of kit will add on even more development tax of using Svelte as people are forced to re-invent the wheel with less resources and capabilities than are present in their stack. I think it may also limit folks to address kit-specific challenges. For example, I'd like to be able to generate static files, house them in GCS, and stream them down upon request. That's not feasible with kit as-is. The usage of async generators may make part of that problem solvable but it'll require more effort on my part than utilizing node streams today or web streams in the future. I'm not sure I would have chosen Svelte and Kit, despite my opinion that Svelte is the best option available for front-end frameworks, if my API weren't being built in another language. I would have probably rolled with Next and just dealt with my dislike of React. |
The goal is to make SvelteKit apps as portable as possible, which definitely doesn't preclude adding non-portable things like direct access to underlying primitives like This thread is a wonderful example of that. If It's definitely tempting to think 'if we have the escape hatch, not only will we be able to solve $THIS_PROBLEM, we'll be able to solve a bunch of $OTHER_PROBLEMS in future, but what I've often found is that when you actually sit down and think about $OTHER_PROBLEMS it turns out to be a finite list where we already know the general shape of the solutions; we just have to implement them. Sometimes there are problems you couldn't anticipate, and in those cases escape hatches are useful, but it does need to be an escape hatch rather than a blessed solution otherwise it ends up being used for everything. The consequence of relying on
Could you elaborate on this part? You would be able to use node streams like so... return {
body: fs.createReadStream('data.csv')
}; ...and web streams like so: // today
return {
// https://jakearchibald.com/2017/async-iterators-and-generators/#making-streams-iterate
body: streamAsyncIterator(readable)
};
// tomorrow
return {
body: readable.getIterator()
}; |
I'm going to back-peddle on my reply above a bit. Both Fastify and Koa departed from the traditional connect/express approach to server middleware and did just fine. Micro, the server for Next, is different. In the case of Koa and Micro, middleware could be wrapped to work. Fastify has done quite well despite the fact that node's legacy middleware are all but incompatible and require ports, such as fastify/fastify-passport. |
My hope is that many use cases could be addressed via an adapter that converts Connect middleware, not unlike how serverless-http mocks // src/hooks.js
import { sequence } from '@sveltejs/kit/hooks';
import { convert } from 'svelte-kit-middleware-adapter';
import foo from 'express-foo';
import bar from 'express-bar';
export const handle = sequence(
convert(foo()),
convert(bar()),
({ request, resolve }) => resolve(request)
); |
Sorry, I posted my backtrack above without seeing that you had responded to me.
Thats certainly true. It'll slow adoption considerably but perhaps thats not a bad thing. If the framework provides viable mechanisms to employ the logic, its just a matter of time before the solution bank is built up. It puts more on early adopters but that's always the case.
I stand corrected. That'll be clean enough for me.
That'll work. Thanks for taking the time to explain your stance and reasoning. And for Svelte/Kit. |
I’m glad to see the alignment solidifying. I concur that canceling the timeout is a fairer comparison, if a more robust example was helpful, we could move on that.. I must make the callout that I’m running a company and could use any assistance on getting that pull request over the hump; otherwise I’ll continue to be fitful with progress. In what way can I be the most helpful? |
@kjmph No worries, I was hoping to give it a shot this week, EDIT: hadn't seen that you already dealt with rebasing in the new PR - awesome, thanks! |
And @Rich-Harris big thanks for giving this your feedback and blessing, it's exciting to be reassured we're on the right track with this approach. |
Great discussion, everybody. I think I should've skipped the first part of my comment as I realized midway async iterators were a good default, but ended up only mentioning it in passing. After testing some async iterators behavior, I'm wary about the stream to async iterator to stream conversion that's going to be necessary on Node/Deno/Web when doing IO. There's a subtle but important mismatch in that async iterators don't support cancellation. I don't think we can guarantee cleanup in the general case. Check out people trying to deal with cancellation and how finicky it can be. I'm not sure how to deal with that. Note that it'd be possible to leak resources even without crazy pipelines: const req = await fetch('/sparse-stream');
return { body: streamAsyncIterator(req.body) }; When the downstream connection closes, at most we could call |
@mrkishi good point, there would be no way to know a connection was closed and thus no opportunity to abort and release resources early other than timeouts. Note this would also be the case if you tried to create a massive string response asynchronously - even if the connection is lost, the async endpoint will continue working away. So you have a good point, that async iterators are an imperfect and incomplete proxy for streams, and so hooking into the |
@mrkishi @jesseskinner this could probably be solved by providing an AbortController inside api handlers. AbortControllers are web agnostic so I doubt the break the existing goals of sveltekit https://developer.mozilla.org/en-US/docs/Web/API/AbortController. Psuedocode would look like this: import type { Request } from '@sveltejs/kit'
export async function get(params: Request) {
const { signal } = params.controller
// sveltekit will call controller.abort() if the request connection is closed
const external_resource = fetch('https://example.com', { signal })
return {
body: streamAsyncIterator(external_resource.body)
}
} Internally, sveltekit would just have to make sure it returns the iterator before cancelling the AbortController. This would mean manually passing around abort controllers, but I think this is actually the ideal scenario. It means we have full control over which resources we want to close with request and which we want to keep open. Imo its totally fine waiting to implement this separately from async iterator support, but there might be other opinions there. |
Just a follow question because I may have misunderstood the use case here. If I want to serve video files to the browser, would I be able to use iterators to stream that data or would I have to write an endpoint that understands |
RE: RE: |
I created a separate issue about streaming requests on endpoints but it was closed by maintaner. Does this issue mention this use case? If not then here's my input on data streaming #2992 |
That's correct, this issue focuses on responses. Requests are limited to parsing in some restricted cases, as the documentation clarifies. Here is an example of the runtime, and a Cloudflare Worker Request Context constructing the rawBody. There is an expectation that streams aren't a valid input for rawBody. I'd suggest clarifying that in your linked issue @quentincaffeino, perhaps it can be re-opened and investigation can commence on if it is feasible to stream a request body. |
Hi folks, I wanted everyone to know that Rich posted a mini-RFC regarding file uploads, which overlaps quite a bit with the discussion here around async iterators, etc. Please take a look |
I've opened #3419 which covers the use case discussed here as well as multipart forms (#70), in light of the changes in #3384. tl;dr @mrkishi convinced me that Closing this in favour of that issue — thanks for the valuable discussion everyone! |
Is your feature request related to a problem? Please describe.
I have APIs built with Sapper that I'd like to migrate to SvelteKit, but some endpoints rely heavily on using Node streams to pipe data to the response, as there can be gigabytes of data that are too large to serialize to a string server-side in memory. Some of this data is coming from a database, other data is coming from streaming and processing large files that live on AWS S3.
Describe the solution you'd like
I'd like to be able to return an async iterator in the body attribute of the response. I realise streaming isn't possible with most (all?) serverless adapters or
adapter-static
, but would certainly be possible withadapter-node
and possibly other adapters that don't yet exist.Describe alternatives you've considered
adapter-node
and the development server could expose the Noderes
object to the request so I could pipe directly to the res object inside the request handler.Otherwise, I will have to split off the API into a standalone Node server that does not live within SvelteKit, or continue to use Sapper and have a much slower build process during development.
How important is this feature to you?
I would love to keep my API in SvelteKit as I currently do in Sapper. It's really nice having the API endpoints live in the same router. To split it off into a separate server will incur extra complexity, such as dealing with sharing authentication, keeping the servers in sync, multiple code bases, etc. I would really rather not stay with Sapper long term.
Additional context
Add any other context or screenshots about the feature request here.
EDITED: changed the preferred solution from stream object to async iterator, because Node Streams in v10+ are also async iterators.
The text was updated successfully, but these errors were encountered: