Copy-less s3 upload for nodejs - progressStream copied the data in memory for no good reason #945
Conversation
progressStream copied the data in memory for no good reason

For Stream Body: changed the progress reporting to tap into the stream via the 'data' event, in addition to piping, without copying. For non-Stream Body: simply emit the progress event immediately, since the data is already loaded in memory. This reduces CPU consumption and increases the throughput pushed by the client. Tested to push more than 2000 MB/sec from a single node client process, compared to less than 1000 MB/sec before.
Hey @chrisradek, would love to get your feedback on this fix. I know it's in a very common path, so let me know if you see anything I could do to make it safer for existing code and easier to merge. Thanks!
@guymguym
@guymguym Sorry for the delay!
Hey @chrisradek, sorry about the hassle.
I ran the project's
@guymguym Sorry I haven't dug deeper into whether this is an actual issue or not yet, but it is on the list!
It was a string before, since it was taken from the Content-Length header.
@chrisradek No worries, at least the progress test you pointed out is quite easy to figure out. The test asserts that when putObject is called with a 2 MB buffer, there will be 2 or more progress events, and that the first has a loaded count of at least 10. Before my change, every buffer was converted to a stream with a highWaterMark of 16 KB, which means the event was emitted many times, certainly more than once. My change simply avoids that conversion from memory buffer to stream, which was hurting performance, and writes the entire buffer in one call. It is quite simple to imitate the asserted behavior, but the question is whether it makes sense to assert that a buffer should produce multiple progress events. Would be great to know what you think. Thanks!
I tried a fix that emits a progress event once the output 'drain' event is received.
BTW, the Travis failures seem unrelated to my changes - not sure what that jam is:
@chrisradek
@guymguym
@chrisradek
@guymguym Then, to run the tests, you type:

Thanks for pinging me here. I should have more time this week to review it. /cc @LiuJoyceC as well.
Thanks, all passed:
@guymguym Well needed! Good job on this one.
Hey @chrisradek, do you see any gaps? I had a minor dilemma about the way buffer/string data emits progress events:
Personally I would probably prefer to keep it simple, but I can certainly understand if you think the finer-grained events are better for clients; it depends on how common it is to send a very large request body. Would be great to know how you think about it.
if (body instanceof Stream) {
  // for progress support of streaming content
  // tap the data event of the stream in addition to piping
  body.on('data', function(chunk) {
I think tapping into the 'data' event on a stream could cause some unintended side-effects. In node.js 0.10.x, binding a 'data' listener on a stream will cause the stream to emit 'data' events as fast as it can, ignoring the back-pressure that's automatically in place when using pipe. If the writable stream is slow, then this could cause loss of data.
I don't think this is an issue in versions of node.js >= 0.12.x, and some simple testing seems to confirm that. However, we need to work with node.js 0.10.x as well.
The current method creates a new writable stream that also gets piped into in order to emit the 'sendProgress' events. I know it'll require refactoring your logic, but that path seems safer across all our supported versions of node.
Very good point. Thanks for taking the time to review in depth.
In one of my previous tests I saw that the pipe to the writable progressStream caused buffers to be copied. But now I'm not completely sure about that, as it might have been mixed up with the buffer path in the code during my runs.
Anyhow, I think I'll change the stream-body progress path to use a Transform stream that emits the events in the familiar pattern body.pipe(progressStream).pipe(stream). A Transform stream will push the same buffer through, so it will surely avoid copies. Sounds right?
Using a transform stream to emit the events sounds like a great idea!
…e 0.10 progress for buffer body fixed to emit just once the stream drains and fix the feature test to require just 1 event instead of 2
@chrisradek
- body.pipe(stream);
+ // For progress support of streaming content -
+ // pipe the data through a transform stream to emit 'sendProgress' events
+ body.pipe(this.progressStream(stream, totalBytes)).pipe(stream);
Just one last change! Since we still have users running on node 0.8, we should only pipe into the progress stream if typeof TransformStream !== 'undefined'.
Oh sure, so there is no way to provide stream progress for 0.8?
BTW, I remembered correctly that using multiple writable streams behaves weirdly in nodejs v4 - I just submitted this issue: nodejs/node#6491.
@guymguym
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.