Starting with 10.8.0 I get the last chunk of a previous read stream at the start of the next one #22420
Can I ask that you copy the code either into comments here or into a gist? I (and I know others here) am generally unwilling to download an opaque tarball... |
Also, I assume you mean Node.js 10.8.0 and not 10.0.8? |
EDIT: type annotations removed
Test file 1: streams
'use strict';
const path = require('path');
const fs = require('fs');
const crypto = require('crypto');
module.exports.createReadStream = function (hash, ondata, encoding) {
    return new Promise((resolve, reject) => {
        const stream = fs.createReadStream(
            hash,
            {encoding}
        );
        const cryptoHashObj = crypto.createHash('sha256');
        let buf;
        if (encoding) {
            stream.on('data', data => {
                cryptoHashObj.update(Buffer.from(data, 'utf8'));
                ondata(data);
            });
        } else {
            // Convert the node.js Buffer to an ArrayBuffer and adjust the size if it is not full
            stream.on('data', data => {
                cryptoHashObj.update(data);
                if (data.length < data.buffer.byteLength) {
                    ondata(data.buffer.slice(0, data.length));
                } else {
                    ondata(data.buffer);
                }
            });
        }
        stream.on('error', err => reject(err));
        stream.on('end', () => resolve(cryptoHashObj.digest('hex')));
    });
};
module.exports.createWriteStream = function (filename, encoding) {
    const cryptoHashObj = crypto.createHash('sha256');
    const stream = fs.createWriteStream(
        filename,
        {encoding}
    );
    const write = data => {
        const buf = typeof data === 'string' ?
            Buffer.from(data, encoding) :
            Buffer.from(data);
        cryptoHashObj.update(buf);
        stream.write(buf);
    };
    const end = () => {
        return new Promise((resolve, reject) => {
            const hash = cryptoHashObj.digest('hex');
            stream.once('error', err => reject(err));
            stream.once('finish', () => resolve(hash));
            stream.end();
        });
    };
    stream.once('error', err => console.error('Stream error', err));
    return {
        write,
        end
    };
};
Test file 2: main
'use strict';
const streams = require('./streams.js');
const UTF8_FILE = '9101a84eb2320001628926e3c4decd09eff6680809e95a920db4e18afbcf0201';
const BINARY_FILE = 'ee1758d957bac3706b6a0ad450ffaeab34d55d5c6d5988a8bef7ce72c8a7db85';
const txtFile = async function () {
    console.log('\nTEXT FILE TEST\n');
    const writeStream = streams.createWriteStream('copy-' + UTF8_FILE, 'utf8');
    const onTxtData = function (data) {
        if (data instanceof ArrayBuffer) {
            throw new Error('What?');
        }
        console.log(
            ' TEXT CHUNK',
            data.substr(0, 75).replace(/\n/g, '') + '...',
            data.length
        );
        writeStream.write(data);
    };
    const readTextFileHash = await streams.createReadStream(UTF8_FILE, onTxtData, 'utf8');
    const writeTextFileHash = await writeStream.end();
    console.log('\nOriginal text file hash:', UTF8_FILE);
    console.log('Read file calculated hash:', readTextFileHash);
    console.log('Write file calculated hash:', writeTextFileHash);
};
const binFile = async function () {
    console.log('\nBINARY FILE TEST\n');
    const writeStream = streams.createWriteStream('copy-' + BINARY_FILE);
    const onBinaryData = function (data) {
        if (typeof data === 'string') {
            throw new Error('What?');
        }
        console.log(
            ' BINARY CHUNK',
            String.fromCharCode.apply(null, new Uint8Array(data)).substr(0, 75).replace(
                /[^A-Za-z 0-9 \.,\?""!@#\$%\^&\*\(\)-_=\+;:<>\/\\\|\}\{\[\]`~]*/g,
                ''
            ) + '...',
            data.byteLength
        );
        writeStream.write(data);
    };
    const readBinaryFileHash = await streams.createReadStream(BINARY_FILE, onBinaryData);
    const writeBinaryFileHash = await writeStream.end();
    console.log('\nOriginal binary file hash:', BINARY_FILE);
    console.log('Read file calculated hash:', readBinaryFileHash);
    console.log('Write file calculated hash:', writeBinaryFileHash);
};
const main = async function () {
    await txtFile();
    await binFile();
    // To run more of those two tests in any sequence:
    // await txtFile();
    // await txtFile();
    // await txtFile();
    // await binFile();
    console.log('\nYou can also compare ');
};
main().catch(console.error); |
This code is quite strange if all it is doing is copying files and calculating a hash along the way. With the non-standard syntax here, it is going to be impossible to determine whether this really is a bug in Node.js, in whatever transpiling utility you're using, or in your code. Can I ask you to remove the non-standard syntax from this and see if it has the same problem? |
It is standard javascript, it merely has type annotations. They don't do anything. "Transpiling" merely means removing the types. This is Flow, not TypeScript. I'll remove the types. They are just like comments. |
I updated the code above to remove the types. The code is the way it is because the actual use case is cross-platform (React Native, browser, Node.js): read files on one node, send them via WebSocket, and write them on another. That is why I have a frontend for streams; there is a different one on each platform. As for the explanation, I think the code is simple enough, just standard stream things; I don't do anything fancy at all. The Node.js part just reads a stream in chunks and writes them to a write stream. Most of main.js is there just to create a nice example. And by the way, all that encoding stuff is necessary for the React Native platform, which can do binary only using base64 encoding(!). Otherwise I would have only Buffers and no string streams. |
Looking into it, I think it’s indeed that PR, in that the more efficient usage of the underlying ArrayBuffers is what’s causing this:
if (data.length < data.buffer.byteLength) {
    ondata(data.buffer.slice(0, data.length));
} else {
    ondata(data.buffer);
}
This piece of your code doesn’t seem to do what it should; what if |
Ah, right, yep, just spotted that also. Prior to the change in #21968, that offset wouldn't really have mattered for |
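To illustrate the point about the offset, here is a minimal sketch of how a chunk can sit at a non-zero position inside a larger, shared ArrayBuffer. It does not reproduce the stream internals changed in #21968; the names `pool`, `earlier`, and `chunk` are made up for the illustration, and the shared buffer is built by hand so the result is deterministic.

```javascript
'use strict';

// Illustrative stand-in for a shared backing store; not the real stream pool.
const pool = new ArrayBuffer(16);

// An earlier read occupies the first 8 bytes of the shared buffer.
const earlier = Buffer.from(pool, 0, 8);
earlier.write('OLDCHUNK');

// The next chunk is a 4-byte view that starts at offset 8 of the same buffer.
const chunk = Buffer.from(pool, 8, 4);
chunk.write('DATA');

console.log(chunk.length, chunk.byteOffset, chunk.buffer.byteLength); // 4 8 16

// Slicing the underlying ArrayBuffer from 0 ignores byteOffset and
// returns bytes that belong to the earlier read.
const naive = Buffer.from(chunk.buffer.slice(0, chunk.length)).toString();
console.log(naive); // 'OLDC'
```

With a layout like this, `data.buffer.slice(0, data.length)` hands back the start of the shared buffer rather than the chunk itself, which matches the symptom in the issue title.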
@addaleax ... I'm wondering if we shouldn't at least add a comment to the stream docs advising folks that they need to pay attention to the |
Closing, as this appears to be a bug in the user code and not in Node.js. The solution is to properly account for |
No there isn't, as I said, I have a cross-platform scenario with a lot of stuff around it. All stream modules are equal so that the main code can use one common stream interface, and I have to do the stupid string-based encoding stuff for React Native where binary streams are base64 strings. When I change the zero to THAT IS A BUG IN NODE.JS |
This fixes your code. |
Relevant documentation: https://nodejs.org/dist/latest-v10.x/docs/api/buffer.html#buffer_buf_byteoffset
|
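For readers landing here, a sketch of one way to take `byteOffset` into account when converting a Buffer to a standalone ArrayBuffer. `toArrayBuffer` is a hypothetical helper name, not a Node.js API, and this is not necessarily the exact fix posted above; it relies only on `buf.buffer`, `buf.byteOffset`, `buf.byteLength`, and `ArrayBuffer.prototype.slice`.

```javascript
'use strict';

// Hypothetical helper: copy exactly the bytes that `buf` views
// into a new ArrayBuffer that shares nothing with other Buffers.
function toArrayBuffer(buf) {
    return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
}

// Small Buffers may be carved out of a shared pool, so their
// byteOffset is not guaranteed to be 0.
const small = Buffer.from('hello');
const standalone = toArrayBuffer(small);

console.log(standalone.byteLength); // 5
console.log(Buffer.from(standalone).toString()); // 'hello'
```

In the read handler from the original report, this would amount to calling `ondata(toArrayBuffer(data))` instead of branching on `data.buffer.byteLength`, assuming the consumer needs an ArrayBuffer of its own.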
@addaleax Indeed it does. However, something changed from 10.7.0 to 10.8.0 and I always read the changelog — there was nothing in it that seemed relevant... |
@lll000111 I’m not sure – the relevant PR is the one linked above by @jasnell, #21968. It’s one of only two entries in the 10.8.0 changelog that target the |
'use strict'
const cloneable = require('cloneable-readable')
const fs = require('fs')
const crypto = require('crypto')
const { Transform, pipeline } = require('stream')
const stream = cloneable(fs.createReadStream(__filename))
const hash = crypto.createHash('sha256')
const encoder = new Transform({
    transform(chunk, encoding, callback) {
        callback(null, chunk.toString('hex'))
    }
})
pipeline(stream.clone(), hash, encoder, process.stdout)
pipeline(stream, process.stdout) |
@jasnell Is that code about your earlier comment
Because I already said a few things about that. I can't use fancy node.js specific things. |
Platforms: Windows and Linux
Everything is fine as long as I use 10.7.0 or older. The problem starts when I go to 10.8.0 or newer.
I created a test project that reproduces the issue. It
test.tar.gz
What you will see is that the write stream for the binary file gets the last chunk from the previous text file's read stream.
The read stream's SHA-256 is still correct, but the write stream's SHA-256 is not. The problem only seems to happen with binary files.
Test project output: