-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chunk name collisions if 2 chunks' content is identical #928
Comments
Thanks for calling this out! I did not anticipate this issue. I will need to mix input file names into the file name hash to avoid this. |
That was quick! Thanks for fixing. When you say "input file names", do you mean the files that the code originated in? Or the entry points that import the chunk? i.e. in my example above is it Neither would be ideal because in first case there could still be collisions if all the code originates in same file (as in my example), and in 2nd case the hash would change if a further entry point also imports the same statements, despite the content of the shared chunk remaining the same. I think what you're doing in c05c489 is the former, and then mixing in the "part range" which I assume refers to the line/column numbers of the source - which would solve all cases I can envisage. But I don't "speak" Go, so I'm not 100% sure I've understood the change you've made. Could you possibly confirm? |
The input file names are names of all the original source code files that ended up in the chunk, so just This is why the part range is also necessary. The indices of the part range are top-level statement indices instead of line/column numbers but it's the same idea. Top-level statement indices are the used because this is the granularity of the file splitting optimization. But I'm considering removing the file splitting optimization somewhat soon to make it possible to implement support for top-level await, which has file-level evaluation order semantics. Then the hash algorithm wouldn't need the part range data anymore. It would then only need the input file names. |
Thanks for your explanation. Yes, that's what I'd guessed the fix you made was doing (bar statement indices instead of lines/columns). I just made a comment on the 0.9.0 roadmap PR saying how valuable the statement-level splitting is. But hey, if it's infeasible to make it top-level await work with it, I can see TLA would be higher priority. |
Actually, I have another solution which I think is slightly better. The intent of the hashes in chunk filenames is to facilitate caching. As much as possible, if the content of the file doesn't change, it's filename shouldn't change either. Mixing metadata into the hash produces another source of change which is not related to the actual contents of the chunk file. For example, in my case above, if Until statement-level code-splitting is removed, reversing the order of the statements in A simpler method to ensure unique hashes without this side effect would be to use a simple counter. In JS: const usedHashes = Object.create(null);
function hashForFileName( contents ) {
const hash = sha1( contents );
const numUses = usedHashes[hash] || 0;
usedHashes[hash] = numUses + 1;
// If not used this hash before, return it
if (numUses === 0) return hash;
// Hash already used - mix in counter and hash again
return sha1( `${hash}${numUses}` );
} I guess this has the downside of being less parallelisable - the check for whether a hash has been used before would need to be in main thread - but I can't imagine it'll be a big hit. |
When splitting, chunk filenames are based on a content hash. If the content of 2 files is identical, they will have identical hashes and therefore ESBuild only outputs 1 shared chunk when it should output 2.
Here is a (rather artificial) case to demonstrate:
Output is as follows:
Running
src/index.js
logsfalse
, whereas runningbuild/index.js
logstrue
.This example is contrived, but given how many chunks ESBuild can create, and how small they can be, I think collisions like this are quite possible in the real world.
I'm not sure what best solution is, but ESBuild should probably throw an error rather than creating incorrect output at least.
The text was updated successfully, but these errors were encountered: