Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect import order with code splitting and multiple entry points #399

Open
iamakulov opened this issue Sep 20, 2020 · 29 comments
Open

Comments

@iamakulov
Copy link

iamakulov commented Sep 20, 2020

For the workaround, see #399 (comment).


Consider the following code:

// entry1.js
import "./init-dep-1.js";
import "./run-dep.js";

// entry2.js
import "./init-dep-2.js";
import "./run-dep.js";

// init-dep-1.js
global.foo = {
  log: () => console.log("foo.log() (from entry 1) called"),
};

// init-dep-2.js
global.foo = {
  log: () => console.log("foo.log() (from entry 2) called"),
};

// run-dep.js
global.foo.log();

When you bundle this code with esbuild --bundle --outdir=build --format=esm --splitting --platform=node --out-extension:.js=.mjs ./entry1.js ./entry2.js, ESBuild discovers that run-dep.js is a module shared between both entry points. Because code splitting is enabled, ESBuild moves it into a separate chunk:

// build/entry1.js
import "./chunk.P2RPFHF3.js";

global.foo = {
  log: () => console.log("foo.log() (from entry 1) called")
};

However, by doing so, ESBuild puts run-dep.js above the init-dep-1.js or init-dep-2.js code. This changes the import order – and breaks the code:

$ esbuild --bundle --outdir=build --format=esm --splitting --platform=node --out-extension:.js=.mjs ./entry1.js ./entry2.js
$ node build/entry1.mjs
global.foo.log();
           ^

TypeError: Cannot read property 'log' of undefined
@evanw
Copy link
Owner

evanw commented Sep 21, 2020

As far as I know the execution order of imported ECMAScript modules is undefined (see also this question). The run-time is free to, for example, asynchronously download all imports in parallel and evaluate them in the order that their downloads finish, which could be different on different runs. The only guarantee is that all imports will be evaluated before any code in the script doing the importing is evaluated.

Using ECMAScript module imports means having to write your code to be robust to different import execution orders. The bundling algorithm in esbuild takes advantage of this flexibility to do the above transformation.

If you need deterministic order of imported code, you can either use require() instead of import or use import but defer the execution of certain code by wrapping it in a function and calling it later.

@guybedford
Copy link
Contributor

guybedford commented Sep 21, 2020

Module evaluation in ECMA-262 is synchronous from beginning to end. It is defined strictly as a post-order graph execution in https://tc39.es/ecma262/#sec-innermoduleevaluation. There isn't any room for misinterpretation of this and no JS engines violate this behaviour.

What is not synchronous is the instantiation phase which means that a top-level execution job has task queue yielding before execution begins. The confusion in eg that Stack Overflow thread is because it is not a functional execution like require() but a graph execution.

There should be no task queue operations triggered during the entire top-level execution operation once it starts.

The only case a module graph executes asynchronously is if one or more of the modules in that graph use top-level await. In such a case, the first top-level await acts as a sort of yield for that level of the dependency tree. The full graph is still executed synchronuosly up to this first top-level await in the post-order traversal, and resumes as a callback and synchronously after it with dependencies queues in parallel for this waiting.

The sync post-order execution is the start of the semantics though, and top-level await was added very much as an additional adjustment to this algorithm without loosening any of these strong constraints which was one of the key goals for its specification.

@evanw
Copy link
Owner

evanw commented Sep 21, 2020

Thanks for the additional detail. I have personally struggled to find information about how this works, and that Stack Overflow question was pretty much the only reference I could find (I personally find that part of the specification incomprehensible). To clarify: are you saying the Stack Overflow answer is simply incorrect? If so I need to do an audit of the bundler and I may need to rework more than just this, since this is the mental model I've been working with while writing the bundler.

There should be no task queue operations triggered during the entire top-level execution operation once it starts.

Does that mean transforming ECMAScript modules to AMD is an incorrect transformation? I'm not too familiar with AMD but I believe it has the semantics I originally described, not these semantics.

@guybedford
Copy link
Contributor

To clarify: are you saying the Stack Overflow answer is simply incorrect?

Yes!

The specification linked there is an implementation of the tarjan strongly connected components algorithm, which is the missing background if wanting to decipher it.

Does that mean transforming ECMAScript modules to AMD is an incorrect transformation? I'm not too familiar with AMD but I believe it has the semantics I originally described, not these semantics.

I think you might be right here that AMD does not specify timing at all and RequireJS for example does not seem like it will wait to execute everything as a single top-level job, but rather executes dependencies whenever they are ready. This is certainly not the only lossy aspect semantically though.

@guybedford
Copy link
Contributor

Also to clarify, the only reason the spec has to use Tarjan is to ensure that the module registry loading states for a cycle transition together and do not leave partial states for "strongly connected components". Apart from that it's just a standard depth-first post-order execution. Using Tarjan did turn out to be useful in detecting the strongly connected components when it came to top-level await though since top-level await and cycles actually turned out to need all this information in managing completion handling.

@evanw
Copy link
Owner

evanw commented Nov 28, 2020

I'm in the middle of working on another iteration of the code splitting feature and I just discovered what could be considered another ordering bug. Currently /* @__PURE__ */ comments are treated mostly the same as pure code (i.e. code having no side effects). This means they can end up being reordered within the same file in some code splitting situations.

However, there are definitely going to be cases where people annotate impure code with /* @__PURE__ */ and expect the impure code to continue to work. So esbuild should not consider these expressions to be pure.

I'll have to make this side effect boolean a three-state value:

  • Impure and cannot be reordered or removed
  • Pure and can be reordered or removed
  • "User marked as pure but not actually pure" meaning ok to remove but not reorder

Just documenting this here for later. Back to the drawing board.

Edit: Interesting. This could potentially lead to dead code when combined with code splitting because this means it's no longer valid to extract statements containing /* @__PURE__ */ calls out of the file they came from and inline them into another file. So all such statements in a given file will be included in all entry points if they are used by only a single entry point. Oh well. I don't see a way around this really unless you get really complicated with tracing ordering constraints across chunks but that's going too far.

Edit 2: Actually Terser does sometimes reorder function calls1 marked as /* @__PURE__ */ so maybe I can just keep treating it as having no side effects and avoid the extra complexity of order preservation. Although I bet Terser doesn't reorder whole statements marked as /* @__PURE__ */ like esbuild may be doing. One example of a problem this could cause is some function calls marked as /* @__PURE__ */ that add things to a registry on startup being reordered depending on the code splitting boundary. Arguably that could be considered a problem with the code instead of with esbuild since calls to real pure functions are fine to reorder. At least there's --tree-shaking=ignore-annotations to disable interpretation of /* @__PURE__ */ comments.

1 For example, this code will be minified to b()(a() ? c : d) by both Terser and esbuild, which means a() and b() are reordered:

if (/* @__PURE__ */ a()) return (/* @__PURE__ */ b())(c)
else return (/* @__PURE__ */ b())(d)

@evanw
Copy link
Owner

evanw commented Nov 29, 2020

I just discovered that I think Rollup also has similar ordering bugs with code splitting. Here is an example (link):

// main.js
import './lib2'
import './lib1'
if (window.lib !== 'lib1') throw 'fail'
// main2.js
import './lib1'
import './lib2'
if (window.lib !== 'lib2') throw 'fail'
// lib1.js
window.lib = 'lib1'
// lib2.js
window.lib = 'lib2'

With this input Rollup generates this code, which incorrectly causes the second entry point to throw:

// output/main.js
import './lib1-4ba6e17e.js';
if (window.lib !== 'lib1') throw 'fail'
// output/main2.js
import './lib1-4ba6e17e.js';
if (window.lib !== 'lib2') throw 'fail'
// output/lib1-4ba6e17e.js
window.lib = 'lib2';
window.lib = 'lib1';

I wonder if this is a known bug with Rollup or not. A quick search of existing issues didn't uncover anything.

My plan for handling this in esbuild is to make code in non-entry point chunks lazily-evaluated (wrapped in callbacks) and then have the entry points trigger the lazy evaluation (call the callbacks) in the right order. This has the additional benefit of avoiding conflating the binding and evaluation phases in non-esm output formats. If evaluation is deferred, then all binding among chunks happens first followed by all evaluation afterward.

Another possible approach could be to generate many more output chunks for each unique atomic piece of code. But that seems like it'd be undesirable because it would generate too many output chunks. I would also still have to deal with the binding/evaluation issues for non-esm output formats if I went with that approach.

@evanw
Copy link
Owner

evanw commented Feb 20, 2021

I'm deep in the middle of working on this at the moment. I just discovered an issue with my approach. It's a problem similar to the one I discovered in #465: external imports (imports to code that is not included in the bundle) must not be hoisted past internal code or the ordering of the code changes, which is invalid. The problem is that this requires the generation of extra chunks instead of just generating a single chunk of code.

Consider this input file:

import {a} from './some-internal-file.js';
import {b} from '@external/package';
console.log(a, b);

It would be invalid to transform that to something like this:

import {b} from '@external/package';
let a = /* ... code from ./some-internal-file.js ... */;
console.log(a, b);

That reorders the side effects of the internal file past the side effects of the package. Instead you have to do something like this for correctness:

import {a} from './chunk.HSYX7.js';
import {b} from '@external/package';
console.log(a, b);

where ./chunk.HSYX7.js is an automatically-generated chunk containing all of the code from before the external import. The external import means it is impossible to bundle this code into a single file if you care about import order. I don't think you can replace this with await import() to get around this because there could be multiple external imports involved in a cycle that need to be able to reference each other's exports.

I have been just doing what other bundlers do so I haven't considered this case before. But other bundlers such as Rollup appear to get this wrong too. This also gets massively more complicated with top-level await and different parallel execution graphs. Not sure what to do about this new level of complexity yet.

@bogdan-panteleev
Copy link

bogdan-panteleev commented Aug 2, 2023

Also faced with a similar problem.

I have a code:

// main.ts
import dotenv from 'dotenv';
dotenv.config();
console.log('configured', process.env.DOTENV_RESULT);

import { ENDPOINTS } from '../contants/endpoints';

console.log('check import', ENDPOINTS.USERS); // to escape tree shaking

// endpoints.ts
console.log('inside dependency');
export const ENDPOINTS = Object.freeze({
  USERS: `/users`,
  POSTS: `/posts`,
});

I build these files via command:
esbuild main.ts --bundle --format=esm --splitting --outdir=dist --platform=node --sourcemap.

I also tried esbuild main.ts --bundle --outdir=dist --platform=node --sourcemap. The result is the same, despite that resulting bundles are a little bit different.

The resulting log is:

  1. inside dependency
  2. configured
  3. check import

So the problem is that I do not have dotenv configured inside dependency, because dependency is initialized before. This is applicable to any other scenario with dependencies.

@maelp
Copy link

maelp commented Dec 8, 2023

I'm arriving at this issue also because of import order bugs, so what is the take on this issue?

  • is it that ESM does not specify import order, and we have to deal with it ourselves (eg assuming all imports are async somehow?)
  • or is that something where there is work in progress to fix the issue?

@Murtatrxx
Copy link

What is the current status and is there an ETA for a fix?

@richarddesan
Copy link

Facing the same problem, we are postponing the use of esbuild in our monorepo for the moment until there is a solution for this

@Valgrifer
Copy link

Put a comment /* @__FIXED__ */ or something else, it was not a solution to indicate to esbuild to place this code before the other import?

@zhiyan114
Copy link

Looking for a solution as well!

The latest version of sentry.io SDK requires init to be called before the rest of the code and currently esbuild is still treating

import sentryStuff;
init(...);
import "./index.js"

as

import sentryStuff;
import "./index.js"
init(...);

Which breaks sentry's auto-instrument

@pfdgithub
Copy link

Looking for a solution as well!也在寻找解决方案!

The latest version of sentry.io SDK requires init to be called before the rest of the code and currently esbuild is still treating最新版本的 sentry.io SDK 要求在其余代码之前调用 init,目前 esbuild 仍在处理

import sentryStuff;
init(...);
import "./index.js"

as 如

import sentryStuff;
import "./index.js"
init(...);

Which breaks sentry's auto-instrument这破坏了哨兵的自动仪表

you can import and initialize sentry in html file

@zhiyan114
Copy link

Looking for a solution as well!也在寻找解决方案!
The latest version of sentry.io SDK requires init to be called before the rest of the code and currently esbuild is still treating最新版本的 sentry.io SDK 要求在其余代码之前调用 init,目前 esbuild 仍在处理

import sentryStuff;
init(...);
import "./index.js"

as 如

import sentryStuff;
import "./index.js"
init(...);

Which breaks sentry's auto-instrument这破坏了哨兵的自动仪表

you can import and initialize sentry in html file

I'm running the code in a backend environment (NodeJS). For the workaround, I just replace import with require, but I still like it to be import, mostly so that the transpiler can "optimize" the import.

hyf0 added a commit to rolldown/rolldown that referenced this issue Jul 10, 2024
<!-- Thank you for contributing! -->

### Description

This feature try to solve the execution order problem caused by rollup
and esbuild's code splitting logic. The nature of these problem is that
the execution order of shared modules are hoisted unexpectedly.

- evanw/esbuild#399
- evanw/esbuild#2598

This is done by adding helper function to simulate the original the
semantic esm input while keep the static linking between modules.

<!-- Please insert your description here and provide especially info
about the "what" this PR is solving -->
@rockyshi1993
Copy link

Maybe you should try this open source project nodestack

@tmcconechy
Copy link

@rockyshi1993 how would that help? It seems it uses esbuild in it? Also would prefer this got fixed since esbuild is so good and fast except for this bug

@rockyshi1993
Copy link

rockyshi1993 commented Jul 23, 2024

@rockyshi1993 how would that help? It seems it uses esbuild in it? Also would prefer this got fixed since esbuild is so good and fast except for this bug

Problems will only occur if multiple entries are configured at the same time. If you use a server-side rendered page to configure a single entry point, you can completely ignore the current problem

@theoephraim
Copy link

theoephraim commented Nov 20, 2024

Similar to the sentry issue, I'm running into issues trying to inject a process.on('uncaughtException', ... handler before anything else gets imported. Ideally would be able to mark a section of code as "do not move" or even "hoist to top" to fix this.

I have figured out a workaround for my issue, which is just needing something to run first.

I am using tsup (which is using esbuild under the hood). The fix:

  • move the code you need to run to a separate file
  • import that file
  • add that file as a new entrypoint to build, and make sure it is the FIRST entrypoint

bep added a commit to bep/hugo that referenced this issue Dec 13, 2024
We already do this for scripts e.g. inside a group.

This makes sure that group A's entry points gets added before B's, which can be an important property, see evanw/esbuild#399 (comment)
bep added a commit to gohugoio/hugo that referenced this issue Dec 13, 2024
We already do this for scripts e.g. inside a group.

This makes sure that group A's entry points gets added before B's, which can be an important property, see evanw/esbuild#399 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests