Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: ESM loader approach #36954

Closed
JakobJingleheimer opened this issue Jan 15, 2021 · 24 comments
Closed

module: ESM loader approach #36954

JakobJingleheimer opened this issue Jan 15, 2021 · 24 comments
Labels
discuss Issues opened for discussions and feedbacks. esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders module Issues and PRs related to the module subsystem.

Comments

@JakobJingleheimer
Copy link
Member

JakobJingleheimer commented Jan 15, 2021

There are 2 leading approach proposals for ESM loaders and chaining them.

Similarities

Both approaches:

  • consolidate hooks into
    • resolve(): finding the source (equivalent to the current experimental resolve()); returns {Object}
    • load(): supplying the source (a combination of the current experimental getFormat(), getSource(), and transformSource()); returns {Object}
  • Hooks of the same kind (resolve and load) are chained:
    1. all resolve()s are executed (resolve1, resolve2, …resolveN)
    2. all load()s are executed (load1, load2, …loadN)

Differences

Next()

This approach is originally detailed in #36396.

Hooks are called in reverse order (last first): a hook's 3rd argument would be a next() function, which is a reference to the previous loader hook. Ex there are 3 loaders: unpkg, http-to-https, and cache-buster (cache-buster is the final loader in the chain):

cache-buster invokes http-to-https, which in turn invokes unpkg (which itself invokes Node's default):
cache-busterhttp-to-httpsunpkg ← Node's default

The user must actively connect the chain or it (likely) fails: If a hook does not call next, "the loader short-circuits the chain and no further loaders are called".

Done()

This approach was also proposed in #36396 (in this comment).

The guiding principle of this approach is principal of least knowledge.

Hooks are called in the order they're declared/listed, and the return of the previous is fed as the input of the subsequent/next hook, and each hook is called automatically (unless short-circuited):

unpkghttp-to-httpscache-buster (if none of the supplied loaders output valid values, node's default loader/hook is invoked, enabling a hook to potentially handle only part and avoid re-implementing the native functionality node already provides via the default hook).

Hooks have a done argument, used in rare circumstances to short-circuit the chain.

Additionally, this proposal includes a polymorphic return:

hook returns continue? result scenario
done(validValue) no use validValue as final value (skipping any remaining loaders) module mocking (automated tests)
false no skip file (use empty string for value?) file is not needed in current circumstances
invalid value no throw and abort instead (current behaviour) user error
nullish yes loader did nothing: continue to next loader isn't for the file type
valid value yes pass value to next loader expected use

Examples

--loader https-loader \
--loader mock-loader \
--loader coffee-loader \
--loader other-loader

Resulting in https-loader being invoked first, mock-loader second, etc, and node's internal defaultLoader last.

For illustrative purposes, I've separated resolve and load hooks into different code blocks, but they would actually appear in the same module IRL.

Resolve hook chain

HTTPS Loader
const httpProtocols = new Set([
  'http:',
  'https:',
]);

/**
 * @param {Object} interimResult The result from the previous loader
 * (if any previous loader returned anything).
 * @param {string} [interimResult.format='']
 * @param {string} [interimResult.url='']
 * @param {string} context.originalSpecifier The original value of the import
 * specifier
 * @param {string?} context.parentUrl
 * @param {function} defaultResolver The built-in Node.js resolver (handles
 * built-in modules like `fs`, npm packages, etc)
 * @param {function(finalResult?)} done A short-circuit function to break the
 * resolve hook chain
 * @returns {false|{format?: string, url: string}?} If participating, the hook
 * resolves with a `url` and optionally a `format`
 */
export async function resolve(
  interimResult,
  // context,
  // defaultResolver,
  // done,
) {
  let url;

  try {
    url = new URL(interimResult.url);

    // there is a protocol and it's not one this loader supports: step aside.
    if (!httpProtocols.has(url.protocol)) return;
  }
  catch (err) {
    // specifier does not meet conditions for this loader; step aside.
    if (!determineWhetherShouldHandle(interimResult)) return;
  }

  return {
    url: '…',
  };
}
Mock Loader
export async function resolve(
  interimResult,
  context,
  defaultResolver,
  // done,
) {
  let url;

  try { url = new URL(interimResult.url) }
  catch (err) { url = new URL(defaultResolver(interimResult /* , … */).url) }

  url.searchParams.set('__quibble', generation);

  return {
    url: url.toString(),
  };
}

Load hook chain

HTTPS Loader
const contentTypeToFormat = new Map([
  ['text/coffeescript',		'coffeescript'],
  ['application/node',		'commonjs'],
  ['application/vnd.node.node',	'commonjs'],
  ['application/javascript',	'javascript'],
  ['text/javascript',		'javascript'],
  ['application/json',		'json'],
  // …
]);

/**
 * @param {Object} interimResult The result from the previous loader (if any
 * previous loader returned anything)
 * @param {string} [interimResult.format=''] Potentially a transient value. If
 * the resolve chain settled with a `format`, that is the initial value here.
 * @param {string|ArrayBufferView|TypedArray} [interimResult.source='']
 * @param {Object} context
 * @param {Array} context.conditions
 * @param {string?} context.parentUrl
 * @param {string} context.resolvedUrl The module's resolved url (as
 * determined by the resolve hook chain).
 * @param {Function} defaultLoader The built-in Node.js loader (handles file
 * and data URLs).
 * @param {Function} done A terminating function to break the load hook chain;
 * done accepts a single argument, which is used for the final result of the
 * load hook chain.
 */
export async function load(
  interimResult,
  { resolvedUrl },
  // defaultLoader,
  // done,
) {
  if (interimResult.source) return; // step aside (content already retrieved)

  const url = new URL(resolvedUrl);

  if (!httpProtocols.has(url.protocol)) return; // step aside

  const result = await new Promise((res, rej) => {
    get(resolvedUrl, (rsp) => {
      const format = contentTypeToFormat.get(rsp.headers['content-type']);
      let source = '';

      rsp.on('data', (chunk) => source += chunk);
      rsp.on('end', () => res({
        format,
        source,
      }));
      rsp.on('error', (err) => rej(err));
    });
  });

  return result;
}
Mock Loader
export async function load(
  interimResult,
  { resolvedUrl },
  defaultLoader,
  // done,
) {
  const isQuibbly = (new URL(resolvedUrl)).searchParams.get('__quibble');

  if (!isQuibbly) return;

  const mock = defaultLoader(urlToMock); // or some runtime-supplied mock

  return { source: mock };
}
CoffeeScript Loader
const exts = new Set([
  '.coffee',
  '.coffee.md',
  '.litcoffee',
]);

export async function load(
  interimResult,
  context,
  defaultLoader,
  // done,
) {
  if (
    !!interimResult.format
    && interimResult.format !== 'coffeescript'
  ) return; // step aside

  const ext = extname(context.resolvedUrl);

  if (!exts.has(ext)) return; // step aside

  const rawSource = interimResult.source || defaultLoader(
    {
      format: 'coffeescript', // defaultLoader currently doesn't actually care
    },
    context
  ).source;
  const transformedSource = coffee.compile(rawSource.toString(), {
    whateverOptionSpecifies: 'module'
  });

  return {
    format: 'module',
    source: transformedSource,
  };
}
Updates to ESMLoader.load()
class ESMLoader {
  async load(resolvedUrl, moduleContext, resolvedFormat = '') {
    const context = {
      ...moduleContext,
      resolvedUrl,
    }
    let shortCircuited = false; // should we support calling done with no arg?
    let finalResult;
    let format = resolvedFormat;
    let source = '';

    function done(result) {
      finalResult = result;
      shortCircuited = true;
    }

    for (let i = 0, count = this.loaders.length; i < count; i++) {
      const tmpResult = await loader(
        { format, source },
        context,
        defaultLoader,
        done,
      );

      if (shortCircuited) break;

      if (tmpResult == null) continue; // loader opted out

      if (tmpResult === false) {
        finalResult = { source: '' };
        break;
      }

      if (tmpResult?.format != null) format = tmpResult.format;
      if (tmpResult?.source != null) source = tmpResult.source;
    }

    finalResult ??= interimResult;

    // various existing result checks and error throwing
  }
}

Concerns Raised

Next()

  1. This creates an Inception-like pattern, which could confuse users: loaders would be specified in a different sequence than called, as loaders are called in a nested manner: the final loader calls the previous, and the previous calls its previous, etc all the way back to the beginning.
  2. The next function does not behave as many current, well-known implementations behave (ex javascript's native generator's next is the inverse order to this's, and not calling ExpressJS's route-handler's next does not break the chain).
  3. Requires the user to have specific knowledge: next is effectively required (not calling next will likely lead to adverse/undesirable behaviour, and in many cases, break in very confusing ways).
  4. Unit testing is more difficult (requiring spying in almost all cases, whereas done's needs spying very rarely, by, likely, more advanced users)

Done()

  1. This could potentially cause issue for APMs (does the next approach also?) After chatting with @bengl, it seems like this is not an issue as V8 exposes what they need.
  2. A hook that unintentionally does not return / returns nullish might be difficult to track down I believe this was resolved in the previous issue discussion?
@GeoffreyBooth GeoffreyBooth added esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. discuss Issues opened for discussions and feedbacks. labels Jan 15, 2021
@JakobJingleheimer
Copy link
Member Author

@GeoffreyBooth I feel rather vindicated RE Concerns with Next() #3: When I was setting up the resolve hook test fixture for #37468, I indeed did forget the default-case return next(…) at the end. Took me a few minutes to figure out why the heck suddenly all my cases were failing.

I think if we opt for the next() approach (as I was authoring #37468, I'm increasingly leading toward "shouldn't", not least of which because the implementation looks to be much more difficult—read: buggy, real or perceived), I think some extra fault detection will be needed. Without, I think this could be a table-flip moment for users (I was getting annoyed, and I wrote the darn thing—so if anybody should know/remember, it would be me (and you)).

@DerekNonGeneric DerekNonGeneric added loaders Issues and PRs related to ES module loaders loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team labels May 26, 2021
@GeoffreyBooth
Copy link
Member

This is worth discussing, sure, but I didn’t think #37468 was supposed to implement chaining; I thought it was just meant to combine the hooks and stop there, and a later PR would add chaining. Isn’t this something to figure out after #37468 has merged in?

@JakobJingleheimer
Copy link
Member Author

Yes #37468 isn't/doesn't, and yes that's all #37468 does. #37468 is ready to go, so I think this is next?

@giltayar
Copy link
Contributor

One point of concern about the Done method: when implementing a loader for mocking (as I did for testdouble's ESM support), you need to add cache busting to the final url, which means that you want to be the last in the chain. And yet, when loading the source, you want to be the first in the chain, because you know you're not really transforming the code, but rather replacing it.

This is why I prefer the Next() approach. While it means that loader developers need to be more aware of how the chaining works, it gives them more flexibility in how they implement the resolve and load functions of the loader, and does not "chain" them to the opinionated thinking of how things should happen, which is what the Done approach is trying to do.

Exhibit A 😊, the ESM mock loader for testdouble: https://github.com/testdouble/quibble/blob/main/lib/quibble.mjs

@JakobJingleheimer
Copy link
Member Author

JakobJingleheimer commented May 28, 2021

@giltayar sorry, I'm not following (but would like to understand your concern). done() would ensure it's the last hook executed. If you mean out of all possible hooks, all other hooks need to run and a particular one dynamically needs be last in queue, yes, next() would facilitate that and done() would not. Needing to do that dynamically is a critical differentiator but it sounds like very much an edge-case (please do correct me if I'm misunderstanding you).

Regarding needing the particular resolve hook's sibling load hook to be called first, both done() and next() could do that because ESMLoader will be aware of what file they both came from. But why do you need that?

I looked through your code example (I've looked thru Quibble previously—slick stuff!), but the "why" didn't jump out at me.

At first read, it sounds like what you're describing meticulously strong-arms a process that would probably naturally result in what you want anyway.

If this is an edge-case, unless it's a very compelling edge-case, I would likely lean towards the option that does not ensure all developers have a much more complicated and gotcha-prone experience so that a small subset might use a small bit of extra functionality.

@cspotcode
Copy link

next allows a single loader to:
a) pre-process the call to the next loader
b) post-processing the results from the next loader
c) skip/suppress the call to the next loader

done supports 2 out of 3?

If I understand correctly, neither API allows awaiting an entirely different module resolution. For example if the format of ./transpiled-language.* depends on the contents of package.json and/or transpiled-language-config.json, and if the module is coming from the web through an unpkg, url-remapping, or http-to-https loader. However, node's own loader does not use the loaders chain to read package.json, does it? It assumes they exist on the local filesystem. I'm not sure if that will change, and I'm not sure whether loaders want to be a virtual filesystem of sorts.

@JakobJingleheimer
Copy link
Member Author

JakobJingleheimer commented May 29, 2021

done would not be able to support b in your list because by the time the subsequent hook is invoked, the current will have finished (I'm not sure if it should do b). It can do a and c.

Currently, node does read the package's package.json, checking its "type" field.

If you need to orchestrate the sequence of imports such that foo is available before bar, a dynamic import might be what you're looking for:

const { default: foo } = await import('foo');

const { default: bar } = await import('bar');

If you're saying that foo needs to happen during and be available to bar's load hook, the above may suit with some finagling, but this may be something to use context.conditions (which is a bit vaguely defined at the moment, but could potentially be extended to facilitate).

@JakobJingleheimer

This comment has been minimized.

@giltayar
Copy link
Contributor

all other hooks need to run and a particular one dynamically needs be last in queue, yes, next() would facilitate that and done() would not. Needing to do that dynamically is a critical differentiator

Yes, you understood!

but it sounds like very much an edge-case

Is a mocking library an edge case for loaders? I don't think so. I'm guessing there will be a few of those.

Regarding needing the particular resolve hook's sibling load hook to be called first, both done() and next() could do that

How does that happen in the Done method? If the resolve MUST be called last, that means that the loader should be the last in the chain, which means that the load will also be last, no? Using the next method, I can be the first in the chain, yet call next() and transform the value I get from next(), thus acting like I'm last in the chain, whereas my load() can just not call next() and thus be the first (and last!) in the chain.

At first read, it sounds like what you're describing meticulously strong-arms a process that would probably naturally result in what you want anyway.

Maybe I'm missing something, but how would it naturally occur?

If this is an edge-case, unless it's a very compelling edge-case...

As I said above, I believe mocking libraries are a compelling case.

@JakobJingleheimer
Copy link
Member Author

JakobJingleheimer commented May 30, 2021

For sure supporting mocking is compelling.

I'm thinking mocking would be used only during automated testing (if not, please let me know of the other scenario(s)). If so, I would expect the mocking loader to need only a load() hook (allowing the normal resolves to happen as they would). If so, just put the mocking loader at the front of the queue:

"test": "NODE_OPTIONS='--loader mockLoader' …"

Then in mockLoader's load, read a static list of what to mock, check the url against that, and do on match (and opt-out on mismatch):

export async function load(input, url /* … */) {
  const mock = mapOfMocks.get(url);
  if (mock) return {
    format: 'module',
    source: mock,
  };
  // omitting a return = opt-out
}

If you wanted to bypass a mock, you could add some kind of query param flag to the specifier, like 'foo.mjs?noMock'. You could similarly have an adhoc mocking opt-in with a query param (and the easiest would be some kind of location convention, like adjacent, but the query param could have a value with a path to the mock): 'foo.mjs?doMock' and the mockLoader looks for a file of the same name like 'foo.mock.mjs':

export async function load(input, inputUrl, context, done, defaultLoader) {
  const url = new URL(inputUrl);
  const params = new URLSearchParams(url.search);

  if (params.has('noMock')) return;

  if (params.has('doMock')) {
    const mockUrl = ;
    const mock = defaultLoader(mockUrl);
    if (mock) return {
      format: 'module',
      source: mock,
    };
  }
  else { /* global mocks */ }

  // omitting a return → opt-out
}

Perhaps don't call done() in mockLoader because the mock file might be typescript or something that a subsequent loader would transform.

@giltayar
Copy link
Contributor

@JakobJingleheimer

The reason mocking loaders also need to hook the resolve is for "cache busting": in any mocking library, you can mock an ESM one way, and then mock it another. To make this work, the resolve adds a "generation" query parameter to the URL to enable it to load the "latest" change to the mock. But for this to work, it needs to be the last in the chain.

(in Quibble currently it's also used as a hacky way to figure out where the module file is (search for "dummyImportModuleToGetAtPath" in my blog post, but I expect this hack to go away once we have import.meta.resolve)

@cspotcode
Copy link

Does jest's mocking loader need resolve for an additional reason: to redirect imports into __mocks__ directories?

@GeoffreyBooth
Copy link
Member

@JakobJingleheimer In your HTTPS loader example, what is input and why would it being defined mean we would opt out?

export async function load(
  input,
  inputUrl,
  context,
  // done,
  // defaultLoader,
) {
  if (input) return; // opt-out

I’m thinking mocking would be used only during automated testing (if not, please let me know of the other scenario(s)).

I use mocking of network calls during development to control API responses without needing live backends in particular states. That could be implemented at a low level (mock/intercept the networking machinery itself) or by mocking the function where the network fetch occurs.

Application monitoring could also be considered another form of mocking. If you think of a use case like a live dashboard of an app’s traffic, getting the data for such a tool could be implemented by mocking/proxying lower level functions like core parts of Connect/Express/other framework or of Node.

@JakobJingleheimer
Copy link
Member Author

@giltayar so actually I had read that blog post way back when! Having re-read it, now I remember the purpose of the "generation" query param.

To sum up the problem, it sounds like you're using the generation number in resolve to ensure 2 things:

  1. Instances of the mock are independent/isolated: the "global" foo.mock used by bar.test and qux.test has no cross-contamination (ex bar.test does something to foo.mock and that shouldn't bleed into qux.test, like call counts)
  2. A module can have multiple, concurrent mocks: foo is locally mocked in bar.test and also in qux.test and their local mocks are different (ex bar's mock of foo contains exports a and b, and qux's mock of foo has exports b and c)

When running tests in parallel, bar.test and qux.test can be being processed at the same time (typically facilitated via multiple child processes); importantly, there is only 1 node process, and that contains the caches (so that is shared).

Have I got that right?

If so, why does your resolve need to be last? I think all that matters is that your query params exist in the final value of url returned from the resolve chain.

Example
--loader=httpsLoader \
--loader=quibbleLoader \
--loader=babelLoader

httpsLoaderquibbleLoaderbabelLoader

// quibbleLoader
export async function resolve(specifier, parentUrl) {
  if (!isQuibbly) return;

  // …

  const url = new URL(/* … */);
  url.searchParams.set('__quibble', global.__quibble.stubModuleGeneration);

  return {
    url: url.href,
  };
}
// babelLoader
export async function resolve(specifier, parentUrl) {
  if (!isBabelly) return;

  // …

  const url = new URL(/* … */);
  url.searchParams.set('__babel', babelStuff);

  return {
    url: url.href,
  };
}

The result of the resolve chain would be something like https://example.com/foo.mjs?__quibble=2&__babel=babelsConfig

@JakobJingleheimer
Copy link
Member Author

Does jest's mocking loader need resolve for an additional reason: to redirect imports into __mocks__ directories?

@cspotcode yes for the same reason(s) Quibble does.

@JakobJingleheimer
Copy link
Member Author

In your HTTPS loader example, what is input and why would it being defined mean we would opt out?

@GeoffreyBooth input is the (interim) result of the previous loader, if any previous loader(s) provided anything yet (ex there could be 5 loaders that ran before HTTPS Loader, and they may have all opted out, in which case input would be empty).

In the case of HTTPS Loader, if interim source already exists, that indicates there's nothing for HTTPS Loader to do. My example may have over-simplified slightly, since input would be empty or an object (so it would actually be checking input?.source). Perhaps input should be initialised as an object with empty format and source properties (but that may make the node-land code a little more complicated).

I use mocking of network calls during development to control API responses without needing live backends in particular states. That could be implemented at a low level (mock/intercept the networking machinery itself) or by mocking the function where the network fetch occurs.

Ah, true. In that case, just ensure that mocker occurs ahead of HTTPS Loader (and any other remote loaders) in the queue 😉 I'm thinking that they would not need to conditionally change sequence.

Application monitoring could also be considered another form of mocking. If you think of a use case like a live dashboard of an app’s traffic, getting the data for such a tool could be implemented by mocking/proxying lower level functions like core parts of Connect/Express/other framework or of Node.

I think in that case, it would just need to be first in the queue (which they already need, citing something like "ensure Foo is the first require() in your application").

The updates to ESMLoader.load() would also need to ensure individual properties are not inadvertently stomped (eg. loader4 returns only a source property and omits format, that probably shouldn't overwrite an existing format from a previous loader, but returning source and an empty format would).

@giltayar
Copy link
Contributor

giltayar commented Jun 2, 2021

A module can have multiple, concurrent mocks: foo is locally mocked in bar.test and also in qux.test and their local mocks are different (ex bar's mock of foo contains exports a and b, and qux's mock of foo has exports b and c)

Nope. There can be only one specific mock per-process at a specific time. But a test could mock the module in one way at a certain time, and serially after that mock it in another way. And that is the purpose of the generations: to allow the test to change the mocks serially.

If so, why does your resolve need to be last?

It needs to be last because I don't really care how the module is resolved. I just want to add my query parameter to it. Hence, it needs to be the last.

I think all that matters is that your query params exist in the final value of url returned from the resolve chain.

Exactly. That's why it needs to be last! Or did I misunderstand?

@JakobJingleheimer
Copy link
Member Author

JakobJingleheimer commented Jun 2, 2021

Nope. There can be only one specific mock per-process at a specific time. But a test could mock the module in one way at a certain time, and serially after that mock it in another way. And that is the purpose of the generations: to allow the test to change the mocks serially.

I think not important for next vs done, buut: Possibly over-complicating things, might using the parentUrl be more robust and easier to troubleshoot (and also preclude collision)? Ex

file://…/foo.mjs?__quibble=…/bar.mjs
file://…/foo.mjs?__quibble=…/qux.mjs

I think all that matters is that your query params exist in the final value of url returned from the resolve chain.

Exactly. That's why it needs to be last! Or did I misunderstand?

Yes, I'm pretty sure you misunderstood 🙂 If all you need is your query param to be present in the final value, that can happen anywhere in the chain (as long as other loaders behave themselves). Think of it like an assembly-line, where each loader does its own part: I make and attach the headlights, you make and attach the upholstery; neither of us does that to the exclusion of the other (hence why calling done would be exceptional).

@giltayar
Copy link
Contributor

giltayar commented Jun 2, 2021

might using the parentUrl be more robust

Won't work, because same file might be importing the module multiple times (usingawait import), yet wanting different mocking for each import.

that can happen anywhere in the chain (as long as other loaders behave themselves)

  1. as long as other loaders behave themselves: I don't want to depend on them keeping existing query parameters (or hashes).
  2. I can't add query parameters to bare specifiers, so I first have to have somebody resolve it to a file, and then add the query parameters.

@JakobJingleheimer
Copy link
Member Author

JakobJingleheimer commented Jun 2, 2021

  1. as long as other loaders behave themselves: I don't want to depend on them keeping existing query parameters (or hashes).

That sounds like something everybody would want, which is mutually exclusive / impossible (regardless of next or done). If another loader is misbehaving, fix it or don't use it. I would ensure there's sufficient documentation and examples to understand how to get loaders to play nice together 😉 None of the concepts are novel, so someone with experience integrating multiple pieces would likely already be familiar.

  1. I can't add query parameters to bare specifiers, so I first have to have somebody resolve it to a file, and then add the query parameters.

If quibble is before another loader that would supply that, I thiiink this is addressed by leveraging the defaultResolver argument?

export async function resolve(specifier, , defaultResolver) {
  // …

  const fileUrl = defaultResolver(specifier, );

  const url = new URL(fileUrl);

  // …
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues opened for discussions and feedbacks. esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders module Issues and PRs related to the module subsystem.
Projects
None yet
Development

No branches or pull requests

5 participants