esm: support loading data: URLs #28614
cc: @nodejs/modules
LGTM - I think this would be great to have.
Currently I simply didn't include the CJS MIME or the C++ Addon MIME for what can be loaded via data: URLs.
I think that's reasonable, especially at first. The `require` loader isn't protocol-aware, so it would be awkward to support file types that are "require-native" here.
this is very exciting :) i had been planning to open something like this when MIME parsing landed. As long as this doesn't use proper parsing, it should probably be flagged.
if (parsed.protocol === 'data:') {
  const [ , mime ] = /^([^/]+\/[^;,]+)(;base64)?,/.exec(parsed.pathname) || [ null, null, null ];
  const format = ({
    'text/javascript': 'module',
`text/javascript` doesn't necessarily mean Module; it could also mean Script.
Under no JS loading spec does Script get checked against the MIME text/javascript. Script is effectively without a MIME and this table matches web standards.
I don't think we support (browser-style) Script anywhere, so it would feel weird to do that here. A CommonJS script would be something like `text/vnd.node.js` according to nodejs/TSC#371.
CJS is application/node
Then can `application/node` be added to this object?
The expression, though (`/^([^/]+\/[^;,]+)(;base64)?,/`), seems to assume the presence of a MIME type before the `[;,]`. Regardless of whether it is mandatory, it might make sense to test it against long and malformed URLs to refine it if necessary.
Sorry for not wanting to muddy this with a bad attempt to wing it here, but I will try to locate the ones I worked on a while back for that very same purpose if it helps.
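A quick way to run that kind of check is to feed the pattern from the diff a handful of pathnames (the part of the URL after `data:`). This is an illustrative sketch: the sample inputs are made up and `probe` is a hypothetical helper, not part of the PR. Traced by hand, the pattern only accepts a bare `type/subtype` with an optional `;base64`, so a MIME parameter such as `;charset=utf-8`, an omitted MIME (which RFC 2397 allows, defaulting to `text/plain`), and the malformed `/` all make `exec` return `null`.

```javascript
// The pattern from the diff under review, applied to URL#pathname
// (everything after "data:").
const ORIGINAL = /^([^/]+\/[^;,]+)(;base64)?,/;

// Hypothetical helper: report the captured MIME and whether ";base64" was
// present, or null when the pattern does not match at all.
function probe(pathname) {
  const m = ORIGINAL.exec(pathname);
  return m === null ? null : { mime: m[1], base64: m[2] !== undefined };
}

const samples = [
  'text/javascript,export default 1',
  'text/javascript;base64,ZXhwb3J0IGRlZmF1bHQgMQ==',
  'text/javascript;charset=utf-8,export default 1', // MIME parameter
  ',export default 1', // MIME omitted entirely
  '/',                 // malformed input
];

for (const s of samples) {
  console.log(JSON.stringify(s), '->', JSON.stringify(probe(s)));
}
```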
@SMotaal we could, but I can't think of much we could do except limiting the size? Right now this lacks a variety of things, including MIME parameter parsing, and the PR for parsing MIMEs is stuck.
Since it is anchored, it is certainly possible to add efficient guards to the current expression. I'd like to explore how we can do that here, which mainly means carving out a limited set of allowed characters for each delimited part, per spec (I did this a while back; I just need to dig it up).
@bmeck I looked into the various options for the expression and recommend:
/^(?:((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)((?:;[A-Z][!%'()*\-.0-9A-Z_~]*=[!%'()*\-.0-9A-Z_~]*)*)(;base64)?),/i
This would match any `text/` and `application/` subtype, along with attribute-value parameters like `charset=` (to be parsed separately), and an optional `base64` (captured separately from the previous parameters).
For now, simply being more restrictive of the character ranges for the greedy `*` and `+` captures is likely all we need to avoid unpredictable performance hazards with very long crafted or malformed strings.
See gist for more details.
Please let me know how to proceed, if this is worth incorporating.
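To see what each capture group of the suggested expression yields, the sketch below runs it against a few pathnames. `split` is a hypothetical helper, and its naive `;`/`=` splitting of group 2 is only a placeholder for the separate parameter parsing mentioned above, not a real MIME parameter parser.

```javascript
// The expression suggested in review; the `i` flag makes [A-Z] also match
// lowercase letters.
const SUGGESTED =
  /^(?:((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)((?:;[A-Z][!%'()*\-.0-9A-Z_~]*=[!%'()*\-.0-9A-Z_~]*)*)(;base64)?),/i;

// Hypothetical helper. Group 1 is type/subtype, group 2 is the raw run of
// ";attr=value" parameters, group 3 is the optional ";base64".
function split(pathname) {
  const m = SUGGESTED.exec(pathname);
  if (m === null) return null;
  const params = m[2]
    .split(';')
    .filter(Boolean)          // drop the empty string before the first ';'
    .map((p) => p.split('=')); // placeholder: no quoting or escaping handled
  return { mime: m[1].toLowerCase(), params, base64: m[3] !== undefined };
}

console.log(split('text/javascript;charset=utf-8;base64,ZXhwb3J0IGRlZmF1bHQgMQ=='));
console.log(split('application/wasm;base64,AGFzbQEAAAA='));
console.log(split(',export default 1'));
```

Like the original pattern, this one still rejects a `data:` URL that has no MIME type at all.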
You can just post a `suggestion` change, and that looks fine.
I am +1 on this change and code looks fine. I am not LGTMing this because tests are missing and two LGTMs mean this could land. I'd also like to echo Jan's statement (about the CJS and addon mime types):
To try to understand how
I must admit I'm a little worried about the security model implications, especially around systems that might want to restrict import access. E.g., if we have packages on npm that use a In addition, what are the driving use cases for this work?
yep.
yep.
yep. even though we could ban these, it doesn't seem like a big difference given how many evaluators there are with the ability to get
yup, that can be resolved since the specifier is an absolute URL.
Policy integrities are how people should restrict import access. We have a variety of evaluators that can access powerful APIs: The policy files currently are meant to prohibit loading resources that are not whitelisted through module loaders. A user would need to whitelist the data URL and the file URL to
Cross-environment compatibility and runtime generation of modules in the main module map (with limits similar to browsers). A variety of uses of data URLs are possible, including but not limited to creation of shared module namespaces keyed by strings, allowing things like modules to share a communication channel without directly needing to know where the other is.
That is actually an interesting point: should a policy that prevents EDIT: I assume that integrity checks or other "allowed module sources" would be the more appropriate policy as opposed to throwing this in with
@jkrems alternative works like Trusted Types are being looked at for ways to label "strings" as trusted for evaluators, and have some agenda items at this month's TC39 meeting. We could add whatever policies people want to
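To make the integrity idea concrete, the sketch below computes a Subresource Integrity string (`sha384-` followed by the base64 SHA-384 digest) over a module's source and places it in a manifest-shaped object keyed by the `data:` URL. The manifest shape (`resources` mapping a URL to `{ integrity }`) is assumed from Node's experimental policy manifests; whether a `data:` URL key is honoured, and that the hashed bytes are the decoded body, are assumptions to verify, not claims about this PR. `sri` and `buildManifest` are hypothetical names.

```javascript
// Compute an SRI string over the module's source bytes. `node:crypto` is
// loaded with import() so the sketch runs as either CommonJS or ESM.
async function sri(sourceText) {
  const { createHash } = await import('node:crypto');
  const digest = createHash('sha384').update(sourceText, 'utf8').digest('base64');
  return `sha384-${digest}`;
}

const source = 'export const log = [];';
const url = `data:text/javascript;base64,${Buffer.from(source).toString('base64')}`;

// Hypothetical manifest: assumes the policy `resources` -> `integrity` shape
// and that an absolute data: URL can be used as a resource key.
async function buildManifest() {
  return { resources: { [url]: { integrity: await sri(source) } } };
}

buildManifest().then((m) => console.log(JSON.stringify(m, null, 2)));
```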
I'm a -0 on this, to be completely honest. I think `data:` URIs are slow and have lots of unfortunate edge cases, including security and instancing concerns. And given that loaders don't currently provide any source-hooking mechanism, they will effectively be the only way to do that, which makes for a pretty bad user experience on that front as well.
At the same time I can appreciate having more web features too.
I'm hopeful we can get more movement on the loader side though soon.
if (parsed.protocol === 'file:') {
  return readFileAsync(parsed);
} else if (parsed.protocol === 'data:') {
  const [ , base64, body ] = DATA_URL_PATTERN.exec(parsed.pathname);
What about the case where the regex here fails to parse (e.g. `data:/`)?
This would error. That invalid URL would have to be returned by a loader, though, and would presumably already be error-checked by the attempted parse in
https://github.com/nodejs/node/pull/28614/files/95eab8ae6b9f5c13dbba10b326ba0e41c12b2274#diff-a7c0a5f3e4fc8503fefebbe82071bc38R48
But if the `.exec` returns null, the destructuring will throw, I think?
i'll add some checks
fixed in df4dab6
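For reference, here is one way such a check could look; it is a sketch, not necessarily what df4dab6 does. `DATA_URL_PATTERN` is not defined in the excerpt, so the pattern below is an assumed stand-in whose groups line up with the `[ , base64, body ]` destructuring, and `unguarded`/`guarded` are hypothetical names. Destructuring the `null` from a failed `exec` throws a generic `TypeError` about `null` not being iterable, while the guarded version fails with a message that names the bad input.

```javascript
// Assumed stand-in for DATA_URL_PATTERN: group 1 is ";base64", group 2 is
// the body, matching the `[ , base64, body ]` destructuring in the diff.
const DATA_URL_PATTERN = /^[^/]+\/[^,;]+(?:[^,]*?)(;base64)?,([\s\S]*)$/;

// Unguarded: destructuring the null returned by a failed exec() throws.
function unguarded(pathname) {
  const [ , base64, body ] = DATA_URL_PATTERN.exec(pathname);
  return { base64, body };
}

// Guarded: check for null first and report a descriptive error instead.
function guarded(pathname) {
  const match = DATA_URL_PATTERN.exec(pathname);
  if (match === null) {
    throw new TypeError(`Invalid data: URL pathname: ${JSON.stringify(pathname)}`);
  }
  const [ , base64, body ] = match;
  return { base64: base64 !== undefined, body };
}

try {
  unguarded('/');
} catch (err) {
  console.log('unguarded:', err.message); // generic "not iterable" TypeError
}
try {
  guarded('/');
} catch (err) {
  console.log('guarded:', err.message);
}
```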
I don't see many requests for changes, and I think most of the implementation stuff is ironed out, so I'm removing WIP.
To Guy's point, what's the use case for this other than parity with the Web? I'm familiar with data URIs for inserting images into the DOM and things like that. Expanding the scope of JavaScript libraries that can run in both Node and browsers is its own justification, of course, but does this enable any concrete use cases specific to Node?
That's a pretty open-ended question, and
None of the use cases this PR enables seems to be a new capability, and it seems to me like they can all be better accomplished with existing means. If there are capabilities to add in the future, perhaps we should wait until then to add the basic support? Perhaps there’s a better way to add those capabilities? Adding a new source of eval to Node merely because browsers have one (browsers don’t have the power and flexibility of a filesystem available to them, and thus have no better option) doesn’t seem like a good addition to the platform to me.
These are a different set of capabilities around location being accessed/addressed by content. That on its own is a new capability. In addition this adds compatibility and I do not see a claim of this as being a negative. This is a standardized way of achieving a feature, adding features ad-hoc for specialized circumstances might be useful, but that is not what this PR seeks. It is adding the basic feature that is suitable for a variety of means in a min-max scenario of support just focused on non-specialized loading. If other means can solve use cases with additional APIs that are more tailored to the use case, that seems fine in other PRs.
New sources of evaluation are consistently being added over time: to JS the language, to Node's runtime, through new sources like WASM, and even by people implementing them in the ecosystem. I do not understand the argument here versus all those other places that continue to introduce them. If you can explain why having compatibility is a net negative, without claiming that evaluators as a whole need to be stopped, that would be helpful. As it stands, I don't see a clear opinion on why adding this feature is seen as harmful compared to others we continue to add, either by choice such as through
Evaluating a file (a lintable, testable, statically verifiable file) is an entirely different thing than evaluating a string.
@ljharb Worker allows string-based eval, as does WASM through buffers of bytes, as do new JS features like AsyncGeneratorFunction, and as do a variety of modules from npm. I don't see how this is different from those cases and others that continue to be added.
Just because mistakes are being made elsewhere doesn’t mean we have to repeat them.
@ljharb the claim that others are making mistakes is not one held by all here. I do not see them as mistakes. That evaluators on their own are not problematic is a generally held stance, stated in TC39 meetings when discussing such things. If you could explain why this is a problem that should be prevented here, in a way that does not call it a vague mistake, that would be helpful. In particular, a point often made as to why evaluators themselves are not problematic is that preventing evaluators does not prevent access to powerful APIs that might grant unwanted authority. Access to powerful APIs is possible without evaluators, and evaluators can be recreated in userland, as we saw with technologies like ASM.js and now WASM becoming popular ways to do so.
I took a look because of the request for sec-wg review. I'm not sure what the `data:` scheme does, and the PR and docs seem to require pre-existing familiarity. There is only one example, and it's an example of something that doesn't work. Some examples of what does work, and a description of the purpose and function of `data:`, would be helpful.
Of note, by being content-addressed, this scheme does allow for creating resources at runtime. Combined with ES modules sharing a global (per V8 Context/Realm) module namespace, this means that recreating a data URL is tantamount to opening a communication channel between modules. This creation of shared communication is already true for
Embedding practical examples into the docs might be a bit unruly; perhaps I can explain them in English for now, and we can discuss them as examples that way. Due to encoding and needing to be a string, many things are not suitable for small-scale examples. All examples would roughly take the form of:

const txt = 'some.javascript()'; // could also be a big buffer of WASM
const src = Buffer.from(txt).toString('base64');
const isWasm = false; // hypothetical flag: true when txt holds WASM bytes
const mime = isWasm ? 'application/wasm' : 'text/javascript'; // pick based upon txt format
const url = new URL(`data:${mime};base64,${src}`);
import(url);

For example purposes I will just be showing the
When loading some things from memory, such as larger files like WASM, avoiding the creation of a second WebAssembly.Module can be achieved through
This same purpose can be applied to some forms of JS singletons. Take a generic log sink:

export const log = [];

This could then be imported in various places, pushed to, and drained without ever coordinating file paths. Using
By avoiding extraneous loading, we create a singleton of a module based upon its content. Some polyfill and framework mechanisms wish to avoid loading incompatible versions or duplicates; if they agree on a content-addressed store to check, they could use this as a communication channel to detect whether multiple versions are loaded. For other use cases, such as modules that do not wish to rely on environment variables, this could be used for configuration. The log example above is an instance of this.
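The log-sink idea can be sketched as a small script. It assumes a Node.js version that supports `data:` imports, and `producer`, `consumer`, and `demo` are illustrative names only. Both `import()` calls use the same URL string, so they should resolve to the same module namespace object, and whatever one side pushes the other can drain.

```javascript
// Build a data: URL for a module whose source is `export const log = [];`.
// Module identity is the URL string, so every importer of this exact URL
// shares one module instance (and therefore one `log` array).
const source = 'export const log = [];';
const url = `data:text/javascript;base64,${Buffer.from(source).toString('base64')}`;

async function demo() {
  const producer = await import(url);
  const consumer = await import(url);
  producer.log.push('hello from producer');
  return {
    sameNamespace: producer === consumer,
    drained: consumer.log.splice(0), // empty the shared array, returning its items
  };
}

demo().then((result) => console.log(result));
```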
Hitting the
Currently, the evaluators often used cannot be given a unique location, which makes them ill-suited for applying things like policies. By using an absolute location, we gain the ability to configure these URLs in a manner that is not suitable for things like
The example is very useful, as is the enumeration of use cases, thank you. Are the supported MIME types documented elsewhere? If I understand you correctly, in your And thus your The comparison I guess is by exact bytes of uninterpreted data? I think minimal docs would include an example and a link to the relevant specs so users can find more. We don't need to inline all the JS standard docs into our own, but if our docs don't point to more information, they can become basically incomprehensible. The use cases are perhaps too specific to add to the docs, and people can follow the YAML annotations to find the PR and this discussion. But then again, if there are things people should know specific to using this mechanism in Node (perhaps security pitfalls to avoid?), then it's good to document them. Shouldn't there be YAML change annotations added, since it's a new feature? I know these are not exactly APIs and the doc format is different from the rest of the API docs, but knowing what Node.js version
Not currently, I can add them. This PR supports
Correct, only 1 evaluation of the module would occur.
Correct.
Yes, in the URL, not in the body of the source text. Changing a MIME would change the "location".
I agree to some extent, I could link to MDN which is more user friendly than the specs though. I'd prefer that.
There isn't anything specific to Node that I know of at this time.
Can do.
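Because identity is the URL string rather than the decoded source, two spellings of the same source should produce two separate instances, just as changing the MIME portion changes the "location". The sketch below assumes a Node.js version with `data:` import support (the names are made up) and encodes one source both as base64 and as percent-encoded text.

```javascript
// Same module source, two different data: URL spellings.
const source = 'export const state = { hits: 0 };';
const asBase64 = `data:text/javascript;base64,${Buffer.from(source).toString('base64')}`;
const asPercent = `data:text/javascript,${encodeURIComponent(source)}`;

async function compare() {
  const a = await import(asBase64);
  const b = await import(asPercent);
  const a2 = await import(asBase64);
  a.state.hits += 1; // mutate only the base64-URL instance
  return {
    sameUrlShared: a === a2,     // same URL string: same instance
    differentUrlShared: a === b, // different URL string: separate instance
    hitsSeenByB: b.state.hits,   // b has its own, untouched state
  };
}

compare().then((r) => console.log(r));
```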
Thanks, and I agree, MDN is a better link destination.
This needs a rebase and a CI run.
rebased
None of the CI errors seem related to this PR.
You can use "Resume Build" on the left-hand side to only re-run the platforms that failed so you can check. (I just clicked it, so it should post here any second now...)
Co-Authored-By: Jan Olaf Krems <jan.krems@gmail.com>
Co-Authored-By: Jan Olaf Krems <jan.krems@gmail.com> PR-URL: #28614 Reviewed-By: Jan Krems <jan.krems@gmail.com>
Landed in 9fd9efa
This PR allows loading some formats of modules as `data:` URLs. This matches the web spec and raises some concerns as we grow what can be loaded into ESM, per the open discussion at nodejs/security-wg#520. I'm opening this expecting some discussion around MIME parsing (which is left to #21128), which this PR doesn't do properly, and around what to do in cases where module formats are not supported outside of file contexts. Currently I simply didn't include the CJS MIME or the C++ Addon MIME for what can be loaded via `data:` URLs. We could expose those, but things would likely be awkward for things like `__filename`, and C++ addons would have to be written to disk first for `dlopen` to work. Once we resolve that, we can write up docs on the decision and move this PR forward.

Checklist
- `make -j4 test` (UNIX), or `vcbuild test` (Windows) passes
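The `__filename` concern can be seen directly: a module loaded from a `data:` URL reports the URL itself as `import.meta.url`, so there is no file path to derive. In the sketch below (assuming a Node.js version with `data:` import support; `inspect` is a hypothetical name), `fileURLToPath` rejects that URL because it only accepts `file:` URLs.

```javascript
// A data: module that reports its own location.
const source = 'export const where = import.meta.url;';
const url = `data:text/javascript;base64,${Buffer.from(source).toString('base64')}`;

async function inspect() {
  const { fileURLToPath } = await import('node:url');
  const { where } = await import(url);
  let path;
  try {
    path = fileURLToPath(where); // only file: URLs convert to paths
  } catch (err) {
    path = `no file path (${err.code})`;
  }
  return { where, path };
}

inspect().then((r) => console.log(r));
```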