-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
esm: add deregister
method
#49159
base: main
Are you sure you want to change the base?
esm: add deregister
method
#49159
Conversation
Review requested:
|
lib/internal/modules/esm/hooks.js
Outdated
#removeFromChain(chain, target) { | ||
for (let i = 0; i < chain.length; ++i) { | ||
if (target === chain[i+1]?.fn) { | ||
chain.splice(i+1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chain.splice(i+1, 1); | |
ArrayPrototypeSplice(chain, i + 1, 1); |
lib/internal/modules/esm/hooks.js
Outdated
return x.fn === instance.globalPreload; | ||
}); | ||
if (index >= 0) { | ||
this.#chains.globalPreload.splice(index, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.#chains.globalPreload.splice(index, 1); | |
ArrayPrototypeSplice(this.#chains.globalPreload, index, 1); |
lib/internal/modules/esm/hooks.js
Outdated
if (instance.load) { | ||
this.#removeFromChain(this.#chains.load, instance.load); | ||
} | ||
this.#loaderInstances.splice(index, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.#loaderInstances.splice(index, 1); | |
ArrayPrototypeSplice(this.#loaderInstances, index, 1); |
We would need some docs.
IMO |
lib/internal/modules/esm/hooks.js
Outdated
const index = ArrayPrototypeFindIndex(this.#loaderInstances, (target) => { | ||
return target.id === id; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a different data structure that doesn't require us to traverse through the whole array to find what we are looking for?
lib/internal/modules/esm/hooks.js
Outdated
if (index < 0) { | ||
return false; | ||
} | ||
const instance = this.#loaderInstances[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are using index
to just get the element, we can use ArrayPrototypeFind
instead of index.
lib/internal/modules/esm/hooks.js
Outdated
const index = ArrayPrototypeFindIndex(this.#chains.globalPreload, (x) => { | ||
return x.fn === instance.globalPreload; | ||
}); | ||
if (index >= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (index >= 0) { | |
if (index !== -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
While |
https://english.stackexchange.com/a/40095/250362 TLDR: “deregister” is the verb, “unregistered” is the state once it’s been deregistered. So “deregister” is correct in this case, because the API is the action being taken. |
What is the current signature for Assuming it’s currently sync and has no return value, I agree with @joyeecheung that we should change its return value to be the ID, similar to |
https://english.stackexchange.com/questions/25931/unregister-vs-deregister/40095#comment1140785_40095 states that in the verb form, both prefixes are synonyms. |
I think |
Right, and in this case it's reverting/undoing a registration :-) |
As far as I can tell the only precedent is internal. We have no public-facing APIs that are called “unregister,” per https://www.google.com/search?q=site%3Anodejs.org+%22unregister%22. I feel pretty strongly that “deregister” is correct English, so the public API should have that name, and we can clean up/rename the internal APIs whenever we get a chance. If you read the full page at https://english.stackexchange.com/a/40095 it seems pretty clear that most people consider “deregister” more correct, with some arguing that they’re equivalent, but no one thinks that “unregister” is more correct English. This is a bikeshed and wasting attention. We already have a name. If you want to discuss it further please open a separate issue. |
|
@@ -160,6 +168,15 @@ class Hooks { | |||
load, | |||
} = pluckHooks(exports); | |||
|
|||
const instance = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is maybe not a good name: instance
usually means instance of a class, but here I think it's being used as "occurrence".
const instance = { | |
const registrationId = // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use something like uid(3)
(it looks like crypto.randomUUID() is the only thing already available in node, but that would be massively overkill) or perhaps the loader's url. Chatting with Geoffrey, he pointed out a loader can now be registered multiple times, so if the loader's url, it would need to have something else appended, like a generation id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use symbols for the public facing API, and the index for the internal API; using random string or number would be a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooow! That's a good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the value of a symbol as opposed to a string like file:///path/to/loader.js?1
? At least the latter provides some useful information, such as for debugging, and then we don’t need to have different data structures internally and externally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A symbol is guaranteed to be unique, so then there is no concern for a generational id on the same url:
Symbol(url) ≠ Symbol(url)
Also, the label of a symbol is visible, so it still provides almost identical debugging value 🙂
And bonus: it precludes the possibility of a user getting overly clever and foot-gunning a deregistration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also return a handle, like:
const handle = register(...)
handle.deregister()
Then the handle could potentially get other methods in the future. It could also contain the return value of initialize
as a property, like handle.initializationResult
.
Not sure I prefer this over other options, just adding it to the mix. We still need to deal with the problem that register
currently returns the return value of initialize
, so either we add this symbol into initialize
for the user to return, and then somehow we get symbols across the thread barrier; or we change the return signature of register
.
lib/internal/modules/esm/hooks.js
Outdated
#lastInstanceId = 0; | ||
#loaderInstances = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not use a simple number like this for id
. Instead, we should use a Map
and some string-friendly id (because maps are significantly more efficient with string-based indexes than numeric that gets cast back and forth). It also makes mutations far more efficient, as well as being inherently better on memory (because an array must be pre-allocated space, whereas a Map takes up only exactly the amount of space it needs).
#lastInstanceId = 0; | |
#loaderInstances = []; | |
#customLoaders = new SafeMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to store the order of the loaders, a Map
doesn't seem adequate. Also, it's the first time I hear that Map
would be more memory efficient than arrays, I'd be very surprised if that was the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maps are ordered, but I agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the sequence of Map items is guaranteed (insertion order): MDN: Map
The Map object holds key-value pairs and remembers the original insertion order of the keys.
Also, it's the first time I hear that Map would be more memory efficient than arrays, I'd be very surprised if that was the case.
Maps are better than Arrays in terms of both space and time efficiency:
Arrays are pre-sized. V8 pre-allocates a rather significant size¹ for an array in order to avoid having to move it (because the entire array is stored sequentially in memory) which is a very expensive operation.
Maps are not stored sequentially in memory, so there is no "move" operation when they grow (nor is there a pre-allocation of space).
¹ unless a size is specifically provided like new Array(2)
(then the array will be exactly sized); however this can be a de-op due to data type of the items themselves (under the hood, V8 can use a better-suited type of array, but it doesn't do that optimisation when you strong-arm it, so unless the items in the array are all the same data-type, generally a primitive, it's better to not force a size).
Sets are similarly better for space and time efficiency (and are also not stored sequentially in memory).
Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.
I don't think we expect that, I would expect that unregister
to not be useful to most applications – I've never seen anyone request it, but maybe there are some use cases I'm not seeing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 sides to this coin ;) deletions from de/un-register, but primarily additions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sets are similarly better for space and time efficiency (and are also not stored sequentially in memory).
Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.
How did you come to that conclusion? Is there a way to check if this is true?
From my understanding, and with some pointers from @targos, I don't see how this could be true.
In V8, Set
and Map
instances have a hash table associated with them which seems to be stored sequentially in memory:
TF_BUILTIN(SetPrototypeAdd, CollectionsBuiltinsAssembler) { |
node/deps/v8/src/builtins/builtins-collections-gen.cc
Lines 1789 to 1791 in f226350
// We do not have enough space, grow the table and reload the relevant | |
// fields. | |
CallRuntime(Runtime::kSetGrow, context, receiver); |
node/deps/v8/src/runtime/runtime-collections.cc
Lines 20 to 26 in edd537c
RUNTIME_FUNCTION(Runtime_SetGrow) { | |
HandleScope scope(isolate); | |
DCHECK_EQ(1, args.length()); | |
Handle<JSSet> holder = args.at<JSSet>(0); | |
Handle<OrderedHashSet> table(OrderedHashSet::cast(holder->table()), isolate); | |
MaybeHandle<OrderedHashSet> table_candidate = | |
OrderedHashSet::EnsureGrowable(isolate, table); |
node/deps/v8/src/objects/ordered-hash-table.cc
Lines 69 to 93 in f226350
template <class Derived, int entrysize> | |
MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::EnsureGrowable( | |
Isolate* isolate, Handle<Derived> table) { | |
DCHECK(!table->IsObsolete()); | |
int nof = table->NumberOfElements(); | |
int nod = table->NumberOfDeletedElements(); | |
int capacity = table->Capacity(); | |
if ((nof + nod) < capacity) return table; | |
int new_capacity; | |
if (capacity == 0) { | |
// step from empty to minimum proper size | |
new_capacity = kInitialCapacity; | |
} else if (nod >= (capacity >> 1)) { | |
// Don't need to grow if we can simply clear out deleted entries instead. | |
// Note that we can't compact in place, though, so we always allocate | |
// a new table. | |
new_capacity = capacity; | |
} else { | |
new_capacity = capacity << 1; | |
} | |
return Derived::Rehash(isolate, table, new_capacity); | |
} |
From the above, I understand the Set
or Map
hash table capacity is doubled everytime it reaches its capacity, so not so memory efficient. Now that Rehash
function has to recompute the hash for each elements for the new table:
node/deps/v8/src/objects/ordered-hash-table.cc
Lines 259 to 261 in f226350
template <class Derived, int entrysize> | |
MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::Rehash( | |
Isolate* isolate, Handle<Derived> table, int new_capacity) { |
node/deps/v8/src/objects/ordered-hash-table.cc
Lines 278 to 298 in f226350
for (InternalIndex old_entry : table->IterateEntries()) { | |
int old_entry_raw = old_entry.as_int(); | |
Object key = table->KeyAt(old_entry); | |
if (key.IsTheHole(isolate)) { | |
table->SetRemovedIndexAt(removed_holes_index++, old_entry_raw); | |
continue; | |
} | |
Object hash = key.GetHash(); | |
int bucket = Smi::ToInt(hash) & (new_buckets - 1); | |
Object chain_entry = new_table->get(HashTableStartIndex() + bucket); | |
new_table->set(HashTableStartIndex() + bucket, Smi::FromInt(new_entry)); | |
int new_index = new_table->EntryToIndexRaw(new_entry); | |
int old_index = table->EntryToIndexRaw(old_entry_raw); | |
for (int i = 0; i < entrysize; ++i) { | |
Object value = table->get(old_index + i); | |
new_table->set(new_index + i, value); | |
} | |
new_table->set(new_index + kChainOffset, chain_entry); | |
++new_entry; | |
} |
In comparison, %Array.prototype.push%
has a fast path, and seems also simpler (because no hashing involved), and also has optimizations based on what type of object you add to the array. The object path:
TF_BUILTIN(ArrayPrototypePush, CodeStubAssembler) { |
node/deps/v8/src/builtins/builtins-array-gen.cc
Lines 362 to 374 in f226350
BIND(&fast); | |
{ | |
array_receiver = CAST(receiver); | |
arg_index = IntPtrConstant(0); | |
kind = EnsureArrayPushable(context, LoadMap(array_receiver), &runtime); | |
GotoIf(IsElementsKindGreaterThan(kind, HOLEY_SMI_ELEMENTS), | |
&object_push_pre); | |
TNode<Smi> new_length = | |
BuildAppendJSArray(PACKED_SMI_ELEMENTS, array_receiver, &args, | |
&arg_index, &smi_transition); | |
args.PopAndReturn(new_length); | |
} |
node/deps/v8/src/builtins/builtins-array-gen.cc
Lines 405 to 410 in f226350
BIND(&object_push); | |
{ | |
TNode<Smi> new_length = BuildAppendJSArray( | |
PACKED_ELEMENTS, array_receiver, &args, &arg_index, &default_label); | |
args.PopAndReturn(new_length); | |
} |
node/deps/v8/src/codegen/code-stub-assembler.cc
Lines 3446 to 3450 in f226350
TNode<Smi> CodeStubAssembler::BuildAppendJSArray(ElementsKind kind, | |
TNode<JSArray> array, | |
CodeStubArguments* args, | |
TVariable<IntPtrT>* arg_index, | |
Label* bailout) { |
node/deps/v8/src/codegen/code-stub-assembler.cc
Lines 3458 to 3463 in f226350
// Resize the capacity of the fixed array if it doesn't fit. | |
TNode<IntPtrT> first = arg_index->value(); | |
TNode<BInt> growth = | |
IntPtrToBInt(IntPtrSub(args->GetLengthWithoutReceiver(), first)); | |
PossiblyGrowElementsCapacity(kind, array, var_length.value(), &var_elements, | |
growth, &pre_bailout); |
node/deps/v8/src/codegen/code-stub-assembler.cc
Lines 3429 to 3444 in f226350
void CodeStubAssembler::PossiblyGrowElementsCapacity( | |
ElementsKind kind, TNode<HeapObject> array, TNode<BInt> length, | |
TVariable<FixedArrayBase>* var_elements, TNode<BInt> growth, | |
Label* bailout) { | |
Label fits(this, var_elements); | |
TNode<BInt> capacity = | |
TaggedToParameter<BInt>(LoadFixedArrayBaseLength(var_elements->value())); | |
TNode<BInt> new_length = IntPtrOrSmiAdd(growth, length); | |
GotoIfNot(IntPtrOrSmiGreaterThan(new_length, capacity), &fits); | |
TNode<BInt> new_capacity = CalculateNewElementsCapacity(new_length); | |
*var_elements = GrowElementsCapacity(array, var_elements->value(), kind, kind, | |
capacity, new_capacity, bailout); | |
Goto(&fits); | |
BIND(&fits); | |
} |
node/deps/v8/src/codegen/code-stub-assembler.cc
Lines 5594 to 5598 in f226350
template <typename TIndex> | |
TNode<FixedArrayBase> CodeStubAssembler::GrowElementsCapacity( | |
TNode<HeapObject> object, TNode<FixedArrayBase> elements, | |
ElementsKind from_kind, ElementsKind to_kind, TNode<TIndex> capacity, | |
TNode<TIndex> new_capacity, Label* bailout) { |
node/deps/v8/src/codegen/code-stub-assembler.cc
Lines 5612 to 5620 in f226350
// Allocate the new backing store. | |
TNode<FixedArrayBase> new_elements = | |
AllocateFixedArray(to_kind, new_capacity); | |
// Copy the elements from the old elements store to the new. | |
// The size-check above guarantees that the |new_elements| is allocated | |
// in new space so we can skip the write barrier. | |
CopyFixedArrayElements(from_kind, elements, to_kind, new_elements, capacity, | |
new_capacity, SKIP_WRITE_BARRIER); |
So if the preallocated space is not enough, the algorithm to copy the array elements is a bit tedious, so I'll stop there:
node/deps/v8/src/codegen/code-stub-assembler.cc
Lines 5249 to 5255 in f226350
template <typename TIndex> | |
void CodeStubAssembler::CopyFixedArrayElements( | |
ElementsKind from_kind, TNode<FixedArrayBase> from_array, | |
ElementsKind to_kind, TNode<FixedArrayBase> to_array, | |
TNode<TIndex> first_element, TNode<TIndex> element_count, | |
TNode<TIndex> capacity, WriteBarrierMode barrier_mode, | |
HoleConversionMode convert_holes, TVariable<BoolT>* var_holes_converted) { |
It seems to me that when you are storing objects, they won't be copied over wether you use an array or a Set
(I assume the same is true for object/Map
), the engine will only store a reference (either in its hash table, or straight in the memory). In both cases, the engine pre-allocates some space, and when it's not enough, has to find another memory address where it can fit, and has to copy data over. In addition, for Map
and Set
, it has to recalculate the hashes.
TL;DR, Set
is not more memory nor performant efficient when it comes it changing data size, it's more efficient to ensure uniqueness and Set.prototype.has
is much more efficient than Array.prototype.includes
, otherwise you're better off with an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you come to that conclusion? Is there a way to check if this is true?
A couple years ago I did a benchmark hammering each with adds & deletes, and the results were like 50:1 Map vs plain object and Map vs Array, and like 20–10:1 Set vs Array (presumably Set was less good than Map because of the overhead enforcing uniqueness?); the performance gap widened non-linearly as size increased (mutations to the Map remained about the same cost, but mutations to Array became more and more expensive as the content within the Array got bigger).
I don't imagine that since then Map and Set got less performant or for Arrays to have changed much.
TL;DR,
Set
is not more memory nor performant efficient when it comes it changing data
I can't find the article anymore; it was written by one of the V8 people (or maybe Jake Archibald?) and specifically talked about how Map and Set are stored non-sequentially. Based on your investigation, it sounds like Map and Set internal hashtable (that remember meta information) is stored sequentially (which makes sense), but that the items are not necessarily (because the hashtable contains references/pointers, which can point anywhere). Moving a hashtable would be significantly less costly than moving a huge array with huge data.
But I think we can ignore Array vs Set for this PR because we want direct access, so it would be Array vs Map.
lib/internal/modules/esm/hooks.js
Outdated
return initialize?.(data); | ||
|
||
ArrayPrototypePush(this.#loaderInstances, instance); | ||
return initialize?.(data, { __proto__: null, id: instance.id }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return initialize?.(data, { __proto__: null, id: instance.id }); | |
return initialize?.(data, { __proto__: null, id: registrationId }); |
@aduh95 how does this obviate the need for a transfer list? How does the @GeoffreyBooth How does the |
@aduh95 @GeoffreyBooth @ljharb do you want me to run some benchmarks to actually test the performance of the different approaches? I expect the frequency with which |
@aduh95 |
No. Benchmarks aren't necessary but in general we always try to write performant code. I can imagine a worst case scenario where someone implements a bundler around this API, and registers and deregisters leaders over and over again for each imported module (because they're a poor programmer, probably, but still). So I think it's reasonable to lazy load things on first use, and to use a data structure where we can look up the loader by ID directly (like a map) rather than iterating through each element of an array. For the ID, I was imagining something like the resolved absolute URL of the module plus an integer like 1, 2, 3 to cover the case of the same loader being registered multiple times. So something like |
It's a good point that using the URL of the resolved module, plus a counter, would be a reliable choice for a Map (or null object) key - we'd just need to make sure there was a fast lookup from URL to "last used counter" for new registrations. |
@ljharb Why not just a counter? Is there a need to correlate URL with individual counts of loaders? It might be an interesting metric but a single monotonic number would suffice for an ID as far as I can tell…? |
This is what I was getting at with my question about how the docs don’t currently specify Alternatively |
That's the idea: it would no longer return data to the caller – hence, no need for a transferList anymore. |
Doesn't this… break one of the feature requests |
You can still pass a |
Is the idea that the We should get this clearly documented. If it’s confusing even us… |
I think it's already documented: Lines 780 to 815 in ee8b7f1
|
2c2bb6c
to
84be528
Compare
Suggestion from @GeoffreyBooth. Adds `deregister` method on `node:module` that looks like this: ```ts type Deregister = (id: string) => boolean; ``` Modifies the initialize hook to look like this: ```ts type Initialize = (data: any, meta: {id: opaque}) => Promise<any>; ``` Internally registered instances of hooks are now tracked. This is so they can be removed later. The id of the registered instance is now passed to the `initialize` hook. The return value of register is now the loader id. This is mapped to a symbol outside of the loader thread to prevent abuse. ```js // Caller import {register, deregister} from "node:module"; const id = register(...); // ... deregister(id); ```
84be528
to
2774269
Compare
TBD
chains
in favor ofloaderInstances
; there is a tiny performance penalty when going through any given chain (since there's a conditional branch now and possibly extra iterations up to the total number of loaders) but i think this is negligible compared to the usability increase – there is now a default loader calleddefault_loader
that is registered like any other as node's internal one and only one array instead of several. the logic for adding/removing loaders is significantly simpler now.register
– previously it returned whateverinitialize
did. @aduh95 noted that we should return a "loader handle" or a symbol to the loader instead of the initialize result which is what i've done here. i managed to make returning a symbol work across the thread gap. i am not 100% sold on symbol vs some kind of loader object which was also suggested (a thing that had a deregister method); i am still concerned about loader authors being able to pass initialization data back if it's somehow useful to them. N.B. this seems kind of like an 11th hour change now that 20.6.0 is imminent and the loaders API is being marked somewhat stable.initialize
and discard it, we now keep the result attached to the loader instance. this value is then fed back to hook invocations. after thinking about the comments in Replace globalPreload loaders#147 this actually seems necessary if there can be an arbitrary number of "copies" of any given loader; how can loaders differentiate their internal state from one another? there isn't a good way without something like this.deregister
– this builds on the first two changes; the first one makes it sensible/trivial to remove existing loaders, and the second makes it possible for the user to actually be able to reference an individual loader which is necessary for removalSuggestion from @GeoffreyBooth.
Adds
deregister
method onnode:module
that looks like this:Modifies the initialize hook to look like this:
Internally registered instances of hooks are now tracked. This is so they can be removed later. The id of the registered instance is now passed to the
initialize
hook which can then be passed back to the caller ofregister
.