-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Refactor structured clone into serialize/deserialize steps #2421
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of, thanks so much for tackling this. Hopefully this third rewrite of the algorithm is the one that puts us on solid ground.
I don't really understand the need for recursive transfers. A transfer is basically moving a pointer. You can't really divide that further.
Now we're more comfortable with IDL extension, should we add IDL extension to make it easier to define the platform object bits? E.g., it seems serialization type could just be the interface name.
I think the main problems you hint at with deserialization can be solved if we put objects in charge of the full deserialization dance, including allocating themselves.
IDB I think needs a new primitive for storing that ends up copying any shared data (or IDB should simply disallow shared memory and you just need to convert to AB first).
The way MessagePort needs to work is that the set of tasks associated with it end up on an event loop and the task then needs to create an event and deserialize data as appropriate for that event loop. So we need to put much more logic in the actual task.
I hope this and the many nits and suggestions help. It might help to discuss more difficult points in a corresponding issue perhaps so we only deal with nits here? Up to you.
source
Outdated
<dt>A <dfn data-export="">serialization type string</dfn></dt> | ||
<dd><p>A string that uniquely identifies this class of <span>serializable objects</span>, so that | ||
the <span>StructuredSerialize</span> and <span>StructuredDeserialize</span> algorithms can | ||
recognize instances of serialized data to which this definition applies.</dd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would call this the "serialization type". We generally don't put the type in names.
<dd> | ||
<p>A set of steps that serializes the data in <var>input</var> into fields of | ||
<var>serialized</var>. The resulting data serialized into <var>serialized</var> must be | ||
independent of any <span>JavaScript Realm</span>.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On initial reading I thought this requirement could not be met by SAB. I understand what you're going for though and I don't really know a better way of phrasing it.
</dd> | ||
|
||
<dt><dfn data-export="">deserialization steps</dfn>, taking a <span>Record</span> | ||
<var>serialized</var> and a <span>platform object</span> <var>value</var></dt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't these steps have to define how the platform object is allocated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would also make it easier to make failure happen in case the platform object cannot even appear in that realm, the constructor would simply fail.
source
Outdated
<span>"<code>DataCloneError</code>"</span> <code>DOMException</code>.</p> | ||
|
||
<p class="example">For instance, a proxy object.</p> | ||
</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: indentation.
source
Outdated
<span>"<code>DataCloneError</code>"</span> <code>DOMException</code>.</p></li> | ||
|
||
<li><p>If <var>value</var> has a [[BooleanData]] internal slot, then let <var>serialized</var> be | ||
{ [[Type]]: "Boolean", [[BooleanData]]: <var>value</var>.[[BooleanData]] }.</p></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also try to change the "let serialized" pattern? Using let for the same variable so many times makes me a little anxious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should, as we seem to be moving in that direction, but it just feels a bit silly when you write out the "right" way...
source
Outdated
|
||
<div class="example"> | ||
<p>It's important to realize that the <span data-x="Record">Records</span> | ||
produced by <span>StructuredSerialize</span> may contain "pointers" to other records that create |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: may -> can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed linter to catch this whatwg/html-build#105
source
Outdated
<span>platform object</span>.</p></li> | ||
|
||
<li><p>Let <var>value</var> be a new instance of the <span>platform object</span> type | ||
identified by <var>serialized</var>.[[Type]].</p></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this allocation should be defined by the deserialization steps of the platform object. In particular we might want to support a situation where you can have a platform object that doesn't exist in all contexts. E.g., it can only be passed around between [SecureContext] environments. But it also seems nice to allow platform objects the same conveniences when it comes to construction as we use for JavaScript objects. That is, reusing existing algorithms taking as arguments other bits from the record.
Can you explain why you think this is helpful or necessary? We're already being very vague around allocation of new objects. E.g. technically the correct way to allocate a Boolean wrapper is:
but this is pretty ridiculous, so we just lean on the reader knowing that
is equivalent to the first two steps and
is equivalent to the third step. I see the step
as analogous: it's a bit hand-wavy, but everyone knows what it means, and it should work fine. It should also be equivalent to the first step in most platform object's constructors, actually.
I don't see why this is a reason for requiring custom deserialization steps. It does imply I should add a note to the above stating that the object creation might fail for such a reason, though.
So, I'm not sure yet, but I think byte streams would be an example of this. You transfer the stream, which primarily means transferring any chunks already in the stream's queue recursively. (And then setting up a task to continually read from the stream and transfer more in the future.)
Great idea.
What do you mean exactly? I was thinking maybe we'll need SABs to serialize to normal ArrayBuffer stuff + source realm, so that when deserializing we can compare target and source realm for share-within-able-ness. And that is indeed weird from the sense of "realm independent", but technically still meets the definition. Was that what you were thinking, or something else? |
That would definitely help make the current setup less problematic.
I think what this means in practice is cloning the stream and transfering its contents. The main issue here is that it's no longer a synchronous operation, it becomes a sequence of intermittant operations. I would expect this to require a novel setup or some layering on top of what we have today.
I would expect that you can either find the Shared Data Block or not in the agent cluster. (Of course, with the current fallback of copying this isn't quite what implementations do, but this would seem the most quick.) |
I don't think that's true. Platform objects are all allocated the same way, by Web IDL.
I don't understand this point. The allocation can fail I guess (if we are OOMing), and in that case the "Let value be a new instance of the platform object type identified by serialized.[[Type]]" step will fail. Changing to allow custom allocation doesn't allow failure any better.
I think that is the future I see, yeah. I'm not sure it's necessary to define in detail, but the idea is that like in implementations, all platform objects are just allocated the same way. The only important thing is getting the realm right, but that's what "in targetRealm" is about. This ties into whatwg/webidl#135 which is about defining "a new X object", i.e. allocation, in a tiny bit more detail.
Well, the question is, does it go in the transfer list or not? I think it needs to go in the transfer list, because the stream becomes "detached" via the locking mechanism. So these will be transfer steps, not serialization steps. I think this will be slightly novel, but I think it fits into the setup reasonably well. You end up doing some kind of recursive promise loop inside the transfer steps. E.g. something like
This is clearly not complete, and might not work; maybe I should try to work it out completely, before we settle on this design. But I hope it demonstrates how recursive transfers might be used. (Another issue here is that streams are technically not platform objects, but let's ignore that for now...)
That's interesting. Currently the ES spec doesn't tie Shared Data Blocks to agent clusters, but it seems like a reasonable thing for it to do... Although, that brings up a better idea for speccing than storing the source realm: store the source agent cluster. At the spec level at least it's then just a simple check for identity, and is arguably more obviously realm-independent. |
Made more progress and pushed more commits today. Still left:
It also would be really good to come up with a more concrete idea of how this will allow transferring streams. |
7eec615
to
5a1d78c
Compare
My thinking for streams and promises and any other temporal API is that they effectively need a private MessageChannel to function if the initial clone/transfer happens through the existing Another thing we need to do is collect downstream users that need updating:
|
source
Outdated
</ol> | ||
</li> | ||
|
||
<li> | ||
<p>Otherwise, if <var>input</var> has [[SetData]] internal slot, then:</p> | ||
<p>Otherwise, if <var>value</var> is an Array exotic object, then:</p> <!-- IsArray supports | ||
proxies too, which we cannot --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I prefer keeping the comment on its own line. Makes them easier to spot/read.
source
Outdated
|
||
<li><p>Let <var>value</var> be an uninitialized value.</p></li> | ||
|
||
<li><p>If <var>serialized</var>.[[Transfer]] is true, then let <var>value</var> be ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/let/set, s/be/to/.
source
Outdated
<var>interfaceName</var>, created in <var>targetRealm</var>.</p> | ||
|
||
<p class="note">This step can potentially fail if for some reason allocating the object is | ||
impossible, such as low-memory situations.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably state that we rethrow exceptions here. However, given that we don't do that for Array objects, we should maybe not account for it here either (and only account for it in the deserialize steps).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what exceptions we'd be rethrowing. We don't do that for array objects because it's impossible for it to throw an exception. I think the same is true here, at least excepting OOMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon reviewing though this note does seem rather out of place, so I'll just remove it.
source
Outdated
<span>StructuredDeserialize</span>(<var>entry</var>.[[Value]], <var>targetRealm</var>, | ||
<var>memory</var>).</p></li> | ||
|
||
<li><p>Perform ? <span>CreateDataProperty</span>(<var>value</var>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should always succeed at this point, I think?
previously-serialized <span>Record</span> <var>subSerialized</var>, and returns | ||
<span>StructuredDeserialize</span>(<var>subSerialized</var>, <var>targetRealm</var>, | ||
<var>memory</var>). (In other words, a <span>sub-deserialization</span> is a specialization | ||
of <span>StructuredDeserialize</span> to be consistent within this invocation.)</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we simply pass the deserialization steps the memory argument from the start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making consumers remember to pass through memory and targetRealm, and/or fill memory at the appropriate times, seems very error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And with sub-deserialization they end up as global variables or some such? I guess I should study that again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, I see how it works now. I guess that's fine, though it seems a little weird. I don't understand why they'd have to fill memory at the appropriate times though. That's not something that would change if we just require them to invoke the actual operation directly.
source
Outdated
|
||
<p class="note">Unlike the corresponding step in <span>StructuredDeserialize</span>, this step | ||
is unlikely to throw an exception, as no new memory needs to be allocated: the memory occupied | ||
by [[ArrayBufferData]] is instead just getting transferred into the new ArrayBuffer.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true when you cross an agent cluster / process.
It never needed to, partly because "agent cluster" is an emergent property from agents sharing memory and partly because "agent cluster" does not have a representation (in ecma262). In reality of course, "agent cluster" == Unit of Same-origin Browsing Context, the agents that can legally receive a SAB from each other by any means.
Indeed that's what code I'm working on for Firefox is doing, it stores the identity of the sender's Unit of Same-origin Browsing Contexts with the SAB representation; the receiver checks that its ditto identity matches the identity sent. |
OK, I think this is ready for review and in theory landing. Note that deserialization error handling is divided into two cases:
I think that is OK for now and we'll fix the latter as part of the SAB work. If we care about handling the former differently we should file a separate issue and not hold up this work. I could be persuaded to leave it vague though instead of censoring to null. As for dependent specifications:
Note that the cases that currently use StructuredClone will not break since we still define it here. However, it does seem like nobody actually wants to use StructuredClone directly. Everyone wants serialize/deserialize, including this spec. Maybe we should remove it? |
Removing StructuredClone seems a little better, since it'll force dependencies to rethink their strategy (and fix it). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you maybe post a copy online for review? That would make such a large change a little easier to judge. Meanwhile I spotted a few errors while skimming through.
source
Outdated
queue</span> the event <var>e</var> now finds itself.</p></li> | ||
<li> | ||
<p>Let <var>finalTargetPort</var> be the <code>MessagePort</code> in whose <span>port message | ||
queue</span> the event <var>e</var> now finds itself.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work, there's no e at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can refer back to the task itself though I think?
source
Outdated
<li><p>If <var>transferable</var> does not have a <span>[[Detached]]</span> internal slot | ||
(i.e., <var>transferable</var> is not a <span data-x="transferable objects">transferable</span> | ||
<span>platform object</span>), then throw a <span>"<code>DataCloneError</code>"</span> | ||
<code>DOMException</code>.</p></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would end up throwing if you pass in an ArrayBuffer whose ArrayBufferData is not detached.
https://dl.dropboxusercontent.com/u/20140634/structured-clone-rewrite/index.html#safe-passing-of-structured-data is a rendered version of this revision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found two more nits and one somewhat larger concern. Really great work 😊
source
Outdated
</ol> | ||
</li> | ||
|
||
<li><p>Set <var>serialized</var> to ? <span>StructuredSerialize</span>(<var>input</var>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Set/Let/, s/to/be/.
|
||
<li> | ||
<p>If <var>O</var> has an [[ArrayBufferData]] internal slot, then:</p> | ||
<p>If <var>serialized</var> contains a [[TransferConsumed]] field, then:</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the value of the slot is rather meaningless. It also seems weird for StructuredDeserialize to contain transfer-specific steps. Can't we just keep those in the transfer algorithms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of the slot is used in an assert. It replaces the previous [[Transfer]] field whose value was indeed meaningless. But something is needed as a marker anyway I believe.
I don't think there's a reasonable way to do this in StructuredDeserializeWithTransfer only. We'd need to recursively crawl the serialization to find all the things marked as [[TransferConsumed]]: false
, then pull them out for a separate processing step. Such a processing step would then involve that weird "replace all references within a JS object graph" operation again. It's much better to just do this as we are deserializing the graph, in a single pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we already do the "weird" thing when serializing, no? Seems better to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's different there, because it's web-observable what order the effects happen in (e.g. if there is something unserializable in the structure, you don't want to detach your array buffers).
Also that doesn't require the "replace all references within a JS object graph", which raises questions about e.g. do we use Set() or DefineProperty() (answer: neither, some weird undefined-by-ES thing) but instead the much more reasonable "replace all references within a Record structure".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm happy to let this one go.
@@ -63755,6 +64325,34 @@ v6DVT (also check for '- -' bits in the part above) --> | |||
<li><p>Return <var>imageData</var>.</p></li> | |||
</ol> | |||
|
|||
<p><code>ImageData</code> objects are <span>serializable objects</span>. Their <span>serialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ImageData IDL needs to have the new Serializable keyword.
Updated! What do you think the right plan is for merging this? I'd like to do it sooner rather than later so I can resume work on the SAB stuff. But I can understand if we want to have PRs lined up for as many other specs as possible first. I can do most PRs but will need your help around the blob URL store. |
I would be okay with landing this with the "Breaking" prefix coupled with downstream issues and us or others fixing those over the next couple months. |
Note that I pushed a further fixup since you applied the annotation to the wrong interface. |
This allows circular references to work correctly
I wonder why these aren't being caught locally for me...
424e6a3
to
02c5ca4
Compare
Added WebAudio/web-audio-api to your list. |
This needs a lot of polish but I wanted to get it up there before @annevk woke up in case he was interested. Some review of the overall strategy would be good too.
Points I am actively thinking about:
Points I haven't thought hard about yet but am really hoping that this new framework makes it easy:
In this last list, all points but the last one don't necessarily need to be resolved before merge, but we should feel relatively confident in the path toward solving them, so that we don't have to throw this all away and start over yet again.