LibJS: Implement the Set Methods proposal #16279
Looks great! Also very nice to immediately have some tests.
Just a couple of questions, with really only the [[Size]] thing being an issue for me.
// 8 Set Records, https://tc39.es/proposal-set-methods/#sec-set-records
struct SetRecord {
    NonnullGCPtr<Object> set; // [[Set]]
    double size { 0 };        // [[Size]]
Is a double the appropriate type here?
Oh I see it might be infinity, hmm that complicates things.
Yeah, I think just using the raw double result of ToIntegerOrInfinity is cleaner.
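To make the point about the double concrete, here is a minimal sketch of ToIntegerOrInfinity on a plain `double`. It is not the LibJS implementation: `to_integer_or_infinity` below is a free-standing stand-in for `Value::to_integer_or_infinity`, and `size_for_set_record` is a hypothetical helper that returns `std::nullopt` where the real code throws a TypeError.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Stand-in for ToIntegerOrInfinity, operating on a raw double (assumption:
// the argument has already been converted with ToNumber).
double to_integer_or_infinity(double number)
{
    // NaN maps to 0.
    if (std::isnan(number))
        return 0.0;
    // +Infinity and -Infinity pass through unchanged.
    if (std::isinf(number))
        return number;
    // Truncate toward zero, then normalize -0 to +0.
    double truncated = std::trunc(number);
    double result = (truncated == 0.0) ? 0.0 : truncated;
    assert(result == std::trunc(result));
    return result;
}

// Hypothetical helper mirroring steps 5 and 6 of the snippet under review:
// NaN is rejected (nullopt stands in for the TypeError), everything else is
// stored as an integral double or an infinity.
std::optional<double> size_for_set_record(double number_size)
{
    if (std::isnan(number_size))
        return std::nullopt;
    return to_integer_or_infinity(number_size);
}
```

Because the result may be an infinity, an integer type cannot hold every possible value, which is why the raw `double` is kept.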
        return vm.throw_completion<TypeError>(ErrorType::IntlNumberIsNaN, "size"sv);

    // 6. Let intSize be ! ToIntegerOrInfinity(numSize).
    auto integer_size = MUST(number_size.to_integer_or_infinity(vm));
Maybe at least add a VERIFY here to ensure we're in safe integer range (below 2^53) and not negative? That will ensure that [[Size]] does have a non-negative integer or +∞.
I don't think we can know that we're in the safe integer range? to_integer_or_infinity doesn't check for that, and the spec text doesn't talk about it either AFAICT.
You're right, I guess it would work outside of safe integer range since we only compare sizes.
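A small sketch of why that holds, assuming the only thing ever done with [[Size]] is an ordered comparison against the receiver's own size (as in the "thisSize ≤ otherRec.[[Size]]" style of step). The function name `receiver_is_not_larger` is made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: decides whether the receiver set is no larger than the
// other set-like object. `other_size` is the raw double stored in [[Size]].
bool receiver_is_not_larger(std::size_t this_size, double other_size)
{
    // NaN was already rejected when the Set Record was created.
    assert(!std::isnan(other_size));
    // The comparison is well defined for values above 2^53 and for +Infinity,
    // so no safe-integer check is needed for this use.
    return static_cast<double>(this_size) <= other_size;
}
```

For example, `receiver_is_not_larger(3, INFINITY)` and `receiver_is_not_larger(3, std::ldexp(1.0, 60))` both compare as expected even though neither size is a safe integer.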
awesome :)
    // 4. NOTE: If rawSize is undefined, then numSize will be NaN.
    // 5. If numSize is NaN, throw a TypeError exception.
    if (number_size.is_nan())
        return vm.throw_completion<TypeError>(ErrorType::IntlNumberIsNaN, "size"sv);
Guess we should drop the Intl prefix (later)
34b4b13 to 8e08ef5
Nice! I'm interested to see the comment about
since the question of whether that step is actually implementable is one for which I definitely need feedback from implementations. I'm trying to get a sense of whether LibJS has an otherwise-performant implementation of Set which is unable to support that operation, or if it's already known to be less than optimally big-O performant and this is a consequence of that. In other words: the spec assumes that this step would be possible because I had a particular implementation in mind for Set; is it impossible in LibJS because LibJS's implementation is different but equivalently good to the one I had in mind, or does it have less than optimal performance in other ways? Poking around a little, it looks like Map.prototype.delete (which backs Set.prototype.delete as well) is already linear in the size of the Map, instead of ~constant, so it seems like the existing implementation is already less than optimal, and so the fact that this step cannot be done efficiently falls out of that? But I'd appreciate commentary from someone more familiar with the internals here.
@bakkot LibJS takes an approach of correctness first, performance later, so it's very possible our implementation is not fully up to par with regard to that yet. Regarding the implementation you linked, it's not obvious to me how it handles iterator invalidation, or rather, the lack thereof. Specifically, an iterator over that implementation would not allow continuing the iteration if the deletion or addition of an entry caused a rehash of the map. Handling that specific issue is the reason why the initial LibJS implementation was eventually replaced with a balanced-binary-tree based one. To be honest I did not think too hard about how easy it would be to fix said FIXME, but it doesn't sound completely impossible to do in O(smaller N). The main issue I'm thinking of at the moment is the question of how the original ordering of the elements in the larger set could be extracted without iterating over the larger set in the first place. Is that readily available somehow in that implementation?
To further comment on the iterator matter, I don't see any way to achieve O(~const) under the existing constraints:
More concretely, I don't think O(const) can be achieved by using a linked list to keep track of the order (while there are live iterators), as deletion of the current or next element will either explode or force a re-seek from the first known-available entry, neither of which is O(const). The best solution I can think of is to add a second hashmap to our existing rbtree/hashmap impl to make deletion O(log n). I'd love to know your thoughts on how live iterators should be handled (and whether any of those constraints are actually a misunderstanding on my part, or if I'm missing some obvious fact here).
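A rough sketch of the "second hashmap" idea, using standard-library containers rather than LibJS's own types. Everything here is an assumption for illustration: `InsertionOrderedSet` is a made-up class, `std::map` stands in for the red-black tree keyed by insertion index, and the extra `std::unordered_map` is the key-to-index table that lets deletion skip the linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

class InsertionOrderedSet {
public:
    // Returns false if the key was already present.
    bool add(std::string const& key)
    {
        if (m_index_of.find(key) != m_index_of.end())
            return false;
        std::size_t index = m_next_index++;
        m_order.emplace(index, key);
        m_index_of.emplace(key, index);
        return true;
    }

    // Hash lookup for the insertion index, then one tree erase: O(log n)
    // overall instead of walking the whole tree to find the key.
    bool remove(std::string const& key)
    {
        auto it = m_index_of.find(key);
        if (it == m_index_of.end())
            return false;
        [[maybe_unused]] auto erased = m_order.erase(it->second);
        assert(erased == 1); // both tables must stay in sync
        m_index_of.erase(it);
        return true;
    }

    // Walks the tree in key order, which is insertion order.
    std::vector<std::string> keys_in_order() const
    {
        std::vector<std::string> keys;
        for (auto const& [index, key] : m_order)
            keys.push_back(key);
        return keys;
    }

private:
    std::map<std::size_t, std::string> m_order;              // insertion index -> key
    std::unordered_map<std::string, std::size_t> m_index_of; // key -> insertion index
    std::size_t m_next_index { 0 };
};
```

Erasing from a `std::map` leaves iterators to the other elements valid, which is the property that makes a tree attractive while iterators are live. The cost is the extra memory for the second table.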
Towards that end, you can get the right ordering semantics (with worse performance) for
Yeah, V8's actual implementation, and SpiderMonkey's, both have a lot more details to get full spec compliance, some of which is for handling iterator invalidation. I don't actually know the exact approaches they take. At a quick glance, V8's involves updating iterators to point to the new table, whereas SpiderMonkey's iterators keep track both of an index within the data table and a count of the actual elements which have been emitted so far and have the underlying map call appropriate methods to keep those values in sync when the map is rehashed or an element is removed.
The thing you actually need is the ability to efficiently determine the relative order of any two items (so you can do a sort). And that's easy to get with the "Deterministic hash tables" implementation I linked (I believe): because the Entry objects are allocated in a linear, insertion-order array, you can do a normal lookup to get the Entry objects corresponding to your items and then compare the locations of those objects in memory.
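As a sketch of that idea, the following stores entries in an insertion-order array and compares positions to order any two keys. It uses array indices rather than raw addresses, ignores deletion and rehashing entirely, and `OrderedSet`, `comes_before`, and `sort_by_insertion_order` are hypothetical names, not taken from any of the engines mentioned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

class OrderedSet {
public:
    // Appends the key to the entry array if it is not present yet.
    bool add(std::string const& key)
    {
        if (m_index_of.find(key) != m_index_of.end())
            return false;
        m_index_of.emplace(key, m_entries.size());
        m_entries.push_back(key);
        return true;
    }

    // Normal hash lookup that yields the entry's position in the array.
    std::optional<std::size_t> index_of(std::string const& key) const
    {
        auto it = m_index_of.find(key);
        if (it == m_index_of.end())
            return std::nullopt;
        return it->second;
    }

    // Relative order of two present keys, found with two lookups and no
    // iteration over the set.
    bool comes_before(std::string const& a, std::string const& b) const
    {
        auto index_a = index_of(a);
        auto index_b = index_of(b);
        assert(index_a.has_value() && index_b.has_value());
        return *index_a < *index_b;
    }

private:
    std::vector<std::string> m_entries;                      // insertion order
    std::unordered_map<std::string, std::size_t> m_index_of; // key -> position
};

// Sorts keys (all assumed to be members of `set`) into the set's insertion
// order using only pairwise comparisons.
void sort_by_insertion_order(OrderedSet const& set, std::vector<std::string>& keys)
{
    std::sort(keys.begin(), keys.end(), [&](std::string const& a, std::string const& b) {
        return set.comes_before(a, b);
    });
}
```

Sorting k keys this way costs O(k log k) comparisons, each a pair of hash lookups, so the work scales with the smaller collection rather than with the size of the larger set.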
I don't think anyone actually uses a linked list to keep track of order, LibJS included, so I'm not sure of the relevance of this comment.
Yeah, I'm afraid I don't see a way short of an additional table given the current implementation. That said my data structure design skills are a bit rusty, so I might well be missing something. (If you do add an extra table which lets you do key->index lookups so that you can make |
We used to, with Idan's original implementation that used |
Re: the sort step, that's now been dropped, so you can just remove the FIXME.
https://github.com/tc39/proposal-set-methods