Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinct operator performance issues. #2009

Closed
bafolts opened this issue Oct 6, 2016 · 7 comments
Closed

Distinct operator performance issues. #2009

bafolts opened this issue Oct 6, 2016 · 7 comments
Assignees

Comments

@bafolts
Copy link
Contributor

bafolts commented Oct 6, 2016

RxJS version:
5.0.0-rc1

Code to reproduce:
Create a distinct observable with 100k unique items, it will progressively get slower as internally a serial search for the item is used.

var rx = require("rxjs/Rx");

var s = new rx.Subject();

s.distinct()
    .count()
    .subscribe((n) => {
        console.log(n);
    });

for (var i = 0; i < 100000; i++) {
    s.next(i);
}

s.complete();

Expected behavior:
This code should finish under 1 second.

Actual behavior:
This code takes minutes to run.

Additional information:
The internals of distinct needs to utilize a set to determine if the item has already been observed. The serial search will not scale. I stumbled upon this issue by not fully understanding how distinct worked.

@jadbox
Copy link
Contributor

jadbox commented Oct 14, 2016

"RxJS version: most recent" Please be specific: Is it RC1?

@bafolts
Copy link
Contributor Author

bafolts commented Oct 17, 2016

https://github.com/ReactiveX/rxjs/blob/master/src/operator/distinct.ts#L68-L73

Whatever version this would be is using the for loop which will lead to the poor performance.

@benlesh benlesh self-assigned this Oct 17, 2016
@benlesh
Copy link
Member

benlesh commented Oct 17, 2016

@bafolts Yeah, that is a problem. I'm surprised it made it this long without being pointed out. The distinct operator was added during a phase where we were trying to get functional parity with v3. However, that operator even changed since then for v4. I think we can add the keySelector and probably entirely drop the comparer. I'm not sure who wants to check for distinct in that way anyhow.

I should have a PR for this later today.

We'll probably:

  • Add keySelector
  • Drop distinctKey operator entirely.

@benlesh
Copy link
Member

benlesh commented Oct 17, 2016

@jayphelps
Copy link
Member

jayphelps commented Oct 18, 2016

My completely unscientific testing suggests to me that we should indeed utilize a set of some kind, ideally in modern browsers use an ES6 Set and but fall back to an array with an indexOf check for older platforms like IE9-10.

Again, not scientific but I feel pretty safe in my belief that Sets are super fast compared to any approach that uses Arrays--at least in Chrome. And it makes sense too cause the underlying runtime can implement a true HashSet instead of needing to loop through all the items.

next_Set 25.91 ms
next_indexOf 2775.84 ms
next_loop 4547.62 ms

The downside to using this would be that we could no longer accept the same compare function we do today, because that relies on arbitrary comparison of each individual item aka a loop. compare = (prev, next) => prev !== next.

In some distant future when we only support IE11+ we could accept a compare function that is provided the Set and then expected to do something with it to decide if the value is distinct, like the default (set, value) => set.has(value) which would allow you to iterate over each item in the Set if you really wanted to.

Would love to hear others thoughts on this and if they have alternative, IE9+ supported solutions. Without proof, I feel that 99% of people using distinct do not need a compare callback, and if true I'd rather the perf of the default implementation be literally 100x better and drop that feature.

@bafolts
Copy link
Contributor Author

bafolts commented Oct 18, 2016

A set is definitely the way to go. Instead of a compare method, we could use a method that generated the hash key for the item. If the hash key had to be a string then IE9 could use generated Object properties as a set. I bet people may use the compare callback to select a property to compare on an object. They could use pluck but then they may lose their original object. With a compare callback they can get distinct objects without losing their original object. If users could provide a function for how to hash their Object I think they could convert their current compare methods fine. If a string return type was enforced we should also be fine for all browsers. IE9 users would have to provide the callback function.

Observable.fromEvent(document, "mousemove")
     .distinct((event: MouseMoveEvent) => {
          return `${event.clientX}:${event.clientY}`;
     });

Internally IE9 can now use the poor man's hash from days of yore {} and Object.prototype.hasOwnProperty.

@lock
Copy link

lock bot commented Jun 6, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants