Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Hystrix Observable for a zip operation leading to a memory leak #1370

Closed
bijukunjummen opened this issue Oct 4, 2016 · 12 comments
Closed
Labels

Comments

@bijukunjummen
Copy link

bijukunjummen commented Oct 4, 2016

Hi, I am not sure if this issue fits here or with rx-java. The behavior I am seeing is if I return an Observable from a set of Hystrix Commands using toObservable method and use that later in a zip call, say:

        return Observable.zip(
                part1Observable,
                part2Observable,
                part3Observable,
                (part1, part2, part3) -> new FullResponse(part1, part2, part3)).toBlocking().single(); 

with some of the circuits open, I see a leak with a lot of instances of OperatorZip building up in memory, here is how the instances look using Memory Analyzer tool:

hystrixmemoryleak

This behavior is not seen if I were to wrap the hystrix call using Observable.create and then use it in a zip operation, which leads me to think that the issue is somewhere in Hystrix libraries and not in rx-java itself.

Rxjava version - 1.1.5
Hystrix version - 1.5.3

@mattrjacobs
Copy link
Contributor

Thanks for the report, @bijukunjummen . Can you provide more details on the 3 Observables in your first example? Are they all the result of observing HystrixCommands? Are they succeeding/failing/slow?

@bijukunjummen
Copy link
Author

bijukunjummen commented Oct 5, 2016

Hi @mattrjacobs , thanks for your response. Yes, they are stock Hystrix commands created using a tiny wrapper - https://github.com/bijukunjummen/obs-leak-test/blob/master/src/main/java/leaktest/commands/Part1Command.java

I have put a sample project here - https://github.com/bijukunjummen/obs-leak-test which replicates this behavior.

In the sample I have forcibly opened the circuit for the first command through this property - hystrix.command.part1.circuitBreaker.forceOpen=true.

The result in the app is as expected, but because of the above leak ultimately the app crashes.
Also, just a note that if the services wrapped with hystrix commands are responding well with the circuit closed then this issue is not seen.

@bijukunjummen
Copy link
Author

One more observation - this happens only if there is no fallback available in the command with open circuit. With a clean fallback there is no build up in memory.

@mattrjacobs
Copy link
Contributor

I just added and ran a jmh test for single command execution when the circuit is forced open. This didn't uncover a memory leak

open-circuit-heap

I then removed the fallback and re-ran the test. This also did not exhibit a memory leak

open-circuit-no-fallback

My next guess would be to add an Observable.zip and see if there's anything interesting going on external to the command.

@bijukunjummen
Copy link
Author

I repeated a jmh test and eliminated all other dependencies that I had in my previous sample @mattrjacobs and am able to replicate the leak. Here is the sample that I have - https://github.com/bijukunjummen/obs-leak-benchmark/blob/master/src/main/java/obs/ObservableZipBenchmark.java

heapcpu

With the benchmarks I quickly got a "GC overhead limit exceeded error":

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at rx.internal.util.SubscriptionList.add(SubscriptionList.java:67)
        at rx.Subscriber.add(Subscriber.java:93)
        at rx.internal.operators.UnicastSubject$State.call(UnicastSubject.java:218)
        at rx.internal.operators.UnicastSubject$State.call(UnicastSubject.java:91)
        at rx.Observable$2.call(Observable.java:162)
        at rx.Observable$2.call(Observable.java:154)
        at rx.Observable$2.call(Observable.java:162)
        at rx.Observable$2.call(Observable.java:154)

@mattrjacobs
Copy link
Contributor

mattrjacobs commented Oct 18, 2016

Thanks for the benchmark, @bijukunjummen, that's very helpful. The first thing I tried was using Observable.error({some exception}) for part1Obs and Observable.just({some string}) for part2Obs. When I did that, the memory leak went away. So it does appear that Hystrix is leaking something somewhere. I'm investigating further.

@mattrjacobs
Copy link
Contributor

My first interesting finding (which I'd love for you to verify) is that switching the order of the 2 Observables does not result in the leak. That leads me to believe that somehow the Zip operator is seeing the error and exiting early, and some data structure is being references somewhere.

Still chasing that down...

@bijukunjummen
Copy link
Author

Yes, able to replicate the behavior you are seeing @mattrjacobs!

A workaround that I have used is to explicitly trap the exception, say using fugue library, then the leak is not there also :

            Observable<Either<Throwable, String>> part1ObsEither = command1.toObservable()
                    .map(Either::<Throwable, String>right)
                    .onErrorReturn(Either::left);

            Observable<Either<Throwable, String>> part2ObsEither = command2.toObservable()
                    .map(Either::<Throwable, String>right)
                    .onErrorReturn(Either::left);

            return Observable.zip(part1ObsEither, part2ObsEither,
                    (part1Either, part2Either) -> {
                        Either<Throwable, String> resEither = part1Either.flatMap(part1 -> part2Either.map(part2 -> part1 + part2));
                        if (resEither.isRight()) {
                            return resEither.right().get();
                        }
                        throw (RuntimeException)resEither.left().get();
                    }).toBlocking().single();

@mattrjacobs
Copy link
Contributor

mattrjacobs commented Oct 19, 2016

OK, finally got to the bottom of this. The behavior of zip is to subscribe to each Observable but then to propagate an error out ASAP and unsubscribe from the remainder of the Observables.

In the case of Hystrix, this manifested as an unsubscribe before a subscribe. This isn't something that we had tests for, since the only way to induce it is through an operator like zip.

Once I made that realization, the fix is straightforward. Hystrix will now return Observable.never() when it detects unsubscription. Before, it was doing some setup work which never had a chance to get cleaned up.

I used the supplied jmh test to verify that doing this prevents any references from building up. Thanks again for supplying that!

@bijukunjummen
Copy link
Author

Awesome, great to hear that you have found the root cause and that the fix is simple. Thanks for digging into it.

I will try out the PR and let you know how it goes.

@bijukunjummen
Copy link
Author

I was able to test the PR @mattrjacobs, it works perfectly, thanks for the quick turnaround.

@mattrjacobs
Copy link
Contributor

Great, thanks for the confirmation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants