Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayIndexOutOfBoundsException when monitoring in dashboard #788

Closed
killme2008 opened this issue May 6, 2015 · 23 comments
Closed

ArrayIndexOutOfBoundsException when monitoring in dashboard #788

killme2008 opened this issue May 6, 2015 · 23 comments

Comments

@killme2008
Copy link
Contributor

When the app server is connected to dashboard by event stream, i found some exception in our log:

java.lang.ArrayIndexOutOfBoundsException
    at org.HdrHistogram.AbstractHistogramIterator.next(AbstractHistogramIterator.java:100)
    at org.HdrHistogram.RecordedValuesIterator.next(RecordedValuesIterator.java:18)
    at org.HdrHistogram.AbstractHistogram.getMean(AbstractHistogram.java:1121)
    at com.netflix.hystrix.util.HystrixRollingPercentile$PercentileSnapshot.getMean(HystrixRollingPercentile.java:327)
    at com.netflix.hystrix.util.HystrixRollingPercentile.getMean(HystrixRollingPercentile.java:155)
    at com.netflix.hystrix.HystrixCommandMetrics.getTotalTimeMean(HystrixCommandMetrics.java:244)
    at com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsPoller$MetricsPoller.run(HystrixMetricsPoller.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The dashboard still works, but maybe there is a bug here.

@mattrjacobs
Copy link
Contributor

Thanks for the report, @killme2008 . The integration with HdrHistogram is new, so this is likely a bug in our usage of it. I'll check it out

@mattrjacobs
Copy link
Contributor

@killme2008 I'm not able to reproduce it with any of the obvious possible errors. Could you provide more detail on what's happening in your system when you encounter this error?

I've tested:

  • Empty histogram
  • Concurrent access/modification to histogram
  • Pulling a metrics stream from a Netflix app running Hystrix 1.4.7 was successful

@killme2008
Copy link
Contributor Author

Hi, @mattrjacobs
The error is not in dashboard but the app server.
We use https://github.com/josephwilk/hystrix-event-stream-clj in our app server to output event stream for dashboard.
I don't do any special operations on dashboard,just open the index page,then connect one app server with url ,and the error begin to output in the app log.

The error log is not too much, but always happen.

@killme2008
Copy link
Contributor Author

The jar versions:

hystrix-clj-1.4.7.jar          
hystrix-event-stream-clj-0.1.3.jar
hystrix-core-1.4.7.jar                 
hystrix-metrics-event-stream-1.1.2.jar

And the web browser is chrome 42.0.2311.135 (64-bit)

@killme2008
Copy link
Contributor Author

HdrHistogram version is 2.1.4

@killme2008
Copy link
Contributor Author

@mattrjacobs
Copy link
Contributor

Yeah, I'm trying to track down under what conditions we are hitting that line. Will keep you posted.

@lazycodeninja
Copy link

I've been encountering this error as well. In my case it's happening together with a more serious problem where the dashboard seems to block the whole app from running. I've got several Hystrix apps deployed in a WAS Liberty container, and when I have a dashboard open for more than 2 of the apps, it all becomes non-responsive and the error starts to show up in my logs. Sometimes it recovers after closing all the dashboards, sometimes not.

Interestingly, I can directly access the hystrix.stream URL in my browser for all the apps at once with no issues.

@enrico-scalavino
Copy link

I am encountering the same error:

WARN  [2015-05-18 09:27:41,120] com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsPoller: Failed to output metrics as JSON
! java.lang.ArrayIndexOutOfBoundsException: null
! at org.HdrHistogram.AbstractHistogramIterator.next(AbstractHistogramIterator.java:100) ~[messages.jar:0.0.76]
! at org.HdrHistogram.RecordedValuesIterator.next(RecordedValuesIterator.java:18) ~[messages.jar:0.0.76]
! at org.HdrHistogram.AbstractHistogram.getMean(AbstractHistogram.java:1121) ~[messages.jar:0.0.76]
! at com.netflix.hystrix.util.HystrixRollingPercentile$PercentileSnapshot.getMean(HystrixRollingPercentile.java:327) ~[messages.jar:0.0.76]
! at com.netflix.hystrix.util.HystrixRollingPercentile.getMean(HystrixRollingPercentile.java:155) ~[messages.jar:0.0.76]
! at com.netflix.hystrix.HystrixCommandMetrics.getTotalTimeMean(HystrixCommandMetrics.java:244) ~[messages.jar:0.0.76]
! at com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsPoller$MetricsPoller.getCommandJson(HystrixMetricsPoller.java:237) ~[messages.jar:0.0.76]
! at com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsPoller$MetricsPoller.run(HystrixMetricsPoller.java:154) ~[messages.jar:0.0.76]
! at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_25]
! at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_25]
! at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_25]
! at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_25]
! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_25]
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_25]
! at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]

@mattrjacobs
Copy link
Contributor

I'm actively looking at this issue - I believe that the changes to use HdrHistogram weren't complete. I should have a fix shortly

@mattrjacobs
Copy link
Contributor

The root cause of the problem was that read functionality (like getPercentile/getMean) was accessing the histogram directly and from multiple threads. Histograms are not safe to read concurrently, and exhibit problems like the above when they are.

My fix likely isn't the final word, but I believe it does address the multihreaded access. When snapshots of many window chunks are created, a new histogram is created, and is read-only. I'm now caching the commonly accessed values (mean, p10, p50, p90, etc). Specifically, I now cache at least all values that the metrics pollers are accessing. I then let arbitrary percentile calculations work, but synchronize on this operation.

I will open a new issue to fix this more thoroughly, and discuss design options for doing so.

@killme2008
Copy link
Contributor Author

@manuelbernhardt Thanks for your update.Is there a SNAPSHOT version to test?

@mattrjacobs
Copy link
Contributor

I just released this in 1.4.8. Can you please report if the ArrayOOB is fixed for you?

@killme2008
Copy link
Contributor Author

@mattrjacobs I will try it in our beta environment.Thanks for your mention.

@killme2008
Copy link
Contributor Author

@mattrjacobs We are running 1.4.8 in our production for 3 days, and it looks that the exception has gone! Thanks for your fix.

@mattrjacobs
Copy link
Contributor

Glad to hear it. You should probably upgrade to 1.4.9, as I found a similar issue where some percentile writes were lost (#799).

@deathrowe
Copy link

deathrowe commented Apr 6, 2016

I believe I am seeing this issue again using hystrix-core-1.5.1

2016-04-06 13:42:39.225  WARN 27798 --- [rixMetricPoller] c.n.h.c.m.e.HystrixMetricsPoller         : Failed to output metrics as JSON

rx.exceptions.OnErrorNotImplementedException: null
    at rx.observers.Observers$1.onError(Observers.java:40) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.observers.Subscribers$1.onError(Subscribers.java:58) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:71) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorMerge$MergeSubscriber.reportError(OperatorMerge.java:243) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorMerge$MergeSubscriber.checkTerminate(OperatorMerge.java:779) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorMerge$MergeSubscriber.emitLoop(OperatorMerge.java:540) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorMerge$MergeSubscriber.emit(OperatorMerge.java:529) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorMerge$MergeSubscriber.onError(OperatorMerge.java:253) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorMap$1.onError(OperatorMap.java:48) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.OperatorWindowWithSize$InexactSubscriber.onError(OperatorWindowWithSize.java:229) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.internal.operators.NotificationLite.accept(NotificationLite.java:147) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.SubjectSubscriptionManager$SubjectObserver.accept(SubjectSubscriptionManager.java:318) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.SubjectSubscriptionManager$SubjectObserver.emitLoop(SubjectSubscriptionManager.java:291) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.SubjectSubscriptionManager$SubjectObserver.emitFirst(SubjectSubscriptionManager.java:270) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.BehaviorSubject$1.call(BehaviorSubject.java:106) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.BehaviorSubject$1.call(BehaviorSubject.java:102) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.SubjectSubscriptionManager.add(SubjectSubscriptionManager.java:95) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.SubjectSubscriptionManager.call(SubjectSubscriptionManager.java:60) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.subjects.SubjectSubscriptionManager.call(SubjectSubscriptionManager.java:35) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:162) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:154) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:162) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:154) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:162) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:154) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:162) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable$2.call(Observable.java:154) ~[rxjava-1.1.1.jar:1.1.1]
    at rx.Observable.unsafeSubscribe(Observable.java:8314) ~[rxjava-1.1.1.jar:1.1.1]
    at com.netflix.hystrix.metric.consumer.RollingDistributionStream.startCachingStreamValuesIfUnstarted(RollingDistributionStream.java:153) ~[hystrix-core-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.metric.consumer.RollingDistributionStream.getLatest(RollingDistributionStream.java:166) ~[hystrix-core-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.metric.consumer.RollingDistributionStream.getLatestMean(RollingDistributionStream.java:128) ~[hystrix-core-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.HystrixCommandMetrics.getTotalTimeMean(HystrixCommandMetrics.java:325) ~[hystrix-core-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsPoller$MetricsPoller.getCommandJson(HystrixMetricsPoller.java:341) ~[hystrix-metrics-event-stream-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsPoller$MetricsPoller.run(HystrixMetricsPoller.java:168) ~[hystrix-metrics-event-stream-1.5.1.jar:1.5.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_73]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_73]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_73]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_73]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_73]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_73]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_73]
Caused by: java.lang.ArrayIndexOutOfBoundsException: null
    at org.HdrHistogram.AbstractHistogramIterator.next(AbstractHistogramIterator.java:103) ~[HdrHistogram-2.1.7.jar:2.1.7]
    at org.HdrHistogram.RecordedValuesIterator.next(RecordedValuesIterator.java:18) ~[HdrHistogram-2.1.7.jar:2.1.7]
    at org.HdrHistogram.AbstractHistogram.getMean(AbstractHistogram.java:1140) ~[HdrHistogram-2.1.7.jar:2.1.7]
    at com.netflix.hystrix.metric.CachedValuesHistogram.<init>(CachedValuesHistogram.java:65) ~[hystrix-core-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.metric.CachedValuesHistogram.backedBy(CachedValuesHistogram.java:53) ~[hystrix-core-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.metric.consumer.RollingDistributionStream$4.call(RollingDistributionStream.java:82) ~[hystrix-core-1.5.1.jar:1.5.1]
    at com.netflix.hystrix.metric.consumer.RollingDistributionStream$4.call(RollingDistributionStream.java:79) ~[hystrix-core-1.5.1.jar:1.5.1]

@mattrjacobs
Copy link
Contributor

Thanks for the report, I'm investigating

@mattrjacobs mattrjacobs reopened this Apr 6, 2016
@mattrjacobs
Copy link
Contributor

I made a fix in #1171 and released in 1.5.2. Can you re-test with this?

@deathrowe
Copy link

I think this is good now. I have been running with 1.5.2 today and have not seen an issue. thanks!

@mattrjacobs
Copy link
Contributor

Thanks for the confirmation. Closing, but please re-open if you see otherwise

@stepanv
Copy link

stepanv commented Jan 17, 2017

Just for the record, I created a reproducer for the java.lang.ArrayIndexOutOfBoundsException being thrown from DoubleHistogram.getMean() and reported this problem at HdrHistogram/HdrHistogram#117

@mattrjacobs
Copy link
Contributor

The HdrHistogram is fairly well-documented to be not thread-safe. It was a bug on my part to use it otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants