The Hystrix Dashboard enables realtime monitoring of Hystrix metrics.
Use of this dashboard improved Netflix operations by reducing discovery and recovery times during operational events. Thanks to real-time insight into system behavior, most production incidents (already less frequent due to Hystrix) became far shorter and had less impact.
When a circuit is failing, it changes color along a gradient from green through yellow and orange to red.
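One way to picture this gradient is as interpolation between a few color stops keyed on a health metric. The sketch below assumes that metric is the rolling error percentage and uses hypothetical breakpoints (0, 25, 40 and 50 percent) and RGB values; the dashboard's actual thresholds and colors may differ.

```javascript
// Hypothetical color stops keyed on error percentage. The breakpoints and
// RGB values are illustrative assumptions, not the dashboard's own settings.
const STOPS = [
  { pct: 0, rgb: [0, 128, 0] },     // green
  { pct: 25, rgb: [255, 204, 0] },  // yellow
  { pct: 40, rgb: [255, 153, 0] },  // orange
  { pct: 50, rgb: [255, 0, 0] },    // red
];

function rgbString(rgb) {
  return `rgb(${rgb[0]}, ${rgb[1]}, ${rgb[2]})`;
}

// Linearly interpolate between the two stops that bracket errorPct,
// clamping to green below the first stop and to red above the last one.
function healthColor(errorPct) {
  if (errorPct <= STOPS[0].pct) return rgbString(STOPS[0].rgb);
  for (let i = 1; i < STOPS.length; i++) {
    const lo = STOPS[i - 1];
    const hi = STOPS[i];
    if (errorPct <= hi.pct) {
      const t = (errorPct - lo.pct) / (hi.pct - lo.pct);
      return rgbString(lo.rgb.map((c, k) => Math.round(c + t * (hi.rgb[k] - c))));
    }
  }
  return rgbString(STOPS[STOPS.length - 1].rgb);
}

console.log(healthColor(0));  // rgb(0, 128, 0)
console.log(healthColor(45)); // rgb(255, 77, 0), halfway from orange to red
console.log(healthColor(80)); // rgb(255, 0, 0)
```

Interpolating rather than switching between fixed colors lets a circuit drift visibly toward red as errors climb, instead of jumping only when a threshold is crossed.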
The diagram below shows one "circuit" from the dashboard along with explanations of what all of the data represents.
We've purposefully tried to pack a lot of information into the dashboard so that engineers can quickly consume and correlate data.
The dashboard can monitor a single server, or a cluster of servers aggregated using Turbine, with low latency (typically around 1 to 2 seconds when aggregating a cluster and under a second for a single server).
Here is another example from the Netflix API dashboard monitoring 476 servers aggregated using Turbine:
- Download hystrix-dashboard-1.1.2.war
- Install in servlet container such as Apache Tomcat 7
The usage examples below assume installation to /webapps/hystrix-dashboard.war.
Alternatively, build the WAR from source and copy it into the servlet container:
./gradlew build
cp hystrix-dashboard/build/libs/hystrix-dashboard-*.war ./apache-tomcat-7.*/webapps/hystrix-dashboard.war  # or another servlet container's webapps directory
The hystrix-metrics-event-stream module exposes metrics in a text/event-stream formatted stream that continues as long as a client holds the connection.
The Hystrix Dashboard expects data in the format that this module emits.
See its README for installation instructions.
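In a browser, EventSource parses this format itself. For other clients, the sketch below shows the minimal parsing involved, assuming the stream carries only comment lines (such as a ": ping" keep-alive) and single-line "data:" fields holding JSON; other SSE fields such as event, id and retry are ignored. The function name is hypothetical.

```javascript
// Minimal text/event-stream parser for a complete chunk of text.
// Handles comment lines (e.g. ": ping") and "data:" fields only;
// other SSE fields (event, id, retry) are ignored in this sketch.
function parseEventStream(text) {
  const events = [];
  let dataLines = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // A blank line ends the current event; each event here holds JSON.
      if (dataLines.length > 0) {
        events.push(JSON.parse(dataLines.join("\n")));
        dataLines = [];
      }
    } else if (line.startsWith(":")) {
      continue; // comment / keep-alive line, carries no data
    } else if (line.startsWith("data:")) {
      // Drop the field name and a single optional leading space.
      let value = line.slice("data:".length);
      if (value.startsWith(" ")) value = value.slice(1);
      dataLines.push(value);
    }
  }
  // Data after the last blank line is an unfinished event and is not returned.
  return events;
}

const events = parseEventStream(
  ': ping\n\ndata: {"type":"HystrixCommand","name":"IdentityCookieAuthSwitchProfile"}\n\n'
);
console.log(events.length, events[0].name); // 1 IdentityCookieAuthSwitchProfile
```

A long-lived client would feed this incrementally, keeping any partial trailing text until the next chunk arrives.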
- Download turbine-web-1.0.0.war
- Install in servlet container such as Apache Tomcat 7
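The usage examples below assume installation to /webapps/turbine.war.
Alternatively, build the WAR from source and copy it into the servlet container:
git clone https://github.com/Netflix/Turbine.git
(cd Turbine && ./gradlew build)
cp Turbine/turbine-web/build/libs/turbine-web-*.war ./apache-tomcat-7.*/webapps/turbine.war  # or another servlet container's webapps directory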
Turbine configuration details can be found on its Configuration Wiki. It also supports custom plugins for Instance Discovery.
To get started with a "Hello World!" example, you can use a static configuration file that points to specific instances.
Create a file named config.properties that lists the hosts to aggregate, for example two EC2 instances.
The 'turbine.instanceUrlSuffix' property is appended to each hostname to form the URL of that instance's hystrix-metrics-event-stream.
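Put concretely, each aggregated instance's stream URL is its hostname followed by the suffix. The helper below is a hypothetical illustration: the host names and the ":8080/hystrix.stream" suffix are made up, and prepending "http://" is an assumption of this sketch rather than documented Turbine behavior.

```javascript
// Hypothetical helper: turn a comma-separated host list into per-instance
// stream URLs by appending the suffix to each hostname.
// Prepending "http://" is an assumption of this sketch.
function instanceStreamUrls(hostList, instanceUrlSuffix) {
  return hostList
    .split(",")
    .map((host) => host.trim())          // tolerate spaces after commas
    .filter((host) => host.length > 0)   // skip empty entries
    .map((host) => "http://" + host + instanceUrlSuffix);
}

// Hypothetical hosts and suffix, for illustration only.
const urls = instanceStreamUrls("host-a.example.com, host-b.example.com", ":8080/hystrix.stream");
urls.forEach((u) => console.log(u));
// http://host-a.example.com:8080/hystrix.stream
// http://host-b.example.com:8080/hystrix.stream
```

Because the suffix usually carries the port and context path, every instance must expose its stream at the same port and path for a single suffix to work.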
The config.properties file can be either:
- placed on the classpath, such as in /WEB-INF/classes, or
- specified using a JVM property
You can test that Turbine is correctly accessing instances and streaming metrics like this:
curl http://hostname:port/turbine/turbine.stream
If successful you should see something like this:
$ curl http://ec2-23-20-84-255.compute-1.amazonaws.com:8080/turbine/turbine.stream
: ping
data: {"rollingCountFailure":0,"propertyValue_executionIsolationThreadInterruptOnTimeout":true,"rollingCountTimeout":0,"rollingCountExceptionsThrown":0,"rollingCountFallbackSuccess":0,"errorCount":0,"type":"HystrixCommand","propertyValue_circuitBreakerEnabled":true,"reportingHosts":1,"latencyTotal":{"0":0,"95":0,"99.5":0,"90":0,"25":0,"99":0,"75":0,"100":0,"50":0},"currentConcurrentExecutionCount":0,"rollingCountSemaphoreRejected":0,"rollingCountFallbackRejection":0,"rollingCountShortCircuited":0,"rollingCountResponsesFromCache":0,"propertyValue_circuitBreakerForceClosed":false,"name":"IdentityCookieAuthSwitchProfile","propertyValue_executionIsolationThreadPoolKeyOverride":"null","rollingCountSuccess":0,"propertyValue_requestLogEnabled":true,"requestCount":0,"rollingCountCollapsedRequests":0,"errorPercentage":0,"propertyValue_circuitBreakerSleepWindowInMilliseconds":5000,"latencyTotal_mean":0,"propertyValue_circuitBreakerForceOpen":false,"propertyValue_circuitBreakerRequestVolumeThreshold":20,"propertyValue_circuitBreakerErrorThresholdPercentage":50,"propertyValue_executionIsolationStrategy":"THREAD","rollingCountFallbackFailure":0,"isCircuitBreakerOpen":false,"propertyValue_executionIsolationSemaphoreMaxConcurrentRequests":20,"propertyValue_executionIsolationThreadTimeoutInMilliseconds":1000,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests":10,"latencyExecute":{"0":0,"95":0,"99.5":0,"90":0,"25":0,"99":0,"75":0,"100":0,"50":0},"group":"IDENTITY","latencyExecute_mean":0,"propertyValue_requestCacheEnabled":true,"rollingCountThreadPoolRejected":0}
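Each "data:" line is one JSON object per command. As a sketch of reading it, the hypothetical function below pulls out a few fields that appear in the sample output and compares them with the two circuit-breaker thresholds the event also reports. The "wouldTrip" flag is a simplified reading of the documented rule (enough requests in the rolling window and an error percentage at or above the threshold), not Hystrix's own implementation; "isCircuitBreakerOpen" remains the authoritative state.

```javascript
// Summarize one parsed "data:" payload of type HystrixCommand.
// Field names match the sample event; the trip check is a simplification.
function summarizeCommand(event) {
  const enoughVolume =
    event.requestCount >= event.propertyValue_circuitBreakerRequestVolumeThreshold;
  const tooManyErrors =
    event.errorPercentage >= event.propertyValue_circuitBreakerErrorThresholdPercentage;
  return {
    id: event.group + "." + event.name,
    requests: event.requestCount,
    errorPercentage: event.errorPercentage,
    circuitOpen: event.isCircuitBreakerOpen, // state reported by Hystrix itself
    wouldTrip: enoughVolume && tooManyErrors, // simplified threshold check
  };
}

// A trimmed copy of the sample event above (other fields omitted).
const sample = JSON.parse(
  '{"type":"HystrixCommand","name":"IdentityCookieAuthSwitchProfile","group":"IDENTITY",' +
  '"requestCount":0,"errorPercentage":0,"isCircuitBreakerOpen":false,' +
  '"propertyValue_circuitBreakerRequestVolumeThreshold":20,' +
  '"propertyValue_circuitBreakerErrorThresholdPercentage":50}'
);
console.log(summarizeCommand(sample));
```

With the sample's zero traffic, neither threshold is met, which matches the reported "isCircuitBreakerOpen":false.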
When you access the Hystrix Dashboard homepage you should see something like this:
To monitor a single server you would use a URL such as:
To monitor an aggregate stream via Turbine it would be like:
The landing page does nothing more than generate the /monitor/monitor.html URLs that can then be bookmarked.
The 'delay' parameter controls the latency that is injected between polling cycles on the server to slow down the stream. This can be used to reduce the network and CPU usage on the client.
The 'title' parameter is used by the monitor.html page to display a nice title instead of the raw URL.
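Because the landing page only assembles URLs, a page that embeds the dashboard could build them directly. The helper below is a hypothetical sketch: the dashboard base URL is a placeholder, and it assumes the stream URL is passed in a query parameter named "stream"; "delay" and "title" are the parameters described above. URLSearchParams takes care of percent-encoding the stream URL.

```javascript
// Hypothetical helper that builds a bookmarkable monitor URL.
// Assumption: the stream URL goes in a "stream" query parameter.
function monitorUrl(dashboardBase, streamUrl, options = {}) {
  // dashboardBase should end with "/" so the relative path is appended to it.
  const url = new URL("monitor/monitor.html", dashboardBase);
  url.searchParams.set("stream", streamUrl); // percent-encoded automatically
  if (options.delay !== undefined) url.searchParams.set("delay", String(options.delay));
  if (options.title !== undefined) url.searchParams.set("title", options.title);
  return url.toString();
}

// Placeholder hosts and example parameter values, for illustration only.
console.log(
  monitorUrl("http://localhost:8080/hystrix-dashboard/", "http://hostname:port/turbine/turbine.stream", {
    delay: 100,
    title: "Example Cluster",
  })
);
```

Encoding the stream URL matters because it contains its own "://", "/" and possibly "?" characters, which would otherwise be read as part of the monitor URL.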
We expect that many will want to embed the dashboard functionality into their own existing dashboards.
To accommodate this, we have kept the app very simple: it is primarily just HTML, JavaScript and CSS in modules that can be dropped into any app.
The only server-side portion is a proxy servlet that relays streams between the browser and the backend, because EventSource CORS support is still a work in progress.
Displaying HystrixCommand monitors on an existing page is as simple as importing the JavaScript module, instantiating a monitor with the id of the DIV to render into, and connecting it to an EventSource:
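// Assumes the dashboard's HystrixCommandMonitor module is loaded and the page contains <div id="dependencies">
window.addEventListener('load', function () {
    // render command monitors into the DIV with id 'dependencies', without the detail icon
    var hystrixMonitor = new HystrixCommandMonitor('dependencies', {includeDetailIcon:false});
    // start the EventSource which will open a streaming connection to the server
    var source = new EventSource("http://hostname:port/hystrix.stream");
    // add the listener that will process incoming events
    source.addEventListener('message', hystrixMonitor.eventSourceMessageListener, false);
});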
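The listener passed to addEventListener is an ordinary function that receives a MessageEvent whose "data" is the JSON text. A page that wants its own handling per event type (the sample stream output includes a "type" field such as "HystrixCommand") could wrap handlers in a small router like the hypothetical one below; the router, the handler functions and the "HystrixThreadPool" type name are assumptions of this sketch.

```javascript
// Hypothetical router: parses each message once and hands it to the handler
// registered for its "type" field; events with no matching handler are ignored.
function makeTypeRouter(handlers) {
  return function (e) {
    const data = JSON.parse(e.data);
    const handler = handlers[data.type];
    if (typeof handler === "function") {
      handler(data, e); // pass the raw event too, for handlers that expect it
    }
  };
}

// In a browser this would be registered like the listener above, e.g.
// source.addEventListener('message', router, false);
// Here a plain object stands in for a MessageEvent so the sketch runs anywhere.
const seen = [];
const router = makeTypeRouter({
  HystrixCommand: (data) => seen.push("command:" + data.name),
  HystrixThreadPool: (data) => seen.push("pool:" + data.name), // assumed type name
});
router({ data: '{"type":"HystrixCommand","name":"IdentityCookieAuthSwitchProfile"}' });
router({ data: '{"type":"SomethingElse","name":"ignored"}' });
console.log(seen.length); // 1
```

Parsing once in the router avoids each handler re-parsing the same JSON when several views share one stream.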
If you have UI improvements that would benefit everyone, please contribute them back by creating a pull request. Feel free to ask questions and file bugs as well.
A Netflix Original Production