Riak 1.3.0 Release Notes

New Features or Major Improvements for Riak

Active Anti-Entropy

New in Riak 1.3. Riak now includes an active anti-entropy (AAE) subsystem that works to verify and repair data across an entire Riak cluster. The AAE system periodically exchanges information between data replicas in order to determine missing or divergent data. When bad replicas are detected, AAE triggers read repair to correct the situation. AAE is entirely automatic, and provides an additional layer of protection against various data loss scenarios (e.g. disk failure, restoring from an outdated backup, or bit rot).

AAE is implemented using hash tree exchange, which ensures that the information exchanged between data replicas is proportional to the amount of divergent data rather than the total amount of data stored in Riak. When all data is in sync (the common case), exchanges are fast and have extremely low overhead. For this reason, AAE is able to perform multiple exchanges a minute with negligible impact on a cluster.
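To illustrate why a hash tree exchange is cheap when replicas agree, here is a minimal Python sketch. It assumes a toy two-level tree: a root hash over a fixed number of segment hashes. The segment count, the SHA-256 hashing, and the function names are all hypothetical and do not reflect Riak's actual tree layout. For simplicity, the sketch builds both trees in one process. In Riak the trees are persistent, are updated incrementally, and only hashes cross the wire.

```python
import hashlib

NUM_SEGMENTS = 16  # hypothetical fan-out; not Riak's actual tree shape


def _digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _segment(key: str) -> int:
    # Assign each key to a fixed segment (leaf bucket) of the tree.
    return _digest(key.encode())[0] % NUM_SEGMENTS


def build_tree(replica: dict):
    """Return (root, segment_hashes, segments) for one replica's key/value data."""
    segments = [{} for _ in range(NUM_SEGMENTS)]
    for key, value in replica.items():
        segments[_segment(key)][key] = _digest(value.encode())
    segment_hashes = []
    for seg in segments:
        h = hashlib.sha256()
        for key in sorted(seg):
            h.update(key.encode() + b"\0" + seg[key])
        segment_hashes.append(h.digest())
    root = _digest(b"".join(segment_hashes))
    return root, segment_hashes, segments


def exchange(a: dict, b: dict) -> set:
    """Return the keys whose values differ (or are missing) between replicas a and b."""
    root_a, hashes_a, segs_a = build_tree(a)
    root_b, hashes_b, segs_b = build_tree(b)
    if root_a == root_b:
        return set()  # in sync: comparing the roots is all the work needed
    diverged = set()
    for i in range(NUM_SEGMENTS):
        if hashes_a[i] == hashes_b[i]:
            continue  # identical segment: skip its keys entirely
        for key in segs_a[i].keys() | segs_b[i].keys():
            if segs_a[i].get(key) != segs_b[i].get(key):
                diverged.add(key)
    return diverged
```

After the root comparison, only mismatched segments are inspected. The comparison work therefore grows with the amount of divergence rather than with the total number of keys. In the common in-sync case, a single hash comparison is enough.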

AAE hash trees are persistent entities stored in LevelDB instances separate from normal Riak K/V data. When first starting a fresh Riak 1.3 cluster (or upgrading from an older release), Riak will generate the hash tree information by traversing each partition's data. By default, Riak will build one hash tree per hour per node. If the traversal of a partition's data takes more than an hour, Riak may trigger a second tree build; however, by default at most two tree builds can occur at once.

Once a hash tree is built, it is kept up-to-date in real time as writes are sent to Riak. However, trees are periodically expired and rebuilt to protect against potential divergence between the K/V data and its corresponding hash tree. Rebuilding trees also protects against silent data corruption (e.g. bit rot). By default, trees are expired and rebuilt once a week.

All of the above settings (and more) can be configured in app.config. The AAE settings are in the riak_kv section, and have comments documenting the different options.

To provide insight into AAE, Riak provides the riak-admin aae-status command. The AAE status output is broken into three parts: Exchanges, Entropy Trees, and Keys Repaired.

================================== Exchanges ==================================
Index                                              Last (ago)    All (ago)
-------------------------------------------------------------------------------
0                                                  3.8 min       4.1 min
91343852333181432387730302044767688728495783936    3.3 min       7.8 min
182687704666362864775460604089535377456991567872   2.8 min       8.3 min
274031556999544297163190906134303066185487351808   2.3 min       6.3 min
365375409332725729550921208179070754913983135744   1.8 min       5.5 min
<snip>

The Exchanges section shows information about AAE exchanges for each K/V partition. The Last column lists when the most recent exchange between a partition and one of its sibling replicas was performed. The All column shows how long it has been since a partition exchanged with all of its sibling replicas. In essence, the All column sets the upper bound on how out-of-date an individual partition can be. Specifically, a partition cannot have any missing or divergent data older than the value shown in All, unless all replicas for that data are invalid.
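Because the All column bounds staleness, an operator might want to find the partition with the largest All value. The Python sketch below parses text shaped like the Exchanges output shown above. The function name is hypothetical, and so is the regular expression, which is tailored only to that sample. Only the "min" unit appears in the sample; the other unit suffixes in the table are assumptions. Rows with unrecognized units are skipped rather than guessed.

```python
import re

# Seconds per unit suffix. Only "min" appears in the sample output;
# "sec" and "hr" are assumed suffixes and may not match real output.
UNIT_SECONDS = {"sec": 1, "min": 60, "hr": 3600}

# Index, Last (value, unit), All (value, unit)
ROW = re.compile(r"^(\d+)\s+([\d.]+) (\w+)\s+([\d.]+) (\w+)\s*$")


def stalest_partition(exchanges_text):
    """Return (index, seconds) for the row with the largest All (ago) value, or None."""
    worst = None
    for line in exchanges_text.splitlines():
        m = ROW.match(line.strip())
        if not m or m.group(5) not in UNIT_SECONDS:
            continue  # header, separator, <snip>, or an unrecognized unit
        index = int(m.group(1))
        all_ago = float(m.group(4)) * UNIT_SECONDS[m.group(5)]
        if worst is None or all_ago > worst[1]:
            worst = (index, all_ago)
    return worst
```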

================================ Entropy Trees ================================
Index                                              Built (ago)
-------------------------------------------------------------------------------
0                                                  22.1 min
91343852333181432387730302044767688728495783936    22.6 min
182687704666362864775460604089535377456991567872   22.3 min
274031556999544297163190906134303066185487351808   22.9 min
365375409332725729550921208179070754913983135744   22.3 min
<snip>

The Entropy Trees section shows when the hash trees for a given partition were created. A hash tree must be built before a partition can participate in an exchange. As mentioned above, trees are built once and then (by default) expired and rebuilt once a week.

================================ Keys Repaired ================================
Index                                                Last      Mean      Max
-------------------------------------------------------------------------------
0                                                     0         0         0
91343852333181432387730302044767688728495783936       87        21        87
182687704666362864775460604089535377456991567872      0         0         0
274031556999544297163190906134303066185487351808      0         0         0
365375409332725729550921208179070754913983135744      0         0         0
<snip>

The Keys Repaired section presents information about repairs triggered by AAE, including keys repaired in the most recent exchange, and the mean and max across all exchanges.

Note: All AAE status information is held in memory and is reset when a node restarts. Only tree build times are persistent (since trees themselves are persistent).

Final notes about AAE:

  1. Trees must be built before exchange can occur. Since trees are built once an hour by default, it will take up to ring_size / number_of_nodes hours before all trees are built after first starting or upgrading to 1.3, and therefore that amount of time until AAE is fully protecting all data.

  2. Tree building typically uses 100% of a CPU when possible but should have minimal impact on Riak performance. When using Bitcask for K/V data, tree building may increase the latency for list_keys, list_buckets, and Riak EE's fullsync replication strategy. Once trees are built, these issues go away (until trees are expired/rebuilt a week later).

  3. AAE may occasionally repair a small number of keys (typically 1 or 2) even in a healthy cluster without divergent or missing data. This occurs when AAE is performing an exchange at the same time incoming writes are occurring to the same nodes. For example, a write may reach node A while being concurrently in-flight to node B, yet AAE happens to run at just the right moment to see the write on A but not B, and force a repair. Since AAE just triggers reads (to trigger read repair) this behavior is entirely safe.

  4. AAE is a feature of Riak K/V and does not protect Riak Search data.
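Note 1 above gives a rough upper bound on the time needed to build all trees. The Python sketch below works through that arithmetic. It assumes partitions are spread evenly across nodes, so some nodes own the rounded-up share. It also assumes each build fits within its hour. The builds_per_hour parameter is a hypothetical knob that mirrors the default of one tree build per hour per node; it is not a Riak setting name.

```python
import math


def hours_until_all_trees_built(ring_size, num_nodes, builds_per_hour=1):
    """Rough upper bound on hours before every partition on every node has a tree."""
    # The busiest node owns ceil(ring_size / num_nodes) partitions.
    partitions_per_node = math.ceil(ring_size / num_nodes)
    return math.ceil(partitions_per_node / builds_per_hour)
```

For example, a 64-partition ring on 5 nodes gives ceil(64 / 5) = 13 hours before AAE fully covers all data.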

MapReduce Sink Backpressure

Riak Pipe brought inter-stage backpressure to Riak KV's MapReduce system. However, prior to Riak 1.3, that backpressure did not extend to the sink. It was assumed that the Protocol Buffers or HTTP endpoint could handle the full output rate of the pipe. With Riak 1.3, backpressure has been extended to the sink so that those endpoint processes no longer become overwhelmed. This backpressure is tunable via a soft cap on the size of the sink's buffer, and a period at which a worker should check that cap. These can be configured at the Riak console by setting application environment variables, or in the riak_kv section of app.config (defaults shown):

{riak_kv,
 ...
 %% Soft cap on the MapReduce sink's buffer,
 %% expressed as a positive integer number of messages
 %% (one message is used per MapReduce result)
 {mrc_sink_buffer, 1000},

 %% Period at which a MapReduce worker must check
 %% the sink's buffer cap, expressed as an integer
 %% number of messages to send before waiting on
 %% a clear-to-send acknowledgement
 %%   0 = wait for acknowledgement of each message
 %%   1 = wait every other message
 %%   'infinity' = never wait for acknowledgements
 {mrc_sink_sync_period, 10}
}.
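The Python sketch below illustrates how the two settings interact. A worker sends sync_period results without waiting, and the next result then waits for a clear-to-send acknowledgement. The sink grants that acknowledgement only once its buffer is below the soft cap. This is a single-threaded model under stated assumptions: the Sink and Worker classes and the simulate function are hypothetical, and "draining" the buffer inside ack() stands in for the PB/HTTP endpoint catching up. It is not Riak's implementation.

```python
import math
from collections import deque

INFINITY = math.inf  # stand-in for the 'infinity' sync period


class Sink:
    """Hypothetical sink whose buffer has a soft cap (cf. mrc_sink_buffer)."""

    def __init__(self, buffer_cap):
        self.buffer_cap = buffer_cap
        self.buffer = deque()
        self.peak = 0

    def deliver(self, result):
        # The cap is soft: results are always accepted, even past the cap.
        self.buffer.append(result)
        self.peak = max(self.peak, len(self.buffer))

    def ack(self):
        # Clear-to-send is granted only once the buffer is below the cap.
        # Single-threaded stand-in: the endpoint drains the buffer here.
        if len(self.buffer) >= self.buffer_cap:
            self.buffer.clear()


class Worker:
    """Hypothetical worker that waits for an ack once per sync period (cf. mrc_sink_sync_period)."""

    def __init__(self, sink, sync_period):
        self.sink = sink
        self.sync_period = sync_period  # 0, a positive int, or INFINITY
        self.unacked = 0
        self.waits = 0

    def send(self, result):
        self.sink.deliver(result)
        if self.unacked >= self.sync_period:
            self.sink.ack()  # wait for clear-to-send
            self.waits += 1
            self.unacked = 0
        else:
            self.unacked += 1


def simulate(n_results, sync_period, buffer_cap=1000):
    """Return (acknowledgement waits, peak buffer size) for n_results results."""
    worker = Worker(Sink(buffer_cap), sync_period)
    for i in range(n_results):
        worker.send(i)
    return worker.waits, worker.sink.peak
```

With a period of 0, every result waits. With a period of 1, every other result waits. With INFINITY, no result ever waits. Because the worker checks the cap only once per period, the buffer can overshoot the cap by up to sync_period results. This is why the cap is described as soft.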

Additional IPv6 support

Riak Handoff and Protocol Buffers interfaces can now listen on IPv6 addresses (HTTP has always supported IPv6). You may specify the address either in short-hand string form, e.g. "::1" (for localhost), or as the 16-byte address in a tuple of 8 numbers, e.g. {0,0,0,0,0,0,0,1} (for localhost). IPv4 addresses may also be specified in either form; in tuple form, the 4-byte IPv4 address is a tuple of 4 numbers, e.g. {127,0,0,1}. Note: this does not affect Riak node names. Refer to the inet_dist_* settings in the Erlang documentation to enable IPv6 support for cluster membership.
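To see how the string and tuple forms correspond, the Python sketch below converts an address string into the tuple form. An IPv6 address becomes 8 unsigned 16-bit numbers, and an IPv4 address becomes 4 bytes. It then renders the result with Erlang tuple syntax. The helper names are hypothetical; the conversion itself uses only Python's standard ipaddress and struct modules.

```python
import ipaddress
import struct


def to_tuple_form(address):
    """Convert an IP address string to its tuple-of-numbers form."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # 16 bytes -> 8 unsigned 16-bit numbers in network byte order
        return struct.unpack("!8H", ip.packed)
    # 4 bytes -> 4 unsigned 8-bit numbers
    return tuple(ip.packed)


def as_erlang_term(address):
    """Render the tuple form with Erlang tuple syntax, e.g. {0,0,0,0,0,0,0,1}."""
    return "{" + ",".join(str(n) for n in to_tuple_form(address)) + "}"
```

For example, as_erlang_term("::1") yields {0,0,0,0,0,0,0,1}, and as_erlang_term("127.0.0.1") yields {127,0,0,1}.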

Luke Removal

The luke application was deprecated in Riak 1.2. This release removes it, along with all code that used it.

riak getpid Added

A bug in how riak stop worked (listed below in Bugs Fixed) justified refactoring how Riak determines its own PID. While fixing the bug, we realized getpid might be useful to system administrators who don't want to rely on outside scripts to find the PID of Riak. riak getpid does what you would expect: it returns the PID of a running Riak node, or exits with status 1 on failure. It is a small feature, but it might save some time with ps, grep, and awk.
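As an example of replacing a ps/grep/awk pipeline, the Python sketch below wraps riak getpid for use in a monitoring script. It assumes the command prints only the PID on stdout. The notes say only that the command returns the PID and exits with 1 on failure. The function name riak_pid is hypothetical, and the command is a parameter so the sketch can be exercised without a running node.

```python
import subprocess


def riak_pid(command=("riak", "getpid")):
    """Return the PID reported by `riak getpid`, or None if it fails."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        return None  # the riak script is not on PATH
    if result.returncode != 0:
        return None  # riak getpid exits with 1 when no running Riak is found
    # Assumption: the PID is printed on its own on stdout.
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None
```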

Riaknostic Included by Default

To encourage its use, Riaknostic is now included in the Riak packages. Prior to 1.3, users needed to download Riaknostic separately; now riak-admin diag works out of the box.

Support added for SmartOS 1.8

Packages are now available for SmartOS machines based on 1.8 datasets as well as 1.6.

Health Check

New in Riak 1.3. Riak Core now includes a health check subsystem that actively monitors each node for specific conditions and disables/enables services based on those conditions.

To enable or disable all health checks, a new setting has been added to the riak_core section of app.config:

%% Health Checks
%% If disabled, health checks registered by an application will
%% be ignored. NOTE: this option cannot be changed at runtime.
%% To re-enable, the setting must be changed and the node restarted.
{enable_health_checks, true},

Riak registers a health check with Riak Core to monitor the message queue lengths of KV vnodes. To configure the KV health check, a new setting has been added to the riak_kv section of app.config:

%% This option configures the riak_kv health check that monitors
%% message queue lengths of riak_kv vnodes. The value is a 2-tuple,
%% {EnableThreshold, DisableThreshold}. If a riak_kv_vnode's message
%% queue length reaches DisableThreshold the riak_kv service is disabled
%% on this node. The service will not be re-enabled until the message queue
%% length drops below EnableThreshold.
{vnode_mailbox_limit, {1, 5000}}

Note: the kv health check does not apply to Riak Search or Riak Pipe vnodes.
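The {EnableThreshold, DisableThreshold} pair behaves as hysteresis: the service turns off at one threshold and turns back on only at a lower one. The Python sketch below models this with the defaults {1, 5000}. It assumes the check looks at the longest mailbox among the node's riak_kv vnodes. The class and method names are hypothetical, and this is not Riak Core's implementation.

```python
class MailboxHealthCheck:
    """Hysteresis sketch of vnode_mailbox_limit = {EnableThreshold, DisableThreshold}."""

    def __init__(self, enable_threshold=1, disable_threshold=5000):
        self.enable_threshold = enable_threshold
        self.disable_threshold = disable_threshold
        self.enabled = True

    def check(self, queue_lengths):
        """Update and return the service state given per-vnode mailbox lengths."""
        longest = max(queue_lengths, default=0)
        if self.enabled and longest >= self.disable_threshold:
            self.enabled = False  # a mailbox reached DisableThreshold
        elif not self.enabled and longest < self.enable_threshold:
            self.enabled = True  # mailboxes drained below EnableThreshold
        return self.enabled
```

With the default EnableThreshold of 1, a disabled service in this model is re-enabled only once every mailbox is completely empty. A queue length of 4000, or even 1, keeps it disabled.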

Reset Bucket Properties

The HTTP interface now supports resetting bucket properties to their default values. Bucket properties are stored in Riak's ring structure that is gossiped around the cluster. Resetting bucket properties for buckets that are no longer used or that are using the default properties can reduce the amount of gossiped data.
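The Python sketch below builds a reset request for a bucket's properties. It assumes two things this section does not state. First, it assumes the reset is an HTTP DELETE on the /buckets/<bucket>/props resource; check this against the Riak HTTP API documentation for your version. Second, it assumes the node listens on the default HTTP port 8098. The function name is hypothetical. The sketch only constructs the request, so it can be inspected without a running node.

```python
import urllib.parse
import urllib.request


def reset_bucket_props_request(bucket, host="127.0.0.1", port=8098):
    """Build (but do not send) a DELETE request for a bucket's properties."""
    # Assumed resource path; bucket names are percent-encoded.
    path = "/buckets/" + urllib.parse.quote(bucket, safe="") + "/props"
    return urllib.request.Request(f"http://{host}:{port}{path}", method="DELETE")


# Sending it against a running node (not executed here):
# with urllib.request.urlopen(reset_bucket_props_request("users")) as resp:
#     print(resp.status)
```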

Support for logging to syslog

Riak 1.3 now includes support for logging to syslog. To enable it, add an entry like the following to the 'handlers' list in the lager section of Riak's app.config:

{lager_syslog_backend, ["riak", daemon, info]}

This logs any messages at level info or above to the daemon facility, with the identity set to 'riak'. For more information, see the lager_syslog documentation:

https://github.com/basho/lager_syslog

Installation Notes

For RHEL/CentOS/Fedora users, the Riak RPM now depends on expect, so you may see a message like this:

$ sudo rpm -i riak-1.3.0rc1-1.el5.x86_64.rpm
error: Failed dependencies:
    /usr/bin/expect is needed by riak-1.3.0rc1-1.x86_64

You can fix this issue by installing the Riak RPM with yum, which will resolve any dependencies automatically:

$ sudo yum -y install riak-1.3.0rc1-1.el5.x86_64.rpm
Preparing...                ########################################### [100%]
   1:expect                 ########################################### [100%]
   2:riak                   ########################################### [100%]

Issues / PRs Resolved