Traun Leyden edited this page May 18, 2015 · 1 revision

Non-default settings

On Linux we set swappiness to 0 and disable transparent huge pages (THP). ext4 with no tuning is the most common file system.
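On most Linux distributions these two settings can be applied at runtime roughly as sketched below. This is an illustration, not necessarily the exact commands used for these tests: the helper names `apply_linux_settings` and `thp_mode` are hypothetical, both writes require root, and some kernels expose THP under a different sysfs path (for example `redhat_transparent_hugepage` on older RHEL kernels).

```shell
# Apply the two Linux settings at runtime (requires root, not persistent across reboots).
# apply_linux_settings is a hypothetical helper name; it is defined here but not invoked.
apply_linux_settings() {
    sysctl -w vm.swappiness=0
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
}

# Extract the active THP mode from the sysfs value, where the selected
# option is shown in brackets, e.g. "always madvise [never]".
thp_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Usage: thp_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```

After applying the settings, `thp_mode` should report `never` and `sysctl vm.swappiness` should report 0.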

We make no changes to Windows settings apart from disabling scheduled disk defragmentation (also called "Disk optimization").

We also enable parallel bucket and view compaction. The fragmentation threshold differs from the default and depends on the test case and workload.

Hacks

Changing internal settings:

    curl -XPOST -u Administrator:password http://host:port/internalSettings -d 'maxBucketCount=30'

Passing an alternative scheduler wakeup threshold ("swt") to the Erlang VM:

    ERL_AFLAGS="+swt low" /etc/init.d/couchbase-server restart

Changing number of vbuckets per bucket:

    COUCHBASE_NUM_VBUCKETS=64 /etc/init.d/couchbase-server restart

Changing memcached settings in 3.0:

    curl -XPOST -u Administrator:password -d 'ns_config:set({node, node(), {memcached, extra_args}}, ["-t12", "-c5000"]).' http://127.0.0.1:8091/diag/eval

Changing number of shards:

    curl -XPOST -u Administrator:password -d 'ns_bucket:update_bucket_props("default", [{extra_config_string, "max_num_shards=1"}]).' http://127.0.0.1:8091/diag/eval

Monitoring Erlang run queues:

    wget -O- -q --user=Administrator --password=password --post-data 'erlang:statistics(run_queues).' http://127.0.0.1:8091/diag/eval

Netem

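    # Remove any existing root qdisc, then install an HTB root with two 1 Gbit classes.
    tc qdisc del dev em1 root
    tc qdisc add dev em1 handle 1: root htb
    tc class add dev em1 parent 1: classid 1:1 htb rate 1gbit
    tc class add dev em1 parent 1:1 classid 1:11 htb rate 1gbit
    # Attach netem to class 1:11: 100ms delay with 5ms jitter, 0.01% loss (50% correlation),
    # 0.005% duplication and 0.005% corruption.
    tc qdisc add dev em1 parent 1:11 handle 10: netem delay 100ms 5ms loss 0.01% 50% duplicate 0.005% corrupt 0.005%
    # Send only IP traffic destined for 172.23.100.19 through the impaired class.
    tc filter add dev em1 protocol ip prio 1 u32 match ip dst 172.23.100.19 flowid 1:11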

Bucket compaction

  1. Disable auto-compaction.

  2. Create initial dataset, wait for persistence and TAP replication.

  3. Mutate (update) all items in dataset. Wait for persistence and TAP replication.

  4. Trigger bucket compaction, report total compaction throughput (MBytes/sec) measured as:

     (data_disk_size_before_compaction - data_disk_size_after_compaction) / total_compaction_time
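As a worked example of the formula above, here is a minimal sketch. It assumes sizes are given in bytes, time in seconds, and that one MByte is 1024 * 1024 bytes; the function name `compaction_throughput` and the sample numbers are hypothetical.

```shell
# Compaction throughput in MBytes/sec.
# Arguments: disk size before (bytes), disk size after (bytes), compaction time (seconds).
compaction_throughput() {
    awk -v before="$1" -v after="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", (before - after) / (1024 * 1024) / secs }'
}

# Hypothetical numbers: 20 GiB shrinks to 10 GiB in 400 seconds.
compaction_throughput 21474836480 10737418240 400   # prints 25.60
```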
    

Index compaction

  1. Disable auto-compaction, disable automatic index updates.

  2. Create initial dataset, wait for persistence and TAP replication.

  3. Define design documents.

  4. Trigger index build, wait for indexing to finish.

  5. Mutate all items in dataset. Wait for persistence and TAP replication.

  6. Trigger index build, wait for indexing to finish.

  7. Trigger index compaction, report total compaction throughput (MBytes/sec) measured as:

     (views_disk_actual_size_before_compaction - views_disk_actual_size_after_compaction) / total_compaction_time
    

Initial indexing

  1. Disable auto-compaction.
  2. Create initial dataset, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Define design documents.
  5. Trigger index build, report total indexing time in minutes.

Incremental indexing

  1. Disable auto-compaction, disable automatic index updates.
  2. Create initial dataset, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Define design documents.
  5. Trigger index build, wait for indexing to finish.
  6. Double dataset. Wait for persistence and TAP replication.
  7. Compact bucket.
  8. Trigger index build, report total indexing time in minutes.

BgFetcher disk reads/sec

  1. Disable auto-compaction.
  2. Create initial DGM dataset, wait for persistence and TAP replication.
  3. Create working set via update or read operations, wait for persistence and TAP replication.
  4. Compact bucket.
  5. Start read-heavy (70/30) front-end workload with high cache miss ratio (~40%).
  6. Run workload for predefined time (e.g., 1 hour), report average ep_bg_fetched per node.

Flusher drain rate

  1. Create initial DGM dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Start write-heavy (90/10) front-end workload.
  5. Run workload for predefined time (e.g., 30 minutes), report average ep_diskqueue_drain per node.
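The "average per node" in the last step could be computed from collected samples roughly as follows. This is a sketch under assumptions: the input format (one `<node> <value>` sample per line) and the function name `average_per_node` are hypothetical, and how the samples are collected is left to whatever stats tooling is in use. The same helper would apply to ep_bg_fetched samples.

```shell
# Average a sampled stat per node.
# Input on stdin: one sample per line in the form "<node> <value>".
# Output: one line per node, "<node> <average>", sorted by node name.
average_per_node() {
    awk '{ sum[$1] += $2; n[$1]++ }
         END { for (node in sum) printf "%s %.1f\n", node, sum[node] / n[node] }' | sort
}

# Usage with hypothetical samples:
# printf 'n1 100\nn2 300\nn1 200\n' | average_per_node
```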

OBSERVE latency

  1. Create initial DGM dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Start write-heavy (80/20) front-end workload.
  5. Run workload for predefined time (e.g., 1 hour), report 95th percentile of OBSERVE latency.
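One common way to compute the reported percentile from raw latency samples is the nearest-rank method, sketched below. This is an assumption for illustration, not necessarily the method used by the test framework; the function name `percentile` is hypothetical.

```shell
# Nearest-rank percentile of numeric samples read from stdin (one per line).
# Argument: integer percentile in the range 1..100.
percentile() {
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
            if (rank < 1) rank = 1
            print v[rank]
        }'
}

# Usage with hypothetical latencies in milliseconds:
# printf '%s\n' 5 1 4 2 3 | percentile 80
```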

SET/GET latency (in-memory)

  1. Create initial non-DGM dataset, wait for persistence and TAP replication.
  2. Compact bucket.
  3. Start mixed (50/50) front-end workload.
  4. Run workload for predefined time (e.g., 1 hour), report 95th percentile of SET and GET latency.

GET latency (disk fetches)

  1. Create initial DGM dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Start mixed (50/50) front-end workload with high cache miss ratio (~30%).
  5. Run workload for predefined time (e.g., 1 hour), report 95th percentile of GET latency.

Warm up

  1. Create initial dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Start mixed (50/50) front-end workload.
  5. Run workload for predefined time (e.g., 1 hour).
  6. Restart all nodes, report master's ep_warmup_time in minutes.

TAP/UPR throughput

  1. Single node, single bucket.
  2. Create initial dataset, wait for persistence and TAP replication.
  3. Read all data via TAP or UPR protocol, report average throughput (items/sec) measured as:

     total_items / total_time

Memory fragmentation

  1. Keys look like 'AB_972518218995_0'.
  2. Values are several fields like this one '{"pn": "972516875596", "nam": "XxxxPhone_i5qbqg7iqugeg96v"};'.
  3. Single node, single bucket.
  4. Load 5M items, 700-1400 bytes, average 1KB (11-22 fields).
  5. Append data:
     1. Mark first 80% of items as working set.
     2. Randomly update 75% of items in working set by adding 1 field at a time (62 bytes).
     3. Mark first 40% of items as working set.
     4. Randomly update 75% of items in working set by adding 1 field at a time (62 bytes).
     5. Mark first 20% of items as working set.
     6. Randomly update 75% of items in working set by adding 1 field at a time (62 bytes).
  6. Repeat step #5 5 times.
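To get a feel for how much data one pass of step 5 appends, the arithmetic can be sketched as below. It assumes that each randomly selected item gains exactly one 62-byte field per update phase, which is one reading of the steps above; the function name `appended_bytes_per_pass` is hypothetical.

```shell
# Bytes appended in one pass of step 5, assuming each selected item
# gains exactly one 62-byte field per update phase (an assumption).
appended_bytes_per_pass() {
    items=$1
    total=0
    for ws_pct in 80 40 20; do
        working_set=$((items * ws_pct / 100))   # items marked as working set
        updated=$((working_set * 75 / 100))     # 75% of the working set is updated
        total=$((total + updated * 62))         # one 62-byte field per updated item
    done
    echo "$total"
}

appended_bytes_per_pass 5000000   # prints 325500000
```

Under that assumption, a single pass over 5M items appends about 325 MB, concentrated in the first 20% of the key space.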

Query latency

  1. Create initial non-DGM or DGM dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Define design documents.
  5. Trigger index build, wait for indexing to finish.
  6. Start read-heavy (80/20) front-end workload with bounded view queries.
  7. Run workload for predefined time (e.g., 1 hour), report 80th percentile of query latency (stale=update_after or stale=false).

Query throughput

  1. Create initial non-DGM dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Define design documents.
  5. Trigger index build, wait for indexing to finish.
  6. Start read-heavy (80/20) front-end workload with unbounded view queries.
  7. Run workload for predefined time (e.g., 1 hour), report average couch_view_ops.

Initial XDC replication

  1. Disable auto-compaction.
  2. Create initial dataset (source cluster), wait for persistence and TAP replication.
  3. Compact bucket.
  4. Initialize remote replication, report average replication rate (items/sec).

Ongoing XDCR

  1. Create initial dataset (source cluster), wait for persistence and TAP replication.
  2. Initialize remote replication, wait for initial replication, wait for persistence and TAP replication.
  3. Compact buckets.
  4. Start mixed (50/50) front-end workload.
  5. Run workload for predefined time (e.g., 3 hours), report 90th percentile of XDCR lag.

Lag is measured based on the following timeline:

t0 - client performs SET operation for key X on source cluster.

t1 - client receives response from source cluster.

t2 - client starts repeating GET requests for key X (with progressive polling interval) on destination.

t3 - client receives successful response from destination cluster.

The lag is calculated as t3 - t2
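The polling between t2 and t3 could look roughly like the sketch below. The function names, the 1 ms starting interval, and the doubling policy are all hypothetical; the command passed in stands for whatever client performs the GET for key X on the destination, and it should not write to stdout. Fractional `sleep` arguments are assumed to be supported (true for GNU coreutils).

```shell
# Retry a command with a progressively growing polling interval.
# Arguments: max attempts, then the command to run (e.g. a GET for key X).
# Prints the number of attempts used; returns 1 if the command never succeeded.
poll_with_backoff() {
    max_attempts=$1
    shift
    interval_ms=1      # hypothetical starting interval; doubles after each miss
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "$attempt"
            return 0
        fi
        sleep "$(awk -v ms="$interval_ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
        interval_ms=$((interval_ms * 2))
        attempt=$((attempt + 1))
    done
    return 1
}

# Lag in milliseconds from the t2 and t3 timestamps (both given in milliseconds).
xdcr_lag_ms() {
    echo $(( $2 - $1 ))
}
```

A growing interval keeps the first polls tight, so short lags are measured precisely, while long lags do not flood the destination cluster with GET requests.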

Rebalance

  1. Create initial dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Start mixed (50/50) front-end workload.
  5. Wait for predefined time (e.g., 20 minutes).
  6. Add/remove/swap nodes.
  7. Trigger cluster rebalance, wait for rebalance to finish.
  8. Wait for predefined time (e.g., 20 minutes).
  9. Terminate front-end workload, report total rebalance time in minutes.

Rebalance after failover

  1. Create initial dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Start mixed (50/50) front-end workload.
  5. Wait for predefined time (e.g., 20 minutes).
  6. "Failover" one node.
  7. Add it back.
  8. Wait for predefined time (e.g., 10 minutes).
  9. Trigger cluster rebalance, wait for rebalance to finish.
  10. Wait for predefined time (e.g., 20 minutes).
  11. Terminate front-end workload, report total rebalance time in minutes.

Rebalance after graceful failover

  1. Create initial dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Start mixed (50/50) front-end workload.
  5. Wait for predefined time (e.g., 20 minutes).
  6. Gracefully "failover" one node.
  7. Wait for "failover rebalance" to finish, report total rebalance time in minutes.
  8. Add node back.
  9. Wait for predefined time (e.g., 10 minutes).
  10. Trigger cluster rebalance, wait for rebalance to finish.
  11. Wait for predefined time (e.g., 20 minutes).
  12. Terminate front-end workload, report total rebalance time in minutes.

Rebalance with view queries

  1. Create initial dataset, wait for persistence and TAP replication.
  2. Create working set via update or read operations, wait for persistence and TAP replication.
  3. Compact bucket.
  4. Define design documents.
  5. Trigger index build, wait for indexing to finish.
  6. Start mixed (50/50) front-end workload with bounded view queries.
  7. Wait for predefined time (e.g., 20 minutes).
  8. Add/remove/swap nodes and trigger cluster rebalance.
  9. Wait for rebalance to finish.
  10. Wait for predefined time (e.g., 20 minutes).
  11. Terminate front-end workload, report total rebalance time in minutes.

Rebalance with XDCR

  1. Create initial dataset, wait for persistence and TAP replication.
  2. Initialize remote replication, wait for initial replication, wait for persistence and TAP replication.
  3. Create working set via update or read operations, wait for persistence and TAP replication.
  4. Compact buckets.
  5. Start mixed (50/50) front-end workload.
  6. Wait for predefined time (e.g., 20 minutes).
  7. Add/remove nodes on the source or destination side, trigger cluster rebalance.
  8. Terminate front-end workload, report total rebalance time in minutes.