Commit

Merge pull request #65 from genedragon/patch-2
Update petabyte-scale.md
lizsnyder authored Jul 25, 2022
2 parents c5140af + ba0a9a0 commit 18f8424
Showing 1 changed file with 2 additions and 2 deletions.
doc_source/petabyte-scale.md (4 changes: 2 additions & 2 deletions)
@@ -11,10 +11,10 @@ Domains of this size exceed the default limit of 40 instances per domain\. To re
Before creating a domain of this size, check the [Amazon OpenSearch Service pricing](https://aws.amazon.com/elasticsearch-service/pricing/) page to ensure that the associated costs match your expectations\. Examine [UltraWarm storage for Amazon OpenSearch Service](ultrawarm.md) to see if a hot\-warm architecture fits your use case\.

**Storage**
-The `i3` instance types are designed to provide fast, local non\-volatile memory express \(NVMe\) storage\. Because this local storage tends to offer performance benefits when compared to Amazon Elastic Block Store, EBS volumes are not an option when you select these instance types in OpenSearch Service\. If you prefer EBS storage, use another instance type, such as `r5.12xlarge.search`\.
+The `i3` instance types are designed to provide fast, local non\-volatile memory express \(NVMe\) storage\. Because this local storage tends to offer performance benefits when compared to Amazon Elastic Block Store, EBS volumes are not an option when you select these instance types in OpenSearch Service\. If you prefer EBS storage, use another instance type, such as `r6.12xlarge.search`\.

**Shard size and count**
A common OpenSearch guideline is not to exceed 50 GB per shard\. Given the number of shards necessary to accommodate large domains and the resources available to `i3.16xlarge.search` instances, we recommend a shard size of 100 GB\.
For example, if you have 450 TB of source data and want one replica, your *minimum* storage requirement is closer to 450 TB \* 2 \* 1\.1 / 0\.95 = 1\.04 PB\. For an explanation of this calculation, see [Calculating storage requirements](sizing-domains.md#bp-storage)\. Although 1\.04 PB / 15 TB = 70 instances, you might select 90 or more `i3.16xlarge.search` instances to give yourself a storage safety net, deal with node failures, and account for some variance in the amount of data over time\. Each instance adds another 20 GiB to your minimum storage requirement, but for disks of this size, those 20 GiB are almost negligible\.
Controlling the number of shards is tricky\. OpenSearch users often rotate indexes on a daily basis and retain data for a week or two\. In this situation, you might find it useful to distinguish between "active" and "inactive" shards\. Active shards are, well, actively being written to or read from\. Inactive shards might service some read requests, but are largely idle\. In general, you should keep the number of active shards below a few thousand\. As the number of active shards approaches 10,000, considerable performance and stability risks emerge\.
-To calculate the number of primary shards, use this formula: 450,000 GB \* 1\.1 / 100 GB per shard = 4,950 shards\. Doubling that number to account for replicas is 9,900 shards, which represents a major concern if all shards are active\. But if you rotate indexes and only 1/7th or 1/14th of the shards are active on any given day \(1,414 or 707 shards, respectively\), the cluster might work well\. As always, the most important step of sizing and configuring your domain is to perform representative client testing using a realistic dataset\.
+To calculate the number of primary shards, use this formula: 450,000 GB \* 1\.1 / 100 GB per shard = 4,950 shards\. Doubling that number to account for replicas is 9,900 shards, which represents a major concern if all shards are active\. But if you rotate indexes and only 1/7th or 1/14th of the shards are active on any given day \(1,414 or 707 shards, respectively\), the cluster might work well\. As always, the most important step of sizing and configuring your domain is to perform representative client testing using a realistic dataset\.
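
The storage and instance arithmetic in the hunk above can be reproduced with a short Python sketch. It is a back-of-the-envelope check under the text's own assumptions, not an AWS sizing tool: the 1.1 and 0.95 factors, the 15 TB of usable storage per `i3.16xlarge.search` instance, and the 20 GiB per-instance overhead are taken from the text, and all function names are hypothetical.

```python
import math

# Figures from the worked example in the diff; decimal units (1 PB = 1,000 TB).
SOURCE_TB = 450
REPLICAS = 1
INDEXING_FACTOR = 1.1            # the text's "* 1.1" term
RESERVED_FACTOR = 0.95           # the text's "/ 0.95" term
TB_PER_INSTANCE = 15             # usable storage per i3.16xlarge.search, as the text assumes
PER_INSTANCE_OVERHEAD_GIB = 20   # extra requirement the text attributes to each instance


def minimum_storage_tb(source_tb: float, replicas: int) -> float:
    """Minimum storage (TB) before adding the per-instance overhead."""
    return source_tb * (1 + replicas) * INDEXING_FACTOR / RESERVED_FACTOR


def minimum_instances(storage_tb: float, tb_per_instance: float) -> int:
    """Smallest whole number of instances whose combined storage covers storage_tb."""
    return math.ceil(storage_tb / tb_per_instance)


def per_instance_overhead_tb(instances: int) -> float:
    """Total per-instance overhead in decimal TB (GiB -> bytes -> TB)."""
    return instances * PER_INSTANCE_OVERHEAD_GIB * 2**30 / 1e12


if __name__ == "__main__":
    storage = minimum_storage_tb(SOURCE_TB, REPLICAS)
    print(f"minimum storage: {storage / 1000:.2f} PB")                           # 1.04 PB
    print(f"minimum instances: {minimum_instances(storage, TB_PER_INSTANCE)}")   # 70
    overhead = per_instance_overhead_tb(90)
    print(f"overhead for 90 instances: {overhead:.2f} TB ({overhead / storage:.2%})")
```

With 90 instances the added overhead is about 1.93 TB, under 0.2% of the roughly 1,042 TB requirement, which is consistent with the text calling it almost negligible.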

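The shard arithmetic from the same hunk can be sketched the same way. The 100 GB shard size, the 1.1 factor, one replica, and the 10,000 active-shard risk level come from the text; the function names are hypothetical, and modeling "1/7th or 1/14th active" as "only the current day's indexes are active under daily rotation" is a simplifying assumption.

```python
# Figures from the shard example in the diff.
SOURCE_GB = 450_000
SHARD_SIZE_GB = 100               # recommended shard size for i3.16xlarge.search
INDEXING_OVERHEAD_PCT = 10        # the text's "* 1.1" term, as an integer percentage
HIGH_RISK_ACTIVE_SHARDS = 10_000  # the text flags serious risk as active shards approach this


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive ints."""
    return -(-a // b)


def primary_shards(source_gb: int, shard_size_gb: int = SHARD_SIZE_GB) -> int:
    # Integer arithmetic avoids binary floating-point error (450_000 * 1.1 is
    # not exactly 495_000.0 as a float), which could push a ceil() up by one.
    return ceil_div(source_gb * (100 + INDEXING_OVERHEAD_PCT), 100 * shard_size_gb)


def total_shards(primaries: int, replicas: int = 1) -> int:
    """Primary shards plus their replicas."""
    return primaries * (1 + replicas)


def active_shards(total: int, retention_days: int) -> int:
    """Approximate shards active on one day, assuming daily index rotation
    and that only the current day's indexes are active."""
    return round(total / retention_days)


if __name__ == "__main__":
    primaries = primary_shards(SOURCE_GB)   # 4950
    total = total_shards(primaries)         # 9900
    print(primaries, total)
    for days in (7, 14):
        active = active_shards(total, days)  # 1414 for 7 days, 707 for 14
        print(days, active, active < HIGH_RISK_ACTIVE_SHARDS)
```

Rounding 9,900 / 7 and 9,900 / 14 gives the text's 1,414 and 707. Real clusters rarely divide activity this evenly, so these numbers are only a sanity check alongside the representative client testing the text recommends.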