[BUG] Evaluate the performance of hybridfs against mmapfs #8298
Comments
@rishabh6788 Can you see if you can help here?
@jainankitk Does this require a change in the underlying hardware, I mean the attached EBS volume type, or is this something that OpenSearch handles logically?
@rishabh6788 - OpenSearch (more precisely, Lucene) takes care of the abstraction on top of the EBS volume or any other storage type. You just need to specify the different fs types using the index.store.type setting.
As per my understanding, you want to run benchmarks with the different store types.
We can try both types of workload (nyc_taxis / http_logs) with a few different memory settings.

Options for the index.store.type setting in opensearch.yml: fs, niofs, mmapfs, and hybridfs (the current default).
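For illustration, here is a minimal sketch of how indices with different store types could be created side by side for such a comparison; the host, auth, and index names are hypothetical, and index.store.type can alternatively be set cluster-wide in opensearch.yml:

```python
import requests

OPENSEARCH = "http://localhost:9200"  # hypothetical, unauthenticated local cluster

# index.store.type is a static index setting, so it is supplied at index-creation time.
for store_type in ("mmapfs", "niofs", "hybridfs"):
    index_name = f"nyc-taxis-{store_type}"  # hypothetical index names
    resp = requests.put(
        f"{OPENSEARCH}/{index_name}",
        json={"settings": {"index": {"store": {"type": store_type}}}},
    )
    resp.raise_for_status()
    print(index_name, "created with store type", store_type)
```

Ingesting the same corpus into each of these indices and running an identical query set against them would then isolate the effect of the store type.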
As of now the benchmark platform supports r5.xlarge / r6g.xlarge instance types with 50% heap enabled. We do have a backlog item to add more instance types for benchmark runs and are working on it. For now I have initiated performance runs for the following configuration: single-node, r5.xlarge, 16 GB heap, for the nyc_taxis and http_logs workloads.

NYC_TAXIS:

HTTP_LOGS:

All the performance metrics get ingested into a separate datastore cluster, which we use to generate visualizations and dashboards. In case you want to take a look at the final benchmark results generated for the above runs, you can check the console logs.
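Outside the benchmark platform, a rough ad-hoc check of the same comparison could look like the sketch below: time one identical query against indices that differ only in index.store.type. The host, index names, and query field are hypothetical, and this is not a substitute for a full nyc_taxis / http_logs run.

```python
import time
import requests

OPENSEARCH = "http://localhost:9200"  # hypothetical local endpoint
QUERY = {"query": {"range": {"total_amount": {"gte": 5, "lte": 15}}}}  # hypothetical field

def median_latency_ms(index, runs=50):
    """Issue the same query repeatedly and return the median client-side latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.post(f"{OPENSEARCH}/{index}/_search", json=QUERY).raise_for_status()
        samples.append((time.perf_counter() - start) * 1000)
    return sorted(samples)[len(samples) // 2]

for index in ("nyc-taxis-mmapfs", "nyc-taxis-niofs", "nyc-taxis-hybridfs"):
    print(index, round(median_latency_ms(index), 2), "ms")
```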
A quick comparison of the nyc_taxis numbers shows that the performance of mmapfs and hybridfs is almost identical and, as expected, much better than niofs. That being said, I doubt that the nyc_taxis / http_logs workloads are hitting Lucene segment files outside of ("nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc"), which are mmapped even on hybridfs. Hence, we need to identify a workload that hits segment files outside this list, like positions (.pos), payloads (.pay), term vectors (.tvd), or stored fields (.fdt). @mikemccand – In case we don't have such a workload in OpenSearch, I am wondering if we can leverage the Lucene microbenchmarks for comparing mmap / nio performance on these segment files? Also, I came across this GitHub issue that advocates the MADV_RANDOM flag for randomly accessed Lucene segment files to prevent page-cache thrashing. Any thoughts on that?
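For context, a workload that actually exercises those file types would need positional (phrase) queries for .pos, term-vector based highlighting for .tvd, and stored-field / _source fetches for .fdt. Below is a hypothetical sketch of such requests; the index name, field names, and mapping are assumptions, not an existing OpenSearch workload:

```python
import requests

OPENSEARCH = "http://localhost:9200"  # hypothetical endpoint
INDEX = "logs-demo"                   # hypothetical index

# Store term vectors so the .tvd/.tvx files are actually written for this field.
requests.put(
    f"{OPENSEARCH}/{INDEX}",
    json={
        "mappings": {
            "properties": {
                "message": {"type": "text", "term_vector": "with_positions_offsets"}
            }
        }
    },
).raise_for_status()

# Phrase query: needs term positions, so it reads the .pos files.
phrase = {"query": {"match_phrase": {"message": "connection timed out"}}}

# Fast vector highlighter: reads the stored term vectors (.tvd).
highlight = dict(phrase, highlight={"fields": {"message": {"type": "fvh"}}})

# Fetching _source / stored fields for many hits reads the stored-fields files (.fdt).
fetch = {"query": {"match_all": {}}, "_source": True, "size": 100}

for body in (phrase, highlight, fetch):
    requests.post(f"{OPENSEARCH}/{INDEX}/_search", json=body).raise_for_status()
```

Under hybridfs those extensions fall outside the mmapped list above, so a workload built around requests like these is where an mmapfs vs. hybridfs difference would show up, if there is one.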
See apache/lucene#13196, which is going into Lucene 9.11 and is used with Java 21 or later. With the recent Java 21/22 changes around Project Panama this is no longer needed, as you can pass a […]. The only remaining problem is […].
Describe the bug
Elasticsearch ran some benchmarks 4-5 years back before making hybridfs the default. Given that a lot has changed since then, it makes sense to rerun the benchmarks to compare the performance of the different store types.
Expected behavior
Run the performance benchmark for the different store types and change the default to the best one. The store type can be changed using the index.store.type setting.