archive old indexes with best_compression #240

Closed

rmuir opened this issue Dec 27, 2014 · 4 comments

Comments

rmuir commented Dec 27, 2014

Ideally we would optimize indexes with the "best_compression" codec. This uses less disk space (~53% of the original size for Apache log data, expected to improve to ~43% in the future), but the trade-off is that access to the _source and Lucene stored fields is slower.

Maybe this can somehow be an optional part of the optimize process? Optimizing this way should not take noticeably longer (my guesstimate: ~5-10% on average), since merging the stored fields from LZ4 to DEFLATE takes only ~19% longer after https://issues.apache.org/jira/browse/LUCENE-6115 and https://issues.apache.org/jira/browse/LUCENE-6141.

See elastic/elasticsearch#8863 for more information. This only applies to Elasticsearch 2.0 or higher.
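For concreteness, here is a minimal sketch of what that optional step could look like, assuming an Elasticsearch 2.x cluster and an elasticsearch-py client old enough to expose indices.optimize (renamed to forcemerge in later client releases); the host and index name are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster
index = "logstash-2014.12.27"                  # hypothetical index name

# index.codec is a static index setting, so the index has to be closed
# before it can be changed, then reopened afterwards.
es.indices.close(index=index)
es.indices.put_settings(index=index, body={"index.codec": "best_compression"})
es.indices.open(index=index)

# Rewriting all segments down to one applies the new codec to the stored
# fields; this is where the LZ4 -> DEFLATE merge cost is paid.
es.indices.optimize(index=index, max_num_segments=1)
```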

rmuir commented Dec 27, 2014

Note: as the option currently works, the space savings apply only to stored fields data (not to postings lists or term dictionaries for indexed fields), but today that is still typically the largest portion of the index.

So overall, we can think of the current space savings as something like ~20-25% on average. For many use cases this is still a good trade-off.

untergeek commented

This will get some attention as beta releases of Elasticsearch 2.0 become available.

untergeek commented

As this depends on Lucene 5.0, it will only be doable once I start testing with Elasticsearch 2.0 builds.

@untergeek untergeek added this to the 4.0.0 milestone Jun 18, 2015
@untergeek untergeek modified the milestones: 3.4.0, 4.0.0 Oct 19, 2015
untergeek commented

It looks like this will be addressed by Curator, though not by directly enabling or disabling this feature.

From this post:

Alternatively, we expect many users will want to utilize a “hot/warm” architecture using the shard allocation filtering feature. In this scenario, time-based indexes on hot nodes can be configured to be created with the default LZ4 compression; when they’re migrated to the warm nodes (configured with index.codec: best_compression in elasticsearch.yml), the indexes can be compressed by optimizing that index. It may be preferable to pay the CPU penalty of compression during this optimize process (which we often recommend executing when the cluster is known to be less utilized) than at the time of initial indexing.

With this being the case, it seems that Curator's existing shard allocation functionality will address this need.
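For illustration, a minimal sketch of that hot/warm flow with the elasticsearch-py client (the same allocation step Curator's allocation functionality wraps), assuming warm nodes are tagged with a node attribute (box_type: warm here is hypothetical) and run with index.codec: best_compression in elasticsearch.yml:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster
index = "logstash-2014.12.27"                  # hypothetical index name

# Step 1: relocate the index's shards to the warm tier via shard
# allocation filtering.
es.indices.put_settings(
    index=index,
    body={"index.routing.allocation.require.box_type": "warm"},
)

# Step 2: wait for relocation to finish, then optimize so segments are
# rewritten on the warm nodes, picking up best_compression in the process.
es.cluster.health(index=index, wait_for_relocating_shards=0)
es.indices.optimize(index=index, max_num_segments=1)
```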
