archive old indexes with best_compression #240

Closed

rmuir opened this issue Dec 27, 2014 · 4 comments

Comments

rmuir commented Dec 27, 2014

Ideally we would optimize indexes with the "best_compression" codec. This uses less disk space (~53% of the original size for Apache log data, expected to improve to ~43% in the future), but the trade-off is that access to the _source and Lucene stored fields is slower.

Maybe this can somehow be an optional part of the optimize process? Optimizing this way should not take noticeably longer (my guesstimate: ~5-10% on average), since merging the stored fields from LZ4 to DEFLATE takes only ~19% longer after https://issues.apache.org/jira/browse/LUCENE-6115 and https://issues.apache.org/jira/browse/LUCENE-6141.

See elastic/elasticsearch#8863 for more information. This only applies to Elasticsearch 2.0 or higher.
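For concreteness, here is a minimal sketch of what that optional step could look like, assuming an Elasticsearch 2.x cluster and an elasticsearch-py client old enough to expose indices.optimize (renamed to forcemerge in later client releases); the host and index name are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster
index = "logstash-2014.12.27"                  # hypothetical index name

# index.codec is a static index setting, so the index has to be closed
# before it can be changed, then reopened afterwards.
es.indices.close(index=index)
es.indices.put_settings(index=index, body={"index.codec": "best_compression"})
es.indices.open(index=index)

# Rewriting all segments down to one applies the new codec to the stored
# fields; this is where the LZ4 -> DEFLATE merge cost is paid.
es.indices.optimize(index=index, max_num_segments=1)
```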

rmuir commented Dec 27, 2014

Note: as the option currently works, the space savings apply only to stored fields data (not to postings lists or term dictionaries for indexed fields), but today that is still typically the largest portion of the index.

So overall, we can think of the current space savings as something like ~20-25% on average. For many use cases this is still a good trade-off.

untergeek commented

This will get some attention as beta releases of Elasticsearch 2.0 become available.

untergeek commented

As this depends on Lucene 5.0, it will only be doable once I start testing with Elasticsearch 2.0 builds.

@untergeek untergeek added this to the 4.0.0 milestone Jun 18, 2015
@untergeek untergeek modified the milestones: 3.4.0, 4.0.0 Oct 19, 2015
untergeek commented

It looks like this will be addressed by Curator, though not by directly enabling or disabling this feature.

From this post:

Alternatively, we expect many users will want to utilize a “hot/warm” architecture using the shard allocation filtering feature. In this scenario, time-based indexes on hot nodes can be configured to be created with the default LZ4 compression; when they’re migrated to the warm nodes (configured with index.codec: best_compression in elasticsearch.yml), the indexes can be compressed by optimizing that index. It may be preferable to pay the CPU penalty of compression during this optimize process (which we often recommend executing when the cluster is known to be less utilized) than at the time of initial indexing.

With this being the case, it seems that Curator's existing shard allocation functionality will address this need.
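For illustration, a minimal sketch of that hot/warm flow with the elasticsearch-py client (the same allocation step Curator's allocation functionality wraps), assuming warm nodes are tagged with a node attribute (box_type: warm here is hypothetical) and run with index.codec: best_compression in elasticsearch.yml:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster
index = "logstash-2014.12.27"                  # hypothetical index name

# Step 1: relocate the index's shards to the warm tier via shard
# allocation filtering.
es.indices.put_settings(
    index=index,
    body={"index.routing.allocation.require.box_type": "warm"},
)

# Step 2: wait for relocation to finish, then optimize so segments are
# rewritten on the warm nodes, picking up best_compression in the process.
es.cluster.health(index=index, wait_for_relocating_shards=0)
es.indices.optimize(index=index, max_num_segments=1)
```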
