This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Transition based on cluster free available space #260

Open
drock opened this issue Jul 27, 2020 · 4 comments
Labels
enhancement: An improvement on the existing feature’s functionalities

Comments

@drock

drock commented Jul 27, 2020

Problem:
Currently ISM allows you to transition an index to a new state based on the index's age. This generally works well for managing the overall size of a logging cluster, and it works best when you generate a consistent volume at a regular interval, say 100 GB per day. In that case you can calculate, with good accuracy, how long to keep an index around before deleting it.
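
For reference, something like the following age-based policy is what I mean (a minimal sketch; the 15-day threshold, state names, and description are just placeholders):

```json
{
  "policy": {
    "description": "Illustrative age-based retention policy",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "15d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ]
  }
}
```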

If your volume varies or is largely unpredictable, however, this becomes a problem. You cannot say with much certainty whether keeping an index around for 1, 5, 15, etc. days will eventually make you run out of space.

Solution Request
It would be great if you could define transitions based on available space. If you are distributing your data evenly across nodes (which you should be), you could instead configure a policy to delete indices when less than 10%, 20%, etc. of space remains available. This would of course have to delete indices oldest-first.

This would be especially beneficial to an organization like ours, which has a seasonal usage pattern over the year. If I define an ISM policy that keeps my disk usage at bay today, it may no longer work well a month from now when my usage and volume are higher.

This leaves us having to either over-provision the cluster for our heaviest usage or constantly monitor it and keep adjusting the ISM policies.

An added bonus would be the ability to do this based on the storage space of different node types. For example, we run a hot/warm workflow, and our warm nodes can hold far more data than the hot nodes. It would therefore be ideal to transition indices to warm once the hot nodes are getting full, and then to delete once the warm nodes are getting full.
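
For what it's worth, the hot-to-warm move itself can already be expressed with ISM's allocation action, but only on conditions like age; the node attribute name (`temp`) and the ages below are assumptions for illustration, and a "hot nodes are getting full" condition is exactly what is missing. Excerpt from a policy's states array:

```json
[
  {
    "name": "hot",
    "actions": [],
    "transitions": [
      { "state_name": "warm", "conditions": { "min_index_age": "7d" } }
    ]
  },
  {
    "name": "warm",
    "actions": [
      { "allocation": { "require": { "temp": "warm" } } }
    ],
    "transitions": [
      { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
    ]
  }
]
```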

drock added the enhancement label on Jul 27, 2020
@dbbaughe
Contributor

Hey @drock,

Thanks for opening an issue. I'm guessing the min_size condition would run into a similar issue. It should let you roll over based on your volume and ingestion patterns, but the part you're looking for is a way to adjust how long each index sticks around after that rollover, right?
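
Roughly this kind of thing, as a sketch only (the thresholds and state names are placeholders, not a recommendation); an excerpt of a policy's states array:

```json
[
  {
    "name": "hot",
    "actions": [
      { "rollover": { "min_size": "50gb", "min_index_age": "1d" } }
    ],
    "transitions": [
      { "state_name": "delete", "conditions": { "min_index_age": "15d" } }
    ]
  },
  {
    "name": "delete",
    "actions": [ { "delete": {} } ],
    "transitions": []
  }
]
```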

One thing we want to avoid, at least with the current architecture, is having individual index jobs rely on information from other jobs or have side effects on them. The main problem I see is that eventually you'll have n indices all executing and checking their transition conditions; once the free-space condition is hit, if you're unlucky they could all transition to the next state at once (and in your case presumably delete themselves). Unfortunately, ISM doesn't run as a single periodic process that could simply check which index is the oldest: each index (job) executes individually with no knowledge of the others.

It definitely seems like a problem we should try to solve, though. So, just to open the discussion (while contradicting myself a bit): what about using the total size of the indices that an alias or an index pattern points to? If you are rolling over indices, you could designate group_a, group_b, group_c, group_d, etc., each with its own reserved space.

e.g. group_a indices can use up to 400 GB of primary store size, group_b indices up to 200 GB, etc., and once a group goes over its limit, the oldest index in that group deletes itself.
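
Purely as a hypothetical illustration of what such a condition could look like (none of this syntax exists in ISM today; the `min_group_size` condition and its fields are made up, and the plugin would have to apply it only to the oldest index in the group):

```json
{
  "transitions": [
    {
      "state_name": "delete",
      "conditions": {
        "min_group_size": { "alias": "group_a", "size": "400gb" }
      }
    }
  ]
}
```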

This would sidestep the issue of each job having to figure out, across other jobs with similar conditions, which index is the oldest. One caveat: if the oldest index in an alias group was not being managed, the rest would get blocked (perhaps until another condition evaluated to true). In theory you could apply an alias to every managed index and use this on that whole group (which could be every index in the cluster if you manage everything).

Perhaps there is an easier way to solve this, though; I'll try to think it through some more.

@drock
Author

drock commented Aug 11, 2020

@dbbaughe I like your suggestion. I think it's actually better than mine of looking at the cluster's total available space because it's more flexible. It would allow me to have multiple groups of indices, each with a maximum allocation of a certain percentage of space, e.g. logs* can take 60% of the space and metrics* can take 30%. Since I know the total size of my cluster beforehand, I can easily calculate how much 60% is and put that in the policy.

👍

@gittygoo

This is something we would be looking for as well.

Our scenario is pretty similar: we have a cluster with some allocated space (say 500 GB), and if it hits a certain overall usage (say 80%), we would like to start deleting older indexes until usage falls below 80% again...

@tybalex

tybalex commented Jun 2, 2021

My current workaround is to take a snapshot of cold indices and then delete them shortly afterwards. However, it would be better to have this feature implemented.
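
In case it helps anyone, the workaround can be expressed in a policy roughly like this (a sketch only; the repository name, snapshot name, and ages are assumptions for illustration); an excerpt of the states array:

```json
[
  {
    "name": "cold",
    "actions": [
      { "snapshot": { "repository": "my_backup_repo", "snapshot": "ism_snapshot" } }
    ],
    "transitions": [
      { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
    ]
  },
  {
    "name": "delete",
    "actions": [ { "delete": {} } ],
    "transitions": []
  }
]
```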
