Unnecessary PodDisruptionBudget update on operator restart #5812
Comments
You can't avoid upgrading your version; it behaves the same as a reboot. So I think this is an issue.
I am not sure I understand this. Can you explain a bit more? In general there is no strong reason to have the observation async: we already do a few other synchronous requests to the Elasticsearch API that are affected if the cluster does not respond. Having said that, I think it is probably not worth the extra complexity to move the initial observation outside of the observer now. Also, I am not sure that making the first observation synchronous will fix the issue: the PDB code takes the health from the status subresource, which will only be updated after the first reconciliation, and the Elasticsearch cluster will probably not be available at this point (it takes ~60-70s depending on hardware). So we would still see multiple PDB updates (unless I am missing something). The current behaviour of disallowing disruptions in the presence of (yet) unknown health sounds reasonable to me.
Thank you for your reply. I think it is because our E2E test did not reproduce our usage scenario. I submitted a new commit and withdrew the previous asynchronous approach. I modified the E2E usage scenario to make it closer to the environment ECK actually runs in. In Kubernetes the receiver is asynchronous rather than synchronous, but in the E2E test it is synchronous. This causes the E2E run to get stuck even though ECK runs normally. Please review again.
The issue occurs when the operator is restarted and an Elasticsearch cluster is already running, so your comment about the time to start up isn't relevant here?

```
-- Reconciliation 1 --
d.ES.Status.Health=green observedState=unknown
Reconcile PDB, es.Status.Health=green
Update ES status previous=green current=unknown   <-- we set the state to unknown

-- Reconciliation 2 --
d.ES.Status.Health=unknown observedState=green
Reconcile PDB, es.Status.Health=unknown           <-- we update the PDB
Update ES status previous=unknown current=green

-- Reconciliation 3 --
d.ES.Status.Health=green observedState=green
Reconcile PDB, es.Status.Health=green             <-- back to green, we update the PDB again
Update ES status previous=green current=green
```

I don't find this behavior very satisfying, so I'm in favor of trying to improve it. Since we already have a synchronous call to check the license, I think it's acceptable to make the first observation synchronous. This only lengthens the first reconciliation by a few ms. We can see that the first HTTP call has a cold start (+2s); then the connection is reused.
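For illustration, a minimal sketch of what a synchronous first observation could look like. All names here (`Observer`, `HealthState`, `Start`) are hypothetical, not the actual ECK API:

```go
package observer

import (
	"sync"
	"time"
)

// HealthState is a simplified stand-in for the observed cluster health.
type HealthState string

const Unknown HealthState = "unknown"

// Observer is a hypothetical sketch, not the real ECK type.
type Observer struct {
	mu       sync.Mutex
	last     HealthState
	observe  func() HealthState // one HTTP health call against Elasticsearch
	interval time.Duration
	stop     chan struct{}
}

// Start performs the first observation synchronously, so the reconciliation
// that starts the observer already gets a real health value (at the cost of
// one blocking HTTP round trip), then keeps observing in the background.
func (o *Observer) Start() HealthState {
	first := o.observe()
	o.mu.Lock()
	o.last = first
	o.mu.Unlock()

	go func() {
		ticker := time.NewTicker(o.interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				h := o.observe()
				o.mu.Lock()
				o.last = h
				o.mu.Unlock()
			case <-o.stop:
				return
			}
		}
	}()
	return first
}
```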
Opening this issue following PR #5783 to discuss the potential issue.
Context: the PDB reconciliation derives the allowed disruptions from the observed cluster health (see `allowedDisruptions()`).

Issue:
The issue occurs when the operator restarts. At the first ES reconciliation, the ES controller starts the Observer and retrieves the observed cluster health, which is still equal to `unknown`. The reconciliation is performed with the last stored status in the ES resource, so all is good, except that the `unknown` value is used to update the ES status when the reconciliation is complete. At the second ES reconciliation, the reconciliation is performed with an `unknown` status, which triggers an update of the PDB. The ES status is updated again with the last observed health when the reconciliation is complete. At the third reconciliation, the reconciliation is performed with a known status and the PDB is updated again to return to normal.

I'm not sure that's a real issue. From the operator's point of view it translates to: until I see a green cluster, I don't allow pod disruptions. It sounds OK to me.
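To make this concrete, here is a hedged sketch of the decision that drives the PDB update (simplified; the real `allowedDisruptions()` likely accounts for more than just health):

```go
package pdb

// Simplified sketch: only a green cluster tolerates losing a pod, so the
// PDB flips between allowing and blocking disruptions as the health read by
// the reconciliation flips between "green" and "unknown" on operator restart.
func allowedDisruptions(health string) int32 {
	if health == "green" {
		return 1 // one pod may be voluntarily disrupted at a time
	}
	return 0 // yellow, red, and "unknown" all block disruptions
}
```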
Cause:

When an Observer is created, the cluster health is initialized to `unknown`. Then the Observer is started with a first observation and a timer to trigger the next observations periodically. The whole start operation is executed asynchronously in a separate goroutine. This means there is no simple way to start the observer and get the first observed health.
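A minimal sketch of the behaviour just described (hypothetical names; the point is only that `lastHealth` stays `"unknown"` until the goroutine has run):

```go
package observer

// Hypothetical sketch of the current start sequence: the health is
// initialized to "unknown" and the first real observation only happens
// inside the goroutine, after NewObserver has already returned.
// (A real implementation would guard lastHealth with a mutex.)
type Observer struct {
	lastHealth string
	observe    func() string // one health call against Elasticsearch
}

func NewObserver(observe func() string) *Observer {
	o := &Observer{lastHealth: "unknown", observe: observe}
	go func() {
		o.lastHealth = o.observe() // first observation, asynchronous
		// ... periodic observations on a ticker follow (omitted)
	}()
	return o // a reconciliation reading lastHealth now still sees "unknown"
}
```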
Solution proposed in #5783:

Move the first observation out of the goroutine to make it synchronous. Interestingly, this fails the `TestManager_AddObservationListener` unit test, which tests the notification listener mechanism. The test uses blocking channels to exercise the listeners, like the ES controller does (`WatchClusterHealthChange()`). With a synchronous first observation that uses a listener writing to a blocking channel, the test gets stuck because the code that consumes the listener's channel has not started yet. We could use buffered (asynchronous) channels to fix this, but... I need a confirmation: I think the first observation is intentionally asynchronous because we don't want to block the reconciliation if an ES cluster takes time to respond.
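The test failure can be reproduced with a tiny standalone program (hypothetical, but it mirrors the blocking-channel listener pattern described above): with an unbuffered channel the synchronous notification deadlocks, while a buffered channel lets the send complete before the consumer starts reading.

```go
package main

import "fmt"

func main() {
	// With make(chan string) — unbuffered, as the test uses today — the
	// send inside notify would block forever, because the receiver only
	// starts reading after start() has returned: the observed deadlock.
	ch := make(chan string, 1) // buffered: the send completes immediately

	notify := func(health string) { ch <- health } // the listener

	start := func() {
		notify("green") // synchronous first observation
	}

	start()           // does not block thanks to the buffer
	fmt.Println(<-ch) // the consumer reads the notification afterwards
}
```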
Let's take a step back.

Is it a real issue?

If yes, could we consider reconciling the PDB only when the health is not `unknown`? This would imply that, at creation time, the PDB is only created once the pods are ready.
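If we went that way, a hedged sketch of the guard (hypothetical shape, not the actual reconciler signature):

```go
package pdb

// Sketch of the alternative: leave the existing PDB untouched while the
// health is unknown, instead of tightening it and then loosening it again.
func reconcilePDB(health string) error {
	if health == "unknown" {
		return nil // skip: keep whatever PDB already exists
	}
	// ... reconcile the PDB from the known health (omitted)
	return nil
}
```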