Fix state store SetAction panic #5438

Merged
merged 7 commits into from
Sep 6, 2024

Conversation

AndersonQ

The state store's SetAction did not correctly handle a nil action passed as its parameter. The unenroll handler may pass a nil action when the unenroll is an auto-unenroll, that is, one that does not come from Fleet.

Also, the unenroll handler no longer checks for a nil state store; it assumes the store is valid, just as it does for all its other dependencies.

What does this PR do?

Fixes the state store panicking when it receives a nil action.

Why is it important?

A panic is bad: the agent should not panic.
Also, the unenroll handler sets a nil action on the state store if the unenroll is self-inflicted. The agent auto-unenrolls after the 7th authentication failure against Fleet.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

  • N/A

How to test this PR locally

Run the tests added in this PR; they all pass.
Keep the tests and revert the changes to the state store; the tests will panic.
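
The before/after behavior in these steps can be sketched with a small, self-contained program. The names `unguardedSet`, `guardedSet`, and `panics` are hypothetical and only mimic the idea of a setter with and without a nil check; they are not functions from the agent.

```go
package main

import "fmt"

// Action is a hypothetical stand-in for the agent's action interface.
type Action interface{ ID() string }

// unguardedSet mimics a pre-fix setter (hypothetical): it calls a
// method on the action without checking for nil first.
func unguardedSet(a Action) string { return a.ID() }

// guardedSet mimics a post-fix setter (hypothetical): nil is handled
// before the action is used.
func guardedSet(a Action) string {
	if a == nil {
		return ""
	}
	return a.ID()
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("unguarded panics:", panics(func() { unguardedSet(nil) }))
	fmt.Println("guarded panics:", panics(func() { guardedSet(nil) }))
}
```

In a real test file, the Go test runner would report the unguarded panic directly; the `recover` helper here only makes the difference visible in one run.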

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@AndersonQ AndersonQ added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-8.15 Automated backport to the 8.15 branch with mergify backport-v8.x labels Sep 5, 2024
@AndersonQ AndersonQ self-assigned this Sep 5, 2024
@AndersonQ AndersonQ requested a review from a team as a code owner September 5, 2024 08:02
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@AndersonQ AndersonQ force-pushed the 5434-fix-state-store-panic branch from 7facba4 to 4f67b2a Compare September 5, 2024 09:16
@ycombinator ycombinator requested review from pchila and removed request for swiatekm September 5, 2024 18:56
@AndersonQ AndersonQ force-pushed the 5434-fix-state-store-panic branch from e9ddfec to 9cd34bc Compare September 6, 2024 04:59

@pchila pchila left a comment

Left a comment about the log level used to signal the error in the action store when trying to save a nil action.
Moreover, if the unenroll action is generated internally, shouldn't we fix that too somehow?

@AndersonQ AndersonQ requested a review from pchila September 6, 2024 08:13
@@ -92,12 +92,10 @@ func (h *Unenroll) Handle(ctx context.Context, a fleetapi.Action, acker acker.Ac
unenrollPolicy := newPolicyChange(ctx, config.New(), a, acker, true)
h.ch <- unenrollPolicy

if h.stateStore != nil {
It seems to have been used just to test the handler without a state store, which is one of the reasons the panic was not caught
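
One way to exercise a handler without a nil-store escape hatch is to give it a small fake store in tests, so the store path always runs. The sketch below is an assumption about how that could look; `actionStore`, `fakeStore`, and `unenrollHandler` are hypothetical names, not the agent's types.

```go
package main

import "fmt"

// Action is a hypothetical stand-in for the agent's action interface.
type Action interface{ ID() string }

// actionStore is the hypothetical dependency the handler requires.
type actionStore interface {
	SetAction(a Action)
	Save() error
}

// fakeStore is a test double that records what the handler did.
type fakeStore struct {
	setCalls int
	last     Action
	saved    bool
}

func (f *fakeStore) SetAction(a Action) { f.setCalls++; f.last = a }
func (f *fakeStore) Save() error        { f.saved = true; return nil }

// unenrollHandler assumes its store is always set, like any other
// required dependency; there is no `if h.store != nil` branch.
type unenrollHandler struct{ store actionStore }

func (h *unenrollHandler) handle(a Action) error {
	h.store.SetAction(a)
	return h.store.Save()
}

func main() {
	fs := &fakeStore{}
	h := &unenrollHandler{store: fs}

	// Auto-unenroll: there is no Fleet action, so nil is passed through.
	err := h.handle(nil)
	fmt.Println(err, fs.setCalls, fs.last == nil, fs.saved)
}
```

Because the fake store is always present, a nil action reaches `SetAction` in the test, which is the path that a nil-store test setup would skip.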

@AndersonQ AndersonQ merged commit c113a26 into elastic:main Sep 6, 2024
13 checks passed
mergify bot pushed a commit that referenced this pull request Sep 6, 2024
* fix state store SetAction panic

The state store's SetAction did not correctly handle a nil action passed as its parameter. The unenroll handler may pass a nil action when the unenroll is an auto-unenroll, that is, one that does not come from Fleet.

Also, the unenroll handler no longer checks for a nil state store; it assumes the store is valid, just as it does for all its other dependencies.

(cherry picked from commit c113a26)
mergify bot pushed a commit that referenced this pull request Sep 6, 2024
@AndersonQ AndersonQ deleted the 5434-fix-state-store-panic branch September 9, 2024 15:57
AndersonQ added a commit that referenced this pull request Sep 12, 2024
* fix state store SetAction panic

The state store's SetAction did not correctly handle a nil action passed as its parameter. The unenroll handler may pass a nil action when the unenroll is an auto-unenroll, that is, one that does not come from Fleet.

Also, the unenroll handler no longer checks for a nil state store; it assumes the store is valid, just as it does for all its other dependencies.

(cherry picked from commit c113a26)

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Labels
backport-8.15 Automated backport to the 8.15 branch with mergify bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Development

Successfully merging this pull request may close these issues.

Agent panics when trying to persist a nil action
3 participants