Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add message with error to the plan status to ease debugging #1050

Merged
merged 6 commits into from
Nov 19, 2019

Conversation

alenkacz
Copy link
Contributor

@alenkacz alenkacz commented Nov 14, 2019

What this PR does / why we need it:
Output from local testing

  planStatus:
    deploy:
      lastFinishedRun: null
        error expanding template serial.yaml: error rendering template: template:
        tpl:8:39: executing "tpl" at <.StepNumber>: map has no entry for key "StepNumber"'
      name: deploy
      phases:
        name: invalid-phase
        status: FATAL_ERROR
        steps:
        - message: 'Error during execution: fatal error: default/invalid1 failed in
            deploy.invalid-phase.invalid-step.invalid-task: error expanding template
            serial.yaml: error rendering template: template: tpl:8:39: executing "tpl"
            at <.StepNumber>: map has no entry for key "StepNumber"'
          name: invalid-step
          status: FATAL_ERROR
      status: FATAL_ERROR
      uid: 92014d56-3b24-4d15-ab98-8a929d1bd0dc
Plan(s) for "invalid1" in namespace "default":
.
└── invalid1 (Operator-Version: "invalid-v1" Active-Plan: "deploy")
    └── Plan deploy (serial strategy) [FATAL_ERROR] 
        └── Phase invalid-phase [FATAL_ERROR] 
            └── Step invalid-step (FATAL_ERROR) (Error during execution: fatal error: default/invalid1 failed in deploy.invalid-phase.invalid-step.invalid-task: error expanding template serial.yaml: error rendering template: template: tpl:8:39: executing "tpl" at <.StepNumber>: map has no entry for key "StepNumber")

Fixes #873

Copy link
Contributor

@zen-dog zen-dog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Aside from a few nits, we should figure out whether we propagate the message up and leave with somewhat repetitive plan status output.

pkg/apis/kudo/v1beta1/instance_types.go Outdated Show resolved Hide resolved
Status ExecutionStatus `json:"status,omitempty"`
}

func (s *StepStatus) SetStatus(status ExecutionStatus) {
Copy link
Contributor

@zen-dog zen-dog Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nit: I'm not a big fan of Setters and Getter in method names in Go. Or maybe it's just Set and SetWithMessage e.g. phaseStatus.Set(ERROR, "something bad happened")?

phaseStatus.Status = v1beta1.ExecutionFatalError
stepStatus.Status = v1beta1.ExecutionFatalError
planStatus.Status = v1beta1.ExecutionFatalError
phaseStatus.SetStatusWithMessage(v1beta1.ExecutionFatalError, err.Error())
Copy link
Contributor

@zen-dog zen-dog Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking out loud: it was ok to propagate an ExecutionFatalError up, but do we want to do this for the message too? It seems somewhat verbose and redundant in the output. I'm slightly inclined to let the FATAL_ERROR message live at the level it occurred and only propagate the status up. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was not sure here as well... probably keeping it at the lowest level and not propagating up could be a good enough solution? Let's hear from @mpereira since he requested this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a plan's error message be an aggregation of its phases/steps error messages in case of multiple errored steps? Maybe it makes sense keeping error messages in the leaf steps? Assuming a possible future DAG plan engine, that would probably be the way to go, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep ... so if I understand it correctly we're all leaning toward propagating status but not propagating message. I'll adjust the PR...

@alenkacz
Copy link
Contributor Author

@zen-dog @mpereira the PR is updated! :)

Copy link
Member

@gerred gerred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me with some nits, I'm not going to hard request changes if they're not appropriate to this PR, so approved.

phaseStatus.Status = v1beta1.ExecutionFatalError
stepStatus.Status = v1beta1.ExecutionFatalError
planStatus.Status = v1beta1.ExecutionFatalError
phaseStatus.Set(v1beta1.ExecutionFatalError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to move some of this out to a further helper function that sets all 3 at once, this is repetitive through the cases and is distracting me 😂

Copy link
Contributor Author

@alenkacz alenkacz Nov 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D I see, sometimes we set three, sometimes we set two, sometimes one... and it's not a doubly linked list so you cannot traverse from step up to propagate it :/ do you feel like doing something like:
propagateStatus(plan, phase, step) makes thing better? 🤔

@@ -175,7 +175,7 @@ func TestExecutePlan(t *testing.T) {
expectedStatus: &v1beta1.PlanStatus{
Status: v1beta1.ExecutionInProgress,
Name: "test",
Phases: []v1beta1.PhaseStatus{{Name: "phase", Status: v1beta1.ExecutionInProgress, Steps: []v1beta1.StepStatus{{Status: v1beta1.ErrorStatus, Name: "step"}}}},
Phases: []v1beta1.PhaseStatus{{Name: "phase", Status: v1beta1.ExecutionInProgress, Steps: []v1beta1.StepStatus{{Status: v1beta1.ErrorStatus, Name: "step", Message: "A transient error when executing task test.phase.step.task. Will retry. dummy error"}}}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not call it a dummy error - it sounds more like we threw it just to throw it, so ignore it, when it's a true error, just transient (as you mention). "Retrying until this step succeeds" or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only in the test 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I am keeping it since it's just tests so I think it does not matter :) let me know if there's really a strong reason to change this

@@ -207,7 +207,7 @@ func TestExecutePlan(t *testing.T) {
expectedStatus: &v1beta1.PlanStatus{
Status: v1beta1.ExecutionFatalError,
Name: "test",
Phases: []v1beta1.PhaseStatus{{Name: "phase", Status: v1beta1.ExecutionFatalError, Steps: []v1beta1.StepStatus{{Status: v1beta1.ExecutionFatalError, Name: "step"}}}},
Phases: []v1beta1.PhaseStatus{{Name: "phase", Status: v1beta1.ExecutionFatalError, Steps: []v1beta1.StepStatus{{Status: v1beta1.ExecutionFatalError, Message: "Error during execution: fatal error: default/test-instance failed in test.phase.step.task: dummy error", Name: "step"}}}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -319,7 +319,7 @@ func TestExecutePlan(t *testing.T) {
Status: v1beta1.ExecutionInProgress,
Name: "test",
Phases: []v1beta1.PhaseStatus{{Name: "phase", Status: v1beta1.ExecutionInProgress, Steps: []v1beta1.StepStatus{
{Name: "stepOne", Status: v1beta1.ErrorStatus},
{Name: "stepOne", Status: v1beta1.ErrorStatus, Message: "A transient error when executing task test.phase.stepOne.taskOne. Will retry. dummy error"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -367,7 +367,7 @@ func TestExecutePlan(t *testing.T) {
Status: v1beta1.ExecutionInProgress,
Name: "test",
Phases: []v1beta1.PhaseStatus{{Name: "phase", Status: v1beta1.ExecutionInProgress, Steps: []v1beta1.StepStatus{
{Name: "stepOne", Status: v1beta1.ErrorStatus},
{Name: "stepOne", Status: v1beta1.ErrorStatus, Message: "A transient error when executing task test.phase.stepOne.taskOne. Will retry. dummy error"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -415,7 +415,7 @@ func TestExecutePlan(t *testing.T) {
Status: v1beta1.ExecutionFatalError,
Name: "test",
Phases: []v1beta1.PhaseStatus{{Name: "phase", Status: v1beta1.ExecutionFatalError, Steps: []v1beta1.StepStatus{
{Name: "stepOne", Status: v1beta1.ExecutionFatalError},
{Name: "stepOne", Status: v1beta1.ExecutionFatalError, Message: "Error during execution: fatal error: default/test-instance failed in test.phase.stepOne.taskOne: dummy error"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -463,7 +463,7 @@ func TestExecutePlan(t *testing.T) {
Status: v1beta1.ExecutionInProgress,
Name: "test",
Phases: []v1beta1.PhaseStatus{
{Name: "phaseOne", Status: v1beta1.ExecutionInProgress, Steps: []v1beta1.StepStatus{{Name: "step", Status: v1beta1.ErrorStatus}}},
{Name: "phaseOne", Status: v1beta1.ExecutionInProgress, Steps: []v1beta1.StepStatus{{Name: "step", Status: v1beta1.ErrorStatus, Message: "A transient error when executing task test.phaseOne.step.taskOne. Will retry. dummy error"}}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you get the idea 😂

Copy link
Contributor

@zen-dog zen-dog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! 🚢

@alenkacz
Copy link
Contributor Author

I am going to merge this, I gave it one more try locally and it looks good. I am happy to follow up on the comments @gerred made, feel free to reach out

@alenkacz alenkacz merged commit d049eed into master Nov 19, 2019
@alenkacz alenkacz deleted the av/errors-status-instance branch November 19, 2019 14:50
@mpereira
Copy link
Member

Thanks @alenkacz!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show plan execution error information in the plan status output
4 participants