-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix traceflow spec validation logic. #5223
Conversation
Hi @shi0rik0 please fix the unit test failure. |
Currently, when some traceflow spec error happens, the status of such traceflows will be "Running", not "Failed". This is because some validation logic is located at wrong places. I fixed this problem by moving them to the right place. Signed-off-by: shi0rik0 <anguuan@outlook.com>
5692299
to
5bcf7cf
Compare
I think there is an issue with the unit test. The existing logic is as follows: the controller performs the first validation on the spec, and then the agent performs the second validation on the spec. It would be sufficient to have this validation inside the controller. Therefore, these unit tests are redundant. A workaround is to still keep the validation logic in the agent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Need to run full CI pipelines before merge.
/test-all |
1 similar comment
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. However, I would suggest to validate it before creating it.
if tf.Spec.Source.Pod == "" && tf.Spec.Destination.Pod == "" { | ||
return fmt.Errorf("Traceflow %s has neither source nor destination Pod specified", tf.Name) | ||
} | ||
|
||
if tf.Spec.Source.Pod == "" && !tf.Spec.LiveTraffic { | ||
return fmt.Errorf("Traceflow %s does not have source Pod specified", tf.Name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact these validations don't need to perform at runtime. Like other CRDs, they can be prevented from being created with CRD schema validation or webhook validation, which is more friendly to users. They can get immediate feedback when creating the resource and clear message what's the error, instead of having to polling the object to get the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but there are a couple of considerations:
- The existing validation logic is performed at runtime, and if we want to switch to using CRD schema validation, it may require significant changes.
- Expressing some complex validation logic using CRD schema can be challenging and may result in decreased YAML readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more ideal to prevent the creation of such invalid objects. Using CRD schema/wehbook validation is not that complex, and I reckon Traceflow is probably the only CRD doing such validation at runtime instead of pre-creation, due to historical reasons. But I'm fine to keep the patch as is if you just want the PR go this far.
Yes, implementing the current validation via CRD schema may be not practicable, that's why I mentioned "webhook validation" too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time I heard of "webhook validation". It seems like it's a good idea to validate Traceflow CRD using webhook validation. If you wish, I'd like to refactor the validating logic to webhook validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching to webhook validation makes sense to me. You can start from https://github.com/antrea-io/antrea/blob/main/build/charts/antrea/templates/webhooks/validating/crdvalidator.yaml, and search the path "/validate/egress" in the code base to get some references.
if tf.Spec.Source.Pod == "" && tf.Spec.Destination.Pod == "" { | ||
return fmt.Errorf("Traceflow %s has neither source nor destination Pod specified", tf.Name) | ||
} | ||
|
||
if tf.Spec.Source.Pod == "" && !tf.Spec.LiveTraffic { | ||
return fmt.Errorf("Traceflow %s does not have source Pod specified", tf.Name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more ideal to prevent the creation of such invalid objects. Using CRD schema/wehbook validation is not that complex, and I reckon Traceflow is probably the only CRD doing such validation at runtime instead of pre-creation, due to historical reasons. But I'm fine to keep the patch as is if you just want the PR go this far.
Yes, implementing the current validation via CRD schema may be not practicable, that's why I mentioned "webhook validation" too.
@@ -625,7 +625,8 @@ func TestStartTraceflow(t *testing.T) { | |||
tf: &crdv1alpha1.Traceflow{ | |||
ObjectMeta: metav1.ObjectMeta{Name: "tf3", UID: "uid3"}, | |||
}, | |||
expectedErrLog: "Traceflow tf3 has neither source nor destination Pod specified", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this parameter can be removed now given the only two cases using it are changed, and there doesn't seem a reason why we should check error log in the future.
The statement is inaccurate. Not "moving" the code fix it, but making them return error does. |
I created another PR #5230. |
Currently, when some traceflow spec error happens, the status of such traceflows will be "Running", not "Failed". This is because some validation logic is located at wrong places. I fixed this problem by moving them to the right place.