handle smartnic cleanup when container net namespace is empty #129
This change attempts to clean up the smartVF's net representor from the OVS bridge and rename the smartVF back to its original name when the container network namespace is no longer available due to errors like OOMKilled, CrashLoopBackOff, etc. Signed-off-by: Periyasamy Palanisamy <periyasamy.palanisamy@est.tech>
/release-note-none
Thanks, few comments
// SR-IOV Case - The sriov device is moved into host network namespace when args.Netns is empty.
// This happens container is killed due to an error (example: CrashLoopBackOff, OOMKilled)
var rep string
if rep, err = sriov.GetNetRepresentor(netconf.DeviceID); err != nil { |
Please use := assignment instead of var rep string.
Couldn't use := as rep variable is used at line#447 (outside if scope).
I see, thanks
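The scoping point in this thread can be illustrated with a minimal Go sketch. The helper `lookupRepresentor` and the function `cleanup` are hypothetical stand-ins, not the PR's actual code; they only show why a variable that is needed after the `if` statement has to be declared before it rather than with `:=` in the `if` initializer.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupRepresentor is a hypothetical stand-in for sriov.GetNetRepresentor.
func lookupRepresentor(deviceID string) (string, error) {
	if deviceID == "" {
		return "", errors.New("empty device ID")
	}
	return "rep-" + deviceID, nil
}

// cleanup declares rep before the if statement so that it stays in scope
// for the code that follows the error check.
func cleanup(deviceID string) (string, error) {
	var rep string
	var err error
	// Writing "if rep, err := ..." here would create new variables that are
	// visible only inside the if statement, so rep could not be used below.
	if rep, err = lookupRepresentor(deviceID); err != nil {
		return "", err
	}
	return "removed port " + rep, nil
}

func main() {
	msg, err := cleanup("vf0")
	fmt.Println(msg, err)
}
```

An alternative that keeps `:=` would be to assign on its own line (`rep, err := ...`) and check `err` in a plain `if`, at the cost of one extra line.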
pkg/sriov/sriov.go
Outdated
}

return rep, nil

Extra newline
done.
pkg/sriov/sriov.go
Outdated
// Make sure we have 1 netdevice per pci address
if len(vfNetdevices) != 1 {
// This would happen if netdevice is not yet visible in default network namespace.
// so return ErrLinkNotFound error so that multus can attempt multiple times |
Is this behaviour specific to multus?
This can happen with any meta plugin that uses ovs-cni, but I've tested only with multus. I've now changed the comment to use the generic term "meta plugin".
Thank you
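As a rough sketch of the retry idea discussed in this thread: returning a recognizable sentinel error lets a caller (for example a meta plugin) decide to try again. The names `ErrLinkNotFound`, `vfNetdevice`, and `retryable` below are assumptions made for illustration; the sentinel only mirrors the error name mentioned in the review and is not taken from the PR's code or any library.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLinkNotFound is a hypothetical sentinel error for this sketch.
var ErrLinkNotFound = errors.New("link not found")

// vfNetdevice expects exactly one netdevice per PCI address. Any other count
// is reported with the sentinel wrapped inside, so callers can detect it.
func vfNetdevice(pciAddr string, netdevs []string) (string, error) {
	if len(netdevs) != 1 {
		return "", fmt.Errorf("pci address %s has %d netdevices: %w",
			pciAddr, len(netdevs), ErrLinkNotFound)
	}
	return netdevs[0], nil
}

// retryable reports whether the error indicates a condition that may clear
// up on a later attempt, such as a netdevice that is not yet visible.
func retryable(err error) bool {
	return errors.Is(err, ErrLinkNotFound)
}

func main() {
	_, err := vfNetdevice("0000:03:00.2", nil)
	fmt.Println(err, retryable(err))
}
```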
pkg/plugin/plugin.go
Outdated
}
if err = removeOvsPort(ovsDriver, rep); err != nil {
// Don't throw err as delete can be called multiple times because of error in ResetVF and ovs
// port is already deleted in a previous invocation. so log it and proceed further. |
"so log it and proceed further." seems superfluous
This was needed so that ResetVF is called at line#452 to rename the smartVF back to its original name in case ResetVF threw an ErrLinkNotFound error in a previous CmdDel call (I noticed this behaviour in my tests).
Ah, sorry for not being clear enough. I meant the sentence in the comment: "so log it and proceed further." I think it's not necessary, as it's obvious from the next two lines.
Ah, yes! Is it OK now?
Thanks! Looks good to me
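The behaviour agreed on in this thread (log a failed port removal and still go on to reset the VF, so that repeated delete calls can finish the cleanup) could be sketched as follows. The `cleaner` type, its in-memory port map, and the method names are hypothetical and exist only for this illustration; they are not the plugin's real types.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// cleaner is a hypothetical in-memory model of the bridge and the VF state.
type cleaner struct {
	ports   map[string]bool // ports currently attached to the bridge
	renamed bool            // whether the VF has been renamed back
}

// removePort fails if the port is already gone, as a repeated delete would.
func (c *cleaner) removePort(name string) error {
	if !c.ports[name] {
		return errors.New("port not found: " + name)
	}
	delete(c.ports, name)
	return nil
}

// resetVF stands in for renaming the VF back to its original name.
func (c *cleaner) resetVF() error {
	c.renamed = true
	return nil
}

// cmdDel tolerates a failed port removal: the error is only logged, so the
// VF reset still runs when an earlier call already removed the port.
func (c *cleaner) cmdDel(rep string) error {
	if err := c.removePort(rep); err != nil {
		log.Printf("Error: %v\n", err)
	}
	return c.resetVF()
}

func main() {
	c := &cleaner{ports: map[string]bool{"rep0": true}}
	fmt.Println(c.cmdDel("rep0")) // first call removes the port
	fmt.Println(c.cmdDel("rep0")) // second call logs the miss and still resets the VF
}
```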
pkg/plugin/plugin.go
Outdated
// Don't throw err as delete can be called multiple times because of error in ResetVF and ovs
// port is already deleted in a previous invocation. so log it and proceed further.
log.Printf("removal of ovs port %s is failed for device %s , it may be removed already"+
" by previous delete call, err %v", rep, netconf.DeviceID, err) |
In other places in this module, we just call log.Printf("Error: %v\n", err) in this situation; it may be better to stick with that.
ok, good. done now.
Signed-off-by: Periyasamy Palanisamy <periyasamy.palanisamy@est.tech>
/cc @JanScheurich
@pperiyasamy: GitHub didn't allow me to request PR reviews from the following users: JanScheurich. Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: Periyasamy Palanisamy <periyasamy.palanisamy@est.tech>
/lgtm
/approve
🚀
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: phoracek, pperiyasamy The full list of commands accepted by this bot can be found here. The pull request process is described here
@pperiyasamy do you need a new release to consume this?
yes @phoracek, it would be great if we have a release with this fix.
Sure thing sir! Here you go https://github.com/kubevirt/ovs-cni/releases/tag/v0.14.0