-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for userspace device drivers with HW offload mode #322
Add support for userspace device drivers with HW offload mode #322
Conversation
Hi @kim-tae-kyung. Thanks for your PR. I'm waiting for a k8snetworkplumbingwg member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@kim-tae-kyung thanks for the PR. @ykulazhenkov any chance you would be interested in reviewing this? Sadly, I'm not all that familiar with the SR-IOV portion of the project, and I don't feel confident reviewing this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for the PR @kim-tae-kyung.
I added few comments/questions
pkg/sriov/sriov.go
Outdated
// set VF MAC address if provided, otherwise get it from the VF | ||
// use the PF netlink, since there is no netdevice for the VF | ||
if mac != "" { | ||
if err := netlink.LinkSetVfHardwareAddr(pfLink, vfIdx, hwaddr); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the sriov-cni hardware address for VFs is configured for userspace drivers also. I think we need to do the same.
For userspace - configure mac with LinkSetVfHardwareAddr
For kernel net - configure mac with LinkSetVfHardwareAddr + LinkSetHardwareAddr (for VF net dev)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. For kernel net, execute ip link set $link address $hwaddr
(LinkSetHardwareAddr) as well as ip link set $link vf $vf mac $hwaddr
(LinkSetVfHardwareAddr)
Lines 200 to 206 in c65a4e6
// if MAC address is provided, set it to the VF by using PF netlink | |
// which is accessible in the host namespace, not in the container namespace | |
if hwaddr != nil { | |
if err := netlink.LinkSetVfHardwareAddr(pfLink, vfIdx, hwaddr); err != nil { | |
return err | |
} | |
} |
pkg/sriov/sriov.go
Outdated
_, err = renameLink(vfNetdevice, contIface.Name) | ||
// in userspace driver mode, configure the smart VF netdevice via PF netlink | ||
// otherwise, configure the smart VF netdevice directly in the container namespace | ||
if !dpdkMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to create a separate functions for dpdk and non-dpdk scenarios and call these functions from the SetupSriovInterface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I create two functions, setupKernelSriovContIface and setupUserspaceSriovContIface.
Thanks to you, the code is now cleaner.
Lines 292 to 302 in 074e965
if !userspaceMode { | |
// configure the smart VF netdevice directly in the container namespace | |
if err = setupKernelSriovContIface(contNetns, contIface, deviceID, ifName, mtu); err != nil { | |
return nil, nil, err | |
} | |
} else { | |
// configure the smart VF netdevice via PF netlink | |
if err = setupUserspaceSriovContIface(contNetns, contIface, pfLink, vfIdx, ifName); err != nil { | |
return nil, nil, err | |
} | |
} |
@@ -672,6 +700,11 @@ func CmdCheck(args *skel.CmdArgs) error { | |||
return err | |||
} | |||
|
|||
// TODO: CmdCheck for userspace driver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to implement this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not in this PR. I will make a separate commit for CmdCheck later.
pkg/plugin/plugin.go
Outdated
} | ||
} | ||
// remove the VF representor port from the bridge | ||
repIfName, err := sriov.GetNetRepresentor(cache.Netconf.DeviceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? I think this https://github.com/kim-tae-kyung/ovs-cni/blob/4c01576a432fc30847f349413e52c704c4d7a449/pkg/plugin/plugin.go#L601 line should handle VF rep removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Your review is correct. My code was a duplicate implementation, so I have removed that part.
Lines 610 to 621 in 074e965
if sriov.IsOvsHardwareOffloadEnabled(cache.Netconf.DeviceID) { | |
// there is no network interface in case of userspace driver, so OrigIfName is empty | |
if !cache.UserspaceMode { | |
err = sriov.ReleaseVF(args, cache.OrigIfName) | |
if err != nil { | |
// try to reset vf into original state as much as possible in case of error | |
if err := sriov.ResetVF(args, cache.Netconf.DeviceID, cache.OrigIfName); err != nil { | |
log.Printf("Failed best-effort cleanup of VF %s: %v", cache.OrigIfName, err) | |
} | |
} | |
} | |
} else { |
pkg/types/types.go
Outdated
@@ -40,6 +40,7 @@ type NetConf struct { | |||
SocketFile string `json:"socket_file"` | |||
LinkStateCheckRetries int `json:"link_state_check_retries"` | |||
LinkStateCheckInterval int `json:"link_state_check_interval"` | |||
DPDKMode bool `json:"-,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear what you trying to achieve with `json:"-,omitempty"` syntax. If you want to hide the field from the API, then the correct syntax is just `json:"-"`. Currently DPDKMode renders as "-" in the cached json.
I see two options here:
- use meaningful json filed name for DPDKMode in NetConf struct
- move DPDKMode field to the CachedNetConf struct
The first option will pollute the user API. My personal preference is to use opt 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also prefer the second option.
Additionally, I changed DPDKMode
to UserspaceMode
, as the VF can be used not only for containerized DPDK but also for containerized QEMU, such as in KubeVirt.
Lines 74 to 83 in 074e965
// CachedNetConf containing NetConfig, original smartnic vf interface name | |
// and kernel/userspace device driver mode of the smartnic vf interface | |
// (the last two are set only in case of ovs hareware offload scenario). | |
// this is intended to be used only for storing and retrieving config | |
// to/from a data store (example file cache). | |
type CachedNetConf struct { | |
Netconf *NetConf | |
OrigIfName string | |
UserspaceMode bool | |
} |
02ced28
to
c65a4e6
Compare
@ykulazhenkov Thanks for your review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kim-tae-kyung thanks for addressing my comments.
I added one important comment and one nit.
pkg/plugin/plugin.go
Outdated
if err != nil { | ||
return err | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if you remove the else
branch the code will have the same behavior.
If you will decide to drop the else
branch, you can also change line 295 to userspaceMode := false
to be more explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I agree. The else
branch was wired.
Lines 294 to 301 in 05df2fe
// check if the device driver is the type of userspace driver | |
userspaceMode := false | |
if sriov.IsOvsHardwareOffloadEnabled(netconf.DeviceID) { | |
userspaceMode, err = sriov.HasUserspaceDriver(netconf.DeviceID) | |
if err != nil { | |
return err | |
} | |
} |
pkg/plugin/plugin.go
Outdated
return err | ||
} | ||
|
||
if err := validateCache(cache, netconf); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache validation may fail in case if "bridge name auto discovery" is used. Please, check the comment on the line 689. You need to move cache validation call to be after getBridgeName()
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It was my mistake. Thanks for your comment.
I moved the IPAM check below the cache validation, instead of moving the cache validation up.
Lines 690 to 698 in 05df2fe
// run the IPAM plugin | |
// userspace driver does not support IPAM plugin, | |
// because there is no network interface for the VF on the host | |
if netconf.NetConf.IPAM.Type != "" && !cache.UserspaceMode { | |
err = ipam.ExecCheck(netconf.NetConf.IPAM.Type, args.StdinData) | |
if err != nil { | |
return fmt.Errorf("failed to check with IPAM plugin type %q: %v", netconf.NetConf.IPAM.Type, err) | |
} | |
} |
Add support for SR-IOV VFs using both hardware offload (switchdev) and userspace device driver such as vfio-pci. Signed-off-by: Taekyung Kim <kim.tae.kyung@sk.com>
If the MAC address is provided from args, update the MAC address of the VF to the provided MAC address via netlink. Signed-off-by: Taekyung Kim <kim.tae.kyung@sk.com>
c65a4e6
to
5c039e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, looks good to me
@phoracek The PR looks good to me. Can you trigger the CI check? |
/test pull-e2e-ovs-cni |
Thanks a lot for the review @ykulazhenkov @SchSeba do you also want to take a look before we merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
nice work!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kim-tae-kyung, SchSeba, ykulazhenkov The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
Thanks to All! |
Add support for SR-IOV Virtual Functions (VFs) using both hardware offload (switchdev) and userspace device drivers such as vfio-pci.
If the MAC address is provided via arguments, update the VF’s MAC address accordingly using netlink.
What this PR does / why we need it:
This feature is essential for scenarios where SR-IOV VFs require both userspace device drivers and hardware offload.
For instance, containerized network functions with Intel E810 may enable eswitch switchdev mode for hardware offload while using the vfio-pci driver for running DPDK or VPP.
Another use case is the KubeVirt SR-IOV Interface.
After merging this PR, users can create a KubeVirt VM supported by OVS hardware offload without modifying the KubeVirt source code.
Special notes for your reviewer:
Even in scenarios where userspace device drivers are employed, and there is no netdevice for the VF on the host, the CNI should still return the name and sandbox for the VF interface to be recognized by Multus.
Release note: