-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Flexvolume plugin discovery proposal. #833
Conversation
/assign @chakri-nelluri @msau42 @saad-ali |
@verult: GitHub didn't allow me to assign the following users: chakri-nelluri. Note that only kubernetes members can be assigned. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/sig storage |
|
||
## **Background** | ||
|
||
Beginning in version 1.8, the Kubernetes Storage SIG is putting a stop to accepting in-tree volume plugins and advises all storage providers to implement out-of-tree plugins. Currently, there are two recommended implementations: Container Storage Interface (CSI) and Flexvolume. CSI provides a single interface that storage vendors can implement in order for their storage solutions to work across many different container orchestrators, and volume plugins are out-of-tree by design. This is a large effort, the full implementation of CSI is several quarters away, and there is a need for an immediate solution for storage vendors to continue adding volume plugins. Flexvolume is an in-tree plugin that has the ability to run any storage solution by executing volume commands against a user-provided driver on the Kubernetes host, and this currently exists today. However, the process of setting up Flexvolume is very manual, pushing it out of consideration for many users. Problems include having to copy the driver to a specific location in each node, manually restarting kubelet, and user's limited access to machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: split this up into smaller paragraphs to make it easier to read and comment on.
|
||
|
||
|
||
* Alice is a storage plugin author and would like to deploy a Flexvolume driver on all node instances. She creates an image by copying her Flexvolume driver directory, with driver names in `[vendor~]driver/driver` format (e.g. `flexvolume/k8s~nfs/nfs`), to the Flexvolume deployment base image at `/flexvolume`. Then she makes her image available Bob, a cluster admin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe split this up into steps per persona.
Could you add an example of an image Dockerfile and DaemonSet?
* Bob modifies the existing deployment DaemonSet spec with the name of the given image, and creates the DaemonSet. | ||
* Charlie, an end user, creates volumes using the installed plugin. | ||
|
||
The user story for driver update is similar: Alice creates a new image with her new drivers, and Bob deploys it using the DaemonSet spec. Note that the `/flexvolume` directory must look exactly like what is desired in the Flexvolume directory on the host, thus in order to add a new driver without removing existing ones, existing drivers must also appear in `/flexvolume`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe exactly "what is desired"
|
||
### Driver Installation Script | ||
|
||
The script will copy the existing content of `/flexvolume` on the host to a location in `/tmp`, and then attempt to copy user-provided drivers to that directory. If the copy fails, the original drivers are restored. This script will not perform any driver validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are failures shown to the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what's the best way. I'll add that to Open Questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the exit status from Daemonset to notify the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the "are restored" ? A driver should never ever touch anything that isn't it's own.
The install process, for most drivers, should be:
while true; do
if [ -f /flexvolume/myvendor~mydriver/mydriver ]; then
sleep 60
else
mkdir /flexvolume/myvendor~mydriver
cp -a /mydriver /flexvolume/myvendor~mydriver
fi
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model in the current design is the entire Flexvolume directory containing all drivers get replaced, so when a copy fails, the backup of the entire directory is restored.
After moving to the single-driver install model, this will be changed to backing up a single driver instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the hard line is at "you should never ever touch something you don't own".
|
||
### Dynamic Plugin Discovery | ||
|
||
Create a `PluginStub` interface containing a single method `Init()`, and have `VolumePlugin` extend it. Create a `PluginProber` type which extends `PluginStub` and includes methods `Init()` and `Probe()`. `Init()` initializes fsnotify, creates a watch on the driver directory as well as its subdirectories (if any), and spawn a goroutine listening to the signal. When the goroutine receives signal that a new directory is created, create a watch for the directory so that driver changes can be seen. `Probe()` scans the driver directory only when the goroutine sets a flag. If the flag is set, return true (indicating that new plugins are available) and the list of plugins. Otherwise, return false and nil. After the scan, the watch is refreshed to include the new list of subdirectories. The goroutine should only record a signal if there has been a 1-second delay since the last signal (see Security Considerations). Because inotify (used by fsnotify) can only be used to watch an existing directory, the goroutine needs to maintain the invariant that the driver directory always exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split this up into steps instead of one giant block of text
|
||
### Dynamic Plugin Discovery | ||
|
||
Create a `PluginStub` interface containing a single method `Init()`, and have `VolumePlugin` extend it. Create a `PluginProber` type which extends `PluginStub` and includes methods `Init()` and `Probe()`. `Init()` initializes fsnotify, creates a watch on the driver directory as well as its subdirectories (if any), and spawn a goroutine listening to the signal. When the goroutine receives signal that a new directory is created, create a watch for the directory so that driver changes can be seen. `Probe()` scans the driver directory only when the goroutine sets a flag. If the flag is set, return true (indicating that new plugins are available) and the list of plugins. Otherwise, return false and nil. After the scan, the watch is refreshed to include the new list of subdirectories. The goroutine should only record a signal if there has been a 1-second delay since the last signal (see Security Considerations). Because inotify (used by fsnotify) can only be used to watch an existing directory, the goroutine needs to maintain the invariant that the driver directory always exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who creates this interface?
|
||
### Copying Driver File to Image | ||
|
||
Using a publicly available deployment image containing the script as the base image, the plugin author copies the Flexvolume driver directory (e.g. `flexvolume/k8s~nfs/nfs`) to `/flexvolume` and makes the image available to the cluster admin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also have a section for the public deployment base image.
|
||
Because Flexvolume has two separate plugin instantiations (attachable and non-attachable), it's worth considering the case when a driver that implements attach/detach is replaced with a driver that does not, or vice versa. This does not cause an issue because plugins are recreated every time the driver directory is changed. | ||
|
||
There is a possibility that a probe occurs at the same time the DaemonSet updates the driver, so that the prober's view of drivers is inconsistent. However, this is very rare and when it does occur, the next `Probe()`call, which occurs shortly after, will be consistent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some parts of the volume code that iterate through all the volume plugins. This will also need to deal with synchronization with those calls.
22ce9e0
to
d8d5536
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also how does this work in conjunction with kubernetes/enhancements#278? It seems that we can utilize the base image to install/add mount tools to be available in the host.
|
||
Inside `InitPlugins()` from `volume/plugins.go`, if the `PluginStub` is an instance of `PluginProber`, only call its `Init()` and nothing else. Add an additional field, `flexVolumePluginList`, in `VolumePluginMgr` as a cache. For every iteration of the plugin list, call `Probe()` and update `flexVolumePluginList` if true is returned, and iterate through the new plugin list. If the return value is false, iterate through the existing `flexVolumePluginList`. | ||
|
||
Because Flexvolume has two separate plugin instantiations (attachable and non-attachable), it's worth considering the case when a driver that implements attach/detach is replaced with a driver that does not, or vice versa. This does not cause an issue because plugins are recreated every time the driver directory is changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the flexvolume implements attach/detach interface, how are you going to extend the controller-manager (assuming you are using a daemonset)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "extend"?
If you are asking how the drivers get deployed through DaemonSet to a master: DaemonSet will add a pod to the master node regardless of whether the node is set to schedulable.
If you are asking how the controller-manager picks up the newly added attach/detach procedures when the driver is replaced: during plugin probe a FlexVolumeAttachablePlugin is created, which replaces the previous plugin. The attachable plugin will enable attach/detach calls from AttachDetachController.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DaemonSet will add a pod to the master node
Not all masters run kubelet. E.g. GKE.
|
||
Pros: Designed for containers that eventually terminate. No need to have the container go into an infinite loop. | ||
|
||
Cons: Does not guarantee every node has a pod running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use affinity/placement to target all nodes? Although you would still have issues if new nodes are added to the k8s cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I'll add it to the doc. Yes like you said, with Jobs you have to specify how many pods you'd like to see running to completion, which is a problem when new nodes come up.
|
||
Pros: Simpler to implement. | ||
|
||
Cons: Kubelet or controller manager iterates through the plugin list many times, so Probe() is called very frequently. Using this model would increase unnecessary disk usage. This issue is mitigated if we guarantee that `PluginProber` is the last `PluginStub` in the iteration, and only `Probe()` if no other plugin is matched, but this logic adds additional complexity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Flexvolume drivers are not something that will be added or modified often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is probe called so often?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's called for every FindPluginBy*(), because currently it iterates through all plugins and errors out when multiple plugins are found. As for why FindPluginBy*() is called often, I don't know. I can only tell that from logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should track that backwards - it is surprising to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like there are (at least) two reasons:
-
In every DSW populator loop, it's called for volumes of every existing pod, in order to re-associate each pod with its required volumes.
-
Certain volume types are set to constantly remount, triggering a Probe every time the remount occurs.
Just to clarify: the containerized mounts feature doesn't install mount utilities on the host directly; rather, it keeps them in a container, and volume plugins need to do a container exec to access them. The deployment script in this design can be composed with the mount utility container, and the DaemonSet spec here can be combined with the containerized mounts DaemonSet spec. The Flexvolume drivers will have to be able to perform the container execs to access containerized mount utilities, but otherwise it should just work. |
|
||
### High Level Design | ||
|
||
The DaemonSet mounts a hostpath volume exposing the host's Flexvolume driver directory onto every pod. The base deployment image contains a script that copies drivers in the image to the hostpath. A notification is then sent to the filesystem watch from kubelet or controller manager. During volume creation, if there is a signal from the watch, kubelet or controller manager probes the driver directory and loads currently installed drivers as volume plugins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the flexvolume content have to be copied to every pod? Assuming non-containarized kubelet, shouldn't this just be on host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to say every DaemonSet pod. Will clarify.
|
||
`Probe()` scans the driver directory only when the goroutine sets a flag. If the flag is set, return true (indicating that new plugins are available) and the list of plugins. Otherwise, return false and nil. After the scan, the watch is refreshed to include the new list of subdirectories. The goroutine should only record a signal if there has been a 1-second delay since the last signal (see [Security Considerations](#security-considerations)). Because inotify (used by fsnotify) can only be used to watch an existing directory, the goroutine needs to maintain the invariant that the driver directory always exists. | ||
|
||
Inside `InitPlugins()` from `volume/plugins.go`, if the `PluginStub` is an instance of `PluginProber`, only call its `Init()` and nothing else. Add an additional field, `flexVolumePluginList`, in `VolumePluginMgr` as a cache. For every iteration of the plugin list, call `Probe()` and update `flexVolumePluginList` if true is returned, and iterate through the new plugin list. If the return value is false, iterate through the existing `flexVolumePluginList`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the
PluginStub
is an instance ofPluginProber
I thought both PluginStub
and PluginProber
were interfaces. Do you mean if plugin in InitPlugins
loop is an instance of PluginProber
? I assume all Volume plugins will be instance of PluginStub
since it would be the type that would be extended by VolumePlugin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, so the idea is, each time any function on VolumePluginMgr
is executed, we call Probe
on plugin and if that returns true we rescan the directories and return with updated plugin list. I am wondering, if we have can simply rescan the directory always whenever a call to VolumePluginMgr
is made, that is keep the polling on-demand rather than periodic and we won't need inotify etc hook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean if plugin in InitPlugins loop is an instance of PluginProber ? I assume all Volume plugins will be instance of PluginStub since it would be the type that would be extended by VolumePlugin.
Yes that's correct. Will clarify.
I am wondering, if we have can simply rescan the directory always whenever a call to VolumePluginMgr is made
This is discussed in Alternative Designs (2).
|
||
* Alice is a storage plugin author and would like to deploy a Flexvolume driver on all node instances. She | ||
1. prepares her Flexvolume driver directory, with driver names in `[vendor~]driver/driver` format (e.g. `flexvolume/k8s~nfs/nfs`, see [Flexvolume documentation](https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md#prerequisites)). | ||
2. creates an image by copying her driver to the Flexvolume deployment base image at `/flexvolume`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might simplify this proposal to focus it on dynamic discovery of new plugins, and reprobing of plugins from the kubelet/controllermanager, etc. Deployment via daemonset could be one mechanism, but shouldn't be prescriptive.
|
||
### High Level Design | ||
|
||
The DaemonSet mounts a hostpath volume exposing the host's Flexvolume driver directory onto every pod. The base deployment image contains a script that copies drivers in the image to the hostpath. A notification is then sent to the filesystem watch from kubelet or controller manager. During volume creation, if there is a signal from the watch, kubelet or controller manager probes the driver directory and loads currently installed drivers as volume plugins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Daemonset has to be privileged to copy the driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chakri-nelluri Could you clarify which exactly operations require privileged mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@php-coder Kubelet executes Flexvolume drivers outside the container's namespace, so driver files have to be copied into the hostpath volume in privileged mode.
|
||
The user story for driver update is similar: Alice creates a new image with her new drivers, and Bob deploys it using the DaemonSet spec. | ||
|
||
Note that the `/flexvolume` directory must look exactly like what is desired in the Flexvolume directory on the host (as described in the [Flexvolume documentation](https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md#prerequisites)). The deployment will replace the existing driver directory on the host with contents in `/flexvolume`. Thus, in order to add a new driver without removing existing ones, existing drivers must also appear in `/flexvolume`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to add a new driver without removing existing ones, existing drivers must also appear in
/flexvolume
that is unexpected. I'd expect a flexvolume deployment to be focused on installing a single driver, e.g. populating one driver under /usr/libexec/kubernetes/kubelet-plugins/volume/exec/<vendor>~<driver>/<driver>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it has to be possible to install multiple of these
|
||
Cons: Depending on the polling period, either it's needlessly frequent, or it's too infrequent to pick up driver updates quickly. | ||
|
||
4) Using Jobs instead of DaemonSets to deploy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deployment method should be a separated out of this proposal, IMO. The core need is for kubelet and controller manager to detect and use added flex plugins without restart. If that requirement is met, you can use all sorts of methods for installing/maintaining the drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. We should provide an example. That's all.
|
||
Because Flexvolume has two separate plugin instantiations (attachable and non-attachable), it's worth considering the case when a driver that implements attach/detach is replaced with a driver that does not, or vice versa. This does not cause an issue because plugins are recreated every time the driver directory is changed. | ||
|
||
There is a possibility that a probe occurs at the same time the DaemonSet updates the driver, so the prober's view of drivers is inconsistent. However, this is very rare and when it does occur, the next `Probe()`call, which occurs shortly after, will be consistent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this approach.
|
||
### Driver Installation Script | ||
|
||
The script will copy the existing content of `/flexvolume` on the host to a location in `/tmp`, and then attempt to copy user-provided drivers to that directory. If the copy fails, the original drivers are restored. This script will not perform any driver validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what happens if the probe fails or the plugin fails to initialize?
We might have to document how to revert to an working older version.
|
||
### Driver Installation Script | ||
|
||
The script will copy the existing content of `/flexvolume` on the host to a location in `/tmp`, and then attempt to copy user-provided drivers to that directory. If the copy fails, the original drivers are restored. This script will not perform any driver validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add this is a recommended way. Users are free to innovate :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're planning to provide a driver installation mechanism as part of this proposal, but like @liggitt mentioned, it can be separated out.
|
||
The user story for driver update is similar: Alice creates a new image with her new drivers, and Bob deploys it using the DaemonSet spec. | ||
|
||
Note that the `/flexvolume` directory must look exactly like what is desired in the Flexvolume directory on the host (as described in the [Flexvolume documentation](https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md#prerequisites)). The deployment will replace the existing driver directory on the host with contents in `/flexvolume`. Thus, in order to add a new driver without removing existing ones, existing drivers must also appear in `/flexvolume`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it has to be possible to install multiple of these
|
||
### Driver Installation Script | ||
|
||
The script will copy the existing content of `/flexvolume` on the host to a location in `/tmp`, and then attempt to copy user-provided drivers to that directory. If the copy fails, the original drivers are restored. This script will not perform any driver validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the "are restored" ? A driver should never ever touch anything that isn't it's own.
The install process, for most drivers, should be:
while true; do
if [ -f /flexvolume/myvendor~mydriver/mydriver ]; then
sleep 60
else
mkdir /flexvolume/myvendor~mydriver
cp -a /mydriver /flexvolume/myvendor~mydriver
fi
done
|
||
In the volume plugin code, introduce a `PluginStub` interface containing a single method `Init()`, and have `VolumePlugin` extend it. Create a `PluginProber` type which extends `PluginStub` and includes methods `Init()` and `Probe()`. | ||
|
||
`Init()` initializes fsnotify, creates a watch on the driver directory as well as its subdirectories (if any), and spawn a goroutine listening to the signal. When the goroutine receives signal that a new directory is created, create a watch for the directory so that driver changes can be seen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use notify, if you are just going to cache the results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is discussed in Alternative Design (3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If probe() were called a reasonable amount of times per second, would you reconsider this point? Just trying to bound complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes definitely. Unfortunately I was seeing bursts of tens of milliseconds between calls. If we change the Find*() logic so that if there's already a match then don't check Flex, then we don't need a watch.
|
||
`Probe()` scans the driver directory only when the goroutine sets a flag. If the flag is set, return true (indicating that new plugins are available) and the list of plugins. Otherwise, return false and nil. After the scan, the watch is refreshed to include the new list of subdirectories. The goroutine should only record a signal if there has been a 1-second delay since the last signal (see [Security Considerations](#security-considerations)). Because inotify (used by fsnotify) can only be used to watch an existing directory, the goroutine needs to maintain the invariant that the driver directory always exists. | ||
|
||
Inside `InitPlugins()` from `volume/plugins.go`, if the `PluginStub` is an instance of `PluginProber`, only call its `Init()` and nothing else. Add an additional field, `flexVolumePluginList`, in `VolumePluginMgr` as a cache. For every iteration of the plugin list, call `Probe()` and update `flexVolumePluginList` if true is returned, and iterate through the new plugin list. If the return value is false, iterate through the existing `flexVolumePluginList`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think anything related to flex should leak outside the flex directory. It should just be part of the initial handshake to figure out if each thing is a plugin or a meta-plugin. Meta-plugins are probed later.
|
||
Pros: Simpler to implement. | ||
|
||
Cons: Kubelet or controller manager iterates through the plugin list many times, so Probe() is called very frequently. Using this model would increase unnecessary disk usage. This issue is mitigated if we guarantee that `PluginProber` is the last `PluginStub` in the iteration, and only `Probe()` if no other plugin is matched, but this logic adds additional complexity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is probe called so often?
|
||
Cons: Depending on the polling period, either it's needlessly frequent, or it's too infrequent to pick up driver updates quickly. | ||
|
||
4) Using Jobs instead of DaemonSets to deploy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. We should provide an example. That's all.
|
||
`Init()` initializes fsnotify, creates a watch on the driver directory as well as its subdirectories (if any), and spawn a goroutine listening to the signal. When the goroutine receives signal that a new directory is created, create a watch for the directory so that driver changes can be seen. | ||
|
||
`Probe()` scans the driver directory only when the goroutine sets a flag. If the flag is set, return true (indicating that new plugins are available) and the list of plugins. Otherwise, return false and nil. After the scan, the watch is refreshed to include the new list of subdirectories. The goroutine should only record a signal if there has been a 1-second delay since the last signal (see [Security Considerations](#security-considerations)). Because inotify (used by fsnotify) can only be used to watch an existing directory, the goroutine needs to maintain the invariant that the driver directory always exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, mere presence of a new directory means that there is a new flex driver there? File copy is not an atomic operation and it may happen that only part of the driver script was copied there. You should wait until whole driver has been copied there. And big question is how do you recognize that...
IMO, installation of a new driver (or driver update) should be atomic operation on the fs, e.g. link(2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jan, excellent point. The only atomic file op is rename, so we should be sure that flex EXPLICITLY ignores files that start with a .
. The installer must copy to flex/.../.mydriver
and then rename that to mydriver
. Let's make this as explicit as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That assumes that the driver is a single file. The installer must rename multiple files one by one (and rename
does not work well with directories).
I am ok with requiring the driver to be a single file, with no helper scripts around, but it must be clearly defined somewhere in flex documentation and a release note.
## **Open Questions** | ||
|
||
* How does this system work with containerized kubelet? | ||
* What if drivers are updated while Flexvolume plugin is executing commands? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very important question and I'd like to see some answer. Especially kubelet must not ever execute a half-copied driver - running partly copied shell script as root sounds scary to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename, as described above, should not suffer this problem, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the scenario Jan mentioned, there are two other possible bad scenarios:
- When a driver command is called, part of the driver command is loaded into memory and is being executed. The driver gets modified, and the remaining part of the command (which is new) gets executed. I'm not sure if this could happen.
- The driver gets modified while the command is read into memory, before execution even occurs. This could definitely happen.
I don't think renaming solves either of these issues. A write should never happen while an execution is underway. Ideally the solution fixes both the issues above and Jan's scenario.
My first thought is a RWLock implementation, but I'd like to avoid locking if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- should get an ETXTBUSY for the person trying to modify the binary
I don't understand 2?
In any case, rename is atomic. As soon as I open the file descriptor, if anyone renames another file over this file, I still get the old file. It is pinned until I close that fd. Happy to explain more - it's a good trick to have in your toolbox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's really neat. In that case yes renaming will solve both of these scenarios.
What I tried to describe in (2) is: before the driver gets executed, part of its commands are read into memory, then the remaining commands are modified, and lastly the modified commands are read in.
Rename does work with symlinks, though. It's ugly but doable. mydriver ->
.mydriver/mydriver
…On Wed, Jul 26, 2017 at 10:06 AM, Jan Šafránek ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In contributors/design-proposals/flexvolume-deployment.md
<#833 (comment)>:
> + volumeMounts:
+ - mountPath: /flexmnt
+ name: flexvolume-mount
+ volumes:
+ - name: flexvolume-mount
+ hostPath:
+ path: <host_driver_directory>
+```
+
+### Dynamic Plugin Discovery
+
+In the volume plugin code, introduce a `PluginStub` interface containing a single method `Init()`, and have `VolumePlugin` extend it. Create a `PluginProber` type which extends `PluginStub` and includes methods `Init()` and `Probe()`.
+
+`Init()` initializes fsnotify, creates a watch on the driver directory as well as its subdirectories (if any), and spawn a goroutine listening to the signal. When the goroutine receives signal that a new directory is created, create a watch for the directory so that driver changes can be seen.
+
+`Probe()` scans the driver directory only when the goroutine sets a flag. If the flag is set, return true (indicating that new plugins are available) and the list of plugins. Otherwise, return false and nil. After the scan, the watch is refreshed to include the new list of subdirectories. The goroutine should only record a signal if there has been a 1-second delay since the last signal (see [Security Considerations](#security-considerations)). Because inotify (used by fsnotify) can only be used to watch an existing directory, the goroutine needs to maintain the invariant that the driver directory always exists.
That assumes that the driver is a single file. The installer must rename
multiple files one by one (and rename does not work well with
directories).
I am ok with requiring the driver to be a single file, with no helper
scripts around, but it must be clearly defined somewhere in flex
documentation and a release note.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#833 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AFVgVDz5L99Pw55PF7KMdOgDOOXDpX2Wks5sR3IJgaJpZM4OgFHi>
.
|
If we go with the route of having a separate credentials file, then like @jsafrane and @thockin mentioned before, we need to make sure that driver command execution doesn't interleave with driver installation. The deployment script in this proposal assumes there is a single file inside the driver directory. This problem only comes up if the crendentials file needs to be changed. One solution is to install each file individually (copy to .dot file & rename) rather than installing the entire directory all at once. By the way, the credentials file doesn't have to have a .dot prefix. The Flexvolume plugin uses the driver directory name to compute the name of the driver file. |
Kubernetes service VIP should be routable on the host as well, assuming you have kube-proxy running on the same host? |
@verult @MrHohn How do i know what the master IP is? However on minikube, the IP i get from the KUBERNETES_SERVICE_HOST env is not routable. |
@kokhang For Minikube, probably try |
The point is that flex plugins need a reliable way to know how to reach the API server. If they have to try to figure out what type of kube deployment they are on, they'll do it inconsistently at best and fail to handle some kube deployments at worst. In-process volume plugins are given an API client (via the host interface) as part of their
for now, the kubeconfig file could be the same as the one the kubelet uses, but if a need for a separate identity came up in the future, a different kubeconfig could be fed to the plugin |
We had a productive meeting today. We identified three problems for the Rook Flex Driver:
Agreed upon solution:
Once @verult's dynamic discovery of Flex drivers work is merged, we should add E2E test for Flex to prevent regressions (accidental changes to the Flex API). |
A PR for e2e Flexvolume test was recently merged and is in the process of being added to the testgrid: kubernetes/kubernetes#48366 Currently it uses kubelet restart. Once dynamic discovery is finished, this can be changed to make the test non-disruptive. |
|
||
Because Flexvolume has two separate plugin instantiations (attachable and non-attachable), it's worth considering the case when a driver that implements attach/detach is replaced with a driver that does not, or vice versa. This does not cause an issue because plugins are recreated every time the driver directory is changed. | ||
|
||
There is a possibility that a Flexvolume command execution occurs at the same time as the DaemonSet updates the driver, which leads to a bad execution. This cannot be solved within the Kubernetes system without an overhaul. Instead, this is discussed in [Atomic Driver Installation](#atomic-driver-installation) as part of the deployment mechanism. As part of the solution, the Prober will ignore all files that begins with "." in the driver directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time DaemonSet was mentioned and it caught me by surprise that it is being referred to as the DaemonSet.
Could the proposal be extended to describe that DaemonSet is expected to be the mechanism to run on every node something which will then copy scripts to the host, before discussing the race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or avoid discussing specific deployment mechanisms here and just say "…possibility that a Flexvolume command execution occurs at the same time as the driver is updated…"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks for catching this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the solution proposed by @jsafrane can provide the same functionality, but
- aligns better with the mount-utils delivery work
- does not touch the host, which avoids to keep track of the host sided things
To me these are two pretty strong points.
Just my 2ct
I think @jsafrane 's proposal would be a natural evolution of Flex. The dynamic plugin discover work here will build towards that goal, offering the ability to discover unix domain socket on the fly. |
61fe071
to
ebee98b
Compare
Merging this. |
/lgtm |
Automatic merge from submit-queue |
Automatic merge from submit-queue Dynamic Flexvolume plugin discovery proposal. This lays out the design for the improved deployment story of Flexvolume, making it easier to work with out-of-tree volume plugins before CSI comes to fruition.
This lays out the design for the improved deployment story of Flexvolume, making it easier to work with out-of-tree volume plugins before CSI comes to fruition.