Create KEP for Windows Node Support #676

Merged Jan 11, 2019 (2 commits)


benmoss
Member

@benmoss commented Jan 3, 2019

Adds a KEP covering Windows support and a sig-windows directory for it to live in.

@k8s-ci-robot added the cncf-cla: yes label Jan 3, 2019
@k8s-ci-robot added the size/L, kind/kep, sig/architecture, and sig/pm labels Jan 3, 2019
@spiffxp
Member

spiffxp commented Jan 3, 2019

/milestone v1.14

@k8s-ci-robot added this to the v1.14 milestone Jan 3, 2019
@benmoss
Member Author

benmoss commented Jan 7, 2019

/assign @bgrant0607

@benmoss
Member Author

benmoss commented Jan 8, 2019

/sig node

@k8s-ci-robot added the sig/node label Jan 8, 2019
@spiffxp
Member

spiffxp commented Jan 8, 2019

/sig windows

@k8s-ci-robot added the sig/windows label Jan 8, 2019
Member

@justaugustus left a comment

This SIG Windows folder will also need an OWNERS file. Please use this as a template and s/azure/windows

@justaugustus
Member

/assign @michmike @PatrickLang

Signed-off-by: Ben Moss <bmoss@pivotal.io>
@PatrickLang
Contributor

/lgtm

Once this is merged as a draft, we can split up the work including adding the test case list and other sections @spiffxp has requested for v1.14 release.

@k8s-ci-robot added the lgtm label Jan 10, 2019
@justaugustus
Member

/lgtm
/approve

@PatrickLang
Contributor

PatrickLang commented Jan 11, 2019

pinging @bgrant0607 @jdumars or @jbeda - can we get this /approve'd? we have multiple people ready to contribute more to the draft and we can't do that until we get our own directory+OWNERS file here merged. KEPs are about merging early and iterating quickly until they're done.

@jbeda
Contributor

jbeda commented Jan 11, 2019

/approve

Clearly this is something that folks are taking up and getting the doc checked in will facilitate more focused discussions. Note that this isn't official until it is marked "implementable".

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benmoss, jbeda, justaugustus, michmike, PatrickLang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Jan 11, 2019
@k8s-ci-robot merged commit 1417757 into kubernetes:master Jan 11, 2019
@PatrickLang
Contributor

/milestone v1.14

@bgrant0607
Member

Since I don't know of a better way on GitHub, I'm going to review this PR even though it's already merged.

@bgrant0607
Member

For reference, there was a draft of this sent by email:
https://hackmd.io/s/SJjDnO6C7

@bgrant0607
Member

cc @yujuhong @pjh

@bgrant0607
Member

For visibility:
SIG Storage: @saad-ali
SIG Network: @thockin
SIG Scheduling: @bsalamat
RuntimeClass: @tallclair
SIG Apps: @kow3ns @mattfarina

SIG Testing, Release, and Docs will need followup also

Member

@bgrant0607 left a comment

Made a first pass. I'll look again through previous emails and docs.


### Goals

- Enable users to run nodes on Windows servers
Member

I would have written:

  • Enable users to run Windows server containers on Windows servers using Kubernetes

- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers

<sup id="a1">1</sup> This list should be available at https://k8s-testgrid.appspot.com/sig-windows but this test setup is not currently working. https://k8s-testgrid.appspot.com/google-windows#windows-prototype is also running against a Windows cluster.
Member

Is addressing those issues part of #685?

Contributor

I'm clarifying those in #685

**User experience**: Users today will need to use some combination of taints and node selectors in order to keep Linux and Windows workloads separated. In the best case this imposes a burden only on Windows users, but this is still less than ideal.

## Graduation Criteria

Member

@craiglpeters agreed to draft these

To use as a starting point, here are some issues discussed in email and in prior SIG Arch meetings:

  1. There need to be adequate, continuously run, non-flaky tests with publicly accessible results, enabled as part of the release-blocking suite. Without this it's hard to have reasonable discussions about what does and doesn't work, and the release team can't make a judgement about release readiness or risk. Really, this is needed for any feature at any stage of maturity in order for us to make it available to users in a Kubernetes release.

  2. There needs to be adequate end user and admin documentation that describes what the user does and how to use it. I know there is a start on user documentation (WIP: Windows doc set for v1.13 stable website#10875), which at least covered "how to use it", and I'll take another look at it. One purpose of this KEP was to fill the role that a priori design proposals traditionally fill in providing a deeper level of detail about how a feature works and why.

  3. Reliability needs to be sufficiently high. Users run GA features in production. Usually we have some mileage on features in beta before they go GA, and at least a quarter or two of e2e test results.

  4. Compatibility can't be broken in GA features, either for existing users/clusters/features or for the new feature going forward, and the feature needs to adhere to the deprecation policy (https://kubernetes.io/docs/reference/using-api/deprecation-policy/).

Note that a draft document stated "you may want to wait for Windows Server 2019 availability from Microsoft and support in Kubernetes for production workloads", which needs to be clarified.

There were also questions about the user experience, particularly for mixed-OS clusters. Alternatives for ensuring Windows containers land on Windows nodes and Linux containers land on Linux nodes include:

  • Manual node labels and selectors for both Linux and Windows workloads
  • Manual taints and tolerations just for Windows workloads
  • Automatically applied nodeSelectors for both Linux and Windows workloads
    • derived from image manifest
    • derived from something else in PodSpec
  • Automatically applied tolerations for at least Windows workloads
    • derived from image manifest
    • derived from something else in PodSpec
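
As a rough illustration of the two manual options in the list above, the sketch below builds a Pod manifest as a plain Python dict and attaches a nodeSelector and a toleration. The label key `beta.kubernetes.io/os`, the taint key `os`, the image name, and the helper names are all assumptions made for illustration; nothing in this thread prescribes them.

```python
import copy

# Assumed names for illustration only; the thread does not fix a label or taint key.
OS_LABEL = "beta.kubernetes.io/os"
WINDOWS_TAINT = {"key": "os", "value": "windows", "effect": "NoSchedule"}


def with_node_selector(pod, os_name):
    """Return a copy of the pod manifest pinned to nodes labeled with os_name."""
    pod = copy.deepcopy(pod)
    selector = pod.setdefault("spec", {}).setdefault("nodeSelector", {})
    selector[OS_LABEL] = os_name
    return pod


def with_windows_toleration(pod):
    """Return a copy of the pod manifest that tolerates the assumed Windows taint."""
    pod = copy.deepcopy(pod)
    tolerations = pod.setdefault("spec", {}).setdefault("tolerations", [])
    toleration = {
        "key": WINDOWS_TAINT["key"],
        "operator": "Equal",
        "value": WINDOWS_TAINT["value"],
        "effect": WINDOWS_TAINT["effect"],
    }
    # Avoid adding the same toleration twice.
    if toleration not in tolerations:
        tolerations.append(toleration)
    return pod


# Hypothetical Windows workload; the image reference is a placeholder.
base = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "iis"},
    "spec": {"containers": [{"name": "iis", "image": "example.invalid/iis:latest"}]},
}

windows_pod = with_windows_toleration(with_node_selector(base, "windows"))
```

A toleration only permits scheduling onto tainted nodes; it does not by itself keep a pod off untainted Linux nodes, which is why the sketch combines it with a selector.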

Some issues with the above:

  • We don't want to break compatibility for existing Linux workloads
  • We don't want the UX for Windows apps to be worse than for Linux forever
  • Setting first-class os and arch properties by default in the apiserver would break existing use cases, such as ARM
    • os and arch node labels appear to still be beta
  • Not clear that most container images contain the necessary OS info
  • Not clear that extracting the OS info from the container image manifest during admission control is feasible for private image repos
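
One way to picture the "automatically applied nodeSelectors" option while respecting the compatibility concerns above is a defaulting step that acts only when the image OS is known and the pod has not already chosen an OS. The sketch below is hypothetical: `default_os_selector`, the `image_os` lookup table, and the label key are assumptions, and it sidesteps the open question of how the OS would actually be read from an image manifest.

```python
import copy

OS_LABEL = "beta.kubernetes.io/os"  # assumed label key, for illustration only


def default_os_selector(pod, image_os):
    """Add an OS nodeSelector only when it is safe to infer one.

    image_os maps an image reference to an OS string. It stands in for
    whatever mechanism would read the image manifest (hypothetical).
    """
    pod = copy.deepcopy(pod)
    spec = pod.setdefault("spec", {})
    selector = spec.get("nodeSelector") or {}

    # Never override an explicit choice made by the user.
    if OS_LABEL in selector:
        return pod

    # Collect the OS of every container image; None means "unknown".
    detected = {image_os.get(c.get("image")) for c in spec.get("containers", [])}

    # Leave the pod untouched if any image is unknown or the images disagree,
    # so existing workloads keep their current behavior.
    if len(detected) != 1 or None in detected:
        return pod

    spec["nodeSelector"] = {**selector, OS_LABEL: detected.pop()}
    return pod


# Hypothetical lookup table and pods.
known = {"example.invalid/iis:latest": "windows"}
windows_pod = {"spec": {"containers": [{"image": "example.invalid/iis:latest"}]}}
unknown_pod = {"spec": {"containers": [{"image": "example.invalid/mystery:1"}]}}

defaulted = default_os_selector(windows_pod, known)
untouched = default_os_selector(unknown_pod, known)
```

Doing nothing when the OS cannot be determined keeps the step safe for existing Linux workloads, at the cost of leaving those pods without any automatic placement.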

Some of this was discussed in a document:
https://docs.google.com/document/d/1XLs8Mbz1-xOIiDW9XSSuhx9fshpxJM1NDD1a0oVbzfc/edit

- Privileged containers
- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
- CSI plugins, which require privileged containers
- [Some parts of the V1 API](https://github.com/kubernetes/kubernetes/issues/70604)
Member

Please inline the contents of that issue into this document

Member

Also, it seems we lost quite a bit of detail compared to previous discussions. Do those issues still hold true?
https://docs.google.com/document/d/1YkLZIYYLMQhxdI2esN5PuTkhQHhO0joNvnbHpW68yg8/edit#heading=h.4khm1q370oiq

For instance, some pod features didn't work due to: Single file volume mappings. No shipped releases of Windows can map a single file, only an entire folder, into a pod/container.

Member

Other persistent issues: uid/gid vs. usernames, per-user Linux filesystem permissions, read-only root filesystems

Other resolvable issues: images using Linux-specific tools, hardcoded images with no Windows equivalent

Member

Are system OOMs reported?

### What will never work (without underlying OS changes)
- Certain Pod functionality
- Privileged containers
- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
Member

I assume that QoS (burstable, best effort) doesn't work

Member

Are there equivalents of any of the shared namespaces (e.g., shareProcessNamespace)? Can containers within a pod see each other in any way?

Member

Does terminationGracePeriodSeconds work?


### What will never work (without underlying OS changes)
- Certain Pod functionality
- Privileged containers
Member

I assume Linux capabilities don't work?

Member

And Linux-specific security features, such as seccomp, SELinux, and AppArmor

Member

We should enumerate all of the fields of PodSecurityContext that don't make sense for Windows
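
A field-by-field pass like this could be partly mechanized. The sketch below flags security settings in a pod manifest that this thread identifies as Linux-specific (privileged mode, capabilities, SELinux options, numeric user and group IDs, read-only root filesystems). The field lists are an assumption assembled from the comments above, not an authoritative enumeration of PodSecurityContext, and `find_linux_only_fields` is a hypothetical helper.

```python
# Assumed, non-exhaustive lists based on the concerns raised in this thread.
POD_LEVEL_FIELDS = (
    "seLinuxOptions", "runAsUser", "runAsGroup", "fsGroup", "supplementalGroups",
)
CONTAINER_LEVEL_FIELDS = (
    "privileged", "capabilities", "seLinuxOptions",
    "runAsUser", "runAsGroup", "readOnlyRootFilesystem",
)


def find_linux_only_fields(pod):
    """Return dotted paths of security fields that are set and assumed Linux-only."""
    spec = pod.get("spec", {})
    findings = []

    # Pod-level securityContext.
    pod_ctx = spec.get("securityContext") or {}
    for field in POD_LEVEL_FIELDS:
        if field in pod_ctx:
            findings.append(f"spec.securityContext.{field}")

    # Per-container securityContext.
    for i, container in enumerate(spec.get("containers", [])):
        ctx = container.get("securityContext") or {}
        for field in CONTAINER_LEVEL_FIELDS:
            if field in ctx:
                findings.append(f"spec.containers[{i}].securityContext.{field}")

    return findings


# Hypothetical manifest used only to exercise the helper.
pod = {
    "spec": {
        "securityContext": {"fsGroup": 2000},
        "containers": [{"name": "app", "securityContext": {"privileged": True}}],
    }
}
findings = find_linux_only_fields(pod)
```

A report like `findings` could feed documentation or a validation warning; whether such fields should be rejected or silently ignored on Windows nodes is left open here.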


### What works today
- Windows-based containers can be created by kubelet, [provided the host OS version matches the container base image](https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility)
- ConfigMap, Secrets: as environment variables or volumes
Member

How about volumes, such as emptyDir, shared between containers within a Pod?

Member

What about storage medium Memory or HugePages?

- Certain Pod functionality
- Privileged containers
- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
- CSI plugins, which require privileged containers
Member

FlexVolume?

- Dockershim CRI
- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers

Member

Do pod hostname and subdomain fields work? How about hostAliases? dnsConfig?

Member

Basically I expect someone to read through PodSpec field by field to make sure we haven't forgotten something.
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2743

@PatrickLang
Contributor

Thanks @bgrant0607 . I updated some of the test sections in #685, and will continue working with Craig on the other areas in additional PRs as we finish this KEP.

Labels
  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • sig/architecture (Categorizes an issue or PR as relevant to SIG Architecture.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • sig/windows (Categorizes an issue or PR as relevant to SIG Windows.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

9 participants