Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nested support for GCP #3430

Merged
merged 1 commit into from
Jun 4, 2020

Conversation

amorenoz
Copy link
Contributor

@amorenoz amorenoz commented Apr 8, 2020

This is based on the a patch from @cgwalters. I just applied his patch and tweaked a bit the terraform variables file to avoid some errors.
I am not familiar with the project, so I cannot judge whether this implementation is acceptable or not. I can only testify that it does work.
So, this PR should be understood as me volunteering to rework this patch based on the feedback I receive.

Fixes #2649

@amorenoz
Copy link
Contributor Author

amorenoz commented Apr 8, 2020

cc @abhinavdahiya @mrunalp PTAL

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 8, 2020
@openshift-ci-robot
Copy link
Contributor

Hi @amorenoz. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment on lines 289 to 293
// See https://github.com/coreos/coreos-assembler/blob/master/doc/openshift-gcp-nested-virt.md
// and https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances
licenses := []string{}
if _, nestedvirt := os.LookupEnv("OPENSHIFT_INSTALL_OS_IMAGE_GCP_ENABLE_NESTED_VIRT"); nestedvirt {
licenses = append(licenses, "https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using env to configure the list of licenses is not great!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any alternative suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3430 (comment)

cannot answer or suggest without details on ^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would a dedicated field in the the customizable install-config.yaml be an acceptable approach?

@abhinavdahiya
Copy link
Contributor

currently the installer creates the image, but in case it is forced to use an already exiting image, how would installer enforce this on an image that isn't enabled.

should users be allowed to turn it off, any reason some users might not like this default.

are there other URLs for licences that people would want to use. if yes, how would they use those, and if we allow users to specify a different list, how does that fit in with pre-existing images.

@cgwalters
Copy link
Member

cgwalters commented Apr 8, 2020

currently the installer creates the image, but in case it is forced to use an already exiting image, how would installer enforce this on an image that isn't enabled.

If someone is using the image override environment variable, they can figure this out on their own.

should users be allowed to turn it off, any reason some users might not like this default.

This is off by default.

are there other URLs for licences that people would want to use.

Not that I know of. When sketched out this patch I looked at that briefly and couldn't find any other "licenses" - it seems like they added something generic but nothing besides nested virt has used it yet.

But, in the (IMO unlikely) case that one appears and we want to use it...we could accept another environment variable.

I think it's OK too if we choose to break this interface later, as long as we give sufficient notice for users to adapt. Ideally, the only user is the openshift/release git repository at least to start.

But I dunno, we could call it OPENSHIFT_INSTALL_GCP_LICENSES and have it take comma/whitespace -separated values or so too.

if yes, how would they use those, and if we allow users to specify a different list, how does that fit in with pre-existing images.

If we get to the point where we have pre-existing images for RHCOS (which certainly may happen) then one of two things would need to happen: We change this code to copy it via Terraform, or we punt and ask users to do it.

The reason the installer needs to do it is really precisely because RHCOS is pinned in the installer and we don't have any public interface like openshift/enhancements#201 to do it...and I promise that will be implemented at some point but the benefit of supporting CNV and others using nested virt for CI in the meantime is substantial.

@abhinavdahiya
Copy link
Contributor

I promise that will be implemented at some point but the benefit of supporting CNV and others using nested virt for CI in the meantime is substantial.

If this is being added for CI,

  • This doesn't have to be configured by installer during install-time. not every thing needs to be.
  • The CI jobs creates a nested virt image.
  • The CI job then creates new compute machinesets using the new nested virt image.

machinesets should allow the CI job to easily do this. and the new steps registry for CI jobs should make it really easy.

The original issue that requested to RFE isn't something a lot of the users are expecting to see, no a lot of traction, nor are we seeing this through JIRA RFEs. So adding ENV based configuraton is something i would like to avoid, and install-config based has difficult maintainability concerns later on when we mature our OS image handling..

@amorenoz
Copy link
Contributor Author

I promise that will be implemented at some point but the benefit of supporting CNV and others using nested virt for CI in the meantime is substantial.

If this is being added for CI,

* This doesn't have to be configured by installer during install-time. not every thing needs to be.

* The CI jobs creates a nested virt image.

* The CI job then creates new compute machinesets using the new nested virt image.

This not for CI alone. We (i.e: the virt team trying to make Kata Containers work on OpenShift) would like to deliver a tech-preview version on GCP. So this would enable potential users to try nested GCP + Kata.

@abhinavdahiya
Copy link
Contributor

#3430 (comment)

since we are expecting to bring it to users.. I think we need to,

  • drive this using the install-config
  • request for a list of license URLs
  • make sure we document that using these URLs will force the installer to copy the source_image before using them for cluster. currently this doesn't matter but it could matter soon therefore the field should have this documented.
  • also that setting this field with OS_OVERRIDE is not allowed.

@amorenoz
Copy link
Contributor Author

OK @abhinavdahiya , I'll give it a spin

@amorenoz
Copy link
Contributor Author

@abhinavdahiya PTAL

Comment on lines 289 to 293
licenses := []string{}
for _, license := range installConfig.Config.GCP.Licenses {
licenses = append(licenses, license)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the copy? isn't it already a string slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right.

@@ -29,4 +29,9 @@ type Platform struct {
// The value should be the name of the subnet.
// +optional
ComputeSubnet string `json:"computeSubnet,omitempty"`

// Licenses is a list of licenses to apply to the compute images
// The value should a list of strings representing the license keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. add the requirement that these are URLs (https) only
  2. add documentation that this field cannot be used to any kind of OS image override
    ^^ also update the docs to say that.. we don't want to enumerate the it's only the env, but generically all workflows like in future the clusterOSImage fields.
  3. also add documentation that when set, this will cause the installer to copy the image into user's project.

Copy link
Contributor

@abhinavdahiya abhinavdahiya Apr 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the docs https://cloud.google.com/compute/docs/reference/rest/v1/images/insert

licenses[] | string                     Any applicable license URI.                     
Authorization requires the following IAM permission on the specified resource licenses: compute.licenseCodes.use

it seems like the user creating the image with a license needs to have certain permissions to use it. we should add that to docs/

there is also an API to get the license.. https://cloud.google.com/compute/docs/reference/rest/v1/licenses/get
we should add a validation to pkg/asset/installconfig/gcp/validations.go so that we can validate, user(here the installer with the creds given to it) can GET the license and the users are not setting invalid licenses..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor Author

@amorenoz amorenoz Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving it a try, I think it might not be possible to validate the nested license through the compute.license API. Such API seems to be intended to inspect project-specific resources:
GET https://compute.googleapis.com/compute/v1/projects/{project}/global/licenses/{resourceId}

However, the nested license seems to live in a special internal "project" called "vm-options". That does not seem accessible through the API

https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx

Trying to access that resource from the API yields a permission error. And trying to grant permissions over this "vm-options" project also fails.
I have talked with GCP's tech support and they have confirmed this.

Should we still try to keep the option generic (string-based license URLs) or should we add a "enable-nested" knob?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like we can't validate if license is valid, so let's drop that specific validation.

@@ -8,6 +8,7 @@ Beyond the [platform-agnostic `install-config.yaml` properties](../customization
* `controlPlaneSubnet` (optional string): The name of an existing GCP subnet which should be used by the cluster control plane.
* `computeSubnet` (optional string): The name of an existing GCP subnet which should be used by the cluster nodes.
* `defaultMachinePlatform` (optional object): Default [GCP-specific machine pool properties](#machine-pools) which apply to [machine pools](../customization.md#machine-pools) that do not define their own GCP-specific properties.
* `licenses` (optional list of strings): A list of license URLs that should be applied to the compute images. The use of this property in combination with OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE is forbidden. Also, note that use of these URLs will force the installer to copy the source image before being used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to include docs from gcp that show where to get these licenses..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@amorenoz amorenoz force-pushed the wip/gcp_nested branch 3 times, most recently from 052f09b to 6b6565d Compare April 17, 2020 17:24
}

for i, license := range p.Licenses {
if !strings.HasPrefix(license, "https://") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use

// URIWithProtocol validates that the URI specifies a certain
// protocol scheme (e.g. "https")
func URIWithProtocol(uri string, protocol string) error {
parsed, err := url.Parse(uri)
if err != nil {
return err
}
if parsed.Scheme != protocol {
return fmt.Errorf("must use %s protocol", protocol)
}
return nil
}

@abhinavdahiya
Copy link
Contributor

@amorenoz i think https://github.com/openshift/installer/pull/3430/files#r410395101 is the only remaining comment from me.

after that we can merge this as soon as we open for 4.6 development.

@abhinavdahiya
Copy link
Contributor

/ok-to-test
/test e2e-gcp

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 23, 2020
Add an optional parameter in the GCP install-config that
contains a list of license URLs to be added to the compute image

Credits:
Based on the work by Colin Walters

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Copy link
Contributor

@abhinavdahiya abhinavdahiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Copy link
Member

Thanks a lot for driving this to completion! It will help us use OpenShift CI more for RHCOS too!

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Copy link
Member

Just to xref, this will obsolete coreos/coreos-assembler#748

@dustymabe
Copy link
Member

On a semi related note in FCOS land we are going to try to switch to enabling nested virtualization by default on our images. Our representatives from GCP have indicated there shouldn't be any drawbacks for doing so: coreos/fedora-coreos-pipeline#242

If we switch to using already created images in GCP for openshift-installer we maybe should consider doing the same thing (switching to nested virt by default) for RHCOS.

@cgwalters
Copy link
Member

Just to xref, in the future based on
coreos/fedora-coreos-pipeline#242 (comment)
we are likely to switch RHCOS+openshift-install to boot from a pre-created image with nested virt enabled by default.

@dustymabe
Copy link
Member

and more details in coreos/fedora-coreos-pipeline#242 (comment)

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

10 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 64f1a13 into openshift:master Jun 4, 2020
@fabiand
Copy link

fabiand commented Jun 4, 2020

Wohoo.

IS it already clear in which release this feature will land?

@fabiand
Copy link

fabiand commented Jun 4, 2020

Please ping me at the next conf for a cold beverage.

@amorenoz
Copy link
Contributor Author

amorenoz commented Jun 4, 2020

I think this should land on 4.6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. version/4.6 Tracking changes that should end up in 4.6 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFE] Add nested virtualisation support in GCP
9 participants