Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add and use stream metadata for RHCOS, add openshift-install coreos print-stream-json #4582

Merged
merged 4 commits into from
Mar 25, 2021

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Jan 21, 2021

This implements part of the plan from:
openshift/os#477

When we originally added the pinned RHCOS metadata rhcos.json
to the installer, we also changed the coreos-assembler meta.json
format into an arbitrary new format in the name of some cleanups.
In retrospect, this was a big mistake because we now have two
formats.

Then Fedora CoreOS appeared and added streams JSON as a public API.

We decided to unify on streams metadata; there's now a published
Go library for it: https://github.com/coreos/stream-metadata-go

Among other benefits, it is a single file that supports multiple
architectures.

UPI installs should now use stream metadata, particularly
to find public cloud images.

This is an important preparatory step for exposing this via
oc as well as having something in the cluster update to
it.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2021
@openshift-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@LorbusChris
Copy link
Member

If we still have the OKD fork in 4.8, we'll want this there as well.
/cc @vrutkovs
/cc

@cgwalters
Copy link
Member Author

OK rebased 🏄 though only compile tested so far. Let's run this for some platforms:
/test e2e-aws
/test e2e-azure
/test e2e-libvirt

@cgwalters
Copy link
Member Author

Picking this back up, but it will need coreos/coreos-assembler#2052

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2021
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2021
@cgwalters
Copy link
Member Author

/test e2e-aws
/test e2e-azure
/test e2e-libvirt

@cgwalters
Copy link
Member Author

/test e2e-aws
/test e2e-azure
/test e2e-libvirt

@cgwalters
Copy link
Member Author

cgwalters commented Feb 24, 2021

OK cool! e2e-libvirt passed, and the other two just failed on some sort of Prometheus flake/issue that is very unlikely related to this.

Going to work on cleaning this up and then lift the draft soon.

@cgwalters cgwalters marked this pull request as ready for review February 24, 2021 21:29
@cgwalters cgwalters changed the title WIP: use stream metadata Use stream metadata for RHCOS, bump x86_64 to 48.83.202102230316-0 Feb 24, 2021
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 24, 2021
@cgwalters
Copy link
Member Author

OK lifting draft!

@cgwalters cgwalters force-pushed the use-stream-metadata branch 2 times, most recently from cdaa105 to 0889bd1 Compare February 24, 2021 22:15
data/data/rhcos-4.8.json Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
pkg/asset/cluster/tfvars.go Outdated Show resolved Hide resolved
pkg/asset/cluster/tfvars.go Outdated Show resolved Hide resolved
pkg/asset/cluster/tfvars.go Show resolved Hide resolved
pkg/asset/cluster/tfvars.go Outdated Show resolved Hide resolved
pkg/asset/cluster/tfvars.go Outdated Show resolved Hide resolved
pkg/asset/cluster/tfvars.go Show resolved Hide resolved
}
data, err := gcptfvars.TFVars(
gcptfvars.TFVarsSources{
Auth: auth,
MasterConfigs: masterConfigs,
WorkerConfigs: workerConfigs,
ImageURI: imageRaw,
ImageURI: gcpimage,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any difference between this image URI and the one from the rhcos image asset any more? This code used to get the "raw image" but now gets the same image as the rhcos image asset does. Is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK right, I should have looked at this more closely. Some quick digging with git annotate points to
1f2d071
which is related to the special licenses dance. But in fact, we enable the nested virt license on the public RHCOS image by default since 4.6, which was the main motivation to support license overrides introduced in #3430

So...we could do a prep commit which cleans this up and drops the support from the installer for overriding licenses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Random aside, bits of CoreOS CI testing actually relies on a nested KVM setup in our own Prow instance)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A prep commit would be nice so as not to lose the nuance of this change.

I am not very familiar with how the licenses work. Are you claiming that we do not need to support user-supplied licenses any more? If we switch away from using the raw image, we will lose support for user-supplied licenses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I was afraid of. We cannot remove support for licenses. It is a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you claiming that we do not need to support user-supplied licenses any more?

I am 93.2% sure there is no use case for other licenses beyond enable-vmx (which again is the default now), I looked and couldn't even find references to this. I tried out the "license list" API linked there:
https://cloud.google.com/compute/docs/reference/rest/v1/licenses/list?apix_params=%7B%22project%22%3A%22openshift-gce-devel%22%2C%22returnPartialSuccess%22%3Atrue%7D
and I just got the empty set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I added a hack for this. I'd still argue we can drop the license stuff, but if we decide it all needs to stay as is I can try plumbing through the image URL into the stream metadata.

cgwalters added a commit to cgwalters/stream-metadata-go that referenced this pull request Feb 25, 2021
Came up in review of openshift/installer#4582 (comment)

Since error prefixing will use `:`, let's avoid ambiguity.  It's
going to be generally obvious from context what a stream name
and architecture are (particularly the architecture, everyone
knows `x86_64` and `aarch64` if they're provisioning a machine
for them).
@cgwalters
Copy link
Member Author

Now merges the changes from #4778

@vrutkovs
Copy link
Member

/retest

@staebler
Copy link
Contributor

I assume that this is going to need a rebase now that #4778 has merged.

cgwalters and others added 4 commits March 24, 2021 18:20
This is a new Go library for parsing stream metadata that will
be shared by FCOS and RHCOS.
This implements part of the plan from:
openshift/os#477

When we originally added the pinned RHCOS metadata `rhcos.json`
to the installer, we also changed the coreos-assembler `meta.json`
format into an arbitrary new format in the name of some cleanups.
In retrospect, this was a big mistake because we now have two
formats.

Then Fedora CoreOS appeared and added streams JSON as a public API.

We decided to unify on streams metadata; there's now a published
Go library for it: https://github.com/coreos/stream-metadata-go

Among other benefits, it is a single file that supports multiple
architectures.

UPI installs should now use stream metadata, particularly
to find public cloud images.  This is exposed via a new
`openshift-install coreos print-stream-json` command.

This is an important preparatory step for exposing this via
`oc` as well as having something in the cluster update to
it.

HOWEVER as a (really hopefully temporary) hack, we *duplicate*
the metadata so that IPI installs use the new stream format,
and UPI CI jobs can still use the old format (with different RHCOS versions).

We will port the UPI docs and CI jobs after this merges.

Co-authored-by: Matthew Staebler <staebler@redhat.com>
Now that we have a standardized JSON format for the bootimages,
add a CLI command to print it.  This is necessary for UPI installs;
previously what we've done for UPI is completely ad-hoc, such
as having the docs team copy the AMI list into HTML format.  Our
UPI CI automation copies `rhcos.json` from the installer git into
a container, but customers can't do that.

In the past I've even heard of people e.g. searching in AWS
for "RHEL CoreOS" and picking AMIs based on that, which is
extremely bad because they may not even get something from
us.

This closes a major gap in allowing customers to automate UPI
installs in the same way IPI works.

One aside: because the JSON data is embedded in the installer
binary, which is in turn embedded in the release image, which
is GPG signed, one can transitively verify the integrity
via that signature.
Briefly describe the history and future of the pinned {RHEL, Fedora} CoreOS
metadata in the installer.

Co-authored-by: Matthew Staebler <staebler@redhat.com>
@cgwalters cgwalters force-pushed the use-stream-metadata branch from c6c902c to 7a7b055 Compare March 24, 2021 18:21
@cgwalters cgwalters changed the title Use stream metadata for RHCOS, bump x86_64 to 48.83.202102230316-0 Add and use stream metadata for RHCOS, add openshift-install coreos print-stream-json Mar 24, 2021
@cgwalters
Copy link
Member Author

Rebased 🏄

Copy link
Contributor

@staebler staebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 24, 2021
@staebler
Copy link
Contributor

/override ci/prow/e2e-aws-upgrade

@openshift-ci-robot
Copy link
Contributor

@staebler: Overrode contexts on behalf of staebler: ci/prow/e2e-aws-upgrade

In response to this:

/override ci/prow/e2e-aws-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@romfreiman
Copy link

🤞

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 25, 2021

@cgwalters: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-workers-rhel7 7a7b055 link /test e2e-aws-workers-rhel7

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@romfreiman
Copy link

/retest

@openshift-merge-robot openshift-merge-robot merged commit 5ae96ab into openshift:master Mar 25, 2021
cgwalters added a commit to cgwalters/installer that referenced this pull request Mar 25, 2021
Split out from openshift#4582

This copies the bits from https://github.com/cgwalters/rhel-coreos-bootimages
which builds a ConfigMap out of the stream metadata and injects
it into the cluster.

We have an `installer` image in the release image today; this adds
the "is an operator" label, even though it's not really an
operator.  We just want the CVO to inject the manifest.

Among other important semantics, this will ensure that in-place
cluster upgrades that have new pinned CoreOS stream data will
have this configmap updated.
staebler added a commit to staebler/installer that referenced this pull request Dec 9, 2021
Currently, the installer relies on a generated go file for determining
the AWS region in which the RHCOS image is published. The `go generate`
directive was inadvertently removed in openshift#4582.
Rather than resurrecting the directive, this commit removes the generated
code in favor of gathering the regions directly from the rhcos stream
data when needed.

https://issues.redhat.com/browse/CORS-1838
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants