adds openshift image derivation test #746
Conversation
Skipping CI for Draft Pull Request.
Force-pushed from baabcb1 to ac1bedf
tests/layering/layering_test.go
Outdated
// will get cleaned up on reboot since file is in /run
writeToMCDContainer(t, cs, infraNode, path.Join("/rootfs", authfilePath), authfileData)

// rpm-ostree rebase --experimental ostree-unverified-image:docker://image-registry.openshift-image-registry.svc.cluster.local:5000/openshift-machine-config-operator/test-boot-in-cluster-image-build
Suggested change:
- // rpm-ostree rebase --experimental ostree-unverified-image:docker://image-registry.openshift-image-registry.svc.cluster.local:5000/openshift-machine-config-operator/test-boot-in-cluster-image-build
+ // rpm-ostree rebase --experimental ostree-unverified-image:image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/test-boot-in-cluster-image
or maybe just
- // rpm-ostree rebase --experimental ostree-unverified-image:docker://image-registry.openshift-image-registry.svc.cluster.local:5000/openshift-machine-config-operator/test-boot-in-cluster-image-build
+ // rpm-ostree rebase --experimental imageURL
to avoid similar inconsistencies in the future
Ironically this inconsistency was a result of my own negligence 😆 🙈
No worries! I'm cleaning this up as I go.
Cool!
Is the test exactly the same? Instead of adding Go vendoring here, could we have the builder also check out the MCO repo and build the test from there?
The test currently lives in the MCBS branch, which was primarily used as a POC branch when we started experimenting with the CoreOS layering work. Unfortunately, that branch no longer has CI attached to it and is no longer maintained at this point. As for the test itself, I've made some local changes now that we know more about the overall CI setup for everything, as well as doing some refactoring / cleanup. As for the dependencies, I only vendored them since that seems like the OpenShift / CoreOS convention; I wasn't sure if I could get away with just the module files and no vendor directory.

I'll mention that the test itself does not directly test anything involving the MCO; we just have a couple of test libraries that I thought would be useful for this. I could probably duplicate a subset of those libraries here as a way of reducing the number of dependencies.
So...this is definitely tricky. As of right now, this repository is only consumed by coreos-assembler - which already has a lot of Go code for tests. Adding new Go code here means there's another place to handle vendor bumps etc. Another balance we try to strike is that we don't have a hard dependency for "coreos stuff" on OpenShift or even Kubernetes (e.g. coreos-assembler can also be run via podman). WDYT about having this in e.g. https://github.com/openshift/origin/tree/master/test/extended instead? I think we can wire things up so that the new test is only run on this repo to start - but it would mean we could also more easily run it as a periodic outside of PRs to this repo. That said, on the other hand there are advantages to having tests live with the repository: a big part of the reason coreos external tests were created is that the tests can live inside the config repo (like this one), which (for FCOS) also versions the content. I think I lean on balance towards adding this to the openshift tests, but it's by no means a strongly held opinion. Another counterbalance could be adding more tests in Go code here.
Or, to say this another way: what has me hesitating a bit is that today the output from a build of this repository is a "cosa build", which is the oscontainer image and disk images. Now, we do have a Dockerfile, but that's mainly to "fake out" Prow and try to get "a build" of stuff together - we don't actually ship the result of that build. But if we're building a test, I think there are definitely cases where we'd want to build that test outside of building the repo - which is how the origin tests work. If we were to go this path, perhaps it should actually be fedora-coreos-config that has a Go testing framework, and this repo adds RHCOS tests to it? That would be a new middle ground between "external tests are in shell script" and "cosa tests, which can be Go but are lifecycled with cosa, not the OS". Anyone else have opinions?
Personally, I don't really mind Go code being used for tests in this repo or in f-c-c. I think we've discussed in the past how, if we want to keep draining kola of tests into the config repos, we'd probably need "cosa build"-level testing there to make that work.

I really dislike the Go vendoring model though. Maybe if we go that way, we can find some way to use the vendoring from cosa and not ship any vendoring in the config repos (which might be what you're implying already with "lifecycled with cosa").

But re. this test specifically: since this is a cluster-level test (which, unlike kola tests, isn't really something developers are expected to easily run locally, right?) and OpenShift CI falls over anyway if GitHub is down, could we just skip vendoring and fetch the dependencies at build time?
I'm torn between this belonging here or in openshift/origin, since this is a cluster-level test that tests interaction with RHCOS and various parts of OpenShift. I see benefits and drawbacks for both of those repos and I'm not particularly opinionated about it going in either of those places. However, I'm not sure it belongs with coreos-assembler or even fedora-coreos-config, since those seem like they're more OS-specific tests that exercise and validate the various aspects of CoreOS. But I do see the value of vendoring the dependencies for this test within either of those repos. That said, I share @jlebon's dislike of the Go vendor model and think we can get away with only the module files instead of a full vendor directory.
Force-pushed from ac1bedf to 461694b
/approve
Also good with this as is. Minor suggestion to layout: add a
Cool, thanks! Will make those changes and tidy up the test code. Will also update when this is ready for a final review.
Force-pushed from dc5bf39 to 0e0307d
This is now ready for a more in-depth review.
tests/layering/layering_test.go
Outdated
// If this moves from /run, make sure files get cleaned up
authfilePath = "/run/ostree/auth.json"
mcoNamespace = "openshift-machine-config-operator"
imagePullSpec = "registry.ci.openshift.org/rhcos-devel/rhel-coreos:%s"
q: why %s here?
This value gets replaced with either latest or the OCP release (e.g., 4.11) using https://github.com/openshift/os/pull/746/files#diff-58ffaa9005e1e047a4897d0a0d8a414238a5334c730449caa62fbd8fca7cb34dR36. The replacement occurs here: https://github.com/openshift/os/pull/746/files#diff-58ffaa9005e1e047a4897d0a0d8a414238a5334c730449caa62fbd8fca7cb34dR75. My reasoning was that this could run on all of the release branches as well as master, so it should pull the appropriate image for that branch.
For some reason the link isn't working - what line is it? (Just for my own info, when you get a chance.)
tests/layering/helpers_test.go:75
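In other words, the tag substitution boils down to something like this (a sketch; the exact behavior of getBaseImageTag is only described above, not shown here):

const imagePullSpec = "registry.ci.openshift.org/rhcos-devel/rhel-coreos:%s"

// getBaseImageTag() returns "latest" on master or the OCP release (e.g. "4.11")
// on a release branch, so the test pulls the base image matching the branch
// it is running against.
baseImageBuildArg := fmt.Sprintf(imagePullSpec, getBaseImageTag())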
// We haven't started the container yet
if *status.Started != true {
    return false
q: probably obvious but why a pointer?
The Started field is a pointer and one has to dereference it in order to check if it's true or false. I probably could've done something like:
if status.Started != nil && *status.Started != true {
    return false
}
I was thinking the same thing on that one :)
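For reference, a nil-safe version of that guard could look like this (a sketch; status is the corev1.ContainerStatus from the snippet above):

// Treat a nil Started pointer the same as "container not started yet".
if status.Started == nil || !*status.Started {
    return false
}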
tests/layering/helpers_test.go
Outdated
// get ImagePullSecret for the MCD service account and save to authfilePath on the node
// eventually we should use rest.InClusterConfig() instead of cs with kubeadmin
func putPullSecretOnNode(t *testing.T, cs *framework.ClientSet, node *corev1.Node) func() {
    t.Logf("placing image pull secret on node")
t.Logf("placing image pull secret on node") | |
t.Log("placing image pull secret on node") |
tests/layering/helpers_test.go
Outdated
return func() {
    // Clean up the pull secret from the node if the node has not rebooted and we do not want to delete it.
    t.Logf("cleaning up pull secret from node")
Logf -> Log, or alternatively add "node: %s", node.Name (or something)
I made that mistake a lot; sorry! I'll get them all fixed and make a note to tune my linter.
tests/layering/helpers_test.go
Outdated
// Rolls back to the original OS image that was on the node.
func rollbackToOriginalImage(t *testing.T, cs *framework.ClientSet, node *corev1.Node) error {
    // Apply the rollback to the previous image
    t.Logf("rolling back to previous OS image")
:) samesies as above (there are a couple more below as well)
tests/layering/helpers_test.go
Outdated
t.Logf("rebooting %s", node.Name) | ||
|
||
// Get an updated node object since this one may potentially be out-of-date | ||
updatedNode, err := cs.Nodes().Get(context.TODO(), node.ObjectMeta.Name, metav1.GetOptions{}) |
Suggested change:
- updatedNode, err := cs.Nodes().Get(context.TODO(), node.ObjectMeta.Name, metav1.GetOptions{})
+ updatedNode, err := cs.Nodes().Get(context.TODO(), node.Name, metav1.GetOptions{})
You shouldn't need ObjectMeta to get the name for a corev1.Node; for example, in our repo: https://github.com/openshift/machine-config-operator/blob/9c6c2bfd7ed498bfbc296d530d1839bd6a177b0b/pkg/daemon/node.go#L24
wait, you use node.Name right above here (line 309) too 😄
Yeah, I'll go through and fix all of those. I'm not sure why I did that ¯\_(ツ)_/¯
tests/layering/helpers_test.go
Outdated
startTime := time.Now()
err = wait.Poll(2*time.Second, 10*time.Minute, func() (bool, error) {
    n, err := cs.Nodes().Get(context.TODO(), node.ObjectMeta.Name, metav1.GetOptions{})
same as above re:objectmeta
@@ -0,0 +1,436 @@
package e2e_test |
I wanted to mention that I may want to eventually port this library to the MCO's testing library. With this present, we could reduce our need to shell out to oc to execute arbitrary commands on a node.
Good idea. Along those lines, can you add an OWNERS file to tests/layering?
tests/layering/helpers_test.go
Outdated
    return false
}

combined := string(cr.Stdout) + string(cr.Stderr)
Do we need a space or colon or something here, i.e. ... + " " + string(...)?
I don't think we do, but it couldn't hurt!
I have a slightly better idea to do that, somewhat inspired by how os/exec does it.
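For illustration, a minimal sketch of the os/exec-style approach: point both streams at one buffer so the output interleaves in order instead of being concatenated afterwards (the local exec.Command here is purely illustrative; the test actually runs the command on a node):

var combined bytes.Buffer
cmd := exec.Command("rpm-ostree", "status")
// With both streams writing into the same buffer, stdout and stderr stay in
// the order they were produced, and there is no separator to forget.
cmd.Stdout = &combined
cmd.Stderr = &combined
if err := cmd.Run(); err != nil {
    return fmt.Errorf("command failed: %w; output: %s", err, combined.String())
}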
type Deployments struct {
    Booted                  bool   `json:"booted"`
    ContainerImageReference string `json:"container-image-reference"`
}

type Status struct {
    Deployments []Deployments `json:"deployments"`
}
We really need to make this a library somewhere 😳
jkyros/machine-config-operator#1 (comment)
We also have a duplicated getRPMOStreeStatus
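For reference, those structs are meant to map onto rpm-ostree status --json output roughly like this (a sketch using a local command for brevity; in the test the JSON comes back from the node):

out, err := exec.Command("rpm-ostree", "status", "--json").Output()
if err != nil {
    return err
}
var status Status
if err := json.Unmarshal(out, &status); err != nil {
    return err
}
// Report which container image the currently booted deployment came from.
for _, d := range status.Deployments {
    if d.Booted {
        t.Logf("booted deployment uses image %q", d.ContainerImageReference)
    }
}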
tests/layering/helpers_test.go
Outdated
if k8sErrors.IsAlreadyExists(err) {
    // We've already got an imagestream matching this name
    imagestream, err := cs.ImageStreams(mcoNamespace).Get(context.TODO(), imageStreamName, metav1.GetOptions{})
Not sure how much it matters, but I've been asked before to do ctx := context.TODO() once and then use ctx instead of calling context.TODO() repeatedly. You have that here and in a few other places.
What you suggested is good practice and something I should have been doing. Will fix.
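A sketch of the suggested pattern, with one ctx shared by the calls in a helper instead of constructing context.TODO() at every call site (names taken from the snippets above; the error handling is abbreviated):

ctx := context.TODO()

// Both calls share ctx, mirroring the existing already-exists handling shown
// above; switching to a real, cancellable context later becomes a one-line change.
_, err := cs.ImageStreams(mcoNamespace).Create(ctx, imageStreamConfig, metav1.CreateOptions{})
if k8sErrors.IsAlreadyExists(err) {
    _, err = cs.ImageStreams(mcoNamespace).Get(ctx, imageStreamName, metav1.GetOptions{})
}
if err != nil {
    return err
}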
tests/layering/helpers_test.go
Outdated
    },
}

_, err := cs.ImageStreams(mcoNamespace).Create(context.TODO(), imageStreamConfig, metav1.CreateOptions{})
Might be my imagination but it feels like a slightly more common pattern to call Get first and then Create second if it doesn't exist.
I could see that being the case. If that's what's more common, I'll adopt that pattern instead.
I dunno if it actually is though, I bet @kikisdeliveryservice would know?
I'm a little confused bc I can't find this line anymore - but it depends on what you're doing (which I can't tell bc I can't find the line)
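For comparison, a sketch of the Get-first ordering being discussed, using the same imagestream client and names as the snippets above:

ctx := context.TODO()

// Look the imagestream up first and only create it when it is missing.
_, err := cs.ImageStreams(mcoNamespace).Get(ctx, imageStreamName, metav1.GetOptions{})
if k8sErrors.IsNotFound(err) {
    _, err = cs.ImageStreams(mcoNamespace).Create(ctx, imageStreamConfig, metav1.CreateOptions{})
}
if err != nil {
    return err
}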
tests/layering/helpers_test.go
Outdated
    }
}

baseImageBuildArg := fmt.Sprintf(imagePullSpec, getBaseImageTag())
How come you're grabbing this from an env var?
This is intended to run in a CI context where this information is readily available via env vars.
tests/layering/helpers_test.go
Outdated
build, err := cs.BuildV1Interface.Builds(mcoNamespace).Get(context.TODO(), buildConfig.Name, metav1.GetOptions{})
if err != nil {
    return fmt.Errorf("could not get build %s: %w", build.Name, err)
}
Does the Create call already return the build? So you could drop the Get?
Sort of. I don't think it's a complete build object. In other words, if you didn't supply some metadata, what it returns may be missing that metadata. However, I think that may be fine for what I'm doing with it.
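If it helps, a sketch of leaning on the Create return value instead of a follow-up Get (build here stands in for whatever Build object is being submitted, and it is an assumption that Create is called on the same Builds client shown above):

created, err := cs.BuildV1Interface.Builds(mcoNamespace).Create(ctx, build, metav1.CreateOptions{})
if err != nil {
    return fmt.Errorf("could not create build: %w", err)
}
// The returned object already carries the server-populated metadata (name,
// UID, resourceVersion), so it can be logged and watched by name; a separate
// Get is only needed when fresher status is required later.
t.Logf("started build %s", created.Name)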
tests/layering/helpers_test.go
Outdated
require.Nil(t, writeFileToNode(t, cs, node, authfileData, authfilePath))

return func() {
    // Clean up the pull secret from the node if the node has not rebooted and we do not want to delete it.
Suggested change:
- // Clean up the pull secret from the node if the node has not rebooted and we do not want to delete it.
+ // Allow cleaning up the pull secret from the node if the node will not be rebooted or deleted
tests/layering/layering_test.go
Outdated
defer func() {
    // The pull secret should be cleared from the node upon reboot since
    // it is placed into the /run directory. However, if the node does
    // not reboot and we're not deleting it, we should clean it up.
I got confused about whether "it" was the pull secret or the node; maybe:
- // not reboot and we're not deleting it, we should clean it up.
+ // not reboot or get deleted, we should clean it up.
Force-pushed from 3a8c8ca to 7e24139
- Be more consistent with logf and object fields
- Handle exit code 147 on reboots
- Clarify comments
- Build object cleanups
- Consistently combine stdout and stderr
- Nodecmd cleanups
- Refactored build derivation code
- Replace context.TODO
- Allow derivation repo control via env vars
- Show additional info about builds
Force-pushed from 7e24139 to 1fd41a9
After a bunch of refactoring, I think this is ready for a final review.
Oops I started a review but forgot to click submit.
Nothing blocking though!
    imageURL = ostreeUnverifiedRegistry + ":" + imageRegistry + "/" + mcoNamespace + "/" + imageStreamName
)

type Deployments struct {
This is totally fine as is but we should do coreos/rpm-ostree#2389 at some point and share this between this test and the MCO.
}()

// We don't have any retries, so just run the command.
if runOpts.Retries <= 1 {
In a chat with Jerry re: HyperShift, one thought I had is that we could use a Job (https://kubernetes.io/docs/concepts/workloads/controllers/job/) instead of a raw pod.
I was thinking about that too. I do want to share what's in this file with the MCO test suite because doing so would allow us to completely eliminate shelling out to oc.
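For context, a rough sketch of what a Job-based variant could look like (the names, namespace, image, command, and kubeClient handle are placeholders rather than values from this PR):

backoff := int32(3)
job := &batchv1.Job{
    ObjectMeta: metav1.ObjectMeta{
        GenerateName: "node-cmd-",
        Namespace:    mcoNamespace,
    },
    Spec: batchv1.JobSpec{
        // Let the Job controller handle retries instead of hand-rolling them
        // around a raw pod.
        BackoffLimit: &backoff,
        Template: corev1.PodTemplateSpec{
            Spec: corev1.PodSpec{
                RestartPolicy: corev1.RestartPolicyNever,
                Containers: []corev1.Container{{
                    Name:    "cmd",
                    Image:   "example.invalid/tools:latest", // placeholder image
                    Command: []string{"/bin/sh", "-c", "rpm-ostree status"},
                }},
            },
        },
    },
}
_, err := kubeClient.BatchV1().Jobs(mcoNamespace).Create(ctx, job, metav1.CreateOptions{})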
Is it a correct read of the state of things that this just adds the test, but nothing executes it today? Are you thinking we'd run the test as a periodic? Nothing stops us from having it be opt-in. OK, but actually, don't we need to wire this test up to the "launch a cluster" bits?
From my PoV though, I'm good to go.
Yup. This just adds the test for now.
That was my plan. I think it would also be good to run on PRs as you mention below this.
Correct. That's being worked on in openshift/release#26912. The "magic" bit over there is the
I can give this a final pass this afternoon.
This looks great, Zack. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, cheesesashimi, jlebon, kikisdeliveryservice

The full list of commands accepted by this bot can be found here. The pull request process is described here.
@cheesesashimi: all tests passed!
This PR introduces the OpenShift in-cluster image derivation test. What this does is introduce a new test which will verify that in-cluster image derivation works as we intend. The test works thusly:

- Builds a derived OS image in-cluster from the base RHCOS image.
- Places a pull secret on a target node and rebases that node onto the newly built image using rpm-ostree.
- Reboots the node, verifies the new image is booted, and rolls back to the original image afterward.

This is a deceptively large PR because it introduces a bunch of vendored dependencies. For ease of review, I added the vendored dependencies in a commit by themselves. Additionally, since I wasn't sure about the impact my changes would have upon the coreos-assembler image, I opted to multistage the Dockerfile build so I could inject a pre-compiled version of the layering test suite into the coreos-assembler image.

The test itself originally came from openshift/machine-config-operator#2886.