update: Always use podman pull+cp #2751

Closed · wants to merge 1 commit

Conversation

cgwalters (Member)

A while ago we switched to using `oc image extract` in order
to reduce the I/O writes done to the host, but it turned out
that doesn't yet work in disconnected environments that need
ImageContentSourcePolicy.

Now, in https://bugzilla.redhat.com/show_bug.cgi?id=2000195 we discovered
that the podman fallback broke due to `user.*` extended attributes
in the content (which will hopefully be removed soon).

But still, a good part of the value proposition of OpenShift is that we
work *consistently* across platforms. Having two ways to apply
OS updates is not worth the maintenance overhead.

Eventually this flow will be more native to rpm-ostree; xref
coreos/fedora-coreos-tracker#812
and
https://github.com/ostreedev/ostree-rs-ext/#module-container-encapsulate-ostree-commits-in-ocidocker-images
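
For context, a minimal Go sketch of the "podman pull + cp" flow the PR title refers to: pull the image into local container storage, create a stopped container from it, then copy its contents onto the host. This is not the MCO's actual code; the image reference, paths, and helper names are illustrative.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// run executes a command and wraps any failure with its combined output.
func run(args ...string) error {
	out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s: %s: %w", strings.Join(args, " "), out, err)
	}
	return nil
}

// podmanExtract pulls image and copies its root filesystem into destDir.
func podmanExtract(image, destDir string) error {
	// The pull writes the image into local container storage first; this is
	// the extra host I/O that `oc image extract` was introduced to avoid.
	if err := run("podman", "pull", image); err != nil {
		return err
	}
	// Create (but never start) a container so there is a filesystem to copy from.
	out, err := exec.Command("podman", "create", image).Output()
	if err != nil {
		return fmt.Errorf("podman create: %w", err)
	}
	cid := strings.TrimSpace(string(out))
	defer run("podman", "rm", cid) // best-effort cleanup; error ignored
	// Copy everything out of the container onto the host.
	return run("podman", "cp", cid+":/", destDir)
}

func main() {
	// Illustrative image name and destination, not real MCO values.
	if err := podmanExtract("quay.io/example/machine-os-content:latest", "/run/mco-extract"); err != nil {
		fmt.Println("extract failed:", err)
	}
}
```

The double write (pull into container storage, then copy out) is the I/O cost the PR accepts in exchange for a single, consistently tested update path.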

openshift-ci bot (Contributor) commented Sep 2, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters
To complete the pull request process, please assign kikisdeliveryservice after the PR has been reviewed.
You can assign the PR to them by writing /assign @kikisdeliveryservice in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuqi-zhang (Contributor)

Looking at #1941 it seems we had SELinux concerns as well. If anything, we should wait until 4.10 branches.

cc @sinnykumari for some more context.

@cgwalters (Member, Author) commented Sep 2, 2021

> Looking at #1941 it seems we had SELinux concerns as well.

Yeah, a definite possibility. (I didn't test this before submitting BTW)

But the thing is, if there are SELinux or really any other issues, then those would also break in the disconnected case, and we need that to work. So I think it's better to have one way to do it and ensure our single OS update path is tested on every run of e.g. e2e-aws.

(edit: and accept the cost of this extra I/O for control plane and other hosts for now)

@cgwalters (Member, Author)

While e2e-agnostic-upgrade failed, AFAICS it failed on a known flake - the OS update part worked.

@sinnykumari (Contributor)

If I understand correctly, PR coreos/coreos-assembler#2406 is the actual fix for BZ#2000195, and this PR is a cleanup to keep one flow for OS image extraction?

I am not convinced that we should go back to using podman for all cases. Since 4.6, `oc image extract` has been the default way of extracting the OS image, and effort was made to keep the code neat and make the update process faster. Today, `podman cp` is just hacky code in the MCO to work around disconnected environments and other fallback cases, and it will automatically stop being used once oc supports the missing use cases.

Debugging a failure shouldn't be difficult with the current two flows because podman is always the fallback.
Also, as noted in BZ#2000195, the issue shouldn't occur in 4.8+; perhaps we should keep the current OS update workflow and improve things when we rework OS updates as part of the MCBS changes.
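
A hypothetical sketch of the two-path flow described above: try `oc image extract` first (the default since 4.6), and fall back to the podman pull+cp path only when it fails, e.g. on disconnected clusters that need ImageContentSourcePolicy. It reuses `run()` and `podmanExtract()` from the earlier sketch; `extractOSImage` is an illustrative name, not the MCO's real API.

```go
// extractOSImage tries the streaming path first, then falls back to podman.
func extractOSImage(image, destDir string) error {
	// Preferred path: stream the image's layers straight into destDir
	// without pulling into local container storage.
	if err := run("oc", "image", "extract", image, "--path", "/:"+destDir); err == nil {
		return nil
	}
	// Fallback path: podman pull + cp, which this PR proposes making the
	// only path.
	return podmanExtract(image, destDir)
}
```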

@cgwalters (Member, Author)

> If I understand correctly, PR coreos/coreos-assembler#2406 is the actual fix for BZ#2000195, and this PR is a cleanup to keep one flow for OS image extraction?

Yeah, I hope so anyway; it hasn't been verified yet.

> Debugging a failure shouldn't be difficult with the current two flows because podman is always the fallback.

I think the problem isn't so much debugging it (though that was a bit tricky in this case) as the fact that a customer OS update hit it in the field. What if it turned out the podman fallback had been silently broken? Then it'd be much harder to work around; we'd need to ship them a custom MCD, etc.

Hmmm...I think we don't even have an informing job for disconnected in Prow? That seems like a large oversight.

So my argument to merge this is basically that disconnected is important and we should only have one OS update path. But if you prefer to close this, that's also OK by me.

Ultimately we will be replacing this bit with the ostree-native container bits, which should be a lot better than either, but that's still a ways away, which leaves a fair bit of time for any new bugs in the podman path to creep in and be hit only in the field.

@sinnykumari (Contributor)

> I think the problem isn't so much debugging it (though that was a bit tricky in this case) as the fact that a customer OS update hit it in the field. What if it turned out the podman fallback had been silently broken? Then it'd be much harder to work around; we'd need to ship them a custom MCD, etc.
>
> Hmmm...I think we don't even have an informing job for disconnected in Prow? That seems like a large oversight.

Right, we don't have good CI coverage for disconnected.

> So my argument to merge this is basically that disconnected is important and we should only have one OS update path. But if you prefer to close this, that's also OK by me.

+1

> Ultimately we will be replacing this bit with the ostree-native container bits, which should be a lot better than either, but that's still a ways away, which leaves a fair bit of time for any new bugs in the podman path to creep in and be hit only in the field.

Right, that makes sense considering the ostree-native container work is going to take a while. As mentioned earlier, using podman cp was a quick hack, so we need to think a little about any other impact or code changes needed before we merge this. I will get back to this either this week or once I am back from PTO.

openshift-ci bot (Contributor) commented Dec 21, 2021

@cgwalters: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-upgrade | fb5a223 | link | | /test e2e-vsphere-upgrade |
| ci/prow/e2e-aws-workers-rhel8 | fb5a223 | link | | /test e2e-aws-workers-rhel8 |
| ci/prow/e2e-aws-workers-rhel7 | fb5a223 | link | | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-aws-single-node | fb5a223 | link | | /test e2e-aws-single-node |
| ci/prow/e2e-gcp-op-single-node | fb5a223 | link | | /test e2e-gcp-op-single-node |
| ci/prow/e2e-aws-disruptive | fb5a223 | link | | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-upgrade-single-node | fb5a223 | link | | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-agnostic-upgrade | fb5a223 | link | true | /test e2e-agnostic-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
