Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where step status ordering did not match step container ordering #3679

Merged
merged 1 commit into from Jan 16, 2021
Merged

Conversation

ghost
Copy link

@ghost ghost commented Jan 12, 2021

Changes

Fixes #3677

Pipelines sorts container statuses so they're in the same order as steps. The sorting is not currently working right. See #3677 for an example of this where image-digest-exporter was sorted before some user steps. This confuses a lot of things - the Succeeded status records the wrong step as original source of failure, the Dashboard shows the wrong step as the source of failure in a TaskRun, and also shows some Steps as "Not Run" even if they did.

This commit replaces the custom sort code with a much simpler version: just walk the containers and place the statuses in the same order.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

Release Notes

Fixed bug where sorted order of step statuses did not match order of step containers.

Prior to this commit it was possible for Tekton Pipelines to produce an ordering
of Step Statuses that did not correctly match the order of Step Containers. This
confuses the reconciler when figuring out the Succeeded status - the TaskRun could
end up with the incorrect Step recorded as the original failure. This also meant
that downstream projects like the Dashboard became confused about how to render
the TaskRun: they'd display the wrong order of steps and incorrectly show that
steps as "Not Running" when in fact they did.

This commit removes the custom `sort` code and just walks the list of step containers
in order, setting the status for that container to the same position container.
@ghost ghost added the kind/bug Categorizes issue or PR as related to a bug. label Jan 12, 2021
@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jan 12, 2021
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 12, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/status.go 92.4% 92.0% -0.4

@ghost
Copy link
Author

ghost commented Jan 12, 2021

Running unit tests locally I don't see any errors. I'm going to try rerunning that.

/test tekton-pipeline-unit-tests

@ghost
Copy link
Author

ghost commented Jan 12, 2021

The build tests passed and the integration tests are "still running" according to their prow pages. There's something really funky going on with the github status reporting here. Re-running one-at-a-time to see what happens.

/test pull-tekton-pipeline-build-tests

@ghost ghost removed the kind/bug Categorizes issue or PR as related to a bug. label Jan 12, 2021
@ghost
Copy link
Author

ghost commented Jan 12, 2021

/test pull-tekton-pipeline-integration-tests

@ghost ghost added the kind/bug Categorizes issue or PR as related to a bug. label Jan 12, 2021
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 14, 2021
@pritidesai
Copy link
Member

wow a huge simplification, thanks @sbwsg 👍

Just by creating a map of container names and its status, then iterating over the desired order of step containers and changing the order of container statuses based on that order.

This might leave us with a list of container statuses with duplicate containers and/or some missing if the for loop is interrupted which can only be possible when a process is killed. And should not be concern here.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2021
@tekton-robot tekton-robot merged commit 7e5ab1a into tektoncd:master Jan 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

image-digest-exporter step appears out of order with its position in pod
3 participants