Projects getting spammed with repeated bugs that are in dependencies and we can't directly fix #915

Closed
thomasvl opened this issue Dec 27, 2019 · 8 comments

@thomasvl
Member

bazelbuild/rules_swift#369
bazelbuild/rules_swift#368
bazelbuild/rules_swift#358

Please stop filing these since the projects can't do anything about them themselves.

@fweikert
Member

Sorry for the noise. The new issues are the "real" ones, whereas the previous ones shouldn't have been filed in the first place due to several bugs. There won't be any more issues unless someone creates a new flag.

I’ll see whether we can add some logic for handling upstream breakages.

fweikert added a commit that referenced this issue Dec 27, 2019
fweikert added a commit that referenced this issue Dec 27, 2019
@fweikert fweikert added P1 and removed P0 labels Dec 27, 2019
@junyer

junyer commented Dec 28, 2019

> Sorry for the noise. The new issues are the "real" ones, whereas the previous ones shouldn't have been filed due to several bugs. There won't be any more issues unless someone creates a new flag.

At this point, 390+ issues have been filed across 50+ projects. Please stop @bazel-flag-bot from sending notifications for now. Which is to say, either put it into dry-run mode or turn it off entirely.

Also, given that these problems started nine days ago, please write a postmortem. It appears that various things could have been done better. Do Bazel folks not have a freeze at this time of year?

@fweikert
Member

fweikert commented Dec 30, 2019

Mini Post-Mortem

Summary

A new feature was rolled out that filed more than 300 issues across more than 50 GitHub repositories. Most of these issues had incorrect titles and contents, and none of them were actionable due to a bug in Bazel.

Context

The Bazel team maintains a pipeline on its CI that tests several projects with incompatible flags that are going to be flipped in a future Bazel release. This pipeline runs every night.
In the past the team had complained that it had to check the pipeline results manually, and had to reach out to projects that needed to migrate. As a result, I filed and worked on a feature request to add automated notifications to the pipeline. The basic idea was that CI would file a GitHub issue for every (incompatible flag; project) combination.
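The per-combination idea can be illustrated with a short sketch. This is not the actual Bazel CI script: the `NightlyResult` record and `combinations_to_notify` function are hypothetical, and the sketch assumes the nightly pipeline reports, for each project, whether its build passed with a given incompatible flag enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NightlyResult:
    """Hypothetical record: one project's nightly outcome with one incompatible flag enabled."""
    project: str
    flag: str
    passed: bool


def combinations_to_notify(results):
    """Return the sorted, de-duplicated (flag, project) pairs whose nightly run failed."""
    return sorted({(r.flag, r.project) for r in results if not r.passed})


results = [
    NightlyResult("Envoy", "--incompatible_foo", passed=False),
    NightlyResult("rules_perl", "--incompatible_foo", passed=True),
    NightlyResult("rules_perl", "--incompatible_bar", passed=False),
    NightlyResult("Envoy", "--incompatible_foo", passed=False),  # e.g. a second shard
]
print(combinations_to_notify(results))
```

Note that this enumeration on its own cannot tell whether a failure is the project's fault; as the post-mortem explains, most failures came from a bug in @bazel_tools rather than in the notified projects.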

Timeline

What went wrong (user impact)

  • Projects received multiple new issues that were not actionable: Most failures were caused by a bug in @bazel_tools, not by the notified projects. Consequently, the project owners couldn't do anything about the issues.
  • Additionally, the first set of issues suffered from multiple problems:
    • Incorrect Bazel version: The title referred to the current Bazel version, 1.2.1, which was clearly a bug.
    • Formatting issues: Links were not displayed correctly.
    • Lack of context: The issues didn't link to the documentation of the incompatible flag.

Root causes

I see two major root causes that led to the current situation:

  1. Technical & operational problems: The Bazel CI scripts rely on functions offered by our CI system, Buildkite, which makes it basically impossible to run them on a local developer machine. As a result, we have to test them on Buildkite. During testing we use fake configurations to hide changes from users, which in this case meant that all issues were supposed to be created in a dummy GitHub repository. Unfortunately, once the first iteration of the script was done, I enabled it without realizing that it didn't contain the fake configuration (yet). I should have double-checked and started a manual run instead.
  2. Judgement error: When I activated the script for a second time, I was aware that duplicate issues would be filed, and expected it. I decided to proceed for several reasons, but unfortunately I underestimated the negative impact on our users:
    1. Since the first set of issues was completely broken, I wanted to get a new set with correct details and additional links to relevant documentation.
    2. Due to the release of Bazel 2.0 on December 19, most of the original issues were irrelevant as their flags had already been flipped. Only four incompatible flags were left, which meant that the number of new issues would be low. Additionally, the new issues would indicate how much work was actually left to prepare for future releases.
    3. I had hoped that the underlying breakage was resolved by the Bazel 2.0 release (reminder: “hope is not a strategy”).

Lessons learned

  • Be more conservative with new features when there is visible impact to (multiple) users and projects.
  • Double check that user-visible changes are limited to dummy environments during testing.
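One way to make the second lesson structural rather than a manual check is to invert the default, so that forgetting the test configuration produces a harmless dry run instead of real issues. The sketch below is a hypothetical illustration, not part of the actual CI scripts: the `FLAG_BOT_LIVE` environment variable and the sandbox repository name are assumptions.

```python
import os

# Hypothetical sandbox repository used while testing.
DUMMY_REPO = "example-org/flag-bot-sandbox"


def target_repo(real_repo, env=None):
    """Pick the repository an issue should be filed in.

    Fail-safe default: unless the (hypothetical) FLAG_BOT_LIVE variable is set
    to exactly "1", every issue is redirected to the dummy repository, so a
    missing configuration cannot reach real projects.
    """
    env = os.environ if env is None else env
    if env.get("FLAG_BOT_LIVE") == "1":
        return real_repo
    return DUMMY_REPO


print(target_repo("example-org/project", env={}))
print(target_repo("example-org/project", env={"FLAG_BOT_LIVE": "1"}))
```

With this design, the failure described in the first root cause (enabling the script without the fake configuration) would have filed issues only in the sandbox.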

@fweikert
Member

Regarding the freeze: We do have a production freeze inside Google, which meant that I finally had time to work on this Bazel-only feature (which isn't affected by the freeze).

@thomasvl
Member Author

@fweikert thanks for the write up.

Is the system able to look at closed issues on the repos before filing something? (Can it look for open issues?) That is, should it be able to realize that something was already filed (and maybe closed), and then reopen or comment on that issue rather than opening another one?

@fweikert
Member

Yes, the script checks the title of all existing issues created by bazel-flag-bot, and does not file a new issue if there is already one for a certain (flag, project, Bazel version) combination.

For example, if the script created an issue titled "Flag --incompatible_foo will break Envoy in Bazel 3.0", it will never create a new issue for Envoy and --incompatible_foo, unless we decide to postpone the flag flip to 4.0 or later (this case can be improved).

For some flags we haven't decided on when to flip them. In this case the script will create an issue titled "Flag --incompatible_bar will break rules_perl in a future Bazel release". Once we have a target release, the script will edit the title of the existing issue to mention the concrete Bazel version, but it won't create a new issue.
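The title-based de-duplication described above can be sketched as follows. This is an illustration under stated assumptions, not the bot's actual code: the regex, `make_title`, and `plan` are hypothetical, and `existing_titles` is assumed to contain every issue the bot has filed, open or closed, which is what would keep a closed issue from being filed again. As described, a postponed flag (an existing issue for 3.0, a new target of 4.0) still leads to a new issue.

```python
import re

# Parses the two title formats quoted above; the regex itself is an assumption.
TITLE_RE = re.compile(
    r"^Flag (?P<flag>--\S+) will break (?P<project>.+?) in "
    r"(?:Bazel (?P<version>\d+(?:\.\d+)*)|a future Bazel release)$"
)


def make_title(flag, project, version=None):
    """Build an issue title; version=None means no target release has been chosen."""
    release = f"Bazel {version}" if version is not None else "a future Bazel release"
    return f"Flag {flag} will break {project} in {release}"


def plan(existing_titles, flag, project, version=None):
    """Decide what to do for one failing (flag, project, version) combination.

    Returns ("skip", title), ("retitle", title) or ("create", None).
    """
    for title in existing_titles:
        m = TITLE_RE.match(title)
        if not m or m["flag"] != flag or m["project"] != project:
            continue
        if m["version"] == version:
            # Same (flag, project, version) combination already reported.
            return ("skip", title)
        if m["version"] is None:
            # A target release was chosen after the issue was filed: edit its title.
            return ("retitle", title)
        # Otherwise the existing issue names a different release (e.g. a postponed flip).
    return ("create", None)
```

For a "retitle" result, the new title would be `make_title(flag, project, version)`.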

@junyer

junyer commented Dec 31, 2019

> @fweikert thanks for the write up.

+1. Thank you, @fweikert.

@laurentlb
Contributor

> Most failures were caused by a bug in @bazel_tools

This shows that a flag was not ready. Maybe the flag owner should opt in before the bot starts creating bugs.

> The basic idea was that CI would file a GitHub issue for every (incompatible flag; project) combination.

Something easier to roll out: send a weekly email to a mailing-list (e.g. bazel-dev until we have something better) with the summary of the breakages. If there's an issue with the script or a flag, it can be discussed on the mailing-list, instead of having issues in 50 repos.
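The weekly summary suggested above could be assembled roughly as follows. This hypothetical sketch only builds the message body (sending it to the mailing list is out of scope), and assumes the week's failures are available as (flag, project) pairs. Because everything lands in one message grouped by flag, a broken flag or script shows up once in a single place rather than as issues in every affected repository.

```python
from collections import defaultdict


def weekly_digest(failures):
    """Render a plain-text summary of (flag, project) failures, grouped by flag.

    `failures` is an iterable of (flag, project) pairs, e.g. collected from a
    week of nightly runs; duplicate pairs are collapsed.
    """
    by_flag = defaultdict(set)
    for flag, project in failures:
        by_flag[flag].add(project)
    lines = ["Incompatible flag breakages this week:"]
    for flag in sorted(by_flag):
        lines.append(f"{flag}: {', '.join(sorted(by_flag[flag]))}")
    return "\n".join(lines)


print(weekly_digest([
    ("--incompatible_foo", "Envoy"),
    ("--incompatible_foo", "rules_swift"),
    ("--incompatible_bar", "rules_perl"),
    ("--incompatible_foo", "Envoy"),
]))
```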

laurentlb added a commit to bazelbuild/bazel-website that referenced this issue May 20, 2020
This was discussed in the past, but we never updated the policy.
As this came up recently in a discussion with Lukács, I'm sending this PR. The goal is to acknowledge that the Bazel team is not responsible for updating GitHub repositories (apart from the few that we own).

* Notifying the owners is a step that should be automated (bazelbuild/continuous-integration#915).
* 14 days is aligned with the policy for recommended rules (https://www.bazel.build/recommended-rules.html)
laurentlb added a commit to bazelbuild/bazel-website that referenced this issue Jun 19, 2020
@philwo philwo closed this as completed Sep 21, 2020

5 participants