-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "build: add asan check in Github action" #32324
Conversation
I suggest that we not add tests that always fail, for example #32310 (recent PR, picked at random for being a npm dep update so completely unrelated to ASAN it should not fail). |
cc @gengjiawen |
Also, see 9dac823 |
For context, it was not always failing, it started failing recently. But I'm fine reverting it for now |
I think the decent way is adding continue-on-error like we did in V8 8.1 |
Good to know. The test should be done in the CI docker build infrastructure if its a feature we care to ensure always passes, and have some reason to think it can stop working because of a PR. I don't think we should test things that don't have a reasonable chance of regression, human resources to setup builds and CPU/memory resources as well are not infinite. If we don't care enough to block PRs when it fails, I don't think it should be tested in github actions. Showing big red Xes to every single collaborator on every PR but not expecting them to pay attention or do something about the failure is off-putting and confusing to contributors. While the other github action test results don't block PRs in a technical sense ( |
the build failed simply because memory issue. the docker env is 19.10, Ubuntu 18 a different issue. |
PR got red Xs is annoying and mislead, so add continue-on-error for now is workaround. we see what happens when upgrade to V8 8.2. |
To elaborate on that: the ASAN build is also a Debug build, and some changes will cause the linking phase to use more memory than available on Actions (which is 7Gb). We could move it to Jenkins on a worker with more memory (probably the same we use for Debug builds already). OTOH I'm not entirely convinced that only building ASAN is enough, we should be running the test suite and take it into account as well (I think today we run it but allow it to fail). So as it is right now, we're only making sure it builds, which might be ok, but looks like a missing opportunity. On a side note: do we need the ASAN build to be a Debug build? Or could it be a Release build? |
The error is happening without V8 upgrade, so there's no guarantee it will be fixed on 8.2. And as I mentioned before, there are many changes that can make the ASAN CI fail due to memory limitations. |
Why run a test in github actions against every single PR (often multiple times per PR as the branch is repushed) just to then ignore its results? Tests should be run because they are relevant to all PRs, in the sense that the result of the test is actionable by the PR author. If someone wants to periodically check ASAN to make sure it still works (again, is there any change we could make in node.js that would cause it to stop working?), they can just run that check locally on occaison. Or someone could add a new CI job in jenkins on a weekly cron that did nothing but configure and run tests with ASAN on a single platform, and those interested could watch it, or perhaps even add their emails to the notify-on-failure. This doesn't seem to me to be a use-case that fits well with github actions. |
I think ASAN build is important and should be included in every PR (just like V8 does). For those not familiar, ASAN stands for AddressSanitizer), it helps identify common memory and pointer related issues. But as it is right now it doesn't work well for its purpose. Which brings me to this question again: "do we need the ASAN build to be a Debug build? Or could it be a Release build?" If we can run ASAN with Release builds, it'll take less time to run (today it takes ~3h with tests), and it's less likely there'll be linking issues. Also, based on our current policy, there should be an ASAN build on Jenkins if we want to block PRs when ASAN build or test fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there's no point on using resources from GH Actions (we can only run 20 jobs in parallel in this org) if we're not enforcing this on every PR. I still think it's worth having ASAN checks as part of our test suite, but we need to explore other venues to do so.
Why not. ASAN checks should be run on every single PR. It's not a bonus part. It's an essential part.
The debug build will show the clear stacktrace, so developer can find where exactly the code cause the problem.
Current ASAN check already contains test, it already collect failed result as well.
We care. We just not clear current issues yet. In the long run we should remove all failure node/.github/workflows/ASAN.yml Lines 17 to 18 in 960df3f
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strongly against remove this from CI.
For some context, before introducing ASAN in their CI, V8 cleared the issues to make sure the codebase was compatible with ASAN. I think this is a better approach than having a perma-red build (or worse, a green build which is actually failing).
I just tried a ASAN Release build, and I got stack traces:
I think it's worth using the Release build instead, it will be a lot faster to run and it'll be less prone to breakage due to OOM issues. If we really want Debug ASAN builds, it should be on Jenkins (if we have an instance large enough for it). |
If we tell people to ignore CI results for some platforms/tests/configurations, then they will ignore them for nearly all platforms/tests/configurations. We've seen this in the past. If we care about the results, let's run it in an environment where it is likely to pass, whether that's Release build with GitHub Actions, Debug build with Jenkins, or whatever. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@gengjiawen You haven't made a case for why this test should be run. You know it fails, its not going to pass until "the long run", the negative effects have been pointed out.
See above thread for "Why not". Also, given that it currently fails, its clearly not essential for everyone. At the time the GH actions were added ASAN passed, which is why it was acceptable, though there were concerns at the time. It now fails. If your hope was that by running the tests in the GH actions they would cause the feature to be kept up to date, that has demonstrably NOT happened. I have provided several alternative CI setups above that would better serve the goals of keeping ASAN working:
If the current failures are in V8, an external dependency, failures might be out of the control of the Node.js project. You might need to work directly with the V8 team to add ASAN tests as blocking release criteria for their CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We should work on getting this working again properly soon after!
Changed from Debug build to Release build to avoid out of memory issues during the link step. This should also speed up the build. Commented out the test runner for now, since there are too many ASAN warnings there to be useful. We should re-enable the test runner once our ASAN warnings are fixed. Simplify the environment: instead of running on a container, runs directly on the host. With this, ASAN build looks identical to other builds in GitHub Actions, with the exception of the `--enable-asan` flag. Ref: https://github.com/nodejs/node/issue/32257 Ref: nodejs#32324
@sam-github @gengjiawen I opened #32352 as a compromise: it should fix the build on Actions, it accomplishes the same as before (ensures we don't break the build, not the tests), and it should be a lot faster. Of course, let's wait for the CI to finish to make sure it works before making any decisions. |
Ok, turns out even the release build is too heavy for GitHub Actions. My suggestion is moving this to Jenkins (if we have an instance large enough for it). |
Yeap, I agree on this. But once we fixed all current issue. It will find memory leaks in every PR. I looked current solution as flaky test, we can't just remove it.
It's essential for Node.js stability, c/c++ is not rust. I want we find memory leaks earlier instead of user run a service long time then segment fault or OOM.
I think this is doable. But before this landed I don't think it's a good idea we remove it from here. In the end, I want to state that if this further can be done via github acition. I prefer it done in GA. Here are reasons:
Edit: |
I think first we need a list of the current issues. And there's a good chance our ASAN build is misconfigured (V8 has specific configuration which we should be using when building it, but I don't think that's the case). |
@sam-github in order to override and objection by a collaborator when the consensus cannot be reached I believe we should have an issue in the TSC repo that references the issue, and TSC consensus in that issue that we should override the objection. It if it urgent, then I'd follow it up with an email on the TSC mailing list with the ask that people chime in on the issue by date/time X (as required by the urgency). |
I opened #32380 to work on properly enabling sanitizers on our CI, starting with ASAN. I still think this should be removed from GitHub Actions until:
|
That seems reasonable and I'll go open an issue now, but I'm also adding
|
|
Just throwing in #32143 which limits it to running on PRs that touch C/C++ files. Doesn't stop falls positives, but won't run on other types of PRs. Feel free to close that one if this lands though |
I still rather remove the Action than land #32143. When we re-add ASAN, we can skip it for |
We are not even set a pipeline in Jenkins to do ASAN check now. Removing this without any alternative solution being applied looks not okay with me. |
We already don't have any ASAN checks in our CI: this Action always fail, and it will keep always failing for the foreseeable future. Removing it doesn't mean it will never be added again in the future, it means we won't be wasting 1 (per PR) machine out of 20 concurrent machines available for the org on a test we know will fail. As mentioned before, when we have similar issues on Jenkins, we remove the machine from rotation. If we want Actions to be treated as a official build as well, it should follow the same rule. It also means we won't be sending the wrong signal to contributors: having an ASAN check showing up gives the false impression we're building and testing with ASAN, and that tests are passing. Not only the build is not working, if it works, the tests are not passing (and from what I've seen, several violations can actually be memory leaks and other issues), but we're showing as if it was. As soon as we figure out an alternative, we can re-add it, either as an Action, as a Jenkins job, or as whichever alternative we came up with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@gengjiawen let's land this and open an issue to add an asan check to our CI pipeline. We can reference this PR as a source of possible infrastructure solutions. If you would like to drive the issue forward @nodejs/build can no doubt help you come up with the right infrastructure solution for building with such high memory requirements. Travis doesn't seem up for the job. |
@gengjiawen we have #32380 for setting up sanitizers. |
This reverts commit 3ec4b21. See: #32257 PR-URL: #32324 Reviewed-By: Matheus Marchini <mat@mmarchini.me> Reviewed-By: Jiawen Geng <technicalcute@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com> Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
Landed in 1d49558 |
This reverts commit 3ec4b21. See: #32257 PR-URL: #32324 Reviewed-By: Matheus Marchini <mat@mmarchini.me> Reviewed-By: Jiawen Geng <technicalcute@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com> Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
This reverts commit 3ec4b21. See: #32257 PR-URL: #32324 Reviewed-By: Matheus Marchini <mat@mmarchini.me> Reviewed-By: Jiawen Geng <technicalcute@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com> Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
This reverts commit 3ec4b21.
It always fails.
See: #32257
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes