[v2.5.x] Fix stray bracket breaking pytest; fix include-patch for cross-compilation #346
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR. I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13223778212. Examine the logs at this URL for more detail.
pytest-xdist
Sigh, what's happening with the windows builds now?
This doesn't seem to affect azure-pipelines, so it looks like it's specific to the windows server. Did anything change there recently? @wolfv @baszalmstra
I’ve seen the same on azure recently; a restart fixed it.
I had tried restarting, but it didn't work. 🤷 The good thing is that the windows builds here aren't relevant (because nothing changed compared to the previous published builds), but for #326 I'll need to get this going again.
OK, now that the test suite runs, there are some more failures in the MKL plus CUDA job, but for tests that are explicitly "exercising terrible failures".
The test also has a bunch of skips à la:

```python
@unittest.skipIf(IS_WINDOWS, "Skipped on Windows!")
@unittest.skipIf(SM90OrLater and not TEST_WITH_ROCM, "Expected failure on sm90")
@unittest.skipIf(IS_FBCODE and IS_REMOTE_GPU, "cublas runtime error")
@skipCUDAIfRocmVersionLessThan((6, 0))
@onlyCUDA
```

OTOH, I don't know where the … comes from. It might have something to do with changing the ATEN include directory?

pytorch-cpu-feedstock/recipe/patches/0018-make-ATEN_INCLUDE_DIR-relative-to-TORCH_INSTALL_PREF.patch (lines 21 to 22 in be20390)
But still weird that this would only show up for MKL.
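For reference, a minimal runnable sketch of how the stacked `unittest.skipIf` decorators quoted above behave; the condition names mirror the PyTorch test suite, but their values, the class name, and the test body are purely illustrative.

```python
# Illustrative only: shows how stacked skipIf decorators gate a test.
# IS_WINDOWS / SM90OrLater / TEST_WITH_ROCM are hard-coded stand-ins here,
# not the real PyTorch detection logic.
import unittest

IS_WINDOWS = False       # True on a Windows runner
SM90OrLater = False      # True on sm90-or-later GPUs
TEST_WITH_ROCM = False   # True when testing against ROCm

class MatmulCudaTest(unittest.TestCase):
    @unittest.skipIf(IS_WINDOWS, "Skipped on Windows!")
    @unittest.skipIf(SM90OrLater and not TEST_WITH_ROCM, "Expected failure on sm90")
    def test_example(self):
        # Runs only when none of the skip conditions are met.
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()
```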
OK, more digging on the …

```python
version = _get_torch_cuda_version()
```

Because if I remove some of the content of the if branches in the stacktrace above:
Likewise, another failure looks like this:
and if `version` had the correct value, it should be impossible for us that …
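To make the kind of version-gated branching under discussion concrete, here is a rough sketch; the import path for `_get_torch_cuda_version` matches recent PyTorch releases (`torch.testing._internal.common_cuda`), but the branch bodies and the `pick_tolerance` helper are hypothetical.

```python
# Rough sketch of version-gated test logic (illustrative only; pick_tolerance
# is a hypothetical helper, not part of PyTorch).  If _get_torch_cuda_version()
# reported a wrong value, a test could end up in a branch that should be
# unreachable for the build under test.
from torch.testing._internal.common_cuda import _get_torch_cuda_version

def pick_tolerance():
    version = _get_torch_cuda_version()  # tuple such as (12, 6); (0, 0) without CUDA
    if version >= (12, 0):
        return 1e-5
    elif version >= (11, 8):
        return 1e-4
    return 1e-3
```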
OK, we're finally back to a fully green CI (first time since merging #331), but unfortunately, the stream of issues hasn't abated yet. In the meantime, we've had #350 & #354 come in.

Given the ~24h necessary for a full CI run, I'm going to stop iterating on this now and get the fixes for the previous set of issues out the door at least. Though I really wanted to merge a green CI as-is this time, I'll pick up the fix from #355, which is low-risk.

Unfortunately, my tentative fix for #354 in #326 (c92777a) didn't work out. I'll iterate on this separately from this PR; on windows we're not as resource-constrained as on linux, so this shouldn't take too long. Famous last words 🤞
Sigh... The linux+CUDA+openblas job had been passing a number of times, but failed upon merge with
I'll restart once the rest of the CI finishes, and if it happens a second time, we can add a skip.
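Should the failure reproduce, such a skip could look roughly like the following; `test_flaky_example` and the marker placement are hypothetical, and in practice the feedstock would more likely deselect the test via the pytest invocation in its test scripts.

```python
# Hypothetical example of skipping a flaky test with pytest (not taken from the
# PyTorch test suite); the reason string documents why it is skipped.
import pytest

@pytest.mark.skip(reason="flaky on linux + CUDA + openblas CI")
def test_flaky_example():
    assert True
```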
Fixes #348
Fixes #349
Previously:
Similar to #344 and #345, test if removing … cures the issues we're seeing on the CUDA MKL build.